
Profiling report #20

Open
merbanan opened this issue Mar 28, 2015 · 2 comments

Comments

@merbanan

I profiled decoding of a few samples, and this is what I got.

Master audio (XLL):

  %   cumulative   self              self     total
 time   seconds   seconds    calls  us/call  us/call  name
 24.42      0.21     0.21    37680     5.57     8.49  interpolate_sub32_fixed
 17.44      0.36     0.15     7536    19.91    32.65  parse_frame_data
 13.95      0.48     0.12                             dcadec_waveout_write
 12.79      0.59     0.11   602880     0.18     0.18  inverse_dct32_fixed
 11.63      0.69     0.10 23145408     0.00     0.00  bits_get_signed_rice
  6.98      0.75     0.06  8790544     0.01     0.01  bits_get_signed
  3.49      0.78     0.03  5968286     0.01     0.01  bits_get
  2.33      0.80     0.02     7536     2.65     2.65  interpolate_lfe_fixed_fir
  2.33      0.82     0.02     7536     2.65    16.28  parse_frame
  2.33      0.84     0.02                             bits_get_unsigned_rice
  1.16      0.85     0.01  1055040     0.01     0.01  bits_get_unsigned_vlc
  1.16      0.86     0.01     7536     1.33    46.45  filter_hd_ma_frame
  0.00      0.86     0.00  1521769     0.00     0.00  bits_get1
  0.00      0.86     0.00   135648     0.00     0.00  xll_map_ch_to_spkr
  0.00      0.86     0.00   120576     0.00     0.00  bits_skip
  0.00      0.86     0.00    90434     0.00     0.00  ta_get_size
  0.00      0.86     0.00    60900     0.00     0.00  bits_seek
  0.00      0.86     0.00    45216     0.00     0.00  xll_get_lsb_width
  0.00      0.86     0.00    37681     0.00     0.00  bits_init
  0.00      0.86     0.00    30144     0.00     0.00  bits_check_crc
  0.00      0.86     0.00    22609     0.00     0.00  bits_skip1
  0.00      0.86     0.00    15073     0.00     0.01  read_frame
  0.00      0.86     0.00    10528     0.00     0.00  bits_get_signed_linear
  0.00      0.86     0.00     7536     0.00     0.00  bits_align1
  0.00      0.86     0.00     7536     0.00    45.12  core_filter
  0.00      0.86     0.00     7536     0.00    32.70  core_parse
  0.00      0.86     0.00     7536     0.00     0.11  exss_parse
  0.00      0.86     0.00     7536     0.00     0.00  reorder_samples
  0.00      0.86     0.00     7536     0.00     0.00  xll_assemble_msbs_lsbs
  0.00      0.86     0.00     7536     0.00     0.00  xll_filter_band_data
  0.00      0.86     0.00     7536     0.00    16.28  xll_parse
  0.00      0.86     0.00       22     0.00     0.00  ta_zalloc_size
  0.00      0.86     0.00       17     0.00     0.00  ta_free
  0.00      0.86     0.00        6     0.00     0.00  ta_alloc_size
  0.00      0.86     0.00        5     0.00     0.00  interpolator_create

Core 96kHz:

  %   cumulative   self              self     total
 time   seconds   seconds    calls  us/call  us/call  name
 84.91     10.91    10.91   138360    78.86    78.86  interpolate_sub64_float
  4.75     11.52     0.61    27672    22.05    28.27  parse_frame_data
  4.20     12.06     0.54                             dcadec_waveout_write
  2.33     12.36     0.30    27672    10.84    20.08  parse_x96_frame_data
  1.87     12.60     0.24 19098423     0.01     0.01  bits_get_signed_vlc
  0.78     12.70     0.10 26956937     0.00     0.00  bits_get
  0.31     12.74     0.04 19174096     0.00     0.00  bits_get_signed
  0.23     12.77     0.03  4427520     0.01     0.01  bits_get_unsigned_vlc
  0.16     12.79     0.02  8135577     0.00     0.00  bits_get1
  0.16     12.81     0.02    27672     0.72     0.72  reorder_samples
  0.08     12.82     0.01   166032     0.06     0.06  ta_get_size
  0.08     12.83     0.01    27672     0.36   394.72  core_filter
  0.08     12.84     0.01    27672     0.36    48.90  core_parse
  0.04     12.85     0.01                             dcadec_context_filter
  0.04     12.85     0.01                             dcadec_context_free_exss_info
  0.00     12.85     0.00   138360     0.00     0.00  bits_skip
  0.00     12.85     0.00    83016     0.00     0.00  bits_skip1
  0.00     12.85     0.00    55345     0.00     0.07  read_frame
  0.00     12.85     0.00    55344     0.00     0.00  bits_init
  0.00     12.85     0.00    55344     0.00     0.00  bits_seek
  0.00     12.85     0.00    27672     0.00     0.06  alloc_x96_sample_buffer
  0.00     12.85     0.00       16     0.00     0.00  ta_zalloc_size
  0.00     12.85     0.00       12     0.00     0.00  ta_free
  0.00     12.85     0.00        6     0.00     0.00  ta_alloc_size
  0.00     12.85     0.00        5     0.00     0.00  interpolate_sub64_float_init
  0.00     12.85     0.00        5     0.00     0.00  interpolator_create

Core 48kHz:

  %   cumulative   self              self     total
 time   seconds   seconds    calls  us/call  us/call  name
 69.77      0.60     0.60    28884    20.77    20.77  interpolate_sub32_float
 10.47      0.69     0.09     7221    12.46    21.97  parse_frame_data
  9.30      0.77     0.08                             dcadec_waveout_write
  4.65      0.81     0.04  9415936     0.00     0.00  bits_get_signed
  2.33      0.83     0.02  3399280     0.01     0.01  bits_get
  1.16      0.84     0.01  1010940     0.01     0.01  bits_get1
  1.16      0.85     0.01     7221     1.38     1.38  reorder_samples
  1.16      0.86     0.01                             dcadec_stream_read
  0.00      0.86     0.00   808752     0.00     0.00  bits_get_unsigned_vlc
  0.00      0.86     0.00    36105     0.00     0.00  bits_skip
  0.00      0.86     0.00    36105     0.00     0.00  ta_get_size
  0.00      0.86     0.00    21663     0.00     0.00  bits_skip1
  0.00      0.86     0.00    14443     0.00     0.01  read_frame
  0.00      0.86     0.00    14442     0.00     0.00  bits_init
  0.00      0.86     0.00     7221     0.00     0.00  bits_seek
  0.00      0.86     0.00     7221     0.00    83.10  core_filter
  0.00      0.86     0.00     7221     0.00    22.13  core_parse
  0.00      0.86     0.00       12     0.00     0.00  ta_zalloc_size
  0.00      0.86     0.00        5     0.00     0.00  ta_alloc_size
  0.00      0.86     0.00        4     0.00     0.00  interpolate_sub32_float_init
  0.00      0.86     0.00        4     0.00     0.00  interpolator_create
  0.00      0.86     0.00        4     0.00     0.00  ta_free

Not surprisingly, the transform (the interpolate_sub32/interpolate_sub64 functions) takes most of the time, and frame parsing comes next.
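For reading the flat profiles above, here is a small sketch of how gprof's columns relate, assuming the standard flat-profile definitions: "% time" is a function's self seconds over the total sampled run time, and "self us/call" is self seconds divided by the number of calls. The helper names are hypothetical and not part of gprof or dcadec.

```c
/* Hypothetical helpers (not part of gprof or dcadec) showing how the
 * columns of a gprof flat profile relate to each other. */

/* "% time": self seconds as a share of the total sampled run time. */
static double gprof_percent_time(double self_seconds, double total_seconds)
{
    return 100.0 * self_seconds / total_seconds;
}

/* "self us/call": self seconds spread evenly over all calls, in microseconds. */
static double gprof_self_us_per_call(double self_seconds, long calls)
{
    return self_seconds * 1e6 / (double)calls;
}
```

With the interpolate_sub32_fixed row from the XLL run (0.21 s self, 37680 calls, 0.86 s total), these give roughly 24.42 % and 5.57 us/call, matching the table. "total us/call" additionally includes time spent in callees, which is why core_filter shows 0.00 self but tens of microseconds total per call.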

foo86 added a commit that referenced this issue Mar 28, 2015
Replace brute force floating point IDCT with ‘obfuscated’ version,
making floating point interpolation speed on par with fixed point.

References #20.
foo86 added a commit that referenced this issue Mar 28, 2015
Add ‘restrict’ keyword to input/output pointer types. Drop ‘inline’
keyword from static function declarations, the compiler is already
smart enough to inline them.

References #20.
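A minimal sketch of what the commit describes, assuming a simple multiply-accumulate loop as the example (the function name is hypothetical, not taken from dcadec): the input and output pointers are 'restrict'-qualified, and the function is declared plain 'static', leaving the inlining decision to the compiler.

```c
#include <stddef.h>

/* Hypothetical helper: y[i] += coeff[i] * x[i].
 * 'restrict' promises that y, coeff and x never overlap, so the compiler may
 * assume stores to y[] cannot change coeff[] or x[] and can vectorize the
 * loop without emitting runtime overlap checks.
 * Declared 'static' without 'inline'; the compiler decides whether to inline. */
static void mul_acc(int *restrict y, const int *restrict coeff,
                    const int *restrict x, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] += coeff[i] * x[i];
}
```

The promise is the caller's responsibility: calling it with overlapping buffers, such as mul_acc(buf, c, buf, n), is undefined behavior.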
@kasper93
Contributor

0.00 0.86 0.00 7536 0.00 0.00 xll_filter_band_data

Since we're sharing results anyway: on my XLL sample, xll_filter_band_data takes considerably more time.

64-bit build:

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  us/call  us/call  name    
 19.01     27.24    27.24 2918320721     0.01     0.01  bits_get_signed_rice
 16.28     50.57    23.33  4754970     4.91     8.09  interpolate_sub32_fixed
 13.94     70.54    19.97   950994    21.00    21.00  xll_filter_band_data
 10.58     85.70    15.16 76079520     0.20     0.20  idct_perform32_fixed
 10.03    100.07    14.37                             _mcount_private
  8.45    112.18    12.11                             parse_frame_data
  6.30    121.21     9.03                             __fentry__
  4.54    127.72     6.51   950994     6.85     6.85  dcadec_waveout_write
  2.21    130.88     3.16 988609488     0.00     0.00  bits_get_signed
  2.20    134.03     3.15   950994     3.31     3.31  interpolate_lfe_fixed_fir
  1.95    136.82     2.79                             filter_hd_ma_frame
  1.59    139.10     2.28                             parse_frame
  1.37    141.06     1.96 770208131     0.00     0.00  bits_get

32-bit build:

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  us/call  us/call  name    
 18.70     40.89    40.89  4754970     8.60    17.00  interpolate_sub32_fixed
 18.26     80.82    39.93 76079520     0.52     0.52  idct_perform32_fixed
 17.90    119.97    39.15 2918320721     0.01     0.01  bits_get_signed_rice
 11.23    144.52    24.55   950994    25.82    25.82  xll_filter_band_data
  9.68    165.70    21.18                             _mcount_private
  8.97    185.32    19.62                             parse_frame_data
  3.48    192.94     7.62   950994     8.01     8.01  dcadec_waveout_write
  3.05    199.60     6.66 988609488     0.01     0.01  bits_get_signed
  2.58    205.25     5.65   950994     5.94     5.94  interpolate_lfe_fixed_fir
  1.99    209.60     4.35 770208131     0.01     0.01  bits_get
  1.36    212.57     2.97                             parse_frame
  1.21    215.21     2.64                             filter_hd_ma_frame

(Only functions with more than 1% of the time are listed.)

EDIT: GCC 4.9.2, mingw-w64 3.3.0. dcadec was compiled with the default settings, apart from -pg of course.
EDIT2: But to be honest, I don't see the point of this issue. I'm sure foo86 can use a profiler on his own.

DTS Core Audio: 5.1 ch, 48 kHz, 24 bit, 1536 kbps

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  us/call  us/call  name    
 40.78     28.16    28.16  3121245     9.02    11.30  interpolate_sub32_float
 16.03     39.23    11.07                             floor
 10.28     46.33     7.10 49939920     0.14     0.14  idct_perform32_float
  9.63     52.98     6.65                             parse_frame_data
  6.13     57.21     4.23   624249     6.78     6.78  dcadec_waveout_write
  5.36     60.91     3.70                             _mcount_private
  3.20     63.12     2.21   624249     3.54     3.54  interpolate_lfe_float_iir
  3.11     65.27     2.15 690138024     0.00     0.00  bits_get_signed
  2.94     67.30     2.03                             __fentry__
  1.59     68.40     1.10 443798127     0.00     0.00  bits_get

@ghost

ghost commented Mar 28, 2015

Aren't the compiler vendor and settings extremely important here? The project is 100% C code, so maybe you should post the compiler version and optimization settings too.
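One way to make a report carry that information automatically is to have the build describe itself. This is a hypothetical helper, not part of dcadec; it relies on __VERSION__ and __OPTIMIZE__, which GCC and Clang predefine but other compilers may not.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: describe the toolchain this file was built with, so a
 * profiling report can state it. __VERSION__ and __OPTIMIZE__ are GCC/Clang
 * predefined macros; the #else branches cover compilers that lack them. */
static int build_info(char *buf, size_t size)
{
#ifdef __VERSION__
    const char *version = __VERSION__;
#else
    const char *version = "unknown compiler";
#endif
#ifdef __OPTIMIZE__
    const char *opt = "optimized";
#else
    const char *opt = "not optimized";
#endif
    return snprintf(buf, size, "%s, %s, %u-bit pointers",
                    version, opt, (unsigned)(sizeof(void *) * 8));
}
```

__OPTIMIZE__ only says whether optimization is enabled, not which -O level or which other flags were used, so the exact command line is still worth posting alongside it.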

It's also interesting to finally see a case where the restrict keyword actually makes a difference.

kasper93 added a commit to kasper93/dcadec that referenced this issue Mar 31, 2015
kasper93 added a commit to kasper93/dcadec that referenced this issue Mar 31, 2015