Initial results from running the TDE multi-array benchmark (jb/fftw/bm_time_delay_estimator_many) do not confirm our assumption.
For example:
float:aligned:single summary min=110us, p25=111us, p50=111us, p75=111us, p90=113us, p99=114us, p99.9=118us, max=198us, N=10000
float:aligned:many summary min=113us, p25=113us, p50=113us, p75=113us, p90=115us, p99=116us, p99.9=123us, max=250us, N=10000
So, further investigation is required.
We will start by answering the following questions:
Are we running more iterations in one case than in the other?
Why is one path slower than the other? Use callgrind to find out.
Did we make sure FFTW is given the right options with respect to memory alignment?
Is the test fair with respect to FFTW initialization?
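As a first step, the two summary lines quoted above can be compared statistic by statistic. The following is a minimal sketch in Python, assuming the summary format is exactly as shown (comma-separated `name=value` pairs with times in microseconds). The helper names `parse_summary` and `compare` are hypothetical and are not part of the benchmark code.

```python
import re

# Summary lines copied verbatim from the benchmark output quoted above.
SINGLE = ("float:aligned:single summary min=110us, p25=111us, p50=111us, "
          "p75=111us, p90=113us, p99=114us, p99.9=118us, max=198us, N=10000")
MANY = ("float:aligned:many summary min=113us, p25=113us, p50=113us, "
        "p75=113us, p90=115us, p99=116us, p99.9=123us, max=250us, N=10000")


def parse_summary(line):
    """Return (label, stats); stats maps each name (min, p50, ..., N) to an int."""
    label = line.split(" summary ")[0]
    stats = {}
    # Match pairs such as "p99.9=118us" or "N=10000"; the "us" suffix is optional.
    for key, value in re.findall(r"([A-Za-z][\w.]*)=(\d+)(?:us)?", line):
        stats[key] = int(value)
    return label, stats


def compare(single_line, many_line):
    """Return rows of (stat, single_us, many_us, delta_us, delta_percent)."""
    _, single = parse_summary(single_line)
    _, many = parse_summary(many_line)
    rows = []
    for key, base in single.items():
        if key == "N":
            continue  # N is a sample count, not a time
        delta = many[key] - base
        rows.append((key, base, many[key], delta, 100.0 * delta / base))
    return rows


if __name__ == "__main__":
    for key, base, other, delta, pct in compare(SINGLE, MANY):
        print(f"{key:>6}: single={base}us many={other}us "
              f"delta={delta:+d}us ({pct:+.1f}%)")
```

For the lines above, the many-array case is 2us slower at the median (113us vs. 111us) and 52us slower at the maximum (250us vs. 198us). Both runs report N=10000 samples, but that alone does not tell us whether each sample performs the same amount of work.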