Where to optimize?

Thank you both @sschaer and @Brohrer for sharing your input! I should have been clearer about what I had already noticed / tried. That was my bad! Newbie alert :smiley: I am interested in knowing whether there is any way to change the code so that the compiled code becomes optimal. There might be some small details / tricks that could give the performance a boost.

@sschaer

Your code performs poorly for very small arrays because of the parallelism which causes overhead.

Right! So, one way would be to skip the parallel path when the array is short, but I would prefer not to try that for now, in case there is another way to optimize the compiled code without increasing its complexity.
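
(For concreteness, here is a minimal sketch of that size-based dispatch idea, using a toy reduction. The function names and the threshold are hypothetical, not from my actual code, and the cutoff would need tuning on real hardware.)

```python
import numpy as np
from numba import njit, prange

THRESHOLD = 10_000  # hypothetical cutoff; would need benchmarking to tune

@njit
def _sum_serial(x):
    # Plain serial loop: no thread start-up cost.
    total = 0.0
    for i in range(x.shape[0]):
        total += x[i]
    return total

@njit(parallel=True)
def _sum_parallel(x):
    # Parallel reduction over prange: pays off only for large arrays.
    total = 0.0
    for i in prange(x.shape[0]):
        total += x[i]
    return total

@njit
def fast_sum(x):
    # For short arrays the threading overhead dominates,
    # so fall back to the serial loop.
    if x.shape[0] < THRESHOLD:
        return _sum_serial(x)
    return _sum_parallel(x)

print(fast_sum(np.arange(100, dtype=np.float64)))        # serial path
print(fast_sum(np.arange(1_000_000, dtype=np.float64)))  # parallel path
```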

I assume the sawtooth-like pattern you observe comes from the fact that your implementation is broken for cases where “log-two of length” is even.

You detected it correctly! I should have mentioned it in my post. I still cannot figure out what exactly is broken there, though.

PS: There is support for scipy.fft.rfft in Numba compiled code: Rocket-FFT — a Numba extension supporting numpy.fft and scipy.fft
However, it may be a bit slower for 1D transforms since multi-threading is only supported over multiple dimensions.

First of all, kudos on the package!! I found it while searching for ways to leverage Numba. What I am looking for is a way to run the FFT of a single 1D array in parallel. I can see the workers parameter in scipy.fft.fft; but, as you said, it does not affect the performance of a single 1D transform.
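
(To illustrate what I mean, here is a small sketch of how I understand the workers parameter: it helps when many transforms are batched along extra axes, but not for one 1D transform. The sizes and worker count below are just examples.)

```python
import numpy as np
import scipy.fft

rng = np.random.default_rng(0)

# One 1D transform: `workers` has essentially no effect here,
# since there is only a single transform to compute.
x = rng.standard_normal(2**20)
X = scipy.fft.rfft(x, workers=4)

# A batch of 64 1D transforms along the last axis: here `workers`
# can distribute the individual transforms across threads.
xb = rng.standard_normal((64, 2**20))
Xb = scipy.fft.rfft(xb, axis=-1, workers=4)
```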

@Brohrer
Thanks Brandon! I did a dummy run and threw its result away before timing the code execution. (I should have mentioned that in my post :smiley: My bad!)
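
(In case it helps others, this is the warm-up pattern I used; the function `compute` here is just a stand-in for my actual code.)

```python
import time
import numpy as np
from numba import njit

@njit
def compute(x):
    return x.sum()

x = np.arange(1_000_000, dtype=np.float64)

compute(x)  # dummy call: triggers JIT compilation, result discarded

start = time.perf_counter()
compute(x)  # now timing only the compiled execution
elapsed = time.perf_counter() - start
print(f"{elapsed:.6f} s")
```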

Btw, regarding:

I’ve been bitten by the just-in-time compiling penalty before.

You are not alone! It happened to me before too :slight_smile:


I definitely need to revise my question, make the code shorter, and make the title clearer.
