Thank you both @sschaer and @Brohrer for sharing your input! I should have been clearer about what I had already noticed and tried. That was my bad! Newbie alert: I am interested in knowing whether there is any way to change the code so that the compiled code becomes optimal. There might be some small details or tricks one can use to give the performance a boost.
Your code performs poorly for very small arrays because of the parallelism which causes overhead.
Right! So one way is to avoid parallel computing when the array is short, but I would prefer to hold off on that for now, in case there is another way to optimize the compiled code without increasing its complexity.
I assume the sawtooth-like pattern you observe comes from the fact that your implementation is broken for cases where “log-two of length” is even.
You detected it correctly! I should have mentioned it in my post. I cannot understand what exactly is broken there, though.
PS: There is support for scipy.fft.rfft in Numba-compiled code: Rocket-FFT — a Numba extension supporting numpy.fft and scipy.fft. However, it may be a bit slower for 1D transforms, since multi-threading is only supported over multiple dimensions.
First of all, kudos on the package!! I found it while searching for ways to leverage Numba. What I am looking for is to run the FFT of a 1D array in a parallel manner. I can see the workers parameter in scipy.fft.fft; but, as you said, it does not affect the performance of computing the FFT of a single 1D array.
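To make that limitation concrete, here is a small sketch of how `workers` behaves (the array shapes and worker count are arbitrary examples): SciPy parallelizes across a batch of transforms, not within one transform, so passing `workers` for a single 1D signal is legal but gives little or no speedup.

```python
import numpy as np
from scipy.fft import rfft

rng = np.random.default_rng(0)

# 64 independent 1-D signals: workers=4 can parallelize over the batch axis
batch = rng.standard_normal((64, 4096))
spectra = rfft(batch, axis=-1, workers=4)

# A single 1-D signal: workers is accepted, but there is only one
# transform to compute, so there is nothing to split across threads
single = rng.standard_normal(4096)
spec1 = rfft(single, workers=4)
```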
@Brohrer
Thanks Brandon! I did a dummy run and threw it away before timing the code execution. (I should have mentioned it in my post. My bad!)
Btw, regarding:
I’ve been bitten by the just-in-time compiling penalty before.
You are not alone! It has happened to me before too.
I definitely need to revise my question, making the code shorter and the title clearer.