Rocket-FFT — a Numba extension supporting numpy.fft and scipy.fft

Dear Numba enthusiasts,

A long time ago, I wrote an extension for Numba to support numpy.fft and scipy.fft. Recently I noticed that others still request this feature from time to time. I have never felt comfortable sharing something incomplete and imperfect, but despite this discomfort, I decided to finally make it available to everyone. You can download it from GitHub.

I often tend to be too verbose, so I’ll keep this short. There is a README on GitHub that explains a little more about the what, why, and how.

If you use it, I encourage you to give me feedback and report bugs.
If the Numba development team finds it valuable to offer support for numpy.fft and scipy.fft and is looking for contributors, let me know.

Hi everyone,

I would like to give a short update. Rocket-FFT has shed many of its earlier limitations, compilation times have been reduced significantly, and the test suite has been extended. You can now also get it from PyPI.

Unfortunately, installation can still be a little problematic in some environments. I would be very happy about contributions and help with proper distribution, since I lack experience in that area.

Hey, even though I don’t use FFTs in my daily work, I’ve seen many people asking for FFT support in Numba, so I’d like to say thank you for sharing this package.

Luk

Thanks Luk, me neither. :wink: I wrote this about a year ago as a personal learning exercise, but then I realized that it could help others.

Really cool to see an FFT library for Numba! I was looking for something like this and have been using objmode so far.
Will check it out!

Dear community,

Today, I would like to announce a major update of Rocket-FFT and take this opportunity to give you a glimpse behind the scenes.

Did you know that the runtime of a fast Fourier transform (FFT) on small arrays in Python is largely dominated by overhead? For numpy.fft.fft and scipy.fft.fft, a lot of time is spent parsing the arguments within Python, and there is additional overhead in the wrapper around the underlying FFT library.
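
To make this concrete, here is a small sketch (not my actual benchmark code) that you can run yourself: for small inputs, the time per numpy.fft.fft call barely changes with the array length, because argument parsing and wrapper overhead dominate. Exact numbers will of course depend on your machine.

```python
import timeit
import numpy as np

for n in (8, 64, 512, 4096):
    x = np.random.rand(n)
    # Average time of a single numpy.fft.fft call over 10,000 repetitions.
    t = timeit.timeit(lambda: np.fft.fft(x), number=10_000)
    print(f"n = {n:4d}: {t / 10_000 * 1e6:.2f} µs per call")
```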

NumPy uses the lightweight C version of the PocketFFT library with a C-extension wrapper, while SciPy uses the C++ version with a relatively thick PyBind11 wrapper. With Rocket-FFT, I also use the C++ library because it has significantly better performance on large data and comes with n-dimensional transforms.

The old Rocket-FFT wrapper around the FFT library was inspired by SciPy. This was a natural choice, as it didn’t require making changes to the C++ library. However, it involved copying data from Numba array structs to C++ vectors.
I have now made major modifications to the PocketFFT library to allow for views on the array struct fields. With its already fast parsing of function arguments and the improved wrapper structure, Rocket-FFT has notable performance benefits over SciPy, NumPy, and numba.objmode for transforms of up to around 500 data points.
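
For those who haven’t tried it yet, usage stays the same as with plain NumPy or SciPy. A roughly minimal sketch, assuming rocket-fft is installed (see the README for details):

```python
import numba
import numpy as np

@numba.njit
def jitted_fft(x):
    # With rocket-fft installed, this call compiles and dispatches to the
    # PocketFFT-based implementation instead of raising a typing error.
    return np.fft.fft(x)

x = np.random.rand(64)
print(jitted_fft(x))
```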

Below, you can see how Rocket-FFT with its old and new interfaces compares to numpy.fft.fft and scipy.fft.fft in plain Python and in jitted code using object mode. The figures show the time spent performing 10,000 transforms on arrays of size 1 to 4,096, relative to the time spent with Rocket-FFT. For NumPy and SciPy, the loop was run in Python; however, the loop overhead does not contribute significantly to the runtime. For object mode and Rocket-FFT, the loop was run within a jitted function.
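
For reference, the setup is essentially the following simplified sketch (not the exact script used for the figures): the transform is repeated 10,000 times, with the loop in Python for NumPy and inside a jitted function for Rocket-FFT, and compilation happens outside the timed region.

```python
import time
import numba
import numpy as np

@numba.njit
def jitted_loop(x, repeats):
    y = np.fft.fft(x)            # requires rocket-fft to be installed
    for _ in range(repeats - 1):
        y = np.fft.fft(x)
    return y

def python_loop(x, repeats):
    y = np.fft.fft(x)
    for _ in range(repeats - 1):
        y = np.fft.fft(x)
    return y

x = np.random.rand(256)
jitted_loop(x, 1)  # trigger compilation outside the timed region

for name, fn in (("Python loop (NumPy)", python_loop),
                 ("jitted loop (Rocket-FFT)", jitted_loop)):
    start = time.perf_counter()
    fn(x, 10_000)
    print(f"{name}: {time.perf_counter() - start:.3f} s")
```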

Thanks for this, amazing work!

Is there a reason that for long FFT lengths there is no notable change in runtime with and without JIT?

For my applications I use FFTs on sequences of length 2^16 to 2^20, and there’s no runtime improvement in those settings. This is also visible in your charts.

Hi @orzzamir

At their core, SciPy and Rocket-FFT use the same code. The difference is that Rocket-FFT reduces function call overhead (and allows you to use the FFT within jitted code), which matters less when dealing with large arrays.

If you work with very large arrays, consider using mkl-fft. This is the fastest FFT implementation you can get on the CPU. You can also use it within jitted code with Numba’s objmode. The small overhead this introduces doesn’t matter much on large arrays, but using objmode is a bit less convenient.
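
A rough sketch of what that could look like, assuming mkl-fft is installed and exposes mkl_fft.fft at the top level:

```python
import numba
from numba import objmode
import numpy as np
import mkl_fft  # assumption: mkl-fft is installed

@numba.njit
def large_fft(x):
    # The FFT itself runs in object mode; for arrays this large the
    # objmode overhead is negligible compared to the transform itself.
    with objmode(y="complex128[:]"):
        y = mkl_fft.fft(x)
    return y

x = np.random.rand(2**18)
print(large_fft(x)[:4])
```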