Dear community,
Today, I would like to announce a major update of Rocket-FFT and take this opportunity to give you a glimpse behind the scenes.
Did you know that the speed of a fast Fourier transform (FFT) on small arrays in Python is largely dominated by overhead? For numpy.fft.fft and scipy.fft.fft, a lot of time is spent parsing the arguments within Python, and there is additional overhead from the wrapper to the underlying FFT library.
NumPy uses the lightweight C version of the PocketFFT library with a C-extension wrapper, while SciPy uses the C++ version with a relatively thick PyBind11 wrapper. With Rocket-FFT, I also use the C++ library because it has significantly better performance on large data and comes with n-dimensional transforms.
The old Rocket-FFT wrapper to the FFT library was inspired by SciPy. This was a natural choice as it didn’t require making changes to the C++ library. However, it involved copying data from Numba array structs to C++ vectors.
I have now made major modifications to the PocketFFT library to allow for views on the array struct fields. With its already fast parsing of function arguments and the now improved wrapper structure, Rocket-FFT has notable performance benefits over SciPy, NumPy, and numba.objmode for transforms of up to around 500 data points.
Below, you can see how Rocket-FFT with its old and new interfaces compares to numpy.fft.fft and scipy.fft.fft within Python and jitted code using the object mode. The figures show the time spent performing 10,000 transforms on arrays of size 1 to 4,096 relative to the time spent with Rocket-FFT. For NumPy and SciPy, the loop was run in Python. However, the loop overhead does not significantly contribute to the runtime. For the object mode and Rocket-FFT, the loop was run within a jitted function.
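A measurement of this kind can be approximated with a small timing harness. The sketch below is not the script behind the figures: the helper names time_transforms and relative_times are hypothetical, the loop always runs in Python, and numpy.fft.fft stands in as the baseline because the jitted Rocket-FFT loop is not part of the sketch. The demo call also uses fewer repetitions than the 10,000 used for the figures so that it finishes quickly.

```python
import time

import numpy as np


def time_transforms(fft_func, size, n_transforms=10_000):
    """Return the seconds spent on n_transforms calls of fft_func on one array."""
    x = np.random.default_rng(0).random(size).astype(np.complex128)
    fft_func(x)  # warm-up call (would also trigger compilation of a jitted function)
    start = time.perf_counter()
    for _ in range(n_transforms):
        fft_func(x)
    return time.perf_counter() - start


def relative_times(candidates, baseline, sizes, n_transforms=10_000):
    """Map each candidate name to its runtimes divided by the baseline's runtime."""
    out = {name: [] for name in candidates}
    for size in sizes:
        base = time_transforms(baseline, size, n_transforms)
        for name, func in candidates.items():
            out[name].append(time_transforms(func, size, n_transforms) / base)
    return out


sizes = [2**k for k in range(13)]  # 1, 2, 4, ..., 4096
ratios = relative_times({"numpy": np.fft.fft}, np.fft.fft, sizes, n_transforms=200)
```

To mirror the comparison described above, one would pass scipy.fft.fft and an object-mode wrapper as further candidates and use a jitted Rocket-FFT function as the baseline.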