I’m here to present an efficient, lightweight, directly accessible RFFT library that you can use in Numba without objmode: falseywinchnet/bfft (Bruun Fast Fourier Transform), on GitHub.
Caveats:
- Power-of-two sizes ONLY, float/double.
- RFFT/IRFFT and ODFT (half-bin)/IODFT only. If you want complex-to-complex, you can easily cobble it together from two RFFTs.
- It is a teeny tiny bit slower than FFTW, and it is not optimized for RISC-V.
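For the complex-to-complex caveat, here is a minimal sketch of the "two RFFTs" trick. It relies only on linearity, FFT(a + ib) = FFT(a) + i·FFT(b), plus the Hermitian symmetry of a real-input spectrum. numpy.fft.rfft stands in for the real transform so the snippet runs anywhere; the helper names full_spectrum_from_rfft and cfft_via_two_rffts are made up for this example and are not part of bfft.

```python
import numpy as np

def full_spectrum_from_rfft(half, n):
    """Expand the n//2 + 1 bins of a real-input FFT to all n bins (n even)."""
    full = np.empty(n, dtype=np.complex128)
    full[: n // 2 + 1] = half
    # Bins n/2+1 .. n-1 are the conjugates of bins n/2-1 .. 1.
    full[n // 2 + 1 :] = np.conj(half[1 : n // 2][::-1])
    return full

def cfft_via_two_rffts(z, rfft=np.fft.rfft):
    """Complex FFT of z built from one RFFT of the real part and one of the imaginary part.

    `rfft` is a stand-in: any function returning the n//2 + 1 non-negative
    frequency bins of a real signal should slot in here.
    """
    n = z.shape[0]
    re_spec = full_spectrum_from_rfft(rfft(np.ascontiguousarray(z.real)), n)
    im_spec = full_spectrum_from_rfft(rfft(np.ascontiguousarray(z.imag)), n)
    return re_spec + 1j * im_spec

rng = np.random.default_rng(0)
z = rng.standard_normal(16) + 1j * rng.standard_normal(16)
print(np.allclose(cfft_via_two_rffts(z), np.fft.fft(z)))
```

This costs two real transforms plus a cheap mirror-and-combine pass, so it is a convenience rather than a tuned complex FFT.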
Positives:
- Natively optimized for x86 and ARM platforms; as fast as vDSP on M4.
- About 3x faster than numpy.fft as-is.
- Very lightweight and efficient.
- MIT licensed and written by a corn-fed millennial, so you know it's clean.
- Some SFDR and correctness tests have been done, with results that satisfied top-floor engineering friends.
To use:
import numpy as np
from numba import njit
import bfft.numba_support as bn
from bfft.numba_support import bfft_forward, ffi

N = 4096
plan, bins, work_n, scratch_n = bn.make_plan(N)  # plan is an int address

@njit(cache=True)
def rfft_into(plan, x, out_f64, work, scratch_f64):
    bfft_forward(plan,
                 ffi.from_buffer(x), ffi.from_buffer(out_f64),
                 ffi.from_buffer(work), ffi.from_buffer(scratch_f64))

x = np.random.randn(N)
out = np.empty(bins, np.complex128)
work = np.empty(work_n, np.float64)
scratch = np.empty(scratch_n, np.complex128)
rfft_into(plan, x, out.view(np.float64), work, scratch.view(np.float64))
# out == numpy.fft.rfft(x)
Two rules for Numba: pass the plan as the integer address from make_plan (Numba types an int, not a raw cffi pointer), and pass complex buffers as their real view (.view(np.float64), or .view(np.float32) for single precision). All four transforms work in both precisions: bfft_forward/bfft_inverse (rfft/irfft) and bodft_forward/bodft_inverse (half-bin “odd” DFT), plus _f32 variants. Because the caller owns every buffer, a multi-frame loop does each FFT with zero allocations and zero Python interaction.
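To make the "real view" rule concrete, here is a small NumPy-only sketch of what .view(np.float64) does to a complex buffer: no copy is made, and the real and imaginary parts sit interleaved in the same memory. The bin count of 5 is arbitrary and only for illustration; nothing here calls bfft itself.

```python
import numpy as np

bins = 5  # hypothetical bin count, chosen only for this illustration

# Double precision: a complex128 buffer seen as float64.
out = np.zeros(bins, dtype=np.complex128)
out_f64 = out.view(np.float64)  # same memory, length 2 * bins

# Element k of the complex array occupies real-view slots 2k (real) and 2k+1 (imag).
out_f64[2] = 3.0   # real part of out[1]
out_f64[3] = -4.0  # imaginary part of out[1]
print(out[1])

# Single precision: a complex64 buffer seen as float32.
out32 = np.zeros(bins, dtype=np.complex64)
out_f32 = out32.view(np.float32)

print(out_f64.shape, out_f32.shape, np.shares_memory(out, out_f64))
```

Since the view aliases the original array, whatever a transform writes through the float view can be read back through the complex array afterwards, with no conversion step.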