Where to optimize?

Hi @NimaSarajpoor

Your code performs poorly for very small arrays because of the parallelism which causes overhead.

I assume the sawtooth-like pattern you observe comes from the fact that your implementation is broken for cases where “log-two of length” is even. Run for example this:

import numpy as np 
import matplotlib.pyplot as plt 

def fft(T): 
    ... 

for i in range(5, 18):
    x = np.linspace(0, 10*np.pi, num=2**i)
    y = np.sin(x)

    plt.figure()
    plt.title(str(i))
    plt.plot(fft(y))
    plt.plot(np.fft.rfft(y), ls="--")
    plt.xlim(0, 12)
    plt.show()

PS: There is support for scipy.fft.rfft in Numba compiled code: Rocket-FFT — a Numba extension supporting numpy.fft and scipy.fft
However, it may be a bit slower for 1D transforms since multi-threading is only supported over multiple dimensions.