Question about the performance of np.convolve using Numba

Hello, everyone at Numba.
I am new to Numba.
I used Numba to run NumPy's np.convolve and compared the execution time and the results for the following three cases.

import time

import numpy as np
import scipy.signal as signal
from numba import jit, njit, prange

a = np.random.randn(20, 100000).astype(np.float64)
b = np.random.randn(1, 5000).astype(np.float64)
b = np.reshape(b, b.size)

#case 1
def fnConvolve_np(fv, wa):
    pk_np = []
    for co in range(20):
        result_array = np.convolve(np.abs(fv[co, :]) ** 2, wa)
        rms_array = np.sqrt(np.abs(result_array))
        pk_np.append(np.max(rms_array))
    return pk_np

#case 2
@njit(parallel=True, fastmath=True, cache=True)
def fnConvolve_njit(fv, wa):
    pk_njit = []
    for co in prange(20):
        result_array = np.convolve(np.abs(fv[co, :]) ** 2, wa)
        rms_array = np.sqrt(np.abs(result_array))
        pk_njit.append(np.max(rms_array))
    return pk_njit

#case 3
def fnConvolve_signal(fv, wa):
    pk_signal = []
    for co in range(20):
        result_array = signal.oaconvolve(np.abs(fv[co, :]) ** 2, wa)
        rms_array = np.sqrt(np.abs(result_array))
        pk_signal.append(np.max(rms_array))
    return pk_signal

tnp = time.time()
print(np.round(fnConvolve_np(a, b), 3))
print('Numpy_Convolve : ', (time.time() - tnp))
print()

tnjit = time.time()
print(np.round(fnConvolve_njit(a, b), 3))
print('Numba_Numpy_Convolve : ', (time.time() - tnjit))
print()

tsignal = time.time()
print(np.round(fnConvolve_signal(a, b), 3))
print('Scipy_Convolve : ', (time.time() - tsignal))
print()

The results are as follows.
[21.145 21.053 20.675 23.183 21.052 21.414 20.921 23.087 21.451 21.865
21.396 23.123 20.198 22.001 22.593 20.22 21.443 22.427 22.155 21.204]
Numpy_Convolve : 1.8918583393096924

[21.396 20.22 21.145 21.414 23.123 21.443 20.921 21.053 20.198 20.675
23.087 22.427 22.001 23.183 21.451 22.155 22.593 21.052 21.865 21.204]
Numba_Numpy_Convolve : 9.279757499694824

[21.145 21.053 20.675 23.183 21.052 21.414 20.921 23.087 21.451 21.865
21.396 23.123 20.198 22.001 22.593 20.22 21.443 22.427 22.155 21.204]
Scipy_Convolve : 0.22791457176208496

In terms of execution time, SciPy was the fastest and Numba was the slowest.
As for the output values, SciPy and NumPy returned the same results, and only Numba's output differs.
I can't figure out what the problem is.

I expected that numba, which performs parallel processing, would be the fastest, but I was wrong.

Dear @snoopy

The problem is not in the parallelization, but in the implementation of the convolution in Numba. You can achieve the same speed as SciPy by doing the convolution in the frequency domain, as SciPy does in your case:

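import scipy.fft
from numba import njit

# Calling scipy.fft inside a jitted function requires the rocket-fft package.
@njit
def freq_domain_convolve(x, y):
    # Length of the full linear convolution
    out_len = len(x) + len(y) - 1
    # Pad to a length for which the FFT is fast
    n = scipy.fft.next_fast_len(out_len, real=True)
    fft_x = scipy.fft.rfft(x, n=n)
    fft_y = scipy.fft.rfft(y, n=n)
    # Pass n explicitly so the inverse transform also has length n when n is odd
    ifft_xy = scipy.fft.irfft(fft_x * fft_y, n=n)
    return ifft_xy[:out_len]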

This is a quick and dirty implementation, so double-check it if you decide to use it!

Note that Numba does not support the scipy.fft module out of the box. However, there is a package that adds this support: rocket-fft (available on PyPI).
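One way to double-check the frequency-domain approach without Numba or rocket-fft is to compare it against np.convolve using plain numpy.fft. The following is only a minimal sketch: the helper name fft_convolve and the test sizes are made up for illustration, and it pads to the exact output length instead of a "fast" length.

```python
import numpy as np

def fft_convolve(x, y):
    """Full linear convolution of two 1-D real arrays via the FFT (illustrative helper)."""
    out_len = len(x) + len(y) - 1
    # Zero-pad both inputs to out_len so that the circular convolution
    # computed through the FFT equals the linear convolution.
    fft_x = np.fft.rfft(x, n=out_len)
    fft_y = np.fft.rfft(y, n=out_len)
    # Pass n explicitly: irfft cannot infer from its input alone
    # whether the original length was even or odd.
    return np.fft.irfft(fft_x * fft_y, n=out_len)

rng = np.random.default_rng(0)
x = rng.standard_normal(1000) ** 2   # stands in for np.abs(fv[co, :]) ** 2
y = rng.standard_normal(51)          # stands in for the window wa

direct = np.convolve(x, y)
via_fft = fft_convolve(x, y)
print("max abs difference:", np.max(np.abs(direct - via_fft)))
print("results match:", np.allclose(direct, via_fft))
```

The two results should agree up to floating-point rounding, so compare them with np.allclose, not with exact equality.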

Hope this helps!

Edit: Regarding the seemingly incorrect results of your jitted function: the problem is that appending to a list in parallel does not guarantee the order of the elements. Sort the resulting list and you will find that the values match the ones from NumPy and SciPy.
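If the order matters, a common alternative to sorting is to preallocate the output and let each iteration write to its own index, so the position of a value no longer depends on which thread finishes first. The sketch below assumes the same per-row computation as case 2. The function names, the small test sizes, and the serial fallback for environments without Numba are illustrative additions, not part of the original code.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Fallback so the sketch still runs (serially) when Numba is not installed.
    prange = range

    def njit(*args, **kwargs):
        return lambda func: func

@njit(parallel=True)
def peak_rms_indexed(fv, wa):
    n_rows = fv.shape[0]
    # One slot per row: iteration co only ever writes out[co].
    out = np.empty(n_rows, dtype=np.float64)
    for co in prange(n_rows):
        result_array = np.convolve(np.abs(fv[co, :]) ** 2, wa)
        out[co] = np.max(np.sqrt(np.abs(result_array)))
    return out

def peak_rms_reference(fv, wa):
    # Plain NumPy version used only to check values and order.
    return np.array([
        np.max(np.sqrt(np.abs(np.convolve(np.abs(row) ** 2, wa))))
        for row in fv
    ])

rng = np.random.default_rng(0)
a_small = rng.standard_normal((4, 2000))
b_small = rng.standard_normal(50)

indexed = peak_rms_indexed(a_small, b_small)
reference = peak_rms_reference(a_small, b_small)
print("same values in the same order:", np.allclose(indexed, reference))
```

Writing by index also avoids having several threads modify one shared list at the same time.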


Thank you for your advice. I am learning a lot from it.