# Numba Prange Not Working as Expected

Hello everyone.
I have been working on a new project and I found an issue with Numba's `prange` feature.
For the sake of reproducibility, I have added a simple example below. Setting `parallel=False` on the `Main` function yields a faster result than `parallel=True`.

```python
from numba import jit, prange
import numpy as np

@jit(nopython=True)
def Run(val):
    N = 40
    for i in range(N):
        for i1 in range(N):
            for i2 in range(N):
                for i3 in range(N):
                    arr = np.asarray([1, 1])

@jit(nopython=True, parallel=True)
def Main(arr):
    for i in prange(len(arr)):
        Run(arr[i])

arr = list(range(0, 30))
arr = np.asarray(arr)
Main(arr)
```

I believe `prange` is not leveraging the array slicing operation. Any help/suggestions on this?

I think the example you provided is probably a little too simple, since nothing that affects the final result actually happens within the `Run` function. The input `val` isn't used, neither are the loop counters `i<x>`, and `arr` is always the same regardless of the iteration.

So the timings you experience might be due to the non-parallel version benefiting from more effective optimizations, since those are easier to apply in the non-parallel case. I'm just speculating about that, and have no idea what actually happens in either case.

In my experience, when creating a toy example, it's best to make sure the innermost code actually does some calculation, and that the result is returned/assigned. For example:

```python
import numba
import numpy as np

@numba.njit
def Run(val):
    N = 40
    for i in range(N):
        for i1 in range(N):
            for i2 in range(N):
                for i3 in range(N):
                    val += val // (i3 + 1) + val // (i2 + 1)

def Main(arr):
    for i in numba.prange(1, len(arr)):
        Run(arr[i-1:i+1])

main_no_par = numba.njit(parallel=False)(Main)
main_par = numba.njit(parallel=True)(Main)

arr = np.asarray(list(range(0, 30)))

# compile once before running timeit
main_no_par(arr.copy())
main_par(arr.copy())

%timeit main_no_par(arr.copy())
%timeit main_par(arr.copy())
```

This makes the parallel case run about 3x faster for me:

```
1 s ± 21.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
314 ms ± 14.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

There will always be some overhead for the parallel case, so if the `Run` function becomes computationally light (e.g. `N=5`), the parallel case instead becomes 3x slower. Results will probably vary depending on the amount of parallelization your hardware can do (cores/threads etc.).
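That trade-off can be sketched with a toy cost model (the function name and all numbers here are hypothetical, just to illustrate the break-even behaviour, not Numba's actual overhead):

```python
def model_speedup(t_work, t_overhead, n_cores):
    """Estimated speedup when work t_work is split over n_cores,
    with a fixed parallelization overhead t_overhead (toy model)."""
    serial = t_work
    parallel = t_work / n_cores + t_overhead
    return serial / parallel

# Heavy work: overhead is negligible, speedup approaches the core count
print(model_speedup(1.0, 0.01, 4))    # close to 4x

# Light work: overhead dominates, "parallel" is slower than serial
print(model_speedup(0.001, 0.01, 4))  # well below 1x
```

This is why the heavy `N=40` version benefits from `prange` while the light `N=5` version regresses.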

The above of course doesn’t rule out that prange is lacking some optimization.


I had a similar experience just now, with `numba.prange` not providing the expected speedup (a factor of 1.3 on a 4-core machine). With a little more testing, I find that I see a very small speedup on my laptop (even when freshly rebooted, with nothing else using significant CPU), while I get a more normal speedup on a Linux desktop (a factor of 5-6 on a 10-core machine).

Here is my code (a 1D diffusion solver):

```python
import numpy as np
from numba import jit, prange

@jit(nopython=True, parallel=False)
def diffusion(Nt):
    alpha = 0.49
    x = np.linspace(0, 1, 100000000)
    # Initial condition
    C = 1/(0.25*np.sqrt(2*np.pi)) * np.exp(-0.5*((x-0.5)/0.25)**2)
    # Temporary work array
    C_ = np.zeros_like(C)
    # Loop over time (normal for-loop)
    for j in range(Nt):
        # Loop over array elements (space, parallel for-loop)
        for i in prange(1, len(C)-1):
            C_[i] = C[i] + alpha*(C[i+1] - 2*C[i] + C[i-1])
        C[:] = C_
    return C

# Run once to just-in-time compile
C = diffusion(1)

# Check timing
%timeit C = diffusion(10)
```
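On the earlier remark about array slicing: the inner spatial loop in this stencil can also be written as a plain NumPy slicing operation, which makes a useful correctness baseline to compare the `prange` version against. A minimal sketch (the function name and the smaller default grid size are my own, chosen so it runs quickly):

```python
import numpy as np

def diffusion_vectorized(Nt, Nx=100_000):
    """Same diffusion update as above, expressed with array slicing."""
    alpha = 0.49
    x = np.linspace(0, 1, Nx)
    # Initial condition: a Gaussian centred at x = 0.5
    C = 1/(0.25*np.sqrt(2*np.pi)) * np.exp(-0.5*((x-0.5)/0.25)**2)
    for j in range(Nt):
        # Fresh work array, so the boundaries end up zero as in the loop version
        C_ = np.zeros_like(C)
        # One diffusion step over the interior points, via slicing
        C_[1:-1] = C[1:-1] + alpha*(C[2:] - 2*C[1:-1] + C[:-2])
        C = C_
    return C
```

Since the right-hand side is evaluated before assignment, the slice expression reads only old values, matching the explicit temporary array in the loop version.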