This is a contrived test case, but hopefully it suffices to convey the point and pose the question. Inside an njit
function, I noticed that assigning a locally computed value to an array element is very costly. Here are two example functions:
from numba import njit
import numpy as np


@njit
def slow_func(x, y):
    result = y.sum()
    for i in range(x.shape[0]):
        if x[i] > result:
            x[i] = result
        else:
            x[i] = result


@njit
def fast_func(x, y):
    result = y.sum()
    for i in range(x.shape[0]):
        if x[i] > result:
            z = result
        else:
            z = result


if __name__ == "__main__":
    x = np.random.rand(100_000_000)
    y = np.random.rand(100_000_000)
    %timeit slow_func(x, y)  # 177 ms ± 1.49 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    %timeit fast_func(x, y)  # 407 ns ± 12.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
I understand that the two functions aren’t quite doing the same thing, but let’s not worry about that for now. Notice that both functions assign result to either x[i] or z, and the number of assignments is the same in both cases. However, the assignment of result to z is substantially faster. Is there a way to make slow_func as fast as fast_func?
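One thing I have not ruled out: 407 ns is far too short to touch 100 million elements, so the compiler may simply be discarding the loop in fast_func because z is never used afterwards. The sketch below is a hypothetical variant (the name fast_func_observed, the accumulator total, and the changed else branch are my own additions for illustration) in which the loop feeds the return value and therefore should not be removable. Timing it against slow_func might give a fairer comparison. The try/except fallback is only there so the sketch still runs where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: fall back to plain Python so the sketch runs without Numba.
    def njit(func):
        return func


@njit
def fast_func_observed(x, y):
    # Hypothetical variant of fast_func: no array writes, but every
    # iteration contributes to the returned value, so the loop has an
    # observable effect and should not be eliminated as dead code.
    result = y.sum()
    total = 0.0
    for i in range(x.shape[0]):
        if x[i] > result:
            z = result
        else:
            z = x[i]
        total += z
    return total
```

If this variant lands much closer to slow_func than to the original fast_func, that would suggest the gap I measured comes mostly from the loop being optimized away rather than from the array assignment itself.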