Slow Array Element Assignment to Variable

This is a contrived test case but, hopefully, it suffices to convey the point. Inside an njit function, I noticed that assigning a locally computed value to an array element is very costly. Here are two example functions:

from numba import njit
import numpy as np

@njit
def slow_func(x, y):
    result = y.sum()
    
    for i in range(x.shape[0]):
        if x[i] > result:
            x[i] = result
        else:
            x[i] = result

@njit
def fast_func(x, y):
    result = y.sum()
    
    for i in range(x.shape[0]):
        if x[i] > result:
            z = result
        else:
            z = result

if __name__ == "__main__":
    x = np.random.rand(100_000_000)
    y = np.random.rand(100_000_000)

    %timeit slow_func(x, y)  # 177 ms ± 1.49 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    %timeit fast_func(x, y)  # 407 ns ± 12.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

I understand that the two functions aren’t quite doing the same thing, but let’s not worry about that for now. Notice that both functions assign result to either x[i] or z, and the number of assignments is the same in both cases. However, the assignment of result to z is substantially faster. Is there a way to make slow_func as fast as fast_func?

hi!

I think what you’re seeing is the compiler being too clever. Since you don’t return z, it realizes that nothing in the loop has an observable effect and eliminates the loop entirely. Add return z and you’ll see the time get much closer.
For sure they won’t be the same because x is heap allocated, and I’m guessing (not an expert here) z is stack allocated.
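
Here is a minimal sketch of the "add return z" suggestion, so the result of the loop is actually used. The name fast_func_returning and the initial z = 0.0 are my own additions for illustration, and the try/except fallback is only there so the snippet still runs where Numba is not installed. Since both branches store the same value, the optimizer may still simplify the loop, so I would not assume any particular timing without measuring it.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Hypothetical fallback: run as plain Python if Numba is unavailable.
    def njit(func):
        return func


@njit
def fast_func_returning(x, y):
    result = y.sum()
    z = 0.0  # assumed initial value so z is defined even for an empty x

    for i in range(x.shape[0]):
        if x[i] > result:
            z = result
        else:
            z = result

    # Returning z makes the loop's result observable to the caller,
    # so the compiler can no longer treat z as entirely unused.
    return z
```

Timing this with %timeit next to slow_func, on the same x and y as in the question, would show how much of the original gap came from the loop being removed.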

does it make sense?

Luk

Yes, it does indeed. Is there any way or other tricks to make the array assignment any faster? This seems to be 2x slower than a Cython implementation.

In my limited experience, these comparisons are hard to set up because different compilers may apply different optimizations. I guess a very clever one could figure out that the if is unnecessary.
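
To illustrate what "the if is unnecessary" means: both branches of slow_func write the same value, so the loop is equivalent to an unconditional fill. This is just a sketch; fill_func is a name I made up, and the try/except fallback is only there so the snippet runs without Numba. Whether Numba or Cython actually performs this rewrite on its own is something I have not checked.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Hypothetical fallback: run as plain Python if Numba is unavailable.
    def njit(func):
        return func


@njit
def slow_func(x, y):
    # Same as the function in the question.
    result = y.sum()
    for i in range(x.shape[0]):
        if x[i] > result:
            x[i] = result
        else:
            x[i] = result


@njit
def fill_func(x, y):
    # Branch-free equivalent: every element ends up equal to result,
    # so the comparison (and the read of x[i]) can be dropped.
    result = y.sum()
    for i in range(x.shape[0]):
        x[i] = result
```

If fill_func times noticeably faster than slow_func, that suggests the branch is not being removed for you, and it may be part of the difference between the two compilers.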

Could you share the Cython version?