I see a considerable slowdown for a function that takes a namedtuple as its argument when I use the @njit decorator. Below is a minimal example that showcases the problem.
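import numpy as np
from numba import njit
from collections import namedtuple

Batch = namedtuple("Batch", ["x", "y", "z"])
batch = Batch(np.arange(0,1000000000,1),np.arange(0,1000000000,1),np.arange(0,1000000000,1))

@njit
def test_numba_named_tuples(batch):
    # Only unpacks the three array references; no computation is done.
    x = batch.x
    y = batch.y
    z = batch.z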
The time difference for executing the above function (test_numba_named_tuples) with and without @njit decorator is given below
with @njit decorator
51.1 µs ± 417 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
without @njit decorator
161 ns ± 1.24 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
Can someone please help me understand the reason for the difference, and suggest possible workarounds? (batch has to be immutable.)
numba version : 0.55.1
Your function is way too simple to see a speedup with numba. It does virtually no work at all other than unpacking the references to the arrays from the tuple.
I guess you used timeit to obtain the runtimes you give.
If you simply call test_numba_named_tuples again and again (as timeit does), then numba has a lot more work to do than native python.
On every call, numba has to switch the context from python to compiled code, which includes converting the function arguments (here the namedtuple and each of its arrays) to low-level objects. Once the function has run, the result needs to be packaged up again for the python interpreter.
The actual unpacking of references is trivial, so no issue in pure Python either.
That’s why your compiled version runs slower. Generally I try to stay in “numbaland” for as long as possible before I return to native python functions. As long as you call compiled functions from other compiled functions, numba does not have to cross the threshold between interpreted and compiled code, only in the beginning and end of the compiled call chain.
You should see those timings shift once you actually do some number-crunching. (But even then numba may not outperform numpy in all cases if the code is sufficiently simple and the data is not enormous).