I am optimizing a small function that is called millions of times during execution of my code, and I found that returning the array elements as a tuple is 3-4 times faster than returning the array itself. Is this effect real, and if so, why is this the case? I found this by chance, so is there perhaps an even faster way of returning (small) array results?
Toy example:
import numpy as np
from numba import njit, types, float64

@njit("types.UniTuple(float64, 3)(float64)", cache=True)
def foo_a(s):
    r = np.zeros(3, dtype=float64)
    return r[0], r[1], r[2]

@njit("float64[:](float64)", cache=True)
def foo_b(s):
    r = np.zeros(3, dtype=float64)
    return r

%timeit foo_a(0.0)
123 ns ± 0.662 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
%timeit foo_b(0.0)
423 ns ± 4.24 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
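
For anyone who wants to reproduce the comparison outside IPython, a minimal sketch using the standard-library `timeit` module might look like the following. The helper `bench` and the third variant `foo_c` (which writes into a caller-supplied buffer instead of returning anything) are hypothetical names added for illustration. Whether `foo_c` is any faster is an open question here, not a measured result. The sketch uses lazy compilation (`@njit` without an explicit signature) and falls back to plain Python if Numba is not installed.

```python
import timeit

import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: a no-op fallback so the script still runs without Numba.
    # Timings are then plain-Python timings and not comparable to the ones above.
    def njit(func):
        return func


@njit
def foo_a(s):
    # Allocate a small array and return its elements as a tuple.
    r = np.zeros(3, dtype=np.float64)
    return r[0], r[1], r[2]


@njit
def foo_b(s):
    # Allocate a small array and return the array object itself.
    r = np.zeros(3, dtype=np.float64)
    return r


@njit
def foo_c(s, out):
    # Hypothetical variant: fill a buffer owned by the caller, return nothing.
    out[0] = 0.0
    out[1] = 0.0
    out[2] = 0.0


def bench(func, *args, number=100_000, repeat=5):
    """Return the best observed time per call in nanoseconds."""
    func(*args)  # warm-up call so JIT compilation is not included in the timing
    best = min(timeit.repeat(lambda: func(*args), number=number, repeat=repeat))
    return best / number * 1e9


if __name__ == "__main__":
    out = np.empty(3, dtype=np.float64)
    cases = [("foo_a", foo_a, (0.0,)), ("foo_b", foo_b, (0.0,)), ("foo_c", foo_c, (0.0, out))]
    for name, func, args in cases:
        print(f"{name}: {bench(func, *args):.0f} ns per call")
```

The `lambda` wrapper adds a small constant overhead to every measurement, so the absolute numbers will differ from the `%timeit` figures above; only the relative ordering is meaningful.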