Hi all,
I have a question rather than an issue. I was hoping Numba would make my code much faster than it actually does, but maybe my expectations were unrealistic (or I did something wrong). Everything runs correctly, just slower than I would like.
A bit of background:
I am co-developing a Python package for whole-brain simulations and am currently working on heterogeneous modelling. Typically, a brain network has something like ~80 "nodes" with internal dynamics, plus coupling between them. This means that you usually write your code like this:

```python
@njit
def run(...):
    for t in time:
        # resolve all coupling here
        for n in nodes:
            # compute internal dynamics for each node
            ...
```
However, the whole idea of heterogeneity is that each node can have different internal dynamics, so the inner for loop no longer works. My idea was this: each node defines its dynamics as symbolic mathematics (think sympy), with each equation being one symbolic expression in a list, and when integrating, all nodes' equations are concatenated into one rather big list of symbolic objects. Then I prepare a string that implements an Euler integration scheme:
```python
NUMBA_EULER_TEMPLATE = """
def integrate(dt, n, max_delay, t_max, y0, input_y):
    y = np.empty((n, t_max + max_delay + 1))
    y[:] = np.nan
    y[:, :max_delay + 1] = y0
    for i in range(1, t_max + 1):
        dy = np.array({dy_eqs})
        y[:, max_delay + i] = y[:, max_delay + i - 1] + dt*dy
    return y[:, max_delay + 1:]
"""
```
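For illustration, the symbolic-to-string step can be sketched like this (a hypothetical two-node example; the trick of naming symbols after array lookups is one way to do it, not necessarily exactly what the package does):

```python
import sympy as sp

# symbol names mimic the array lookups used in the template
x0 = sp.Symbol("y[0, max_delay + i - 1]")
x1 = sp.Symbol("y[1, max_delay + i - 1]")
dy_eqs = [-x0 + 0.5 * x1, -x1]  # toy dynamics, one expression per state variable

# print each symbolic expression as Python source and join into one list literal
dy_src = "[" + ", ".join(sp.pycode(eq) for eq in dy_eqs) + "]"
src_line = "dy = np.array({dy_eqs})".format(dy_eqs=dy_src)
```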
Finally, `{dy_eqs}` is filled in from the list of symbolic equations: you literally print them into this string using sympy, so you get something like:
```python
def integrate(dt, n, max_delay, t_max, y0, input_y):
    y = np.empty((n, t_max + max_delay + 1))
    y[:] = np.nan
    y[:, :max_delay + 1] = y0
    for i in range(1, t_max + 1):
        dy = np.array([1.0 + 0.508152554938578*(-y[0, max_delay + i - 1] + y[2, max_delay + i - 1]) + 4.0*y[0, max_delay + i - 1]**2 - 3.0*y[0, max_delay + i - 1]**3 - 1.5*y[0, max_delay + i - 1] - y[1, max_delay + i - 1] + input_y[0, i], 0.0 + 0.05*(y[0, max_delay + i - 1] - 0.5*y[1, max_delay + i - 1]) + input_y[1, i], 1.0 + 0.219292220770658*(y[0, max_delay + i - 1] - y[2, max_delay + i - 1]) + 4.0*y[2, max_delay + i - 1]**2 - 3.0*y[2, max_delay + i - 1]**3 - 1.5*y[2, max_delay + i - 1] - y[3, max_delay + i - 1] + input_y[2, i], 0.0 + 0.05*(y[2, max_delay + i - 1] - 0.5*y[3, max_delay + i - 1]) + input_y[3, i]])
        y[:, max_delay + i] = y[:, max_delay + i - 1] + dt*dy
    return y[:, max_delay + 1:]
```
(The above example has 2 nodes; with 80 the definition gets much longer, but it is still just basic mathematical operations, the hardest being an exponential.) This function is then compiled using:
```python
numba.njit(<signature here>, nopython=True)(FunctionType(numba_string))
```
It works all right, but it is very slow. Now, my question is why: everything inside the function is already defined (np.array, ints, floats) and, moreover, I provide a proper function signature, so Numba knows all the types. When I run `%prun` in Jupyter, the functions with the longest times are:

- `ffi.py:111(__call__)`
- `{built-in method builtins.isinstance}`
- `ir.py:313(_rec_list_vars)`
- `os.py:676(__getitem__)`
- `{built-in method _abc._abc_instancecheck}`
- etc.

In short, 9 of the 10 longest function calls are inside Numba, if I am not mistaken. My question is whether I am doing something wrong or suboptimal here, or whether this slowness is to be expected when a single integration step has such a long definition.
Thank you very much!