This is how I set up the function from a shared object (a C library):
from numba import guvectorize, float64, float32, int32
from numba.core import types, typing
from llvmlite import binding
binding.load_library_permanently("../../fitting_library.so")
return_type = types.int32
int_ty = types.CPointer(int32)
float_ty = types.CPointer(float32)
ret_and_arg_sig = typing.signature(return_type, float_ty, int_ty, int_ty,
                                   types.int32, float_ty, types.int32, float_ty,
                                   float_ty, float_ty, float_ty)
fit_gauss = types.ExternalFunction("fit_gauss", ret_and_arg_sig)
@guvectorize(...)
def function_that_does_a_whole_lot(...):
    ...
    fit_gauss(first_ndarray.ctypes, second_ndarray.ctypes, ..., some_integer, ...)  # and so on
This works and gives correct results.
However, it is much slower than expected. A single call to fit_gauss takes 17 microseconds; multiplying that by the number of times it is called gives just under 3 s.
Yet function_that_does_a_whole_lot now takes 19 s longer with the call than without it.
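To put a number on the gap, here is a rough back-of-the-envelope sketch. The call count is not stated above, so it is inferred from the 17 microseconds and the "just under 3 s" figures; that is an assumption, and since 3 s is an upper bound the inferred count is an upper bound too.

```python
# Figures quoted in the question.
per_call_s = 17e-6        # measured time of one fit_gauss call
expected_total_s = 3.0    # "just under 3 s", treated here as an upper bound
observed_extra_s = 19.0   # extra run time once the call is in place

# Assumption: the call count is inferred from the two numbers above,
# so it is an upper bound, not a measured value.
n_calls = expected_total_s / per_call_s

# Cost per call implied by the observed slowdown (a lower bound,
# because n_calls is an upper bound).
effective_per_call_s = observed_extra_s / n_calls
slowdown = observed_extra_s / expected_total_s

print(f"calls (upper bound): {n_calls:,.0f}")
print(f"effective cost per call: {effective_per_call_s * 1e6:.0f} microseconds")
print(f"slowdown vs. expectation: {slowdown:.1f}x")
```

Under these assumptions each call costs at least about 108 microseconds inside the guvectorized function, a bit more than six times the 17 microseconds measured in isolation.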
Does the call to fit_gauss somehow break the vectorization?