Calling a function from a shared object from within guvectorized code works, but induces extra slowness

HannoSpreeuw · September 13, 2024, 3:01pm

This is how I set up the function from the shared object (a C library):

from numba import guvectorize, float64, float32, int32
from numba.core import types, typing
from llvmlite import binding

binding.load_library_permanently("../../fitting_library.so")

return_type = types.int32
int_ty = types.CPointer(int32)
float_ty = types.CPointer(float32)
ret_and_arg_sig = typing.signature(return_type, float_ty, int_ty, int_ty,
                                   types.int32, float_ty, types.int32, float_ty,
                                   float_ty, float_ty, float_ty)
fit_gauss = types.ExternalFunction("fit_gauss", ret_and_arg_sig)


@guvectorize(...)
def function_that_does_a_whole_lot(...):
    ...
    # The arrays are passed via .ctypes to match the CPointer arguments declared above
    fit_gauss(first_ndarray.ctypes, second_ndarray.ctypes,..., some_integer,...etc)

This works and gives correct results.

However, it is much slower than expected. An individual call to fit_gauss takes 17 microseconds; multiplied by the number of times it is called, that comes to just under 3 s.
Yet the call to function_that_does_a_whole_lot now takes 19 s longer than it does without the call to fit_gauss.
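To put those figures side by side, here is a rough back-of-the-envelope sketch. It uses only the numbers quoted above and treats "just under 3 s" as exactly 3 s, so the call count is an upper bound and the per-call cost a lower bound. The variable names are illustrative and not part of the original code.

```python
# Figures quoted in the post.
per_call_s = 17e-6          # time of one isolated fit_gauss call
expected_total_s = 3.0      # "just under 3 s", treated here as an upper bound
observed_extra_s = 19.0     # extra run time observed with the call in place

# Implied number of calls (an upper bound, because 3 s is rounded up).
n_calls = expected_total_s / per_call_s

# Effective cost of each call when made from inside the guvectorized function.
effective_per_call_s = observed_extra_s / n_calls

# How much slower the calls are than the isolated measurement suggests.
slowdown = observed_extra_s / expected_total_s

print(f"implied calls: about {n_calls:,.0f}")
print(f"effective cost per call: about {effective_per_call_s * 1e6:.0f} microseconds")
print(f"slowdown versus isolated calls: about {slowdown:.1f}x")
```

Under these assumptions each call effectively costs a bit over 100 microseconds instead of 17, a slowdown of roughly a factor of six.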

Does the call to fit_gauss somehow break the vectorization?

HannoSpreeuw · September 16, 2024, 4:22pm

Would it make a difference if fit_gauss were registered as a first-class function, using WAP (Numba's Wrapper Address Protocol), and passed as an argument to function_that_does_a_whole_lot?

I am not sure that is possible, though. Can a function decorated with guvectorize handle anything other than numbers, so that a function could be passed in as an argument?

HannoSpreeuw · September 18, 2024, 8:45am

I designed a test in which I compiled a shared object whose fit_gauss does nothing except return 0 immediately.
In that case there is no measurable delay, i.e. no extra slowness.
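The measurement idea behind this test can be sketched in plain Python: run the same outer loop once with a do-nothing stub and once with a function that does real work, then compare the timings. This only illustrates the method. The names best_of, stub_fit, busy_fit and workload are hypothetical stand-ins, and the timings say nothing about Numba or the actual shared object.

```python
import time


def best_of(func, repeats=5):
    """Return the smallest wall-clock time in seconds over several runs of func()."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


def stub_fit(values):
    # Stand-in for the do-nothing fit_gauss: returns 0 immediately.
    return 0


def busy_fit(values):
    # Stand-in for a fit that does real work (hypothetical workload).
    total = 0.0
    for v in values:
        total += v * v
    return 0


def workload(fit, n_calls=2000):
    # Hypothetical outer loop that calls `fit` once per iteration.
    data = [float(i) for i in range(64)]
    status = 0
    for _ in range(n_calls):
        status |= fit(data)
    return status


t_stub = best_of(lambda: workload(stub_fit))   # loop plus call overhead only
t_busy = best_of(lambda: workload(busy_fit))   # loop plus calls plus real work
work_cost = t_busy - t_stub                    # time attributable to the work itself

print(f"stub: {t_stub:.4f} s  busy: {t_busy:.4f} s  difference: {work_cost:.4f} s")
```

If the stub run is as fast as the run without any call, as reported here, the calling mechanism itself is cheap and the extra time has to come from what the called function does.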

I tend to conclude that the slowness is not caused by a design flaw in the way Numba's guvectorize calls a function from an external library. Rather, the extra computations in fit_gauss seem to push the CPU load past some hardware-specific limit.
