Why is my numba+numpy implementation faster than C+cffi?

Hi Everyone,
I am currently working on image processing code for a project; it was previously written in C and called from Python using cffi. My implementation uses numba+numpy, and I am surprised to find it 4x faster than the existing C + cffi version.
I previously encountered similar behaviour when a numba+numpy k-means implementation turned out slightly faster than scikit-learn's k-means, but I did not pay much heed to it then, as my implementation was not exactly the same.
Can someone guide me here? I am unable to explain how this implementation is faster than the C one. Here are some questions currently going through my mind:

  • Is it theoretically possible that a numba+numpy implementation is faster than the corresponding C/C++ code?
  • Is it something to do with llvmlite?
  • Is there some overhead due to the foreign function interface that could explain this difference in execution time?

I shall be grateful if someone can throw some light on this matter.

Thanks and Regards,
Ankit

hi, interesting question!

a few things off the top of my head:

  • if you completely rewrote the code, it’s easy to introduce differences that lead to better performance.
  • depending on how you call the code, you might be going through Python and adding some overhead (I don’t know much about cffi, though)
  • many C compilers default to -O0, while Numba’s default is -O3. If the C code was not compiled with optimizations enabled, it can be many times slower (see the first sketch after this list).
  • JIT compilers like Numba get very specific information about the objects they work with, and in some cases they can use this information to generate more efficient code. Think of the difference between std::vector (variable size) and std::array (fixed size), or between n-D arrays and 2-D arrays: code that supports n-D arrays is more general and less efficient than code that only supports 2-D arrays. Numba compiles functions for the exact number of dimensions of the inputs, so it can be more efficient in some cases (see the second sketch below). In other cases, the time spent on additional compilations might not pay off, and the generic code ends up faster than the specific code once compilation time is taken into account.
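
To test the optimization-flag point, one option is to rebuild the C extension with explicit flags and compare timings. Below is a minimal sketch using cffi’s API mode with a made-up toy function (the module and function names are just for illustration, not from your project):

```python
# build_example.py -- hypothetical sketch: build the C side of a cffi
# extension with optimizations enabled, then compare against an -O0 build.
from cffi import FFI

ffibuilder = FFI()
ffibuilder.cdef("int add(int a, int b);")
ffibuilder.set_source(
    "_example",  # name of the generated extension module (made up)
    "int add(int a, int b) { return a + b; }",
    extra_compile_args=["-O3"],  # without this, the compiler may fall back to -O0
)

if __name__ == "__main__":
    ffibuilder.compile(verbose=True)
```

If your C code is built through some other mechanism, the equivalent check is to make sure -O2/-O3 actually appears in the compile command that gets run.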
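
To illustrate the specialization point, here is a second minimal sketch (the function is a toy reduction, not your code) showing that Numba compiles one specialization per input dtype/dimensionality:

```python
# specialization_example.py -- sketch showing per-signature compilation in Numba.
import numpy as np
from numba import njit

@njit
def total(a):
    # Generic reduction; Numba specializes it for the exact dtype and ndim of `a`.
    s = 0.0
    for x in a.flat:
        s += x
    return s

total(np.ones((4, 4)))       # compiles a float64, 2-D specialization
total(np.ones((2, 3, 4)))    # compiles a separate float64, 3-D specialization

print(total.signatures)      # one entry per compiled specialization
```

A C function compiled once for the general n-D case cannot make the same per-shape assumptions, which is one plausible source of the difference.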

Without seeing the code it is impossible to say what the reason is, and even with the code it can be very hard. But I’m guessing you care less about the exact reason and more about the potential reasons why this can happen.

Luk

Hi Luk,

Thanks for your answer; I really appreciate the insights you have put forward.

Regards,
Ankit