@Oyibo containers of heterogeneous data-types are not something you can expect out of numba. With numba you need to stick with statically typed objects, so you can’t expect to mix types within a container, at least not unless you want to get fancy like what we talked about in this discussion (although I wouldn’t recommend it):
If you want a growing sequence then typed.List is okay, but it leaves a lot to be desired in terms of the speed of instantiating it, appending to it, and calling setitem / getitem. Especially if you fill a typed.List or typed.Dict on the Python side (outside of a jitted section) you are going to incur a lot of overhead. That being said, this might change in future versions of numba; there isn't any permanent reason these operations should be as slow as they are. The current implementation simply relies a bit too heavily on numba's just-in-time machinery, which adds more overhead than is strictly necessary.
Numpy arrays won't grow dynamically (unless you resize them yourself like @nelson2005 suggested), but they are a lot less quirky than typed.List / typed.Dict in terms of how they compile down to LLVM / machine code, meaning they tend to make for faster jitted code. They are also very fast to interact with on the Python side, so they're a great choice whenever they can be used. Unfortunately, since numpy arrays don't grow dynamically they are not always a convenient all-purpose solution, but in my experience maybe 50% of the time you can know the size of your data structures ahead of time, in which case numpy arrays are a great choice. This is especially true if your project has progressed to the point where the whole program amounts to calling a single numba-compiled function.
As for preferring "primitive types" over records: it certainly seems like a reasonable intuition that pure native Python data types would be fast, but as soon as you start incorporating numba into the mix this intuition is going to get you in trouble, because numba almost never uses the same data representation as Python. As soon as you pass something into numba world it is 'unboxed', meaning it is translated into a numba-friendly representation; when something is returned it is 'boxed', meaning the numba-friendly representation is converted back into a Python one. This needs to happen because in Python everything is a dynamically typed object, which is a big no-no for a compiler that excels only when it has very precise data-type definitions. The reason to prefer numpy arrays as an input/output format between numba and Python is that they are something of an exception to this. As @nelson2005 mentioned, numpy arrays are already laid out the way one might allocate data in a C program, so the boxing / unboxing is very minimal.
So to summarize: your #1 priority, if you want numba to actually help performance instead of hurting it, should be to reduce boxing/unboxing overhead. That means reducing the number of calls into numba from Python, which includes calling things like typed.List.append from the Python side. By contrast, something like `my_array[i]['err'] = err` executed from Python should be pretty cheap. After the initial cost of passing data between Python and numba, your choice of data structures will matter less.
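To show why that setitem is cheap, here is a minimal sketch with a hypothetical record layout ('err' is just the illustrative field name from above). Note that no numba is involved at all; this is a plain numpy structured array:

```python
import numpy as np

# Hypothetical record dtype with an 'err' field.
rec_dtype = np.dtype([('x', np.float64), ('err', np.float64)])
my_array = np.zeros(5, dtype=rec_dtype)

# A per-field write from the Python side touches the array's buffer
# directly, so there is no boxing/unboxing boundary to cross.
err = 0.25
my_array[3]['err'] = err
```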
As a final note, @nelson2005 has the right idea with his concat implementation above. If simple utility functions like these aren't already implemented, don't let that be a blocker: any utility function you write from scratch with numba is going to be about as fast as the native numpy implementation, or faster.