Changes in the behavior of numba functions?

I am rewriting heapq to use Numba with structured arrays. In one of the functions, I compare two floating-point values (two errors) stored inside a structured array.

The functions work fine when I do not use @njit and I access the data using square brackets, i.e. arr['var']. When I decorate the exact same functions with @njit, this time accessing the data using dot notation, i.e. arr.var, the functions behave differently (they no longer give me the right result).

I printed both values in both versions before comparing them, and they are exactly the same.
The only difference I could find is their type: within the jitted functions the types returned were float32, while in the undecorated functions the types were np.float32. Still, this does not appear to explain the misbehavior of the functions.

Here is all that is needed to reproduce my results:

import numpy as np
import numba as nb

entryDtype = np.dtype([('error', 'f4'),
                      ('triangle_id', 'i8')])
entryNBtype = nb.from_dtype(entryDtype)

entry0 = np.record((0.75, 1), dtype=entryNBtype)
entry1 = np.record((0.5,  0), dtype=entryNBtype)
entry2 = np.record((-0.5, 2), dtype=entryNBtype)
entry3 = np.record((0.01, 3), dtype=entryNBtype)

def push(pq, loc, item):
    pq.append(item)
    print(pq) # debug
    _siftdown(pq, loc, 0, len(pq)-1)

def _siftdown(pq, loc, startpos, pos):
    newitem = pq[pos]
    while pos > startpos:
        parentpos = (pos - 1) >> 1
        parent = pq[parentpos]
        print('BEFORE: newitem.error, pos, parent.error, parentpos', (newitem['error'], pos, parent['error'], parentpos)) # debug
        if newitem['error'] < parent['error']:
            pq[pos] = parent 
            loc[parent['triangle_id']] = pos
            pos = parentpos
            print('\nAFTER: newitem.error, pos, parent.error, parentpos', (newitem['error'], pos, parent['error'], parentpos)) # debug
            continue
        break
    pq[pos] = newitem
    loc[newitem['triangle_id']] = pos
    print('\nafter ',pq) #debug

queue=[]
loc= {}
push(queue, loc, entry0)
push(queue, loc, entry1)
push(queue, loc, entry2)
push(queue, loc, entry3)

Here is the numba version that is not working properly:

import numpy as np
import numba as nb

entryDtype = np.dtype([('error', 'f4'),
                      ('triangle_id', 'i8')])
entryNBtype = nb.from_dtype(entryDtype)

entry0 = np.record((0.75, 1), dtype=entryNBtype)
entry1 = np.record((0.5,  0), dtype=entryNBtype)
entry2 = np.record((-0.5, 2), dtype=entryNBtype)
entry3 = np.record((0.01, 3), dtype=entryNBtype)

@nb.njit
def push2(pq, loc, item):
    pq.append(item)
    print(pq) # debug
    _siftdown2(pq, loc, 0, len(pq)-1)

@nb.njit
def _siftdown2(pq, loc, startpos, pos):
    newitem = pq[pos]
    while pos > startpos:
        parentpos = (pos - 1) >> 1
        parent = pq[parentpos]
        print('BEFORE: newitem.error, pos, parent.error, parentpos', (newitem.error, pos, parent.error, parentpos)) #debug
        if newitem.error < parent.error:
            pq[pos] = parent
            loc[parent.triangle_id] = pos
            pos = parentpos
            print('\nAFTER: newitem.error, pos, parent.error, parentpos', (newitem.error, pos, parent.error, parentpos)) #debug
            continue
        break
    pq[pos] = newitem
    loc[newitem.triangle_id] = pos
    print('\nafter ',pq)  #debug

queue2 = nb.typed.List.empty_list(entryNBtype)
loc2 = nb.typed.Dict.empty(nb.i8, nb.i8)
push2(queue2, loc2, entry0)
push2(queue2, loc2, entry1)
push2(queue2, loc2, entry2)
push2(queue2, loc2, entry3)

Why aren’t these two sets of functions behaving the same?

where is push2 used?
It might be helpful for interested folks if you include a single complete, runnable program that demonstrates your problem. By my count there are five sections of code in your post that need to be stitched together and there are no imports.

Thanks @nelson2005, that should now work well.

I didn’t run your code, but I notice that in the second version an if statement is missing.

Thanks @DannyWeitekamp
It should be the same. I had some comments that I deleted to make it short, and I must have deleted the if statement along with them.
Now it should be correct.

Hey @MLLO ,

Numba implements NumPy types like records, but their behavior may vary from the original type depending on the implementation. It appears that your Python function copies records by value, while the jitted function copies them by reference. As a result, your Python function can successfully swap the records, whereas the jitted function cannot.
Workarounds could be to adapt the function (i.e. copy records instead of referencing them) or to use basic types that are copied by value (e.g. a NamedTuple).
Here is a simple example. Swapping entries in a list works in pure Python but not with the jitted function.

import numpy as np
import numba as nb

entryDtype = np.dtype([('error', 'f4'), ('triangle_id', 'i8')])
entryNBtype = nb.from_dtype(entryDtype)

def swap_entries(entries):
    first_entry = entries[0]   # by reference or value?
    last_entry = entries[-1]  # by reference or value?
    entries[0] = last_entry
    entries[-1] = first_entry

jit_swap_entries = nb.njit(swap_entries)

entries = nb.typed.List([np.record((1.0, 1), dtype=entryNBtype),
                         np.record((9.0, 9), dtype=entryNBtype)])
print(entries)
swap_entries(entries)
print(entries)
# [(1., 1), (9., 9), ...]
# [(9., 9), (1., 1), ...]

entries = nb.typed.List([np.record((1.0, 1), dtype=entryNBtype),
                         np.record((9.0, 9), dtype=entryNBtype)])
print(entries)
jit_swap_entries(entries)
print(entries)
# [(1., 1), (9., 9), ...]
# [(9., 9), (9., 9), ...]
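For intuition, this copy-by-reference behavior can also be reproduced in plain NumPy, without Numba: indexing a single element of a structured array returns a np.void scalar that acts as a mutable view into the array's buffer. A minimal sketch, reusing the entryDtype from above:

```python
import numpy as np

entryDtype = np.dtype([('error', 'f4'), ('triangle_id', 'i8')])
arr = np.array([(1.0, 1), (9.0, 9)], dtype=entryDtype)

first = arr[0]            # structured scalar: a mutable view into arr
snapshot = arr[0].item()  # .item() detaches the data as a plain tuple

arr[0] = arr[1]           # overwrite slot 0

print(first['triangle_id'])  # 9  -> the "saved" record followed the array
print(snapshot)              # (1.0, 1) -> the tuple copy is unaffected
```

So whether a "saved" entry survives an overwrite depends entirely on whether it is a view or a detached copy, which is exactly the difference between the two versions above.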

Thanks @Oyibo !

This is an important behavioral difference! I am still trying to figure out how I will need to adapt my code to it.
Your example illustrates exactly the heart of the issue, though I am still trying to understand what the jitted function is actually doing (or not doing). Once I do, I hope to find the right solution.
Changing to a namedtuple is not the solution, as I would need to rework everything again (for the third time). I would prefer to stick to structured arrays, though I was not expecting this behavior.
@Oyibo, following on from your example, how would you force a copy by value to occur?

It appears that numba throws a surprise at every corner.

Thanks again for pointing this out!

Hey @MLLO,

I understand your frustration with the “unexpected” behavior regarding structured arrays.
It’s important to be aware that NumPy structured array support in Numba is not “feature complete” at this time. This means that certain aspects of structured arrays in Numba may not fully align with what you might expect from native NumPy arrays, and using experimental or incomplete features may introduce inconsistencies in your code. It is up to you to choose the appropriate data type.

It’s also worth noting that you’ve been working with snippets from the heapq module, which works well on primitive data types but may fail on more complex data types if, as in your case, extra care is not taken.
Numba, under the hood, uses array memory views to make operations more efficient, essentially creating some kind of a headless NumPy array for performance reasons. While this optimization can be powerful for operations on numerical data, there can be differences in their behavior.

I would encourage you to keep your criticism constructive. Numba is a valuable tool but it is far from perfect and sometimes even frustrating. It is your choice to use it or not.

A workaround to ensure copy-by-value semantics is to explicitly create a new structured array, fill in the data from the source object, and extract the record if that is what you need.
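A minimal sketch of that idea in plain NumPy (essentially what the copy_entry helper further down in this thread does):

```python
import numpy as np

entryDtype = np.dtype([('error', 'f4'), ('triangle_id', 'i8')])

def copy_entry(entry):
    """Return a record backed by its own freshly allocated one-element buffer."""
    buf = np.empty(1, dtype=entryDtype)
    buf[0] = entry        # field-wise copy into the new buffer
    return buf[0]         # this record views buf, which nothing else aliases

pq = np.array([(0.5, 0), (0.75, 1)], dtype=entryDtype)
newitem = copy_entry(pq[0])
pq[0] = pq[1]             # overwriting the slot no longer affects newitem
print(newitem)            # (0.5, 0)
```

Because the returned record only views the private one-element buffer, later writes to the source array cannot reach it.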

Thanks @Oyibo
I imagine my last message sounded a bit more like a harsh criticism. I apologize for that.
Many of the examples provided in the documentation are rather basic, so it is really challenging to form expectations about the time investment some project might take. It is the generosity of your time, and that of others, that one ultimately has to rely on (one cannot help feeling that such generosity is often abused given the limited documentation). Regardless, this change of behavior in the referencing seems to me pretty important, and it would be good to flag it.

Again, thanks for all your help!

Hey @MLLO ,

No worries.
Have you checked whether maintaining a typed list of records is suitable for your heap-queue algorithm?
It seems to me that there is too much overhead involved.
Is Numba required here? Can you use primitive types instead of records?

Edit:

  • The pure Python approach is slower by a factor of ~2x (not too shabby)
  • The Numba list of tuples and list of records are on the same level
  • The structured array approach is not competitive yet. I have not found a way to concatenate the existing array and the new item in place.
# =============================================================================
# Import
# =============================================================================
from typing import NamedTuple
import numpy as np
import numba as nb
import numba.types as nbt

# =============================================================================
# generate simulation data
# =============================================================================
entryDtype = np.dtype([('error', 'f4'), ('triangle_id', 'i8')])
entryNBtype = nb.from_dtype(entryDtype)
entryTup = nbt.Tuple((nbt.f4, nbt.i8))


class Entry(NamedTuple):
    error: np.float32
    triangle_id: np.int64


def get_random_data(size):
    error = np.random.default_rng(seed=123).normal(size=size).astype(np.float32)
    triangle_id = np.arange(size, dtype=np.int64)
    return error, triangle_id


def get_entries_pytup(size: int) -> list[tuple[np.float32, np.int64]]:
    entries = []
    for err, tri in zip(*get_random_data(size)):
        entries.append(Entry(err, tri))
    return entries


def get_entries_tup(size: int) -> list[tuple[np.float32, np.int64]]:
    entries = nb.typed.List.empty_list(entryTup)
    for err, tri in zip(*get_random_data(size)):
        entries.append((err, tri))
    return entries


def get_entries_listrec(size) -> nbt.List(entryNBtype):
    entries = nb.typed.List.empty_list(entryNBtype)
    for err, tri in zip(*get_random_data(size)):
        entries.append(np.record((err, tri), dtype=entryNBtype))
    return entries


def get_entries_struct(size) -> nbt.List(entryNBtype):
    return np.array(list(zip(*get_random_data(size))), dtype=entryNBtype)


# =============================================================================
# heap push in Python tuple
# =============================================================================
def pytup_push(pq, loc, item):
    pq.append(item)
    pytup_siftdown(pq, loc, 0, len(pq)-1)


def pytup_siftdown(pq, loc, startpos, pos):
    _error = 0
    _triangle_id = 1
    newitem = pq[pos]
    while pos > startpos:
        parentpos = (pos - 1) >> 1
        parent = pq[parentpos]
        if newitem[_error] < parent[_error]:
            pq[pos] = parent
            loc[parent[_triangle_id]] = pos
            pos = parentpos
            continue
        break
    pq[pos] = newitem
    loc[newitem[_triangle_id]] = pos

# =============================================================================
# heap push numba tuple
# =============================================================================


@nb.njit
def tup_push(pq, loc, item):
    pq.append(item)
    tup_siftdown(pq, loc, 0, len(pq)-1)


@nb.njit
def tup_siftdown(pq, loc, startpos, pos):
    _error = 0
    _triangle_id = 1
    newitem = pq[pos]
    while pos > startpos:
        parentpos = (pos - 1) >> 1
        parent = pq[parentpos]
        if newitem[_error] < parent[_error]:
            pq[pos] = parent
            loc[parent[_triangle_id]] = pos
            pos = parentpos
            continue
        break
    pq[pos] = newitem
    loc[newitem[_triangle_id]] = pos


# =============================================================================
# heap push in numba for list of records
# =============================================================================
@nb.njit
def listrec_push(pq, loc, item):
    pq.append(item)
    listrec_siftdown(pq, loc, 0, len(pq)-1)


@nb.njit
def listrec_siftdown(pq, loc, startpos, pos):
    newitem = copy_entry(pq[pos])
    while pos > startpos:
        parentpos = (pos - 1) >> 1
        parent = pq[parentpos]
        if newitem['error'] < parent['error']:
            pq[pos] = parent
            loc[parent['triangle_id']] = pos
            pos = parentpos
            continue
        break
    pq[pos] = newitem
    loc[newitem['triangle_id']] = pos


@nb.njit
def copy_entry(entry):
    dummy = np.empty(1, dtype=entryDtype)
    dummy['error'][...] = entry.error
    dummy['triangle_id'][...] = entry.triangle_id
    return dummy[0]


# =============================================================================
# heap push in numba for structured array
# =============================================================================
# @nb.njit
# def struct_push(pq, loc, item):
#     pq = concat_array_scalar(pq, item)
#     return struct_siftdown(pq, loc, 0, len(pq)-1)  # update pq


# @nb.njit
# def struct_siftdown(pq, loc, startpos, pos):
#     newitem = copy_entry(pq[pos])
#     while pos > startpos:
#         parentpos = (pos - 1) >> 1
#         parent = pq[parentpos]
#         if newitem['error'] < parent['error']:
#             pq[pos] = parent
#             loc[parent['triangle_id']] = pos
#             pos = parentpos
#             continue
#         break
#     pq[pos] = newitem
#     loc[newitem['triangle_id']] = pos
#     return pq


@nb.njit
def struct_push(pq, loc, item):
    pq = concat_array_scalar(pq, item)
    startpos = 0
    pos = len(pq)-1
    newitem = copy_entry(pq[pos])
    while pos > startpos:
        parentpos = (pos - 1) >> 1
        parent = pq[parentpos]
        if newitem['error'] < parent['error']:
            pq[pos] = parent
            loc[parent['triangle_id']] = pos
            pos = parentpos
            continue
        break
    pq[pos] = newitem
    loc[newitem['triangle_id']] = pos
    return pq




# @nb.njit
# def concat_arrays(a, b):
#     size = a.size + b.size
#     res = np.empty(size, dtype=entryDtype)
#     res[:a.size] = a
#     res[a.size:] = b
#     return res


@nb.njit
def concat_array_scalar(a, b):
    size = a.size + 1
    res = np.empty(size, dtype=entryDtype)
    res[:a.size] = a
    res[a.size] = b
    return res


# =============================================================================
# simulate
# =============================================================================
def simulate_pytup(data):
    entries = []
    loc = {}
    for item in data:
        pytup_push(entries, loc, item)
    return entries


@nb.njit
def simulate_tup(data):
    entries = nb.typed.List.empty_list(entryTup)
    loc = nb.typed.Dict.empty(nbt.i8, nbt.i8)
    for item in data:
        tup_push(entries, loc, item)
    return entries


@nb.njit
def simulate_listrec(data):
    entries = nb.typed.List.empty_list(entryNBtype)
    loc = nb.typed.Dict.empty(nb.i8, nb.i8)
    for item in data:
        listrec_push(entries, loc, item)
    return entries


@nb.njit
def simulate_struct(data):
    entries = np.empty(0, dtype=entryDtype)
    loc = nb.typed.Dict.empty(nb.i8, nb.i8)
    for item in data:
        entries = struct_push(entries, loc, item)
    return entries


# =============================================================================
# check data & warm-up
# =============================================================================
size = 10
data_pytup = get_entries_pytup(size)
data_tup = get_entries_tup(size)
data_listrec = get_entries_listrec(size)
data_struct = get_entries_struct(size)
print('\nInput')
print(np.array(data_pytup))
print(np.array(data_tup))
print(np.array(data_listrec).reshape(-1, 1))
print(data_struct.reshape(-1, 1))
res_pytup = simulate_pytup(data_pytup)
res_tup = simulate_tup(data_tup)
res_listrec = simulate_listrec(data_listrec)
res_struct = simulate_struct(data_struct)
print('\nResult')
print(np.array(res_pytup))
print(np.array(res_tup))
print(np.array(res_listrec).reshape(-1, 1))
print(np.array(res_struct).reshape(-1, 1))

# =============================================================================
# timeit
# =============================================================================
size = 10_000
data_pytup = get_entries_pytup(size)
data_tup = get_entries_tup(size)
data_listrec = get_entries_listrec(size)
data_struct = get_entries_struct(size)
%timeit simulate_pytup(data_pytup)
%timeit simulate_tup(data_tup)
%timeit simulate_listrec(data_listrec)
%timeit simulate_struct(data_struct)
# 10.2 ms ± 109 ”s per loop (mean ± std. dev. of 7 runs, 100 loops each)  Python List of NamedTuples
# 4.08 ms ± 19.1 ”s per loop (mean ± std. dev. of 7 runs, 100 loops each) Numba List of Tuples
# 4.44 ms ± 25.7 ”s per loop (mean ± std. dev. of 7 runs, 100 loops each) Numba List of Records
# 48.9 ms ± 545 ”s per loop (mean ± std. dev. of 7 runs, 10 loops each)   Numba Structured Array
# => Implementation of "concat_array_scalar" function is too slow and kills the Structured Array approach.
#    Room for improvement!!!
# => Pure Python list of NamedTuple is 2-3x slower than the Numba versions. The gap is not too big.
# => List of Numba Tuples is slightly faster than Records but both solutions are on the same level.

I see multiple mistakes in your benchmark, which when fixed show that the numba implementation is ~2x faster.

  1. simulate_nb is calling push instead of jit_push, meaning you are just running the Python version of the algorithm, but with typed.List and typed.Dict, which are indeed slow if you use them outside of jitted sections of code. When you fill or retrieve from these kinds of data types from native Python, every single call to setitem and getitem incurs overhead, because numba needs to infer the types from the arguments and then check for a jitted implementation.

  2. simulate_nb should have @njit applied to it. There is no good reason that in a real application you would not apply @njit here, because otherwise you would be keeping the #1 thing a compiler can speed up (doing lots of stuff repeatedly in loops) out of compiled sections of code.

  3. data_nblrec should really be a numpy array. Filled like this:

def get_entries_nblrec(size):
    entries = np.empty(size, dtype=entryDtype)
    for i, (err, tri) in enumerate(zip(*get_random_data(size))):
        entries[i]['error'] = err
        entries[i]['triangle_id'] = tri
    return entries

Again, as per my comment from a previous thread, it is very, very slow to fill a typed.List outside of a jitted context, which is why I’ve recommended using numpy arrays. Numpy is designed for atomic interactions between its internals and native Python, so the overhead of filling a numpy array from Python is going to be much less than for a typed.List. This change may not considerably change your benchmark, but it would have an effect in any realistic application of this.


Hey @DannyWeitekamp ,

Thank you very much for the observation.
I have corrected the code above. For simplicity, the comparison is now just between tuples and records. Unfortunately, there was not enough time to adjust the NamedTuple version.

  1. simulate_rec (before it was called simulate_nb) should have used the jitted push version. That was a typo; I missed it.
  2. The speed check was about the versions of the push function for tuples vs. records. I thought the overhead of using records might be too high compared to primitive types. Both simulate functions are now jitted, which should be a fairer comparison.
  3. I don’t know what the input looks like, which is why the input has the same type as the output.

What data structures would you recommend for efficiently managing a growing array with heterogeneous elements? A list of records works but appears to be an unusual choice.

You can use a numpy array that’s grown like a C++ std::vector, resulting in amortized constant-time insertions.
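That growth strategy can be sketched in plain NumPy (the RecVector name and API below are mine, not from any library): keep spare capacity and double it whenever the buffer fills, so each append costs amortized O(1):

```python
import numpy as np

entryDtype = np.dtype([('error', 'f4'), ('triangle_id', 'i8')])

class RecVector:
    """Growable structured array with std::vector-style doubling (illustrative)."""

    def __init__(self, dtype, capacity=4):
        self._buf = np.empty(capacity, dtype=dtype)  # physical storage
        self._n = 0                                  # logical length

    def append(self, item):
        if self._n == self._buf.size:                # buffer full: double it
            grown = np.empty(2 * self._buf.size, dtype=self._buf.dtype)
            grown[:self._n] = self._buf
            self._buf = grown
        self._buf[self._n] = item
        self._n += 1

    def data(self):
        return self._buf[:self._n]                   # view of the live entries

vec = RecVector(entryDtype)
for i in range(10):
    vec.append((float(i), i))
print(vec.data()['triangle_id'])  # [0 1 2 3 4 5 6 7 8 9]
```

Unlike concat_array_scalar above, this only reallocates on the doubling steps instead of on every push.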


Hey @nelson2005 ,

That sounds great!
If mutability of fields is not a big issue, what data structure would be your favorite regarding performance and ease of use in such a case (structref, structured array, named tuples
)?

If there’s more than one element, I really like structured arrays as a kind of ‘C array of structs’. They offer great memory locality if you’re processing the items sequentially. A structured array with just one element works pretty well for working around the limitations you noted regarding making a single value, and could be used as the temporary storage for the hypothetical swap function discussed in this thread.

Hey @nelson2005,
So you are a structured array fan. Thanks for the feedback.
How do you utilize a ‘C array of structs’, via Cython?
I want to add a few more (small) pros about using NumPy structured arrays in Numba:

  • They provide basic join operations through numpy.lib.recfunctions. The features are not great, but at least you can do joins, updates and upserts out of the box without using another library. Are you aware of Numba having tools for relational operations?
  • Structured arrays are as close to dataframes as you can get. You can easily transition between structured arrays and e.g. Pandas DataFrames, simplifying your workflow.
  • They are easy to use, and the resulting code is very readable.

I see where you are coming from.
I have not been able to find a good solution yet to concatenate structured arrays efficiently. Do you have a proposal that works more or less in place?

I don’t use Cython; I guess my comment about a ‘C array of structs’ was a misguided attempt to describe the use case of an array of efficiently packed structures. Sequential ‘row’ operations are about as fast as you’ll find.
I avoid Pandas in performance-sensitive work and haven’t used the recfunctions much. I also haven’t had the need to concatenate structured arrays, though I imagine it’d be fairly straightforward. Maybe something like this would do:

import numba
import numpy as np
entryDtype = np.dtype([('error', 'f4'), ('triangle_id', 'i8')])

arr1 = np.zeros(2, dtype=entryDtype)
arr1[1] = (2.0, 3)
@numba.njit
def concat(a, b):
    res = np.empty(len(a) + len(b), dtype=a.dtype)
    lena = len(a)
    for i in range(len(res)):
        res[i] = a[i] if i < lena else b[i - lena]
    return res

print(concat(arr1, arr1))

@Oyibo containers of heterogeneous data-types are not something you can expect out of numba. With numba you need to stick with statically typed objects, so you can’t expect to mix types within a container, at least not unless you want to get fancy like what we talked about in this discussion (although I wouldn’t recommend it):

If you want a growing sequence then typed.List is okay, but it leaves a lot to be desired in terms of the speed of instantiating it, appending to it, and calling setitem/getitem. Especially if you are going to fill a typed.List or typed.Dict on the Python side (outside of a jitted section), you are going to incur a lot of overhead. That being said, this might change in future versions of numba; there isn’t any permanent reason that these should be as slow as they are. The current implementation simply relies a bit too heavily on numba’s just-in-time machinery, which adds more overhead than is strictly necessary.

Numpy arrays won’t grow dynamically (unless you resize them yourself like @nelson2005 suggested) but are a lot less quirky than typed.List / typed.Dict in terms of how they compile down to LLVM / machine code, meaning they tend to make for faster jitted code. They are also very fast in terms of interacting with them on the python-side so they’re a great choice when they can be used. Unfortunately since numpy arrays don’t grow dynamically they are not always a convenient all purpose solution, but in my experience 50% of the time you can know the size of your data-structures ahead of time in which case numpy arrays are a great choice. This is especially true if your project has progressed to the point where the whole program involves calling a single numba-compiled function.

As for preferring “primitive types” over records: it certainly seems like a reasonable intuition to expect pure native Python data types to be fast, but as soon as you start incorporating numba into the mix this intuition will get you in trouble, because numba almost never uses the same data representation as Python. As soon as you pass something into numba world it is ‘unboxed’, meaning it is translated into a numba-friendly representation, and when something is returned it is ‘boxed’, meaning that the numba-friendly representation is converted back to a Python representation. This needs to happen because in Python everything is a dynamically typed object, which is a big no-no for a compiler that excels only when it has very precise data-type definitions. The reason to prefer numpy arrays as an input/output format between numba and Python is that they are something of an exception to this. As @nelson2005 mentioned, numpy arrays are already laid out in a way that is consistent with how one might allocate data in a C program, so the boxing/unboxing is very minimal.

So to summarize, your #1 priority, if you want numba to actually help performance instead of hurting it, should be to reduce boxing/unboxing overhead. You need to reduce the number of calls into numba from Python. This includes calling things like typed.List.append from Python. But calling something like my_array[i]['err'] = err from Python should be pretty minimal. After the initial process of passing data between Python and numba, your choice of data structures will matter less.

As a final note, @nelson2005 has the right idea with his concat implementation above. If simple utility functions like these aren’t implemented don’t let that be a blocker. Any utility functions you write from scratch with numba are going to be about as fast as the native numpy implementation or faster.


For what it’s worth, I chose to interpret this as ‘many copies of the same heterogeneous data type (struct) in a container’ (e.g. a numpy structured array as the container), as opposed to random types in a container like a python list.

Hey @nelson2005 ,
thank you very much for the proposal. I tried a similar approach (see the function simulate_struct using concat_array_scalar in the benchmark). However, I found that the repetitive generation of arrays may be too costly, and slower than simply appending to lists.
@DannyWeitekamp, I initially had concerns that using a list of records might result in significantly slower performance compared to a list of tuples. The benchmark, however, contradicted my intuition, and it turns out my worries were unfounded.
@MLLO, your choice of a “list of records” is completely fine. What this discussion has taught me is that it’s essential to minimize the interaction between Numba and Python data types, both to prevent unexpected behavior and to avoid boxing/unboxing overhead.
Thank you guys. Happy Weekend!