Heterogeneous data container with mutable elements in jitted code

Hi all,

In my scientific computing code I often deal with statistical models with a large set of variables (say 20 to 50) that need to be passed into jitted functions. For maintainability and readability of my code I’d like to use an object/structure that acts as a data container that allows me to just pass the container to a function instead of the separate variables.

My minimum requirements for such a container are:

  • Works in nopython mode
  • Using the container instead of separate variables should not significantly impede performance (for example, I want to avoid issues such as Structured arrays 10 times slower than two dimensional arrays in nopython mode · Issue #1067 · numba/numba · GitHub)
  • Access the elements in the container using keys, either as container.key or container[key] (slight preference for the former)
  • Container should allow for heterogeneous data types that are each supported in nopython mode (e.g. strings, floats, ints, numpy arrays)
  • The elements that correspond to a numerical type (float or numpy array) should be mutable, but the keys of the container itself can be fixed in advance

And the nice-to-haves:

  • Container should ideally be relatively stable (no experimental feature of Numba)
  • Container should ideally not be Numba-specific (so should work in vanilla Python code)
  • Container should be picklable/unpicklable

So far the namedtuple checks virtually all of these boxes, except for mutable elements. As most of the elements I want to change in my container are numpy arrays (of fixed dimension), this is usually not a real problem, since I can replace the values of these arrays in-place. To replace scalar float elements I need to jump through some more hoops, but it is possible:

  • I can call the namedtuple’s _replace method, but that returns a new instance of the namedtuple (not ideal inside a loop), and _replace does not work within jitted code, which is a no-go.
  • I can store the scalar float element as a 0d numpy array, created as container.scalar = np.array(0.0) and updated in place via container.scalar[()] = 3.14. However, this does feel like a bit of a hack.
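A minimal sketch of this 0d-array workaround (the `Model` name and its fields are just placeholders for illustration):

```python
import numpy as np
from collections import namedtuple

# The namedtuple itself is immutable, but wrapping the scalar in a
# 0-d array makes its *value* mutable in place.
Model = namedtuple('Model', ['scalar', 'weights'])

m = Model(scalar=np.array(0.0), weights=np.zeros(3))

m.scalar[()] = 3.14   # update the "scalar" in place
m.weights[:] = 1.0    # update the array field in place
print(m.scalar[()], m.weights)  # 3.14 [1. 1. 1.]
```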

My question is, is there a “better” alternative that checks all the boxes?

I am aware of jitclass and Typed Dict, but both are still experimental, correct? In the past I have also experimented with numpy structured/record arrays, but I wonder whether Structured arrays 10 times slower than two dimensional arrays in nopython mode · Issue #1067 · numba/numba · GitHub is still an issue. If there are other pros and cons, or alternatives, for any of these containers, I would be interested to learn about those as well.


I do this in my day job and have found structref works perfectly. Yes, it’s ‘experimental’ but it seems likely to be the future of such functionality based on my decidedly unscientific reading of numba traffic over the last 18 months or so.

Avoid jitclass as it’s the ‘past’ and doesn’t support caching. @DannyWeitekamp is the OG of structref; there’s a useful conversation about jitclass and structref here

@brunojacobs , I use recarrays a lot and you should take a few things into account:

  • the issue you link to is from 2015. I re-ran the code there and got a much smaller difference:
%timeit sub_plain(a, b, result)
371 ns ± 5.54 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit sub_rec_array(rec3)
291 ns ± 1.14 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit sub_array(vec3)
208 ns ± 1.33 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
  • I also use named tuples regularly, and you need to factor in that their unboxing is much slower than for arrays (2d and record). I suggest that you adapt the benchmark code in the 2015 issue to work with namedtuples to get a good comparison. The fact that someone reported an issue in 2015 does not mean named tuples are faster now.
  • structref is the future, although its API cannot be considered final at this stage. If performance is important to you, I suggest you also run the benchmark code. I suspect it won’t be faster than the 2d array, although I’d be curious to see the comparison with record arrays.
  • if you want the container to hold numpy arrays, this might become the defining feature. Record arrays can only hold fixed sized arrays, while structrefs can hold variable size arrays.

Thank you both @nelson2005 and @luk-f-a

Regarding the structref: I completely missed this development, thanks for pointing this out!


Regarding namedtuple: Was not aware that it had a large unboxing overhead, thanks for pointing that out.


Regarding the numpy recarray performance:

@luk-f-a interesting, I am not able to replicate your results. I have pulled the code from Comparing the speed of accessing record arrays and two dimensional arrays using numba · GitHub; the only things I have changed are replacing @jit with @njit (removing all the nopython arguments) and converting the Python 2 style print statements to Python 3. My updated gist can be found here: Extension to https://gist.github.com/ufechner7/95db14f734edd51dcd9b that includes namedtuple and structref benchmarks (and made Python3 compatible) · GitHub

For me the record array is indeed slower (by about a factor of 6.5) in this simple benchmark:

%timeit sub_plain(a, b, result)  # no Numba, separate input variables
617 ns ± 12 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit sub_rec_array(rec3)  # Numba, numpy recarray as input with attribute lookup
1.75 µs ± 60.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit sub_array(vec3)  # Numba, numpy 2D array as input
257 ns ± 4.94 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit sub_rec_array_alt(rec3)  # Numba, numpy recarray as input with dict lookup
1.77 µs ± 24 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

This is on Numba 0.54.0 and Python 3.9.6, with the other relevant libraries on their most recent versions, on a MacBook Pro. I wonder what is causing the discrepancy in our results…

I also added a namedtuple benchmark as you suggested:

from collections import namedtuple

NT = namedtuple('NT', ['v_wind', 'v_wind_gnd', 'result'])
nt3 = NT(
    v_wind=np.array([12., .0, .0]),
    v_wind_gnd=np.array([8., .0, .0]),
    result=np.array([4., .0, .0])
)
%timeit sub_namedtuple(nt3)  # Numba, namedtuple as input
1.55 µs ± 29.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

So it is also relatively slow compared to a regular numpy array (probably because of the unboxing), but still somewhat faster than the numpy record array on my machine.

Finally, added a structref:

%timeit sub_structref(sf3)  # Numba, structref as input
806 ns ± 23.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

So the structref is approximately 3 times as slow as the numpy array, but faster than both the namedtuple and recarray.


Some personal remarks:

  • For my applications I should be able to define the dimensions of all of my numpy arrays in advance, but not having to do so (as with a namedtuple or a structref) is a nice, albeit minor, benefit. In the end I could always write a wrapper function around the recarray constructor that takes my list of numpy arrays as input and dynamically creates the corresponding recarray constructor.

  • If I can figure out and solve why attribute access using a recarray is indeed slower on my machine, I think that the recarray would be a viable alternative to the namedtuple as it solves some of the issues I have with the namedtuple (primarily mutability of scalars) and it is also more stable than the structref currently is.

  • Finally regarding performance: It is somewhat important to me, but code that is readable, manageable and stable is much more important to me. For example, if my code slows down by 5% but becomes easier to manage then I would take that trade off without thinking twice. In that light it is difficult for me to assess how switching from a namedtuple to a recarray would affect my total runtime.

it’s quite strange that your timings are different. I ran your gist, only modifying the print statements to see more easily which line is which, and got the following

print('plain', timeit.timeit(lambda: sub_plain(a, b, result), number=1000000))
print('rec array', timeit.timeit(lambda: sub_rec_array(rec3), number=1000000))
print('array', timeit.timeit(lambda: sub_array(vec3), number=1000000))
print('named tuple', timeit.timeit(lambda: sub_namedtuple(nt3), number=1000000))
print('rec array alt', timeit.timeit(lambda: sub_rec_array_alt(rec3), number=1000000))
print('struct ref', timeit.timeit(lambda: sub_structref(sf3), number=1000000))

plain 0.4252996889408678
rec array 0.35568208200857043
array 0.26795112900435925
named tuple 1.3743177538271993
rec array alt 0.4017717600800097
struct ref 0.9375882970634848

It’s great that you added the namedtuples and struct ref. I don’t think I’ve seen a direct comparison between all these methods before.

Regarding the difference between the record array and the plain array: I had a look at the code generated by both examples, and even though I am not too good at reading LLVM, I think the speed difference might be due to the fact that plain arrays have a uniform type (for example, all float64), so item positions are easier to calculate (and maybe easier to cache) than in record arrays, where each field sits at a specific byte offset relative to the first element.
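The per-field offsets can be read directly off the dtype; a small sketch using the field names from the benchmark:

```python
import numpy as np

# Each field of a record dtype lives at a fixed byte offset, which adds
# an offset lookup on top of the usual index arithmetic.
dt = np.dtype([('v_wind', np.float64, 3),
               ('v_wind_gnd', np.float64, 3),
               ('result', np.float64, 3)])

# dt.fields maps each name to (field dtype, byte offset).
offsets = {name: dt.fields[name][1] for name in dt.names}
print(offsets)  # {'v_wind': 0, 'v_wind_gnd': 24, 'result': 48}
```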

Luk

Thanks for confirming, I’ve updated my gist to reflect your modified print statements and also added print statements for the versions of Python, numba, numpy. These are my timings:

Python: 3.9.6 (default, Aug 18 2021, 12:38:10) 
[Clang 10.0.0 ]
Numba: 0.54.0
Numpy: 1.20.3
plain 0.6843872109999998
rec array 1.9209382970000002
array 0.3137645659999997
named tuple 1.7651555309999996
rec array alt 1.9435516949999991
struct ref 0.8691741549999996

Again, this runs on a MacBook Pro with macOS High Sierra (10.13.6) in a conda virtual environment.

All my timings are slower than yours (which is fine, different machines), but for struct ref my timing is actually faster. I don’t want to claim that the difference is significant, but it is surprising.

Your potential explanation re: difference in timings between a record array and a regular array makes sense to me.


What is currently unclear to me is how I would create a numpy record array (or structured array) that holds numpy arrays of different sizes and that also works in Numba.

For example: create a record/structured numpy array named x with one field a that is a (2 x 2) numpy array and another field b that is a (3 x 3) numpy array and subsequently access its fields inside a Numba jitted function like x['a'] or x.b.
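In plain NumPy the construction itself is straightforward; whether the same record works inside a jitted function is the open question here:

```python
import numpy as np

# The container described above: a single record with a (2, 2) field
# 'a' and a (3, 3) field 'b'.
dt = np.dtype([('a', np.float64, (2, 2)),
               ('b', np.float64, (3, 3))])
x = np.zeros((), dtype=dt)  # one 0-d record

x['a'][0, 0] = 1.0   # nested arrays are mutable in place
x['b'][:] = 2.0
print(x['a'].shape, x['b'].shape)  # (2, 2) (3, 3)
```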

Based on your first post I thought this would be possible as long as I pre-specify the sizes of the arrays, but I run into errors trying to create this myself. I think your post in this thread is also related to this: Numba failure with np.zeros in a static method of a class - #6 by luk-f-a

So, is this currently not possible in Numba?

Edit: After researching this some more I stumbled upon this thread: Structured arrays with nd-fields which seems to exactly describe my problem. In that case I cannot currently use a record/structured numpy array as a data container, unless I am missing something. I have to decide between my namedtuple approach (with the 0d numpy array hack to make the “scalars” mutable) or the structref approach, which is still experimental.

sorry, I was focused on the speed question and forgot to mention that there’s only partial support in Numba for placing arrays inside structs. Since this seems to be important for your use case, I guess struct ref is the best solution for you.
I’m curious to see if I could manage to improve the nested array support. Some things in Numba are not hard to extend, they just haven’t been done yet, while others have not been done because they are quite hard or they are blocked by something else.

Luk

Thanks for the clarification, Luk.

I have worked some more on this and extended the gist: Extension to https://gist.github.com/ufechner7/95db14f734edd51dcd9b that includes namedtuple and structref benchmarks (and made Python3 compatible) · GitHub

I cleaned up the benchmark code a little and added:

  • benchmarks for the jitclass and Typed Dict
  • benchmarks that actually push the loop inside numba jitted code. If I understand it correctly, this means that unboxing happens only once for that set of benchmarks. This is in contrast to the original benchmark, which called jitted functions from a Python loop using the timeit.timeit function.

Results are below:

"""
Python: 3.9.6 (default, Aug 18 2021, 12:38:10) 
[Clang 10.0.0 ]
Numba: 0.54.0
Numpy: 1.20.3

n_repetitions: 1000000

loops outside numba jitted code (unboxing happens each iteration)

separate input variables 0.6867476439999995
numpy array 0.3249390510000003
numpy rec array (attribute style) 1.9444237969999998
numpy rec array (dict style) 1.8849335660000008
named tuple (attribute style) 1.6423331920000006
numba structref (attribute style) 0.8521991450000002
numba jitclass (attribute style) 0.6022664849999995
numba typeddict (dict style) 1.6010483410000003

loop within numba jitted code (unboxing happens once)

numpy array 0.0012557506561279297
numpy rec array (attribute style) 0.0011591911315917969
numpy rec array (dict style) 0.0011630058288574219
named tuple (attribute style) 0.0011470317840576172
numba structref (attribute style) 0.0028722286224365234
numba jitclass (attribute style) 0.0020322799682617188
numba typeddict (dict style) 0.22297120094299316
"""

If the unboxing happens only once, the performance difference between the container types virtually disappears, except for the Typed Dict, which is approximately 100–200 times slower than the other containers.

What can we take away from this? (Note: I purposefully exclude the numpy record array from these conclusions, as the benchmark results between @luk-f-a and me are not consistent and it cannot hold nested numpy arrays inside jitted code.)

  • If your loops are inside numba jitted code, performance differences are negligible, except for the Typed dict which seems to be significantly slower
  • If your loops are outside of numba jitted code (ideally they really should not be), the containers fall roughly into two performance groups: the slower group contains the namedtuple and the Typed Dict; the faster group contains the structref and the jitclass

In my personal application (which has the loops inside jitted code) this resolves the question regarding performance differences and basically implies that I should not be concerned about this, as long as I am not using a Typed dict.


What I learned from this thread and the simulations is that we have several container options, each with its own trade-offs.


My personal take on this: At the moment it comes down to the namedtuple vs. structref (given that the jitclass has known issues with caching):

  • namedtuple:

    • Mutability: numpy arrays of fixed size are OK; scalars need a not-so-elegant hack that transforms them into a 0d array
    • Stability: Support for the namedtuple seems to be stable
    • Caching: Unknown (to me) if there are any issues with caching. Can anyone chime in on this?
    • Other: Requires a small amount of boilerplate code to setup the namedtuple class
  • structref

    • Mutability: Fields are completely mutable
    • Stability: Currently an experimental feature, not stable
    • Caching: Should not have any issues with caching
    • Other: Currently requires more boilerplate code to set up (somewhat circumvented by the approach at the bottom of Caching a function that takes in a jitclass instance · Issue #6522 · numba/numba · GitHub). The structref also seems to allow for more flexibility (i.e. creating structref-specific methods), which the namedtuple of course lacks.

In sum, the structref seems to provide a “mutable” namedtuple (and more) but is currently still experimental.


Having said that, I would be interested in hearing from the Numba dev team what they consider the “future” of a Numba compatible data container that allows for heterogeneous and mutable data. I can imagine that many Numba applications would benefit from such a data container.

@brunojacobs that’s a great summary.

over the weekend I did some work on adding better support for nested arrays. It still needs more tests, but I was able to get some common use cases running. These are the tests cases that are currently passing: first batch of tests · luk-f-a/numba@cb728ff · GitHub

Thanks for following up on this, that looks interesting! However, I have no experience contributing to Numba, so a clarification question: Did you make changes in the Numba code base but only link the corresponding tests?

I noticed some other threads popping up over the last few days on Numba’s Discourse and GitHub pages regarding issues with the record array in Numba 0.54, and I noticed that you already submitted a pull request: Extend support for nested arrays inside numpy records by luk-f-a · Pull Request #7359 · numba/numba · GitHub

Is what we discussed in this topic related to that?

yes, I expanded the support in Numba, and also added tests. I linked to the tests so you could see what kind of operations this PR makes possible.

Issue 7359 came from a change in 0.54. It was a coincidence that it showed up the same week we were discussing this. While I was working on the PR I found out that I had to fix that issue anyway, so it was two birds with one stone.