Hi all,
In my scientific computing code I often deal with statistical models with a large set of variables (say 20 to 50) that need to be passed into jitted functions. For maintainability and readability of my code I’d like to use an object/structure that acts as a data container that allows me to just pass the container to a function instead of the separate variables.
My minimum requirements for such a container are:
- Works in nopython mode
- Using the container instead of separate variables should not significantly impede performance (for example, I want to avoid issues such as numba/numba issue #1067, "Structured arrays 10 times slower than two dimensional arrays in nopython mode")
- Access the elements in the container using keys, either as container.key or container[key] (slight preference for the former)
- Container should allow for heterogeneous data types that are each supported in nopython mode (e.g. strings, floats, ints, numpy arrays)
- The elements that correspond to a numerical type (float or numpy array) should be mutable, but the keys of the container itself can be fixed in advance
And the nice-to-haves:
- Container should ideally be relatively stable (not an experimental Numba feature)
- Container should ideally not be Numba-specific (so should work in vanilla Python code)
- Container should be picklable/unpicklable
So far the namedtuple checks virtually all of these boxes, except for mutable elements. As most of the elements that I want to change in my container are numpy arrays (of fixed dimension), this is often still not a real problem as I can replace the values of these numpy arrays in-place. To replace scalar float elements I need to jump through some more hoops but it is possible:
- I can call the namedtuple's _replace method, but it returns a new instance of the namedtuple (not optimal inside a loop) and does not work within jitted code, which is a no-go.
- I can store the scalar as a 0d numpy array: create the container with scalar=np.array(0.0) and then update it in place with container.scalar[()] = 3.14. However, this does feel like a bit of a hack.
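For concreteness, here is a minimal sketch of the namedtuple approach with the 0d-array workaround. The names Params, n_iter, weights, scale and update are made up for illustration, and the fallback njit decorator is only there so that the same code also runs in vanilla Python when Numba is not installed.

```python
from collections import namedtuple

import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption for this sketch: without Numba, run as plain Python.
    def njit(func):
        return func

# Hypothetical container; the field names are illustrative only.
Params = namedtuple("Params", ["n_iter", "weights", "scale"])


@njit
def update(p):
    for _ in range(p.n_iter):
        # Array fields can be overwritten in place.
        p.weights[:] = p.weights * 2.0
    # The "scalar" lives in a 0d array and is updated via the empty-tuple index.
    p.scale[()] = 3.14
    return p.weights.sum() * p.scale[()]


params = Params(n_iter=3, weights=np.ones(4), scale=np.array(0.0))
total = update(params)
```

The tuple itself is never rebuilt; only the contents of the arrays it refers to change, so the caller sees the updated weights and scale after the call.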
My question is, is there a “better” alternative that checks all the boxes?
I am aware of jitclass and Typed Dict, but both are still experimental, correct? In the past I have also experimented with numpy structured arrays/record arrays, but I wonder whether numba/numba issue #1067 ("Structured arrays 10 times slower than two dimensional arrays in nopython mode") is still a problem. If there are other pros and cons, or alternatives, for any of these containers then I would be interested to learn about those as well.
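To make the record-array option concrete, this is roughly what I have in mind: a sketch only, with made-up names (store, grow, n_iter, scale) and the same plain-Python fallback for njit. It shows only the access pattern, with scalar fields that are mutable by attribute, and says nothing about whether the slowdown from issue #1067 still applies.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption for this sketch: without Numba, run as plain Python.
    def njit(func):
        return func

# Hypothetical layout with two scalar fields.
dtype = np.dtype([("n_iter", np.int64), ("scale", np.float64)])
store = np.rec.array([(3, 1.0)], dtype=dtype)


@njit
def grow(rec):
    for _ in range(rec.n_iter):
        # Scalar fields of a record can be reassigned by attribute.
        rec.scale = rec.scale * 2.0
    return rec.scale


# store[0] is a record that shares memory with store,
# so the update is visible in the array afterwards.
result = grow(store[0])
```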