Complex Structured Inputs

(re-posted under comm. support)

I’m embarking on an optimization of a long-running scipy.optimize model function, and hoping to use numba to help speed things up. The optimizers allow passing additional arguments to the model function. Since the model depends on quite a few externally configured scalars and arrays, I am hoping to package these up into a structured object, like a dict or namedtuple (I see dataclass is not yet supported).

As an analogy, an input to a numba-compiled function might look like:

param_map = {"arr1": some_array1, "arr2": {"upper": some_array2, "lower": some_array3}, "arr3": {"val": 1.0, "other": some_array4}} 

i.e. a nested mapping of scalars and arrays, where each some_array can be a simple 1D numpy array or a structured array.

Note that param_map is not constant, i.e. it is initialized outside of numba before being passed into the njit’d model function via scipy, and some of its individual values can be updated during the run.

Is there a recommended best practice for this situation, where you have dozens of individual scalars and numpy (structured) arrays you need to pass in as an argument to a numba-njit’d function, for reading/writing?


I’d suggest structref. If you need lots of them, it may be worth writing a code generator to create them.
There’s a lot of good info about them in this thread, and @DannyWeitekamp’s CRE library has lots of outstanding examples.

Thanks for the suggestion. I neglected to mention that I’m aiming to have numba be run-time optional, so I was hoping to stick to plain python/numpy data types that numba could natively work with. The actual structure is static, i.e. all arrays and values are pre-populated in size, length, and type, before entering numba-treated code.

I suppose I could just go with a long list of arguments (array1, array2, val, val, val, array3, …) and unpack that, but that will get ugly quick!

One other thought occurred: there are precious few bits of the structure that need updating. I could separate those into their own arguments. Is there a good static struct type for passing as read-only into an njitted function?
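For what it's worth, here is a minimal sketch of that idea using a namedtuple, which numba can type in nopython mode. The names Params, scale, offset, weights, state and model are all hypothetical, and the no-op njit fallback is just one assumed way of keeping numba optional. The tuple itself is read-only inside the compiled function, but any array it holds can still be written in place, so the few updatable bits can live in an array field such as state.

```python
from collections import namedtuple

import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumed fallback: turn njit into a no-op decorator when numba is absent.
    def njit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]          # used as a bare @njit
        return lambda func: func    # used as @njit(...)

# Hypothetical parameter bundle: two scalars and two arrays.
Params = namedtuple("Params", ["scale", "offset", "weights", "state"])

@njit
def model(x, params):
    # Fields are read by attribute; the tuple itself cannot be reassigned here.
    total = 0.0
    for i in range(x.shape[0]):
        total += params.weights[i] * (params.scale * x[i] + params.offset)
    # Arrays held by the tuple remain writable in place.
    params.state[0] = total
    return total

params = Params(
    scale=2.0,
    offset=1.0,
    weights=np.array([0.5, 0.25, 0.25]),
    state=np.zeros(1),
)
result = model(np.array([1.0, 2.0, 3.0]), params)
```

With scipy, the bundle would then travel as a single extra argument, e.g. args=(params,). Note that a namedtuple is flat, so nested groups would each need their own namedtuple type.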

Edit: a jitclass also looks quite reasonable, as it could fall back to a normal Python object if numba is not installed.
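A rough sketch of what that fallback might look like, assuming a hypothetical ModelParams class with one scalar and one array field. The no-op jitclass stand-in is my own assumption for the case where numba is missing, not something numba provides.

```python
import numpy as np

try:
    from numba import float64
    from numba.experimental import jitclass
    # Field types must be declared up front for the compiled class.
    spec = [("val", float64), ("arr", float64[:])]
except ImportError:
    spec = None

    def jitclass(spec=None):
        # Assumed fallback: leave the class as an ordinary Python class.
        return lambda cls: cls

@jitclass(spec)
class ModelParams:
    def __init__(self, val, arr):
        self.val = val
        self.arr = arr

    def scaled_sum(self):
        # Reads both a scalar and an array attribute.
        total = 0.0
        for i in range(self.arr.shape[0]):
            total += self.val * self.arr[i]
        return total

p = ModelParams(2.0, np.array([1.0, 2.0, 3.0]))
p.val = 3.0                # attributes stay writable from Python
scaled = p.scaled_sum()    # 3.0 * (1.0 + 2.0 + 3.0)
```

Unlike a namedtuple, the attributes here can be reassigned both from Python and from inside compiled code, at the cost of declaring the field types in spec.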

I went down the jitclass route for some of my data structures, though regretted it a bit when I realised that this creates havoc for caching.

I am very interested in this question and what others suggest. Some sort of named / typed tuple would seem like a good solution if there is a way to make this work.

So far I’ve found structref a bit too complex to get my head around / implement in code, at which point it basically becomes easier just to go back to multiple individual function parameters…

Thanks songolo. Can you mention more about the caching issues you encountered? An advantage of jitclass is you can overload it if numba isn’t installed and it will work the same.

Basically just that the compiled JIT function can’t be cached between invocations, so it becomes painful at times waiting for the function to compile each time you run it.

Thanks. I gathered from this issue that a workaround for the non-caching behavior is to interact with jitclass objects only in njitted/cached wrapper functions. I’m not totally clear if that’s a requirement for the full lifecycle of a jitclass object (create, set attributes/properties, access/update those properties), or just part of the lifecycle.

For the part of the data that is nested collections of arrays, another good alternative is to use awkward arrays. The library comes with all the plumbing needed for it to be used in numba jit routines.

@ananis25 are there any examples of how awkward works with Numba?

I don’t remember any offhand, but this blog post is one example.

Thanks; awkward arrays do indeed look very interesting, especially when you don’t/can’t know in advance the degree of nesting, length of sub-lists, etc.

For my case, it’s much simpler. I’m really just trying to neatly package otherwise simple inputs that numba already knows how to deal with.

I.e., rather than:

my_numba_func(x, y, array_1, array_2, string1, string2, array_3, string3, string4, array_4, array_5, array_5b, array_5c, string_array1, ...)

I want:

my_numba_func(x, y, params)

where params is a suitable readable/writable object whose operations numba can compile down to machine code.