Structured arrays with nd-fields

Hi :slight_smile:

Not sure if I am misunderstanding the docs, or if I am doing something wrong, so bear with me please.

Scalar types
Numba supports the following Numpy scalar types:

[...]

Structured scalars: structured scalars made of any of the types above and arrays of the types above

The following scalar types and features are not supported:

[...]

Nested structured scalars: the fields of structured scalars may not contain other structured scalars

(from Supported NumPy features — Numba 0.50.1 documentation)

To me this reads as if it should be possible to have a custom numpy dtype, where one or more fields are not themselves scalar but arrays, i.e. something like

import numpy as np

x = np.array([(1, (2,2)), (3, (1,4))], dtype=[('foo', 'i8'), ('bar', 'f4', 2)])
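As a side note, the layout of this dtype can be inspected with plain NumPy, which may help when reading the error message further down. This is only a sketch for illustration; the variable names `foo_offset`, `bar_dtype` and `bar_offset` are not part of the original post.

```python
import numpy as np

# Same array as above: one int64 scalar field and one float32 field of shape (2,)
x = np.array([(1, (2, 2)), (3, (1, 4))],
             dtype=[('foo', 'i8'), ('bar', 'f4', 2)])

# dtype.fields maps each field name to (dtype, byte offset)
foo_dtype, foo_offset = x.dtype.fields['foo'][:2]
bar_dtype, bar_offset = x.dtype.fields['bar'][:2]

print(foo_dtype, foo_offset)        # scalar field at the start of each record
print(bar_dtype.shape, bar_offset)  # sub-array field of shape (2,) after 'foo'
print(x.dtype.itemsize)             # bytes per record
print(x['bar'].shape)               # field access adds the sub-array dimension
```

On the NumPy side, `x['bar']` is an ordinary two-dimensional float32 view, so the nesting only exists in the dtype description.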

If I try to access the scalar field foo everything is fine

import numba as nb

@nb.njit
def foo(x):
    return x.foo

foo(x)

returns array([1, 3]).

If I try to access the array field bar things crash and burn

@nb.njit
def bar(x):
    return x.bar

bar(x)

results in

---------------------------------------------------------------------------
TypingError                               Traceback (most recent call last)
<ipython-input-18-5600b61d894a> in <module>
----> 1 bar(x)

~/anaconda3/lib/python3.7/site-packages/numba/core/dispatcher.py in _compile_for_args(self, *args, **kws)
    412                 e.patch_message(msg)
    413 
--> 414             error_rewrite(e, 'typing')
    415         except errors.UnsupportedError as e:
    416             # Something unsupported is present in the user code, add help info

~/anaconda3/lib/python3.7/site-packages/numba/core/dispatcher.py in error_rewrite(e, issue_type)
    355                 raise e
    356             else:
--> 357                 raise e.with_traceback(None)
    358 
    359         argtypes = []

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Internal error at resolving type of attribute "bar" of "x".
Buffer dtype cannot be buffer, have dtype: nestedarray(float32, (2,))
During: typing of get attribute at <ipython-input-17-c963215583c1> (3)
Enable logging at debug level for details.

File "<ipython-input-17-c963215583c1>", line 3:
def bar(x):
    return x.bar
    ^

The error message suggests that I am violating the nesting restriction mentioned in the docs. However, I got the impression that nesting should be fine as long as the nested type is not itself a structured type.

Thanks for any help :slight_smile:

Polite ping @luk-f-a - structured array mastermind :wink:

hi @Hannes, happy to share what I know. I didn’t build any of this, but I use it a lot. In numpy, placing arrays inside structures is allowed. Numba seems to have only partial support for them. You can pass those structures-with-arrays as inputs, but many operations fail when applied to those variables.
It seems to me that it’s a missing feature, rather than a fundamental problem or a bug.

Luk
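One possible way around the missing feature, sketched here as an assumption rather than something proposed in the thread, is to extract the nested field on the Python side and pass it to the jitted function as a regular array. The function name `row_sums` is hypothetical, and the `try`/`except` fallback is only there so the snippet still runs where Numba is not installed.

```python
import numpy as np

try:
    import numba as nb
    njit = nb.njit
except ImportError:
    # Fallback (assumption for portability): run as plain Python without Numba
    def njit(func):
        return func

x = np.array([(1, (2, 2)), (3, (1, 4))],
             dtype=[('foo', 'i8'), ('bar', 'f4', 2)])


@njit
def row_sums(bar):
    # 'bar' arrives as an ordinary 2-D float32 array, not as a nested field
    out = np.empty(bar.shape[0], dtype=np.float64)
    for i in range(bar.shape[0]):
        total = 0.0
        for j in range(bar.shape[1]):
            total += bar[i, j]
        out[i] = total
    return out


# Field access happens in Python, so no nested-array typing is needed
result = row_sums(x['bar'])
print(result)  # [4. 5.]
```

The view `x['bar']` is strided rather than contiguous, so this keeps the data in place but may be somewhat slower than working on a contiguous copy.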

I see, thanks for the quick response! :slight_smile: Might try and have a look at the numba implementation, but this likely exceeds my understanding of the internals.

For what I am planning to do the structured arrays may not be optimal anyway (due to varying array sizes which require excessive padding), but I got curious to see if and what performance advantages one can gain over namedtuples as containers, and how well structured arrays work with numba’s prange.

regarding namedtuples, they are a good option in the right circumstances. I can tell you that unboxing an array is orders of magnitude faster than unboxing a tuple. So, if you cross the python-numba boundary many times with a tuple (i.e. calling a jitted function from python), the performance will be much worse with tuples than with arrays.
Also, a nested array is guaranteed to be contiguous in memory with the rest of the struct, while a namedtuple of arrays will be (I think) pointers to the individual memory locations of those arrays. That can be good or bad: creating them does not require a full memory copy, but also data locality will be worse.
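The memory-layout point can be checked with plain NumPy. The following is a small sketch under the assumption that the namedtuple simply holds independently allocated arrays; the name `Params` is made up for illustration.

```python
from collections import namedtuple

import numpy as np

# Structured array: 'bar' lives inside each 16-byte record next to 'foo'
x = np.zeros(3, dtype=[('foo', 'i8'), ('bar', 'f4', 2)])
bar_view = x['bar']
print(np.shares_memory(x, bar_view))  # True: the field is a view into x
print(bar_view.strides)               # (16, 4): rows are one record apart

# Namedtuple of arrays: each member owns a separate buffer
Params = namedtuple('Params', ['foo', 'bar'])
p = Params(foo=np.zeros(3, dtype='i8'), bar=np.zeros((3, 2), dtype='f4'))
print(np.shares_memory(p.foo, p.bar))  # False: two unrelated allocations
```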

this likely exceeds my understanding of the internals.
don’t worry, every PR I’ve done has exceeded my understanding of the internals :laughing: you just need to make sure that it’s only 1 level beyond your current understanding, otherwise it’s very painful.

Yes, so far the namedtuples are working like a charm, but I am aware of the boundary overhead, and since this is about solving ODEs there are many Python → numba calls involved. I might have a go at solving that through closures (in my specific case that should be an option, since the tuples are used to parameterise the differential equations).
The other limitation comes from prange only playing nicely with arrays. I am aware that there are plans to overhaul numba’s parallel abilities, but I guess that it may take a while before we see the results of that.
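The closure idea mentioned above could look roughly like the following. This is a plain-Python sketch with made-up names (`make_rhs`, `euler`, `decay_rate`, `forcing`) and a toy equation, not the poster's actual code. With Numba, the inner function would presumably be decorated with `nb.njit`, in which case the captured parameters are, as far as I know, frozen at compile time instead of being unboxed on every call.

```python
def make_rhs(decay_rate, forcing):
    # The parameters are captured by the closure instead of being
    # passed in a tuple on every call.
    def rhs(t, y):
        return -decay_rate * y + forcing
    return rhs


def euler(rhs, y0, t0, t1, n_steps):
    # Explicit Euler integration, kept deliberately simple.
    dt = (t1 - t0) / n_steps
    t, y = t0, y0
    for _ in range(n_steps):
        y = y + dt * rhs(t, y)
        t += dt
    return y


rhs = make_rhs(decay_rate=2.0, forcing=1.0)
y_end = euler(rhs, y0=0.0, t0=0.0, t1=10.0, n_steps=10_000)
print(y_end)  # approaches the steady state forcing / decay_rate = 0.5
```

The trade-off is that changing a parameter means building (and, with Numba, recompiling) a new closure, so this fits best when the parameters stay fixed over many calls.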

Also, a nested array is guaranteed to be contiguous in memory with the rest of the struct, while a namedtuple of arrays will be (I think) pointers to the individual memory locations of those arrays. That can be good or bad: creating them does not require a full memory copy, but also data locality will be worse.

Just to avoid a misunderstanding: Arrays do not get copied when stepping into the numba world, correct? The underlying buffer is the same as on the Python side, or did I get that terribly wrong? :smiley:

that’s correct, the buffer is the same, they are not copied.
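A quick way to see the shared buffer is to modify an array inside a jitted function and look at it afterwards from Python. This is a minimal sketch with a hypothetical function name, `double_in_place`; the fallback decorator is only an assumption so the snippet still runs where Numba is not installed.

```python
import numpy as np

try:
    import numba as nb
    njit = nb.njit
except ImportError:
    # Fallback (assumption for portability): run as plain Python without Numba
    def njit(func):
        return func


@njit
def double_in_place(a):
    # Writes go straight into the buffer owned by the NumPy array
    for i in range(a.shape[0]):
        a[i] *= 2.0


data = np.arange(4, dtype=np.float64)
double_in_place(data)
print(data)  # [0. 2. 4. 6.] -> the change is visible on the Python side
```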


There does indeed seem to be an issue, just found this in the tracker: "Array assignment to a structure's nested array" (Issue #6473, numba/numba on GitHub).