Accessing structured array scalars by index

This code works in plain Python mode but breaks in jit mode. I don’t believe this is the same thing as another similar post, but I wouldn’t be surprised if there’s some relationship under the covers. The NumPy docs indicate this should be legal, and the Numba supported-features docs don’t seem to mention it.

from numba import njit
import numpy as np
arr = np.array([(1, 2., 3.)], dtype='i, f, f')
# @njit  # uncomment to reproduce the error below
def get_elem(arr, idx, member_idx):
    return arr[idx][member_idx]

print(get_elem(arr, 0, 1))

raises (with @njit uncommented)

No implementation of function Function() found for signature:
getitem(Record(f0[type=int32;offset=0],f1[type=float32;offset=4],f2[type=float32;offset=8];12;False), int64)
There are 22 candidate implementations:

  • Of which 22 did not match due to:
    Overload of function 'getitem': File: : Line N/A.
    With argument(s): '(Record(f0[type=int32;offset=0],f1[type=float32;offset=4],f2[type=float32;offset=8];12;False), int64)':

Hi @nelson2005,

The issue here is that Numba cannot work out what the return type of the second getitem should be based solely on the input types. What Numba “sees” is roughly this:

def get_elem(arr: <some array>, idx: <some int>, member_idx: <some other int>):
    # This is the problem: `arr` has a dtype of `i, f, f` and,
    # depending on `member_idx`, which is `<some other int>`,
    # the `__getitem__` returns an `int` or a `float`, so
    # type inference fails as it can't possibly resolve it.
    return arr[<some int>][<some other int>]
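You can see the same dependence in plain NumPy, without Numba. The short sketch below reuses the arr from the question and prints the dtype that arr[0][member_idx] has for each member_idx. The type changes with the value of member_idx, not with its type, so a compiler that only knows "some int" has no single return type it can pick.

```python
import numpy as np

arr = np.array([(1, 2., 3.)], dtype='i, f, f')

# The dtype of arr[0][member_idx] depends on the *value* of member_idx,
# which a type-inference pass only knows as "some integer".
for member_idx in range(len(arr.dtype.names)):
    print(member_idx, arr.dtype[member_idx])
# 0 int32
# 1 float32
# 2 float32
```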

You’ll note that this works:

In [20]: from numba import njit
    ...: import numpy as np
    ...: arr = np.array([(1, 2., 3.)], dtype='i, f, f')
    ...: @njit
    ...: def get_elem(arr, idx):
    ...:     # get a member of arr, and look up specifically 'f1', which is
    ...:     # known to be a float (look in the arr.dtype).
    ...:     return arr[idx]['f1']
    ...:
    ...: print(get_elem(arr, 0))
2.0

Hope this helps?

It’s not making sense to me yet, but usually that’s because I’m a bit thick :slight_smile:

If I print the Numba type, like

import numba
print(numba.from_dtype(arr.dtype))

it shows

Record(f0[type=int32;offset=0],f1[type=float32;offset=4],f2[type=float32;offset=8];12;False)

Specifying ‘f1’ literally seems to address a different problem: the value of member_idx is unknown at compile time. But if that were the only issue, I’d expect a literal column number to be sufficient, like

@njit
def get_elem(arr, idx, member_idx):
    return arr[idx][1]

Since the member index is a literal, similar to your example with

def get_elem(arr, idx):
    return arr[idx]['f1']

I did (of course :slight_smile:) try to look up the field name from an index, using your name-based example with a literal member index. I got almost the same error as in my original post. It’s quite possible my implementation has a mistake.

import numba
from numba import njit
import numpy as np
from numba.extending import overload

def names_tuple(val):
    if isinstance(val, np.dtype):
        val = numba.from_dtype(val)
        return tuple([name for name, field in val.fields.items()])
    if hasattr(val, 'dtype'):
        return names_tuple(val.dtype)
    raise RuntimeError(f"FIXME: I don't know how to handle type '{val}'")

def structured_name(array, member_idx):
    pass

@overload(structured_name)
def ol_structured_name(array, member_idx):
    names = names_tuple(array)
    print("names=", names)
    def _(array, member_idx):
        return names[member_idx]
    return _

arr = np.array([(1, 2., 3.)], dtype='i, f, f')
@njit
def get_elem(arr, idx, member_idx):
    return arr[0][structured_name(arr, 1)]

print(get_elem(arr, 0, 1))

output is

names= ('f0', 'f1', 'f2')
…snip…
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function() found for signature:

getitem(Record(f0[type=int32;offset=0],f1[type=float32;offset=4],f2[type=float32;offset=8];12;False), unicode_type)

There are 22 candidate implementations:

  • Of which 22 did not match due to:
    Overload of function 'getitem': File: : Line N/A.
    With argument(s): '(Record(f0[type=int32;offset=0],f1[type=float32;offset=4],f2[type=float32;offset=8];12;False), unicode_type)':
    No match.

So the problem is not literal vs. non-literal lookup. Instead, Numba requires a string that it matches against the field names, and it does not accept an integer resolved by offset? Did I get that right?

If all your fields are of the same dtype, you can create a view into the original array and pass that to Numba instead of the original array. Build the view as a normal (non-structured) array of the common dtype, and the function will then work.

I think there’s potentially a mix of two things going on:

  1. getitem on structured data needs a compile-time constant index to resolve.
  2. arr[idx][<const integer>] is seemingly unsupported; see the GitHub issue "Record staticgetitem with integer index not implemented" (numba/numba #6655).

Then there’s also something complicated going on with a getitem that can only be resolved as static after type inference has run, by which point it’s too late. The [structured_name(arr, 1)] lookup starts out as a dynamic getitem because its index depends on type inference, but it becomes static (i.e. a constant) once resolved.

This sort of thing seems to work:

import numba
from numba import njit, literally, types
import numpy as np
from numba.extending import overload

def names_tuple(val):
    if isinstance(val, np.dtype):
        val = numba.from_dtype(val)
        return tuple([name for name, field in val.fields.items()])
    if hasattr(val, 'dtype'):
        return names_tuple(val.dtype)
    raise RuntimeError(f"FIXME: I don't know how to handle type '{val}'")

def structured_name(array, member_idx):
    pass

@overload(structured_name)
def ol_structured_name(arr, member_idx):
    if not isinstance(member_idx, types.Literal):
        # force member_idx to be a literal value if it can be and it's not
        # already
        return lambda arr, member_idx: literally(member_idx)
    else:
        # member_idx is a literal int, use it to do the look up
        names = names_tuple(arr)
        index = member_idx.literal_value
        assert index < len(names)
        lookup = names[index]
        def impl(arr, member_idx):
            return lookup
        return impl

arr = np.array([(1, 2., 3.)], dtype='i, f, f')

@njit
def get_elem(arr, idx, member_idx):
    return arr[idx][structured_name(arr, member_idx)]

for x in (0, 1, 2):
    print(get_elem(arr, 0, x))

Thanks, that’s helpful. And thanks for filing the bug report in github.
@luk-f-a I hadn’t thought of a view, but will keep that one filed away for a possible time when the fields are the same dtype.
I ended up combining techniques to get the virtual table backed by a structured array working, using byte offsets into the structure to identify the ‘column’ positions. It also involved some of @DannyWeitekamp’s kit. This has been quite a learning process for me, and I couldn’t have done it without your help!
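The byte-offset idea can be sketched in plain NumPy. This is not the code from the post, which wasn’t shared, and column_layout and column_view are hypothetical helpers. dtype.fields maps each field name to its dtype and byte offset. np.ndarray can then build a strided view over the same buffer starting at that offset, so a ‘column’ can be addressed by position instead of by name.

```python
import numpy as np

arr = np.array([(1, 2., 3.), (4, 5., 6.)], dtype='i, f, f')

def column_layout(dt):
    """Hypothetical helper: (field dtype, byte offset) for each field, by position."""
    # dt.fields[name] is (dtype, offset) or (dtype, offset, title); keep the first two
    return [dt.fields[name][:2] for name in dt.names]

def column_view(a, layout, col):
    """Hypothetical helper: view column `col` of a 1-D structured array by byte offset."""
    field_dtype, offset = layout[col]
    # Same memory, shifted by the field's offset and stepping one whole record at a time
    return np.ndarray(shape=a.shape, dtype=field_dtype, buffer=a,
                      offset=offset, strides=a.strides)

layout = column_layout(arr.dtype)
offsets = [offset for _, offset in layout]
print(offsets)                               # [0, 4, 8]
print(column_view(arr, layout, 1).tolist())  # [2.0, 5.0]
```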

@luk-f-a thanks! Creating a view when the fields are homogeneous in dtype is a good workaround for getting a lot more “array” support in compiled code.

@nelson2005 no problem. Structured arrays are a less well-trodden path and have numerous, sometimes surprising, ways of accessing them. Feel free to open issues for things that seem like they ought to work but don’t!

You’re welcome. Structured arrays are underused in Python (and probably in Numba too), but for fast calculations they are hard to beat. I use them a lot, so I’ve picked up a few tricks. It’s a shame NumPy does not allow creation of standalone records (and therefore Numba doesn’t either), because records are all most people actually need from a jitclass (or a named tuple).

@luk-f-a agreed, they’re excellent for information that needs to be accessed as a group. Much better to have a structure with 100 elements than 100 different numpy arrays, all independently indexed.
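Here is a small illustration of that point, using hypothetical field names and data. With one structured array, a single argsort permutation or boolean mask reorders or filters every field of a row together. With parallel arrays, the same index would have to be applied to each array separately.

```python
import numpy as np

# Hypothetical record layout: one array holds every field of each row
trades = np.zeros(4, dtype=[('price', 'f8'), ('qty', 'i8'), ('side', 'i1')])
trades['price'] = [10.5, 9.75, 11.0, 10.0]
trades['qty'] = [100, 200, 50, 75]
trades['side'] = [1, -1, 1, -1]

# One permutation / one mask moves whole rows, so the fields can't drift apart
by_price = trades[np.argsort(trades['price'])]
buys = trades[trades['side'] == 1]

print(by_price['qty'].tolist())  # [200, 75, 100, 50]
print(buys['price'].tolist())    # [10.5, 11.0]
```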

Speaking of that, it does seem possible to pass them around. Is there something like the structref _Utils helper that allows grabbing the pointer to a passed-in structured array element? I mean something that corresponds to this inside an intrinsic, but for structured array elements:

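from numba.core import cgutils, types
from numba.experimental.structref import _Utils
from numba.extending import intrinsic

@intrinsic
def _data_address(typingctx, val_ty):  # hypothetical name; val_ty is a StructRef type
    def codegen(context, builder, sig, args):
        val_ty, = sig.args
        val, = args

        utils = _Utils(context, builder, val_ty)  # private helper from structref
        dataptr = utils.get_data_pointer(val)
        ret = builder.ptrtoint(dataptr, cgutils.intp_t)
        return ret
    return types.intp(val_ty), codegen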