In a sense, it is supported. The Awkward Array library has the same columnar data types as Arrow, so an Arrow → Awkward conversion is zero-copy[1], and Awkward Arrays are a registered Numba extension, so you can iterate over them in JIT-compiled functions.
Here’s an example, starting from a Pandas ArrowStringArray:
>>> import numpy as np
>>> import pandas as pd
>>> import pyarrow as pa
>>> import awkward as ak
>>> import numba as nb
>>> x = pd.array(
... ['This is', 'some text', None, 'data.'], dtype="string[pyarrow]"
... )
>>> pa.array(x)
<pyarrow.lib.ChunkedArray object at 0x7fea6c47d770>
[
  [
    "This is",
    "some text",
    null,
    "data."
  ]
]
>>> ak.from_arrow(pa.array(x))
<Array ['This is', 'some text', None, 'data.'] type='4 * ?string'>
>>> @nb.njit
... def f(strings):
...     out = np.zeros(len(strings), dtype=np.int64)
...     for i, s in enumerate(strings):
...         if s is not None:
...             out[i] = ord(s[0])
...     return out
...
>>> f(ak.from_arrow(pa.array(x)))
array([ 84, 115,   0, 100])
Strings are only one specific data type; Arrow can also represent nested structs, variable-length lists, missing data, and heterogeneous data. All of these can be zero-copy converted[1] to Awkward Arrays, and those Awkward Arrays can be iterated over in Numba, which means iterating over the original Arrow buffers. In the Numba function, Arrow structs (Awkward records) appear as objects with attributes, variable-length lists appear as sequence types, and missing values appear as None.[2]
Going the other way—producing non-rectilinear data structures in Numba and sending them to an Arrow buffer—can be done using Awkward’s ArrayBuilder and ak.to_arrow, but it’s not as slick and it requires more data copies.
This capability doesn’t reside within the Numba project; a user has to piece these things together with different libraries, but it is possible.
[1] Well, there’s exactly one exception to zero-copy conversion from Arrow: Arrow’s sparse unions have to be converted into dense unions before conversion, so one index buffer per sparse union needs to be allocated and filled.
[2] Awkward Arrays with union types can’t be iterated over in Numba-compiled functions, so that’s another limitation, also related to unions. Generally speaking, union types are the rough edge of support.