Extending Numba with a "convertible to" type?

Using Numba’s extension API, is it possible to extend all functions and builtin operators that expect type Y with type X by inserting an X → Y conversion function?

Here’s the use-case: hundreds of functions taking NumPy arrays have been defined in core Numba. I have extended an array type that is sometimes convertible to NumPy arrays (I can raise exceptions in the conversion function if it is not). I would like to implement all of these functions and any that may be added in the future by registering an implicit conversion, similar to Scala’s implicit conversions.
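A plain-Python sketch of such a conversion function may help pin down the contract: it either returns the regular (rectilinear) data or raises. The name to_rectangular is hypothetical, and nested lists stand in for both array types. This is not Awkward's actual conversion code.

```python
def to_rectangular(rows):
    """Return rows as a list of equal-length lists, or raise ValueError.

    Hypothetical stand-in for an Awkward-to-NumPy conversion: it succeeds
    only when the data are rectilinear (every row has the same length).
    """
    rows = [list(row) for row in rows]
    lengths = {len(row) for row in rows}
    if len(lengths) > 1:
        raise ValueError(
            f"cannot convert to a regular array: row lengths {sorted(lengths)}"
        )
    return rows


print(to_rectangular([[1, 2, 3], [4, 5, 6]]))  # [[1, 2, 3], [4, 5, 6]]
try:
    to_rectangular([[0, 1, 2], [], [3, 4]])
except ValueError as err:
    print("conversion failed:", err)
```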

For example, I can implement sums over contents of Awkward Arrays like this:

>>> import numpy as np
>>> import numba as nb
>>> import awkward1 as ak

>>> @nb.njit
... def f(input):
...     output = np.zeros(len(input), np.float64)
...     for i, x in enumerate(input):
...         for y in x:
...             output[i] += y
...     return output

>>> f(ak.Array([[0, 1, 2], [], [3, 4], [5], [6, 7, 8, 9]]))
array([ 3.,  0.,  7.,  5., 30.])

because input is an iterable Awkward Array yielding Awkward Arrays x and x is an iterable Awkward Array yielding numbers y, and output[i] += y knows how to add numbers to items of an array.

But suppose I want to write it like

>>> @nb.njit
... def f(input):
...     output = np.zeros(len(input), np.float64)
...     for i, x in enumerate(input):
...         output[i] = np.sum(x)
...     return output
... 
>>> f(ak.Array([[0, 1, 2], [], [3, 4], [5], [6, 7, 8, 9]]))

As before, x is an Awkward Array. I can lower a conversion function to_numpy that converts the Awkward Array into a (lowered) NumPy array or raise an exception trying. However, I need

  • Numba’s typing pass to recognize all functions, not just np.sum, that take a NumPy array as also being open to a signature with an Awkward Array in place of the NumPy array, and
  • Numba’s lowering pass to insert my conversion function in those places.
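To make the split between those two bullets concrete, here is a plain-Python model. It is not Numba's API: every name is hypothetical and types are plain strings. The resolve step plays the role of typing: it looks only at types and accepts an argument either on an exact match or when a conversion has been registered. The lower step plays the role of lowering: it builds the callable and inserts the registered conversion in front of the original implementation.

```python
# Registry of implicit conversions: (from_type, to_type) -> converter function.
CONVERSIONS = {}


def register_conversion(from_type, to_type):
    def decorator(func):
        CONVERSIONS[(from_type, to_type)] = func
        return func
    return decorator


def resolve(param_type, arg_type):
    """'Typing': decide from type names alone whether a call is valid."""
    if arg_type == param_type:
        return None  # exact match: no conversion needed
    if (arg_type, param_type) in CONVERSIONS:
        return CONVERSIONS[(arg_type, param_type)]
    raise TypeError(f"no implementation for argument type {arg_type!r}")


def lower(impl, param_type, arg_type):
    """'Lowering': build the callable, inserting the conversion typing chose."""
    convert = resolve(param_type, arg_type)
    if convert is None:
        return impl
    return lambda value: impl(convert(value))


@register_conversion("ragged", "ndarray")
def ragged_to_ndarray(rows):
    # Runs at call time and raises if the data turn out not to be regular.
    if len({len(row) for row in rows}) > 1:
        raise ValueError("subarray lengths are not regular")
    return rows


def ndarray_sum(rows):
    # Stands in for an implementation written only for "ndarray" arguments.
    return sum(sum(row) for row in rows)


compiled = lower(ndarray_sum, "ndarray", "ragged")
print(compiled([[1, 2], [3, 4]]))  # 10
```

Because the run-time check lives inside the converter, typing can accept the call even though the conversion may still fail when the data are ragged.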

Is there anything I can do about this? The use-case described above is scikit-hep/awkward-1.0#509, but this would also enable scikit-hep/awkward-1.0#174, implementing union-typed Awkward Arrays, because that also has to overload an open-ended set of functions. (If the union array can have values of type float or bool, then it should be registered as convertible to both float and bool, but if a value wants to resolve to bool and the particular datum happens to be float, the conversion function would raise an exception. That implements dynamic type-checking for a fixed set of types.)
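The dynamic type-checking described in the parenthetical could look like this in plain Python: a datum of union type float | bool is accepted wherever either target is expected, and each conversion checks the actual datum at run time. UnionValue and its methods are hypothetical names for illustration.

```python
class UnionValue:
    """A datum whose static type is the union float | bool (hypothetical)."""

    def __init__(self, datum):
        if type(datum) not in (float, bool):
            raise TypeError("union only holds float or bool")
        self.datum = datum

    def to_float(self):
        # Conversion registered for float: succeeds only if the datum is a float.
        if type(self.datum) is not float:
            raise TypeError(f"expected float, found {type(self.datum).__name__}")
        return self.datum

    def to_bool(self):
        # Conversion registered for bool: succeeds only if the datum is a bool.
        if type(self.datum) is not bool:
            raise TypeError(f"expected bool, found {type(self.datum).__name__}")
        return self.datum


values = [UnionValue(1.5), UnionValue(True), UnionValue(2.5)]
total = 0.0
for v in values:
    try:
        total += v.to_float()  # a call site that "wants" a float
    except TypeError as err:
        print("skipped:", err)  # skipped: expected float, found bool
print(total)  # 4.0
```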

The convertible-to-array case (issue #509) would be simpler to implement and is a more immediate need than union arrays (issue #174), but they both depend on this capability.

Is this already possible in Numba or would something have to change?

I forgot to include the exception raised by the second code example. It’s not a mystery why it fails, though: array_sum is implemented for types.Array, not for Awkward Array’s ArrayView.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jpivarski/miniconda3/lib/python3.8/site-packages/numba/core/dispatcher.py", line 415, in _compile_for_args
    error_rewrite(e, 'typing')
  File "/home/jpivarski/miniconda3/lib/python3.8/site-packages/numba/core/dispatcher.py", line 358, in error_rewrite
    reraise(type(e), e, None)
  File "/home/jpivarski/miniconda3/lib/python3.8/site-packages/numba/core/utils.py", line 80, in reraise
    raise value.with_traceback(tb)
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<function sum at 0x7fd7ffa6e430>) found for signature:
 
 >>> sum(awkward1.ArrayView(awkward1.NumpyArrayType(array(int64, 1d, A), none, {}), None, ()))
 
There are 2 candidate implementations:
  - Of which 2 did not match due to:
  Overload in function 'Numpy_method_redirection.generic': File: numba/core/typing/npydecl.py: Line 370.
    With argument(s): '(awkward1.ArrayView(awkward1.NumpyArrayType(array(int64, 1d, A), none, {}), None, ()))':
   Rejected as the implementation raised a specific error:
     TypeError: array does not have a field with key 'sum'
   
   (https://github.com/scikit-hep/awkward-1.0/blob/0.4.4/src/awkward1/_connect/_numba/layout.py#L325)
  raised from /home/jpivarski/irishep/awkward-1.0/awkward1/_connect/_numba/layout.py:323

During: resolving callee type: Function(<function sum at 0x7fd7ffa6e430>)
During: typing of call at <stdin> (5)


File "<stdin>", line 5:
<source missing, REPL/exec in use?>

Ah, one last thing: it looks like typeconv does this, but for scalar types. I wonder if the mechanism applies/can be applied more generally?

Also, there’s an ArrayCompatible abstract type that I could make Awkward Arrays inherit from. That could solve the convertible-to-array problem (my issue #509), though not the union-type problem (issue #174); that’s okay, because the first is more immediately relevant.

However, functions in arraymath.py, for example, have concrete Array in their signatures, not ArrayCompatible. So this wouldn’t work, would it?

I also found a few notes about abstracting an “ArrayLike” (numba/numba#3855 and the Jan 15, 2019 minutes).

If this is purely a typing problem, your custom type can have a can_convert_to method that would allow the type conversion. I used it to implement subtyping in https://github.com/numba/numba/pull/5579 and https://github.com/numba/numba/pull/5560

If the lowering should be different (i.e., np.sum would have to be recompiled for the custom type), then I don’t know how to do it.
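To show how a per-type conversion hook can feed overload resolution, here is a plain-Python model. The ranked conversion kinds are modeled on the idea behind Numba's typeconv (exact, promote, safe, unsafe), but the classes and the resolver are simplified stand-ins, not Numba's code. As far as I understand, the real hook is a can_convert_to(typingctx, other) method on a numba.core.types.Type subclass that returns a conversion kind or None; treat those details as an assumption.

```python
from enum import IntEnum


class Conversion(IntEnum):
    # Ranked conversion kinds, modeled on the idea behind Numba's typeconv;
    # a lower value is a better (cheaper, safer) match.
    exact = 1
    promote = 2
    safe = 3
    unsafe = 4


class Type:
    def __init__(self, name):
        self.name = name

    def can_convert_to(self, other):
        # Default: only an identical type matches.
        return Conversion.exact if other.name == self.name else None


class AwkwardArrayType(Type):
    def can_convert_to(self, other):
        # Hypothetical rule: accept NumPy-array parameters at typing time;
        # whether the data are actually regular is only known at run time.
        if other.name == "ndarray":
            return Conversion.unsafe
        return super().can_convert_to(other)


def choose_overload(arg, overloads):
    """Pick the overload whose parameter needs the best-ranked conversion."""
    candidates = []
    for param, impl in overloads:
        kind = arg.can_convert_to(param)
        if kind is not None:
            candidates.append((kind, impl))
    if not candidates:
        raise TypeError(f"no overload accepts {arg.name}")
    return min(candidates, key=lambda c: c[0])[1]


overloads = [(Type("ndarray"), "ndarray sum"), (Type("list"), "list sum")]
print(choose_overload(AwkwardArrayType("awkward"), overloads))  # ndarray sum
print(choose_overload(Type("list"), overloads))                 # list sum
```

As in Numba, a hook like this only settles typing: the value conversion for the chosen overload still has to be lowered separately.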

Thanks! I just read through those PRs and I see that they’re about the subtyping that was talked about at last Tuesday’s meeting. What I need involves both typing and lowering:

  • Typing to recognize an Awkward Array (ak._connect._numba.arrayview.ArrayViewType) everywhere that a NumPy array is in a signature (nb.types.Array);
  • Lowering to insert the Awkward Array → NumPy conversion function (ak._connect._numba.arrayview.ArrayViewModel → nb.core.datamodel.models.ArrayModel) as a first step in evaluating a function with a NumPy array in its signature but an Awkward Array passed in.
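At its core, the lowering step in the second bullet is a function from one data model to another. The sketch below models that in plain Python with two hypothetical dataclasses: ViewModel (a window into a flat buffer) and ArrayModel (a buffer with offset, shape, and strides). They only loosely resemble the real Awkward and Numba data models, and the field names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class ViewModel:
    """Hypothetical view: a flat buffer plus a [start, stop) window into it."""
    buffer: list
    start: int
    stop: int


@dataclass
class ArrayModel:
    """Hypothetical 1-d array descriptor: buffer, offset, shape, strides (in items)."""
    data: list
    offset: int
    shape: tuple
    strides: tuple


def view_to_array(view: ViewModel) -> ArrayModel:
    # No data are copied: the new descriptor points into the same buffer.
    if not 0 <= view.start <= view.stop <= len(view.buffer):
        raise ValueError("view range is out of bounds")
    return ArrayModel(view.buffer, view.start, (view.stop - view.start,), (1,))


def array_getitem(arr: ArrayModel, i: int):
    # Stands in for an implementation written for ArrayModel only.
    return arr.data[arr.offset + i * arr.strides[0]]


buf = [10, 20, 30, 40, 50]
arr = view_to_array(ViewModel(buf, 1, 4))
print(arr.shape, array_getitem(arr, 0), array_getitem(arr, 2))  # (3,) 20 40
```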

I started by thinking about the general solution, Scala-like implicit conversions, which would let extension developers make their objects masquerade as any type—concrete or abstract—without modifications to the core library. If that isn’t available, there are other options.

For my most immediate need, scikit-hep/awkward-1.0#509, only two types are involved: Awkward Arrays and NumPy arrays. There is a mechanism for this particular target: nb.core.types.abstract.ArrayCompatible. Just as making my arrays inherit from nb.types.IterableType was sufficient to get enumerate and zip for free, inheriting from nb.core.types.abstract.ArrayCompatible could be enough to get a lot of functions in arraymath.py if they were modified to take ArrayCompatible as argument types instead of Array and then call the conversion function as a first step. In fact, it looks like all the ufuncs in npydecl.py already accept ArrayCompatible instead of Array, so without any changes to Numba, I might at least get ufuncs for free.
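The difference between a signature written against the concrete Array and one written against the abstract ArrayCompatible can be modeled with ordinary Python classes. The class names echo Numba's, but these are plain-Python stand-ins. In particular, as_array here converts a value, whereas Numba's ArrayCompatible.as_array only reports the corresponding array type, so the value conversion would still need separate lowering.

```python
from abc import ABC, abstractmethod


class ArrayCompatible(ABC):
    @abstractmethod
    def as_array(self):
        """Return an equivalent concrete Array, or raise if impossible."""


class Array(ArrayCompatible):
    def __init__(self, items):
        self.items = list(items)

    def as_array(self):
        return self


class AwkwardView(ArrayCompatible):
    def __init__(self, rows):
        self.rows = rows

    def as_array(self):
        # In this toy model, only flat (non-nested) data convert.
        if any(isinstance(row, list) for row in self.rows):
            raise ValueError("cannot convert nested data to Array")
        return Array(self.rows)


def sum_concrete(arr):
    # Signature written for the concrete type only.
    if not isinstance(arr, Array):
        raise TypeError(f"no implementation for {type(arr).__name__}")
    return sum(arr.items)


def sum_abstract(arr):
    # Signature widened to the abstract type; conversion is the first step.
    if not isinstance(arr, ArrayCompatible):
        raise TypeError(f"no implementation for {type(arr).__name__}")
    return sum_concrete(arr.as_array())


view = AwkwardView([1, 2, 3])
print(sum_abstract(view))  # 6
try:
    sum_concrete(view)
except TypeError as err:
    print(err)  # no implementation for AwkwardView
```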

So as a first step, perhaps I should try making Awkward Arrays inherit from ArrayCompatible and see if I can get all the ufuncs. (The original motivation was a non-ufunc, but one thing at a time.)

Looking more closely at ArrayCompatible, it is an abstract type that I can subclass (good) as long as I implement as_array, which returns the nb.core.types.Buffer type that this corresponds to (easy). But that’s just typing. Where do I implement the lowering that converts a concrete ak._connect._numba.arrayview.ArrayViewModel into a concrete nb.core.datamodel.models.ArrayModel? The lowered ufuncs need an ArrayModel to run and I can write the lowered conversion that turns my model into your model, but I need Numba to insert it in the right places.

How do I do this? Or am I wrong and is ArrayCompatible a promise that my concrete objects have the same memory layout as ArrayModel (in which case, it’s not really abstract)?

What about lower_cast? Is this the general, implicit “convertible to” that I was asking about above?
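As I understand Numba's extension API, lower_cast(fromty, toty) registers an implementation with the signature impl(context, builder, fromty, toty, val) that is used when typing has already decided a value must be cast from fromty to toty; registering a cast does not by itself make typing accept the argument. The plain-Python model below mimics that split, with hypothetical names, no Numba import, and a simplified impl(fromty, toty, val) signature.

```python
_CASTS = {}


def lower_cast(fromty, toty):
    """Toy registry mimicking the shape of Numba's lower_cast decorator."""
    def decorator(impl):
        _CASTS[(fromty, toty)] = impl
        return impl
    return decorator


def cast(val, fromty, toty):
    # Called only after "typing" has decided that fromty -> toty is allowed.
    if fromty == toty:
        return val
    try:
        impl = _CASTS[(fromty, toty)]
    except KeyError:
        raise NotImplementedError(f"no cast from {fromty} to {toty}") from None
    return impl(fromty, toty, val)


@lower_cast("ArrayView", "Array")
def arrayview_to_array(fromty, toty, val):
    # A "view" here is a (start, stop, buffer) triple; slice out its window.
    start, stop, buffer = val
    return buffer[start:stop]


print(cast((1, 3, [10, 20, 30, 40]), "ArrayView", "Array"))  # [20, 30]
```

Because the registry is consulted only after typing, a typing-side rule (such as a can_convert_to hook) is still needed for the call to resolve in the first place.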

Direct ping for @jpivarski and @luk-f-a regarding @DrTodd13’s post for the open meeting next Tuesday (Public Numba Dev Meeting, Tuesday, December 8, 2020). I wanted to make sure this stands out in all the Discourse traffic in case you want to attend!

Thanks for the heads-up! I intend to be there. (I’m attempting to join regularly, though the event on my calendar had been out of phase.)

I’m particularly interested in @DrTodd13’s new PR, hoping that it or part of it will be at the right level of abstraction. (If the interface is too confining, I won’t be able to use it, so I’ll be looking closely at it. Awkward Array isn’t strictly a NumPy subclass, though it can be typed as one and raise exceptions at runtime if it can’t be converted.)

I think we’re talking about this one: https://github.com/numba/numba/pull/6148 I don’t see a new type defined there, but I can ask about it and follow the discussion next Tuesday.

Yes, that is the right PR. The PR allows you to define NumPy subclasses outside of Numba but does not itself introduce a new NumPy subclass. Siu had asked for an example of how to do that, so if you look at one of my comments on the PR, you can see some sample code showing how it could be done.

Todd

Ah, that sounds like something I won’t be able to take advantage of.

But anyway, I still need to follow up on lower_cast.

Thanks for clarifying!

Hi Jim,

I’d like to understand why you say that. Can you explain a bit more?

thanks!

Todd

I have a data type that cannot be a NumPy subclass (Awkward Arrays), but when the data happen to be rectilinear, I want them to act like NumPy arrays.

>>> import awkward as ak
>>> import numpy as np
>>> ragged = ak.Array([[1, 2, 3, 4, 5], [1, 2, 3], [1, 2, 3, 4]])
>>> ragged
<Array [[1, 2, 3, 4, 5], ... 3], [1, 2, 3, 4]] type='3 * var * int64'>

>>> # Can't convert
>>> np.asarray(ragged)
Traceback (most recent call last):
...
ValueError: in ListOffsetArray64, cannot convert to RegularArray because
subarray lengths are not regular

>>> # Can convert
>>> ragged[:, :3]
<Array [[1, 2, 3], [1, 2, 3], [1, 2, 3]] type='3 * var * int64'>
>>> np.asarray(ragged[:, :3])
array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])

In a Numba context, we’re often looping over the elements of these arrays, which is fine:

>>> import numba as nb
>>> @nb.njit
... def f(ragged):
...   for row in ragged:
...     print("new row")
...     for col in row:
...       print(np.sin(col))
... 
>>> f(ragged)
new row
0.8414709848078965
0.9092974268256817
0.1411200080598672
-0.7568024953079282
-0.9589242746631385
new row
0.8414709848078965
0.9092974268256817
0.1411200080598672
new row
0.8414709848078965
0.9092974268256817
0.1411200080598672
-0.7568024953079282

But users often assume they can also pass a whole row into a function that expects a NumPy array, since a single row is, by itself, rectilinear.

>>> @nb.njit
... def f(ragged):
...   for row in ragged:
...     print(np.sin(row))
... 
>>> f(ragged)
Traceback (most recent call last):
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<ufunc 'sin'>) found for signature:
 
 >>> sin(ak.ArrayView(ak.NumpyArrayType(array(int64, 1d, A), none, {}), None, ()))
 
There are 2 candidate implementations:
    - Of which 2 did not match due to:
    Overload in function 'Numpy_rules_ufunc.generic': File: numba/core/typing/npydecl.py: Line 96.
      With argument(s): '(ak.ArrayView(ak.NumpyArrayType(array(int64, 1d, A), none, {}), None, ()))':
     Rejected as the implementation raised a specific error:
       TypingError: can't resolve ufunc sin for types (ak.ArrayView(ak.NumpyArrayType(array(int64, 1d, A), none, {}), None, ()),)
  raised from /home/jpivarski/miniconda3/lib/python3.8/site-packages/numba/core/typing/npydecl.py:102

During: resolving callee type: Function(<ufunc 'sin'>)
During: typing of call at <stdin> (4)

That’s why, in this thread, I’ve been looking for a way to call an Awkward Array a subtype of a NumPy array when in the Numba context, and raise an exception if it can’t be converted at runtime. Or maybe just to call a one-dimensional Awkward Array a subtype of NumPy in the Numba context. However, none of these things can be NumPy subclasses outside of the Numba context.

It’s that last point that makes me think that your PR doesn’t apply, since you’re not introducing a new Numba type—you’re just identifying when something is a NumPy subclass and using that fact to improve Numba’s fidelity to Python. Unless I misunderstood you or my brief scan of your PR.

Replying to jpivarski (December 5):

Okay. I think I understand your situation, and it still might be possible that this PR could be of use to you. If you look at https://github.com/IntelPython/numba-dppy/pull/50/files, you can see how we adapt our NumPy subclass to Numba. We have DPArrayType deriving from types.Array, and if your data structure is array-like, you could potentially also inherit the type from types.Array. If that is true, then you’d get the new __array_ufunc__ capability.

What happens for you now if you combine an Awkward Array with a regular array: do you get another Awkward Array or a regular array? Would you like to be able to control that? What happens if you have an operation with two Awkward Arrays: what type do you get back now? I guess you could implement rules for this the hard way, but I think __array_ufunc__ is a bit easier.

In terms of how to provide typing and lowering for array-like things, our NumPy subclass is in another repo, and we do some tricks where we inspect NumPy classes and functions and duplicate those in our version, but with the typing of resulting ndarrays changed. If you want to see that code, I can point you to it.
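For reference, here is what the __array_ufunc__ protocol lets a non-subclass do in plain NumPy, outside of Numba: decide the result type when it is combined with regular arrays or with itself. Wrapped is a hypothetical container, not Awkward Array's implementation; whether the same control is available inside Numba-compiled code depends on the PR discussed above.

```python
import numpy as np


class Wrapped:
    """Hypothetical container (not an ndarray subclass) that controls ufunc results."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Keep the sketch to plain calls such as np.add(a, b); returning
        # NotImplemented for anything else makes NumPy raise TypeError.
        if method != "__call__" or "out" in kwargs:
            return NotImplemented
        # Unwrap Wrapped inputs, run the ufunc on plain arrays, rewrap the result.
        unwrapped = [x.data if isinstance(x, Wrapped) else x for x in inputs]
        return Wrapped(ufunc(*unwrapped, **kwargs))

    def __repr__(self):
        return f"Wrapped({self.data.tolist()})"


w = Wrapped([1, 2, 3])
print(np.add(w, np.array([10, 20, 30])))  # Wrapped([11, 22, 33])
print(np.multiply(w, w))                  # Wrapped([1, 4, 9])
```

Always returning Wrapped for mixed inputs is one design choice; a container could just as well return a plain ndarray whenever any input is a regular array.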