NumPy 2.x support community update

TL;DR

Numba’s work to fully support NumPy 2.x is currently in the state of “Numba is binary compatible with NumPy 1.x and NumPy 2.x”. We (the Numba maintainers) have decided to “hold” Numba in this state as the impact of NumPy 2.x changes on the user base seems lower than first anticipated. Doing this allows us to free up resources to explore a new, more capable, compiler toolchain (the one we’ve been building to support AOT compilation etc). We’re continuing to update Numba against new NumPy 2.x (and new Python) releases, but are not committing to providing full support for the NEP-050 semantics introduced by NumPy 2.0 within the current Numba code base until we’re more sure about the impact, cost and benefits.

What happened?

A while ago @stuartarchibald (who is also writing this!) wrote a plan (Communicating NumPy 2.0 Changes to Numba Users) for what we thought Numba would need to change to support NumPy 2.x, along with an estimated timeline for delivery. The work outlined in this plan still stands as an approach to NumPy 2.x support. Essentially, Numba would need a new “split” type system capable of separately representing Python and NumPy types and then Numba would need to be “fixed” to accommodate this change.

@kc611 and @stuartarchibald undertook the work to develop the split type system. It took a while and required multiple attempts, as it turned out to be quite complicated to get right. The key observation was that the new type system needed not only to represent Python and NumPy types separately, but also to better reflect their properties and inheritance patterns.

Once the type system had been split, work focussed on how to support fundamental operations (like addition) given NEP-050 rules. The Numba code base has historically used a “template” mechanism to describe the interactions between types for any given fundamental operation; this has been just about manageable because the historical type system is quite small and ignores a lot of nuances.

With the new split type system, two things quickly became apparent. The first is that the use of templates for things like operator.add would end up with a combinatorial explosion, and the second is that templates aren’t necessarily capable of capturing all the subtleties of the interactions between types. To try to overcome this, @stuartarchibald figured out that a combination of recent enhancements to the Numba compiler pipeline meant that Python “protocols” could be used to implement the fundamental operators. For example, operator.add could be implemented with a combination of __add__ and __radd__ functions on the types themselves, and a prototype was written that did this. From there @kc611 expanded this prototype into fully working code for the new split type system, such that operator.add is implemented via __add__ and __radd__ on both the Python and NumPy types. It was at this point things got interesting…
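To make the protocol idea concrete, here is a minimal pure-Python sketch of how operator.add resolves through __add__ and __radd__. It is not Numba code: the PyInt and NpInt64 classes are hypothetical stand-ins for a "Python int type" and a "NumPy int64 type", and the result rule they encode (mixing the two gives the NumPy type) is only an assumption chosen for illustration.

```python
import operator


class PyInt:
    """Hypothetical stand-in for a Python int type (not a Numba class)."""

    def __add__(self, other):
        # PyInt only knows how to combine with itself.
        if isinstance(other, PyInt):
            return PyInt()
        return NotImplemented  # let the other operand try __radd__


class NpInt64:
    """Hypothetical stand-in for a NumPy int64 type (not a Numba class)."""

    def __add__(self, other):
        # Assumed rule for this sketch: NumPy type wins when mixed.
        if isinstance(other, (NpInt64, PyInt)):
            return NpInt64()
        return NotImplemented

    def __radd__(self, other):
        # Reached when the left operand's __add__ returned NotImplemented.
        if isinstance(other, PyInt):
            return NpInt64()
        return NotImplemented


# PyInt.__add__ declines, so Python falls back to NpInt64.__radd__.
mixed = operator.add(PyInt(), NpInt64())
print(type(mixed).__name__)
```

Each type describes only the interactions it knows about, so adding a new type does not require touching a central table of every type pairing, which is where the template approach runs into combinatorial growth.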

Four important observations came out of this research:

1. It became very apparent that Numba currently has no restriction over what functions can be overloaded and at the same time has no way of easily extending the type system to work with existing overloads. Imagine you have a new Python type class MyInt(int):…. In Python you can customise e.g. addition by implementing __add__ and __radd__ on the type, whereas in Numba you’d have to somehow augment the overload system to add in information about operator.add for your new type and also accommodate how it interacts with existing types. There would also be no way of reusing the existing implementations etc. We’ve seen these sorts of problems in projects that try to extend and reuse Numba’s ufunc mechanism too.
2. The amount of effort required to update the Numba code base to accommodate a new type system is very large; it would amount to rewriting most of Numba itself (remember Numba has a reimplementation of both NumPy and quite a lot of the Python core language internally, and this makes up a lot of the code base). A lot of compiler transforms are type aware too, which means these would need rewriting and adjusting.
3. Changing the way fundamental operators work and how they and the type system can be extended would likely break a lot of extensions to Numba (and we don’t want to do that).
4. We’ve been doing a lot of research on technology for a new Numba-like compiler, and supporting a more protocol-based approach towards extension seems both technologically feasible and very desirable for users/extension writers (essentially, make it work the way it does in Python already, via protocols and inheritance).
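As a plain-Python illustration of the MyInt case in the first observation, the sketch below customises addition on a subclass of int while reusing int's existing implementation. The saturating behaviour and the LIMIT attribute are hypothetical, chosen only to give the subclass something visible to do; nothing here is Numba API.

```python
import operator


class MyInt(int):
    """Hypothetical user type: an int whose addition saturates at LIMIT."""

    LIMIT = 100  # illustrative value, not from the original post

    def __add__(self, other):
        # Reuse the existing int implementation rather than rewriting it.
        result = super().__add__(other)
        if result is NotImplemented:
            return NotImplemented  # e.g. a float on the right-hand side
        return MyInt(min(result, self.LIMIT))

    # Addition is commutative here, so the reflected form can share the code.
    __radd__ = __add__


print(operator.add(MyInt(99), 5))   # customised path via MyInt.__add__
print(5 + MyInt(99))                # customised path via MyInt.__radd__
print(MyInt(1) + 1.5)               # falls through to float's own handling
```

The subclass states only what differs from int and inherits everything else, which is the kind of extension and reuse that the observation says the current overload system cannot easily express.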

The choice we face:

From the above, we face a choice… do we rewrite most of Numba to accommodate NEP-050/NumPy 2.0 and potentially break extensions etc. or do we make Numba support NumPy 2.x in the way most projects do, and as Numba does now, by ignoring certain aspects of the type interaction semantics?

We don’t know the answer to the question above, but the current view of the maintainers is that the latter approach has two advantages: it doesn’t break existing extension projects, and it frees up resources. We’d use these additional resources to continue to research and build a new generation of technology that will be able to accommodate NEP-050 and type system extensions much more readily. Rest assured that either way we’ll continue to update and maintain the Numba code base (a lot of things rely on it, and there are 16+ million downloads a month!).

The outcome:

The plan we had for NumPy 2.x support is paused for now. Within the next 6 months our new technology is likely to be in better shape, which should help inform the decision. Until then, feedback from the community on this matter is welcome!
