Numba vision and the problem of compiling Python

Hi everyone,

In yesterday’s discussion on the vision of Numba, towards the end we were talking about the relationship between Numba and Python.
I think we agreed that, to avoid excessive abstraction and generality, Numba is a project solely focused on Python. llvmlite, however, is more general.
After that we were talking about Python semantics not supported by Numba, and I wonder if there was broad agreement on this point. Since it’s important for the vision, I thought I would bring it up here in case anyone wants to share their thoughts in an async way.

Numba aims to be a high-performance compiler, and there was also agreement that Python is too dynamic to keep all of its semantics and still produce high-performance code. That means Numba can only efficiently compile a subset of Python. Open to discussion is what to do with the rest of the language: should it be compiled in some kind of “slow mode”, or not compiled at all? Should the vision say anything about that?

Luk

There are dynamic features of Python that, very understandably, are never going to be supported by Numba (e.g., exec, eval). What I find a little annoying is that if I have a very straightforward class (particularly one in another package) and I want to use it in Numba, it isn’t plug-and-play. The dict representation of objects is perhaps one of those dynamic features that we don’t want to support, and people don’t in general use __slots__. Still, we could make a best effort given the dict representation: discover the fields of the objects, collect their types, and then dynamically wrap the class with something like the existing jitclass. If the code we want to use stays in Python, then all of this seems doable. If it goes to C, we end up in the unenviable position of having to reimplement the whole API by hand and missing a bunch of it, as we do with the NumPy API.
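To make the "discover the fields and collect the types" step concrete, here is a minimal pure-Python sketch. The names Particle and discover_fields are hypothetical, made up for illustration; nothing like this is claimed to exist in Numba. It only looks at one instance and returns (name, Python type) pairs, which is roughly the shape of the spec that jitclass takes. Mapping those Python types to Numba types and generating the wrapper is deliberately left out.

```python
class Particle:
    """Hypothetical plain class, standing in for one from another package."""

    def __init__(self, x, y, mass):
        self.x = x
        self.y = y
        self.mass = mass


def discover_fields(obj):
    """Return (field_name, python_type) pairs found on a single instance.

    Hypothetical helper: it reads the instance __dict__ if there is one,
    and also walks the class hierarchy for __slots__ declarations.
    """
    # Fields stored in the per-instance dict (empty if the class is slotted).
    names = list(getattr(obj, "__dict__", {}))

    # Fields declared through __slots__ anywhere in the MRO.
    for klass in type(obj).__mro__:
        slots = getattr(klass, "__slots__", ())
        if isinstance(slots, str):  # __slots__ = "a" is legal
            slots = (slots,)
        for name in slots:
            if name in ("__dict__", "__weakref__"):
                continue
            # Skip slots that were declared but never assigned.
            if hasattr(obj, name) and name not in names:
                names.append(name)

    return [(name, type(getattr(obj, name))) for name in names]


fields = discover_fields(Particle(0.0, 1.0, 2))
```

The obvious weakness is that one instance only shows the types it happens to hold at that moment, so a field that is sometimes an int and sometimes a float would be typed wrongly. That is the "best effort" part.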

I am happy with having to partly rewrite some code to make it more palatable to numba. Compiling to “slow mode” creates a bad user experience in my opinion: sure, compilation doesn’t give any errors, but the code isn’t any faster (or is even slower!). On the other hand, an “unforgiving” compilation mode is a bit more challenging to get right, but the benefits are much higher. Since I started spreading the word about numba for real, I have recommended that everyone set nopython=True no matter what (or use numba.njit directly, of course). I got the impression at some point that object mode would be deprecated?

I was happy even when I had to initialize arrays outside of JITted functions. Seemed a bit FORTRANesque, and forced me to rewrite more code? Sure, but it was not that hard, and the benefit was immediate. In that sense, when numba supported initializing memory inside JITted functions, I saw it as a win, but at the same time, I’m sure it has increased the complexity of the code and made maintenance more difficult.
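For anyone who never had to do this, the pattern looked roughly like the sketch below: the caller allocates the array in ordinary Python and the compiled function only fills it in. The function name fill_squares is made up for illustration, and the try/except fallback to a no-op decorator is my own assumption so the snippet still runs (uncompiled) where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption for this sketch only: without Numba, use a no-op decorator
    # so the example still runs, just without compilation.
    def njit(func):
        return func


@njit
def fill_squares(out):
    """Write i*i into a caller-allocated array; nothing is allocated here."""
    for i in range(out.shape[0]):
        out[i] = i * i


# Allocation stays outside the compiled function, in plain Python.
buf = np.empty(5, dtype=np.float64)
fill_squares(buf)
```

It reads like a Fortran subroutine with an output argument, but the rewrite from a "return a new array" style is usually mechanical.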

I get that there are several projects to “compile Python” or generally make it faster and it’s tough for users to pick one of them. But I think it’s the price to pay for having a dynamic language that, in addition, is evolving faster than ever. Setting reasonable boundaries and expectations around that would make numba more powerful, not less (by focusing development time on making it even more awesome for the use cases that are already supported, rather than trying to please everyone, everywhere).