Proposal: Numba 2023 MVP

As a follow-up to the 2023H1 Development focus document, we presented an MVP for the new components in today’s public meeting. We are proposing to complete the described MVP this year. Please see details in the linked document here:

Your feedback is welcome.

This all sounds great! Very excited to see where this plan is headed. I have a few questions:

  1. Could this be a path for improving the performance of typed List and Dict, which currently have C implementations that are called from precompiled libraries but aren’t directly inlined?

  2. How much would this improve compile times? A simple printout of every compile call in dispatcher.py leads me to believe that there is currently quite a bit of redundant compilation in numba. Would this allow for more code reuse? Like if List[int64].getitem() is called in one function could I expect the compilation of that subroutine in another function to be streamlined?

  3. Is there any possibility that the new PIXIE format could be used to build WebAssembly packages?

I will ask @stuartarchibald to answer 1 and 3 as he is working on the PIXIE part.

As for the compile-time improvement question, you are right that Numba currently does a lot of redundant compilation. It is a result of the typing phase directly triggering compilation during type inference. This is one of the flaws that we want to address when we redo the compiler pipeline.
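To make the code-reuse idea from question 2 concrete, here is a toy sketch in plain Python. It is not Numba's actual pipeline or API: `compile_subroutine`, `NaiveCompiler`, and `CachingCompiler` are hypothetical names, and "compilation" is only simulated by appending to a log. It just contrasts recompiling a subroutine for every caller with reusing one compiled artifact per (name, signature) pair.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical key: (subroutine name, argument type signature).
Key = Tuple[str, Tuple[str, ...]]

# Records every simulated compilation so the two strategies can be compared.
compile_log: List[Key] = []


def compile_subroutine(name: str, sig: Tuple[str, ...]) -> Callable:
    """Simulate an expensive compilation step (hypothetical stand-in)."""
    compile_log.append((name, sig))

    def compiled(*args):
        return (name, sig, args)

    return compiled


class NaiveCompiler:
    """Recompiles each callee every time a caller needs it."""

    def compile_function(self, callees: List[Key]) -> List[Callable]:
        return [compile_subroutine(name, sig) for name, sig in callees]


class CachingCompiler:
    """Compiles each (name, signature) pair once and reuses the result."""

    def __init__(self) -> None:
        self._cache: Dict[Key, Callable] = {}

    def compile_function(self, callees: List[Key]) -> List[Callable]:
        compiled = []
        for key in callees:
            if key not in self._cache:
                self._cache[key] = compile_subroutine(*key)
            compiled.append(self._cache[key])
        return compiled


def count_compiles(compiler, functions: List[List[Key]]) -> int:
    """Compile every function and report how many compilations ran."""
    compile_log.clear()
    for callees in functions:
        compiler.compile_function(callees)
    return len(compile_log)


getitem: Key = ("List.getitem", ("int64",))
append: Key = ("List.append", ("int64",))
# Three caller functions that share the same typed-list subroutines.
functions = [[getitem, append], [getitem], [getitem, append]]

naive_count = count_compiles(NaiveCompiler(), functions)
cached_count = count_compiles(CachingCompiler(), functions)
print(naive_count, cached_count)
```

In this toy model the naive strategy compiles once per call site, while the caching strategy compiles once per distinct (name, signature) pair. Whether the redesigned pipeline ends up working this way is not something this sketch claims.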

Thanks for your interest @DannyWeitekamp. Answering 1 and 3:

  1. Yes, hopefully! The same goes for the Numba runtime, which would help with things like allocation and reference-counted types on, e.g., the CUDA target.

  3. Maybe. I’ve been thinking a little about making various parts of the compiler extensible, including the toolchain for linking and the “instructions” that can be packaged into the format. I’ll definitely keep this case in mind and try not to do anything that precludes it.

Awesome! Thank you both for your answers @sklam @stuartarchibald! It sounds like this proposal is going to improve things on all fronts—I have even more to look forward to than I initially thought.

To lend some context to my interest in the WebAssembly use case, it would be incredibly powerful to be able to write things in Python and run them with C-like speed in the browser. Of course one can write things in C/C++ and compile with emscripten, but at the cost of managing a great deal of boilerplate and ending up with code that is, in general, much less concise and harder to interpret. In an academic setting the conciseness and ease of use of Python are very important. They make collaboration and the onboarding of new graduate students much easier. I’ve seen all too many academic projects fizzle out because their FORTRAN, LISP, Java or C/C++ codebases were too much for the next grad student to handle.

In my own field, Human-Computer Interaction, we have the added concern of building interfaces for our tools, which of course opens a whole new can of worms, with the frontend JavaScript bits not being interoperable with everything else we write. Being able to run our Python-sourced tools efficiently (by compiling them with Numba) and also directly in the browser would simplify things astronomically, and eliminate the additional headaches involved with hosting and maintaining a server layer for tools that interact with the client at the transaction level. For instance, in my own projects we want to run expert-system-like programs (i.e. production rules) in the browser to use in educational technology. Being able to run a Numba-compiled rule engine in the browser would simplify maintaining the source, while enabling us to serve millions of students with a simple CDN instead of handling every student transaction through a custom server setup.

We presented the following slides for the Numba 2023 MVP today during the Compiler Research Group’s CaaS Monthly Meeting.

Link to slides: compiler_research_org_presentation_combined_slides.pdf - Google Drive

Where the slides say “packager pays the compilation time to generate a library targeting potentially multiple ISAs.”, does that mean the generated PIXIE library is platform-agnostic, or do we still have to build a bunch of platform-specific binary distributions (wheel files)?

@sklam I’m curious where this feature now falls on the priority list. I understand that NumPy 2.0 support has taken priority recently. I’m wondering if I should hold out hope for these new AOT features being available in the next few months.

Due to ongoing efforts around NumPy 2.0, requests for newer LLVM versions, and Python 3.13 support, I cannot give exact timelines for the new AOT features. But there is active development:

There will be a talk at SciPy 2024, “PIXIE: Blending Just-in-time and Ahead-of-time compilation in scientific Python applications”, that will showcase PIXIE. @stuartarchibald has been refactoring the PIXIE codebase to get it ready for more eyes. The features described in the PIXIE MVP document will be highlighted in the talk. In parallel, @esc is developing a Python AST-based front-end in numba-rvsdg to prepare the AST for an RVSDG-based middle-end. Over the summer, I’ll be experimenting with an RVSDG-based middle-end and looking into leveraging EGraph/EqSat (see the egglog Python documentation for ideas).
