Using Numba to compile a Python algorithm into target assembly

thk123 · June 23, 2023, 9:42am

I am thinking of using Numba to turn a Python algorithm, currently implemented with NumPy and SciPy, into something that can run on embedded hardware without requiring Python.

The goals would be:

Produce machine code for a given target CPU, e.g. ARM
The machine code should be standalone (i.e. not require Python to run)
The machine code should be performant (at least in terms of memory usage)
Ideally, don’t modify the Python algorithm itself; only provide implementations of NumPy/SciPy functions

My plan to do this is:

1. Use Numba to output LLVM IR for the target functions
1a. Potentially implement NumPy and SciPy functions that Numba doesn’t yet support, so that they are njit-able
2. Take the IR and compile it using clang
3. Write a wrapper C program that calls the compiled function
4. Link the wrapper C program with the compiled bitcode from (2) and the Numba runtime (NRT)

I have a proof of concept working, but wanted to run it by some Numba experts in case there are any glaring problems that I might run into. Do you think this plan will work?

If this works, I hope to be able to contribute back any additional support for NumPy that I add (and SciPy if useful).

Some specific questions that came up in my prototype, if you do believe this has legs:

The docs say that Ahead-of-Time compilation is deprecated. What is planned to replace it?
What is the motivation for some of the Numba runtime being generated through LLVM IR, rather than simply including them in nrt.cpp (e.g. NRT_MemInfo_data_fast)?
Is there a way to get the full LLVM IR for a function and everything it calls? (I am currently using the command-line --dump-llvm flag and then splitting the output into modules.)

guilherme · June 26, 2023, 3:18pm

Hi @thk123,

I maintain a tool called RBC that does something along the lines of what you described. It also ships with a partial implementation of the Numba Runtime written in LLVM IR.

AOT will be replaced by PIXIE

What is the motivation for some of the Numba runtime being generated through LLVM IR, rather than simply including them in nrt.cpp (e.g. NRT_MemInfo_data_fast)?

I guess it is because LLVM can inline this function.

Is there a way to get the full LLVM IR for a function and everything it calls? (I am currently using the command-line --dump-llvm flag and then splitting the output into modules.)

Programmatically, yes!

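from numba import njit

# An explicit signature makes Numba compile eagerly, at decoration time.
@njit('int32(int32)')
def incr(a):
    return a + 1

# inspect_llvm returns the LLVM IR for the given compiled signature as a string.
print(incr.inspect_llvm(incr.signatures[0]))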
