Using Numba to compile a Python algorithm into target assembly

I am thinking of using Numba to turn a Python algorithm, currently implemented using NumPy and SciPy, into something that can run on embedded hardware without requiring Python.

The goals would be:

  • Produce machine code for a given target CPU, e.g. ARM
  • The machine code should be standalone (i.e. not require Python to run)
  • The machine code should be performant (at least in terms of memory usage)
  • Ideally, don’t modify the Python algorithm itself; only provide implementations of NumPy/SciPy functions

My plan to do this is:

  1. Use Numba to output LLVM IR for the target functions
    1a. Potentially implement NumPy and SciPy functions that don’t have support in Numba, so they become njit-able
  2. Take the IR and compile it using clang
  3. Write a wrapper C program that calls the compiled function
  4. Link the wrapper C program with the compiled output from (2) and the Numba runtime (NRT)
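As a concrete sketch of steps 2–4, the build might look something like this (the file names, target triple, and NRT library name are all illustrative, not something Numba produces for you):

```shell
# Compile the dumped IR for the target (triple is illustrative)
clang --target=armv7a-none-eabi -O2 -c algo.ll -o algo.o
# Build the hand-written C wrapper that calls the compiled entry point
clang --target=armv7a-none-eabi -O2 -c wrapper.c -o wrapper.o
# Link the wrapper, the compiled IR, and a separately built NRT
clang --target=armv7a-none-eabi wrapper.o algo.o -L. -lnrt -o algo.elf
```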

I have a proof of concept working, but wanted to run it by some Numba experts in case there are any glaring problems I might run into. Do you think this plan will work?

If this works, I hope to be able to contribute back any additional support for NumPy that I add (and SciPy if useful).

Some specific questions that came up in my prototype, if you do believe this has legs:

  • The docs say Ahead-of-Time compilation is deprecated; what is planned to replace it?
  • What is the motivation for some of the Numba runtime being generated through LLVM IR, rather than simply including them in nrt.cpp (e.g. NRT_MemInfo_data_fast)?
  • Is there a way to get the full LLVM IR for a function and everything it calls? (Currently I am using the command line --dump-llvm and then splitting the output into each module.)

Hi @thk123,

I maintain a tool called RBC that does something along the lines of what you described. It also ships with a partial implementation of the Numba Runtime written in LLVM IR.

AOT will be replaced by PIXIE

What is the motivation for some of the Numba runtime being generated through LLVM IR, rather than simply including them in nrt.cpp (e.g. NRT_MemInfo_data_fast)?

I guess it is because LLVM can inline this function.

Is there a way to get the full LLVM IR for a function and everything it calls? (Currently I am using the command line --dump-llvm and then splitting the output into each module.)

Programmatically, yes!

from numba import njit

@njit('int32(int32)')
def incr(a):
    return a + 1

print(incr.inspect_llvm(incr.signatures[0]))