I am thinking of using Numba to turn a Python algorithm, currently implemented using NumPy and SciPy, into something that can run on some embedded hardware without requiring Python.
The goals would be:
- Produce machine code for a given target CPU, e.g. ARM.
- The machine code should be standalone (i.e. not require Python to run).
- The machine code should be performant (at least in terms of memory usage).
- Ideally, don't modify the Python algorithm itself; only provide implementations of NumPy/SciPy functions.
My plan to do this is:
1. Use Numba to output LLVM IR for the target functions.
   1a. Potentially implement the NumPy and SciPy functions that Numba doesn't support, so that they are njit-able.
2. Take the IR and compile it using clang.
3. Write a wrapper C program that calls the compiled function.
4. Link the wrapper C program with the compiled bitcode from (2) and the Numba runtime (NRT).
I have a proof of concept working, but wanted to run it by some Numba experts in case there are any glaring problems I might run into. Do you think this plan will work?
If this works, I hope to be able to contribute back any additional NumPy support that I add (and SciPy support, if useful).
Some specific questions that came up in my prototype, if you do believe this has legs:
- The docs say that ahead-of-time (AOT) compilation (numba.pycc) is deprecated. What is planned to replace it?
- What is the motivation for generating some of the Numba runtime through LLVM IR (e.g. NRT_MemInfo_data_fast), rather than simply including it in nrt.cpp?
- Is there a way to get the full LLVM IR for a function and everything it calls? Currently I use the command line option --dump-llvm and then split the output into each module.