Compile without compiler

I want to compile functions ahead of time to avoid compilation overhead, but I do not want to install a compiler. The bundled LLVM should be sufficient, since it also works for JIT compilation.

There is a Stack Overflow answer by @jpivarski (llvm - Marshaling object code for a Numba function - Stack Overflow) which does exactly that, but I can only get half of it to work.

Compiling a function with scalar parameters to a byte array, then loading it again with LLVM and calling it as a C function works perfectly. However, I would also like to use NumPy arrays, and I do not know the API for that. On Stack Overflow, it was suggested to call the cpython wrapper, which segfaults for me. Maybe the API changed since then?

This is the code so far, which is mostly due to @jpivarski and @stuartarchibald, so thanks to both :slight_smile:

from numba import njit
import numpy as np
import ctypes
import llvmlite.binding as llvm


def compile_function_to_bytes():
    # Function to compile
    @njit
    def foo(x):
        return x * 2

    # Trigger JIT
    foo(12)

    # Find function signature, look it up in JITed library and return bytes
    sig = foo.signatures[0]
    lib = foo.overloads[sig].library
    bytes = lib._get_compiled_object()
    cfunc_name = foo.overloads[sig].fndesc.llvm_cfunc_wrapper_name
    cpython_name = foo.overloads[sig].fndesc.llvm_cpython_wrapper_name

    return bytes, cfunc_name, cpython_name

def main():
    bytes, cfunc_name, cpython_name = compile_function_to_bytes()

    # Initialize llvm
    llvm.initialize()
    llvm.initialize_native_target()
    llvm.initialize_native_asmprinter()

    def create_execution_engine():
        target = llvm.Target.from_default_triple()
        target_machine = target.create_target_machine()
        backing_mod = llvm.parse_assembly("")
        engine = llvm.create_mcjit_compiler(backing_mod, target_machine)
        return engine

    # Load bytes into llvm and retrieve cfunc from it
    obj = llvm.ObjectFileRef.from_data(bytes)
    engine = create_execution_engine()
    engine.add_object_file(obj)
    cfunc_ptr = engine.get_function_address(cfunc_name)
    cpython_ptr = engine.get_function_address(cpython_name)

    # Convert pointer to cfunc
    # foo was specialized for int64 (triggered by foo(12)), so use c_int64
    foo_cfunc = ctypes.CFUNCTYPE(ctypes.c_int64, ctypes.c_int64)(cfunc_ptr)

    result = foo_cfunc(100)

    print("Calling as cfunc. This should work!")
    print("result:", result)

    # Reference counts and sizes are Py_ssize_t (64-bit on most platforms),
    # not C int
    class PyTypeObject(ctypes.Structure):
        _fields_ = [
            ("ob_refcnt", ctypes.c_ssize_t),
            ("ob_type", ctypes.c_void_p),
            ("ob_size", ctypes.c_ssize_t),
            ("tp_name", ctypes.c_char_p),
        ]

    class PyObject(ctypes.Structure):
        _fields_ = [
            ("ob_refcnt", ctypes.c_ssize_t),
            ("ob_type", ctypes.POINTER(PyTypeObject)),
        ]

    PyObjectPtr = ctypes.POINTER(PyObject)

    foo_cpython = ctypes.CFUNCTYPE(PyObjectPtr, PyObjectPtr, PyObjectPtr, PyObjectPtr)(cpython_ptr)

    # Assuming this API:
    # https://docs.python.org/3/c-api/structures.html#c.PyCFunctionWithKeywords
    def foo_wrapped(*args, **kwargs):
        closure = ()
        return foo_cpython(
            ctypes.cast(id(closure), PyObjectPtr),
            ctypes.cast(id(args), PyObjectPtr),
            ctypes.cast(id(kwargs), PyObjectPtr))

    print("Calling as cpython wrapper. This does not work yet.")
    result = foo_wrapped(100) # <------- Segfault here

    print("result:", result)


main()

This crashes when calling foo_wrapped. Here is a stack trace (obtained on WSL2; it also crashes when calling the wrapper on other OSes):

Calling as cfunc. This should work!
result: 200
Calling as cpython wrapper. This does not work yet.

Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
0x000000000055d33c in PyErr_SetString ()
(gdb) bt
#0  0x000000000055d33c in PyErr_SetString ()
#1  0x00007ffff75b510b in cpython::__main__::compile_function_to_bytes::$3clocals$3e::foo$241(long long) ()
#2  0x00007ffff7474ff5 in ?? () from /lib/x86_64-linux-gnu/libffi.so.7
#3  0x00007ffff747440a in ?? () from /lib/x86_64-linux-gnu/libffi.so.7
#4  0x00007ffff7792316 in _ctypes_callproc ()
   from /usr/lib/python3.8/lib-dynload/_ctypes.cpython-38-x86_64-linux-gnu.so
#5  0x00007ffff7792af7 in ?? () from /usr/lib/python3.8/lib-dynload/_ctypes.cpython-38-x86_64-linux-gnu.so
#6  0x00000000005f3010 in _PyObject_MakeTpCall ()
#7  0x000000000056fd36 in _PyEval_EvalFrameDefault ()
#8  0x0000000000568d9a in _PyEval_EvalCodeWithName ()
#9  0x00000000005f5b33 in _PyFunction_Vectorcall ()
#10 0x000000000056aadf in _PyEval_EvalFrameDefault ()
#11 0x0000000000568d9a in _PyEval_EvalCodeWithName ()
#12 0x00000000005f5b33 in _PyFunction_Vectorcall ()
#13 0x000000000056aadf in _PyEval_EvalFrameDefault ()
#14 0x0000000000568d9a in _PyEval_EvalCodeWithName ()
#15 0x000000000068cdc7 in PyEval_EvalCode ()
#16 0x000000000067e161 in ?? ()
#17 0x000000000067e1df in ?? ()
#18 0x000000000067e281 in ?? ()
#19 0x000000000067e627 in PyRun_SimpleFileExFlags ()
#20 0x00000000006b6e62 in Py_RunMain ()
#21 0x00000000006b71ed in Py_BytesMain ()
#22 0x00007ffff7ded0b3 in __libc_start_main (main=0x4ef190 <main>, argc=2, argv=0x7fffffffe248,
    init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe238)
    at ../csu/libc-start.c:308
#23 0x00000000005f96de in _start ()
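As an aside, the byte array and symbol names can be persisted to disk, so that a later process can skip importing Numba entirely and only needs llvmlite. Here is a minimal sketch; the file format is my own choice, not anything Numba prescribes:

```python
import pickle


def save_compiled(path, obj_bytes, cfunc_name, cpython_name):
    # Store the object file emitted by Numba together with the mangled
    # symbol names needed to look the function up again later.
    with open(path, "wb") as f:
        pickle.dump(
            {"obj": obj_bytes, "cfunc": cfunc_name, "cpython": cpython_name}, f
        )


def load_compiled(path):
    # Returns (object bytes, cfunc symbol name, cpython symbol name)
    with open(path, "rb") as f:
        d = pickle.load(f)
    return d["obj"], d["cfunc"], d["cpython"]
```

At build time, save_compiled would be fed the results of compile_function_to_bytes; at runtime, load_compiled would replace that call in main, so the import numba cost is paid only once at build time.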

hi @99991, how does your use case differ from this one: Compiling code ahead of time — Numba 0.52.0.dev0+274.g626b40e-py3.7-linux-x86_64.egg documentation?
It seems easier to use Numba’s AOT functionality, but you might be trying to do something that AOT can’t?

@luk-f-a To my knowledge, AOT compilation as in this example requires a compiler. On Windows, this is a no-go because users would have to download more than 4 GB of Visual Studio and click through various menus, which negatively impacts the adoption of our library.

Point 4 of the limitations section is also relevant (code is not optimized for the CPU architecture), although not quite as pressing. Alternatively, we could compile code for every platform, but that would be a maintenance nightmare.

The requirements are:

  • Easy to install (AOT and Cython fail here due to the compiler requirement).
  • Fast to import (@jit and even jit(cache=True) with a signature fail here because of several seconds of startup time).
  • Fast to execute (AOT currently does not support parallelism).
  • Easy to maintain (too many of the issues posted to our repository are related to Numba AOT compilation).
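To put a number on the import cost, here is a small sketch that times a first import; it uses a stdlib module as a stand-in, and substituting "numba" reproduces the startup figures discussed in this thread:

```python
import importlib
import time


def timed_import(name):
    # Measure the wall-clock cost of importing a module by name.
    # Only the first import pays the full price; later imports hit
    # sys.modules and are nearly free.
    t0 = time.perf_counter()
    importlib.import_module(name)
    return time.perf_counter() - t0


# Stdlib stand-in; replace with "numba" to measure the real cost.
elapsed = timed_import("json")
print(f"import took {elapsed * 1000:.2f} ms")
```

A per-submodule breakdown is also available via CPython's built-in hook: python -X importtime -c "import numba".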

@99991 why do you think Numba requires an external compiler for AOT instead of using the bundled LLVM which Numba uses for JIT compilation? I haven’t looked at the code so I cannot be 100% sure that it doesn’t, but I cannot find any reference in the documentation about requiring an external compiler, and it sounds odd that a project that bundles a compiler would require a second one to be installed.

How stable are the types in your application? Since you are looking to compile ahead of time, they must be, otherwise it would be impossible. You could, as part of the package installation, run all your functions with cache=True with representative inputs of the correct type. That would populate the cache at installation time, and then the users would not have to wait at startup time.
Alternatively, if you don’t want to compile at installation time, you could distribute the cache (Notes on Caching — Numba 0.55.0.dev0+69.g6ecb81577.dirty-py3.7-linux-x86_64.egg documentation).

I’m not saying these are the solutions to your use case, I’m just laying out some options because personally I would try to avoid going the route you described in the original post.

cheers,
Luk

@luk-f-a:

@99991 why do you think Numba requires an external compiler for AOT instead of using the bundled LLVM which Numba uses for JIT compilation? I haven’t looked at the code so I cannot be 100% sure that it doesn’t, but I cannot find any reference in the documentation about requiring an external compiler

The reference is a bit hidden:
Installation — Numba 0.50.1 documentation

Optional runtime dependencies are: […]

  • Compiler toolchain mentioned above, if you would like to use pycc for Ahead-of-Time (AOT) compilation

This comment by @sklam is even clearer:
Compiler not found: /tmp/tmp bug · Issue #7218 · numba/numba · GitHub

You’ll need a compiler in your system to use the AOT support.

@luk-f-a:

it sounds odd that a project that bundles a compiler would require a second one to be installed.

I agree. @jpivarski found a way around it, but it does not work anymore for NumPy data types (or at least I can’t get it to work): llvm - Marshaling object code for a Numba function - Stack Overflow. It works fine for basic data types, so this seems like a promising alternative at the moment.

How stable are the types in your application? Since you are looking to compile ahead of time, they must be, otherwise it would be impossible. You could, as part of the package installation, run all your functions with cache=True with representative inputs of the correct type. That would populate the cache at installation time, and then the users would not have to wait at startup time.

Just importing numba alone takes almost one second. We used cache=True in a previous version of our library, but there was still considerable overhead, which added up to several seconds for just a few functions.

We then switched to AOT, where import numba is not necessary after the initial compilation, so startup is much faster (excluding the few hours it takes to install a compiler on Windows and the various GitHub issues where users are met with weird error messages or have trouble installing compilers).

@luk-f-a:

I’m not saying these are the solutions to your use case, I’m just laying out some options because personally I would try to avoid going the route you described in the original post.

Thanks for the advice. In fact, those were exactly the options we evaluated. Unfortunately, we could not find a completely satisfactory solution yet.

@99991 thanks for the details!!! I definitely learned a lot :slight_smile: I hope I didn’t waste your time. I haven’t had the need to do what you’re trying to do, but I’m interested to see how this ends, and whether you manage to get something AOT-compiled using LLVM.
It seems that you have more demanding users than I do; in my framework, a 1-second import time would not be noticed :slight_smile:

Good luck!
Luk

Just a quick note: in 0.54, various side effects of importing Numba have been removed, so the import time should be a lot lower in future.