Making C functions available to the cgutils.get_or_insert_function intrinsic?

There are quite a few places inside numba intrinsics where (I assume) a C function is being directly linked into the compiled LLVM. For instance:

fn = cgutils.get_or_insert_function(mod, fnty, "NRT_MemInfo_alloc_dtor_safe")

It’s easy enough to see that “NRT_MemInfo_alloc_dtor_safe” is defined in nrt.c. But why is this particular function visible to cgutils.get_or_insert_function()? Could I make functions from a custom C library visible to it? I’ve had some success calling C functions inside njitted code via cffi, but the cffi approach has the drawback of not being cacheable, since it seems to compile an external module and then expose the address of the target function to numba, instead of compiling it into the calling jitted function.

(Caveat - I may be remembering incorrectly here, so let me know if this appears not to be correct)

It is because the symbol has been added to LLVM’s JIT: numba/nrt.py at 468647dddde27ee8af124c97dfcd20c35c4a2bc6 · numba/numba · GitHub

The dict of addresses of these functions is initialised by a C extension: numba/_nrt_pythonmod.c at 468647dddde27ee8af124c97dfcd20c35c4a2bc6 · numba/numba · GitHub

So, if you can get the address of a function, you can make it available for get_or_insert_function() by calling ll.add_symbol(fname, address)
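
For anyone wanting to try the address-based approach without writing a C extension first, here is a minimal sketch. It assumes a POSIX-like system where ctypes can load the C math library; the symbol name "demo_cos" and the helpers function_address() and register_symbol() are made up for illustration. The llvmlite part is skipped if llvmlite is not installed.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX-like system. If find_library() returns None,
# CDLL(None) falls back to the symbols already loaded into the process.
libm = ctypes.CDLL(ctypes.util.find_library("m"))


def function_address(cfunc):
    """Return the raw address of a ctypes function pointer as an int."""
    return ctypes.cast(cfunc, ctypes.c_void_p).value


def register_symbol(name, address):
    """Register `address` under `name` with LLVM, if llvmlite is available.

    Returns True when LLVM reports the same address back, and False when
    llvmlite is missing or the lookup does not match.
    """
    try:
        from llvmlite import binding as ll
    except ImportError:
        return False
    ll.add_symbol(name, address)
    return ll.address_of_symbol(name) == address


cos_addr = function_address(libm.cos)
registered = register_symbol("demo_cos", cos_addr)  # hypothetical symbol name
print(hex(cos_addr), registered)
```

Once a name is registered this way, cgutils.get_or_insert_function(mod, fnty, "demo_cos") should resolve to that address when the module is linked, provided fnty matches the real C signature.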

Wow that worked like a charm thanks! Just to share for others that might find this useful (@nelson2005) I was able to implement the following for my package called ‘cre’.

setup.py :

from setuptools import setup, find_packages, Extension 
def get_ext_modules():
    import numba
    numba_path = numba.extending.include_path()
    cre_c_funcs = Extension(
        name='cre_cfuncs', 
        sources=['cre/cfuncs/cre_cfuncs.c'],
        include_dirs=[numba_path]
    )
    return [cre_c_funcs]

setup(
...
ext_modules = get_ext_modules()
...
)

A function meminfo_copy_unsafe() defined in cre/cfuncs/cre_cfuncs.c :

#include <Python.h>   /* should come before any standard headers */
#include <stdio.h>
#include <string.h>   /* memcpy */
#include "numba/core/runtime/nrt_external.h"
#include "numba/core/runtime/nrt.h"

/* MemInfo is not exposed by nrt.h so we need to redefine it if we want to use it. */
struct MemInfo {
    size_t            refct;
    NRT_dtor_function dtor;
    void              *dtor_info;
    void              *data;
    size_t            size;    /* only used for NRT allocated memory */
    NRT_ExternalAllocator *external_allocator;
};

// Copies a meminfo and the data it points to
NRT_MemInfo* meminfo_copy_unsafe(NRT_api_functions *nrt, NRT_MemInfo *mi) {
    struct MemInfo *old_mi = (struct MemInfo*) mi;
    struct MemInfo *new_mi;
    if (old_mi == NULL) {
        return NULL;
    }
    // allocate() returns a new meminfo with its own data buffer
    new_mi = (struct MemInfo*) nrt->allocate(old_mi->size);
    if (new_mi == NULL) {
        return NULL;
    }
    memcpy(new_mi->data, old_mi->data, old_mi->size);
    // Copy everything except refct, data and size
    new_mi->refct = 1;
    new_mi->dtor = old_mi->dtor;
    new_mi->dtor_info = old_mi->dtor_info;
    new_mi->external_allocator = old_mi->external_allocator;
    return (NRT_MemInfo*) new_mi;
}

/*** START : ext_methods  ***/
/* meminfo_copy_unsafe is a plain C function, not a PyCFunction, so it must
   not be listed here; it is exposed by address through c_helpers instead. */
static PyMethodDef ext_methods[] = {
    { NULL, NULL, 0, NULL }  /* sentinel */
};
/*** END : ext_methods  ***/

/*** START : build_c_helpers_dict() ***/
static PyObject *
build_c_helpers_dict(void)
{
    PyObject *dct = PyDict_New();
    if (dct == NULL)
        goto error;

#define _declpointer(name, value) do {                 \
    PyObject *o = PyLong_FromVoidPtr(value);           \
    if (o == NULL) goto error;                         \
    if (PyDict_SetItemString(dct, name, o)) {          \
        Py_DECREF(o);                                  \
        goto error;                                    \
    }                                                  \
    Py_DECREF(o);                                      \
} while (0)

#define declmethod(func) _declpointer(#func, &NRT_##func)
#define declmethod_internal(func) _declpointer(#func, &func)

declmethod_internal(meminfo_copy_unsafe);

#undef declmethod
#undef declmethod_internal
    return dct;
error:
    Py_XDECREF(dct);
    return NULL;
}
/*** END : build_c_helpers_dict() ***/


// Module Definition struct
static struct PyModuleDef cre_cfuncs = {
    PyModuleDef_HEAD_INIT,
    "cre_cfuncs",
    "Test Module",
    -1,
    ext_methods
};

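// Initializes module using above struct
PyMODINIT_FUNC PyInit_cre_cfuncs(void)
{
    PyObject *m = PyModule_Create(&cre_cfuncs);
    if (m == NULL)
        return NULL;
    PyObject *helpers = build_c_helpers_dict();
    // PyModule_AddObject steals the reference to helpers only on success
    if (helpers == NULL || PyModule_AddObject(m, "c_helpers", helpers) < 0) {
        Py_XDECREF(helpers);
        Py_DECREF(m);
        return NULL;
    }
    return m;
}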

Then, after compiling via pip install -e . I can expose my C function to llvmlite and call it from an intrinsic:

from llvmlite import ir
from llvmlite import binding as ll
from numba.core import cgutils
from numba import njit, i8
from numba.extending import intrinsic
from numba.core.imputils import impl_ret_borrowed
import cre_cfuncs
# Make methods in cre_cfuncs.c_helpers available to LLVM
for name, c_addr in cre_cfuncs.c_helpers.items():
    ll.add_symbol(name, c_addr)

def _meminfo_copy_unsafe(builder, nrt, meminfo):
    mod = builder.module
    fnty = ir.FunctionType(cgutils.voidptr_t, [cgutils.voidptr_t, cgutils.voidptr_t])
    fn = cgutils.get_or_insert_function(mod, fnty, "meminfo_copy_unsafe")
    fn.return_value.add_attribute("noalias")
    return builder.call(fn, [builder.bitcast(nrt, cgutils.voidptr_t), builder.bitcast(meminfo, cgutils.voidptr_t)])

@intrinsic
def _memcpy_structref(typingctx, inst_type):    
    def codegen(context, builder, signature, args):
        val = args[0]
        ctor = cgutils.create_struct_proxy(inst_type)
    
        dstruct = ctor(context, builder, value=val)
        meminfo = dstruct.meminfo
        nrt = context.nrt.get_nrt_api(builder)
        new_meminfo = _meminfo_copy_unsafe(builder, nrt, meminfo)

        inst_struct = context.make_helper(builder, inst_type)
        inst_struct.meminfo = new_meminfo

        return impl_ret_borrowed(context, builder, inst_type, inst_struct._getvalue())

    sig = inst_type(inst_type)
    return sig, codegen

## USAGE:
from cre.fact import define_fact
# A shortcut for defining structrefs for CRE
BOOP = define_fact("BOOP", {"A":i8,"B":i8})

@njit(cache=True)
def foo():
    a = BOOP(1,2)
    b = _memcpy_structref(a)
    a.A,a.B = 7,7
    print(a, b)

foo()
# >> BOOP(A=7, B=7) BOOP(A=1, B=2)

The nice thing about this approach is that the C function gets compiled at package installation. Combined with cache=True nothing needs to be recompiled between executions.


This is really cool, and way above my pay grade. I did watch it with interest when you first posted :slight_smile:

There had been some previous discussion about packing structrefs into contiguous memory… does this kit open any possibilities there?

Sorry this is going to be a bit of a stream of consciousness post…

I imagine this trick would enable allocating structrefs into contiguous memory, since you can do whatever you want at the C level. There would certainly be a few tricky things to work out, though. For instance, you would probably want to incref the structref an extra time when it gets instantiated into your contiguous buffer, because if it ever hits a refcount of 0 it will try to free a chunk of memory that it doesn’t own (I’m not actually sure what would happen in that case… probably some segfaulty error). But keeping this extra reference around would make elements in the buffer a bit of a memory leak risk: their destructors would never get called, so they would keep an active reference to all of their members. This isn’t the end of the world if you use it conservatively.

The most Python-friendly implementation would have the buffer itself be a refcounted object as well. One option is for every object instantiated from the buffer to acquire a reference to the buffer, but this would cause a reference loop… which is a no-no because the NRT doesn’t have a cycle-detecting garbage collector.
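
To make the two failure modes concrete, here is a toy refcounting model in plain Python. It is only a sketch: ToyMemInfo and its methods are invented for illustration and are not the NRT’s actual data structures. It just mimics "free when the count hits zero, and release members in the destructor".

```python
class ToyMemInfo:
    """Toy stand-in for a refcounted allocation (not the real NRT meminfo)."""

    def __init__(self, name):
        self.name = name
        self.refct = 1       # the creator holds the first reference
        self.freed = False
        self.members = []    # objects this one holds a reference to

    def hold(self, other):
        """Take a reference to `other` and remember it as a member."""
        other.refct += 1
        self.members.append(other)

    def decref(self):
        self.refct -= 1
        if self.refct == 0:
            # "Destructor": release every member, then free this object.
            for member in self.members:
                member.decref()
            self.freed = True


# Case 1: the buffer pins an element with one extra incref.
elem = ToyMemInfo("elem")
member = ToyMemInfo("member")
elem.hold(member)    # elem now references member
member.decref()      # creator lets go; only elem keeps member alive
elem.refct += 1      # the extra "pin" taken by the buffer
elem.decref()        # the user drops their reference
pinned_state = (elem.freed, member.freed)

# Case 2: elements reference the buffer and the buffer references elements.
buffer = ToyMemInfo("buffer")
item = ToyMemInfo("item")
item.hold(buffer)    # item keeps the buffer alive
buffer.hold(item)    # buffer keeps the item alive: a cycle
buffer.decref()      # the user drops the buffer
item.decref()        # the user drops the item
cycle_state = (buffer.freed, item.freed, buffer.refct, item.refct)

print(pinned_state)  # (False, False)
print(cycle_state)   # (False, False, 1, 1)
```

In the first case the pinned element never reaches zero, so its member is never released. In the second, each object keeps the other at a count of 1, and nothing ever frees either of them without a cycle collector.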

It’s a bit tricky to think of a clean setup that doesn’t involve modifying the NRT a bit. But I bet it could work if the user is forced to explicitly destroy the buffer, and to suffer the consequences if in doing so they deallocate the data for an object that should still be active.
