Any numba equivalent for casting a raw pointer to a StructRef, Dict, List etc?

DannyWeitekamp · January 4, 2022, 1:11pm

Hey @nelson2005, sorry for the late reply. I’m currently travelling, so I might have a more detailed answer in a couple of weeks, but here is a start.

The repo you linked is stale. The same code has all moved here and is much more worked out (it turns out ‘numbert’ was a terrible name for a framework, because people immediately assume it is somehow related to the BERT language model):

**Note: the dev branch is the best place to look for now.
You should poke through utils.py; there are many useful intrinsics in there that can help you craft workarounds. Plus, structref.py has some nice shortcuts for making structrefs, and there are lots of examples of structref usage throughout.

In my own projects I’ve come up with a lot of tricks for keeping different types in the same data structures. There are a few key considerations:

Since your typed Dict or List needs to have an established data type, you need a way of upcasting to a common type. You can either do this manually (the _cast_structref function I shared previously is one way to do this), or register an upcast (if a type is passed as an argument and no overload exists for it, numba will try the valid upcasts for that type). For instance, here is a snippet from one of my projects:

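# from utils.py
from numba.core import cgutils, types

def _obj_cast_codegen(context, builder, val, frmty, toty, incref=True):
    # View the incoming value through the source type's struct layout.
    ctor = cgutils.create_struct_proxy(frmty)
    dstruct = ctor(context, builder, value=val)
    meminfo = dstruct.meminfo
    # The result is a new reference to the same allocation, so incref it.
    if incref and context.enable_nrt:
        context.nrt.incref(builder, types.MemInfoPointer(types.voidptr), meminfo)

    # Build a struct of the target type that shares the same meminfo.
    st = cgutils.create_struct_proxy(toty)(context, builder)
    st.meminfo = meminfo

    return st._getvalue()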

# In another file... Allow any specialization of MatchIteratorType to be upcast to GenericMatchIteratorType
# (both types are defined elsewhere in the project)
from numba.extending import lower_cast

@lower_cast(MatchIteratorType, GenericMatchIteratorType)
def upcast(context, builder, fromty, toty, val):
    return _obj_cast_codegen(context, builder, val, fromty, toty)

The above function makes it so that if I had a function with signature i8(GenericMatchIteratorType) (perhaps to determine the length of the iterator), then numba will upcast the argument rather than try to specialize that function for other MatchIteratorTypes (which might be specialized for various kinds of structref types that I’ve defined). It’s often best in cases like these to explicitly provide the types to njit so that overloads for more generic types get compiled first.

It is possible to produce a raw pointer for an object as a 64-bit integer (see ‘_raw_ptr_from_struct’), which is useful if you want to keep a pointer to an NRT-allocated object in a numpy array, compare pointers, use pointers as dict keys, etc. However, this form of pointer isn’t refcounted, so I would recommend not using it as the only reference to an object that you are trying to keep as a member of a structref. Otherwise you’ll need to manually incref/decref the raw pointer, which I wouldn’t recommend, since you’ll spend a lot of time struggling with segfaults and memory leaks. (If you went this route, in principle you would want to make a custom destructor for your structref to decref any raw pointers. This isn’t currently possible to my knowledge… or at least for now I’m too lazy to try to write an intrinsic to do it.)
You cannot have custom member functions in quite the same way that you do in Python, since in a compiled context a method is just a syntactic alias for a hard-coded subroutine. Inside your Dict/List all of your types will be upcast to the same type, so they will all share the same statically defined methods as defined with @overload_method. There are two ways around this, however:
a) First-class functions are implemented now, so you can implement dynamic methods by having a structref attribute take a FunctionType. I’ve struggled to find a clean way to implement this approach, however, since you typically need to pass the function as an argument to the constructor of the structref (or reconstruct the function from its address). If I’m recalling correctly, I haven’t had much luck with assigning functions that are globally defined; at the very least, I doubt that it will cache properly, if you care about that.
b) You can keep an attribute that uniquely defines the type of the object, and implement your method statically with if-else statements to pick the correct implementation. Each particular implementation can downcast the types as needed.
**The above is all especially relevant for implementing hash() and __eq__() for objects that you want to use as dictionary keys. I have an example of this here: dynamic_exec.py on the dev branch of DannyWeitekamp/Cognitive-Rule-Engine on GitHub.
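
As a rough illustration of option (b), here is a minimal sketch of tag-based dispatch. It is not taken from the linked repo: the KIND_* tags and the area functions are made up for the example, and plain scalars stand in for structref attributes. In a real project each branch would first downcast the generic structref to its concrete type. The try/except fallback is only there so the sketch also runs as plain Python when numba isn’t installed.

```python
import math

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch also runs as plain Python without numba.
    def njit(func):
        return func

# Hypothetical type tags; a real project would store one per structref.
KIND_CIRCLE = 0
KIND_SQUARE = 1

@njit
def _circle_area(size):
    return math.pi * size * size

@njit
def _square_area(size):
    return size * size

@njit
def area(kind, size):
    # One statically compiled "method": branch on the tag, then call the
    # implementation that belongs to that concrete type.
    if kind == KIND_CIRCLE:
        return _circle_area(size)
    elif kind == KIND_SQUARE:
        return _square_area(size)
    return -1.0  # unknown tag

@njit
def total_area(kinds, sizes):
    # Every element goes through the same function; only the tag differs.
    total = 0.0
    for i in range(len(kinds)):
        total += area(kinds[i], sizes[i])
    return total

mixed_total = total_area((KIND_CIRCLE, KIND_SQUARE), (1.0, 2.0))
```

The same shape should carry over to hash() and __eq__(): compare the tags first, then dispatch to the type-specific comparison.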

Keep in mind that all NRT-allocated objects have a meminfo that points to their underlying data and counts references to them (when the refcount hits 0 they are freed). As long as you keep around an upcast version of the object, the object’s meminfo, or the address of the meminfo, you can recast these back into the original object. So this should give you lots of storage options.
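
To make the refcounting point concrete, here is a toy model in plain Python. It is not numba’s NRT and none of these names exist in numba: ToyHeap and ToyMemInfo are invented for the illustration, and integer keys stand in for addresses. It only shows why a bare address is safe to recast while something still holds a counted reference, and why it dangles afterwards.

```python
class ToyMemInfo:
    """Toy stand-in for a meminfo: a payload plus a reference count."""
    def __init__(self, data):
        self.data = data
        self.refcount = 1  # the allocating owner holds the first reference

class ToyHeap:
    """Toy allocator; integer keys play the role of raw addresses."""
    def __init__(self):
        self._live = {}
        self._next_addr = 0x1000

    def alloc(self, data):
        addr = self._next_addr
        self._next_addr += 0x10
        self._live[addr] = ToyMemInfo(data)
        return addr

    def incref(self, addr):
        self._live[addr].refcount += 1

    def decref(self, addr):
        meminfo = self._live[addr]
        meminfo.refcount -= 1
        if meminfo.refcount == 0:
            del self._live[addr]  # "freed"

    def recast(self, addr):
        # Recover the payload from a bare address. The toy raises on a
        # dangling address; real compiled code would read freed memory.
        if addr not in self._live:
            raise RuntimeError("dangling pointer")
        return self._live[addr].data

heap = ToyHeap()
owner = heap.alloc("payload")
raw_ptr = owner            # copying the integer does not touch the refcount
heap.incref(raw_ptr)       # manual incref: raw_ptr now counts as a reference
heap.decref(owner)         # the original owner lets go; one reference remains
still_there = heap.recast(raw_ptr)
heap.decref(raw_ptr)       # last reference released, the object is freed
```

Forgetting the manual incref here would free the object at the first decref, and forgetting the final decref would leak it, which is the bookkeeping burden described above.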
