Any numba equivalent for casting a raw pointer to a StructRef, Dict, List etc?

Moving from here: https://github.com/numba/numba/issues/6493

I’m trying to make a knowledge base class with numba that can utilize a number of user defined types (structrefs or maybe namedtuples). This is of course challenging since numba doesn’t use dynamically typed data structures. I’m hoping to find some way to work around this though.

So my idea is to have my KnowledgeBase just hold untyped pointers to different type-specific storage objects, and then overload the IO of the KnowledgeBase so that those pointers get cast to the appropriate types given the input type. For example, I might want to declare/assert facts to the KnowledgeBase in an njitted context.

@njit
def right_hand_side_of_rule(kb,...):
   ...
   kb.declare('point1',Point(1,2))
   ...

So then to make this work I would have kb.raw_pointers = Dict.empty(unicode_type, i8) and do something like:

@overload(KnowledgeBaseType.declare)
def kb_declare(...):
   #some type stuff is resolved
   typ_str = ....
   storage_object_type = ...

   def impl(...,name,x):
      storage_object = _cast_ptr_to_obj(kb.raw_pointers[typ_str],storage_object_type)
      storage_object.declare(name, x)
   return impl
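
To make the intent concrete, here is a plain-Python analogue of the design (not Numba code, just a sketch). It looks up the type name at run time, whereas the overload above would resolve typ_str at compile time. TypedStorage and the storages dict are hypothetical names, with storages standing in for kb.raw_pointers.

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])


class TypedStorage:
    """Hypothetical per-type storage: holds facts of exactly one type."""

    def __init__(self, fact_type):
        self.fact_type = fact_type
        self.facts = {}

    def declare(self, name, x):
        if not isinstance(x, self.fact_type):
            raise TypeError(
                f"expected {self.fact_type.__name__}, got {type(x).__name__}"
            )
        self.facts[name] = x


class KnowledgeBase:
    """Keeps one storage object per fact type, keyed by the type's name."""

    def __init__(self):
        # Stands in for kb.raw_pointers: type name -> storage object.
        self.storages = {}

    def declare(self, name, x):
        # In the Numba version this lookup key would be fixed at compile time.
        typ_str = type(x).__name__
        if typ_str not in self.storages:
            self.storages[typ_str] = TypedStorage(type(x))
        self.storages[typ_str].declare(name, x)


kb = KnowledgeBase()
kb.declare("point1", Point(1, 2))
kb.declare("answer", 42)
print(sorted(kb.storages))  # ['Point', 'int']
```

The part that plain Python gets for free, and that the question is really about, is turning the stored handle back into a typed object inside jitted code.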

I noticed that a lot of numba types have a meminfo object. Is there any way to write a _cast_ptr_to_obj(ptr,obj_type) that just takes in the meminfo.data as an integer and pops out a structref (or Dict, List, etc.) that can be used in an njitted function?

Following @gmarkall's recommendation:

# Assuming a typed structref MyStruct (with type MyStructType) with fields A: i8 and B: unicode_type
from numba.extending import intrinsic
from numba.core import types, cgutils
from numba import njit, i8

@intrinsic
def _struct_from_meminfo(typingctx, struct_type, meminfo):
    inst_type = struct_type.instance_type

    def codegen(context, builder, signature, args):
        _, meminfo = args

        st = cgutils.create_struct_proxy(inst_type)(context, builder)
        st.meminfo = meminfo

        return st._getvalue()

    sig = inst_type(struct_type, types.MemInfoPointer(types.voidptr))
    return sig, codegen


from numba.typed import Dict

@njit
def foo(d):
    meminfo = d[0]
    struct = _struct_from_meminfo(MyStructType,meminfo)
    print(struct.A, struct.B)
    struct.A += 1

s = MyStruct(1,"IT EXISTS")

d = Dict.empty(i8,types.MemInfoPointer(types.voidptr))
d[0] = s._meminfo

print(s.A)
foo(d)
print(s.A, s.B)
foo(d)

This seems to work; I wanted to put it here in case others have the same issue. There is some segfaulty weirdness if ._meminfo is passed directly from Python and the struct is used a second time, but if you put the meminfos in a Dict first then it seems to work okay. I'll update if this bugs out on me down the line.

Glad you got something working @DannyWeitekamp. I think credit goes to @gmarkall for the recommendation in the original issue, thanks @gmarkall, good suggestion! :slight_smile:

Hi, I think the segfault issues are due to the fact that you need to nrt.incref the meminfo when you store it on the struct in st.meminfo = meminfo.
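
A toy reference counter can illustrate why a missing incref might show up as a crash on the second use. This is a sketch only, not Numba's NRT: ToyMemInfo, use_struct and run are hypothetical names. It assumes the struct wrapper gives up one reference when the jitted function exits.

```python
class ToyMemInfo:
    """Toy stand-in for a reference-counted allocation (not Numba's NRT)."""

    def __init__(self, payload):
        self.payload = payload
        self.refcount = 1  # the creator (e.g. the Python-side proxy) owns one
        self.freed = False

    def incref(self):
        if self.freed:
            raise RuntimeError("use after free")
        self.refcount += 1

    def decref(self):
        if self.freed:
            raise RuntimeError("double free")
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True  # a real runtime would release the memory here


def use_struct(meminfo, take_reference):
    """Mimic a jitted function that wraps `meminfo` in a struct, then exits."""
    if take_reference:
        meminfo.incref()  # the wrapper acquires its own reference
    _ = meminfo.payload  # ... work with the struct ...
    meminfo.decref()  # assumed: the wrapper releases one reference on exit


def run(take_reference, calls):
    owner = ToyMemInfo({"A": 1})
    for _ in range(calls):
        use_struct(owner, take_reference)
    return owner.refcount, owner.freed


print(run(take_reference=True, calls=2))   # (1, False): balanced
print(run(take_reference=False, calls=1))  # (0, True): freed under the owner
```

Without the extra reference, the first call already drops the count to zero while the original owner still expects the data to exist, so a second call in this toy model raises RuntimeError("double free").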

Another approach might be to memcopy the value, as the implementations for typed.(Dict|List) do. This would work for non-memory-managed types as well. I could share a tagged union implementation based on this approach.

Hey @asodeur. Thanks for the tip. That worked perfectly!

st = cgutils.create_struct_proxy(inst_type)(context, builder)
st.meminfo = meminfo
context.nrt.incref(builder, types.MemInfoPointer(types.voidptr), meminfo)

What’s the lifecycle of that incref? Is there any danger of causing a memory leak like this?

I would be very grateful if you could share your tagged union implementation. I have a few use cases that could benefit from something like that.