I am testing a possible migration away from jitclass to structref so that I can take advantage of caching and AOT compilation. I’m wondering whether this makes sense from a Numba roadmap perspective. Is there a plan for jitclass to support caching? Is structref here to stay?
Also, in my testing of structref I’ve run into an issue that I’m not sure how to solve. I’ve managed to expose methods that are accessible in jit’d functions via @overload_method, but I can’t figure out how to expose them to the Python side.
What you have looks pretty reasonable to me. @DannyWeitekamp’s CRE contains an absolute trove of great examples, including a structref generator that I used as a starting point for my own generator.
Yes, looking around CRE is definitely a good way to find patterns for using structref; many others have found it helpful. The devs can speak to their long-term plans for structref / jitclass, which I’m eager to hear, but the current state of things leaves something to be desired. Structref gives you pretty much free rein to customize functionality on both the Python and jit sides, but it requires contending with an annoying amount of poorly documented boilerplate, something that @nelson2005 and I have worked around in our own projects with specialized structref definition machinery. Jitclass has a friendlier syntax, but it isn’t AOT- or cache-compatible, which is a complete deal-breaker in larger projects. I’d love to see a solution that captures the best of both, or at least one that is easy to pick up, isn’t limited in terms of customization, and is AOT/cache friendly.
Here are a couple of conceptual notes to keep in mind going forward. I think the docs really ought to cover them, because I’ve found the way the docs encourage you to use structref a bit limiting.
There are two pieces you need to define: 1) the subclass of types.StructRef (MyClassType in your case), which is like a TypeClass or TypeTemplate: basically a metaclass for instantiating grounded types with fixed fields. You can often use it directly when writing @overload functions. 2) The subclass of structref.StructRefProxy, which is the class for Python-side “Proxy” objects.
When you call define_proxy you are 1) internally calling define_constructor(), which lets your Proxy work as a constructor, and 2) calling define_boxing(), which defines how your Proxy can be converted to a numba structref and vice versa. In many places in CRE I call one or both of these directly, because I don’t need both or want to customize one or the other.
The docs don’t actually cover how to specialize a TypeClass into a grounded type, but doing so can save you a lot of headaches. It lets you control the data layout of the type instead of having numba infer it from a constructor call, which can leave you scratching your head when it mysteriously generates multiple struct types when you thought you had just one. Basically, just pass the fields to it in the same format you would use to define a jitclass: MyType = MyTypeClass([('name', unicode_type), ('x', f8), ('y', f8)])
Then you have a bit more control over the precise types you’re using, as in jitclass, and you can even write constructors that are a bit more verbose. For instance, here is a jitted constructor that I wrote in CRE:
```python
@njit(GenericFactIteratorType(MemSetType, i8[::1]), cache=True)
def generic_fact_iterator_ctor(ms, t_ids):
    st = new(GenericFactIteratorType)
    st.memset = ms
    st.t_ids = t_ids
    st.curr_ind = 0
    st.curr_t_id_ind = 0
    return st
```
Note the use of new, which is from numba.experimental.structref. It makes an empty structref instance. This pattern lets us be explicit about the signature (helpful for AOT), since the output type is well defined, and we can cache the jitted constructor. A word of warning: if we didn’t set all of the members in this constructor, then we could in principle dereference an empty (null) object field, which would crash the process with a segfault. This is why the docs don’t encourage this usage pattern.
One last note. I don’t know if @nelson2005 has found a better way, but in practice I’ve found that the only reliable way to cache your structref definitions is to always have them written to a file, so that the source is always well defined. So if you were planning on making some kind of custom structref generator and you want that code to cache or AOT-compile properly, you’ll also need some kind of file cache machinery so that your generated code always lives somewhere concrete. Feel free to snag what I have in CRE for that. Sorry all of this isn’t simpler! I’d really love to see more attention given to this in the future. The good news is that all of the pieces you’ll need exist if you’re willing to put them together. (I think this covers 2, 3 and 4 on your list, but maybe not 1.)
I haven’t found a better way but it hasn’t really been an issue for my use case. My generated structref definitions aren’t particularly dynamic so I can generate them once and then keep them in the git repo like any other file.
OK, I’m trying out the method you suggested for concrete types, but for some reason I can’t use the constructor on the Python side. What am I doing wrong?
```python
import numpy as np
from numba import njit, f8
from numba.core import types
from numba.core.extending import overload
from numba.experimental import structref

@structref.register
class MyClassType(types.StructRef):
    pass

MyType = MyClassType([("x", f8[:])])

@njit(MyType(f8[:]))
def my_class_ctor(x):
    st = structref.new(MyType)
    st.x = x
    return st

class MyClass(structref.StructRefProxy):
    def __new__(cls, x):
        self = my_class_ctor(x)
        return self

    @property
    def x(self):
        return _x(self)

@njit(cache=True)
def _x(self):
    return self.x

@overload(MyClass)
def overload_MyClass(x):
    def impl(x):
        return my_class_ctor(x)
    return impl

structref.define_boxing(MyClassType, MyClass)

# This works...
@njit
def test():
    my_instance = MyClass(np.array([1.0, 2.0, 3.0]))
    print(my_instance.x)  # prints [1. 2. 3.]

test()

# But this gets:
# TypeError: cannot convert native
# numba.MyClassType(('x', array(float64, 1d, A)),) to Python object
my_instance = MyClass(np.array([1.0, 2.0, 3.0]))
```
If you are going to give your _ctor() function an explicit signature, then you need to move it to after the call to define_boxing (it will still work fine in __new__() if you do that). The reason is that the boxing/unboxing machinery is compiled directly into the function, so it needs to be set up before the _ctor() function is compiled. When you provide a signature, the function is compiled with that signature immediately at definition, instead of just-in-time at its first call.
I’m still working on it, but here is what I have so far that works.
```python
import numpy as np
from numba.pycc import CC
from numba import njit, f8
from numba.experimental import structref
from numba.core.extending import overload_method, overload

# This is boilerplate that can be imported from elsewhere and re-used
def create_type_template(cls):
    source = f"""
from numba.core import types
from numba.experimental import structref
@structref.register
class {cls.__name__}Type(types.StructRef):
    pass
"""
    glbs = globals()
    exec(source, glbs)
    return glbs[f"{cls.__name__}Type"]

# Create a Python class that will be a proxy for the Numba class - the actual
# implementation is not defined here.
class MyClass(structref.StructRefProxy):
    def __new__(cls, x):
        self = my_class_constructor(x)
        return self

    @property
    def x(self):
        return _x(self)

    @property
    def y(self):
        return _y(self)

    def acc(self, a=1):
        return _acc(self, a)

# Create a type for the class and define Numba-to-Python interfacing (boxing)
MyClassTemplate = create_type_template(MyClass)
MyType = MyClassTemplate([("x", f8[:]), ("y", f8[:])])
structref.define_boxing(MyClassTemplate, MyClass)

# Define the typed constructor implementation
@njit(MyType(f8[:]), cache=True)
def my_class_constructor(x):
    self = structref.new(MyType)
    self.x = x
    self.y = 2 * x
    return self

# Overload the Python constructor with our Numba implementation
@overload(MyClass)
def overload_MyClass(x):
    def implementation(x):
        return my_class_constructor(x)
    return implementation

# Implementations of getters/setters/methods
@njit(cache=True)
def _x(self):
    return self.x

@njit(cache=True)
def _y(self):
    return self.y

@njit(cache=True)
def _acc(self, a=1):
    return self.x.sum() + a + self.y

# Extra step required for methods to expose in jit-code
# (overload_method only needs the plain function; no @njit on top)
@overload_method(MyClassTemplate, "acc")
def overload_acc(self, a=1):
    python_implementation = _acc.py_func
    return python_implementation

# Test:
# running this script prints
# [12. 14. 16.]
# [2. 4. 6.]
# [1. 2. 3.]
# [11. 13. 15.]
# [12. 14. 16.]
# [2. 4. 6.]
# [1. 2. 3.]
# [11. 13. 15.]
# AOT compiling
# [12. 14. 16.]
if __name__ == "__main__":
    # Create a function that operates on an instance of our new class and
    # ahead-of-time (AOT) and JIT compile it
    cc = CC("my_module")

    @njit(f8[:](MyType), cache=True)
    @cc.export("add", f8[:](MyType))
    def add(instance):
        return instance.acc(4)

    # Create an instance of MyClass and test it in Python
    my_instance = MyClass(np.array([1.0, 2.0, 3.0]))
    print(add(my_instance))
    print(my_instance.y)
    print(my_instance.x)
    print(my_instance.acc(3))

    # Test in jit-code
    @njit(cache=True)
    def test_jit_code():
        my_instance = MyClass(np.array([1.0, 2.0, 3.0]))
        print(add(my_instance))
        print(my_instance.y)
        print(my_instance.x)
        print(my_instance.acc(3))

    test_jit_code()

    # Test it in AOT code
    print("AOT compiling")
    cc.compile()

    def test_aot_code():
        import my_module
        print(my_module.add(my_instance))

    test_aot_code()
```
```python
import numpy as np
import numba as nb
from numba.experimental import structref
from numba.core.extending import overload_method

def build_type(fields):
    @structref.register
    class PointStruct(nb.types.StructRef):
        def preprocess_fields(self, _):
            return tuple((name, nb.types.unliteral(nb.from_dtype(typ))) for name, typ in fields)

    class Point(structref.StructRefProxy):
        def __new__(cls, x, y):
            return structref.StructRefProxy.__new__(cls, x, y)

    for key, tp in fields:
        setattr(Point, key, property(nb.njit(lambda self, i=key: getattr(self, i))))

    structref.define_proxy(Point, PointStruct, [i[0] for i in fields])
    return Point

Point = build_type([('x', np.float32), ('y', np.float32)])
p = Point(1, 2)

@nb.njit(cache=True)
def print_point(p):
    print(p.x, p.y)
```
To achieve some customization, I want to build a structref inside a function, inject the relevant attributes and functions, and then return it. However, I found that once the structref is returned from a function, njit’s cache stops working: a new cache file is generated every time. Does anyone know how the caching mechanism works?
@yxdragon I’ve long since forgotten how I solved this problem, but I’m pretty sure I solved the issue that you’re having at some point. There is a PreciseCache and cache_safe_exec in this code that I had previously linked above. Those might point you in the right direction.
Perhaps those solutions didn’t get me very far, though, because what I definitely ended up doing in the end was to make code generators using string templates. Whenever a new struct was needed, the generated code was written to a temp directory and then immediately imported; that way every structref implementation lives in a permanent place, which avoids lots of bugs that occur when you try to generate them dynamically. The solutions I had for that, which are in the repo I linked above, did prove to be relatively portable between OSs. Nonetheless, that trick was one of several nonstandard things that numba forced me to do to get it to work like a normal compiled language. I’ve since moved on from numba to writing C++ extensions for Python, which is much less finicky (and compiles considerably faster). Good luck!
I’ve mined lots of good code from @DannyWeitekamp’s CRE. There’s also a discussion of caching here that has some comments about figuring out what’s going on in the cache loader.