Serializing @jitclass for Arrow Plasma Store (or other in-memory cache)

I’m using Numba and jitclasses extensively in a project (thanks for all the great features!).

I’m looking to store instances of jitclasses in an in-memory store like the Plasma Store. Is there a way to do this?

If I try to do this naively like I would with other types, this is what happens:

import pyarrow.plasma as plasma
import numba

class_spec = [('int_variable', numba.int32)]

@numba.experimental.jitclass(class_spec)
class TestClass:
    """ Minimal jitclass """
    def __init__(self):
        self.int_variable = 0
        
t = TestClass()

client = plasma.connect("/tmp/plasma")
client.put(t)

I get this error:

---------------------------------------------------------------------------
SerializationCallbackError                Traceback (most recent call last)
<ipython-input-18-aa436983a803> in <module>
----> 1 client.put(t)

~/.venv/cenv/lib/python3.6/site-packages/pyarrow/_plasma.pyx in pyarrow._plasma.PlasmaClient.put()

~/.venv/cenv/lib/python3.6/site-packages/pyarrow/serialization.pxi in pyarrow.lib.serialize()

~/.venv/cenv/lib/python3.6/site-packages/pyarrow/serialization.pxi in pyarrow.lib.SerializationContext._serialize_callback()

SerializationCallbackError: pyarrow does not know how to serialize objects of type <class 'numba.experimental.jitclass.boxing.TestClass'>.

All my jitclasses are fairly straightforward, with all instance variables being NumPy scalars or arrays.

I haven’t been able to serialize jitclass instances, and I don’t know if it’s possible. If your class is simple, you could use a namedtuple instead; I’ve managed to use them as objects that can be both jitted and serialized.

I do not know the Plasma Store, but if I am not mistaken you can define custom pickling behaviour for your own classes, which you could possibly use to manually work your way around Numba (and maybe Arrow Plasma offers similar possibilities).

This is not necessarily fun or very efficient, but it should be possible to use some kind of Python dummy class (maybe even the one you decorated with jitclass) to pickle and unpickle, and then you need some kind of “translation layer” between Python class and jitclass (which could probably be included transparently in the custom serialisation logic).

I am not certain how to deal with the fact that jitclasses are re-compiled in every interpreter session, though (or whether that would even be a problem).

To the best of my knowledge, jitclasses cannot be pickled: https://github.com/numba/numba/issues/1846

Thanks for the suggestions. It seems like I’d be better off moving the state of my jitclass into something like a namedtuple or structured array and having that be serialised instead.
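For the structured array variant, a small sketch using only NumPy. The field names and the scale field are assumptions for illustration. The record is turned into raw bytes and read back, and the dtype's itemsize gives the number of bytes one record occupies.

```python
import numpy as np

# Hypothetical record layout mirroring the state of a small jitclass.
state_dtype = np.dtype([("int_variable", np.int32), ("scale", np.float64)])

state = np.zeros(1, dtype=state_dtype)
state["int_variable"] = 5
state["scale"] = 0.5

# Raw bytes that can be handed to any byte-oriented store.
raw = state.tobytes()

# Rebuild a (read-only) view of the record from the bytes.
restored = np.frombuffer(raw, dtype=state_dtype)

# Bytes needed per record for this packed dtype.
nbytes = state_dtype.itemsize
```

Note that np.frombuffer returns a read-only array here, so call .copy() on it if the restored state needs to be modified.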

@harald you can try the newly introduced StructRef instead of jitclass: https://numba.pydata.org/numba-doc/dev/extending/high-level.html

Interesting, thanks @rpopovici! I wasn’t aware of StructRef, I’ll have a look at that.

Is it possible to at least get the sizeof of the jitclass data? Like the number of bytes you’d have to malloc in order to hold the data for a jitclass instance?