We have a large kernel that takes complex data structures. (We are using cooperative groups, so we don’t need separate kernels for synchronization.) Currently the data structures are passed as nested NamedTuples. However, we have hit a "Formal Parameter Space Overflowed" error.
Researching this, it would seem that NamedTuples are passed by value. Even if we keep within the bounds (512 elements total), copying 512 elements per call is hardly ideal. We would also like to avoid adding complexity by obscuring our data structures.
Is there any way to avoid this? Especially as these are immutable, it seems very wasteful to copy the structures when a simple pointer would suffice.
We have tried to use a custom object built with structref. However, even for trivial examples, we get an error when we invoke a kernel with our structref object as an argument, e.g.:
NRT required but not enabled
During: lowering "$18load_attr.3 = getattr(value=instance, attr=x)"
I suspect that structref is insisting on refcounting our object, which is unavailable in CUDA. Can we simply get a pointer and avoid refcounting? Could we … create a C structure through ctypes and pass it to the kernel, perhaps?
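To make the "C structure" idea concrete, here is a minimal sketch of one variant we are considering. It uses a NumPy structured array rather than ctypes, which is a swapped-in technique. The names `Grid`, `Params`, `params_dtype`, `pack`, `launch` and `scale` are hypothetical stand-ins for our real structures, and the nesting is flattened into a single record. The assumption is that a one-element record array lives in device memory, so the kernel receives only an array argument instead of every field by value.

```python
from typing import NamedTuple

import numpy as np


# Hypothetical nested parameter structures standing in for the real ones.
class Grid(NamedTuple):
    nx: int
    ny: int
    dx: float


class Params(NamedTuple):
    grid: Grid
    alpha: float


# Flat, C-aligned record layout mirroring the nested tuples field by field.
params_dtype = np.dtype(
    [("nx", np.int32), ("ny", np.int32), ("dx", np.float64), ("alpha", np.float64)],
    align=True,
)


def pack(p: Params) -> np.ndarray:
    """Copy the nested tuple into a one-element structured array."""
    rec = np.zeros(1, dtype=params_dtype)
    rec[0] = (p.grid.nx, p.grid.ny, p.grid.dx, p.alpha)
    return rec


def launch(rec: np.ndarray, n: int) -> np.ndarray:
    """Run a toy kernel that reads its parameters from the record array."""
    # Imported lazily so the packing code above also works without a GPU.
    from numba import cuda

    @cuda.jit
    def scale(params, out):
        i = cuda.grid(1)
        if i < out.size:
            p = params[0]  # record read from device memory
            out[i] = p.alpha * p.dx * i

    d_params = cuda.to_device(rec)  # copied once, reusable across launches
    d_out = cuda.device_array(n, dtype=np.float64)
    threads = 128
    blocks = (n + threads - 1) // threads
    scale[blocks, threads](d_params, d_out)
    return d_out.copy_to_host()
```

Usage would be along the lines of `launch(pack(Params(Grid(4, 8, 0.5), 2.0)), 1024)`. The obvious cost is that the record layout has to be kept in sync with the NamedTuples by hand, which is exactly the kind of extra complexity we were hoping to avoid.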
Grateful for suggestions!