Hi!
I’ve written a ray tracer using Numba, with StructRefs holding a number of variables and methods (e.g. the intersection routines for different shapes, and material properties). It’s an embarrassingly parallel code (and works fine with OpenMP and MPI), but I want to try running it on a GPU. I’ve done some testing and confirmed that I can run kernels and device functions on my GPU, but now I’m having difficulties using StructRefs on CUDA. Maybe the answer is that you can’t, but I wanted to ask whether anyone has managed to use them with CUDA before.
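For context, simple kernels and device functions along these lines run without any trouble on my GPU (a minimal sketch of the kind of sanity check I mean, not my actual test code):

from numba import cuda
import numpy as np

@cuda.jit(device=True)
def double(x):
    # Trivial device function, just to confirm device-side calls work
    return 2 * x

@cuda.jit
def double_all(arr):
    i = cuda.grid(1)
    if i < arr.size:
        arr[i] = double(arr[i])

arr = np.arange(8, dtype=np.float32)
double_all[1, 32](arr)  # one block of 32 threads is plenty for 8 elements
print(arr)  # [ 0.  2.  4.  6.  8. 10. 12. 14.]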
Here is the code I’m using to test things:
from numba import cuda
from numba import njit
from numba.experimental import structref
from numba.core.extending import overload_method
from numba.core import types
import numpy as np
use_cuda = True
if use_cuda:
    JIT = cuda.jit
    DEV_JIT = cuda.jit(device=True)
else:
    JIT = njit
    DEV_JIT = njit
# Create a basic object here
@structref.register
class PhotonType(types.StructRef):
    def preprocess_fields(self, fields):
        return tuple((name, types.unliteral(typ)) for name, typ in fields)
# Now the big class definition
class Photon(structref.StructRefProxy):
    def __new__(cls, ray):
        return structref.StructRefProxy.__new__(cls, ray)

    @property
    def ray(self):
        return _ray(self)

    @ray.setter
    def ray(self, ray):
        return set_ray(self, ray)


@JIT
def _ray(self):
    return self.ray


@JIT
def set_ray(self, ray):
    for ii in range(3):
        self.ray[ii] = ray[ii]
structref.define_proxy(
    Photon, PhotonType,
    ["ray"]
)


@DEV_JIT
def create_Photon(ray):
    return Photon(ray)
dtype = np.float32
Nphoton = 10_000
rays = np.arange(0, 3*Nphoton, dtype=dtype)
if not use_cuda:
    @JIT
    def test_thing(Nphoton, rays):
        for photon_idx in range(Nphoton):
            this_ray = rays[3*photon_idx:3*photon_idx+3]
            photon = create_Photon(this_ray)

    to_run = test_thing
else:
    @JIT
    def test_thing(Nphoton, rays):
        photon_idx = cuda.grid(1)
        if photon_idx < Nphoton:
            this_ray = rays[3*photon_idx:3*photon_idx+3]
            photon = create_Photon(this_ray)

    threadsperblock = 32
    blockspergrid = (Nphoton + (threadsperblock - 1)) // threadsperblock
    to_run = test_thing[blockspergrid, threadsperblock]

to_run(Nphoton, rays)
print("I ran!")
I’ve experimented with the setter to see whether it was causing issues when creating the object. If you set use_cuda = False, the code should run fine. Swapping DEV_JIT for JIT on create_Photon gives the same error message (as far as I can tell).
This is the error message I get with use_cuda=True:
Traceback (most recent call last):
File "/home/tg/raytrace/.dev_trials/gpu/numba_gpu.py", line 92, in <module>
to_run(Nphoton, rays)
File "/home/tg/raytrace/.venv/lib/python3.12/site-packages/numba_cuda/numba/cuda/dispatcher.py", line 700, in __call__
return self.dispatcher.call(
^^^^^^^^^^^^^^^^^^^^^
File "/home/tg/raytrace/.venv/lib/python3.12/site-packages/numba_cuda/numba/cuda/dispatcher.py", line 1022, in call
kernel = _dispatcher.Dispatcher._cuda_call(self, *args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tg/raytrace/.venv/lib/python3.12/site-packages/numba_cuda/numba/cuda/dispatcher.py", line 1030, in _compile_for_args
return self.compile(tuple(argtypes))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tg/raytrace/.venv/lib/python3.12/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/tg/raytrace/.venv/lib/python3.12/site-packages/numba_cuda/numba/cuda/dispatcher.py", line 1296, in compile
kernel = _Kernel(self.py_func, argtypes, **self.targetoptions)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tg/raytrace/.venv/lib/python3.12/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/tg/raytrace/.venv/lib/python3.12/site-packages/numba_cuda/numba/cuda/dispatcher.py", line 145, in __init__
cres = compile_cuda(
^^^^^^^^^^^^^
File "/home/tg/raytrace/.venv/lib/python3.12/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/tg/raytrace/.venv/lib/python3.12/site-packages/numba_cuda/numba/cuda/compiler.py", line 749, in compile_cuda
cres = compile_extra(
^^^^^^^^^^^^^^
File "/home/tg/raytrace/.venv/lib/python3.12/site-packages/numba_cuda/numba/cuda/compiler.py", line 590, in compile_extra
return pipeline.compile_extra(func)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tg/raytrace/.venv/lib/python3.12/site-packages/numba_cuda/numba/cuda/core/compiler.py", line 125, in compile_extra
return self._compile_bytecode()
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tg/raytrace/.venv/lib/python3.12/site-packages/numba_cuda/numba/cuda/core/compiler.py", line 193, in _compile_bytecode
return self._compile_core()
^^^^^^^^^^^^^^^^^^^^
File "/home/tg/raytrace/.venv/lib/python3.12/site-packages/numba_cuda/numba/cuda/core/compiler.py", line 172, in _compile_core
raise e
File "/home/tg/raytrace/.venv/lib/python3.12/site-packages/numba_cuda/numba/cuda/core/compiler.py", line 161, in _compile_core
pm.run(self.state)
File "/home/tg/raytrace/.venv/lib/python3.12/site-packages/numba_cuda/numba/cuda/core/compiler_machinery.py", line 399, in run
raise patched_exception
File "/home/tg/raytrace/.venv/lib/python3.12/site-packages/numba_cuda/numba/cuda/core/compiler_machinery.py", line 386, in run
self._runPass(idx, pass_inst, state)
File "/home/tg/raytrace/.venv/lib/python3.12/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/tg/raytrace/.venv/lib/python3.12/site-packages/numba_cuda/numba/cuda/core/compiler_machinery.py", line 338, in _runPass
mutated |= check(pss.run_pass, internal_state)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tg/raytrace/.venv/lib/python3.12/site-packages/numba_cuda/numba/cuda/core/compiler_machinery.py", line 292, in check
mangled = func(compiler_state)
^^^^^^^^^^^^^^^^^^^^
File "/home/tg/raytrace/.venv/lib/python3.12/site-packages/numba_cuda/numba/cuda/core/typed_passes.py", line 145, in run_pass
typemap, return_type, calltypes, errs = type_inference_stage(
^^^^^^^^^^^^^^^^^^^^^
File "/home/tg/raytrace/.venv/lib/python3.12/site-packages/numba_cuda/numba/cuda/core/typed_passes.py", line 124, in type_inference_stage
errs = infer.propagate(raise_errors=raise_errors)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tg/raytrace/.venv/lib/python3.12/site-packages/numba_cuda/numba/cuda/core/typeinfer.py", line 1154, in propagate
raise errors[0]
numba.core.errors.TypingError: Failed in cuda mode pipeline (step: nopython frontend)
Failed in cuda mode pipeline (step: nopython frontend)
No implementation of function Function(<class '__main__.Photon'>) found for signature:
>>> Photon(array(float32, 1d, C))
There are 2 candidate implementations:
- Of which 2 did not match due to:
Overload in function 'ctor': File: ../../../../.dev_trials/gpu/<unknown> (built from string?): Line 0.
With argument(s): '(array(float32, 1d, C))':
Rejected as the implementation raised a specific error:
NumbaRuntimeError: Failed in cuda mode pipeline (step: cuda native lowering)
NRT required but not enabled
During: lowering "st = call $6load_global.0($16load_deref.2, func=$6load_global.0, args=[Var($16load_deref.2, <string>:5)], kws=(), vararg=None, varkwarg=None, target=None)" at <string> (5)
During: Pass cuda_native_lowering
raised from /home/tg/raytrace/.venv/lib/python3.12/site-packages/numba/core/runtime/context.py:42
During: resolving callee type: Function(<class '__main__.Photon'>)
During: typing of call at /home/tg/raytrace/.dev_trials/gpu/numba_gpu.py (57)
File "numba_gpu.py", line 57:
def create_Photon(ray):
return Photon(ray)
^
During: Pass nopython_type_inference
During: resolving callee type: type(CUDADispatcher(<function create_Photon at 0x7096601fc7c0>))
During: typing of call at /home/tg/raytrace/.dev_trials/gpu/numba_gpu.py (81)
File "numba_gpu.py", line 81:
def test_thing(Nphoton, rays):
<source elided>
this_ray = rays[3*photon_idx:3*photon_idx+3]
photon = create_Photon(this_ray)
^
During: Pass nopython_type_inference
I understand that “NRT required but not enabled” has to do with dynamic memory allocation, which (as far as I know) isn’t available on the CUDA target by default, so this might simply mean it isn’t possible. Nevertheless, I’m hoping someone has experience doing this and can offer some advice.
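For what it’s worth, the only allocation-free fallback I can see is to drop StructRef on the device and pass a plain value instead, e.g. a namedtuple, which I believe Numba treats as an immutable stack value and so shouldn’t need NRT. A minimal sketch (the PhotonNT / create_photon_nt / test_thing_nt names are just for illustration, not from my real code):

from collections import namedtuple
from numba import cuda
import numpy as np

# Hypothetical stand-in for the Photon StructRef: a plain namedtuple.
PhotonNT = namedtuple("PhotonNT", ["ray"])

@cuda.jit(device=True)
def create_photon_nt(ray):
    # ray stays a view into the global rays array; no heap allocation
    return PhotonNT(ray)

@cuda.jit
def test_thing_nt(Nphoton, rays):
    photon_idx = cuda.grid(1)
    if photon_idx < Nphoton:
        this_ray = rays[3*photon_idx:3*photon_idx+3]
        photon = create_photon_nt(this_ray)
        # photon.ray[ii] is usable here, but the set of fields is
        # fixed at construction time

Nphoton = 10_000
rays = np.arange(0, 3*Nphoton, dtype=np.float32)
threadsperblock = 32
blockspergrid = (Nphoton + threadsperblock - 1) // threadsperblock
test_thing_nt[blockspergrid, threadsperblock](Nphoton, rays)

That loses the nice methods-on-a-struct organisation I get from StructRefs, though, which is why I’m asking.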
Thank you very much in advance!