@ed-o-saurus mentioned this already, but I put together a small example using pynvjitlink to embed the CUDA C++ code in the Python source:
from numba import cuda, int32
from pynvjitlink import patch

# Patch Numba's linker so CUDA C++ source can be linked at JIT time
patch.patch_numba_linker()

# CUDA C++ device function embedded directly in the Python source.
# Numba's device-function ABI returns an error code and writes the
# actual result through the first pointer argument.
cu_functions = cuda.CUSource('''
extern "C" __device__ int foo(int* return_value){
    *return_value = 42;
    return 0;
}
''')

# Declare the device function to Numba so it can be called from kernels
foo = cuda.declare_device('foo', int32())

@cuda.jit(link=[cu_functions])
def kernel():
    print(foo())

kernel[1, 1]()
cuda.synchronize()
I think extending cppyy's numba_ext to support the CUDA target will also require some source-code processing to handle the differences between the usual C++ ABI and the one Numba uses internally, along with qualifying functions with the __device__ keyword. There may be other things I haven't thought of yet.
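To make the kind of processing I mean a bit more concrete, here's a minimal sketch of rewriting an ordinary C++ call into the form Numba's device-function ABI expects: an extern "C" __device__ function that returns an error code and writes the real result through its first pointer argument. The helper name wrap_for_numba_abi and the wrapped function are made up for this example; it's not something cppyy or Numba provide today.

def wrap_for_numba_abi(name, ret_type, call_expr):
    # Hypothetical helper: generate a device wrapper that matches Numba's
    # device-function calling convention (error code returned, real result
    # written through the first pointer argument).
    return f'''
extern "C" __device__ int {name}({ret_type}* return_value) {{
    *return_value = {call_expr};
    return 0;  // 0 tells Numba the call succeeded
}}
'''

# 'bar()' is assumed to be defined elsewhere in the linked source.
wrapped_src = cuda.CUSource(wrap_for_numba_abi('bar_numba', 'int', 'bar()'))
bar = cuda.declare_device('bar_numba', int32())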
To answer your original question about how to determine the current target, I think you can do:
from numba.core.target_extension import current_target
target = current_target()
and target will then be either "cuda" or "cpu", depending on the current target.
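As a rough sketch of how an extension might use that (the helper functions here are hypothetical, just to show the branching):

from numba.core.target_extension import current_target

def pick_implementation():
    # Hypothetical dispatch on the target Numba is currently compiling for.
    if current_target() == "cuda":
        return build_device_wrapper()  # e.g. the CUSource route above (made-up helper)
    return build_cpu_wrapper()         # e.g. cppyy's existing CPU path (made-up helper)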