Extending Numba for CUDA inside Cppyy

@ed-o-saurus mentioned this already, but I put together a small example using pynvjitlink to embed the CUDA C++ code in the Python source:

from numba import cuda, int32
from pynvjitlink import patch

# Patch Numba's linker so external CUDA source can be linked into kernels
patch.patch_numba_linker()

# CUDA C++ source embedded directly in the Python file. Numba's device-function
# ABI returns a status code and writes the actual result through the pointer.
cu_functions = cuda.CUSource('''
extern "C" __device__ int foo(int* return_value){
  *return_value = 42;
  return 0;
}
''')

# Declare foo to Numba: no arguments, int32 return type
foo = cuda.declare_device('foo', int32())


@cuda.jit(link=[cu_functions])
def kernel():
    print(foo())


kernel[1, 1]()
cuda.synchronize()

I think extending cppyy’s numba_ext to support the CUDA target will also require some source-code processing: the usual C++ ABI differs from the one Numba uses internally, and functions need to be qualified with the __device__ keyword. There may be other things I haven’t thought of yet.
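To make the ABI point concrete, here is a minimal sketch of the kind of source rewriting that might be involved. The helper name `wrap_for_numba_abi` is hypothetical (it is not part of cppyy or Numba); it just generates an `extern "C" __device__` wrapper that follows the convention visible in the example above: an int status return, with the real result written through a pointer argument.

```python
def wrap_for_numba_abi(name, ret_type, arg_sig):
    """Generate an extern "C" __device__ wrapper with Numba's device-function
    calling convention around a plain C++ device function `name`.

    arg_sig is a list of (type, name) pairs for the wrapped function's
    parameters. This is an illustrative sketch, not cppyy's implementation.
    """
    args = ", ".join(f"{t} {n}" for t, n in arg_sig)
    call_args = ", ".join(n for _, n in arg_sig)
    params = f"{ret_type}* numba_retval" + (f", {args}" if args else "")
    return (
        f'extern "C" __device__ int numba_{name}({params}) {{\n'
        f"  *numba_retval = {name}({call_args});\n"
        f"  return 0;  // 0 signals success to Numba\n"
        f"}}\n"
    )
```

For example, `wrap_for_numba_abi("add", "int", [("int", "a"), ("int", "b")])` produces a wrapper `numba_add(int* numba_retval, int a, int b)` that forwards to `add(a, b)`. A real implementation would also have to map C++ types to Numba types and handle name mangling.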

To answer your original question about how to determine the current target, I think you can do:

from numba.core.target_extension import current_target
target = current_target()

and target will then be either "cuda" or "cpu" depending on the current target.
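Inside an extension you could then branch on that string, e.g. to add the __device__ qualifier only when compiling for CUDA. The helper below is a hypothetical sketch (not cppyy or Numba API); the `target` argument is the string obtained from `current_target()`:

```python
def qualify_for_target(target, cpp_decl):
    """Prefix a C++ declaration with __device__ when targeting CUDA.

    `target` is the string returned by
    numba.core.target_extension.current_target() ("cpu" or "cuda").
    """
    if target == "cuda":
        return "__device__ " + cpp_decl
    if target == "cpu":
        return cpp_decl
    raise ValueError(f"unknown target: {target!r}")
```

So `qualify_for_target("cuda", "int foo()")` gives `__device__ int foo()`, while the "cpu" target leaves the declaration unchanged.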
