The CUDA Array Interface facilitates passing data between different CUDA-aware libraries, and is supported by CuPy, Numba, and others. I have an example of implementing the interface for Runtime API-allocated pointers, which I'll post later - I'm on a personal machine at the moment and don't have it handy.
Got the example now - it is:
from numba import cuda
# Get access to the CUDA Runtime API via ctypes
from ctypes import CDLL, POINTER, byref, c_void_p, c_size_t
cudart = CDLL('libcudart.so')
cudaMalloc = cudart.cudaMalloc
cudaMalloc.argtypes = [POINTER(c_void_p), c_size_t]
cudaFree = cudart.cudaFree
cudaFree.argtypes = [c_void_p]
# Create a class to provide the CUDA Array Interface.
#
# This is a simple example for read-write contiguous data.
FLOAT32_SIZE = 4
class MyFloatArray:
    """An array of floats that allocates memory on the device when constructed,
    and frees it when it is deleted.
    """

    def __init__(self, size):
        self._size = size
        self._typestr = 'f4'
        self._ptr = c_void_p()

        alloc_size = size * FLOAT32_SIZE
        ret = cudaMalloc(byref(self._ptr), alloc_size)
        if ret:
            raise RuntimeError(f'Unexpected return code {ret} from cudaMalloc')

    def __del__(self):
        cudaFree(self._ptr)

    @property
    def __cuda_array_interface__(self):
        return {
            'shape': (self._size,),
            'typestr': self._typestr,
            'data': (self._ptr.value, False),
            'version': 2,
        }
# Create an instance of the array object that allocates some memory
nelems = 32
arr = MyFloatArray(nelems)
# For sanity checks / inspection
arr_cai = arr.__cuda_array_interface__
cai_ptr = arr_cai['data'][0]
print(f"Pointer from CAI is 0x{cai_ptr:x}")
print(f"Shape from CAI is {arr_cai['shape']}")
print(f"Type string from CAI is '{arr_cai['typestr']}'")
print(f"Version from CAI is {arr_cai['version']}")
# Launch a kernel on the array holding the Runtime API-allocated memory
@cuda.jit
def initialize(x):
    i = cuda.grid(1)
    if i < len(x):
        x[i] = 3.14
initialize[1, 32](arr)
# We can also copy the data back to the host and print it, to check that it
# holds the expected values. For simplicity in this example, we'll use the
# as_cuda_array() function, which takes any object implementing the CUDA Array
# Interface and creates a Numba device array viewing its data, so that we can
# copy the data and print it.
print(cuda.as_cuda_array(arr).copy_to_host())
# Prints:
# [3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14
# 3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14
# 3.14 3.14 3.14 3.14]
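Since CuPy also supports the interface, the same object can be consumed there too. As a quick sketch (assuming CuPy is installed - cupy.asarray() recognizes __cuda_array_interface__ and creates a view of the existing allocation rather than copying it):

import cupy as cp

# A zero-copy view of the Runtime API allocation owned by arr - keep arr
# alive while the view is in use
cp_arr = cp.asarray(arr)
cp_arr *= 2
print(cp_arr)
# Prints 32 values of 6.28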
I had a feeling that exchanging data between contexts required calling a function to get a valid pointer for the current context (or this might have been related to IPC). I can't find the documentation for what I was thinking of right now, but I'll post back when I have located it. In general, if you're just using a single context, I think passing pointers between different libraries will be fine.
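For completeness, the IPC mechanism I mentioned exports an opaque handle for an allocation, which a different process can open with cudaIpcOpenMemHandle() to get a pointer valid in its own context. A minimal sketch of the exporting side only, reusing cudart and arr from the example above (the importing side has to run in another process, so it isn't shown here):

from ctypes import Structure, c_char

# cudaIpcMemHandle_t is an opaque 64-byte struct (CUDA_IPC_HANDLE_SIZE)
class cudaIpcMemHandle_t(Structure):
    _fields_ = [('reserved', c_char * 64)]

cudaIpcGetMemHandle = cudart.cudaIpcGetMemHandle
cudaIpcGetMemHandle.argtypes = [POINTER(cudaIpcMemHandle_t), c_void_p]

# Export a handle for the allocation owned by arr; the raw bytes in
# handle.reserved can then be sent to the importing process
handle = cudaIpcMemHandle_t()
ret = cudaIpcGetMemHandle(byref(handle), arr._ptr)
if ret:
    raise RuntimeError(f'Unexpected return code {ret} from cudaIpcGetMemHandle')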
Having done some further reading to remind myself, I think I was thinking of Peer Access between different devices (CUDA Driver API :: CUDA Toolkit Documentation). If you're only concerned with pointers on the same device, then this isn't something you need to worry about for Awkward, and you can consider all device pointers allocated by different libraries / APIs interchangeable between those libraries and API functions.
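To illustrate what I mean - peer access is checked for and enabled per pair of devices, and is only needed when a kernel running on one device dereferences a pointer allocated on another. A minimal sketch with the Runtime API via ctypes, assuming a machine with at least two devices (0 and 1):

from ctypes import CDLL, POINTER, byref, c_int, c_uint

cudart = CDLL('libcudart.so')

cudaDeviceCanAccessPeer = cudart.cudaDeviceCanAccessPeer
cudaDeviceCanAccessPeer.argtypes = [POINTER(c_int), c_int, c_int]
cudaSetDevice = cudart.cudaSetDevice
cudaSetDevice.argtypes = [c_int]
cudaDeviceEnablePeerAccess = cudart.cudaDeviceEnablePeerAccess
cudaDeviceEnablePeerAccess.argtypes = [c_int, c_uint]

# Check whether device 0 can directly access memory allocated on device 1
can_access = c_int()
ret = cudaDeviceCanAccessPeer(byref(can_access), 0, 1)
if ret:
    raise RuntimeError(f'Unexpected return code {ret} from cudaDeviceCanAccessPeer')

if can_access.value:
    # Peer access is enabled from the current device, so make device 0
    # current and allow it to access device 1's allocations (flags must be 0)
    cudaSetDevice(0)
    ret = cudaDeviceEnablePeerAccess(1, 0)
    if ret:
        raise RuntimeError(f'Unexpected return code {ret} from cudaDeviceEnablePeerAccess')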