Graphics API interop

The general idea is to get hold of a device pointer from D3D interop, then use it to construct an instance of the DeviceNDArray class. The RAPIDS Memory Manager (RMM) used to do something similar before EMM Plugins were available; see rmm/rmm.py at branch-0.13 · rapidsai/rmm · GitHub:

import ctypes

import numpy as np
from numba import cuda
import rmm._lib as librmm  # RMM's Cython layer, which provides DeviceBuffer


def device_array(shape, dtype=np.float64, strides=None, order="C", stream=0):
    """
    device_array(shape, dtype=np.float64, strides=None, order='C',
                 stream=0)
    Allocate an empty Numba device array. Clone of Numba's `cuda.device_array`,
    but uses RMM for device memory management.
    """
    shape, strides, dtype = cuda.api._prepare_shape_strides_dtype(
        shape, strides, dtype, order
    )
    datasize = cuda.driver.memory_size_from_info(
        shape, strides, dtype.itemsize
    )

    buf = librmm.DeviceBuffer(size=datasize, stream=stream)

    ctx = cuda.current_context()
    ptr = ctypes.c_uint64(int(buf.ptr))
    mem = cuda.driver.MemoryPointer(ctx, ptr, datasize, owner=buf)
    return cuda.cudadrv.devicearray.DeviceNDArray(
        shape, strides, dtype, gpu_data=mem
    )
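As a sanity check, the layout computation that `_prepare_shape_strides_dtype` and `memory_size_from_info` perform for a C-ordered array can be mimicked in plain NumPy. The helper below is my own illustration of what the strides work out to, not Numba's actual implementation:

```python
import numpy as np

def c_contiguous_strides(shape, itemsize):
    """Row-major (C-order) strides: the stride of each dimension is the
    number of bytes covered by all dimensions to its right."""
    strides = []
    acc = itemsize
    for dim in reversed(shape):
        strides.append(acc)
        acc *= dim
    return tuple(reversed(strides))

shape = (4, 3)
itemsize = np.dtype(np.float32).itemsize  # 4 bytes

strides = c_contiguous_strides(shape, itemsize)
datasize = shape[0] * strides[0]  # total bytes for a C-contiguous array

# Compare against NumPy's own layout for the same array
ref = np.empty(shape, dtype=np.float32)
assert strides == ref.strides  # (12, 4)
assert datasize == ref.nbytes  # 48
```

The resulting shape/strides/dtype triple is exactly what gets handed to `DeviceNDArray`, and `datasize` is what the allocation (or the wrapped buffer) needs to cover.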

The above code creates a MemoryPointer that points to RMM-allocated memory, then uses it to initialize a DeviceNDArray instance. Assuming you get the pointer to your D3D buffer as an integer, the above could be modified to create a Numba array pointing to the D3D buffer:

def d3d_device_array(ptr, shape, dtype=np.float32, strides=None, order="C"):
    shape, strides, dtype = cuda.api._prepare_shape_strides_dtype(
        shape, strides, dtype, order
    )
    datasize = cuda.driver.memory_size_from_info(
        shape, strides, dtype.itemsize
    )

    def make_finalizer(ptr):
        def finalize():
            # d3d_free is assumed to be a function that "cleans up" ptr
            # e.g. decrementing a reference count, or freeing it, etc...
            # whatever needs to be done when it is no longer needed by
            # Numba.
            d3d_free(ptr)

        return finalize

    ctx = cuda.current_context()
    c_ptr = ctypes.c_uint64(ptr)
    finalizer = make_finalizer(ptr)
    mem = cuda.driver.MemoryPointer(ctx, c_ptr, datasize, finalizer=finalizer)
    return cuda.cudadrv.devicearray.DeviceNDArray(
        shape, strides, dtype, gpu_data=mem
    )

# Using d3d_device_array:

# A function that gets your D3D buffer (I'm unsure of the implementation
# details here; presumably it would follow the SDK example and make the
# buffer accessible to Python)
ptr, size = my_get_d3d_buf()  # Assume a 1D float32 array; size is the
                              # element count, not the size in bytes
d3d_array = d3d_device_array(ptr, size)

# d3d_array is now ready to be passed to a kernel
kernel[griddim, blockdim](d3d_array, ...)

The finalizer is needed so that when the Numba device array is garbage collected, D3D can somehow be told that the pointer is no longer in use and can be freed (I'm not sure exactly what needs to be done here, but perhaps you know already, or can tell from the SDK example?).
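To make that lifecycle concrete, here's a plain-Python sketch of the finalizer pattern using `weakref.finalize`. Numba drives its finalizers through its own reference-counting machinery rather than `weakref`, and `d3d_free` is still the hypothetical cleanup function from above, but the effect is the same: the closure captures the pointer and runs once the owning object goes away.

```python
import weakref

freed = []

def d3d_free(ptr):
    # Stand-in for the real D3D cleanup call
    freed.append(ptr)

class FakeDeviceArray:
    """Stand-in for the object that owns the external pointer."""
    def __init__(self, ptr):
        self.ptr = ptr
        # Arrange for d3d_free(ptr) to run when this object is collected
        weakref.finalize(self, d3d_free, ptr)

arr = FakeDeviceArray(0xDEADBEEF)
assert freed == []   # array still alive: nothing freed yet
del arr              # last reference dropped; finalizer fires
assert freed == [0xDEADBEEF]
```

In the real `d3d_device_array`, passing `finalizer=finalizer` to `MemoryPointer` is what hooks this cleanup into Numba's own object lifetime tracking.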

I hope this helps illustrate things - are there other areas I should try to sketch out?