Sharing CUDA memory with Numba

I am trying to share a PyTorch tensor with Numba like this, and it gives me strange results:
when I read the handle with Numba, I get an incorrect result from the network from time to time. If I add a delay after converting the tensor, it starts working correctly, and I don't understand why that happens.
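
To be concrete, the delay is just a sleep right after building the tensor, roughly like this (the duration is arbitrary; torch.cat is the conversion step from the code sample below):

import time

backbone_features = torch.cat(backbone_features_list, dim=0)
time.sleep(0.1)   # arbitrary pause; with it the consumer reads correct data
# ... then export the IPC handle exactly as in the code sample below ...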

Code sample:

import pickle
import numpy as np
import torch
from numba.cuda.cudadrv import devices, driver
# private Numba helper; its exact location can differ between Numba versions
from numba.cuda.api import _prepare_shape_strides_dtype

backbone_features = torch.cat(backbone_features_list, dim=0)

# Describe the tensor's device memory via the CUDA array interface
desc = backbone_features.__cuda_array_interface__
shape = desc["shape"]
strides = desc.get("strides")
dtype = np.dtype(desc["typestr"])
shape, strides, dtype = _prepare_shape_strides_dtype(shape, strides, dtype, order="C")
size = driver.memory_size_from_info(shape, strides, dtype.itemsize)

# Wrap the raw device pointer, keeping the tensor alive as the owner
devptr = driver.get_devptr_for_active_ctx(desc["data"][0])
data = driver.MemoryPointer(devices.get_context(), devptr, size=size,
                            owner=backbone_features)

# Export an IPC handle plus the array description for the consumer process
ipch = devices.get_context().get_ipc_handle(data)
desc = dict(shape=shape, strides=strides, dtype=dtype)
handle = pickle.dumps([ipch, desc])
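
For completeness, the consumer side would open that pickled handle roughly like this (a sketch; payload and host_copy are illustrative names, and as far as I can tell IpcHandle.open_array is what cuda.open_ipc_array uses internally):

import pickle
from numba import cuda

ipch, desc = pickle.loads(payload)        # payload = bytes received over the socket
darr = ipch.open_array(cuda.current_context(),
                       shape=desc["shape"],
                       strides=desc["strides"],
                       dtype=desc["dtype"])
host_copy = darr.copy_to_host()           # synchronous copy of the shared buffer to the host
ipch.close()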

Could this happen because new data is written into the tensor's memory before the consumer has finished reading the current data?

I also tried the basic example using numba.cuda.api:

arr = cuda.to_device(tensor)       # copy the tensor into a new Numba device array
handle = arr.get_ipc_handle()      # IpcArrayHandle for that device array
handle = pickle.dumps(handle)

and on the receiving side:

with handle as ipc_array:
    # handle is the unpickled IpcArrayHandle; it already yields a device array,
    # so no extra cuda.open_ipc_array call is needed here
    hary = ipc_array.copy_to_host()   # default copy (no stream) is synchronous

The handle is sent to the other process over localhost.
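
Concretely, the sending is roughly this kind of thing (a sketch; the port number and the length-prefix framing are illustrative, not my exact code):

import socket

with socket.create_connection(("127.0.0.1", 50007)) as sock:
    sock.sendall(len(handle).to_bytes(8, "big"))   # simple length prefix
    sock.sendall(handle)                           # the pickled IPC handle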