I need to access several DeviceNDArrays from one kernel.
Previously, when using @njit, I was passing a List() of NDArray references.
But with @cuda.jit this doesn’t seem to work.
Exception: Failed in cuda mode pipeline (step: native lowering)
NRT required but not enabled
During: lowering “array = getitem(value=arrayOfArrays, index=i, fn=)”
Here’s a test example of what I’m trying to do:
import numpy as np
from numba import cuda
from numba.typed import List
@cuda.jit()
def testKernel(arrayOfArrays, numArrays):
    itemIndex = cuda.threadIdx.x
    if itemIndex < numInputItems:
        for i in range(numArrays):
            array = arrayOfArrays[i]  # This line causes "NRT required but not enabled" error
            array[itemIndex] += 1
numInputArrays = 5 # the number may change at runtime
numInputItems = 128 # 1024*1024
#buffersList = List() # apparently it's not allowed to append a DeviceNDArray to typed List
buffersList = [] # use reflected list then
for i in range(numInputArrays):
    buffer_h = np.zeros(numInputItems, dtype=np.float32)
    #buffer_d = cuda.device_array_like(buffer_h)  # same error anyway
    buffer_d = cuda.to_device(buffer_h)
    buffersList.append(buffer_d)
testKernel[1, numInputItems](buffersList, numInputArrays)
cuda.synchronize()
Is it possible to pass an array of arrays to CUDA kernel?
If you just pass a tuple instead of a list, your original kernel works almost verbatim (with a minor bug fix):
@cuda.jit()
def testKernel(arrayOfArrays, numArrays):
    itemIndex = cuda.threadIdx.x
    numInputItems = len(arrayOfArrays[0])  # Bugfix. Assumes they all have the same size
    if itemIndex < numInputItems:
        for i in range(numArrays):
            array = arrayOfArrays[i]  # This line works now
            array[itemIndex] += 1
a = cuda.to_device(np.zeros(10))
b = cuda.to_device(np.ones(10))
testKernel[1, min(len(a), len(b))]((a, b), 2)
a_host = a.copy_to_host()
b_host = b.copy_to_host()
np.testing.assert_allclose(a_host, 1)
np.testing.assert_allclose(b_host, 2)
So that works! I would also suggest a few changes which I think will improve your kernel.
First, you don’t need to pass numArrays, since you can infer it from the tuple with len(). Second, you only index along the thread dimension, so if you launch with more than one block, several blocks will update the same elements; using cuda.grid(1) gives each thread a unique global index instead. There is also an issue with the sizes of the arrays: if you want to allow the arrays to be of different sizes, you need to swap the if and the for. Finally, this kernel only works on 1D arrays; you can make it work for ND arrays by using the .flat property. Changing all this yields:
@cuda.jit()
def testKernelNew(tupleOfArrays):
    numArrays = len(tupleOfArrays)
    itemIndex = cuda.grid(1)
    for i in range(numArrays):
        array = tupleOfArrays[i]
        numInputItems = array.size
        if itemIndex < numInputItems:
            array.flat[itemIndex] += 1
my_range = range(100, 128, 2)
arrays_for_input = tuple(
    [cuda.to_device(i * np.ones(i, dtype=np.float32).reshape(2, -1)) for i in my_range]
)
testKernelNew[(max(a.size for a in arrays_for_input) + 31) // 32, 32](arrays_for_input)
for a, n in zip(arrays_for_input, my_range):
    np.testing.assert_allclose(a.copy_to_host(), n + 1)
    assert a.size == n
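For readers unfamiliar with cuda.grid(1) used above: it is just shorthand for the usual global-thread-index arithmetic. A plain-Python sketch of the equivalent computation (the function name here is illustrative, not part of the Numba API):

```python
def grid_1d(block_idx, block_dim, thread_idx):
    # Equivalent of cuda.grid(1): blockIdx.x * blockDim.x + threadIdx.x
    return block_idx * block_dim + thread_idx

# Thread 5 of block 2, with 32 threads per block, gets global index 69.
index = grid_1d(2, 32, 5)
```

This is why the launch above sizes the grid as `(max_size + 31) // 32` blocks of 32 threads: it guarantees at least one thread per element of the largest array.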
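One caveat worth noting (an addition beyond the answer above): Numba specializes a kernel on its argument types, and a tuple's length is part of its type, so each distinct number of arrays triggers a fresh kernel compilation. If your buffers all share a shape, a common workaround is to stack them into one 2D array, so the kernel takes a single argument whose first dimension is the buffer count and is never recompiled when that count changes. A minimal host-side sketch with NumPy (`stack_buffers` is a hypothetical helper, not from the answer):

```python
import numpy as np

def stack_buffers(buffers):
    # Requires all buffers to have the same shape and dtype;
    # result has shape (numArrays, numItems).
    return np.stack(buffers)

buffers = [np.full(128, i, dtype=np.float32) for i in range(5)]
stacked = stack_buffers(buffers)
# stacked_d = cuda.to_device(stacked)  # a kernel would then index stacked_d[i, j]
```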