Unable to create an empty array inside a device function

Hello,

I am not able to create an empty array inside a device function that I later want to have filled and returned within the same device function. A short snippet that reproduces the issue is below.

The exception message is also provided. I tried the same with cuda.local.array, without success.
It must be something very obvious, but I’m not sure what exactly is wrong.

Thanks.

from numba import cuda


@cuda.jit(device=True)
def empty_array(input_array):
    # Attempt to allocate a new array inside a device function.
    return cuda.device_array(shape=input_array.shape)


@cuda.jit
def test_cuda(array_of_arrays):
    _index = cuda.grid(1)
    if _index < array_of_arrays.shape[0]:
        dated_array = array_of_arrays[_index]
        a = empty_array(dated_array)

Exception: Failed in cuda mode pipeline (step: nopython frontend)
Failed in cuda mode pipeline (step: nopython frontend)
Unknown attribute 'device_array' of type Module(<module 'numba.cuda' from 'venv\lib\site-packages\numba\cuda\__init__.py'>)

def empty_array(input_array):
    return cuda.device_array(shape=input_array.shape)
    ^

During: typing of get attribute at …
File "…", line 35:
def empty_array(input_array):
    return cuda.device_array(shape=input_array.shape)
    ^

During: resolving callee type: type(<numba.cuda.compiler.Dispatcher object at 0x00000283948DE900>)
During: typing of call at …
File "…", line 43:
def test_cuda(array_of_arrays):

    dated_d2_array = array_of_arrays[_index]
    a = empty_array(dated_d2_array)
    ^

Hi,

I’m not very familiar with your problem; probably @gmarkall can shed some light here.

Anyway, I think the problem here is the dynamic size passed in as a parameter: as with cuda.local.array, probably only a constant size is allowed. As far as I know this is not actually a Numba limitation but a limitation of CUDA itself, since thread-local memory has to be statically allocated by the compiler (I think). But let’s hear someone else’s opinion, as I’m not even sure where cuda.device_array allocates the memory.

It’s not possible to allocate an array inside a device function - any arrays that you want to use in a kernel need to be passed in.
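A minimal sketch of that pattern, with illustrative shapes and a placeholder doubling operation (both my own, not from the original post) - the output array is allocated on the host and passed in alongside the input:

import numpy as np
from numba import cuda

@cuda.jit
def double_rows(array_of_arrays, out):
    # Each thread fills one pre-allocated row of `out` instead of
    # allocating a new array inside the kernel.
    i = cuda.grid(1)
    if i < array_of_arrays.shape[0]:
        for j in range(array_of_arrays.shape[1]):
            out[i, j] = array_of_arrays[i, j] * 2.0

n, m = 8, 16
arr = cuda.to_device(np.arange(n * m, dtype=np.float32).reshape(n, m))
out = cuda.device_array((n, m), dtype=np.float32)  # allocated on the host side
double_rows[1, n](arr, out)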

Ah, @noeliarico beat me to it! :slight_smile:

as with cuda.local.array, probably only a constant size is allowed. As far as I know this is not actually a Numba limitation but a limitation of CUDA itself, since thread-local memory has to be statically allocated by the compiler (I think).

It is correct that only constant-sized local arrays can be used at present. However, this is a limitation of Numba only - you can allocate dynamically-sized local memory in a CUDA C/C++ kernel.
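For reference, a minimal sketch of the constant-size case that does compile today (kernel name and size are illustrative):

from numba import cuda, float32

@cuda.jit
def local_scratch(out):
    # The shape must be a compile-time constant; taking it from an
    # argument (as in the snippet above) is what fails to compile.
    scratch = cuda.local.array(shape=32, dtype=float32)
    i = cuda.grid(1)
    if i < out.shape[0]:
        scratch[0] = out[i]
        out[i] = scratch[0] + 1.0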

We had a session after a Numba dev meeting where I started working on the implementation of dynamic local arrays as a demo of working on Numba (https://www.youtube.com/watch?v=VdqwDyu1lNw), but I never quite finished it up - I did a bit more work on it after that session, and the code in its present form is still waiting to be picked up and finished in my gmarkall/numba branch on GitHub.

I’m not even sure where cuda.device_array allocates the memory.

Internally, cuda.device_array calls the driver function cuMemAlloc on the host to allocate memory. However, you can replace the memory allocator with something else using the External Memory Management (EMM) plugin interface, which you would do if you wanted Numba to share a memory pool with another library.
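In other words, the same cuda.device_array call that fails inside a device function is fine on the host. A sketch - the EMM swap is shown commented out with a hypothetical manager class name, since the concrete class depends on the library you plug in:

import numpy as np
from numba import cuda

# Host-side allocation - this is where cuda.device_array is meant to be
# called; under the default allocator it is backed by cuMemAlloc.
d_arr = cuda.device_array(shape=(1024,), dtype=np.float32)

# Hypothetical EMM swap: MyMemoryManager is a placeholder for a class
# implementing numba.cuda.BaseCUDAMemoryManager, and it must be set
# before the CUDA context is initialized.
# cuda.set_memory_manager(MyMemoryManager)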

@gmarkall, @noeliarico - got it! Thank you - I ended up creating an empty CuPy placeholder with the right shape and passing it to the kernel as an argument. The kernel then passes it down to a device function, and everything works fine.
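For anyone who finds this later, a minimal sketch of that workaround (names and shapes are illustrative, not my exact code) - Numba kernels accept CuPy arrays directly via the CUDA Array Interface:

import cupy as cp
from numba import cuda

@cuda.jit(device=True)
def fill_row(row, out_row):
    # The device function only writes into the pre-allocated row.
    for j in range(row.shape[0]):
        out_row[j] = row[j]

@cuda.jit
def test_cuda(array_of_arrays, out):
    i = cuda.grid(1)
    if i < array_of_arrays.shape[0]:
        fill_row(array_of_arrays[i], out[i])

arr = cp.arange(128, dtype=cp.float32).reshape(8, 16)
out = cp.empty_like(arr)  # empty CuPy placeholder with the right shape
test_cuda[1, 8](arr, out)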

Dynamic local memory allocation would be a nice enhancement for Numba.

Thanks again.
