Cannot create a shared array in a kernel using kernel parameters

from numba import cuda, float32

@cuda.jit
def my_kernel(w, h):
  cuda.shared.array((w, h), dtype=float32)

my_kernel[4, 4](4, 4)

The above code doesn’t compile; I get the following error message:

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<function shared.array at 0x7fefeb373400>) found for signature:
 
 >>> array(UniTuple(int64 x 2), dtype=class(float32))
 
There are 2 candidate implementations:
    - Of which 2 did not match due to:
    Overload of function 'array': File: numba/cuda/cudadecl.py: Line 44.
      With argument(s): '(UniTuple(int64 x 2), dtype=class(float32))':
     No match.

During: resolving callee type: Function(<function shared.array at 0x7fefeb373400>)
During: typing of call at <ipython-input-2-c4eb31dc2103> (5)


File "<ipython-input-2-c4eb31dc2103>", line 5:
def my_kernel(w, h):
  cuda.shared.array((w, h), dtype=float32)
  ^

Why can’t I pass a tuple of integers as a parameter to this function?

Hi @matheuslima,

I think the issue in the above is that cuda.shared.array allocations must be sized with compile-time constants. This, for example, would compile:

from numba import cuda, float32

@cuda.jit
def my_kernel():
  cuda.shared.array((4, 4), dtype=float32)

my_kernel[4, 4]()

Numba compiles functions based on the types of their arguments, not their values, so in the original code sample Numba “sees” the shape argument to cuda.shared.array as a UniTuple (a tuple where all the element types are the same) of two int64s, rather than as the constant values 4 and 4.
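
One workaround, if the sizes are known on the host before launch, is to close over them in a factory function so that Numba sees them as compile-time constants instead of arguments. Something like this should work (an untested sketch; make_kernel is just an illustrative name, and note that each distinct (w, h) pair triggers a fresh compilation):

import numpy as np
from numba import cuda, float32

def make_kernel(w, h):
  # w and h are plain Python ints here, so inside the jitted kernel
  # they are frozen as compile-time constants.
  @cuda.jit
  def kernel(out):
    sm = cuda.shared.array((w, h), dtype=float32)
    x, y = cuda.grid(2)
    if x < w and y < h:
      sm[x, y] = x + y
    cuda.syncthreads()
    if x < w and y < h:
      out[x, y] = sm[x, y]
  return kernel

out = np.zeros((4, 4), dtype=np.float32)
make_kernel(4, 4)[(1, 1), (4, 4)](out)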

Hope this helps?

Hello @stuartarchibald
So there’s no way to allocate a shared array with a size specified at runtime? I’m solving a problem where I need a shared array whose size is proportional to the size of an image.

Hi @matheuslima

I think you can do “dynamic” shared arrays by declaring the size as zero, like cuda.shared.array(0, dtype=something), and then later taking a fixed-size slice of it. There are some examples in the unit tests here: numba/test_sm.py at 00ad12751725391e8423d67ef845ba56110c55df · numba/numba · GitHub
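
Roughly, the pattern in those tests looks like this (a minimal sketch, assuming a single block of n threads; the fourth element of the launch configuration is the dynamic shared memory size in bytes):

import numpy as np
from numba import cuda, float32

@cuda.jit
def reverse_kernel(out):
  # A size of 0 means "use the dynamic shared memory supplied at launch".
  sm = cuda.shared.array(0, dtype=float32)
  i = cuda.threadIdx.x
  n = cuda.blockDim.x
  sm[i] = i
  cuda.syncthreads()
  out[i] = sm[n - 1 - i]

n = 64
out = np.zeros(n, dtype=np.float32)
# n float32 elements = n * 4 bytes of dynamic shared memory.
reverse_kernel[1, n, 0, n * 4](out)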

If this isn’t clear then I’ll try and write an example.

This behaviour (compile-time constant sizes, with all dynamic shared arrays starting from the same address and addressed via offsets) is, I think, the same as described in the CUDA C Programming Guide :: CUDA Toolkit Documentation.
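
So if you logically need more than one dynamic shared array, the usual trick (in Numba as in CUDA C, as far as I know) is to take the one allocation and carve it up with offsets. A sketch:

from numba import cuda, float32

@cuda.jit
def two_arrays_kernel(a_out, b_out, n):
  # All dynamically-sized shared arrays start at the same base address,
  # so treat sm[0:n] and sm[n:2*n] as two logical arrays.
  sm = cuda.shared.array(0, dtype=float32)
  i = cuda.threadIdx.x
  sm[i] = 1.0
  sm[n + i] = 2.0
  cuda.syncthreads()
  a_out[i] = sm[i]
  b_out[i] = sm[n + i]

# Launch with room for both halves, e.g.:
# two_arrays_kernel[1, n, 0, 2 * n * 4](a, b, n)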