Cannot create a shared array in a kernel using kernel parameters

from numba import cuda, float32

@cuda.jit
def my_kernel(w, h):
  cuda.shared.array((w, h), dtype=float32)

my_kernel[4, 4](4, 4)

The above code doesn’t compile; I get the following error message:

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<function shared.array at 0x7fefeb373400>) found for signature:
 
 >>> array(UniTuple(int64 x 2), dtype=class(float32))
 
There are 2 candidate implementations:
    - Of which 2 did not match due to:
    Overload of function 'array': File: numba/cuda/cudadecl.py: Line 44.
      With argument(s): '(UniTuple(int64 x 2), dtype=class(float32))':
     No match.

During: resolving callee type: Function(<function shared.array at 0x7fefeb373400>)
During: typing of call at <ipython-input-2-c4eb31dc2103> (5)


File "<ipython-input-2-c4eb31dc2103>", line 5:
def my_kernel(w, h):
  cuda.shared.array((w, h), dtype=float32)
  ^

Why can’t I pass a tuple of integers as a parameter to this function?

Hi @matheuslima,

I think the issue in the above is that cuda.shared.array allocations must be sized with compile-time constants. This, for example, would compile:

from numba import cuda, float32

@cuda.jit
def my_kernel():
  cuda.shared.array((4, 4), dtype=float32)

my_kernel[4, 4]()

Numba compiles functions based on the types of their arguments, not their values, so in the original code sample Numba “sees” the shape argument to cuda.shared.array as a UniTuple (a tuple where all the element types are the same) of two int64s, rather than as the constant values 4 and 4.
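
One workaround, if the sizes are known on the host before launch, is to close over them in a factory function so that Numba sees them as compile-time constants instead of arguments. Something like this should work (an untested sketch; make_kernel is just an illustrative name, and note that each distinct (w, h) pair triggers a fresh compilation):

import numpy as np
from numba import cuda, float32

def make_kernel(w, h):
  # w and h are plain Python ints here, so inside the jitted kernel
  # they are frozen as compile-time constants.
  @cuda.jit
  def kernel(out):
    sm = cuda.shared.array((w, h), dtype=float32)
    x, y = cuda.grid(2)
    if x < w and y < h:
      sm[x, y] = x + y
    cuda.syncthreads()
    if x < w and y < h:
      out[x, y] = sm[x, y]
  return kernel

out = np.zeros((4, 4), dtype=np.float32)
make_kernel(4, 4)[(1, 1), (4, 4)](out)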

Hope this helps?

Hello @stuartarchibald
So there’s no way to allocate a shared array with a size specified at runtime? I’m solving a problem where I need a shared array whose size is proportional to the size of an image.

Hi @matheuslima

I think you can do “dynamic” shared arrays by declaring the size as zero, like cuda.shared.array(0, dtype=something), and then later taking a fixed-size slice of it. There are some examples in the unit tests here: numba/test_sm.py at 00ad12751725391e8423d67ef845ba56110c55df · numba/numba · GitHub
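
Roughly, the pattern in those tests looks like this (a minimal sketch, assuming a single block of n threads; the fourth element of the launch configuration is the dynamic shared memory size in bytes):

import numpy as np
from numba import cuda, float32

@cuda.jit
def reverse_kernel(out):
  # A size of 0 means "use the dynamic shared memory supplied at launch".
  sm = cuda.shared.array(0, dtype=float32)
  i = cuda.threadIdx.x
  n = cuda.blockDim.x
  sm[i] = i
  cuda.syncthreads()
  out[i] = sm[n - 1 - i]

n = 64
out = np.zeros(n, dtype=np.float32)
# n float32 elements = n * 4 bytes of dynamic shared memory.
reverse_kernel[1, n, 0, n * 4](out)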

If this isn’t clear then I’ll try and write an example.

This behaviour (compile-time constant sizes, with all dynamic shared arrays starting from the same address and addressed via offsets) is, I think, the same as described in the CUDA C Programming Guide :: CUDA Toolkit Documentation.
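
So if you logically need more than one dynamic shared array, the usual trick (in Numba as in CUDA C, as far as I know) is to take the one allocation and carve it up with offsets. A sketch:

from numba import cuda, float32

@cuda.jit
def two_arrays_kernel(a_out, b_out, n):
  # All dynamically-sized shared arrays start at the same base address,
  # so treat sm[0:n] and sm[n:2*n] as two logical arrays.
  sm = cuda.shared.array(0, dtype=float32)
  i = cuda.threadIdx.x
  sm[i] = 1.0
  sm[n + i] = 2.0
  cuda.syncthreads()
  a_out[i] = sm[i]
  b_out[i] = sm[n + i]

# Launch with room for both halves, e.g.:
# two_arrays_kernel[1, n, 0, 2 * n * 4](a, b, n)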