How do I use the `sharedmem` parameter in a kernel?

The docs describe that a dispatcher object can be configured with sharedmem:

func[griddim, blockdim, stream, sharedmem]

How do I refer to this sharedmem value from within the kernel function?

Thank you!

You declare a shared array of 0 size, like:

cuda.shared.array(0, dtype=np.float32)

Note that all dynamic shared memory arrays alias, so if you want to have multiple dynamic shared arrays, you need to take disjoint views of the arrays. As an example, consider:

from numba import cuda
import numpy as np


@cuda.jit
def f():
    # Both declarations alias the same dynamic shared memory allocation
    f32_arr = cuda.shared.array(0, dtype=np.float32)
    i32_arr = cuda.shared.array(0, dtype=np.int32)
    f32_arr[0] = 3.14
    print(f32_arr[0])
    print(i32_arr[0])


# 1 block, 1 thread, default stream, 4 bytes of dynamic shared memory
f[1, 1, 0, 4]()
cuda.synchronize()

This allocates 4 bytes of shared memory (large enough for one int32 or one float32) and declares dynamic shared memory arrays of type float32 and of type int32. When f32_arr[0] is set, this also sets the value of i32_arr[0], because both arrays point at the same memory. So we see as output:

$ python repro.py 
3.140000
1078523331
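As a host-side sanity check (this snippet is not from the original thread; it uses only the standard struct module, no GPU required), you can confirm that these two outputs are the same bits:

```python
import struct

# Pack 3.14 as a little-endian float32, then reinterpret those
# 4 bytes as a little-endian int32
(bits,) = struct.unpack('<i', struct.pack('<f', 3.14))
print(bits)  # 1078523331
```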

The value 1078523331 is the int32 whose bit pattern equals that of the float32 3.14. To avoid this aliasing, we take disjoint views of the dynamic shared memory:

@cuda.jit
def f_with_view():
    f32_arr = cuda.shared.array(0, dtype=np.float32)
    # Skip the first int32 element (4 bytes) so this view does not
    # overlap f32_arr[0]
    i32_arr = cuda.shared.array(0, dtype=np.int32)[1:]
    f32_arr[0] = 3.14
    i32_arr[0] = 1
    print(f32_arr[0])
    print(i32_arr[0])


# 8 bytes of dynamic shared memory: one float32 + one int32
f_with_view[1, 1, 0, 8]()
cuda.synchronize()

This time we allocate 8 bytes of dynamic shared memory, using the first four for a float32 value and the next four for an int32 value. Now we can set both the int32 and the float32 values without them aliasing:

$ python repro.py
3.140000
1
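As an aside, a small host-side helper (hypothetical, just itemsize arithmetic with NumPy) makes the sharedmem launch argument less error-prone; for f_with_view above it gives the 8 bytes used in the launch:

```python
import numpy as np


def dynsmem_bytes(n_f32, n_i32):
    # Bytes of dynamic shared memory needed for n_f32 float32 elements
    # followed by n_i32 int32 elements
    return (n_f32 * np.dtype(np.float32).itemsize
            + n_i32 * np.dtype(np.int32).itemsize)


print(dynsmem_bytes(1, 1))  # 8
```

You could then launch with e.g. f_with_view[1, 1, 0, dynsmem_bytes(1, 1)]().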

Thank you, Graham. This informative answer belongs in the docs! 🙂

Unless it’s already there, somewhere…

You’re right, it does belong there, and it’s not really written anywhere at the moment. I think when the docs were first written it was assumed that readers would learn or know the semantics from the CUDA C Programming Guide, but these days it’s reasonable to expect the Numba docs alone to cover things like this.

Indeed, I know just enough CUDA C to be terrible at it, but I wasn’t certain. My first instinct was to set the size to 0, but I didn’t dare rely on that… Anyway, if it goes into the docs it will help. Thanks!
