The docs describe that a dispatcher object can be configured with sharedmem:
func[griddim, blockdim, stream, sharedmem]
How do I refer to this value of sharedmem from within the kernel function?
Thank you!
You declare a shared array of size 0, like:
cuda.shared.array(0, dtype=np.float32)
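The sharedmem launch parameter gives the size of the dynamic allocation in bytes, so you need to multiply your element count by the element size. A small host-side helper for that (a sketch; the name dynamic_shared_bytes is just illustrative, not part of Numba):

```python
import numpy as np

def dynamic_shared_bytes(n_elements, dtype):
    # sharedmem is specified in bytes, so multiply the element
    # count by the itemsize of the dtype being stored.
    return n_elements * np.dtype(dtype).itemsize

print(dynamic_shared_bytes(256, np.float32))  # 1024
print(dynamic_shared_bytes(1, np.int32))      # 4
```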
Note that all dynamic shared memory arrays alias, so if you want to have multiple dynamic shared arrays, you need to take disjoint views of the arrays. As an example, consider:
from numba import cuda
import numpy as np

@cuda.jit
def f():
    f32_arr = cuda.shared.array(0, dtype=np.float32)
    i32_arr = cuda.shared.array(0, dtype=np.int32)
    f32_arr[0] = 3.14
    print(f32_arr[0])
    print(i32_arr[0])

f[1, 1, 0, 4]()
cuda.synchronize()
This allocates 4 bytes of shared memory (large enough for one int32 or one float32) and declares dynamic shared memory arrays of type int32 and of type float32. When f32_arr[0] is set, this also sets the value of i32_arr[0], because they’re pointing at the same memory. So we see as output:
$ python repro.py
3.140000
1078523331
because 1078523331 is the int32 represented by the bits of the float32 3.14. If we take disjoint views of the dynamic shared memory:
@cuda.jit
def f_with_view():
    f32_arr = cuda.shared.array(0, dtype=np.float32)
    i32_arr = cuda.shared.array(0, dtype=np.int32)[1:]
    f32_arr[0] = 3.14
    i32_arr[0] = 1
    print(f32_arr[0])
    print(i32_arr[0])

f_with_view[1, 1, 0, 8]()
cuda.synchronize()
This time we declare 8 bytes of dynamic shared memory, using the first four for a float32 value and the next four for an int32 value. We see that we can set both the int32 and float32 values without them aliasing:
$ python repro.py
3.140000
1
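Both behaviours can be reproduced on the host without a GPU: the bit reinterpretation from the first example with Python’s standard struct module, and the disjoint-view layout of f_with_view with NumPy views of one byte buffer. A sketch (host-side only, not device code):

```python
import struct
import numpy as np

# The output of the first example: the bits of float32 3.14,
# reinterpreted as an int32 (little-endian).
print(struct.unpack('<i', struct.pack('<f', 3.14))[0])  # 1078523331

# Host-side analogue of f_with_view's disjoint views: one 8-byte
# buffer, a float32 view of bytes 0-3 and an int32 view of bytes 4-7.
buf = np.zeros(8, dtype=np.uint8)
f32_arr = buf.view(np.float32)[:1]
i32_arr = buf.view(np.int32)[1:]
f32_arr[0] = 3.14
i32_arr[0] = 1
print(f32_arr[0], i32_arr[0])
```

Because the two views cover disjoint byte ranges of the buffer, writing one no longer clobbers the other, just as in the kernel.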
Thank you, Graham. This informative answer belongs in the docs!
Unless it’s already there, somewhere…
You’re right, it does, and it isn’t really written anywhere. I think when the docs were first written, it was assumed that readers would know the semantics from the CUDA C Programming Guide, but these days it’s reasonable to expect to only need the Numba docs for things like this.
Indeed, and I know just enough CUDA C to be horrible at it, but I wasn’t certain. My first instinct was to set the size to 0, but I didn’t dare to be sure… Anyway, if it goes into the docs, it’ll help. Thanks!