Oh! Let me correct the solution then.
Will do
The original issue has been overcome. Here’s the kernel after some changes:
@cuda.jit
def get_surface(binary_image, surface, counter):
    row, image_slice = cuda.grid(2)
    span_found = False
    if row < binary_image.shape[0] and image_slice < binary_image.shape[2]:  # guard for rows and slices
        for column in range(binary_image.shape[1]):
            if binary_image[row, column, image_slice] == 1:
                if not span_found:  # Connected Component found
                    span_found = True
                    cuda.atomic.add(counter, 0, 1)
                    surface[counter[0]] = (row, column, image_slice)
                    if column == binary_image.shape[1] - 1:  # Case where span starts in the last column
                        cuda.atomic.add(counter, 0, 1)
                        surface[counter[0]] = (row, column, image_slice)
                else:  # Case where span ends in the last column
                    cuda.atomic.add(counter, 0, 1)
                    surface[counter[0]] = (row, column, image_slice)
            elif binary_image[row, column, image_slice] == 0 and span_found:
                cuda.atomic.add(counter, 0, 1)
                surface[counter[0]] = (row, column, image_slice)
To add some context to the kernel:
binary_image and surface are the CuPy arrays I read from and write to, respectively. counter is a Numba array which I'm using as a counter. binary_image is a 3D binary object, with 0 representing the background and 1 the object; surface is arranged as (number_of_points, 3).
The goal is to extract the points that make up the object's surface, which I'm doing using spans. As far as I know, Numba CUDA does not allow Python lists due to their dynamic nature, and NumPy/CuPy arrays cannot grow dynamically either. Therefore, a solution could be to preallocate surface using something like np.zeros().
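To make the idea concrete, here is a minimal host-side sketch in plain Python (not GPU code, and not the kernel above). It assumes a span boundary means the first and last foreground column of each run of 1s in a row, and it fills a preallocated buffer through a one-element counter. The names span_endpoints and store_points are hypothetical helpers made up for illustration.

```python
def span_endpoints(row_values):
    """Return the first and last foreground column of every run of 1s."""
    endpoints = []
    span_start = None
    last_column = len(row_values) - 1
    for column, value in enumerate(row_values):
        if value == 1 and span_start is None:
            span_start = column  # a span opens here
            endpoints.append(column)
        # A span closes on its last 1: either the row ends or a 0 follows.
        span_closes = value == 1 and (
            column == last_column or row_values[column + 1] == 0
        )
        if span_start is not None and span_closes:
            if column != span_start:  # single-pixel span: already recorded
                endpoints.append(column)
            span_start = None
    return endpoints


def store_points(surface, counter, points):
    """Write each point into the next free row of a preallocated buffer."""
    for point in points:
        slot = counter[0]  # reserve a row (on a GPU this step must be atomic)
        counter[0] += 1
        if slot >= len(surface):
            continue  # buffer full: keep counting, skip the write
        for axis, value in enumerate(point):
            surface[slot][axis] = value  # one element at a time, no tuple


# Hypothetical example data: one row of one slice.
row_values = [0, 1, 1, 1, 0, 1, 0, 1, 1]
row, image_slice = 2, 4
surface = [[0, 0, 0] for _ in range(8)]  # preallocated, fixed size
counter = [0]
store_points(
    surface,
    counter,
    [(row, column, image_slice) for column in span_endpoints(row_values)],
)
```

In a real kernel the slot reservation would have to be a single atomic operation. cuda.atomic.add returns the value stored before the addition, so that return value could serve as the reserved row index instead of reading counter[0] again afterwards. Whether that fits this kernel is an assumption on my part.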
However, I still need to keep track of the next free position in surface, hence the counter:
surface[counter[0]] = (row, column, image_slice)
I now have a new issue: this instruction fails with a typing error. This is what I get:
No implementation of function Function(<built-in function setitem>) found for signature:
>>> setitem(array(uint16, 2d, C), int32, Tuple(int32, int64, int32))
There are 16 candidate implementations:
- Of which 16 did not match due to:
Overload of function 'setitem': File: <numerous>: Line N/A.
With argument(s): '(array(uint16, 2d, C), int32, Tuple(int32, int64, int32))':
No match.
def get_surface(binary_image, surface, counter):
<source elided>
cuda.atomic.add(counter, 0, 1)
surface[counter[0]] = (row, column, image_slice)
^
Is there some fundamental mistake I’m making?