GPU function apparently blocking due to data size/complexity

I’ve implemented a function that performs kernels 1 and 2 of the CCL (connected-component labeling) algorithm described here: Traversing a matrix on a thread per row basis

The data is a binary 3D matrix (719x701x900) containing many particles as seen in the image below (I’ve left some orange noise for clarity):

Kernel 1 returns the spans and compressed label matrices as seen in the following sample:
Spans Matrix: Rows 698 to 718, Slice 0


Compressed Labels Matrix: Rows 698 to 718, Slice 0
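For readers unfamiliar with the span representation, here is a minimal CPU sketch of what kernel 1 appears to compute for a single row. The function name and the conventions (inclusive span ends, -1 padding for unused slots) are my assumptions, inferred from the -1 sentinel and the overlap tests in the kernel 2 code below; in the real kernel 1 the provisional labels are presumably made globally unique across rows and slices.

```python
def row_to_spans(row, max_spans):
    """Encode a binary row as flattened (start, end) pairs plus one label per span.

    Assumed conventions: span ends are inclusive, and unused slots are
    padded with -1, matching the null-label sentinel used by kernel 2.
    """
    spans = [-1] * (2 * max_spans)   # flattened (start, end) pairs
    labels = [-1] * max_spans        # one provisional label per span
    n = 0
    start = None
    for i, v in enumerate(row):
        if v and start is None:
            start = i                # a run of 1s begins
        elif not v and start is not None:
            spans[2 * n], spans[2 * n + 1] = start, i - 1
            labels[n] = n            # provisional label (hypothetical numbering)
            n += 1
            start = None
    if start is not None:            # run extends to the end of the row
        spans[2 * n], spans[2 * n + 1] = start, len(row) - 1
        labels[n] = n
    return spans, labels

spans, labels = row_to_spans([0, 1, 1, 0, 0, 1, 0, 1, 1, 1], max_spans=4)
# spans  → [1, 2, 5, 5, 7, 9, -1, -1]
# labels → [0, 1, 2, -1]
```

One such pair of encoded rows per (row, slice) position is what the spans and compressed-labels matrices above would then hold.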

Kernel 2 updates labels by contiguity and requires several iterations. It’s confirmed to work with very simple data. Below is a simple structure made up of two 50x50x50 cubes and one 250x25x25 parallelepiped, along with its finished spans and compressed-labels matrices. As can be seen, the function correctly identifies the spans and labels the object with a single label.
Spans Matrix: Rows 39 to 59, Slice 0 | Compressed Labels Matrix: Rows 39 to 59, Slice 0
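Why several iterations are needed can be shown with a small CPU model of kernel 2’s merge rule. This is a hypothetical simplification of my own (spans as per-row lists of inclusive (start, end) pairs, minimum-label propagation repeated until a full pass makes no change, mirroring the `changed` counter): a label can only travel one row-neighbourhood per pass, so a U-shaped object needs multiple passes before the minimum reaches everywhere.

```python
def merge_labels(spans_by_row, labels_by_row):
    """Propagate the minimum label between overlapping spans of neighbouring
    rows until a full pass makes no change. Returns the number of passes."""
    passes = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        for r in range(len(spans_by_row) - 1):
            for i, (s0, e0) in enumerate(spans_by_row[r]):
                for j, (s1, e1) in enumerate(spans_by_row[r + 1]):
                    if s0 <= e1 and s1 <= e0:  # inclusive overlap, as in the kernel
                        m = min(labels_by_row[r][i], labels_by_row[r + 1][j])
                        if labels_by_row[r][i] != m or labels_by_row[r + 1][j] != m:
                            labels_by_row[r][i] = m
                            labels_by_row[r + 1][j] = m
                            changed = True
    return passes

# A U-shaped object: two vertical arms joined only at the bottom row.
spans = [[(0, 0), (4, 4)], [(0, 0), (4, 4)], [(0, 4)]]
labels = [[0, 1], [2, 3], [4]]
merge_labels(spans, labels)
# afterwards every span carries label 0
```

The GPU version does the same thing with `cuda.atomic.min`, re-launching the kernel until the `changed` counter stays at zero.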

However, the kernel doesn’t seem to work for larger data. There are no errors; the kernel simply blocks. In fact, it isn’t possible to retrieve any information: for example, I can’t copy back the spans or compressed-labels matrices even after a single iteration.
Does anyone have any idea why this happens? Could it be the kernel is still working and, if so, is there any way to know?

Here’s a portion of the code for kernel 2:

@cuda.jit
def kernel_2(s_matrix, cl_matrix, changed):
    row, image_slice = cuda.grid(2)  # the position of the row. All positions start at the 0th column
    pos_s, pos_l = 0, 0  # the position of the row in the spans and label matrices
    pre_s, pre_l = 0, 0  # the previous row
    post_s, post_l = 0, 0  # the following row
    down_s, down_l = 0, 0  # the downwards row
    up_s, up_l = 0, 0  # the upwards row

    if row < s_matrix.shape[0] and image_slice < s_matrix.shape[2]:  # guard for rows and slices
        for column in range(cl_matrix.shape[1]):
            current_span_label = cl_matrix[row, pos_l, image_slice]
            if current_span_label == -1:  # guard for a valid label i.e. it's not a null label
                break
            current_span_start = s_matrix[row, pos_s, image_slice]
            current_span_end = s_matrix[row, pos_s + 1, image_slice]
            # previous row
            if image_slice > 0:  # makes no sense to check the previous row of the first slice
                while -1 < s_matrix[row, pre_s, image_slice - 1] < current_span_end:
                    if current_span_start <= s_matrix[row, pre_s + 1, image_slice - 1] \
                            and s_matrix[row, pre_s, image_slice - 1] <= current_span_end:
                        interval_label = cl_matrix[row, pre_l, image_slice - 1]
                        if interval_label == current_span_label:
                            pre_s = pre_s + 2
                            pre_l = pre_l + 1
                            continue
                        if interval_label == -1:
                            break
                        min_label = min(current_span_label, interval_label)
                        cuda.atomic.min(cl_matrix, (row, pos_l, image_slice), min_label)
                        cuda.atomic.min(cl_matrix, (row, pre_l, image_slice - 1), min_label)
                        cuda.atomic.add(changed, 0, 1)
                        pre_s = pre_s + 2
                        pre_l = pre_l + 1
            # ...omitted code for other rows
            pos_l = pos_l + 1
            pos_s = pos_s + 2

Can’t believe I wasted so much time on this. Problem found: I wasn’t considering the case where the intervals don’t overlap. In that case `pre_s` and `pre_l` are never advanced, so the while loop never terminates — the kernel wasn’t slow, it was stuck in an infinite loop.
All that’s needed is to remove a tab from the following lines, so they execute on every iteration of the while loop:

pre_s = pre_s + 2  # one tab back
pre_l = pre_l + 1  # one tab back
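To make the fix concrete, here is a simplified CPU model of the corrected cursor walk over the previous row’s spans (the helper name and sample data are hypothetical; span ends are treated as inclusive, matching the overlap test in the kernel). The point is that the cursor now advances on every iteration, overlap or not:

```python
def scan_previous_row(prev_spans, prev_labels, span_start, span_end, span_label):
    """Walk the previous row's (start, end) spans; merge labels on overlap.

    prev_spans holds flattened (start, end) pairs padded with -1;
    prev_labels holds one label per span, -1 for empty slots.
    Returns the (possibly lowered) label for the current span.
    """
    pre_s, pre_l = 0, 0
    label = span_label
    while pre_s + 1 < len(prev_spans) and -1 < prev_spans[pre_s] < span_end:
        if span_start <= prev_spans[pre_s + 1] and prev_spans[pre_s] <= span_end:
            if prev_labels[pre_l] != -1 and prev_labels[pre_l] != label:
                m = min(label, prev_labels[pre_l])   # cuda.atomic.min in the kernel
                label = m
                prev_labels[pre_l] = m
        # the fix: advance the cursor unconditionally -- inside the `if`
        # above, it never moves past a non-overlapping span and the
        # loop spins forever
        pre_s += 2
        pre_l += 1
    return label

labels = [7, 8, -1]
scan_previous_row([0, 1, 5, 6, -1, -1], labels, 5, 6, 3)
# the first previous span (0, 1) does not overlap (5, 6) and is now
# simply skipped; the second one (5, 6) merges down to label 3
```

This also explains the symptom: a kernel stuck in an infinite loop never completes, so every subsequent copy back to the host blocks indefinitely rather than raising an error.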