Threads freeze with no warning

Hi, while running parallel Numba code, the threads would freeze with no warning. I could avoid the freeze simply by generating a random number inside each thread's loop. I don't know whether this is a bug in Numba or whether something is wrong with my algorithm. The full code is very long, so I wrote a shorter version with the same parallel logic but without the main calculations. If the function is run with with_bug=True, the threads freeze; with with_bug=False, it finishes in about 40 s.

With this workaround, the full code works perfectly: I get a ~6x speedup from the parallel version, and it runs without freezing even for calculations that take tens of minutes.

The main justification for this algorithm is that in the full code, worker_subresult is replaced by multiple lists with variable length, and the main thread consolidates all lists in a main list, with all values in the same order.
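
To make the ordering requirement concrete, here is a minimal sketch, not my actual code. It assumes each row yields a variable number of values that must end up in one flat array in row order. The name collect_in_row_order, the threshold parameter, and the "keep values above a threshold" rule are made up for illustration and stand in for the real calculation. Note that it uses a two-pass count-then-fill scheme over prange rather than the master/worker loop from my code, and I have not checked whether that fits the full code.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Fallback so the sketch still runs (slowly) without Numba installed.
    prange = range

    def njit(*args, **kwargs):
        if args and callable(args[0]):
            return args[0]
        return lambda func: func


@njit(parallel=True)
def collect_in_row_order(arr, threshold):
    # Hypothetical stand-in for the real work: keep values above `threshold`.
    n_rows, n_cols = arr.shape
    counts = np.zeros(n_rows, dtype=np.int64)

    # Pass 1: every row counts how many values it will emit.
    for i in prange(n_rows):
        c = 0
        for j in range(n_cols):
            if arr[i, j] > threshold:
                c += 1
        counts[i] = c

    # Exclusive prefix sum: row i owns out[offsets[i]:offsets[i + 1]].
    offsets = np.zeros(n_rows + 1, dtype=np.int64)
    for i in range(n_rows):
        offsets[i + 1] = offsets[i] + counts[i]

    out = np.empty(offsets[n_rows], dtype=arr.dtype)

    # Pass 2: every row writes only into its own slice, so the output
    # order does not depend on how iterations are scheduled.
    for i in prange(n_rows):
        pos = offsets[i]
        for j in range(n_cols):
            if arr[i, j] > threshold:
                out[pos] = arr[i, j]
                pos += 1
    return out


demo = np.array([[1, 5, 2], [7, 8, 0], [0, 0, 0], [9, 1, 9]])
print(collect_in_row_order(demo, 4))
```

In this sketch the rows never share a status flag, so no thread has to wait on another; the price is that each row is scanned twice.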

Here is the code that reproduces the bug:

import time
from numba import njit, prange
import numpy as np
size = 500
array = np.random.binomial(127, 0.5, size = (size, size))

IDLE = 0
ASSIGNED = 1
DONE = 2
FINISHED = 3

@njit
def wait():
    np.random.binomial(1, 0.5, size = 1)

@njit(parallel=True)
def test(test_array, with_bug):

    threads = 2

    size = test_array.shape[0]
    worker_status = np.zeros(threads-1, dtype=np.uint8)
    worker_row = np.zeros(threads-1, dtype=np.uint8)
    worker_subresult = np.zeros(threads-1, dtype=np.uint8)
    result = np.zeros(1, dtype=np.float64)
    next_row = 0

    for w in prange(threads):
        if w == (threads - 1):
            while True:
                if not with_bug: wait()
                if (worker_status == FINISHED).all():
                    break
                for i in range(threads-1):
                    if worker_status[i] == IDLE:
                        if next_row < size:
                            worker_row[i] = next_row
                            next_row += 1
                            worker_status[i] = ASSIGNED
                        else:
                            worker_status[i] = FINISHED
                    elif worker_status[i] == ASSIGNED:
                        pass
                    elif worker_status[i] == DONE:
                        result[0] += worker_subresult[i]
                        if next_row < size:
                            worker_row[i] = next_row
                            next_row += 1
                            worker_status[i] = ASSIGNED
                        else:
                            worker_status[i] = FINISHED

        else:
            while True:
                if not with_bug: wait()
                if worker_status[w] == FINISHED:
                    break
                elif worker_status[w] == ASSIGNED:
                    row = worker_row[w]
                    worker_subresult[w] = 0
                    for j in range(size):
                        val = test_array[row, j]
                        for k in range(500 * val):
                            worker_subresult[w] += np.sqrt((np.random.random()-0.4))
                    worker_status[w] = DONE

    return result[0]

start = time.perf_counter()
print(test(array, with_bug=False))
stop = time.perf_counter()
print(stop - start)

I have two questions:

  • Is this a bug in Numba?
  • Is there a better way to parallelize the code?