Hi, while running parallel Numba code, my threads would freeze with no warning or error. I found that the freeze can be avoided simply by generating a random number inside each thread's loop. I don't know whether this is a bug in Numba or whether something is wrong with my algorithm. The full code is very long, so I wrote a shorter version with the same parallel logic but without the main calculations. If the function is run with `with_bug=True`, the threads freeze; with `with_bug=False`, it finishes in about 40 s.

With this workaround, the code works perfectly: I get a ~6x speedup from the parallel version, and it runs without freezing even for calculations that take tens of minutes.

The main reason for this design is that, in the full code, `worker_subresult` is replaced by multiple variable-length lists, and the main thread consolidates them into a single main list with all values in the same order.
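To make that requirement concrete, here is a minimal pure-Python sketch (not Numba) of what the consolidation step is meant to achieve, assuming "same order" means row order. `process_row` and `consolidate` are hypothetical names used only for illustration; they are not part of my real code.

```python
from concurrent.futures import ThreadPoolExecutor


def process_row(row_index):
    """Hypothetical stand-in for the real per-row work.

    Returns a list whose length depends on the row, mimicking the
    variable-length lists produced in the full code.
    """
    return [row_index * 10 + k for k in range(row_index % 3 + 1)]


def consolidate(n_rows, n_workers=4):
    """Process rows concurrently and merge the lists in row order."""
    main_list = []
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Executor.map yields results in input order, regardless of
        # which worker finishes first.
        for sub_list in pool.map(process_row, range(n_rows)):
            main_list.extend(sub_list)
    return main_list


if __name__ == "__main__":
    print(consolidate(4))
```

This only illustrates the ordering requirement; it says nothing about how to get the same behavior efficiently inside an `@njit` function.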

Here is the code that reproduces the bug:

```
import time

from numba import njit, prange
import numpy as np

size = 500
array = np.random.binomial(127, 0.5, size = (size, size))

# Worker states, shared between the coordinating thread and the workers
IDLE = 0
ASSIGNED = 1
DONE = 2
FINISHED = 3

@njit
def wait():
    # Workaround: drawing a random number inside the loops avoids the freeze
    np.random.binomial(1, 0.5, size = 1)

@njit(parallel=True)
def test(test_array, with_bug):
    threads = 2
    size = test_array.shape[0]
    worker_status = np.zeros(threads-1, dtype=np.uint8)
    worker_row = np.zeros(threads-1, dtype=np.uint8)
    worker_subresult = np.zeros(threads-1, dtype=np.uint8)
    result = np.zeros(1, dtype=np.float64)
    next_row = 0
    for w in prange(threads):
        if w == (threads - 1):
            # Last index: hands out rows and collects the sub-results
            while True:
                if not with_bug:
                    wait()
                if (worker_status == FINISHED).all():
                    break
                for i in range(threads-1):
                    if worker_status[i] == IDLE:
                        if next_row < size:
                            worker_row[i] = next_row
                            next_row += 1
                            worker_status[i] = ASSIGNED
                        else:
                            worker_status[i] = FINISHED
                    elif worker_status[i] == ASSIGNED:
                        pass
                    elif worker_status[i] == DONE:
                        result[0] += worker_subresult[i]
                        if next_row < size:
                            worker_row[i] = next_row
                            next_row += 1
                            worker_status[i] = ASSIGNED
                        else:
                            worker_status[i] = FINISHED
        else:
            # Worker: spins until a row is assigned, processes it, reports DONE
            while True:
                if not with_bug:
                    wait()
                if worker_status[w] == FINISHED:
                    break
                elif worker_status[w] == ASSIGNED:
                    row = worker_row[w]
                    worker_subresult[w] = 0
                    for j in range(size):
                        val = test_array[row, j]
                        for k in range(500 * val):
                            worker_subresult[w] += np.sqrt((np.random.random()-0.4))
                    worker_status[w] = DONE
    return result[0]

start = time.perf_counter()
print(test(array, with_bug=False))
stop = time.perf_counter()
print(stop - start)
```

I have two questions:

- Is this a bug in Numba?
- Is there a better way to parallelize the code?
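
For comparison on the second question, the simplest alternative I am aware of is a plain `prange` over rows, where each iteration writes into its own slot and the merge happens after the loop. This is only a sketch under the assumption that each row produces a fixed-size result, which is not the case in my full code. `row_sums` is a hypothetical name, `np.sqrt` is a placeholder for the real work, and the `ImportError` fallback is there only so the snippet still runs without Numba installed.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:  # fallback so the sketch still runs without Numba
    prange = range

    def njit(*args, **kwargs):
        # Support both @njit and @njit(parallel=True)
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]

        def decorator(func):
            return func
        return decorator


@njit(parallel=True)
def row_sums(test_array):
    n_rows, n_cols = test_array.shape
    # One slot per row: each iteration writes only to its own slot,
    # so no status flags or busy-waiting are needed.
    per_row = np.zeros(n_rows, dtype=np.float64)
    for i in prange(n_rows):
        acc = 0.0
        for j in range(n_cols):
            acc += np.sqrt(test_array[i, j])  # placeholder for the real work
        per_row[i] = acc
    # Consolidation happens after the parallel loop, in row order.
    return per_row.sum(), per_row
```

What I don't see is how to extend this cleanly to variable-length lists per row, which is why I ended up with the coordinator/worker scheme above.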