Numba crashing IPython kernel/python interpreter

Hey there,

since Numba sadly doesn’t support np.nanmin or np.nanmax under automatic parallelization, I (and someone else) tried to implement them ourselves:

Code
import numpy as np
import numba as nb
from math import isnan, inf

@nb.njit(fastmath=True)
def _minmax_nan(x):
    maximum = -inf
    minimum = inf
    for i in x:
        if not isnan(i):
            if i > maximum:
                maximum = i
            if i < minimum:
                minimum = i
    return minimum, maximum

@nb.njit(parallel=True)
def _minmax_chunks_nan(x, chunk_ranges):
    overall_maxima = []
    overall_minima = []
    for i in nb.prange(chunk_ranges.shape[0]):
        start = chunk_ranges[i, 0]
        end = chunk_ranges[i, 1]
        chunk_minimum, chunk_maximum = _minmax_nan(x[start : end])
        overall_maxima.append(chunk_maximum)
        overall_minima.append(chunk_minimum)
    return min(overall_minima), max(overall_maxima)

def even_chunk_sizes(dividend, divisor):
    quotient, remainder = divmod(dividend, divisor)
    cells = [quotient for _ in range(divisor)]
    for i in range(remainder):
        cells[i] += 1
    return cells

def even_chunk_ranges(dividend, divisor):
    sizes = even_chunk_sizes(dividend, divisor)
    ranges = []
    start = 0
    for s in sizes:
        end = start + s
        ranges.append((start, end))
        start = end
    return ranges

def nanminmax_parallel(x, n_chunks):
    chunk_ranges = np.array([
        [start, end]
        for start, end
        in even_chunk_ranges(len(x), n_chunks)
    ], dtype=np.int64)
    return _minmax_chunks_nan(x, chunk_ranges)
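For reference, here is a quick standalone sanity check of the pure-Python chunking helpers (the example values below are just an illustration for a length-10 input split into 4 chunks):

```python
# Standalone check of the chunking helpers from the listing above.

def even_chunk_sizes(dividend, divisor):
    # Split `dividend` items into `divisor` chunks whose sizes differ by at most 1.
    quotient, remainder = divmod(dividend, divisor)
    cells = [quotient for _ in range(divisor)]
    for i in range(remainder):
        cells[i] += 1
    return cells

def even_chunk_ranges(dividend, divisor):
    # Turn the chunk sizes into contiguous (start, end) index ranges.
    sizes = even_chunk_sizes(dividend, divisor)
    ranges = []
    start = 0
    for s in sizes:
        end = start + s
        ranges.append((start, end))
        start = end
    return ranges

print(even_chunk_sizes(10, 4))   # → [3, 3, 2, 2]
print(even_chunk_ranges(10, 4))  # → [(0, 3), (3, 6), (6, 8), (8, 10)]
```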

Doing things like this in a Jupyter notebook is a sure way to kill the kernel:

arr = np.random.rand(10)
%timeit nanminmax_parallel(arr, 4)

Even just calling the function repeatedly in quick succession seems to cause a crash.

Can someone help with this issue? Why does it crash, and seemingly only by chance?
Also, I’m pretty new to Numba and JIT compilation, so any suggestions for improving this piece of code would be much appreciated.

Best regards

PS: relevant github with ipynb+binder: GitHub - rynkk/misc-ipynbs

Hey rynkk ,

I don’t think it’s related to Jupyter (or timeit); I get similar crashes when running it from a normal .py file. When calling it in a loop, it crashes after anywhere between 0 and 5 executions for me (judging by what it prints to the terminal).

The crash disappears when I disable parallel=True in your _minmax_chunks_nan function. Perhaps appending to a list in parallel is not supported?

Regards,
Rutger

Hi @rynkk

@Rutger’s assessment is correct: the issue is that concurrent write operations on container types are not thread safe. The docs for the upcoming 0.54.0 release have this highlighted: Automatic parallelization with @jit — Numba 0.54.0rc1+0.g9bed2ebb2.dirty-py3.7-linux-x86_64.egg documentation

Hi @stuartarchibald and @Rutger,
thank you very much for the hint. I have altered _minmax_chunks_nan as follows, and now it works flawlessly:

@nb.njit(parallel=True)
def _minmax_chunks_nan(x, chunk_ranges):
    n_chunks = len(chunk_ranges)
    max_results = [-inf]*n_chunks
    min_results = [inf]*n_chunks
    for i in nb.prange(n_chunks):
        start = chunk_ranges[i, 0]
        end = chunk_ranges[i, 1]
        chunk_minimum, chunk_maximum = _minmax_nan(x[start : end])
        min_results[i] = chunk_minimum
        max_results[i] = chunk_maximum
            
    return min(min_results), max(max_results)