How do I use thread locks in nopython mode?

There is prior discussion (not from me) here.

The pattern I want to make work is this one:

import numba as nb
import numpy as np


@nb.jit(nopython=True, nogil=True, parallel=True)
def remap(arr):
    arr = arr.copy()
    assert arr.ndim == 1

    mapped_idx = 0

    for idx in nb.prange(arr.shape[0]):
        el = arr[idx]
        if el == -2:
            arr[idx] = mapped_idx
            mapped_idx += 1  # <- That's where there is a race condition.
    return arr

remap_nopara = nb.jit(nopython=True, nogil=True)(remap.py_func)

a = np.random.randint(-2, 3, size=100)
nb.set_num_threads(4)

b = remap(a)
c = remap_nopara(a)
assert np.all(b == c), (b, c)

So, is there a way to protect mapped_idx with a lock or semaphore in Numba? If this were just a reduction, it would be easy.
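
To make the target behaviour concrete: even with a lock around mapped_idx, the order in which threads take the lock is not fixed, so the parallel result would not necessarily match the serial one. One lock-free alternative that does reproduce the serial numbering is a two-pass scheme: count the -2 entries per chunk, take an exclusive prefix sum of those counts, then let each chunk number its own entries starting from its offset. Below is a minimal pure-Python sketch of that idea. The names remap_two_pass, remap_serial, n_chunks and marker are hypothetical, and I have not checked how this behaves once compiled with numba; the two per-chunk loops are the parts that would presumably go under nb.prange.

```python
def remap_two_pass(values, n_chunks=4, marker=-2):
    """Replace each `marker` with its 0-based rank among all markers.

    Sketch only: the per-chunk loops are written serially here, but each
    chunk touches only its own slice and its own counter, so no shared
    counter is updated concurrently.
    """
    out = list(values)  # work on a copy, like arr.copy() in the question
    n = len(out)
    # Chunk c covers the half-open range [bounds[c], bounds[c + 1]).
    bounds = [n * c // n_chunks for c in range(n_chunks + 1)]

    # Pass 1 (independent per chunk): count markers in each chunk.
    counts = [0] * n_chunks
    for c in range(n_chunks):
        for i in range(bounds[c], bounds[c + 1]):
            if out[i] == marker:
                counts[c] += 1

    # Serial exclusive prefix sum: the first index each chunk hands out.
    offsets = [0] * n_chunks
    for c in range(1, n_chunks):
        offsets[c] = offsets[c - 1] + counts[c - 1]

    # Pass 2 (independent per chunk): number markers from the chunk offset.
    for c in range(n_chunks):
        next_idx = offsets[c]
        for i in range(bounds[c], bounds[c + 1]):
            if out[i] == marker:
                out[i] = next_idx
                next_idx += 1
    return out


def remap_serial(values, marker=-2):
    """Reference version: the plain serial loop from the question."""
    out = list(values)
    next_idx = 0
    for i, el in enumerate(out):
        if el == marker:
            out[i] = next_idx
            next_idx += 1
    return out


data = [-2, 0, 1, -2, -2, 2, -1, -2]
assert remap_two_pass(data, n_chunks=3) == remap_serial(data)
```

The only serial step is the prefix sum over n_chunks counters, which is tiny compared with the array, so the cost is one extra pass over the data rather than contention on a lock.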