I’m trying to remap all values in an array according to some one-to-one correspondence. This can be done with the `skimage.util.map_array` function, inner-loop implementation here. (There is a pure Python wrapper that takes care of the array shapes and array allocation here.)

Here’s how it looks in practice:

```
In [10]: values = np.random.randint(0, 5, size=10)
In [11]: inval = np.arange(5)
In [12]: outval = np.random.random(5)
In [13]: values
Out[13]: array([0, 0, 4, 0, 3, 2, 0, 2, 0, 2])
In [14]: inval
Out[14]: array([0, 1, 2, 3, 4])
In [15]: outval
Out[15]: array([0.595442 , 0.22325946, 0.16452037, 0.70457358, 0.37474462])
In [16]: map_array(values, inval, outval)
Out[16]:
array([0.595442 , 0.595442 , 0.37474462, 0.595442 , 0.70457358,
       0.16452037, 0.595442 , 0.16452037, 0.595442 , 0.16452037])
```
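For reference, the behaviour of the call above can be sketched in pure NumPy without building a dense table: sort the keys once, then binary-search every element with `np.searchsorted`. This is only an illustrative alternative, not how scikit-image implements it; the helper name `map_array_sorted` is hypothetical, and writing 0 for values missing from `inval` is a choice made in this sketch.

```python
import numpy as np


def map_array_sorted(input_arr, input_vals, output_vals):
    """Hypothetical helper: map each element of input_arr via input_vals -> output_vals.

    Values of input_arr that do not occur in input_vals are set to 0 in this sketch.
    """
    order = np.argsort(input_vals)
    sorted_keys = input_vals[order]
    sorted_outs = output_vals[order]
    # Binary-search each element's insertion point among the sorted keys
    idx = np.searchsorted(sorted_keys, input_arr)
    # Clip so out-of-range elements still index safely, then check for an exact match
    idx = np.clip(idx, 0, len(sorted_keys) - 1)
    found = sorted_keys[idx] == input_arr
    return np.where(found, sorted_outs[idx], 0)


values = np.array([0, 0, 4, 0, 3, 2, 0, 2, 0, 2])
inval = np.arange(5)
outval = np.array([0.595442, 0.22325946, 0.16452037, 0.70457358, 0.37474462])
mapped = map_array_sorted(values, inval, outval)
```

The memory cost is proportional to the number of keys rather than to their magnitude, at the price of an O(log n) search per element.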

This works well, but it’s about 4x slower than plain NumPy array indexing, as in `outval[values]`:

```
In [39]: image = np.random.randint(0, 5, size=(2048, 2048))
In [40]: %timeit map_array(image, inval, outval)
35.6 ms ± 249 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [41]: %timeit outval[image]
9.48 ms ± 177 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
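The `outval[image]` comparison works because `inval` here is exactly `np.arange(5)`, so `outval` itself doubles as the lookup table. For arbitrary non-negative integer keys, the same indexing trick needs a dense table sized to the largest key. A minimal sketch, where the helper name `dense_lut` is hypothetical and unmapped slots are left at 0:

```python
import numpy as np


def dense_lut(input_vals, output_vals):
    """Hypothetical helper: build lut so that lut[input_vals[i]] == output_vals[i].

    Assumes input_vals are non-negative integers; slots for unmapped keys stay 0.
    """
    # One slot for every key from 0 up to the largest key
    lut = np.zeros(int(input_vals.max()) + 1, dtype=output_vals.dtype)
    lut[input_vals] = output_vals
    return lut


rng = np.random.default_rng(0)
inval = np.array([3, 10, 7])
outval = np.array([0.5, 1.0, 0.25])
image = rng.choice(inval, size=(4, 4))
remapped = dense_lut(inval, outval)[image]  # plain NumPy fancy indexing into the table
```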

And the problem with the NumPy indexing approach is that you end up with a really huge `outval` array if the values in `image` are large, even if you don’t actually have many of them. (For example, to map `2**32` and `2**32+1` to `0.5` and `1`, the lookup table needs more than four billion entries, which is about 32 GB as `float64`!)
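The size follows from the table needing one slot for every key from 0 up to the largest key, so its memory is (max key + 1) × itemsize. A quick check for the example keys at a few output dtypes (the function name `dense_lut_nbytes` is just for illustration):

```python
import numpy as np


def dense_lut_nbytes(max_key, dtype):
    """Bytes needed for a dense lookup table indexed by every key 0..max_key."""
    return (max_key + 1) * np.dtype(dtype).itemsize


for dtype in (np.uint8, np.float32, np.float64):
    gib = dense_lut_nbytes(2**32 + 1, dtype) / 2**30
    print(f"{np.dtype(dtype).name}: {gib:.1f} GiB")
# prints:
# uint8: 4.0 GiB
# float32: 16.0 GiB
# float64: 32.0 GiB
```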

I thought I’d give Numba a go since dictionaries were implemented “recently”. (Thank you!) That turns out to be ~2x slower still than the C++ `unordered_map` approach.

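```
import numba


@numba.jit
def _map_array(inarr, outarr, inval, outval):
    # Build a lookup table from each input value to its output value
    lut = {}
    for i in range(len(inval)):
        lut[inval[i]] = outval[i]
    # Remap every element of the flattened input into the flattened output
    for i in range(len(inarr)):
        outarr[i] = lut[inarr[i]]
```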
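For completeness, here is one way the kernel might be driven on a 2D image: flatten the input, allocate a flat output, call the kernel, and reshape. The wrapper name `map_array_numba` is hypothetical, and the `try`/`except` fallback exists only so this sketch also runs (slowly, in pure Python) without Numba installed.

```python
import numpy as np

try:
    import numba
    jit = numba.jit
except ImportError:  # fallback so the sketch runs without Numba
    def jit(func):
        return func


@jit
def _map_array(inarr, outarr, inval, outval):
    # Same kernel as above: dict lookup table, then one lookup per element
    lut = {}
    for i in range(len(inval)):
        lut[inval[i]] = outval[i]
    for i in range(len(inarr)):
        outarr[i] = lut[inarr[i]]


def map_array_numba(image, inval, outval):
    """Hypothetical wrapper: remap an N-d image with the flat kernel."""
    flat_in = np.ascontiguousarray(image).ravel()  # a view when image is C-contiguous
    # Allocate the output flat, so the kernel writes into memory we keep
    flat_out = np.empty(flat_in.shape, dtype=outval.dtype)
    _map_array(flat_in, flat_out, inval, outval)
    return flat_out.reshape(image.shape)


image = np.random.randint(0, 5, size=(64, 64))
inval = np.arange(5)
outval = np.random.random(5)
result = map_array_numba(image, inval, outval)
```

Allocating the output flat sidesteps a subtle pitfall: if `outarr` were not C-contiguous, `outarr.ravel()` would return a copy, and the kernel’s writes would be lost.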

Measurement:

```
In [30]: nd._map_array(image.ravel(), outarr.ravel(), inval, outval)
In [31]: %timeit nd._map_array(image.ravel(), outarr.ravel(), inval, outval)
69.6 ms ± 1.35 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

Any ideas on how to speed this up?

Thank you!