How do I create this lookup table in Numba?

EDIT: I realized that Numba says you cannot write to the dictionary in parallel, so I'd lose the ability to parallelize if I did this, and that is probably not worth the trade-off.

Hello! I am attempting to precalculate a lookup table for an entire 2D grid. I successfully did it with a 2D array holding the results, but it would be faster to access if I had a dictionary instead. Here is what I currently have:

import numba
from numba.typed import Dict
from numba.core import types


@numba.jit(parallel=True)
def numba_create_world_lookup_table(local_x_scale, local_y_scale, local_height, local_width):
    lookup_table = Dict.empty(types.unicode_type, numba.float64[:])
    for x in numba.prange(local_width + 1):
        for y in numba.prange(local_height + 1):
            key = (x, y)
            scaled_x_coord = x / local_x_scale
            scaled_y_coord = y / local_y_scale
            lookup_table[key] = (scaled_x_coord, scaled_y_coord)
    return lookup_table

I have tried a 2D dict, using a str as the key, and a few other things, with no luck.

I get the following error:

No implementation of function Function(<built-in function setitem>) found for signature:
 
 >>> setitem(DictType[unicode_type,array(float64, 1d, A)]<iv=None>, unicode_type, reflected list(float64)<iv=None>)
 
There are 16 candidate implementations:
  - Of which 14 did not match due to:
  Overload of function 'setitem': File: <numerous>: Line N/A.
    With argument(s): '(DictType[unicode_type,array(float64, 1d, A)]<iv=None>, unicode_type, reflected list(float64)<iv=None>)':
   No match.
  - Of which 2 did not match due to:
  Overload in function 'impl_setitem': File: numba\typed\dictobject.py: Line 706.
    With argument(s): '(DictType[unicode_type,array(float64, 1d, A)]<iv=None>, unicode_type, reflected list(float64)<iv=None>)':
   Rejected as the implementation raised a specific error:
     NumbaNotImplementedError: Failed in nopython mode pipeline (step: native lowering)
   Cannot cast reflected list(float64)<iv=None> to array(float64, 1d, A): %"inserted.parent.1" = insertvalue {i8*, i8*} %"inserted.meminfo.2", i8* %"arg.value.1", 1
   During: lowering "castedval = call $38load_global.5(value, $52load_deref.8, func=$38load_global.5, args=[Var(value, dictobject.py:714), Var($52load_deref.8, dictobject.py:716)], kws=(), vararg=None, varkwarg=None, target=None)" at C:\Projects\mpegs\venv\Lib\site-packages\numba\typed\dictobject.py (716)
  raised from C:\Projects\mpegs\venv\Lib\site-packages\numba\core\base.py:704

During: typing of setitem at C:\Projects\mpegs\venv\Lib\site-packages\numba\typed\typeddict.py (34)

File "..\..\venv\Lib\site-packages\numba\typed\typeddict.py", line 34:
def _setitem(d, key, value):
    d[key] = value
    ^

Can you provide a complete program of what you’re trying to run?

Separately, I think the typed dict is not thread-safe.

Hi @Lash-L

With the line lookup_table = Dict.empty(types.unicode_type, numba.float64[:]) you tell Numba that the key is a string and the value is a one-dimensional array of floats. With lookup_table[key] = (scaled_x_coord, scaled_y_coord) you try to insert a key-value pair where the key is a types.UniTuple(types.int64, 2) and the value is a types.UniTuple(types.float64, 2). So there are two solutions to this problem. Either you specify the types correctly as follows:

key_type = types.UniTuple(types.int64, 2)
val_type = types.UniTuple(types.float64, 2)

@numba.njit
def numba_create_world_lookup_table(local_x_scale, local_y_scale, local_height, local_width):
    lookup_table = Dict.empty(key_type, val_type)
    ...

Or you omit the types and let Numba infer them:

@numba.njit
def numba_create_world_lookup_table(local_x_scale, local_y_scale, local_height, local_width):
    lookup_table = {}
    ...
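
For reference, here is a complete sketch of the first variant (the scales and grid size in the example call at the bottom are made up for illustration). Note the plain range instead of prange: as mentioned above, writing to the typed dict from parallel threads isn't supported, so the parallel loop is dropped:

import numba
from numba.core import types
from numba.typed import Dict

key_type = types.UniTuple(types.int64, 2)
val_type = types.UniTuple(types.float64, 2)


@numba.njit
def numba_create_world_lookup_table(local_x_scale, local_y_scale, local_height, local_width):
    # Keys are (x, y) integer tuples; values are (scaled_x, scaled_y) float tuples.
    lookup_table = Dict.empty(key_type, val_type)
    for x in range(local_width + 1):
        for y in range(local_height + 1):
            lookup_table[(x, y)] = (x / local_x_scale, y / local_y_scale)
    return lookup_table


table = numba_create_world_lookup_table(2.0, 3.0, 10, 10)
print(table[(4, 6)])  # (2.0, 2.0)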

I’m interested in seeing if the dictionary is indeed faster. Can you report your findings here?

My intuition is that the dictionary is going to be orders of magnitude slower for both reads and writes, irrespective of whether the code can run in parallel. Hash tables (dictionaries) need to hash their keys, go through a few steps to find the bucket where the value lives, check the bucket entries for equality, and then return the value for the matching entry. That's slow because of the extra steps, because more data structures are touched along the way (putting more stress on the L1 cache), and because each access is essentially blind, so the program can't take shortcuts based on guarantees about which memory addresses will be touched ahead of time. Nothing is faster than reading from and writing to contiguous arrays, so I'm guessing your old array implementation is about as fast as you're going to get. It's always a good idea to check differences in execution time between implementations, though.
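
If you do want to measure it, here is a minimal timing sketch. The grid size, scales, and summing workload are all assumptions made up for illustration; it fills both structures with the same data under njit and then times a full pass of reads through each:

import time

import numba
import numpy as np
from numba.core import types
from numba.typed import Dict

W, H = 1000, 1000            # assumed grid size
X_SCALE, Y_SCALE = 2.0, 3.0  # assumed scales


@numba.njit
def fill_array(w, h, xs, ys):
    # The value for (x, y) lives at table[x, y, :].
    table = np.empty((w + 1, h + 1, 2), dtype=np.float64)
    for x in range(w + 1):
        for y in range(h + 1):
            table[x, y, 0] = x / xs
            table[x, y, 1] = y / ys
    return table


@numba.njit
def fill_dict(w, h, xs, ys):
    # The same data, keyed by an (int64, int64) tuple.
    table = Dict.empty(types.UniTuple(types.int64, 2),
                       types.UniTuple(types.float64, 2))
    for x in range(w + 1):
        for y in range(h + 1):
            table[(x, y)] = (x / xs, y / ys)
    return table


@numba.njit
def sum_array(table):
    # Read every entry once via direct indexing.
    total = 0.0
    for x in range(table.shape[0]):
        for y in range(table.shape[1]):
            total += table[x, y, 0] + table[x, y, 1]
    return total


@numba.njit
def sum_dict(table, w, h):
    # Read every entry once via a hashed key lookup.
    total = 0.0
    for x in range(w + 1):
        for y in range(h + 1):
            vx, vy = table[(x, y)]
            total += vx + vy
    return total


# Warm up so compilation time is excluded, then time the reads.
arr = fill_array(W, H, X_SCALE, Y_SCALE)
dct = fill_dict(W, H, X_SCALE, Y_SCALE)
sum_array(arr)
sum_dict(dct, W, H)

t0 = time.perf_counter()
sum_array(arr)
t1 = time.perf_counter()
sum_dict(dct, W, H)
t2 = time.perf_counter()
print(f"array reads: {t1 - t0:.6f}s   dict reads: {t2 - t1:.6f}s")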