Heterogeneous immutable string key dictionaries?


I’m trying to figure out how to use heterogeneous immutable string key dictionaries (new feature in release 0.51.0: https://numba.pydata.org/numba-doc/dev/release-notes.html). I tried this code block:

from numba import njit
import numpy as np

@njit
def foo():
    d = dict()
    d['three'] = np.arange(3)
    d['five'] = np.arange(5)
    d['six'] = 8
    return d

d = foo()
print(d)    # {3: [0 1 2], 5: [0 1 2 3 4]}

but it failed. My assumption was that as long as all the keys are strings, heterogeneous means the values can take on multiple different types. Can anyone provide an example of how to create a heterogeneous immutable string key dictionary? Also, where would I find this in the documentation? I’ve not been able to find anything. I’m using numba version 0.51.2.

Hi @calbaker, there’s an example in the release notebook: https://mybinder.org/v2/gh/numba/numba-examples/master?filepath=notebooks%2FNumba_051_release_demo.ipynb.

You have to create them as dictionary literals, because they are immutable. You built your example by mutating an empty dictionary, so d cannot be interpreted as an immutable dictionary.
Also, they apparently cannot be returned to interpreted code (probably a missing feature rather than a fundamental limitation), but the example below shows that they work inside compiled code:

>>> from numba import njit
>>> import numpy as np
>>> @njit
... def foo():
...     d = {'three': np.arange(3), 'five': np.arange(5), 'six': 8}
...     return d['three'], d['six']
...
>>> d = foo()
>>> print(d)
(array([0, 1, 2]), 8)

I hope this helps!



Hi @calbaker

@luk-f-a is spot on with the explanation, thanks @luk-f-a! I’ll expand on why it is like this below.

The docs should be searchable from the search box; if this sort of thing isn’t turning up, please report it as a bug, thanks! The requested documentation is here:

Expanding on “immutable string key dictionaries” a bit… Numba works on bytecode, not Python source, and it so happens that creating a dictionary with a literal such as x = {'a': 1, 'b': 2, ...} compiles to a very specific bytecode pattern.

Compare this:

In [10]: def foo():
    ...:     d = dict()
    ...:     d['three'] = np.arange(3)
    ...:     d['five'] = np.arange(5)
    ...:     d['six'] = 8
    ...:     return d

In [11]: from dis import dis

In [12]: dis(foo)
  2           0 LOAD_GLOBAL              0 (dict)
              2 CALL_FUNCTION            0
              4 STORE_FAST               0 (d)

  3           6 LOAD_GLOBAL              1 (np)
              8 LOAD_METHOD              2 (arange)
             10 LOAD_CONST               1 (3)
             12 CALL_METHOD              1
             14 LOAD_FAST                0 (d)
             16 LOAD_CONST               2 ('three')
             18 STORE_SUBSCR

  4          20 LOAD_GLOBAL              1 (np)
             22 LOAD_METHOD              2 (arange)
             24 LOAD_CONST               3 (5)
             26 CALL_METHOD              1
             28 LOAD_FAST                0 (d)
             30 LOAD_CONST               4 ('five')
             32 STORE_SUBSCR

  5          34 LOAD_CONST               5 (8)
             36 LOAD_FAST                0 (d)
             38 LOAD_CONST               6 ('six')
             40 STORE_SUBSCR

  6          42 LOAD_FAST                0 (d)
             44 RETURN_VALUE

with this

In [13]: def bar():
    ...:     d = {'three': np.arange(3), 'five': np.arange(5), 'six': 8}
    ...:     return d

In [14]: dis(bar)
  2           0 LOAD_GLOBAL              0 (np)
              2 LOAD_METHOD              1 (arange)
              4 LOAD_CONST               1 (3)
              6 CALL_METHOD              1
              8 LOAD_GLOBAL              0 (np)
             10 LOAD_METHOD              1 (arange)
             12 LOAD_CONST               2 (5)
             14 CALL_METHOD              1
             16 LOAD_CONST               3 (8)
             18 LOAD_CONST               4 (('three', 'five', 'six'))
             20 BUILD_CONST_KEY_MAP      3
             22 STORE_FAST               0 (d)

  3          24 LOAD_FAST                0 (d)
             26 RETURN_VALUE

Note how in the first example the bytecodes are all very general: dict comes in as a load from globals, is called, and the result is stored into d; the items are then added by loading constant keys and using STORE_SUBSCR. In the second example there’s the easily identified BUILD_CONST_KEY_MAP bytecode, which is a specialisation of BUILD_MAP for constant keys. It’s this that Numba spots, and if the keys are all strings it can then turn the dictionary into a heterogeneous immutable string key dictionary!
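The same comparison can be made programmatically. The following is only a small sketch using the standard library dis module; the function names (build_incrementally, build_as_literal, opnames) are made up for illustration, and plain lists stand in for the NumPy arrays so that nothing beyond the standard library is needed. The exact opcode names depend on the CPython version, so the sketch only looks for item assignment and for a map-building opcode rather than for one specific name.

```python
import dis


def build_incrementally():
    # Empty dict that is then mutated, as in the first example.
    d = dict()
    d['three'] = list(range(3))
    d['six'] = 8
    return d


def build_as_literal():
    # Dictionary literal with constant string keys, as in the second example.
    d = {'three': list(range(3)), 'six': 8}
    return d


def opnames(func):
    """Return the set of bytecode instruction names used by func."""
    return {ins.opname for ins in dis.get_instructions(func)}


incremental_ops = opnames(build_incrementally)
literal_ops = opnames(build_as_literal)

# Item assignment (STORE_SUBSCR) appears only in the incremental version.
print('STORE_SUBSCR' in incremental_ops, 'STORE_SUBSCR' in literal_ops)

# Map-building opcode(s) used by the literal version; the name varies
# between CPython versions (BUILD_CONST_KEY_MAP in the listing above).
print(sorted(op for op in literal_ops
             if op.startswith('BUILD_') and op.endswith('MAP')))
```

Neither function is ever called here; the sketch only inspects the compiled bytecode.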

These dictionaries cannot be returned at present. Just making them work at all was really complicated, and the no-return restriction was necessary to limit the scope of the work (I’m also not quite sure how to do this technically yet; inside the compiler they are essentially namedtuples!). It is, however, entirely legal for them to cross function boundaries, so they can be passed around as e.g. a configuration-like object.

Hope this helps?


@stuartarchibald, @luk-f-a

Thanks for the super helpful examples and explanations! I realize that I missed the immutable part in the way I was declaring the dictionary in my example, and that was the only thing wrong really. However, the way I am really hoping to use the immutable heterogeneous string key dictionaries is as a configuration object for jit classes. This is the small example I tried to use:

from numba import jitclass
from numba.types import LiteralStrKeyDict
@jitclass([('d', LiteralStrKeyDict),])
class MyJitClass(object):
    def __init__(self):
        self.d = {'three': np.arange(3), 'five': np.arange(5), 'six': 8}

This gives the error:
TypeError: spec values should be Numba type instances, got <class 'numba.core.types.containers.LiteralStrKeyDict'>

I should also say that my workaround for this is to create a configuration class that contains all the properly typed attributes that get used by my jitclass that does all the work. It may be that my solution is actually better than the heterogeneous immutable string key (HISK) dict. If the HISK dict can be implemented as a jitclass attribute, it’d be interesting to compare the two approaches. I like the HISK dict idea because it involves less overhead in adding new configuration parameters, as they don’t need to be explicitly typed.

That error is due to the difference between type classes and type instances in Numba. Float is a type class, float64 is a type instance. LiteralStrKeyDict is a type class. A jitclass declaration requires type instances, as the error message says.

This example shows how I thought this would work, but somehow it doesn’t. We probably need Stuart to help out. That said, if I had to do it I would not use a LiteralStrKeyDict. I would write each parameter as a jitclass parameter.

from numba.experimental import jitclass
from numba import types
from numba.types import LiteralStrKeyDict
@jitclass([('d', LiteralStrKeyDict( {types.StringLiteral('three'):  types.int64})),])
class MyJitClass(object):
    def __init__(self):
        self.d = {'three': 8}

Ah, thanks!

I think you’ve convinced me that my workaround described above is actually a good way to do this.

No problem. Perhaps try:

from numba.experimental import jitclass
from numba.types import LiteralStrKeyDict
from numba import types
import numpy as np

@jitclass([('d', LiteralStrKeyDict(
    {types.StringLiteral('three'): types.int64[::1],
     types.StringLiteral('five'): types.int64[::1],
     types.StringLiteral('six'): types.intp}))])
class MyJitClass(object):
    def __init__(self):
        self.d = {'three': np.arange(3), 'five': np.arange(5), 'six': 8}

    def get_three(self):
        return self.d['three']

jc = MyJitClass()

The type spec should be composed of type instances, not type classes. The type “spelling” has to be very specific, as there’s no/limited (I forget which) type inference in the jitclass constructor.

You might also find StructRef useful…