How to use a sequence as dict key?

I need to index a dict with sequences.
In non-numba mode I normally use tuples, but in numba I can’t because the tuple() constructor is not supported. Is there any way to achieve this?

EDIT: I have found this PR: unify tuple creation intrinsics by ARF1 · Pull Request #5169 · numba/numba · GitHub - can it help?

Hi @ziofill

To create a numba.typed.Dict with tuple keys would require the keys to be all of the same type (same length, same dtype(s)), is this the case you have? i.e. this sort of thing:

In [1]: from numba.typed import Dict

In [2]: d = Dict()

In [3]: d[(1, 'a', 2j)] = 1

In [4]: d[(2, 'c', 5j)] = 2

Hi @stuartarchibald, thanks for the input. This could work, but in my case:

  • the keys have variable length
  • the length is known only at runtime

Even if I had a tuple constructor (something like the PR in the link above), would it still be problematic that the keys have variable length?

Hi @ziofil,

Numba’s typed.Dict requires that keys are all of the same type (Numba type that is) [docs]. Could you perhaps hash your sequence to create an int64 as the key and the just do all look-ups through that hash key generator?

That sounds like a possible way around it! I’ll try, thanks for the suggestion : )

No problem, if this works then there’s some things you can do with StructRef High-level extension API — Numba 0.52.0-py3.7-linux-x86_64.egg documentation to make the interface cleaner (i.e. automatically hash on access).

Out of interest, what is the type of these sequences?

Thanks for the link, this is a bit perhaps too involved for me, but with a bit of patience I could make it work. In non-numba mode the keys are tuples of positive integers.

Here’s something to get you started if you want to go down this route:

from numba import njit, literal_unroll, types
from numba.typed import Dict
import numpy as np
from numba.experimental import structref
from numba.extending import overload
import operator

# The idea here is to wrap a typed.Dict in another type, the "TupleKeyDictType".
# The purpose of this is so that operations like __getitem__ and __setitem__
# can be proxied through functions that call `hash` on the key. This makes it
# possible to have something that behaves like a dictionary, but supports
# heterogeneous keys (tuples of varying size/type).

# Define a the new type and register it
class TupleKeyDictType(types.StructRef):
    def preprocess_fields(self, fields):
        return tuple((name, types.unliteral(typ)) for name, typ in fields)

# Define the Python side proxy class
class TupleKeyDict(structref.StructRefProxy):
    def wrapped_dict(self):
        return TupleKeyDict_get_wrapped_dict(self)

# Set up the wiring for it, "wrapped_dict" is the only member in the "struct"
# and it refers to the typed.Dict instance in use
structref.define_proxy(TupleKeyDict, TupleKeyDictType, ["wrapped_dict"])

# Overload operator.getitem for the TupleKeyDictType, note how defers the look
# up to the wrapped_dict member and hashes the key
def ol_tkd_getitem(inst, key):
    if isinstance(inst, TupleKeyDictType):
        def impl(inst, key):
            return inst.wrapped_dict[hash(key)]
        return impl

# Overload operator.setitem for the TupleKeyDictType, again, it's hashing the
# key before use.
def ol_tkd_setitem(inst, key, value):
    if isinstance(inst, TupleKeyDictType):
        def impl(inst, key, value):
            inst.wrapped_dict[hash(key)] = value
        return impl

# quick demonstration
def foo(keys, values):
    # Create a dictionary to wrap
    wrapped_dictionary = Dict.empty(types.intp, types.complex128)
    # wrap it
    tkd_inst = TupleKeyDict(wrapped_dictionary)

    # Add some items, this is a bit contrived...
    # keys is heterogeneous in dtype (different sized tuples) so needs loop
    # body versioning for iteration (i.e. literal_unroll).
    idx = 0
    for k in literal_unroll(keys):
        tkd_inst[k] = values[idx]
        idx += 1

    # print the wrapped instance

    # demo getitem
    print("getitem", (1, 2), "gives", tkd_inst[(1, 2)])

keyz = ((1, 2), (3, 4, 5), (6,))
valuez = (1j, 2j, 3j)

foo(keyz, valuez)

hope this helps.

That’s amazing, thank you so much! :smiley:

No problem, do let us know how you get on :slight_smile:

I think I’ve got almost all of the parts to assemble what I need, the last obstacle is that I can’t seem to get this generator to be njitted:

def drop_elements(tup: tuple):
    "Yields each pair (element, tuple.drop(element)) of the given tuple [.drop() only for illustration]"
    for i,element in enumerate(tup):
        yield element, tup[:i] + tup[i+1:]

I think the problem is that the new tuple (the one obtained by dropping an element) should be constructed in a different way?

This probably does what you want, but note it’s using “unsafe” Numba-internal things:

from numba import njit
import numpy as np
from numba.cpython.unsafe.tuple import tuple_setitem

def drop_elements(tup: tuple):
    for x in range(len(tup)):
        empty = tup[:-1] # it's not empty, but its the right size and will be mutated
        idx = 0
        for i in range(len(tup)):
            if i != x:
                empty = tuple_setitem(empty, idx, tup[i])
                idx += 1
        yield tup[x], empty

z = [x for x in drop_elements((1,2,3))]

the reason the expression yield element, tup[:i] + tup[i+1:] won’t compile is that i isn’t constant and so Numba can’t work out what size the tuple resulting from tup[:i] should be, essentially, dynamic slicing of tuples isn’t supported.

wow, there are so many useful functions hidden (to me) inside the modules of numba. I didn’t know about tuple_setitem!

By the way, this code is part of a research paper on quantum computing. If this works out I’ll get back to you to add you to the acknowledgements.

No problem. Glad this solve the issue for you :slight_smile:

This FAQ entry on referencing/citing Numba may be useful: Frequently Asked Questions — Numba 0.52.0-py3.7-linux-x86_64.egg documentation

Also, for specific releases, DOI are provided by, there’s a link on the README.rst at GitHub - numba/numba: NumPy aware dynamic Python compiler using LLVM to the latest release DOI (It’s a badge at the top).