How to use a sequence as dict key?

ziofil · January 13, 2021, 2:31pm

I need to index a dict with sequences.
In non-numba mode I normally use tuples, but in numba I can’t because the tuple() constructor is not supported. Is there any way to achieve this?

EDIT: I have found this PR: unify tuple creation intrinsics by ARF1 · Pull Request #5169 · numba/numba · GitHub - can it help?

stuartarchibald · January 14, 2021, 9:16pm

Hi @ziofill

To create a numba.typed.Dict with tuple keys would require the keys to be all of the same type (same length, same dtype(s)), is this the case you have? i.e. this sort of thing:

In [1]: from numba.typed import Dict

In [2]: d = Dict()

In [3]: d[(1, 'a', 2j)] = 1

In [4]: d[(2, 'c', 5j)] = 2

ziofil · January 15, 2021, 2:44pm

Hi @stuartarchibald, thanks for the input. This could work, but in my case:

the keys have variable length
the length is known only at runtime

Even if I had a tuple constructor (something like the PR in the link above), would it still be problematic that the keys have variable length?

stuartarchibald · January 15, 2021, 6:25pm

Hi @ziofil,

Numba’s typed.Dict requires that keys are all of the same type (Numba type that is) [docs]. Could you perhaps hash your sequence to create an int64 as the key and the just do all look-ups through that hash key generator?

ziofil · January 15, 2021, 8:12pm

That sounds like a possible way around it! I’ll try, thanks for the suggestion : )

stuartarchibald · January 18, 2021, 9:55am

No problem, if this works then there’s some things you can do with StructRef High-level extension API — Numba 0.52.0-py3.7-linux-x86_64.egg documentation to make the interface cleaner (i.e. automatically hash on access).

Out of interest, what is the type of these sequences?

ziofil · January 18, 2021, 1:12pm

Thanks for the link, this is a bit perhaps too involved for me, but with a bit of patience I could make it work. In non-numba mode the keys are tuples of positive integers.

stuartarchibald · January 19, 2021, 10:40am

Here’s something to get you started if you want to go down this route:

from numba import njit, literal_unroll, types
from numba.typed import Dict
import numpy as np
from numba.experimental import structref
from numba.extending import overload
import operator

# The idea here is to wrap a typed.Dict in another type, the "TupleKeyDictType".
# The purpose of this is so that operations like __getitem__ and __setitem__
# can be proxied through functions that call `hash` on the key. This makes it
# possible to have something that behaves like a dictionary, but supports
# heterogeneous keys (tuples of varying size/type).

# Define a the new type and register it
@structref.register
class TupleKeyDictType(types.StructRef):
    def preprocess_fields(self, fields):
        return tuple((name, types.unliteral(typ)) for name, typ in fields)

# Define the Python side proxy class
class TupleKeyDict(structref.StructRefProxy):
    @property
    def wrapped_dict(self):
        return TupleKeyDict_get_wrapped_dict(self)

# Set up the wiring for it, "wrapped_dict" is the only member in the "struct"
# and it refers to the typed.Dict instance in use
structref.define_proxy(TupleKeyDict, TupleKeyDictType, ["wrapped_dict"])

# Overload operator.getitem for the TupleKeyDictType, note how defers the look
# up to the wrapped_dict member and hashes the key
@overload(operator.getitem)
def ol_tkd_getitem(inst, key):
    if isinstance(inst, TupleKeyDictType):
        def impl(inst, key):
            return inst.wrapped_dict[hash(key)]
        return impl

# Overload operator.setitem for the TupleKeyDictType, again, it's hashing the
# key before use.
@overload(operator.setitem)
def ol_tkd_setitem(inst, key, value):
    if isinstance(inst, TupleKeyDictType):
        def impl(inst, key, value):
            inst.wrapped_dict[hash(key)] = value
        return impl

# quick demonstration
@njit
def foo(keys, values):
    # Create a dictionary to wrap
    wrapped_dictionary = Dict.empty(types.intp, types.complex128)
    # wrap it
    tkd_inst = TupleKeyDict(wrapped_dictionary)

    # Add some items, this is a bit contrived...
    # keys is heterogeneous in dtype (different sized tuples) so needs loop
    # body versioning for iteration (i.e. literal_unroll).
    idx = 0
    for k in literal_unroll(keys):
        tkd_inst[k] = values[idx]
        idx += 1

    # print the wrapped instance
    print(tkd_inst.wrapped_dict)

    # demo getitem
    print("getitem", (1, 2), "gives", tkd_inst[(1, 2)])

keyz = ((1, 2), (3, 4, 5), (6,))
valuez = (1j, 2j, 3j)

foo(keyz, valuez)

hope this helps.

ziofil · January 19, 2021, 2:48pm

That’s amazing, thank you so much!

stuartarchibald · January 19, 2021, 5:37pm

No problem, do let us know how you get on

ziofil · January 20, 2021, 9:56am

I think I’ve got almost all of the parts to assemble what I need, the last obstacle is that I can’t seem to get this generator to be njitted:

@njit
def drop_elements(tup: tuple):
    "Yields each pair (element, tuple.drop(element)) of the given tuple [.drop() only for illustration]"
    for i,element in enumerate(tup):
        yield element, tup[:i] + tup[i+1:]

I think the problem is that the new tuple (the one obtained by dropping an element) should be constructed in a different way?

stuartarchibald · January 20, 2021, 10:54am

This probably does what you want, but note it’s using “unsafe” Numba-internal things:

from numba import njit
import numpy as np
from numba.cpython.unsafe.tuple import tuple_setitem

@njit
def drop_elements(tup: tuple):
    for x in range(len(tup)):
        empty = tup[:-1] # it's not empty, but its the right size and will be mutated
        idx = 0
        for i in range(len(tup)):
            if i != x:
                empty = tuple_setitem(empty, idx, tup[i])
                idx += 1
        yield tup[x], empty

z = [x for x in drop_elements((1,2,3))]
print(z)

the reason the expression yield element, tup[:i] + tup[i+1:] won’t compile is that i isn’t constant and so Numba can’t work out what size the tuple resulting from tup[:i] should be, essentially, dynamic slicing of tuples isn’t supported.

ziofil · January 20, 2021, 2:40pm

wow, there are so many useful functions hidden (to me) inside the modules of numba. I didn’t know about tuple_setitem!

By the way, this code is part of a research paper on quantum computing. If this works out I’ll get back to you to add you to the acknowledgements.

stuartarchibald · January 21, 2021, 12:39pm

No problem. Glad this solve the issue for you

This FAQ entry on referencing/citing Numba may be useful: Frequently Asked Questions — Numba 0.52.0-py3.7-linux-x86_64.egg documentation

Also, for specific releases, DOI are provided by zenodo.org, there’s a link on the README.rst at GitHub - numba/numba: NumPy aware dynamic Python compiler using LLVM to the latest release DOI (It’s a badge at the top).

ulupo · January 26, 2021, 5:26pm

I was about to open a new thread but having found this one I see that there is a lot of overlap with my own issue. Please feel free to mark this reply as off-topic and I will happily create a new thread if that’s better.

I am currently trying to njit an algorithm for processing Delaunay triangulations, with the following features:

The inputs are an array X and a homogeneous list simplices of tuples of integers. The shape of X and the length of each tuple are arbitrary, but what is always true is that X.shape[1] equals the length of each tuple in simplices plus 1. Call this the “dimension”, dim. The tuples may be assumed to be sorted.
In pure Python, I would have the algorithm work with and return a dictionary d with keys all integers from 1 to dim (included). To key i there would correspond another dictionary whose keys are all the sub-tuples of length i + 1 of the tuples in simplices, and whose values are floats. Example: d = {1: {(0, 1): 0., (0, 2): 0., (1, 2): 0.}, 2: {(0, 1, 2): 0.}} for dim equal to 2. Notice that d is inhomogeneous in values (and homogeneous in keys), but each d[i] is homogeneous (in both keys and values).
In the inner workings of the algorithm, and again assuming pure Python, each d[i] is constructed iteratively from d[i + 1]. In particular, one would want to write loops as follows:

# sigma is a tuple in d[i + 1].keys()
for k in range(i + 1):
    tau = sigma[:i] + sigma[i + 1:]
    # And do stuff with tau

Now I have at least two problems trying to njit this algorithm. One comes from point 2, because I would like to have an inhomogeneous dictionary like d. The second comes from 3, because numba is not happy with non-constant slicing. This seems to be just the same as @ziofil’s situation so I wonder if the recommendation is still to use tuple_setitem. As @ziofil already asked, I wonder if https://github.com/numba/numba/pull/5169 is related.

Thanks in advance for your help! Happy to provide more context if anything is unclear.

stuartarchibald · January 26, 2021, 5:42pm

Hi @ulupo

This sounds like an interesting problem, some of the suggestion in this thread are indeed related. In the interest of not mixing issues, perhaps open a new topic on this, ideally with a small reproducing example that can be run (i.e. python example.py) so that someone can take a look. I expect a “way through” can be found Thanks!

ulupo · January 26, 2021, 5:51pm

Thanks @stuartarchibald! Will do!

ulupo · January 26, 2021, 7:02pm

Perhaps because of substantial text overlap with the reply above, my new post has now been flagged as spam and hidden. Is there anything I can do @stuartarchibald?

stuartarchibald · January 26, 2021, 7:17pm

Thanks, I think I’ve sorted it out, please check and ping on gitter.im if there’s still an issue. Thanks again!

ulupo · January 27, 2021, 9:49am

stuartarchibald:

# quick demonstration
@njit
def foo(keys, values):
    # Create a dictionary to wrap
    wrapped_dictionary = Dict.empty(types.intp, types.complex128)
    # wrap it
    tkd_inst = TupleKeyDict(wrapped_dictionary)

    # Add some items, this is a bit contrived...
    # keys is heterogeneous in dtype (different sized tuples) so needs loop
    # body versioning for iteration (i.e. literal_unroll).
    idx = 0
    for k in literal_unroll(keys):
        tkd_inst[k] = values[idx]
        idx += 1

    # print the wrapped instance
    print(tkd_inst.wrapped_dict)

    # demo getitem
    print("getitem", (1, 2), "gives", tkd_inst[(1, 2)])

keyz = ((1, 2), (3, 4, 5), (6,))
valuez = (1j, 2j, 3j)

foo(keyz, valuez)

~~It seems that this demonstration fails for me when the dictionary value type is changed to types.float64 and we pass valuez = (1., 2., 3.). The error message is~~

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<built-in function setitem>) found for signature:
 
 >>> setitem(numba.TupleKeyDictType(('wrapped_dict', DictType[int64,float64]<iv=None>),), UniTuple(int64 x 2), float64)

The original version with complex numbers works. It also works if the value type is changed to types.float32, although it then throws a warning due to the conversion from float64 to float32! Any ideas why this might be happening? Thanks in advance!

UPDATE After restarting my jupyter notebook, this seems to have disappeared! I’ll keep you posted in case new issues emerge, but for now you can ignore all the above.

Topic		Replies	Views
Dynamic, heterogeneous dictionaries with tuple keys and non-constant tuple slicing Support: How do I do ...?	23	1396	January 28, 2021
How To Use Integer Sequence As Key of Typed Dict Support: How do I do ...?	7	788	December 14, 2021
How do I use tuples of different length as values in a dictionary Support: How do I do ...?	1	378	March 13, 2023
Dicts with tuple keys not working in 0.52.0 Community Support	3	496	December 11, 2020
Heterogeneous immutable string key dictionaries? Community Support	7	1337	October 30, 2020

How to use a sequence as dict key?

Related topics