Numba Dict function not compiling

Hi,

I am new to Numba,

I am trying to create a dictionary with uints as keys and lists of uints as values. The dictionary will hold a few hundred million to a few tens of billions of keys, so to gain speed and efficiency over the Python dictionary I came across Numba and would like to use it.

Python = 3.7.7
Numba = 0.53.1

```python
from numba import types, typed, njit

@njit
def numba_dict(list_of_lists):
    key_dtype = types.uint64
    value_dtype = types.uint8
    bf_sparse_dict = typed.Dict.empty(
        key_type=key_dtype, value_type=types.ListType(value_dtype)
    )

    for idx, inner_list in enumerate(list_of_lists):
        for key in inner_list:
            if key not in bf_sparse_dict:
                bf_sparse_dict[key] = typed.List.empty_list(value_dtype)
                bf_sparse_dict[key].append(idx)
            else:
                bf_sparse_dict[key].append(idx)

    return bf_sparse_dict
```

When I try to run this function from a Python file, I get the following error:

```
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<class 'numba.core.types.containers.ListType'>) found for signature:

 >>> ListType(class(uint8))

There are 2 candidate implementations:
  - Of which 2 did not match due to:
  Overload in function 'ListType': File: numba/core/extending.py: Line 39.
    With argument(s): '(class(uint8))':
   Rejected as the implementation raised a specific error:
     TypingError: List() argument must be iterable
  raised from /home/ssk/anaconda3/envs/zarrtest/lib/python3.7/site-packages/numba/typed/typedlist.py:561

During: resolving callee type: Function(<class 'numba.core.types.containers.ListType'>)
During: typing of call at bitarray_leveldb_sparse_index.py (270)


File "bitarray_leveldb_sparse_index.py", line 270:
def numba_dict(bloomfilters_list):
    <source elided>
    bf_sparse_dict = typed.Dict.empty(
        key_type=types.uint64, value_type=types.ListType(types.uint8)
```

However, the same piece of code works completely fine in the interpreter on the command line (i.e. without compilation via the @njit decorator).

Any pointers as to what I am doing wrong would be really helpful.

Thank you!

Hi @sanjaysrikakulam and welcome to the board! :slight_smile:

First of all, regarding your bug: I think this is caused by your use of types.ListType inside an njitted function. It may be a bit counter-intuitive, but as far as I know, not all of the typing machinery that Numba ships can itself be used in compiled functions. This only concerns the type definition itself; the resulting type is of course intended to be used with Numba. A good way around this is to simply move the type definitions outside of your function. Since your types are truly constant and not inferred from the incoming data, that should not be an issue.

Second: it seems that the slight modification I proposed then raises a different error, related to the fact that you are passing a Python list of lists into the function (I hope I am correct on this). To Numba these are so-called reflected lists, since it has to reflect all mutations that happen inside compiled code back to the Python interpreter. (You are not mutating the incoming list here as far as I can tell, but due to the nature of and promises around reflected lists, Numba probably still enforces the same criteria; maybe someone can fact-check me on this.) It seems that Numba cannot deal with reflected lists of lists at this point; at least even the simplest possible scenario, immediately returning list_of_lists, failed for me.
If at all possible, you may be better off doing the conversion from Python into Numba data types outside the njitted function. I also received a deprecation warning announcing that reflected-list support inside compiled functions may be dropped in the long run, so building a new solution on reflected lists is probably not the way to go.

Third: here is where my comments become a bit hazy. You mentioned that your dict may have tens of billions of entries. Even if we account only 2 bytes per item in the dict (very optimistic), that would amount to tens of GB of data. I wonder if dicts are the way to go for such a massive dataset. That of course depends closely on what you actually want to do with this data afterwards and on the exact format of your input data, but to me it looks as if something like a NumPy array may be a more efficient container.
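To illustrate what I mean, a hypothetical sketch that keeps the (key, idx) pairs in flat sorted arrays instead of a dict of lists, so lookups become binary searches over contiguous memory:

```python
import numpy as np

# Toy data: each (key, idx) pair stored in parallel flat arrays
# instead of dict[key] -> list of idx.
keys = np.array([7, 3, 7, 1], dtype=np.uint64)
idxs = np.array([0, 0, 1, 2], dtype=np.uint64)

# Stable sort by key: all idx values for a given key become contiguous.
order = np.argsort(keys, kind="stable")
keys_sorted, idxs_sorted = keys[order], idxs[order]

# Look up all idx values for key 7 via binary search.
start = np.searchsorted(keys_sorted, 7, side="left")
stop = np.searchsorted(keys_sorted, 7, side="right")
values_for_7 = idxs_sorted[start:stop]
```

Whether that beats a dict depends entirely on your access pattern, but it avoids the per-entry overhead of billions of small list objects.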

Cheers
Hannes