The constant-tuple version also ends up running a list comprehension in pure Python, but this time it iterates over the elements of lists of `(tuple, float)` returned by a jitted function, instead of over the key-value pairs of a `TypedDict`. So it would seem to me that the issue is a substantial overhead in looping through `TypedDict` keys versus `List` entries in a pure-Python context. Is this a well-known issue with `TypedDict`? Is it likely to improve in future versions?
@ulupo this is commonly asked, so I've opened a PR to add it to the FAQ section: "Add note about performance of typed containers from the interpreter." by stuartarchibald · Pull Request #6440 · numba/numba · GitHub. I hope its contents will answer your question! Hope this helps?
Thank you! So, unlikely to change in the near future.
@luk-f-a @stuartarchibald I may have found a solution which combines the best of both worlds: writing my own njittable "dictionary unpacking" function which converts each dictionary into a list of (key, value) pairs [`(tuple, float)` in my case] and then returns it to the Python context! This way I exactly match the performance of the single-dictionary fixed-tuple-length implementation.
The issue is that in this implementation my dictionaries have variable-length keys. But the lengths (and hence the tuple types) are known at call time, so I thought I could pass the type explicitly to the jitted function. What remains is to force the compiler into doing the type inference I want. I did this in a hilariously hacky way: I create a dummy empty dictionary with the passed type and link it to the argument dictionary, which reassures the compiler about the argument's type. In other words, something like:
```python
import numpy as np
from numba import njit, types
from numba.typed import Dict

@njit
def _unpack_dict(d, typ):
    d_dummy = Dict.empty(typ, types.float64)
    k = next(iter(d))
    d_dummy[k] = d[k]  # Helps the compiler infer the type of d!
    l = []
    for k in d:
        # List of heterogeneous pairs, but they are literal so it is OK
        l.append((k, np.sqrt(d[k])))
    return l
```
In the code, `typ` will be `types.UniTuple(types.int64, dim + 1)` where `dim` varies.
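For reference, the transformation itself is nothing exotic; stripped of Numba, it is just the following pure-Python dict-to-list conversion (illustrative only; the point of the njitted version is that the loop and the `sqrt` run in compiled code, avoiding the per-element boxing cost of iterating a typed container from the interpreter):

```python
import math

def unpack_dict_py(d):
    # Same dict -> [(key, sqrt(value))] transformation, in plain Python.
    return [(k, math.sqrt(v)) for k, v in d.items()]

pairs = unpack_dict_py({(0, 1, 2): 4.0, (3, 4, 5): 9.0})
```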
If there’s a more elegant way to achieve the same goal I would be happy to hear it! But, as far as performance and memory use go, I am now exactly where I want to be (save for a rewrite using sparse structures, which I don’t yet have the courage to undertake!).