Hey,
I’m having some issues using Numba in combination with Python’s type annotations, specifically the mechanism for attaching additional metadata (on top of a datatype) as specified in PEP 593.
This allows me to annotate the inputs and outputs of my functions with any custom metadata, which I can then use to automatically create a computation graph based on a large set of functions, some available inputs, and some desired outputs. Using pure Python this is possible by using the available functions from the typing module.
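To make this concrete, here is a minimal sketch of the pure-Python side of what I mean (the function name `saturation_pressure` and the string tags are just made-up examples; PEP 593 allows any object as metadata):

```python
from typing import Annotated, get_type_hints

def saturation_pressure(
    temperature: Annotated[float, "air_temperature"],
) -> Annotated[float, "saturation_pressure"]:
    # Magnus formula, just as a placeholder body.
    return 0.61094 * 2.718281828 ** (17.625 * temperature / (temperature + 243.04))

# include_extras=True keeps the Annotated wrapper so the metadata survives.
hints = get_type_hints(saturation_pressure, include_extras=True)
print(hints["temperature"].__metadata__)  # ('air_temperature',)
print(hints["return"].__metadata__)       # ('saturation_pressure',)
```

With the metadata of both inputs and outputs available, wiring functions together into a graph is straightforward.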
Ideally I would use `numba.vectorize` for the most part, and perhaps `numba.guvectorize` when functions have multiple return values. The problem is that neither preserves the type hints.
A quick investigation, including potential workarounds like using `inspect.getfullargspec` and the docstring, resulted in the table shown below. Using `getfullargspec` would only provide information on the inputs, not the outputs, so that wouldn’t be great. Using the docstring is more flexible (e.g. NumPy-style docstrings), but requires encoding and parsing strings, which also seems a lot worse compared to type hints.
| function type | `typing.get_type_hints` | `inspect.getfullargspec` | `func.__doc__` |
|---|---|---|---|
| pure Python | yes | yes | yes |
| `njit.py_func` | yes | yes | yes |
| `njit` | no | no | yes |
| `guvectorize` | empty dict | incorrect | `None` |
| `vectorize` | no | no | yes |
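For reference, the table was produced with a small helper along these lines (`probe` is my own name; the `TypeError` branch is what I see when `get_type_hints` is handed a compiled dispatcher rather than a plain function):

```python
import inspect
from typing import get_type_hints

def probe(func):
    """Report which introspection channels expose information for *func*."""
    report = {}
    try:
        report["get_type_hints"] = bool(get_type_hints(func, include_extras=True))
    except TypeError:
        # Raised when *func* is not a plain Python function/class/module,
        # e.g. a compiled Numba dispatcher.
        report["get_type_hints"] = False
    try:
        report["getfullargspec"] = bool(inspect.getfullargspec(func).annotations)
    except TypeError:
        report["getfullargspec"] = False
    report["docstring"] = func.__doc__ is not None
    return report

def f(x: float) -> float:
    """Double x."""
    return 2 * x

print(probe(f))  # {'get_type_hints': True, 'getfullargspec': True, 'docstring': True}
```

Running `probe` on the decorated variants gives the rows above.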
It looks like `guvectorize` exposes some sort of Python function, since all options “work”, but it’s not the function that’s being decorated.
Since `njit` keeps the original Python function available at the `.py_func` attribute, that would work well. However, I would rather have all the goodness of automatic broadcasting and the compilation targets (e.g. `parallel`) that `(gu)vectorize` provide. All inputs can be scalars, 1D, 2D, 3D arrays, etc., which works seamlessly with `vectorize` but is a lot more involved when using `njit`.
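The `.py_func` route looks like the sketch below. To keep the example self-contained I use a tiny stand-in class instead of the real `njit` (the stand-in only mimics the one behaviour relied upon here: the original function stays reachable at `.py_func`):

```python
from typing import Annotated, get_type_hints

class FakeDispatcher:
    """Stand-in mimicking how an njit dispatcher keeps the original
    Python function reachable at .py_func."""
    def __init__(self, func):
        self.py_func = func
    def __call__(self, *args):
        return self.py_func(*args)

def njit_like(func):
    return FakeDispatcher(func)

@njit_like
def scale(x: Annotated[float, "length"]) -> Annotated[float, "length"]:
    return 2.0 * x

# The dispatcher object itself exposes no hints, but the original
# function is still one attribute access away:
hints = get_type_hints(scale.py_func, include_extras=True)
print(hints["return"].__metadata__)  # ('length',)
```

So with `njit` the metadata is recoverable without any extra bookkeeping; the question is how to get the same with `(gu)vectorize`.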
If I only had a few functions, wrapping them in a separate Python function would be fine. But I’m aiming to use this with perhaps hundreds of functions, and the set of functions is quite dynamic and also includes functions provided by users. Having to wrap each one would mean a lot of overhead, extra code to manage, and a barrier for users providing their own, so I would really like to avoid that.
One of the most elegant workarounds I can think of is to create my own decorator which parses whatever metadata I want prior to Numba having a go at it. This introduces the problem that my decorator sees the normal Python function, while the Numba decorator returns a different function, so somehow the connection between the two needs to be made. Since Numba seems to use the same name as Python, using the `func.__name__` attribute could work, and is actually quite a nice solution. Can I however rely on Numba never changing the `__name__` attribute to anything different? Given that it’s a dunder attribute, the conventional answer is of course “no”, but how bad would it be?
I would be curious to hear from people what they think of this. Has anyone ever done something similar? Are there other workarounds that I haven’t thought of?
Here is a notebook replicating what I’ve been trying so far. It’s simplified a lot, but captures the gist of it I think. My extra annotation is just a string in this example, but PEP 593 allows it to be any object. I’m assuming things like that don’t affect how it all relates to Numba.