Thanks to a suggestion - I’m moving (CPU) jitted code from @generated_jit to @overload so that I can use it from CUDA.
@overload seems to focus on the case where there is some other python function you would like to create a jittable implementation for. In my case, I have several of “my own” @generated_jit functions. What is the best practice? Do I need to create a stub function in python to “overload”?
Does a pattern analogous to `functools.singledispatch` work - i.e. registering multiple @overload implementations for the same function, each handling different argument types?
Whether or not the first @overload with no arguments works, what order are they tried in during compilation? Or is it undefined? And is there a specific error to throw to signal that a particular implementation doesn't handle the argument types?
I think it’s not defined what order overloads are tried in (I certainly wouldn’t like to make guarantees about it). If a particular implementation doesn’t handle the given argument types, it can just return None.
Another issue I'm trying to get my head around is whether it's possible to ensure that Python calls to @overload functions call the jitted versions. With @generated_jit, there is no python stub. Now when I convert to @overload, it would seem that python code always calls the python version.
How to ensure that python code calls jitted version? I can think of:
Then code can call foo the same way that would have called previous generated_jit function. But it seems like a lot of cruft, and I wonder if it will boil down to the same byte code, or will it be less performant?
[NB unless I am thinking of this wrong, it would seem a useful feature to add an option for @overload not to take a python function to wrap (an optional/omitted positional argument), in which case it would do what I just described above - functioning like @generated_jit without having to write the cruft, and possibly allowing for easier optimization.]