Replacing @generated_jit with @overload

Thanks to a suggestion, I’m moving (CPU) jitted code from @generated_jit to @overload so that I can also use it from CUDA.

@overload seems to focus on the case where there is some existing Python function you would like to provide a jittable implementation for. In my case, I have several of “my own” @generated_jit functions. What is the best practice? Do I need to create a stub function in Python to “overload”?

Does a pattern analogous to functools.singledispatch work, i.e.:

@overload
def foo(...): ...

@overload(foo)
def foo_alt(...): ...

@overload(foo)
def foo_alt_2(...): ...

Whether or not the first @overload with no arguments works, in what order are the overloads tried during compilation? Or is that undefined? And is there some specific error to raise to signal that a particular implementation doesn’t handle the argument types?

You do need to create stub functions to overload. Here’s an example of some conversions from generated_jit to overload: “Convert implementations using generated_jit to overload” by gmarkall, Pull Request #8467 in numba/numba on GitHub. I would have pointed you to this before, but I only just remembered it; apologies!

I don’t think the order in which overloads are tried is defined (I certainly wouldn’t want to make guarantees about it). If a particular implementation doesn’t handle the given argument types, it can just return None, and Numba will then consider the other overloads.

Another issue I’m trying to get my head around is whether it’s possible to ensure that Python calls to @overload functions call the jitted versions. With @generated_jit, there is no Python stub. Now that I’m converting to @overload, it seems that Python code always calls the Python version.

How can I ensure that Python code calls the jitted version? I can think of:

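import numba as nb
from numba.extending import overload

def foo_base(a): ...

@overload(foo_base)
def foo_base_ovr(a):
    def impl(a): ...  # jitted implementation goes here
    return impl  # the overload must return the implementation

@nb.njit
def foo(a):
    return foo_base(a)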

Then code can call foo the same way it would have called the previous generated_jit function. But it seems like a lot of cruft, and I wonder whether it will boil down to the same code or whether it will be less performant.

[NB: unless I’m thinking about this wrong, it would seem to be a great feature to add an option for @overload not to take a Python function to wrap (i.e. make that positional argument optional). In that case it would do what I just wrote above, functioning like @generated_jit without the cruft and possibly allowing for easier optimization.]

IIUC the related issue is #8897 in numba/numba on GitHub, “Replicate old generated_jit behavior on basis of overload for only-jitted variants of functions”, which has been added to the agenda for next Tuesday’s Numba meeting (2023-04-18; notes on HackMD).