Replacing @generated_jit with @overload

Thanks to a suggestion - I’m moving (CPU) jitted code from @generated_jit to @overload so that I can use it from CUDA.

@overload seems to focus on the case where there is some other python function you would like to create a jittable implementation for. In my case, I have several of “my own” @generated_jit functions. What is the best practice? Do I need to create a stub function in python to “overload”?

Does a pattern analogous to functools.singledispatch work, i.e.:

@overload
def foo(...): ...

@overload(foo)
def foo_alt(...): ...

@overload(foo)
def foo_alt_2(...): ...

Whether or not the first @overload w/ no arguments works, what order are they tried during compilation? Or is it undefined? And is there some specific error to throw to signal a particular implementation doesn’t handle the argument types?

You do need to create stub functions to overload. Here’s an example of some conversions from generated_jit to overload: Convert implementations using generated_jit to overload by gmarkall · Pull Request #8467 · numba/numba · GitHub - I would have pointed you to this before, but I only just remembered it - apologies!

I think it’s not defined what order overloads are tried in (I certainly wouldn’t like to make guarantees about it). If a particular implementation doesn’t handle the given argument types, its overload function can just return None.


Another issue I’m trying to get my head around is whether it’s possible to ensure that Python calls to @overload-ed functions call the jitted versions. With @generated_jit there is no Python stub, but now that I’m converting to @overload, it seems that Python code always calls the Python version.

How can I ensure that Python code calls the jitted version? I can think of:

def foo_base(a): ...

@overload(foo_base)
def foo_base_ovr(a):
    def impl(a): ...
    return impl

@nb.njit
def foo(a):
    return foo_base(a)

Then code can call foo the same way it would have called the previous generated_jit function. But it seems like a lot of cruft, and I wonder whether it will boil down to the same compiled code, or be less performant?

[NB unless I am thinking of this wrong, it would seem to be a great feature to add an option for @overload not to take a Python function to wrap (an optional missing positional argument), in which case it would do what I just wrote above, functioning like @generated_jit without having to write the cruft, and possibly allowing for easier optimization.]

IIUC the related issue is Replicate old generated_jit behavior on basis of overload for only-jitted variants of functions · Issue #8897 · numba/numba · GitHub, which has been added to the agenda for next Tuesday: Numba Meeting: 2023-04-18 - HackMD