Memoization for jitted functions called inside other jitted functions

Hi all, I’ve got a question regarding memoization. I’m setting up a Monte Carlo simulation that involves computing some information about a design matrix. That computation returns a TypedDict, which is then passed to my function that does the random sampling.

The computation on the design matrix takes much longer than the random sampling (depending on the size of the design matrix, but even for small samples it’s about 10x slower). My current solution is two separate jitted functions: one builds the TypedDict (with some form of memoization) and returns it to Python, which immediately passes it into the other. I expect this would all run faster if I could combine everything into a single jitted function. Since the precision of the simulation increases with more resamples, I’m really looking for any speed gains possible.

My question is - is there a way to memoize a jitted function that is called from another jitted function?

Here’s a minimal example.

# Current solution:

import numba as nb

@nb.jit
def make_dict(data):
    # do some stuff...
    # lookup maps (X, Y) tuples to 1D arrays of ints; the arrays are
    # not necessarily the same length
    return lookup

@nb.jit
def random_sampling(data, lookup):
    # do some more stuff
    return resampled_data

def monte_carlo(data):
    lookup = make_dict(data)  # cache this result
    resampled_data = random_sampling(data, lookup)
    return resampled_data
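For illustration, the "cache this result" step could be expanded to something like the following (a minimal sketch; keying on data.tobytes() is just one choice and assumes data is a NumPy array that isn’t mutated between calls):

_dict_cache = {}

def monte_carlo(data):
    key = data.tobytes()
    if key not in _dict_cache:
        # only rebuild the lookup when we see a new design matrix
        _dict_cache[key] = make_dict(data)
    resampled_data = random_sampling(data, _dict_cache[key])
    return resampled_data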

# What I would like to do:

@nb.jit
def random_sampling(data):
    lookup = make_dict(data)  # with memoization
    # do some stuff
    return resampled_data

Hi @rishi-kulkarni
Welcome to the board! :slight_smile:

I guess you are looking for something similar to the standard library’s functools.lru_cache, correct?
I had a similar question a while back, and unfortunately we were unable to figure out a straightforward solution at the time (maybe mostly because it was not important enough for me to sink much time into it).

That said, it does not seem hopeless. @luk-f-a came up with a proof of concept. You can read up on our discussion here: Result caching in Numba · Issue #4062 · numba/numba · GitHub

I’d be very curious to see where this goes, and since some time has passed since that discussion, maybe some of the updates to Numba in the meantime can help with finding a nice way to deal with this.
Generally, the wish for result caching seems like a no-brainer in some of the applications Numba touches, since it is all about performance at the end of the day :slight_smile:

If I remember correctly, our conclusion was that it’s not possible to build a memoize decorator. It’s necessary to pass the dictionary explicitly (as a function input), because Numba does not accept global dictionaries.
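A minimal sketch of that explicit-passing pattern, with illustrative names and key/value types (int64 keys to float64 values):

from numba import njit, types
from numba.typed import Dict

# The cache is created in Python and handed to the jitted function,
# so it persists across calls without relying on a global.
cache = Dict.empty(key_type=types.int64, value_type=types.float64)

@njit
def expensive(n):
    total = 0.0
    for i in range(n):
        total += i ** 0.5
    return total

@njit
def memoized_expensive(n, cache):
    # Check the cache before recomputing; this works when called from
    # other jitted functions too, as long as the cache is threaded
    # through as an argument.
    if n not in cache:
        cache[n] = expensive(n)
    return cache[n]

print(memoized_expensive(1_000_000, cache))  # computed
print(memoized_expensive(1_000_000, cache))  # served from the cache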

Yep, I think you are right. Globals are treated as compile-time constants by Numba. A bit inconvenient, but hopefully not a total deal breaker. I seem to remember the global Dict issue being raised explicitly somewhere recently, but I cannot seem to find it (neither here nor on GitHub).
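To make the "globals are constants" point concrete, here is a minimal sketch:

import numba as nb

FACTOR = 2

@nb.njit
def scaled(x):
    return x * FACTOR  # FACTOR is baked in at compile time

print(scaled(3))  # 6
FACTOR = 10
print(scaled(3))  # still 6: the jitted code kept the old constant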

Shame. Well, I guess that makes sense. There’s not that much friction between my various jitted functions, anyway, but it felt kind of silly to keep passing things back to Python just to immediately pass them back into Numba. It is what it is.