Memoization for jitted functions called inside other jitted functions

Hi all, I’ve got a question regarding memoization. I’m setting up a Monte Carlo simulation that involves computing some information about a design matrix. That computation returns a TypedDict, which is then passed to my function that does the random sampling.

The computation on the design matrix takes much longer than the random sampling (depending on the size of the design matrix, but even for small samples it’s about 10x slower). My current solution is two separate jitted functions: one builds the TypedDict (with some form of memoization) and returns it to Python, which immediately passes it into the other. I expect this would all run faster if I could combine everything into a single jitted function. Since the precision of the simulation increases with more resamples, I’m really looking for any speed gains possible.

My question is - is there a way to memoize a jitted function that is called from another jitted function?

Here’s a minimal example.

# Current solution:

import numba as nb

@nb.jit
def make_dict(data):
    # do some stuff...
    # lookup maps (X, Y) tuples to 1D arrays of ints; the arrays are
    # not necessarily the same length
    return lookup

@nb.jit
def random_sampling(data, lookup):
    # do some more stuff
    return resampled_data

def monte_carlo(data):
    lookup = make_dict(data)  # cache this result
    resampled_data = random_sampling(data, lookup)
    return resampled_data
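For illustration, the "cache this result" step could be expanded to something like the following (a minimal sketch; keying on data.tobytes() is just one choice and assumes data is a NumPy array that isn’t mutated between calls):

_dict_cache = {}

def monte_carlo(data):
    key = data.tobytes()
    if key not in _dict_cache:
        # only rebuild the lookup when we see a new design matrix
        _dict_cache[key] = make_dict(data)
    resampled_data = random_sampling(data, _dict_cache[key])
    return resampled_data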

# What I would like to do:

@nb.jit
def random_sampling(data):
    lookup = make_dict(data)  # with memoization
    # do some stuff
    return resampled_data

Hi @rishi-kulkarni
Welcome to the board! :slight_smile:

I guess you are looking for something similar to the standard library’s functools.lru_cache, correct?
I had a similar question a while back, and unfortunately we were unable to figure out a straightforward solution at the time (maybe mostly because it was not important enough for me to sink much time into it).

That said, it does not seem hopeless. @luk-f-a came up with a proof of concept. You can read up on our discussion here: Result caching in Numba · Issue #4062 · numba/numba · GitHub

I’d be very curious to see where this goes, and since some time has passed since that discussion, maybe some of the updates to Numba in the meantime can help with finding a nice way to deal with this.
Generally, the wish for result caching seems like a no-brainer in some of the applications Numba touches, since it is all about performance at the end of the day :slight_smile:

If I remember correctly, our conclusion was that it’s not possible to build a memoize decorator. It’s necessary to pass the dictionary explicitly (as a function input), because Numba does not accept global dictionaries.
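A minimal sketch of that explicit-passing pattern, with illustrative names and key/value types (int64 keys to float64 values):

from numba import njit, types
from numba.typed import Dict

# The cache is created in Python and handed to the jitted function,
# so it persists across calls without relying on a global.
cache = Dict.empty(key_type=types.int64, value_type=types.float64)

@njit
def expensive(n):
    total = 0.0
    for i in range(n):
        total += i ** 0.5
    return total

@njit
def memoized_expensive(n, cache):
    # Check the cache before recomputing; this works when called from
    # other jitted functions too, as long as the cache is threaded
    # through as an argument.
    if n not in cache:
        cache[n] = expensive(n)
    return cache[n]

print(memoized_expensive(1_000_000, cache))  # computed
print(memoized_expensive(1_000_000, cache))  # served from the cache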

Yep, I think you are right. Globals are treated as compile-time constants by Numba. A bit inconvenient, but hopefully not a total deal breaker. I seem to remember the global Dict issue being raised explicitly somewhere recently, but I cannot seem to find it (neither here nor on GitHub).
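To make the "globals are constants" point concrete, here is a minimal sketch:

import numba as nb

FACTOR = 2

@nb.njit
def scaled(x):
    return x * FACTOR  # FACTOR is baked in at compile time

print(scaled(3))  # 6
FACTOR = 10
print(scaled(3))  # still 6: the jitted code kept the old constant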

Shame. Well, I guess that makes sense. There’s not that much friction between my various jitted functions, anyway, but it felt kind of silly to keep passing things back to Python just to immediately pass them back into Numba. It is what it is.