Understanding Function Caching

I have a simple numba example where I:

  1. Define a numba function
  2. Enable caching via enable_caching() (instead of cache=True)
  3. Call the function
  4. Print the contents of the cache directory

from numba import njit
from os import listdir
import pathlib

PYCACHEDIR = '__pycache__'

@njit
def add(a, b):
    return a + b


if __name__ == "__main__":
    add.enable_caching()
    add(4, 5)
    print(listdir(PYCACHEDIR))

As expected, this prints out:

['test.add-9.py312.nbi', 'test.add-9.py312.1.nbc']

However, if I:

  1. Manually clear/delete the cache
  2. Call the add function again
  3. Print the contents of the cache directory

    [f.unlink() for f in pathlib.Path(PYCACHEDIR).glob("*nb*") if f.is_file()]
    print(listdir(PYCACHEDIR))
    add(4, 5)
    print(listdir(PYCACHEDIR))

Then both print statements show an empty cache directory! Since I had originally called add.enable_caching(), I would've expected the function to be cached again once I call it after manually deleting the cache files. Instead, it almost appears as if writing to PYCACHEDIR is disabled and, perhaps, the add function is either:

  1. Not cached at all (which would be undesirable)
  2. or cached elsewhere (but where?!)

Can anybody tell me what is happening here?


The full script is as follows:

from numba import njit
from os import listdir
import pathlib

PYCACHEDIR = '__pycache__'

@njit
def add(a, b):
    return a + b


if __name__ == "__main__":
    add.enable_caching()
    add(4, 5)
    print(listdir(PYCACHEDIR))
    [f.unlink() for f in pathlib.Path(PYCACHEDIR).glob("*nb*") if f.is_file()]
    print(listdir(PYCACHEDIR))
    add(4, 5)
    print(listdir(PYCACHEDIR))

Perhaps I’m not thinking about this correctly, but I’d expect a disk cache save/load to happen only the first time a signature is compiled. It wouldn’t make sense for numba to continually check whether some outside process is changing the cache files.

After the first usage, the function is cached in memory and won’t be reloaded from disk. Setting the NUMBA_DEBUG_CACHE environment variable prints out some useful information, and setting a breakpoint inside numba/core/caching.py might help locate the logic that skips the file cache for signatures that have already been compiled.

Apologies for the lack of links, I’m responding using a cell phone which is a struggle.

EDIT: I ran a quick test: I changed 4 to 4.0 in the second call to add() and got new cache files, as expected. That lines up with my comments above.


I think @nelson2005's answer is correct. Once the dispatcher object holds a compiled version for a given signature, it simply redirects the call to that already-generated function. Hence, the cache files won’t be regenerated.
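
To make the two cache levels concrete, here is a rough pure-Python sketch of the behaviour described above. It is only a toy model and not numba's actual implementation: ToyDispatcher, the overloads dict, and the .toycache file naming are all hypothetical names made up for illustration. The point is just that the disk is only touched when a signature is missing from the in-memory table.

```python
import tempfile
from pathlib import Path


class ToyDispatcher:
    """Toy stand-in for a JIT dispatcher (hypothetical, NOT numba's real code)."""

    def __init__(self, py_func, cache_dir):
        self.py_func = py_func
        self.cache_dir = Path(cache_dir)
        # In-memory cache: argument-type tuple -> "compiled" callable.
        self.overloads = {}

    def _cache_file(self, sig):
        # Hypothetical naming scheme: one file per signature.
        type_names = "_".join(t.__name__ for t in sig)
        return self.cache_dir / f"{self.py_func.__name__}-{type_names}.toycache"

    def _compile(self, sig):
        # Only reached when the signature is not yet in memory.
        path = self._cache_file(sig)
        if not path.exists():
            path.write_text(repr(sig))  # pretend to save compiled code
        return self.py_func  # pretend this is the compiled function

    def __call__(self, *args):
        sig = tuple(type(a) for a in args)
        if sig not in self.overloads:
            self.overloads[sig] = self._compile(sig)
        # Known signature: call straight through, never look at the disk.
        return self.overloads[sig](*args)


def add(a, b):
    return a + b


def run_demo():
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp)
        toy_add = ToyDispatcher(add, cache)

        toy_add(4, 5)  # first call: "compiles" and writes a cache file
        after_first_call = sorted(p.name for p in cache.iterdir())

        for p in cache.glob("*.toycache"):  # manually clear the cache
            p.unlink()
        toy_add(4, 5)  # same signature: served from memory, nothing written
        after_delete = sorted(p.name for p in cache.iterdir())

        toy_add(4.0, 5)  # new signature: "compiles" again and writes a file
        after_new_signature = sorted(p.name for p in cache.iterdir())

    return after_first_call, after_delete, after_new_signature


after_first_call, after_delete, after_new_signature = run_demo()
print(after_first_call)
print(after_delete)
print(after_new_signature)
```

In this toy model the second add(4, 5) leaves the directory empty, while add(4.0, 5) creates a new file, which mirrors what was observed with the real cache directory.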


Ahh, gotcha. It looks like explicitly calling the .recompile() method will cause the deleted files to be regenerated:

from numba import njit
from os import listdir
import pathlib

PYCACHEDIR = '__pycache__'

@njit
def add(a, b):
    return a + b


if __name__ == "__main__":
    add.enable_caching()
    add(4, 5)
    print(listdir(PYCACHEDIR))
    [f.unlink() for f in pathlib.Path(PYCACHEDIR).glob("*nb*") if f.is_file()]
    print(listdir(PYCACHEDIR))
    add.recompile()
    add(4, 5)
    print(listdir(PYCACHEDIR))