Understanding Function Caching

I have a simple numba example where I:

  1. Define a numba function
  2. Enable caching via enable_caching() (instead of cache=True)
  3. Call the function
  4. Print the contents of the cache directory

from numba import njit
from os import listdir
import pathlib

PYCACHEDIR = '__pycache__'

@njit
def add(a, b):
    return a + b


if __name__ == "__main__":
    add.enable_caching()
    add(4, 5)
    print(listdir(PYCACHEDIR))

As expected, this prints out:

['test.add-9.py312.nbi', 'test.add-9.py312.1.nbc']

However, if I:

  1. Manually clear/delete the cache
  2. Call the add function again
  3. Print the contents of the cache directory

    [f.unlink() for f in pathlib.Path(PYCACHEDIR).glob("*nb*") if f.is_file()]
    print(listdir(PYCACHEDIR))
    add(4, 5)
    print(listdir(PYCACHEDIR))

Then the final print shows an empty list, as the cache directory is still empty! Since I had originally called add.enable_caching(), I would’ve expected the function to be cached again (after having manually cleared/deleted the cache) once I call the function again. Instead, it almost appears as if writing to PYCACHEDIR is disabled and, perhaps, the add function is either:

  1. Not cached at all (which would be undesirable)
  2. or cached elsewhere (but where?!)

Can anybody tell me what is happening here?


The full script is as follows:

from numba import njit
from os import listdir
import pathlib

PYCACHEDIR = '__pycache__'

@njit
def add(a, b):
    return a + b


if __name__ == "__main__":
    add.enable_caching()
    add(4, 5)
    print(listdir(PYCACHEDIR))
    [f.unlink() for f in pathlib.Path(PYCACHEDIR).glob("*nb*") if f.is_file()]
    print(listdir(PYCACHEDIR))
    add(4, 5)
    print(listdir(PYCACHEDIR))

Perhaps I’m not thinking about this correctly, but I’d expect a disk cache save/load only the first time a signature is compiled. It doesn’t make sense for numba to continually check if some outside process is changing the cache files.

After the first usage, the function is memory-cached and won’t be reloaded from disk. Setting the NUMBA_DEBUG_CACHE environment variable prints out some useful information, and setting a breakpoint inside numba/core/caching.py might be interesting for finding the location of the logic that shortcuts already-saved file caching.

Apologies for the lack of links, I’m responding using a cell phone which is a struggle.

EDIT: I ran a quick test, changed ‘4’ to ‘4.0’ in the second call to add() and got new cache files as expected. That lines up with my comments above.
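To make the distinction between the in-memory cache and the disk cache concrete, here is a toy model in plain Python. It is only a sketch of the behaviour described above, not numba's actual implementation: ToyDispatcher, cache_files and the ".cache" file names are all made up for illustration.

```python
import tempfile
from pathlib import Path


class ToyDispatcher:
    """Hypothetical stand-in for a JIT dispatcher; not numba's real code."""

    def __init__(self, func, cache_dir):
        self.func = func
        self.cache_dir = Path(cache_dir)
        self.compiled = {}  # in-memory cache: argument types -> "compiled" function

    def __call__(self, *args):
        key = tuple(type(a).__name__ for a in args)
        if key not in self.compiled:
            # Only a signature that is new to this process touches the disk.
            self.compiled[key] = self.func
            (self.cache_dir / ("-".join(key) + ".cache")).write_text("compiled")
        # Known signatures are served from memory; the disk is never re-checked.
        return self.compiled[key](*args)


def cache_files(directory):
    """Return the sorted file names currently in the toy cache directory."""
    return sorted(p.name for p in Path(directory).iterdir())


with tempfile.TemporaryDirectory() as tmp:
    add = ToyDispatcher(lambda a, b: a + b, tmp)

    add(4, 5)                      # first call: "compile" and write a cache file
    after_first_call = cache_files(tmp)

    for p in Path(tmp).iterdir():  # manually clear the cache directory
        p.unlink()
    add(4, 5)                      # same types: memory hit, nothing is written
    after_delete_and_recall = cache_files(tmp)

    add(4.0, 5.0)                  # new types: "compile" again, new cache file
    after_new_types = cache_files(tmp)

print(after_first_call)         # ['int-int.cache']
print(after_delete_and_recall)  # []
print(after_new_types)          # ['float-float.cache']
```

The second call with the same argument types never reaches the file-writing branch, which mirrors the empty directory in the original question; only the call with new types writes to disk again.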

I think @nelson2005’s answer is correct. Once the dispatcher object has compiled a signature, it simply redirects subsequent calls to the already-generated function. Hence, the cache files won’t be regenerated.

Ahh, gotcha. It looks like explicitly calling the .recompile() method will cause the deleted files to be regenerated:

from numba import njit
from os import listdir
import pathlib

PYCACHEDIR = '__pycache__'

@njit
def add(a, b):
    return a + b


if __name__ == "__main__":
    add.enable_caching()
    add(4, 5)
    print(listdir(PYCACHEDIR))
    [f.unlink() for f in pathlib.Path(PYCACHEDIR).glob("*nb*") if f.is_file()]
    print(listdir(PYCACHEDIR))
    add.recompile()
    add(4, 5)
    print(listdir(PYCACHEDIR))