I’m working to fix the current lack of cache invalidation when a secondary file (a module from which other jit functions are imported and then called from a main jit function) is modified.
I already have a branch where overloads are picked from the cache according to the code signature of the calling function and of all its dependencies. This means that a change in the code of a dependency no longer leads to incorrect results.
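To illustrate the idea (this is a toy sketch with made-up names, not the actual implementation on my branch): the key under which an overload is looked up incorporates the hash of the calling function's code and the hashes of all its dependencies, so a change in any dependency yields a different key.

```python
import hashlib

def combined_signature(func_code: bytes, dep_codes: list) -> str:
    """Illustrative lookup key: changes whenever the calling function's
    code OR any dependency's code changes."""
    h = hashlib.sha256(func_code)
    for dep in dep_codes:
        h.update(hashlib.sha256(dep).digest())
    return h.hexdigest()

# Same main function, but a dependency was edited -> different key,
# so the stale cached overload is never picked up.
key_v1 = combined_signature(b"main-v1", [b"dep-v1"])
key_v2 = combined_signature(b"main-v1", [b"dep-v2"])
```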
So far so good. In the process of testing, I noticed that the cache of the main function keeps growing, i.e. accumulating overloads.
I found out that this is because the code of the main function has not changed, so the cached overloads are not considered stale, merely unused. That becomes a problem if the cache grows without bound, so I started looking into it. While doing so, I noticed that the cache is currently very pessimistic about changes in the main file: even changes outside the function invalidate the cache.

So at the moment the cache ignores changes in secondary files, yet invalidates on every change to the main file, even changes outside the main function or its closure variables.
I would like to propose to move away from invalidating the cache index based on the timestamp of the file, and use only the code+closure signature of the function itself. Would anyone see a problem with that? By code+closure signature I mean exactly the same information that is already used to select the overload from within the cache.
The cache index is currently a combination of (function signature, target context, code hash, closures hash). My proposal would be to use code hash + closure hash to decide whether the index should be reset (implicitly throwing away every existing cached compilation). When saving an overload in the cache, the cache machinery would compare the hashes of the latest overload against existing ones, and reset the index if they don’t match.
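To make the reset-on-mismatch behaviour concrete, here is a minimal sketch; the class and method names are mine for illustration, not the actual cache machinery:

```python
class CacheIndex:
    """Toy model of the proposal: the index is reset whenever a newly
    saved overload carries a different code+closure hash than the
    entries already stored."""

    def __init__(self):
        self.current_hash = None
        self.entries = {}  # function signature -> compiled overload

    def save(self, signature, overload, code_closure_hash):
        if self.current_hash is not None and code_closure_hash != self.current_hash:
            # Code or closures changed: implicitly throw away every
            # existing cached compilation by resetting the index.
            self.entries.clear()
        self.current_hash = code_closure_hash
        self.entries[signature] = overload

index = CacheIndex()
index.save(("int64",), "overload-a", "hash-v1")
index.save(("float64",), "overload-b", "hash-v1")  # same code: both kept
index.save(("int64",), "overload-c", "hash-v2")    # code changed: index reset
```

The key property is that unused-but-valid overloads for the current code are retained, while every overload from an older version of the code is dropped the moment a newer compilation is saved.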
All feedback and suggestions welcome.