I am currently trying to deploy a Numba application in a Docker container. As there are restrictions on startup latency, I really need to avoid the recompilation time, which is already on the order of minutes and growing.
So far I have tried the following methods in the container build step, without success:
Ahead-of-time compilation
My problem here is that AOT compilation does not support all Numba features. In particular, I need objmode at one critical point in the application (to get the elapsed time, see e.g. [I am not allowed to link]), which does not seem to work with AOT.
Trigger a cached JIT and reuse the cache after startup
While this works when running the container on the same machine where it was built, running on a different host triggers a recompilation, probably due to a different "magic tuple" (again, link missing). I tried NUMBA_CPU_NAME=generic, but the code still gets recompiled.
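For reference, the relevant part of my build step looks roughly like this (the cache path and the warmup module are placeholders for my setup):

```dockerfile
# Warm the JIT cache at build time with a generic CPU target so the
# cached machine code is not tied to the build host's CPU.
ENV NUMBA_CPU_NAME=generic \
    NUMBA_CPU_FEATURES="" \
    NUMBA_CACHE_DIR=/opt/app/numba_cache
RUN python -m app.warmup   # calls every @njit(cache=True) function once
```

The same environment variables are also set at container runtime so the lookup uses the same target description.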
Any help is greatly appreciated. I’m wondering if I am missing something as this use case does not seem to be that uncommon.
The magic tuple and the hashes are actually the same, it’s only the timestamp that differs. Seems to be very similar to your Issue on Spark (“This process also appears to not preserve the requisite python source code file timestamp precision to match the sub-second value contained in the numba cache index .nbi file.”). However, in my case there is no unpacking/copying involved and the timestamps originate directly from the unpickled nbi. That currently does not make sense to me.
Can that difference be the reason for the recompilation?
The timestamp precision of the filesystem turns out to be a feature of the Docker host, and the remote host did not support sub-second precision, leading to a recompilation even though the magic tuple was the same.
I monkey-patched the method _load_index in the class IndexDataCacheFile to skip the timestamp check, and it finally worked. I am now looking for a cleaner solution, as hacks like messing with the timestamp values do not seem ideal. It would be nice to have an option that disables automatic cache invalidation.
Glad you solved it! Another option might be to patch the timestamps on the save side?
Kind-of-related to your objmode timing, this discussion has @stuartarchibald’s nifty implementation of doing timing with the stdlib.
But in any case, I think you would need to modify both the pickled timestamps and the file timestamps. Otherwise you could end up with the inverse situation when the host actually supports the higher precision. However, I noticed there was already a discussion in the dev forum about removing the timestamp-based invalidation, so there is probably no need for a naive PR; I'll diff-patch the code during the build step for now.
Thanks for the other link, I had completely missed that. I am still not sure whether that would also work with AOT, but caching is the more convenient approach anyway.