Containerized application without recompilation at startup

Hello,

I am currently trying to deploy a Numba application in a Docker container. Since there are restrictions on startup latency, I really need to avoid compiling at startup, which already takes on the order of minutes and is growing.

So far I have tried the following methods in the container build step, without success:

  • Ahead-of-time compilation
    My problem here is that AOT does not support all Numba features. In particular, I need objmode at one critical point in the application (to get the elapsed time, see e.g. [I am not allowed to link]), which does not seem to work with AOT.

  • Trigger a cached JIT during the build step and reuse the cache after startup
    While this works when running the container on the same machine where it was built, running it on a different host triggers a recompilation, probably due to a different “magic tuple” (again, link missing). I tried NUMBA_CPU_NAME=generic but the code still gets recompiled. A sketch of the build-step warm-up follows below.
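
Roughly what the warm-up looks like (a minimal sketch; warmup.py, summed_sqrt and the argument types are made up for illustration, the real application is more involved). It also shows the objmode timing mentioned above, and it is run during docker build with NUMBA_CPU_NAME=generic set:

    # warmup.py -- executed during `docker build` to populate the on-disk cache.
    import time
    import numpy as np
    from numba import njit, objmode

    @njit(cache=True)  # cache=True persists the compiled code next to the source file
    def summed_sqrt(values):
        # objmode is needed because time.perf_counter() is not available
        # in nopython mode -- this is the part that rules out AOT for me.
        with objmode(start='float64'):
            start = time.perf_counter()
        acc = 0.0
        for v in values:
            acc += np.sqrt(v)
        with objmode(elapsed='float64'):
            elapsed = time.perf_counter() - start
        return acc, elapsed

    if __name__ == "__main__":
        # Call once with representative argument types so the compiled
        # result is already in the cache when the image is shipped.
        summed_sqrt(np.arange(10, dtype=np.int64))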

Any help is greatly appreciated. I’m wondering if I am missing something as this use case does not seem to be that uncommon.

What operating system are you using? How are you installing into the container? Pip?

I understand you can’t post links… Have you reviewed this and links on that topic?

Thanks @nelson2005. For development I am building the container on macOS; the failed caching happens when running it on a serverless cloud service.

With the help of your print function I was able to extract the .nbi index information on both machines. For all files it looks like this:

Local

numba_version=0.56.0
stamp=(1668417210.6453834, 1311)
overload_fname=[...]-13.py38.1.nbc
overload_key_signature=(array(int64, 1d, C), int64)
overload_key_magic_tuple=('x86_64-unknown-linux-gnu', 'generic', '')
overload_key_code_hash=('2c74cec6df4a9612533131c0106277fd1c8c0f4d75b92ed37caef8dcf3380c7d', 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855')

Remote

numba_version=0.56.0
stamp=(1668417210.0, 1311)
overload_fname=[...]-13.py38.1.nbc
overload_key_signature=(array(int64, 1d, C), int64)
overload_key_magic_tuple=('x86_64-unknown-linux-gnu', 'generic', '')
overload_key_code_hash=('2c74cec6df4a9612533131c0106277fd1c8c0f4d75b92ed37caef8dcf3380c7d', 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855')
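
For reference, this is roughly how I read the index files (a sketch based on my understanding of the .nbi layout in Numba 0.56, i.e. a pickled version header followed by a pickled (stamp, overloads) tuple; the format is an internal detail and this is not necessarily identical to the linked print function):

    import pickle
    import sys

    def dump_nbi(path):
        # The .nbi index appears to contain a pickled Numba version string
        # followed by a pickled (stamp, {overload_key: data_filename}) tuple.
        with open(path, "rb") as f:
            version = pickle.load(f)
            stamp, overloads = pickle.loads(f.read())
        print("numba_version=%s" % (version,))
        print("stamp=%s" % (stamp,))
        for key, fname in overloads.items():
            print("overload_fname=%s" % (fname,))
            print("overload_key=%s" % (key,))

    if __name__ == "__main__":
        dump_nbi(sys.argv[1])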

The magic tuple and the hashes are actually the same; it’s only the timestamp that differs. This seems very similar to your issue on Spark (“This process also appears to not preserve the requisite python source code file timestamp precision to match the sub-second value contained in the numba cache index .nbi file.”). However, in my case there is no unpacking/copying involved and the timestamps originate directly from the unpickled .nbi. That currently does not make sense to me.

Can that difference be the reason for the recompilation?

Alright, got it.

The timestamp precision of the filesystem depends on the Docker host, and the remote one does not support sub-second precision, which leads to a recompilation even though the magic tuple is the same.

I monkey-patched the method _load_index in the class IndexDataCacheFile so that it skips the timestamp check, and it finally worked. I’m now looking for a cleaner solution, as this, like messing with the timestamp values, does not seem ideal. It would be nice to have an option that disables the automatic cache invalidation.
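
For the record, the patch is essentially a copy of _load_index with the stamp comparison removed (a sketch; this relies on the private numba.core.caching API as of 0.56 and may break on other versions):

    import pickle
    from numba.core.caching import IndexDataCacheFile

    def _load_index_no_stamp_check(self):
        # Same as IndexDataCacheFile._load_index in Numba 0.56, except that
        # the source-stamp freshness check at the end is skipped.
        try:
            with open(self._index_path, "rb") as f:
                version = pickle.load(f)
                data = f.read()
        except FileNotFoundError:
            return {}
        if version != self._version:
            return {}
        stamp, overloads = pickle.loads(data)
        # The original returns {} here when stamp != self._source_stamp,
        # which is what forces the recompilation on the remote host.
        return overloads

    IndexDataCacheFile._load_index = _load_index_no_stamp_check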

Glad you solved it! Another option might be patching the timestamp on the save side?
Somewhat related to your objmode timing: this discussion has @stuartarchibald’s nifty implementation of doing timing with the stdlib.

But in any case I think you would need to modify both the pickled timestamps and the file timestamps; otherwise you could end up with the inverse situation when the host actually does support the higher precision. However, I noticed there is already a discussion in the dev forum about removing the timestamp-based invalidation, so there is probably no need for a naive PR, and I’ll diff-patch the code during the build step for now.
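
If I were to go the save-side route, normalizing the source-file mtimes to whole seconds before the warm-up compile looks like the simplest way to keep both in sync (a sketch; /app is just a placeholder for the application directory):

    import os
    import pathlib

    def truncate_mtimes(root):
        # Truncate mtimes to whole seconds *before* the warm-up compile, so the
        # stamp pickled into the .nbi index matches what a filesystem without
        # sub-second precision will report later.
        for path in pathlib.Path(root).rglob("*.py"):
            st = path.stat()
            os.utime(path, (int(st.st_atime), int(st.st_mtime)))

    truncate_mtimes("/app")  # placeholder path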

Thanks for the other link, I completely missed that. I am still not sure whether that would also work with AOT, but caching is the more convenient method anyway.