Numba portable caching logic

This is being transferred from a Gitter conversation. Here’s the slightly edited situation.

@nelson2005
I know this is a bit broad, but hoping for some direction. What triggers a recompilation of the cache? I have some simple examples where I run the code, zip up the code dir (including nbi/nbc files in pycache), unzip it somewhere else and run it again without triggering recompilation. (RHEL6, numba 0.52.0)

@stuartarchibald
@nelson2005 the on disk cache is keyed on an llvm derived “magic tuple” along with other stuff like function bytecode and cell vars: numba/caching.py at 453765375d04232952868c47b0270df1747544d9 · numba/numba · GitHub , magic tuple is here numba/codegen.py at 453765375d04232952868c47b0270df1747544d9 · numba/numba · GitHub

@stuartarchibald
if you change the key by virtue of changing the target hardware (i.e. move to a sufficiently “different” CPU), then that would cause a recompile.

@nelson2005
Thanks! If the cache is generated on one machine, (say RHEL7 with ‘broadwell’ as the middle of the magic tuple and some set of autodetected flags as the third element) and running on another machine (RHEL6 with ‘haswell’ as the middle tuple and a different set of flags as the autodetected third element) this seems to indicate the way to go is to set NUMBA_CPU_NAME=generic and NUMBA_CPU_FEATURES=xxx. It may be too broad a question, but what’s a good value for NUMBA_CPU_FEATURES? The autodetected flags for the earlier (haswell) architecture?

@nelson2005
for part b, on the ‘cache consuming’ side, do these two env vars need to be set as well?

Here’s what I found in my adventure, comments/corrections welcome! This is also related to this post. I used code like this for investigating the numba cache index file format. Much of it is my interpretation of @stuartarchibald’s earlier pointers in this thread.

Cached code (.nbc) files are only loaded if they have a valid overload entry in the cache index (.nbi) file. An .nbi file is invalidated unless the ‘stamp’ of the Python source file matches the ‘stamp’ stored in the .nbi file. The ‘stamp’ is a 2-tuple of (the source file’s st_mtime from os.stat(), the size of the source file). If the .nbi file is invalidated, all of its cached .nbc files are invalidated with it.
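For reference, this is roughly how I understand the stamp to be computed, based on reading caching.py (the exact code may differ between numba versions):

```python
import os

def source_stamp(py_file):
    # Roughly what caching.py records as the source "stamp": the file's
    # modification time plus its size.  If this tuple doesn't match the one
    # stored in the .nbi index, the whole index (and therefore every .nbc
    # file it points at) is ignored.
    st = os.stat(py_file)
    return st.st_mtime, st.st_size

# Compare against the stamp stored in the .nbi to see why a cache was
# rejected after unpacking on another machine.  The path is a placeholder.
print(source_stamp("my_module.py"))
```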

In my case, I’m using setup.py with bdist_egg to package my application. The egg zipfile format doesn’t appear to preserve the st_mtime timestamp granularity that numba records in the .nbi index files, so my .nbi files were being invalidated: the ‘stamp’ stored in the .nbi didn’t match the ‘stamp’ of the unzipped Python source file.

I solved this by adding a bit of code to setup.py that stores a file with a pickled dict mapping each filename to its timestamps, and by resetting the timestamps from that dict when I unpack the egg. A rough sketch of what I mean is below.
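Something along these lines (the file name file_stamps.pkl and the helper names are my own placeholders, not anything numba or setuptools provides):

```python
import os
import pickle

STAMP_FILE = "file_stamps.pkl"  # placeholder name, shipped alongside the sources

def record_stamps(src_root, out_path=STAMP_FILE):
    # Build side (called from setup.py before the egg is built):
    # remember the atime/mtime of every .py file.
    stamps = {}
    for dirpath, _, filenames in os.walk(src_root):
        for name in filenames:
            if name.endswith(".py"):
                path = os.path.join(dirpath, name)
                st = os.stat(path)
                stamps[os.path.relpath(path, src_root)] = (st.st_atime, st.st_mtime)
    with open(out_path, "wb") as f:
        pickle.dump(stamps, f)

def restore_stamps(dest_root, stamp_path=STAMP_FILE):
    # Deploy side (after the egg is unzipped): put the original timestamps
    # back so the numba .nbi 'stamp' check passes again.
    with open(stamp_path, "rb") as f:
        stamps = pickle.load(f)
    for relpath, (atime, mtime) in stamps.items():
        os.utime(os.path.join(dest_root, relpath), (atime, mtime))
```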

Assuming the index is valid, the .nbi file contains a dict of overloads whose values are the names of the cached code (.nbc) files. The keys are 3-tuples of (signature, magic tuple, code hash tuple), and all three elements must match for an overload to be used (see the inspection sketch after the list below).

The three elements of the key are:

  1. the signature is the type signature the function was compiled for
  2. the magic tuple is a 3-tuple of (OS identifier, cpu name, cpu compile flags). This is normally all autodetected, but the second and third items can be controlled by environment variables as described here
  3. the code hash tuple is a 2-tuple (hashes of the function bytecode and of the closure cell contents) that makes sure the code and closure environment really are the same
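To see these keys for yourself, the .nbi index can be unpickled directly. This is only my reading of the on-disk layout (two pickles back to back: a version marker, then a (stamp, overloads) tuple), so treat it as a debugging sketch rather than a supported interface:

```python
import pickle

def dump_nbi(nbi_path):
    with open(nbi_path, "rb") as f:
        version = pickle.load(f)               # version marker written by numba
        stamp, overloads = pickle.loads(f.read())
    print("index version:", version)
    print("source stamp: ", stamp)             # stamp of the .py file at compile time
    for (sig, magic_tuple, code_hash), nbc_name in overloads.items():
        print("signature:  ", sig)
        print("magic tuple:", magic_tuple)     # (os identifier, cpu name, cpu flags)
        print("code hash:  ", code_hash)
        print("data file:  ", nbc_name)

# Hypothetical path; the real file name encodes the function and Python version.
dump_nbi("__pycache__/mymodule.myfunc-4.py37.nbi")
```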

In my case, the egg-building machine had a broadwell cpu and the program-running machine had a haswell cpu. They’re quite similar, but the detected cpu name and cpu flags (items 2 and 3 of the magic tuple) are different, which means that none of the default-configured cached .nbc files built on a broadwell cpu will be selected for use on a haswell cpu, and vice versa.

What I did was set environment variables:

  1. NUMBA_CPU_NAME=generic
  2. NUMBA_CPU_FEATURES="+64bit,+adx,+aes,+avx,+avx2,-avx512bf16,-avx512bitalg,-avx512bw,-avx512cd,-avx512dq,-avx512er,-avx512f,-avx512ifma,-avx512pf,-avx512vbmi,-avx512vbmi2,-avx512vl,-avx512vnni,-avx512vpopcntdq,+bmi,+bmi2,-cldemote,-clflushopt,-clwb,-clzero,+cmov,+cx16,+cx8,-enqcmd,+f16c,+fma,-fma4,+fsgsbase,+fxsr,-gfni,+invpcid,-lwp,+lzcnt,+mmx,+movbe,-movdir64b,-movdiri,-mwaitx,+pclmul,-pconfig,-pku,+popcnt,-prefetchwt1,+prfchw,-ptwrite,-rdpid,+rdrnd,+rdseed,+rtm,+sahf,-sgx,-sha,-shstk,+sse,+sse2,+sse3,+sse4.1,+sse4.2,-sse4a,+ssse3,-tbm,-vaes,-vpclmulqdq,-waitpkg,-wbnoinvd,-xop,+xsave,-xsavec,+xsaveopt,-xsaves"

I’m not sure what the optimal value for NUMBA_CPU_FEATURES is, but I used the values that were detected in an .nbi file generated on the haswell platform.

As for my part ‘b’ question: any program that wants to use the overloads compiled with NUMBA_CPU_NAME/NUMBA_CPU_FEATURES needs those environment variables set to the same values as the program that compiled the overloads. A sketch of how I set them is below.
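Note that numba picks these variables up from the environment, so the safest approach is to have them set before numba is imported (a shell-level export before launching Python also works). Something like this on both the cache-producing and the cache-consuming side, with the feature string being the same full string shown above:

```python
import os

# Set before numba is imported, on both sides of the cache.
os.environ["NUMBA_CPU_NAME"] = "generic"
os.environ["NUMBA_CPU_FEATURES"] = "+64bit,+adx,+aes,..."  # the same full string as above

from numba import njit

@njit(cache=True)
def add(a, b):
    return a + b

add(1, 2)  # loads from the cache if the index key matches, otherwise recompiles
```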


An addendum that may be of interest to those using Apache Spark. In my case there was a further complication: after unzipping the egg and resetting the file timestamps on the driver machine, I use standard Python tarfile to pack the source tree into a tgz that is shipped to the executors with spark.sparkContext.addFile().

This round trip also does not appear to preserve the sub-second timestamp precision that the numba cache index (.nbi) records for the Python source files. Repeating the trick of saving the os.stat() values of each file into the tgz and applying them on the executor side resolved that issue. A sketch of the executor-side step is below.
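On the executor this amounts to reusing the same restore step as for the egg. The sketch below assumes the tgz contains the pickled stamps file from earlier; the archive name, destination path, and stamps file name are all my own placeholders:

```python
import os
import pickle
import tarfile

from pyspark import SparkFiles

def unpack_and_restore(archive_name="src.tgz", dest_root="./src"):
    # Executor side: unpack the shipped source tree, then put the original
    # timestamps back so the .nbi stamp check passes.
    with tarfile.open(SparkFiles.get(archive_name)) as tf:
        tf.extractall(dest_root)
    with open(os.path.join(dest_root, "file_stamps.pkl"), "rb") as f:
        stamps = pickle.load(f)
    for relpath, (atime, mtime) in stamps.items():
        os.utime(os.path.join(dest_root, relpath), (atime, mtime))
```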

Thanks for all the helpful discussion, it was tremendously helpful for getting cache loading working in cluster jobs. One thing to add: the source code’s absolute path also matters. The .nbi and .nbc files are written under a subfolder of the root cache directory, and that subfolder’s name depends on the absolute path of the source file’s parent directory (some kind of hash of that path). In my case the source code sits at a different location when running jobs in the cluster than when running locally. One dirty trick is to rename the cache folders to match. How the cache folder’s name is computed can be found in the caching.py source mentioned in the discussion above, where py_file is the file path of the source code. A sketch of my understanding is below.
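For illustration only, this is roughly how I understand the subfolder name to be derived. The hashing details are my reading of caching.py, not a guaranteed interface, so check the source of the numba version you use:

```python
import hashlib
import os

def cache_subdir_for(py_file):
    # The cache subfolder name appears to combine the name of the source
    # file's parent directory with a hash of that directory's absolute path,
    # so moving the source tree to a different path changes the folder name.
    parent = os.path.dirname(os.path.abspath(py_file))
    hashed = hashlib.sha1(parent.encode()).hexdigest()
    return os.path.basename(parent) + "_" + hashed

print(cache_subdir_for("/opt/app/mymodule.py"))  # path is just an example
```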


That’s a great insight, and something I didn’t understand. Thanks for contributing to the community!