Numba portable caching logic

Here’s what I found on my adventure; comments and corrections are welcome! This is also related to this post. I used code like this to investigate the numba cache index file format. Much of it is my interpretation of @stuartarchibald's previous pointers in this thread.

Cached code (.nbc) files are only loaded if they have a valid overload entry in the cache index (.nbi) file. An nbi file is invalidated unless the ‘stamp’ of the Python source file matches the ‘stamp’ stored in the nbi file. The ‘stamp’ is a 2-tuple of (the Python file's stat st_mtime, the Python file's stat st_size, i.e. its size in bytes). If the nbi file is invalidated, all of the nbc files it indexes are invalidated as well.
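To make the stamp check concrete, here is a minimal sketch of the comparison as I understand it. It assumes the stamp is the pair (st_mtime, st_size) from os.stat; the function names are my own and are not numba API.

```python
import os


def source_stamp(py_file):
    """Return the (st_mtime, st_size) pair for a source file."""
    st = os.stat(py_file)
    return st.st_mtime, st.st_size


def index_is_fresh(py_file, stored_stamp):
    """True if the stamp recorded in an index still matches the file on disk.

    stored_stamp stands in for the value read from the .nbi file.
    """
    return source_stamp(py_file) == tuple(stored_stamp)
```

Note that the comparison is exact, so even a sub-second change in st_mtime counts as a mismatch.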

In my case, I’m using setup.py with bdist_egg to package my application. It appears that the zip format used by eggs doesn’t preserve the st_mtime granularity that numba uses in the .nbi index files (zip timestamps only have 2-second resolution), so my nbi files were being invalidated: the ‘stamp’ stored in the nbi didn’t match the ‘stamp’ of the unzipped Python source file.

I solved this by adding a bit of code in setup.py that stores a file containing a pickled dict mapping each filename to its timestamp, and by resetting the timestamps from that file when I unpacked the egg.
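A rough sketch of that workaround, not my exact setup.py code: the function names and the manifest layout are made up for illustration, and it only uses os.walk, os.stat, os.utime and pickle from the standard library.

```python
import os
import pickle


def save_mtimes(root, manifest_path):
    """Record (atime_ns, mtime_ns) for every .py file under root."""
    stamps = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.endswith(".py"):
                path = os.path.join(dirpath, name)
                st = os.stat(path)
                rel = os.path.relpath(path, root)
                stamps[rel] = (st.st_atime_ns, st.st_mtime_ns)
    with open(manifest_path, "wb") as f:
        pickle.dump(stamps, f)
    return stamps


def restore_mtimes(root, manifest_path):
    """Reapply the recorded timestamps after the files have been unpacked."""
    with open(manifest_path, "rb") as f:
        stamps = pickle.load(f)
    for rel, times_ns in stamps.items():
        path = os.path.join(root, rel)
        if os.path.exists(path):
            # ns=(atime_ns, mtime_ns) keeps full nanosecond precision
            os.utime(path, ns=times_ns)
```

save_mtimes would run at build time, before the cache files are generated and zipped, and restore_mtimes would run on the target machine right after unpacking.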

Assuming it is valid, the nbi index file contains a dict of overloads whose values are the names of the cached code (.nbc) files. The keys are 3-tuples of (signature, magic tuple, code hash tuple). All three must match for an overload to be used.

The three elements of the key are:

  1. signature is self-explanatory
  2. magic tuple is a 3-tuple of (os identifier, cpu name, cpu compile flags). This is normally all autodetected, but the second and third items can be controlled with the NUMBA_CPU_NAME and NUMBA_CPU_FEATURES environment variables, as described in the numba environment variable documentation
  3. code hash tuple is a 2-tuple that makes sure the code and the closure environment really are the same
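The lookup can be modelled as a plain dict access, which shows why any difference in the key means a cache miss. This is a toy model only: every value below is a placeholder, and the real keys hold numba types and hash strings, not these strings.

```python
def find_cached_overload(overloads, signature, magic_tuple, code_hashes):
    """Return the .nbc file name for an exact key match, else None."""
    return overloads.get((signature, magic_tuple, code_hashes))


# Placeholder values standing in for what a real index would hold.
sig = ("int64", "int64")
hashes = ("bytecode-hash", "closure-hash")
built_on = ("os-id", "broadwell", "features-A")
running_on = ("os-id", "haswell", "features-B")

# Index as written on the build machine.
overloads = {(sig, built_on, hashes): "example-1.nbc"}

hit = find_cached_overload(overloads, sig, built_on, hashes)
miss = find_cached_overload(overloads, sig, running_on, hashes)
```

Here hit is the file name and miss is None, even though the signature and code hashes are identical.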

In my case, the machine building the egg had a Broadwell CPU, and the machine running the program had a Haswell CPU. They’re quite similar, but the detected cpu name and cpu flags (items 2 and 3 of the magic tuple) are different. That means none of the cached nbc files built with the default configuration on a Broadwell CPU will be selected for use on a Haswell CPU, and vice versa.

What I did was set environment variables:

  1. NUMBA_CPU_NAME=generic
  2. NUMBA_CPU_FEATURES="+64bit,+adx,+aes,+avx,+avx2,-avx512bf16,-avx512bitalg,-avx512bw,-avx512cd,-avx512dq,-avx512er,-avx512f,-avx512ifma,-avx512pf,-avx512vbmi,-avx512vbmi2,-avx512vl,-avx512vnni,-avx512vpopcntdq,+bmi,+bmi2,-cldemote,-clflushopt,-clwb,-clzero,+cmov,+cx16,+cx8,-enqcmd,+f16c,+fma,-fma4,+fsgsbase,+fxsr,-gfni,+invpcid,-lwp,+lzcnt,+mmx,+movbe,-movdir64b,-movdiri,-mwaitx,+pclmul,-pconfig,-pku,+popcnt,-prefetchwt1,+prfchw,-ptwrite,-rdpid,+rdrnd,+rdseed,+rtm,+sahf,-sgx,-sha,-shstk,+sse,+sse2,+sse3,+sse4.1,+sse4.2,-sse4a,+ssse3,-tbm,-vaes,-vpclmulqdq,-waitpkg,-wbnoinvd,-xop,+xsave,-xsavec,+xsaveopt,-xsaves"

I’m not sure what the optimal value for NUMBA_CPU_FEATURES is, but I used the values that were detected in an nbi file generated on the Haswell platform.

For my part ‘b’ question, any program that wants to use the overloads compiled with NUMBA_CPU_NAME/NUMBA_CPU_FEATURES needs to have those environment variables set the same way as the program that compiled the overloads.
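One way to keep the two programs in sync is to set the variables in Python before numba is imported. This is a minimal sketch: pin_numba_target is a hypothetical helper of mine, and the short feature string is only a placeholder for the full list shown above. It relies on my understanding that numba reads these variables when it is imported.

```python
import os


def pin_numba_target(cpu_name, cpu_features, env=os.environ):
    """Set the numba target variables; call this before importing numba."""
    env["NUMBA_CPU_NAME"] = cpu_name
    env["NUMBA_CPU_FEATURES"] = cpu_features
    return env["NUMBA_CPU_NAME"], env["NUMBA_CPU_FEATURES"]


# Placeholder feature string; substitute the full list you settled on.
pin_numba_target("generic", "+64bit,+sse,+sse2")

# import numba  # must come after the variables are set
```

Both the program that builds the cache and the program that consumes it would make the same call with the same values.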
