Numba caching on a batch cluster?

Hi all,

I was wondering if anyone here has experience with using numba on a batch system.

Specifically I am facing the following situation:

  • We have a central HTCondor system with thousands of servers that presumably have a range of different hardware configurations. I am planning to submit hundreds of smaller jobs to the cluster, which would make use of some numba-accelerated code that I have written.

  • I know that the initial compilation of my code takes a while and may actually exceed the runtime of a single job, which would hardly be an efficient use of the resources.

I see two approaches to tackle this problem:

  1. Bundle together several small jobs into one big job, and provide a copy of my code that is local to a single worker instance. Since I am using numba’s caching feature, JIT compilation should only happen once per job collection. A possible downside I see here is that there is less flexibility in scheduling the individual jobs.

  2. Leave all the jobs individual and provide my code from a central network file system location that is accessible from the cluster instances. Ideally this would allow the first workers that spin up to take care of the compilation for their own architecture, while later runs would find the cached compiled versions in the centrally hosted package. However, I have no idea whether numba’s caching system is designed to deal with simultaneous access from various Python sessions running on different machines. Are there any file-based locking mechanisms? (A rough sketch of how either setup could be wired up follows this list.)
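
For context, here is a minimal sketch of the kind of setup I have in mind. The function name and the cache path are made up for illustration; the only real pieces are `cache=True` and the `NUMBA_CACHE_DIR` environment variable, which Numba reads to decide where to put its on-disk cache:

```python
import os

# Redirect the on-disk cache before importing numba. For approach 1 this
# could point at a job-local scratch directory; for approach 2 at a path on
# the shared network file system. (The path itself is just a placeholder.)
os.environ.setdefault("NUMBA_CACHE_DIR", "/scratch/numba_cache")

import numpy as np
from numba import njit


@njit(cache=True)  # cache=True persists the compiled machine code to disk
def my_kernel(x):
    # hypothetical stand-in for the actual accelerated code
    total = 0.0
    for v in x:
        total += v * v
    return total


if __name__ == "__main__":
    # The first call triggers JIT compilation (or loads a cached binary);
    # later processes on compatible hardware should reuse the cache files.
    print(my_kernel(np.arange(1_000_000, dtype=np.float64)))
```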

I will happily admit that these things are way out of my comfort zone. My experience working with the batch system is very limited, and I also don’t really know under which circumstances a recompilation of numba code is actually required (i.e. how different the hardware has to be for that to be necessary).

If somebody has any experience to share, I will happily soak it up. I’ll also report back at a later point if I figure out a way to do this well.

Cheers
Hannes

Hi @Hannes

RE 1. This would probably work, but as noted comes at the cost of flexibility in individual job scheduling.

RE 2. Without knowing the cluster setup it’s hard to guess at this. From memory, it should be OK to cache across architectures, since system information (LLVM triple, CPU and target machine features) is part of the cache key. Consideration probably needs to be given to how much concurrent access is made to the shared space!
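
If you want to gauge how much that system information actually varies across your workers, one option would be a small throwaway diagnostic job along these lines. It prints the same kind of details that go into the cache key, using llvmlite (which ships alongside Numba); the output format here is arbitrary:

```python
# Throwaway diagnostic: print the host's LLVM triple, CPU name and CPU
# features, to compare how much the hardware differs between cluster nodes.
import socket

from llvmlite import binding as llvm

# Standard LLVM initialisation calls, as used by Numba itself.
llvm.initialize()
llvm.initialize_native_target()
llvm.initialize_native_asmprinter()

print("host    :", socket.gethostname())
print("triple  :", llvm.get_process_triple())
print("cpu     :", llvm.get_host_cpu_name())
print("features:", llvm.get_host_cpu_features().flatten())
```

Submitting that to a handful of nodes should give a rough idea of how many distinct hardware configurations (and therefore how many separate compilations) to expect.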