Long compilation times for simple but long functions

In our workflow, we employ macros that generate pure Python code and use Numba to compile it. The generated code can get very long, with thousands of variables, but the transformation rules are very basic. The compiled code runs blazingly fast; compilation, however, takes far too long. The (illustrative) code below takes about 7.5 minutes to compile.

import numpy as np
import numba as nb

N_COLS = 500

# One allocation and one slice update per column; the join string
# matches the four-space indentation of the generated function body.
assignments = "\n    ".join(
    [f"x{i} = np.zeros(100, dtype='f4')" for i in range(N_COLS)]
)
updates = "\n    ".join(
    [f"x{i}[2:7] += 1" for i in range(N_COLS)]
)

code = f"""\
@nb.njit('f4[:](f4[:])')
def f(x):
    {assignments}
    {updates}
    return x1
"""

exec(code)

Profiling points to the llvmlite function _lib_fn_wrapper.__call__ in ffi.py as the main culprit; it accounts for more than 90% of the total run time. Based on the pipeline timings, I guess it is called during the native_lowering compiler pass. Is there a way to reduce the compilation time? In the example above I could combine all columns into a 2D array, but in our actual workload these arrays may have different dtypes and lengths, so combining them would be very tedious. Thank you!
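For reference, a minimal sketch of the 2D-array alternative mentioned above (illustrative only; it assumes all columns share the same dtype and length, which is exactly what does not hold in our real workload):

import numpy as np
import numba as nb

@nb.njit('f4[:](f4[:])')
def f(x):
    # All 500 columns live in one 2D array, so the body contains a
    # single allocation and a single slice update instead of 500 each.
    cols = np.zeros((500, 100), dtype='f4')
    cols[:, 2:7] += 1
    return cols[1]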

EDIT

The code below, on the other hand, compiles in under 30 seconds. Maybe the code above has too many loops (one implicit loop for each slice assignment), which makes compilation harder?

assignments = "\n    ".join(
    [f"x{i} = np.zeros(100, dtype='f4')" for i in range(N_COLS)]
)
# Eight-space indentation: the updates now sit inside a single shared
# loop instead of generating one implicit loop per slice assignment.
updates = "\n        ".join(
    [f"x{i}[n] += 1" for i in range(N_COLS)]
)

code = f"""\
@nb.njit('f4[:](f4[:])')
def f(x):
    {assignments}
    for n in range(2,7):
        {updates}
    return x1
"""

Disclaimer: I don’t know how to make that compile faster :frowning:

That being said, exec prevents cache=True from working. @DannyWeitekamp’s CRE seems to have a nifty caching solution for machine-generated code. It might be worth spending a few minutes investigating whether that would be a good fit for your situation.
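A minimal sketch of the problem as I understand it: Numba’s on-disk cache needs a source file to locate the function in, and exec’d strings have none, so Numba warns and skips caching:

import numba as nb

# Defined in a real source file: the on-disk cache works as usual.
@nb.njit(cache=True)
def g(x):
    return x + 1

# Defined via exec: there is no file to key the cache on, so Numba
# emits a NumbaWarning on first compilation and recompiles in every
# new process instead of reusing a cached binary.
exec("@nb.njit(cache=True)\ndef h(x):\n    return x + 1")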

The gist of the solution I have for this in CRE is to write the source to temporary files. I use numba’s OS-agnostic way to find a good place to put these:

import os
from numba.misc.appdirs import AppDirs

appdirs = AppDirs(appname="cre", appauthor=False)
cache_dir = os.path.join(appdirs.user_cache_dir, "cre_cache")

You can take a look at the source in CRE for the rest.

The general idea is that you write a source generation script and provide a list of parameters to hash on. Every file is then uniquely identified by its name and hash. Here is the general usage pattern:

hash_code = unique_hash([...whatever script parameters])
if not source_in_cache('script_name', hash_code):
    src = my_script_gen_function(...whatever script parameters)
    source_to_cache('script_name', hash_code, src)
l = import_from_cached('script_name', hash_code, ['my_func'])
my_func = l['my_func']

When you import from this file cache, any cache=True in a jitted function works as expected, because the definition lives on disk somewhere Numba can keep track of. One annoyance is that while writing your source generation script you will inevitably make mistakes, in which case you will probably need to clear the cache of any erroneous generated scripts before rerunning.
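A minimal sketch of that clean-up step, assuming the cache_dir from above (clear_cache is a hypothetical helper, not something CRE provides under that name):

import os
import shutil

def clear_cache(cache_dir):
    # Drop all generated sources (and any cached binaries stored next
    # to them) so erroneous scripts are regenerated from scratch.
    if os.path.isdir(cache_dir):
        shutil.rmtree(cache_dir)
    os.makedirs(cache_dir, exist_ok=True)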


We do not use exec, precisely because of this issue, and follow an approach similar to what @DannyWeitekamp suggested. I have not tried AppDirs, but we need a location accessible to every worker node in the Spark application, so we set it ‘manually’. Thank you for suggesting CRE; it looks interesting and relevant to the things we do.

It may or may not be useful to you, but it’s possible to build the cache ‘locally’ on a Spark edge node and then zip and ship it to the executors for a ‘shared nothing’ approach.
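A rough sketch of that idea (paths and names are hypothetical, and it assumes PySpark >= 3.1, where SparkContext.addArchive ships an archive and unpacks it on the executors; check the PySpark docs for the exact extraction location):

import shutil
from pyspark import SparkFiles

# On the edge node: build the cache locally, then archive it.
shutil.make_archive("numba_cache", "zip", "/path/to/local_cache")

# Ship the archive to the executors; the '#numba_cache' fragment names
# the directory it is unpacked into.
sc.addArchive("numba_cache.zip#numba_cache")

# Inside tasks: point the source/cache lookup at the unpacked copy.
cache_dir = SparkFiles.get("numba_cache")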