Reducing Import Time

I have created a minimal reproducible example with a Python module core.py:

# core.py

import numpy as np
from numba import njit


@njit()
def _welford_nanvar(a, w, a_subseq_isfinite):
    all_variances = np.empty(a.shape[0] - w + 1, dtype=np.float64)
    prev_mean = 0.0
    prev_var = 0.0

    for start_idx in range(a.shape[0] - w + 1):
        prev_start_idx = start_idx - 1
        stop_idx = start_idx + w  # Exclusive index value
        last_idx = start_idx + w - 1  # Last inclusive index value

        if (
            start_idx == 0
            or not a_subseq_isfinite[prev_start_idx]
            or not a_subseq_isfinite[start_idx]
        ):
            curr_mean = np.nanmean(a[start_idx:stop_idx])
            curr_var = np.nanvar(a[start_idx:stop_idx])
        else:
            curr_mean = prev_mean + (a[last_idx] - a[prev_start_idx]) / w
            curr_var = (
                prev_var
                + (a[last_idx] - a[prev_start_idx])
                * (a[last_idx] - curr_mean + a[prev_start_idx] - prev_mean)
                / w
            )

        all_variances[start_idx] = curr_var

        prev_mean = curr_mean
        prev_var = curr_var

    return all_variances

and then I simply import core.py into a test.py script:

# test.py

import core

if __name__ == "__main__":
    pass

and execute test.py via:

python -X importtime test.py

This generates import timings where the most relevant part is the time it takes to import core.py:

import time:       723 |    1024643 | core

This is fast. However, when I add a function signature to the njit function in core.py (i.e., @njit("f8[:](f8[:], i8, b1[:])", fastmath={"nsz", "arcp", "contract", "afn", "reassoc"})) then the import time explodes to:

import time:    766185 |    2492403 | core

In reality, I have around 18 njit functions in my code and recently added function signatures to all of them. While signatures make the functions a lot ā€œsaferā€ to use, since they check the input types, it now takes 15-30 seconds to import my Python package. I don’t understand why adding a function signature would cause the import time to increase so dramatically. Is there any way to get the benefits of signatures without the import time becoming bloated?

I think if you provide a signature, that triggers immediate compilation of the function upon definition. So the increased import time is probably caused by numba precompiling your functions. I don’t know if it is possible to stop this while providing a signature.

I don’t think numba’s typing system is really meant for runtime type checking. Signatures only really make sense if you want to actively block numba from recompiling functions for certain data types. If you want type checking, something like mypy (or just providing type annotations for your users) is probably better suited for the job.

No, type checking was only a secondary benefit. I was hoping that adding function signatures would help reduce the JIT compilation time for when the function was called. I suspected but couldn’t confirm that providing a signature would trigger immediate compilation, which makes sense given the explosion in import time. 15-30 seconds of import time is less than ideal but I hope that there is another option like some sort of explicit lazy compilation.

If you are concerned about the time needed for type inference, then my gut feeling is that you will hardly see any improvement by supplying a type, since the type inference should be pretty fast compared to all other steps. Numba’s inference works great and explicitly passing signatures is discouraged all over the docs unless you have very good reasons to do so. Apologies if I misunderstood your point.

One such reason is limiting the precision of floats, for example.

I seem to recall that the docs had a section explaining this in detail (especially how the order of ā€œfirst callsā€ with a new signature can impact which specialisations are actually compiled and which just cast the input values if it is safe), but I cannot seem to find that anymore. Not sure if I am unable to find it or if it has been removed because something changed internally.

IIRC the dispatcher can also be ā€œlockedā€ manually to prohibit further specialisation / compilation, but I don’t have a good idea how one would use that to make the compilation lazy but constrained.


Perhaps unrelated, but for programs that are run many times, caching can prevent recompiling on each invocation.

Thanks @Hannes I appreciate your input. I definitely learned something new!

@nelson2005 I have only read about caching and understand that it can save recompile time. However, I don’t fully understand how this works when we’re dealing with Numba function caching in a Python package. In other words, if somebody pip installs a Python package from PyPI that contains Numba code with cache=True, what happens? Also, what happens when a new version of the package comes out and gets installed over an older version?

My hope is that caching will never be a problem but that’s not based on anything concrete.

@seanlaw Glad if that helped 🙂

I have had extremely good experiences with numba’s caching in a ā€œlibrary styleā€ package. I use it all the time (one of my funcs takes minutes to compile, so it is a life saver). Numba will notice if your code changes and recompile if necessary. There is some more info on that here: Notes on Caching — Numba documentation

I am fairly sure that some more info goes into the hash that is not documented right now; I had a look at the source code at some point. If I find it I will post it here too.
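One practical note while we are at it (the environment variable is documented; the directory below is just my own choice): by default the cache files land next to the source, which can be awkward for a pip-installed package sitting in a read-only site-packages, but NUMBA_CACHE_DIR lets you redirect them:

```shell
# Redirect numba's on-disk cache to a user-writable directory,
# e.g. when the package sits in a read-only site-packages.
export NUMBA_CACHE_DIR="$HOME/.cache/numba"
mkdir -p "$NUMBA_CACHE_DIR"
```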

The only point where I had to clean the cache manually was in the past when I was messing around with some flags around TBB and fastmath, if I recall correctly, because I was running into segfaults under very weird circumstances. But that was only ever a problem while I was messing around with configurations wild-west style. For those cases I have a small script that clears all Python and numba caches so I can proceed quickly.


Here is the makefile I use to clean up my crimes; no guarantees though, my shell scripting abilities are those of a toddler.

.PHONY: numba-clean numba-cache-clean python-cache-clean cache-clean safety_net
safety_net:
	@echo No target specified, exiting.

numba-cache-clean:
	find . -type f -name "*.nb[ci]" -delete
	@echo Done cleaning numba cache files.

python-cache-clean:
	find . -type f -name "*.py[co]" -delete
	find . -type d -name "__pycache__" -delete
	@echo Done cleaning pycache.

cache-clean: numba-cache-clean python-cache-clean
	@echo Done.

I’ve had good experience with caching; my app takes over an hour to jit on 0.53.1 so caching is pretty key.

If the cpu/os changes, the cache is rebuilt. Like @Hannes mentioned, problems are unusual for ā€˜normal’ usages. The timestamp of the Python source file is part of the fingerprint, so changing/updating it will simply cause the cache to be rebuilt.

For more info, you might find this discussion interesting


Thank you both! Alas, I added cache=True to all of my functions and it looks like I am running into issues using it with Dask that seem to be related to this Numba issue. I’ve even tried limiting cache=True to functions without parallel=True, but that didn’t solve the segmentation faults either. It seems that cache=True isn’t quite the answer yet.


Ah, that is a shame. I wasn’t aware of the problems with Dask, but thanks for bringing this to my attention; there is a good chance I would have tripped over it in the very near future!