I have created a minimal reproducible example with a Python module core.py:
# core.py
import numpy as np
from numba import njit


@njit()
def _welford_nanvar(a, w, a_subseq_isfinite):
    all_variances = np.empty(a.shape[0] - w + 1, dtype=np.float64)
    prev_mean = 0.0
    prev_var = 0.0
    for start_idx in range(a.shape[0] - w + 1):
        prev_start_idx = start_idx - 1
        stop_idx = start_idx + w  # Exclusive index value
        last_idx = start_idx + w - 1  # Last inclusive index value
        if (
            start_idx == 0
            or not a_subseq_isfinite[prev_start_idx]
            or not a_subseq_isfinite[start_idx]
        ):
            curr_mean = np.nanmean(a[start_idx:stop_idx])
            curr_var = np.nanvar(a[start_idx:stop_idx])
        else:
            curr_mean = prev_mean + (a[last_idx] - a[prev_start_idx]) / w
            curr_var = (
                prev_var
                + (a[last_idx] - a[prev_start_idx])
                * (a[last_idx] - curr_mean + a[prev_start_idx] - prev_mean)
                / w
            )
        all_variances[start_idx] = curr_var
        prev_mean = curr_mean
        prev_var = curr_var
    return all_variances
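(As a quick sanity check of the rolling recurrence above: a pure-Python/NumPy sketch of the same Welford-style update, with a made-up helper name and without the @njit decorator, can be compared window-by-window against np.nanvar.)

```python
import numpy as np


def rolling_nanvar_reference(a, w):
    """Pure-Python version of the same Welford-style recurrence,
    kept un-jitted so it can be checked without compiling."""
    n = a.shape[0] - w + 1
    out = np.empty(n, dtype=np.float64)
    prev_mean = np.nanmean(a[:w])
    prev_var = np.nanvar(a[:w])
    out[0] = prev_var
    for start in range(1, n):
        x_old = a[start - 1]      # element leaving the window
        x_new = a[start + w - 1]  # element entering the window
        curr_mean = prev_mean + (x_new - x_old) / w
        curr_var = prev_var + (x_new - x_old) * (x_new - curr_mean + x_old - prev_mean) / w
        out[start] = curr_var
        prev_mean, prev_var = curr_mean, curr_var
    return out


rng = np.random.default_rng(0)
a = rng.standard_normal(100)
w = 5
# Brute-force variance of every window, for comparison
expected = np.array([np.nanvar(a[i:i + w]) for i in range(a.shape[0] - w + 1)])
assert np.allclose(rolling_nanvar_reference(a, w), expected)
```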
and then I simply import core.py into a test.py script:
# test.py
import core

if __name__ == "__main__":
    pass
and execute test.py via:
python -X importtime test.py
This generates import timings where the most relevant part is the time it takes to import core.py:
import time: 723 | 1024643 | core
This is fast. However, when I add a function signature to the njit function in core.py (i.e., @njit("f8[:](f8[:], i8, b1[:])", fastmath={"nsz", "arcp", "contract", "afn", "reassoc"})) then the import time explodes to:
import time: 766185 | 2492403 | core
In reality, I have around 18 njit functions in my code and recently added function signatures to all of them. While signatures make the functions a lot "safer" to use since they check the input types, it now takes 15-30 seconds to import my Python package. I don't understand why adding a function signature would cause the import time to increase so dramatically. Is there any way to get the benefits of signatures without bloating the import time?
I think if you provide a signature, that triggers immediate compilation of the function upon definition. So the increased import time is probably caused by numba precompiling your functions. I don't know if it is possible to stop this while providing a signature.
I don't think numba's typing system is really meant for runtime type checking. Signatures only really make sense if you want to actively block numba from recompiling functions for certain data types. If you want type checking, something like mypy (or simply providing type annotations for your users) is probably better suited for the job.
No, type checking was only a secondary benefit. I was hoping that adding function signatures would help reduce the JIT compilation time for when the function was called. I suspected but couldn't confirm that providing a signature would trigger immediate compilation, which makes sense given the explosion in import time. 15-30 seconds of import time is less than ideal but I hope that there is another option like some sort of explicit lazy compilation.
If you are concerned about the time needed for type inference, then my gut feeling is that you will hardly see any improvement by supplying a type, since the type inference should be pretty fast compared to all other steps. Numba's inference works great and explicitly passing signatures is discouraged all over the docs unless you have very good reasons to do so. Apologies if I misunderstood your point.
One of those reasons is limiting the precision of floats, for example.
I seem to recall that the docs had a section explaining this in detail (especially how the order of "first calls" with a new signature can impact which specialisations are actually compiled and which just cast the input values if it is safe), but I cannot seem to find that anymore. Not sure if I am unable to find it or if it has been removed because something changed internally.
IIRC the dispatcher can also be "locked" manually to prohibit further specialisation/compilation, but I don't have a good idea how one would use that to make the compilation lazy but constrained.
Thanks @Hannes I appreciate your input. I definitely learned something new!
@nelson2005 I have only read about caching and understand that we can save recompile time. However, I don't fully understand how this works when we're dealing with Numba function caching in a Python package. In other words, if somebody pip installs a Python package from PyPI that contains Numba code with cache=True, what happens? Also, what happens when a new version of the package comes out and gets installed over an older version?
My hope is that caching will never be a problem, but that's not based on anything concrete.
I have had extremely good experiences with numba's caching in a "library style" package. I use it all the time (one of my funcs takes minutes to compile, so it is a life saver). Numba will notice if your code changes and recompile if necessary. There is some more info on that here: Notes on Caching - Numba 0.54.1+0.g39aef3deb.dirty-py3.7-linux-x86_64.egg documentation
I am fairly sure that there is some more info going into the hash that is not documented right now; I had a look at the source code at some point. If I find it, I will post it here too.
The only time I had to clean the cache manually was in the past when I was messing around with some flags around TBB and fastmath, if I recall correctly, because I was running into segfaults under very weird circumstances. But that was only ever a problem while I was messing around with configurations wild-west style. For those cases I have a small script that clears all Python and numba caches so I can proceed quickly.
I've had good experience with caching; my app takes over an hour to jit on 0.53.1, so caching is pretty key.
If the cpu/os changes, the cache is rebuilt. Like @Hannes mentioned, problems are unusual for "normal" usages. The timestamp of the Python source file is part of the fingerprint, so changing/updating it will simply cause the cache to be rebuilt.
Thank you both! Alas, I added cache=True to all of my functions and it looks like I am running into issues using it with Dask that seem to be related to this Numba issue. I've even tried limiting cache=True to functions without parallel=True, but that didn't solve the segmentation faults either. It seems that cache=True isn't quite the answer yet.
Ah, that is a shame, I wasn't aware of the problems with Dask. But thanks for bringing this to my attention; there is a good chance I would have tripped over it in the very near future!