This startup time — do you mean that even if the JIT has already compiled everything, and even if the cache works, startup time still accumulates in a large project?
In my experience, yes. It isn’t a ton, but it is noticeable, and in a big enough project it can amount to several seconds of lag on each run. Essentially, every cached entry point — every individually compiled @jit-decorated function with cache=True that is called from Python (jitted functions called from other jitted functions don’t count) — has a cache file with its compiled code. Checking the argument types, hashing the signature and the file contents (to catch dependent globals), and then reading that cache from disk all take a noticeable amount of time.
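As a minimal sketch of the setup being described — a function compiled once and cached to disk, with a small per-call dispatch cost on the Python side (the try/except fallback below is just so the example runs even without numba installed):

```python
# Sketch of per-function disk caching with Numba's cache=True.
# Assumes numba is installed; otherwise a no-op decorator stands in
# so the example still runs as plain Python.
try:
    from numba import njit
except ImportError:
    def njit(*args, **kwargs):
        def wrap(fn):
            return fn
        return wrap

@njit(cache=True)  # compiled machine code is written to a cache file on disk
def hypot2(x, y):
    return x * x + y * y

# Every Python-level call pays a small cost (type check + dispatch);
# calls between jitted functions skip this entirely.
print(hypot2(3.0, 4.0))  # 25.0
```

On the first run in a fresh environment, the call also pays compilation (or cache-load) time; that per-function load is the overhead that accumulates across a big project.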
I’ve also noticed hidden time when the JIT doesn’t cache properly: it writes nothing and gives no message, it just silently recompiles on every run. I only discovered this through timing measurements.
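One way to catch that kind of silent recompilation is the timing approach mentioned above: compare the first call in a fresh process against warmed-up calls. This is a hypothetical helper (`first_vs_warm` is my own name, not a numba API); a consistently large first-call time on every fresh run, despite cache=True, suggests the cache is being silently missed:

```python
# Hypothetical helper: measure the first call of a function (which may
# include compilation or cache loading) versus warmed-up calls (dispatch
# cost only). Works on any callable, jitted or not.
import time

def first_vs_warm(fn, *args, warm_reps=5):
    t0 = time.perf_counter()
    fn(*args)                       # first call: may hide compilation time
    first = time.perf_counter() - t0

    t0 = time.perf_counter()
    for _ in range(warm_reps):      # warm calls: steady-state cost
        fn(*args)
    warm = (time.perf_counter() - t0) / warm_reps
    return first, warm

first, warm = first_vs_warm(abs, -1.0)
print(f"first={first:.2e}s  warm={warm:.2e}s")
```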
If you change a file, then every jitted function in that file will probably need to be recompiled. Numba is very conservative about reusing cached code because of edge cases where a jitted function depends on a global variable.
Regarding Cython, it seems you’ve talked me out of using it. Indeed, I hadn’t thought about the fact that there won’t be any bounds checks for arrays…
I can’t say from experience that Cython is bad, but my impression from other people’s reviews is that it can be limiting and can produce all sorts of new problems — for instance, it tends to interleave Python and C when it can’t fully compile something, and consequently doesn’t always give you a full speedup. Some accounts suggest it also doesn’t offer as much protection as you might hope from the kinds of mistakes you can make in C or C++, including memory safety issues (I suspect those come into play when you use the extended syntax beyond plain Python). Reading past the end of an array is really the tip of the iceberg in that department; a lot of memory safety issues have to do with malloc’ing and free’ing things at appropriate times. If those are indeed issues in Cython, then it might be easier to use C/C++, Rust, C#, Java, or other relatively fast options for which tools already exist (e.g. valgrind) for finding hard-to-spot issues like memory safety that don’t occur in Python. Put another way, Cython feels like it is at best easing (but not completely eliminating) an issue that takes maybe a week or two to overcome — learning C/C++ syntax — while not helping much with what actually makes writing safe and efficient C/C++ hard. If you really need the fastest implementation possible, C/C++ won’t hold you back, but there is a learning curve.
I have to say, numba seems to get pretty darn close to C/C++ for number-crunching stuff. It’s got a way to go with objects/classes, however. And it would be pretty hard to do all of the same things you can do with objects in C/C++ (like stack-allocating them, using move semantics, etc…) with objects in numba, since it uses Python syntax.