Unexpected Performance Decrease in np.cov with Numba JIT

I’m encountering an unexpected performance decrease when calling an @njit-compiled function that uses np.cov. While I expect the first call to be slow due to JIT compilation, I noticed that the first call after changing the input shape also shows a significant increase in execution time. This is surprising because the function signature remains the same across calls.

Code to Reproduce the Issue

from contextlib import contextmanager
from time import perf_counter
from typing import Callable, Generator

from numba import njit
import numpy as np


@contextmanager
def timer(f: Callable[[float], object] = lambda _: None) -> Generator[Callable[[], float], None, None]:
    start = perf_counter()

    def t() -> float:
        return perf_counter() - start

    yield t
    f(t())  # report the total elapsed time on exit


@njit
def test(x, y):
    return np.cov(x, y)


with timer(print):
    test(np.array([0.]), np.array([0.]))  # First call (JIT compilation expected)
# Example output: 3.549982499331236

with timer(print):
    test(np.array([0.]), np.array([0.]))  # Second call (should be fast)
# Example output: 1.730024814605713e-05

with timer(print):
    test(np.array([0., 0.]), np.array([0., 0.]))  # First shape change
# Example output: 0.02204340137541294  <-- Unexpectedly high execution time

with timer(print):
    test(np.array([0., 0.]), np.array([0., 0.]))  # Second call with same new shape (should be fast)
# Example output: 2.8500333428382874e-05

print(test.nopython_signatures)
# Output: [(Array(float64, 1, 'C', False, aligned=True), Array(float64, 1, 'C', False, aligned=True)) -> array(float64, 2d, C)]

Observed Behavior

  • First execution (compilation overhead expected) → Slow.
  • Second execution (same input shape) → Fast, as expected.
  • Third execution (first shape change) → Unexpectedly slow, even though the function signature remains the same.
  • Fourth execution (same new shape) → Fast again, as expected.

Questions

  1. Why does the first execution with a new input shape result in a significant performance decrease, even though the function signature does not change?
  2. How can I investigate what is happening internally?

Any insights or debugging strategies would be greatly appreciated!

Python version: 3.12.8
Numba version: 0.60.0

Numba’s overload of np.cov selects between different implementations based on the shape of the input arrays. Even though both cases ultimately return a 2D array, the internal code path is different. Numba must compile a separate version (specialization) for each case, which is why you see additional compilation overhead when the input shape changes.

Edit:
Running your code, the first call (which has the compilation overhead) takes much longer than the subsequent calls.
Once compiled, each specialization seems to be cached and reused, so separate compilations aren’t causing the delays.
Here is what I get:

5.915852659003576
2.0213992684148252e-05
1.0037998436018825e-05
7.039998308755457e-06
[(Array(float64, 1, 'C', False, aligned=True), Array(float64, 1, 'C', False, aligned=True)) -> array(float64, 2d, C)]

I’ve used Numba version 0.61.0. Do you observe the same result with this version?

I have updated numba to 0.61.0 but nothing changed. New output:

6.279752001166344
2.2199004888534546e-05
0.030417900532484055
3.0800700187683105e-05
[(Array(float64, 1, 'C', False, aligned=True), Array(float64, 1, 'C', False, aligned=True)) -> array(float64, 2d, C)]
0.61.0

(The last line is produced by print(nb.__version__))

If test(x, y) is compiled only once, there should be a single .nbc file.
If test(x, y) is recompiled for different input shapes, multiple .nbc files should appear.
Could you set a cache directory, cache the function, and check which files are created?

import os
os.environ["NUMBA_CACHE_DIR"] = "/SomePath/numba_cache"

from numba import njit
import numpy as np

@njit(cache=True)
def test(x, y):
    return np.cov(x, y)

test(np.array([0.]), np.array([0.]))
test(np.array([0., 0.]), np.array([0., 0.]))
# Cache files created:
# .../test.test-7.py312.1.nbc
# .../test.test-7.py312.nbi

There are only two cache files in the directory, just like your result.

If you run the following code line by line in Numba debug mode, you should be able to see when compilations occur.
For me, there is only one compilation, even when the input array sizes change.

import os
os.environ["NUMBA_DEBUG"] = "1"

from numba import njit
import numpy as np

@njit
def test(x, y):
    return np.cov(x, y)

test(np.array([0.]), np.array([0.]))
# Debug output (truncated for brevity):
# ...
# 	.quad	.const.pickledata.137525623976768.22
# 	.long	178
# 	.zero	4
# 	.quad	.const.pickledata.137525623976768.sha1.23
# 	.quad	0
# 	.long	0
# 	.zero	4
# 	.size	.const.picklebuf.137525623976768.21, 40
# 	.section	".note.GNU-stack","",@progbits
# ...

test(np.array([0., 0.]), np.array([0., 0.]))
# array([[0., 0.],
#        [0., 0.]])

If multiple compilations were happening, you would see additional compilation logs while running the function with different input sizes.

I did not see anything too useful because the amount of output was so large that it was truncated in my IDE.

The timings of the “test” function suggest that np.cov might be compiled twice depending on the shape of the input arrays. I couldn’t reproduce this behavior with the same versions of Python and Numba.
To investigate further, you can set os.environ["NUMBA_DEBUG"] = "1" to check if a compilation is triggered when executing the line.
What you should see when running test(np.array([0.]), np.array([0.])) is a series of long compilation logs (the content doesn’t really matter, just the fact that some compilation appears). After that, when you run test(np.array([0., 0.]), np.array([0., 0.])), the result should return quickly without the lengthy compilation logs.
If you still see a long compilation log for the second call, it indicates that recompilation is occurring for some reason.

I can see that the compilation log only appears before the first call.

Great, this is exactly how it should behave.
Could you try the same with the original code, using os.environ["NUMBA_DEBUG"] = "1"? What you should see is similar behavior: there should only be one compilation log. The first function call will take longer, while subsequent calls should be much faster.

Here’s an example of what the output should look like:

    ...some long compilation logs...
	.long	178
	.zero	4
	.quad	.const.pickledata.125757979843904.sha1
	.quad	0
	.long	0
	.zero	4
	.size	.const.picklebuf.125757979843904, 40
	.section	".note.GNU-stack","",@progbits
===========================================================
7.919435390998842   # (<= longer execution time for the first call)
2.1211017156019807e-05
1.015502493828535e-05
7.238006219267845e-06
[(Array(float64, 1, 'C', False, aligned=True), Array(float64, 1, 'C', False, aligned=True)) -> array(float64, 2d, C)]

Actually, the end of my output looks like this:

.const.picklebuf.2338120985216:
	.quad	.const.pickledata.2338120985216
	.long	176
	.zero	4
	.quad	.const.pickledata.2338120985216.sha1
	.quad	0
	.long	0
	.zero	4
	.size	.const.picklebuf.2338120985216, 40

	.section	".note.GNU-stack","",@progbits

================================================================================
10.276019901037216
2.8699636459350586e-05
0.05956900119781494
5.569867789745331e-05
[(Array(float64, 1, 'C', False, aligned=True), Array(float64, 1, 'C', False, aligned=True)) -> array(float64, 2d, C)]

If you ran the code all at once, it’s possible that two compilation logs appeared before the actual output. Have you tried executing each function call line by line to ensure that step 3 doesn’t trigger a recompile?

Yes, I am sure there is no compilation log output on step 3.

The longer third call suggests that Numba might be compiling a specialized version of np.cov for the new input shape. Normally you’d expect to see compilation logs in that case, but if the visible signature doesn’t change, perhaps those logs are suppressed, though I don’t know why that would happen.
I couldn’t reproduce this delay under the same Python/Numba environment, so it might come down to environmental factors, or even a bug. It’s hard to say…

This is really a strange problem. Thanks a lot for your help!