I’m encountering an unexpected performance drop when calling an @njit-compiled function that uses np.cov. While I expect the first call to be slow due to JIT compilation, I noticed that changing the input shape for the first time also incurs a significant execution-time increase. This is surprising because the function signature remains the same across calls.
Code to Reproduce the Issue
from contextlib import contextmanager
from time import perf_counter
from typing import Callable, Generator
from numba import njit
import numpy as np
@contextmanager
def timer(f: Callable[[float], object] = lambda _: None) -> Generator[Callable[[], float], None, None]:
    _ = perf_counter()

    def t() -> float:
        return perf_counter() - _

    yield t
    f(t())
    del _


@njit
def test(x, y):
    return np.cov(x, y)
with timer(print):
test(np.array([0.]), np.array([0.])) # First call (JIT compilation expected)
# Example output: 3.549982499331236
with timer(print):
test(np.array([0.]), np.array([0.])) # Second call (should be fast)
# Example output: 1.730024814605713e-05
with timer(print):
test(np.array([0., 0.]), np.array([0., 0.])) # First shape change
# Example output: 0.02204340137541294 <-- Unexpectedly high execution time
with timer(print):
test(np.array([0., 0.]), np.array([0., 0.])) # Second call with same new shape (should be fast)
# Example output: 2.8500333428382874e-05
print(test.nopython_signatures)
# Output: [(Array(float64, 1, 'C', False, aligned=True), Array(float64, 1, 'C', False, aligned=True)) -> array(float64, 2d, C)]
Observed Behavior
First execution (compilation overhead expected) → Slow.
Second execution (same input shape) → Fast, as expected.
Third execution (first shape change) → Unexpectedly slow, even though the function signature remains the same.
Fourth execution (same new shape) → Fast again, as expected.
Questions
Why does the first execution with a new input shape result in a significant performance decrease, even though the function signature does not change?
How can I investigate what is happening internally?
Any insights or debugging strategies would be greatly appreciated!
Numba’s overload of np.cov selects between different implementations based on the shape of the input arrays. Even though both cases ultimately return a 2D array, the internal code path is different. Numba must compile a separate version (specialization) for each case, which is why you see additional compilation overhead when the input shape changes.
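As a loose pure-Python analogy (this is not Numba's actual machinery, just an illustration of the idea), you can picture a dispatcher that lazily builds one implementation per code path, where the path is chosen from the input's shape rather than its type. The first input that selects a new path pays a one-time build cost even though the "type signature" never changes:

```python
# Hypothetical sketch: a cache keyed by the code path an input selects,
# not by its type. Numba's np.cov overload similarly picks a path based
# on the input's shape, so a new shape can trigger a one-time build.
builds = []   # records each one-time "specialization" build
_impls = {}   # path -> built implementation

def _path_for(shape):
    # Hypothetical: a length-1 input takes a different branch than longer ones.
    return "single" if shape == (1,) else "general"

def get_impl(shape):
    key = _path_for(shape)
    if key not in _impls:
        builds.append(key)  # stand-in for the one-time specialization cost
        _impls[key] = key
    return _impls[key]
```

Repeated calls with a shape that maps to an already-built path are cheap; only the first call that hits a new path pays the build cost, which mirrors the slow third call in the question.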
Edit:
Running your code, the first call (which has the compilation overhead) takes much longer than the subsequent calls.
Once compiled, each specialization seems to be cached and reused, so separate compilations aren’t causing the delays.
Can you set a cache directory, cache the function, and check which files are created?
If test(x, y) is compiled only once, there should be a single .nbc file.
If test(x, y) is recompiled for different input shapes, multiple .nbc files should appear.
import os
os.environ["NUMBA_CACHE_DIR"] = "/SomePath/numba_cache"
from numba import njit
import numpy as np
@njit(cache=True)
def test(x, y):
    return np.cov(x, y)
test(np.array([0.]), np.array([0.]))
test(np.array([0., 0.]), np.array([0., 0.]))
".../test.test-7.py312.1.nbc",
".../test.test-7.py312.nbi"
If you run the code line by line in Numba debug mode, you should be able to see when compilations occur.
For me, there is only one compilation, even when the input array sizes change.
The timings of the test function suggest that np.cov might be compiled twice depending on the shape of the input arrays. I couldn’t reproduce this behavior with the same versions of Python and Numba.
To investigate further, you can set os.environ["NUMBA_DEBUG"] = "1" and check whether a compilation is triggered when executing each line.
What you should see when running test(np.array([0.]), np.array([0.])) is a series of long compilation logs (the content doesn’t really matter, just the fact that some compilation appears). After that, when you run test(np.array([0., 0.]), np.array([0., 0.])), the result should return quickly without the lengthy compilation logs.
If you still see a long compilation log for the second call, it indicates that recompilation is occurring for some reason.
Great, this is exactly how it should behave.
Could you try the same with the original code, using os.environ["NUMBA_DEBUG"] = "1"? What you should see is similar behavior: there should only be one compilation log. The first function call will take longer, while subsequent calls should be much faster.
Here’s an example of what the output should look like:
...some long compilation logs...
.long 178
.zero 4
.quad .const.pickledata.125757979843904.sha1
.quad 0
.long 0
.zero 4
.size .const.picklebuf.125757979843904, 40
.section ".note.GNU-stack","",@progbits
===========================================================
7.919435390998842 # (<= longer execution time for the first call)
2.1211017156019807e-05
1.015502493828535e-05
7.238006219267845e-06
[(Array(float64, 1, 'C', False, aligned=True), Array(float64, 1, 'C', False, aligned=True)) -> array(float64, 2d, C)]
If you ran the code all at once, it’s possible that two compilation logs appeared before the actual output. Have you tried executing each function call line by line to ensure that step 3 doesn’t trigger a recompile?
The longer third call suggests that Numba might be compiling a specialized version of np.cov for the new input shape. Normally you’d expect to see compilation logs for that, but if the visible signature doesn’t change, those logs might be suppressed, though I’m not sure that actually happens.
I couldn’t reproduce this delay under the same Python/Numba environment, so it might be due to environmental factors or even a bug. It’s hard to say…