Why np.nanmean() seems to be faster than np.mean()

I compared the following two functions:

@njit
def just_mean(arr):
    return np.mean(arr)

and

@njit
def nan_mean(arr):
    return np.nanmean(arr)

on

arr = np.array(random.sample(range(1, 100000), 5000))

The just_mean() function: 3.36 µs ± 613 ns per loop (mean ± std. dev. of 30 runs, 100,000 loops each)

The nan_mean() function: 3.28 µs ± 175 ns per loop (mean ± std. dev. of 30 runs, 100,000 loops each)

Just curious. Many thanks.

Hey @aeyou ,

I wouldn’t say nanmean is faster than mean; the two have similar performance.
If the compiler is able to vectorize the loop, then the isnan checks become cheap.
That is the case when your array has a contiguous memory layout.

import numpy as np
from numba import njit

@njit
def just_mean(arr):
    return np.mean(arr)

@njit
def nan_mean(arr):
    return np.nanmean(arr)

# warmup
arr = np.arange(5.)
just_mean(arr)
nan_mean(arr)

N = 1_000_000

# time contiguous
arr = np.random.rand(N)
%timeit just_mean(arr)
%timeit nan_mean(arr)
# 1.12 ms ± 28.9 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# 1.12 ms ± 8.41 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

# time strided
arr = np.random.rand(N*16)[::16]
%timeit just_mean(arr)
%timeit nan_mean(arr)
# 6.18 ms ± 32.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# 7.29 ms ± 52.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
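As a side note, the timings above use arrays without NaNs, so the two functions return the same value; they only diverge once NaNs are actually present. A minimal plain-NumPy sketch of that semantic difference (the jitted versions behave the same way):

```python
import numpy as np

arr = np.array([1.0, 2.0, np.nan, 4.0])

# np.mean propagates the NaN into the result
print(np.mean(arr))     # nan

# np.nanmean ignores the NaN and averages the rest: (1 + 2 + 4) / 3
print(np.nanmean(arr))  # 2.3333333333333335
```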

Contiguous memory layout:
[Figure 2024-04-10 044005]

Strided memory layout:
[Figure 2024-04-10 044022]
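If you want to check which layout your own array has, NumPy exposes it directly via the flags and strides attributes. A quick sketch using the same contiguous vs. strided arrays as the benchmark above:

```python
import numpy as np

N = 1_000_000

contig = np.random.rand(N)            # one dense buffer
strided = np.random.rand(N * 16)[::16]  # a view touching every 16th element

print(contig.flags['C_CONTIGUOUS'])   # True
print(strided.flags['C_CONTIGUOUS'])  # False

# float64 is 8 bytes, so the strided view jumps 128 bytes per element
print(contig.strides)   # (8,)
print(strided.strides)  # (128,)
```

The strided view forces the CPU to skip through memory, which defeats SIMD vectorization and wastes most of each cache line, matching the slowdown seen in the strided timings.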