I'm benchmarking the getCenters function shown below with Python 3.10.6, Numba 0.56.0, and Ubuntu 22.04.1 on an x64 CPU:
import numpy as np
import numba as nb
import timeit as ti
@nb.njit(fastmath=True)
def getCenters(im, ce, w):
    for i in range(ce.size-w-1):
        square = [im[i], im[i+1], im[i+w], im[i+w+1]]
        square.sort()
        ce[i] = (square[1]+square[2])/2
w, h = 640, 480
im = np.random.rand(w*h).astype(np.float32)
ce = np.empty_like(im)
fun = 'getCenters(im, ce, w)'
# setup=fun calls the function once before each timed run, so JIT compilation is not measured
t = 1000 * np.array(ti.repeat(stmt=fun, setup=fun, globals=globals(), number=1, repeat=10))
print(f'{fun}: {np.amin(t):6.3f}ms {np.median(t):6.3f}ms')
I get an execution time of about 151 ms. The equivalent C++ version shown here:
void getCenters(vector<float> &im, vector<float> &ce, int w) {
    for (int i=0; i < im.size()-w-1; i++) {
        array<float, 4> square = {im[i], im[i+1], im[i+w], im[i+w+1]};
        sort(square.begin(), square.end());
        ce[i] = (square[1]+square[2])/2;
    }
}
executes in 6.33 ms when compiled with gcc 11.2. Why is the Numba version 24 times slower?
It seems that sort() is a high-cost operation: commenting it out lowers the time from 151 ms to 22 ms. That is still far from the C++ version, though.
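One way to test that suspicion without changing the result might be a variant that builds no list and calls no sort(). The sketch below relies on the fact that the mean of the two middle values of four numbers equals (sum - min - max) / 2. The name get_centers_minmax is hypothetical and not part of the code above, and the try/except fallback is only there so the snippet still runs (uncompiled) when Numba is not installed. I have not measured how it performs.

```python
import numpy as np

try:
    import numba as nb
    jit = nb.njit(fastmath=True)
except ImportError:
    # Assumption: fall back to plain Python so the sketch still runs without Numba.
    def jit(func):
        return func


@jit
def get_centers_minmax(im, ce, w):
    # Hypothetical sort-free variant of getCenters: no list, no sort().
    for i in range(ce.size - w - 1):
        a, b, c, d = im[i], im[i + 1], im[i + w], im[i + w + 1]
        lo = min(min(a, b), min(c, d))
        hi = max(max(a, b), max(c, d))
        # Removing the smallest and largest value leaves the two middle ones.
        ce[i] = (a + b + c + d - lo - hi) / 2
```

It can be timed with the same timeit.repeat call as above by setting fun = 'get_centers_minmax(im, ce, w)'. Because the additions happen in a different order, the results may differ from the sorted version in the last float32 bits.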