Hi!
I’m new to Numba and I would like to use it to write code that can run both on multiple CPU cores and on GPUs.
In particular, I’m currently learning the basic differences between the decorators @njit and @vectorize, but I have trouble understanding their performance differences.
I wrote the same Python function using both @njit(parallel=True) and @vectorize(…, target='parallel').
It turned out that the first implementation is much faster than the second.
The code is reported below.
import numba as nb
import numpy as np
from numba import float64

@nb.njit
def logdiffexp(x, y):
    # numerically stable log(exp(x) - exp(y)), assuming x > y
    return x + np.log1p(-np.exp(y - x))

@nb.njit
def _logTPL(x, alpha, mmin, mmax):
    # log of a normalised power law with slope -alpha, truncated to [mmin, mmax]
    log_norm_cost = -np.log(alpha - 1) + logdiffexp((1 - alpha) * np.log(mmin), (1 - alpha) * np.log(mmax))
    if (mmin < x) and (x < mmax):
        result = -alpha * np.log(x) - log_norm_cost
    else:
        result = -np.inf
    return result

@nb.njit
def _logSmoothing(m, delta_m, ml):
    # log of a smoothing window: 0 below ml, 1 above ml + delta_m
    if m <= ml:
        result = -np.inf
    elif m >= (ml + delta_m):
        result = 0.0
    else:
        result = -np.logaddexp(0.0, (delta_m / (m - ml) + delta_m / (m - ml - delta_m)))
    return result

@nb.njit
def _logPLm2(m2, beta, ml):
    # log of an unnormalised power law with slope beta in the secondary mass
    return beta * np.log(m2) if m2 >= ml else -np.inf

@nb.njit
def _logC_PL(m1, beta, ml):
    # log of the normalisation constant of the m2 power law on [ml, m1]
    return np.log((1 + beta) / (m1**(1 + beta) - ml**(1 + beta)))

@nb.njit(parallel=True)
def log_PL(m1, m2, alpha, beta, ml, mh):
    # explicit parallel loop: prange distributes iterations over CPU threads
    result = np.empty_like(m1)
    for i in nb.prange(len(m1)):
        if ml < m2[i] < m1[i] < mh:
            result[i] = _logTPL(m1[i], alpha, ml, mh) + _logPLm2(m2[i], beta, ml) + _logC_PL(m1[i], beta, ml)
        else:
            result[i] = -np.inf
    return result

@nb.vectorize([float64(float64, float64, float64, float64, float64, float64)], target='parallel')
def log_PLvec(m1, m2, alpha, beta, ml, mh):
    # ufunc version: the parallel target handles each element independently
    if ml < m2 < m1 < mh:
        result = _logTPL(m1, alpha, ml, mh) + _logPLm2(m2, beta, ml) + _logC_PL(m1, beta, ml)
    else:
        result = -np.inf
    return result
The execution times are reported in the picture below.
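For reference, a minimal harness along these lines can be used to compare the two implementations (the array size and the parameter values here are illustrative placeholders; warm-up calls are made first so that JIT compilation time is not measured):

import time
import numpy as np

rng = np.random.default_rng(0)
n = 10_000_000  # illustrative array size
m1 = rng.uniform(5.0, 100.0, n)
m2 = rng.uniform(1.0, 50.0, n)

# warm-up calls: trigger JIT compilation outside the timed region
log_PL(m1, m2, 2.0, 1.5, 5.0, 100.0)
log_PLvec(m1, m2, 2.0, 1.5, 5.0, 100.0)

t0 = time.perf_counter()
log_PL(m1, m2, 2.0, 1.5, 5.0, 100.0)
print("njit(parallel=True):", time.perf_counter() - t0, "s")

t0 = time.perf_counter()
log_PLvec(m1, m2, 2.0, 1.5, 5.0, 100.0)
print("vectorize(target='parallel'):", time.perf_counter() - t0, "s")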
Can someone explain why this happens?
I’m interested in the @vectorize decorator because, as far as I understand, it can take 'cuda' as a target and thus run on the GPU (am I wrong?).
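For context, this is the kind of GPU ufunc I would eventually like to write. A minimal sketch (assuming a CUDA-capable GPU and a working CUDA toolkit; as far as I understand, helpers called from a cuda-target ufunc have to be CUDA device functions rather than @njit functions):

import math
from numba import vectorize, cuda, float64

@cuda.jit(device=True)
def logdiffexp_dev(x, y):
    # device-side version of logdiffexp; use math.* here, since np.* is
    # generally not supported inside CUDA kernels
    return x + math.log1p(-math.exp(y - x))

@vectorize([float64(float64, float64)], target='cuda')
def logdiffexp_gpu(x, y):
    # compiled into a GPU ufunc; NumPy input arrays are copied to the device automatically
    return logdiffexp_dev(x, y)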