It was a version issue.
The simple example above was run on version 0.53, which I used because of these issues:
https://github.com/numba/numba/issues/8172
https://github.com/numba/numba/issues/8398
On newer versions (0.56), both simple examples work.
Actual code in version 0.53
import numba as nb
import numpy as np

float_type = np.float32
# float_type = np.float64

# Grid size in each dimension
itot = 384
jtot = 384
ktot = 384
ncells = itot * jtot * ktot

# Output and input arrays
at = np.zeros((ktot, jtot, itot), dtype=float_type)
a = np.random.rand(ktot, jtot, itot).astype(float_type)

@nb.njit(["(float32[:,:,::1])(float32[:,:,::1], float32[:,:,::1], float32, float32, float32, float32)",
          "(float64[:,:,::1])(float64[:,:,::1], float64[:,:,::1], float64, float64, float64, float64)"])
def diff_1(at, a, visc, dxidxi, dyidyi, dzidzi):
    ktot, jtot, itot = at.shape
    # Loop over the interior points 1..n-2 and read neighbours at offsets -1 and +1
    for k in range(1, ktot-1):
        for j in range(1, jtot-1):
            for i in range(1, itot-1):
                at[k, j, i] = visc * (
                    + ( (a[k+1, j  , i  ] - a[k  , j  , i  ])
                      - (a[k  , j  , i  ] - a[k-1, j  , i  ]) ) * dxidxi
                    + ( (a[k  , j+1, i  ] - a[k  , j  , i  ])
                      - (a[k  , j  , i  ] - a[k  , j-1, i  ]) ) * dyidyi
                    + ( (a[k  , j  , i+1] - a[k  , j  , i  ])
                      - (a[k  , j  , i  ] - a[k  , j  , i-1]) ) * dzidzi )
    return at

@nb.njit(["(float32[:,:,::1])(float32[:,:,::1], float32[:,:,::1], float32, float32, float32, float32)",
          "(float64[:,:,::1])(float64[:,:,::1], float64[:,:,::1], float64, float64, float64, float64)"])
def diff_2(at, a, visc, dxidxi, dyidyi, dzidzi):
    ktot, jtot, itot = at.shape
    # Same stencil with the loops shifted to start at 0, so every index is
    # k, k+1 or k+2 (never k-1). Note that this version accumulates with +=.
    for k in range(ktot-2):
        for j in range(jtot-2):
            for i in range(itot-2):
                at[k+1, j+1, i+1] += visc * (
                    + ( (a[k+2, j+1, i+1] - a[k+1, j+1, i+1])
                      - (a[k+1, j+1, i+1] - a[k  , j+1, i+1]) ) * dxidxi
                    + ( (a[k+1, j+2, i+1] - a[k+1, j+1, i+1])
                      - (a[k+1, j+1, i+1] - a[k+1, j  , i+1]) ) * dyidyi
                    + ( (a[k+1, j+1, i+2] - a[k+1, j+1, i+1])
                      - (a[k+1, j+1, i+1] - a[k+1, j+1, i  ]) ) * dzidzi )
    return at

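For anyone who wants to check that a rewritten loop nest still computes the same stencil, here is a minimal sketch of a reference check in plain NumPy. The names diff_ref and diff_loops, the small grid and the coefficient values are my own placeholders and not part of the benchmark above; diff_loops is just an un-jitted copy of the diff_1 loops.

```python
import numpy as np

def diff_ref(a, visc, dxidxi, dyidyi, dzidzi):
    """Vectorized reference for the interior stencil (hypothetical helper)."""
    at = np.zeros_like(a)
    c = a[1:-1, 1:-1, 1:-1]  # centre points
    at[1:-1, 1:-1, 1:-1] = visc * (
        ((a[2:, 1:-1, 1:-1] - c) - (c - a[:-2, 1:-1, 1:-1])) * dxidxi
        + ((a[1:-1, 2:, 1:-1] - c) - (c - a[1:-1, :-2, 1:-1])) * dyidyi
        + ((a[1:-1, 1:-1, 2:] - c) - (c - a[1:-1, 1:-1, :-2])) * dzidzi
    )
    return at

def diff_loops(a, visc, dxidxi, dyidyi, dzidzi):
    """Plain-Python copy of the diff_1 loop nest, for comparison only."""
    at = np.zeros_like(a)
    ktot, jtot, itot = a.shape
    for k in range(1, ktot - 1):
        for j in range(1, jtot - 1):
            for i in range(1, itot - 1):
                c = a[k, j, i]
                at[k, j, i] = visc * (
                    ((a[k + 1, j, i] - c) - (c - a[k - 1, j, i])) * dxidxi
                    + ((a[k, j + 1, i] - c) - (c - a[k, j - 1, i])) * dyidyi
                    + ((a[k, j, i + 1] - c) - (c - a[k, j, i - 1])) * dzidzi
                )
    return at

# Small grid and arbitrary coefficients so the pure-Python loops stay fast
rng = np.random.default_rng(0)
a_small = rng.random((6, 5, 4))
ref = diff_ref(a_small, 0.1, 1.0, 2.0, 3.0)
loops = diff_loops(a_small, 0.1, 1.0, 2.0, 3.0)
assert np.allclose(ref, loops)
```

The same comparison should work against the jitted functions, as long as at is zeroed before calling diff_2, since that version adds to at instead of overwriting it.
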
There, only diff_2 performed as expected (~30 ms); diff_1 was much slower at ~200 ms. That is why I suspected a wraparound problem.
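To spell out the reasoning behind that suspicion, here is a small sketch. It assumes that negative indices are resolved NumPy-style by adding the dimension length, which costs an extra check per index unless the compiler can prove the index is non-negative. The helper resolve_index and the size n are hypothetical and only model the idea; they are not Numba internals.

```python
def resolve_index(i, n):
    """Model of index wraparound: a negative i counts from the end."""
    return i + n if i < 0 else i

n = 8  # small stand-in for ktot, jtot or itot

# diff_1 style: loop over 1..n-2 and touch k-1, k, k+1
idx_1 = [k + d for k in range(1, n - 1) for d in (-1, 0, 1)]
# diff_2 style: loop over 0..n-3 and touch k, k+1, k+2
idx_2 = [k + d for k in range(n - 2) for d in (0, 1, 2)]

# Both variants touch exactly the same elements, and none is negative,
# so the wraparound branch is never actually taken at run time.
assert sorted(set(idx_1)) == sorted(set(idx_2))
assert all(resolve_index(i, n) == i for i in idx_1 + idx_2)
```

So the two loop nests are equivalent in what they access. The only difference is that diff_1 contains a k-1 term, where non-negativity has to be derived from the loop bounds, while in diff_2 every index is the loop variable plus a non-negative constant.
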
Version 0.56
Both implementations are quite slow, which wasn’t unexpected because of the issues above.
With this fix (https://github.com/numba/numba/issues/8172#issuecomment-1160474583), both implementations show the expected performance of ~30 ms.
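In case someone wants to reproduce timings like these, below is a minimal sketch of a timing helper. The name best_time_ms and the repeat count are my own choices, not necessarily how the numbers above were obtained. The first call is left out of the measurement so that one-off costs (compilation for lazily compiled functions, cold caches) do not distort the result.

```python
import time

def best_time_ms(func, *args, repeats=5):
    """Return the best wall-clock time of func(*args) in milliseconds."""
    func(*args)  # untimed warm-up call
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best * 1e3

# Stand-in workload; replace with diff_1 or diff_2 and their arguments
elapsed = best_time_ms(sum, range(100_000))
print(f"best of 5: {elapsed:.3f} ms")
```

Taking the minimum over a few repeats is a deliberate choice here: it reduces the influence of other processes on the machine, which matters when comparing runs that differ by only a few milliseconds.
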