I am trying to calculate distance between two sets of 3D vectors in a toroidal (periodic boundaries) geometry:
#@njit('float32[:](float32[:,:], float32[:,:], float32[:])',cache=True,fastmath=True)
@njit(cache=True, parallel=False, fastmath=True, boundscheck=False, nogil=True)
def pair_vec_dist_test(pIList,pJList,cellsize):
#assert pIList.shape[1] == pJList.shape[1]
#assert cellsize.size == 3
dr2 = np.empty(int(pIList.shape[0]*pJList.shape[0]),dtype=np.float32)
inv_cellsize = 1.0 / cellsize
for i in numba.prange(pIList.shape[0]):
for j in range(pJList.shape[0]):
index = j + pJList.shape[0] * i
xdist = pJList[j,0]-pIList[i,0]
ydist = pJList[j,1]-pIList[i,1]
zdist = pJList[j,2]-pIList[i,2]
xk = xdist-cellsize[0]*np.rint(xdist*inv_cellsize[0])
yk = ydist-cellsize[1]*np.rint(ydist*inv_cellsize[1])
zk = zdist-cellsize[2]*np.rint(zdist*inv_cellsize[2])
dr2[index] = xk**2+yk**2+zk**2
return dr2
N1, N2, dim = 700, 1000, 3 # particles in 3D
cellsize=np.array([26.4,26.4,70.0]).astype("float32")
vecI = np.random.random((N1, dim)).astype("float32")
vecJ = np.random.random((N2, dim)).astype("float32")
pair_vec_dist_test(vecI,vecJ,cellsize)
I am trying to optimize this function as much as possible and so I have unrolled the loops, and applied the fastmath
and nogil
options. When I tried for parallel
option, my exec time jumped from 472 microsec to 11ms which is a bit surprising. Can someone help me understand what has gone wrong in this scenario?
I am also looking for further optimizations. I was suggested to make the code cache-friendly by loop tiling/blocking though I couldn’t see any significant changes. If you have any suggestions in mind, please let me know.