Creating numpy arrays from arrays

Hi, in reference to issue #4470, are there any workarounds for creating numpy arrays from arrays? My sample code is:

import numpy as np
import numba

@numba.njit
def myfunc(mol1Coord,mol2Coord):

    for j in numba.prange(int(mol2Coord.shape[0]/3)):

        # These np.array calls build an array from a list of array rows, which is
        # what nopython mode does not support (the subject of issue #4470).
        rIVec=np.array([mol1Coord[0],mol1Coord[0],mol1Coord[0],mol2Coord[3*j],mol2Coord[3*j]], dtype=numba.float64)
        rJVec=np.array([mol2Coord[3*j],mol2Coord[3*j+1],mol2Coord[3*j+2],mol1Coord[1],mol1Coord[2]], dtype=numba.float64)
        
        print(rIVec,rJVec)



dim=3; NI=3; NJ=12

vecI = np.random.random((NI, dim))
vecJ = np.random.random((NJ, dim))

print(myfunc(vecI,vecJ))

I don’t know the answer to the specific question asked, but assuming the goal is to get rIVec and rJVec out of the function and you know all the dimensions, why not pre-allocate them and write into them inside the loop?
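Roughly this pattern, i.e. a minimal sketch with made-up names and shapes rather than your actual arrays:

import numpy as np
import numba

@numba.njit
def preallocate_example(a, b):
    # Allocate once with a known shape, then write the rows you need,
    # instead of calling np.array([...]) on a list of array rows.
    out = np.empty((2, 3), dtype=np.float64)
    out[0] = a[0]
    out[1] = b[0]
    return out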

So, the dimension of vecI is fixed at (3, 3) and vecJ is a larger array of shape (3*N, 3) where N is variable. To build rIVec I know the first three values in advance (for rJVec it is the last two), while the remaining values are extracted by looping over mol2Coord/vecJ. I can rewrite them as:

@numba.njit(parallel=True)  # prange needs parallel=True to actually run in parallel
def myfunc(mol1Coord, mol2Coord):

    rIVec = np.zeros((5, 3), dtype=numba.float64)
    rJVec = np.zeros((5, 3), dtype=numba.float64)
    for j in numba.prange(int(mol2Coord.shape[0]/3)):

        rIVec[0] = rIVec[1] = rIVec[2] = mol1Coord[0]
        rIVec[3] = rIVec[4] = rJVec[0] = mol2Coord[3*j]
        rJVec[1] = mol2Coord[3*j+1]
        rJVec[2] = mol2Coord[3*j+2]
        rJVec[3] = mol1Coord[1]
        rJVec[4] = mol1Coord[2]

    return rIVec, rJVec

Is this what you had in mind? I am not sure whether this is efficient with numba.prange.

I would think you would want to add a ‘j’ dimension to your numpy arrays.

Do you mean adding an additional j dimension to rIVec/rJVec? The prange loop is used to extract a smaller subset of coordinates from rJVec, and in my complete function I don’t actually need to preserve rIVec/rJVec beyond the outer loop. Can you give an example of what you mean?
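My own guess (which may well be wrong) would be something like:

niter = int(mol2Coord.shape[0] / 3)
rIVec = np.zeros((niter, 5, 3), dtype=np.float64)  # extra leading j dimension?
rJVec = np.zeros((niter, 5, 3), dtype=np.float64)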

Not really… I don’t understand what you’re trying to do based on the examples and descriptions

I am attaching another bit of code:

import numpy as np
import numba
import time
from numba import njit

@njit(cache=True)
def pbc_r2(i,j,cellsize):
    inv_cellsize = 1.0 / cellsize
    xdist = j[0]-i[0]
    ydist = j[1]-i[1]
    zdist = j[2]-i[2]

    xk =  xdist-cellsize[0]*np.rint(xdist*inv_cellsize[0])
    yk =  ydist-cellsize[1]*np.rint(ydist*inv_cellsize[1])
    zk =  zdist-cellsize[2]*np.rint(zdist*inv_cellsize[2])

    return xk**2+yk**2+zk**2


@njit('float64[:](float64[:,:], float64[:,:], float64[:])',cache=True,parallel=True)
def pbc_r2_vec(i,j,cellsize):
    assert i.size == j.size
    r2=np.zeros((i.shape[0]),dtype=numba.float64)
    inv_cellsize = 1.0 / cellsize
    for idx in numba.prange(i.shape[0]):

        xdist = j[idx,0]-i[idx,0]
        ydist = j[idx,1]-i[idx,1]
        zdist = j[idx,2]-i[idx,2]
    
        xk =  xdist-cellsize[0]*np.rint(xdist*inv_cellsize[0])
        yk =  ydist-cellsize[1]*np.rint(ydist*inv_cellsize[1])
        zk =  zdist-cellsize[2]*np.rint(zdist*inv_cellsize[2])
     
        r2[idx]=xk**2+yk**2+zk**2
    return r2


#@njit('float64[:,:](float64[:,:], float64[:,:], float64[:])',cache=True, parallel=True)
@njit('float64[:,:](float64[:,:], float64[:,:], float64[:])', cache=True)
def myfunc(mol1Coord,mol2Coord,cellsize):

    r2Vec=np.zeros((int(mol2Coord.shape[0]/3),5), dtype=np.float64)
    for j in numba.prange(int(mol2Coord.shape[0]/3)):

        rIVec=np.zeros((5,3), dtype=np.float64)
        rJVec=np.zeros((5,3), dtype=np.float64)
        rIVec=np.zeros((5,3), dtype=np.float64)

        rIVec[0] = rIVec[1] = rIVec[2] = mol1Coord[0]
        rIVec[3] = rIVec[4] = rJVec[0] = mol2Coord[3*j]
        rJVec[1] = mol2Coord[3*j+1]
        rJVec[2] = mol2Coord[3*j+2]
        rJVec[3] = mol1Coord[1]
        rJVec[4] = mol1Coord[2]
  

        #r2Vec[0]=pbc_r2(rIVec[0],rJVec[0],cellsize)
        #r2Vec[1]=pbc_r2(rIVec[1],rJVec[1],cellsize)
        #r2Vec[2]=pbc_r2(rIVec[2],rJVec[2],cellsize)

        #r2Vec[3]=pbc_r2(rIVec[3],rJVec[3],cellsize)
        #r2Vec[4]=pbc_r2(rIVec[4],rJVec[4],cellsize)           
 
        
        r2Vec[j]=pbc_r2_vec(rIVec,rJVec,cellsize)

    return r2Vec


mol1Coord=np.array([[-3.51539114, -0.54503553, 13.00695019],
 [-3.73028985, -1.42658899, 12.47799689],
 [-4.06549919, -0.53207096, 13.91130745]])

mol2Coord=np.array([[-1.08085597, -2.37911769, 13.22645743],
 [-0.42385161, -2.1416567 , 13.96915362],
 [-1.90155566, -1.82187172, 13.16176887],
 [-3.76115629, -2.90836846, 11.50626325],
 [-3.38328776, -3.81238934, 11.42368819],
 [-4.46727316, -2.84933116, 10.74846771],
 [-1.81306444, -1.03291112, 10.46160559],
 [-1.97258684, -0.25140428, 11.11214725],
 [-2.46598435, -1.79344582, 10.6389096 ],
 [-5.13711051, -0.30008118, 15.31035477],
 [-5.36262698, -0.45602657, 16.30489416],
 [-6.05201763,  0.01835542, 14.87068595],
 [-1.86112701,  1.45820731, 12.22619147],
 [-2.45066116,  1.78841197, 11.49926148],
 [-2.4930514 ,  0.87405741, 12.82700715]])



cellsize=np.array([26.40,26.40,70])
timesave=[]
iterations=1000

print(myfunc(mol1Coord,mol2Coord,cellsize))
print("myfunc => ", myfunc.nopython_signatures)
print("pbc_r2_vec => ", pbc_r2_vec.nopython_signatures)

for i in range(iterations):
    start = time.perf_counter()
    myfunc(mol1Coord,mol2Coord,cellsize)
    timesave.append(time.perf_counter()-start)
print("Took {:.3g} ms".format(np.mean(timesave[:1])*1000.0))


So currently myfunc(mol1Coord,mol2Coord,cellsize) takes about 1.2 ms per execution, while with parallel=True the time is 0.335 ms. I had initially thought that calling the distance calculation pbc_r2 five times in the function might be contributing to the overhead, so I tried to combine all distance calculations into one function, but there is no speedup.

Dear @mykd

I see way too many things in your code that I would like to address. But I’m not quite sure I understand your problem correctly, so I’ll focus on the (in my opinion) most important ones.

Is the size of your arrays representative of your actual problem? This is very important, because it can make sense to parallelize either pbc_r2_vec or myfunc. Using prange in both makes no sense and slows down the execution drastically in your case.
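For illustration, parallelizing only at the myfunc level would look roughly like this (a sketch based on your code, with pbc_r2_vec compiled as a plain @njit function, i.e. without parallel=True):

@njit(cache=True, parallel=True)
def myfunc_parallel_outer(mol1Coord, mol2Coord, cellsize):
    niter = mol2Coord.shape[0] // 3
    r2Vec = np.empty((niter, 5), dtype=np.float64)
    for j in numba.prange(niter):
        # Per-iteration scratch arrays so the parallel iterations don't share state.
        rIVec = np.empty((5, 3), dtype=np.float64)
        rJVec = np.empty((5, 3), dtype=np.float64)
        rIVec[0] = rIVec[1] = rIVec[2] = mol1Coord[0]
        rIVec[3] = rIVec[4] = rJVec[0] = mol2Coord[3*j]
        rJVec[1] = mol2Coord[3*j+1]
        rJVec[2] = mol2Coord[3*j+2]
        rJVec[3] = mol1Coord[1]
        rJVec[4] = mol1Coord[2]
        r2Vec[j] = pbc_r2_vec(rIVec, rJVec, cellsize)  # inner call stays serial
    return r2Vec

The key point is that prange then appears only here and not in pbc_r2_vec.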

It is very hard to say how to optimize your code, since the problem size is probably not realistic. I get much better performance running your example serially because your arrays are so small. It also looks like your task is pretty memory bound.

Based on your example, I can give you the tip not to parallelize at all. This already made it an order of magnitude faster for me.
If you run it serially, you can also rewrite myfunc like this:

@numba.njit
def myfunc(mol1Coord,mol2Coord,cellsize):
    niter = int(mol2Coord.shape[0]/3)
    r2Vec=np.empty((niter,5), dtype=np.float64)
    rIVec=np.empty((5,3), dtype=np.float64)
    rJVec=np.empty((5,3), dtype=np.float64)
    for j in range(niter):
        rIVec[0] = rIVec[1] = rIVec[2] = mol1Coord[0]
        rIVec[3] = rIVec[4] = rJVec[0] = mol2Coord[3*j]
        rJVec[1] = mol2Coord[3*j+1]
        rJVec[2] = mol2Coord[3*j+2]
        rJVec[3] = mol1Coord[1]
        rJVec[4] = mol1Coord[2]
        r2Vec[j]=pbc_r2_vec(rIVec,rJVec,cellsize)
    return r2Vec

However, the allocation outside the loop hardly matters because the arrays are so small and the loop is executed only a few times.

Maybe you can clear up my ambiguity, and I am sorry if I completely misunderstood your question.

Edit: Is there a reason why you allocate rIVec twice each iteration?


That should read r2Vec instead of rIVec

Ok, in this case I guess I can use the serial equivalent of pbc_r2_vec. As you can see, I needed to call pbc_r2 five times in order to calculate the full distance array, so my thinking was that it might be faster to combine all the vectors and pass them to the function once.

So the arrays really have these shapes in your real application? Then I understand your problem even less, because your code already executes very fast. Why would you bother optimizing it? The only reason I can see is that you call this piece of code many times. But then you’d want to parallelize and probably optimize at a larger level.

Also, it looks to me like you’re doing unnecessary copies in myfunc, e.g.
rIVec[0] = rIVec[1] = rIVec[2] = mol1Coord[0]
This copies the same values to multiple locations. Consequently, the same values enter your inner function more than once:
r2Vec[j]=pbc_r2_vec(rIVec,rJVec,cellsize)
I don’t understand why you would want to do that, since pbc_r2_vec doesn’t change the data. You mentioned that there is a typo; maybe it is because of that and I am misunderstanding your code. That may well be the case, as I am admittedly a bit too lazy to think it through in detail.
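If the index pairs are what they look like, you could also skip the intermediate rIVec/rJVec entirely and call your scalar pbc_r2 directly; a rough, untested sketch:

@njit(cache=True)
def myfunc_direct(mol1Coord, mol2Coord, cellsize):
    niter = mol2Coord.shape[0] // 3
    r2Vec = np.empty((niter, 5), dtype=np.float64)
    for j in range(niter):
        # The same five pairs that end up in rIVec/rJVec, without copying them first.
        r2Vec[j, 0] = pbc_r2(mol1Coord[0], mol2Coord[3*j], cellsize)
        r2Vec[j, 1] = pbc_r2(mol1Coord[0], mol2Coord[3*j+1], cellsize)
        r2Vec[j, 2] = pbc_r2(mol1Coord[0], mol2Coord[3*j+2], cellsize)
        r2Vec[j, 3] = pbc_r2(mol2Coord[3*j], mol1Coord[1], cellsize)
        r2Vec[j, 4] = pbc_r2(mol2Coord[3*j], mol1Coord[2], cellsize)
    return r2Vec

Whether this is actually faster you would have to measure, but it avoids filling two (5, 3) arrays per iteration.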


Hi @sschaer, I tried to implement your suggestions in my code. I am attaching a link to my directory where you can see the whole test code.

In short, I am processing molecular simulation data for water molecules and labelling them based on their hydrogen bonding pattern. This will be further used for generating spectroscopic data over a long trajectory, hence I was trying to reduce any overhead.

With your help, my entire test function now takes about 1.24 ms for a single simulation frame. I guess I might need to use multiprocessing or Dask for the overall parallelization, but if you have any ideas or suggestions, please let me know. I am sorry if our discussion felt confusing. Thanks for your help!!

One of the most reliable parallelization patterns is to parallelize only the outermost loop in your problem. That means that if this code will sit inside another loop that you did not include here, then don’t add prange to any of this code; either use prange on the outer loop (if it is also in Numba and multithreading works for you) or use joblib or Dask if you want multiprocessing.
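For example, with joblib the outer-level parallelization could look roughly like this (a sketch only; process_frame, load_frame and n_frames are placeholders for however you read and process one trajectory frame, not names from your code):

from joblib import Parallel, delayed

def process_frame(k):
    # Placeholder: load one frame and run the serial, njit-compiled analysis on it.
    mol1Coord, mol2Coord = load_frame(k)          # hypothetical loader
    return myfunc(mol1Coord, mol2Coord, cellsize)

results = Parallel(n_jobs=-1)(delayed(process_frame)(k) for k in range(n_frames))

If you go this route, keep myfunc itself serial (no prange) so the worker processes don’t oversubscribe your cores.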

Luk