Creating numpy arrays from arrays

Hi, in reference to issue #4470, are there any workarounds for creating numpy arrays from arrays? My sample code is:

import numpy as np
import numba

@numba.njit
def myfunc(mol1Coord,mol2Coord):

    for j in numba.prange(int(mol2Coord.shape[0]/3)):

        # These np.array calls build an array from a list of array rows, which is
        # what nopython mode does not support (the subject of issue #4470).
        rIVec=np.array([mol1Coord[0],mol1Coord[0],mol1Coord[0],mol2Coord[3*j],mol2Coord[3*j]], dtype=numba.float64)
        rJVec=np.array([mol2Coord[3*j],mol2Coord[3*j+1],mol2Coord[3*j+2],mol1Coord[1],mol1Coord[2]], dtype=numba.float64)
        
        print(rIVec,rJVec)



dim=3; NI=3; NJ=12

vecI = np.random.random((NI, dim))
vecJ = np.random.random((NJ, dim))

print(myfunc(vecI,vecJ))

I don’t know the answer to the specific question asked, but assuming the goal is to get rIVec and rJVec out of the function and you know all the dimensions, why not pre-allocate them and write into them inside the loop?
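Roughly this pattern, i.e. a minimal sketch with made-up names and shapes rather than your actual arrays:

import numpy as np
import numba

@numba.njit
def preallocate_example(a, b):
    # Allocate once with a known shape, then write the rows you need,
    # instead of calling np.array([...]) on a list of array rows.
    out = np.empty((2, 3), dtype=np.float64)
    out[0] = a[0]
    out[1] = b[0]
    return out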

So, the dimension of vecI is fixed at (3, 3) and vecJ is a larger array of shape (3*N, 3) where N is variable. To build rIVec I know the first three values in advance (for rJVec it is the last two), while the remaining values are extracted by looping over mol2Coord/vecJ. I can rewrite them as:

@numba.njit(parallel=True)  # prange needs parallel=True to actually run in parallel
def myfunc(mol1Coord, mol2Coord):

    rIVec = np.zeros((5, 3), dtype=numba.float64)
    rJVec = np.zeros((5, 3), dtype=numba.float64)
    for j in numba.prange(int(mol2Coord.shape[0]/3)):

        rIVec[0] = rIVec[1] = rIVec[2] = mol1Coord[0]
        rIVec[3] = rIVec[4] = rJVec[0] = mol2Coord[3*j]
        rJVec[1] = mol2Coord[3*j+1]
        rJVec[2] = mol2Coord[3*j+2]
        rJVec[3] = mol1Coord[1]
        rJVec[4] = mol1Coord[2]

    return rIVec, rJVec

Is this what you had in mind? I am not sure whether this is efficient with numba.prange.

I would think you would want to add a ‘j’ dimension to your numpy arrays.

Do you mean adding an additional j dimension to rIVec/rJVec? The prange loop is used to extract a smaller subset of coordinates from rJVec, and in my complete function I don’t actually need to preserve rIVec/rJVec beyond the outer loop. Can you give an example of what you mean?
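My own guess (which may well be wrong) would be something like:

niter = int(mol2Coord.shape[0] / 3)
rIVec = np.zeros((niter, 5, 3), dtype=np.float64)  # extra leading j dimension?
rJVec = np.zeros((niter, 5, 3), dtype=np.float64)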

Not really… I don’t understand what you’re trying to do based on the examples and descriptions

I am attaching another bit of code:

import numpy as np
import numba
import time
from numba import njit

@njit(cache=True)
def pbc_r2(i,j,cellsize):
    inv_cellsize = 1.0 / cellsize
    xdist = j[0]-i[0]
    ydist = j[1]-i[1]
    zdist = j[2]-i[2]

    xk =  xdist-cellsize[0]*np.rint(xdist*inv_cellsize[0])
    yk =  ydist-cellsize[1]*np.rint(ydist*inv_cellsize[1])
    zk =  zdist-cellsize[2]*np.rint(zdist*inv_cellsize[2])

    return xk**2+yk**2+zk**2


@njit('float64[:](float64[:,:], float64[:,:], float64[:])',cache=True,parallel=True)
def pbc_r2_vec(i,j,cellsize):
    assert i.size == j.size
    r2=np.zeros((i.shape[0]),dtype=numba.float64)
    inv_cellsize = 1.0 / cellsize
    for idx in numba.prange(i.shape[0]):

        xdist = j[idx,0]-i[idx,0]
        ydist = j[idx,1]-i[idx,1]
        zdist = j[idx,2]-i[idx,2]
    
        xk =  xdist-cellsize[0]*np.rint(xdist*inv_cellsize[0])
        yk =  ydist-cellsize[1]*np.rint(ydist*inv_cellsize[1])
        zk =  zdist-cellsize[2]*np.rint(zdist*inv_cellsize[2])
     
        r2[idx]=xk**2+yk**2+zk**2
    return r2


#@njit('float64[:,:](float64[:,:], float64[:,:], float64[:])',cache=True, parallel=True)
@njit('float64[:,:](float64[:,:], float64[:,:], float64[:])', cache=True)
def myfunc(mol1Coord,mol2Coord,cellsize):

    r2Vec=np.zeros((int(mol2Coord.shape[0]/3),5), dtype=np.float64)
    for j in numba.prange(int(mol2Coord.shape[0]/3)):

        rIVec=np.zeros((5,3), dtype=np.float64)
        rJVec=np.zeros((5,3), dtype=np.float64)
        rIVec=np.zeros((5,3), dtype=np.float64)

        rIVec[0] = rIVec[1] = rIVec[2] = mol1Coord[0]
        rIVec[3] = rIVec[4] = rJVec[0] = mol2Coord[3*j]
        rJVec[1] = mol2Coord[3*j+1]
        rJVec[2] = mol2Coord[3*j+2]
        rJVec[3] = mol1Coord[1]
        rJVec[4] = mol1Coord[2]
  

        #r2Vec[0]=pbc_r2(rIVec[0],rJVec[0],cellsize)
        #r2Vec[1]=pbc_r2(rIVec[1],rJVec[1],cellsize)
        #r2Vec[2]=pbc_r2(rIVec[2],rJVec[2],cellsize)

        #r2Vec[3]=pbc_r2(rIVec[3],rJVec[3],cellsize)
        #r2Vec[4]=pbc_r2(rIVec[4],rJVec[4],cellsize)           
 
        
        r2Vec[j]=pbc_r2_vec(rIVec,rJVec,cellsize)

    return r2Vec


mol1Coord=np.array([[-3.51539114, -0.54503553, 13.00695019],
 [-3.73028985, -1.42658899, 12.47799689],
 [-4.06549919, -0.53207096, 13.91130745]])

mol2Coord=np.array([[-1.08085597, -2.37911769, 13.22645743],
 [-0.42385161, -2.1416567 , 13.96915362],
 [-1.90155566, -1.82187172, 13.16176887],
 [-3.76115629, -2.90836846, 11.50626325],
 [-3.38328776, -3.81238934, 11.42368819],
 [-4.46727316, -2.84933116, 10.74846771],
 [-1.81306444, -1.03291112, 10.46160559],
 [-1.97258684, -0.25140428, 11.11214725],
 [-2.46598435, -1.79344582, 10.6389096 ],
 [-5.13711051, -0.30008118, 15.31035477],
 [-5.36262698, -0.45602657, 16.30489416],
 [-6.05201763,  0.01835542, 14.87068595],
 [-1.86112701,  1.45820731, 12.22619147],
 [-2.45066116,  1.78841197, 11.49926148],
 [-2.4930514 ,  0.87405741, 12.82700715]])



cellsize=np.array([26.40,26.40,70])
timesave=[]
iterations=1000

print(myfunc(mol1Coord,mol2Coord,cellsize))
print("myfunc => ", myfunc.nopython_signatures)
print("pbc_r2_vec => ", pbc_r2_vec.nopython_signatures)

for i in range(iterations):
    start = time.perf_counter()
    myfunc(mol1Coord,mol2Coord,cellsize)
    timesave.append(time.perf_counter()-start)
print("Took {:.3g} ms".format(np.mean(timesave[:1])*1000.0))


So currently myfunc(mol1Coord,mol2Coord,cellsize) takes about 1.2 ms per execution, while with parallel=True the time is 0.335 ms. I had initially thought that calling the distance calculation pbc_r2 five times in the function might be contributing to the overhead, so I tried to combine all distance calculations into one function, but there is no speedup.

Dear @mykd

I see way too many things in your code that I would like to address. But I’m not quite sure I understand your problem correctly, so I’ll focus on the (in my opinion) most important ones.

Is the size of your arrays representative of your actual problem? This is very important, because it can make sense to parallelize either pbc_r2_vec or myfunc. Using prange in both makes no sense and slows down the execution drastically in your case.
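For illustration, parallelizing only at the myfunc level would look roughly like this (a sketch based on your code, with pbc_r2_vec compiled as a plain @njit function, i.e. without parallel=True):

@njit(cache=True, parallel=True)
def myfunc_parallel_outer(mol1Coord, mol2Coord, cellsize):
    niter = mol2Coord.shape[0] // 3
    r2Vec = np.empty((niter, 5), dtype=np.float64)
    for j in numba.prange(niter):
        # Per-iteration scratch arrays so the parallel iterations don't share state.
        rIVec = np.empty((5, 3), dtype=np.float64)
        rJVec = np.empty((5, 3), dtype=np.float64)
        rIVec[0] = rIVec[1] = rIVec[2] = mol1Coord[0]
        rIVec[3] = rIVec[4] = rJVec[0] = mol2Coord[3*j]
        rJVec[1] = mol2Coord[3*j+1]
        rJVec[2] = mol2Coord[3*j+2]
        rJVec[3] = mol1Coord[1]
        rJVec[4] = mol1Coord[2]
        r2Vec[j] = pbc_r2_vec(rIVec, rJVec, cellsize)  # inner call stays serial
    return r2Vec

The key point is that prange then appears only here and not in pbc_r2_vec.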

It is very hard to say how to optimize your code, since the problem size is probably not realistic. I get much better performance running your example serially because your arrays are so small. It also looks like your task is pretty memory bound.

Based on your example, I can give you the tip not to parallelize at all. This already made it an order of magnitude faster for me.
If you run it serially, you can also rewrite myfunc like this:

@numba.njit
def myfunc(mol1Coord,mol2Coord,cellsize):
    niter = int(mol2Coord.shape[0]/3)
    r2Vec=np.empty((niter,5), dtype=np.float64)
    rIVec=np.empty((5,3), dtype=np.float64)
    rJVec=np.empty((5,3), dtype=np.float64)
    for j in range(niter):
        rIVec[0] = rIVec[1] = rIVec[2] = mol1Coord[0]
        rIVec[3] = rIVec[4] = rJVec[0] = mol2Coord[3*j]
        rJVec[1] = mol2Coord[3*j+1]
        rJVec[2] = mol2Coord[3*j+2]
        rJVec[3] = mol1Coord[1]
        rJVec[4] = mol1Coord[2]
        r2Vec[j]=pbc_r2_vec(rIVec,rJVec,cellsize)
    return r2Vec

However, the allocation outside the loop hardly matters because the arrays are so small and the loop is executed only a few times.

Maybe you can clear up my ambiguity, and I am sorry if I completely misunderstood your question.

Edit: Is there a reason why you allocate rIVec twice each iteration?


That should read r2Vec instead of rIVec

Ok, in this case I guess I can use the serial equivalent of pbc_r2_vec. As you can see, I needed to call pbc_r2 five times in order to calculate the full distance array, so my thinking was that it might be faster to combine all the vectors and pass them to the function once.

So the arrays really have these shapes in your real application? Then I understand your problem even less, because your code already executes very fast. Why would you bother optimizing it? The only reason I can see is that you call this piece of code many times. But then you’d want to parallelize and probably optimize at a larger level.

Also, it looks to me like you’re doing unnecessary copies in myfunc, e.g.
rIVec[0] = rIVec[1] = rIVec[2] = mol1Coord[0]
This copies the same values to multiple locations. Consequently, the same values enter your inner function more than once:
r2Vec[j]=pbc_r2_vec(rIVec,rJVec,cellsize)
I don’t understand why you would want to do that, since pbc_r2_vec doesn’t change the data. You mentioned that there is a typo; maybe it is because of that and I am misunderstanding your code. That may well be the case, as I am admittedly a bit too lazy to think it through in detail.
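If the index pairs are what they look like, you could also skip the intermediate rIVec/rJVec entirely and call your scalar pbc_r2 directly; a rough, untested sketch:

@njit(cache=True)
def myfunc_direct(mol1Coord, mol2Coord, cellsize):
    niter = mol2Coord.shape[0] // 3
    r2Vec = np.empty((niter, 5), dtype=np.float64)
    for j in range(niter):
        # The same five pairs that end up in rIVec/rJVec, without copying them first.
        r2Vec[j, 0] = pbc_r2(mol1Coord[0], mol2Coord[3*j], cellsize)
        r2Vec[j, 1] = pbc_r2(mol1Coord[0], mol2Coord[3*j+1], cellsize)
        r2Vec[j, 2] = pbc_r2(mol1Coord[0], mol2Coord[3*j+2], cellsize)
        r2Vec[j, 3] = pbc_r2(mol2Coord[3*j], mol1Coord[1], cellsize)
        r2Vec[j, 4] = pbc_r2(mol2Coord[3*j], mol1Coord[2], cellsize)
    return r2Vec

Whether this is actually faster you would have to measure, but it avoids filling two (5, 3) arrays per iteration.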


Hi @sschaer, I tried to implement your suggestions in my code. I am attaching a link to my directory where you can see the whole test code.

In short, I am processing molecular simulation data for water molecules and labelling them based on their hydrogen bonding pattern. This will be further used for generating spectroscopic data over a long trajectory, hence I was trying to reduce any overhead.

With your help, my entire test function now takes about 1.24 ms for a single simulation frame. I guess I might need to use multiprocessing or Dask for the overall parallelization, but if you have any ideas or suggestions, please let me know. I am sorry if our discussion felt confusing. Thanks for your help!!

One of the most reliable parallelization patterns is to parallelize only the outermost loop in your problem. That means that if this code will sit inside another loop that you did not include here, then don’t add prange to any of this code; either use prange on the outer loop (if it is also in Numba and multithreading works for you) or use joblib or Dask if you want multiprocessing.
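For example, with joblib the outer-level parallelization could look roughly like this (a sketch only; process_frame, load_frame and n_frames are placeholders for however you read and process one trajectory frame, not names from your code):

from joblib import Parallel, delayed

def process_frame(k):
    # Placeholder: load one frame and run the serial, njit-compiled analysis on it.
    mol1Coord, mol2Coord = load_frame(k)          # hypothetical loader
    return myfunc(mol1Coord, mol2Coord, cellsize)

results = Parallel(n_jobs=-1)(delayed(process_frame)(k) for k in range(n_frames))

If you go this route, keep myfunc itself serial (no prange) so the worker processes don’t oversubscribe your cores.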

Luk