Re-execution of the same jitted function causes re-compilation for same type of data

rohaniitj · October 21, 2020, 10:28am

I am calculating properties of two graphs: graph X has x1 nodes and x2 edges, while graph Y has y1 nodes and y2 edges. The first time I run the code, I pass the edges of X to the jitted function; the second time, I pass the edges of Y. In both cases the edges are an ndarray.

With cache=True and nopython=True, when I run the same script repeatedly for both X and Y, I observe the following. For X, the first run takes 82 seconds and later runs take about 66 seconds. For Y, the first run takes 123 seconds and later runs take about 102 seconds. I believe the differences, 82-66=16 seconds for X and 123-102=21 seconds for Y, arise because, thanks to cache=True, the code is not compiled again on later runs.

The reason I am so concerned about this is that I have a third data set, Z, with 15 times as many edges as Y. Running Z would take a very long time if the code were re-compiled for Z as well. Note that the edges of X, Y and Z are ndarrays and that numba.typeof reports the type of each edge list in the following code as array(int16, 2d, C).
The code is as follows.

import numpy as np
from numba import njit, prange

@njit(cache=True)
def case1(edge_list,signs,i,j,rows,ppp,nnn,ppn_I,ppn_II,ppn_III,ppn_IV,pnn_I,pnn_II,pnn_III,pnn_IV):
    # Without parallel=True, prange behaves like range.
    for k in prange(j+1,rows):
        # Edges i and j share a source (checked by the caller); edge k closes the triangle.
        if (edge_list[i,1]==edge_list[k,0] and edge_list[j,1]==edge_list[k,1]) or (edge_list[i,1]==edge_list[k,1] and edge_list[j,1]==edge_list[k,0]):
            if np.sum(signs[i]+signs[j]+signs[k])==3:
                ppp=ppp+1
            elif np.sum(signs[i]+signs[j]+signs[k])==1:
                # Only x[0] is assigned in this shortened listing.
                x=np.zeros((3,1),dtype=np.int8)
                if edge_list[i,1]==edge_list[k,0]:
                    x[0]=signs[i]
                elif edge_list[j,1]==edge_list[k,0]:
                    x[0]=signs[j]
                if x[0,0]==1 and x[1,0]==1 and x[2,0]==-1:
                    ppn_I=ppn_I+1
                elif x[0,0]==1 and x[1,0]==-1 and x[2,0]==1:
                    ppn_II=ppn_II+1
                elif x[0,0]==-1 and x[1,0]==1 and x[2,0]==1:
                    ppn_III=ppn_III+1
            elif np.sum(signs[i]+signs[j]+signs[k])==-1:
                x=np.zeros((3,1),dtype=np.int8)
                if edge_list[i,1]==edge_list[k,0]:
                    x[0]=signs[i]
                elif edge_list[j,1]==edge_list[k,0]:
                    x[0]=signs[j]
                if x[0,0]==-1 and x[1,0]==-1 and x[2,0]==1:
                    pnn_I=pnn_I+1
                elif x[0,0]==-1 and x[1,0]==1 and x[2,0]==-1:
                    pnn_II=pnn_II+1
                elif x[0,0]==1 and x[1,0]==-1 and x[2,0]==-1:
                    pnn_III=pnn_III+1
            elif np.sum(signs[i]+signs[j]+signs[k])==-3:
                nnn=nnn+1
    return ppp,nnn,ppn_I,ppn_II,ppn_III,ppn_IV,pnn_I,pnn_II,pnn_III,pnn_IV

@njit(cache=True)
def signed_directed_triangles_numba1(edge_list):
    edge_list=edge_list[:,0:3]
    rows_edge_lists=edge_list.shape
    rows=rows_edge_lists[0]
    # Sign of each edge: +1 for a non-negative weight, -1 otherwise.
    signs=np.zeros((rows_edge_lists[0],1),dtype=np.int8)
    for ii in range(0,rows_edge_lists[0]):
        if edge_list[ii,2]>=0:
            signs[ii]=1
        elif edge_list[ii,2]<0:
            signs[ii]=-1
    edge_list=edge_list[:,0:2]
    # One row of ten triangle counts per starting edge i.
    extracted_triangles=np.zeros((rows_edge_lists[0]-2,10),dtype=np.uint16)
    for i in range(0,rows_edge_lists[0]-2):
        flag_i1_j1=0
        ppp,nnn=0,0
        ppn_I,ppn_II,ppn_III,ppn_IV=0,0,0,0
        pnn_I,pnn_II,pnn_III,pnn_IV=0,0,0,0
        for j in range(i+1,rows-1):
            if edge_list[i,0]==edge_list[j,0]:
                flag_i1_j1=1
                ppp,nnn,ppn_I,ppn_II,ppn_III,ppn_IV,pnn_I,pnn_II,pnn_III,pnn_IV=case1(edge_list,signs,i,j,rows,ppp,nnn,ppn_I,ppn_II,ppn_III,ppn_IV,pnn_I,pnn_II,pnn_III,pnn_IV)
        extracted_triangles[i]=np.array([ppp,nnn,ppn_I,ppn_II,ppn_III,ppn_IV,pnn_I,pnn_II,pnn_III,pnn_IV],dtype=np.uint16)
    triangles_each_type=np.sum(extracted_triangles,axis=0)
    print(triangles_each_type)
    return

#edge_list = np.genfromtxt(r'C:\Users\alpha.csv', delimiter=",")
# edge_list=np.genfromtxt(r'C:\Users\bitcoinotc.csv', delimiter=",") 
# edge_list = np.genfromtxt(r'C:\Users\epinions.csv', delimiter=",");
#edge_list = edge_list.astype(np.int16)
edge_list=np.array([[7188,1,10,1],[430,1,10,2],[3134,1,10,3],[3026,1,10,4],[3010,1,10,5],[7188,5,10,6],[7188,3,5,7],[430,5,2,8],[3134,6,10,9],[3134,100,10,10],[1,5,-2,11],[1,5,-2,12],[1,3,5,13],[6,100,-4,14]],dtype=np.int16)
#edge_list=np.array([[6,2,4,1],[6,5,2,2],[1,15,1,4],[4,3,7,4],[13,16,8,5],[13,10,8,6],[7,5,1,7],[2,5,-1,8],[2,100,-10,9],[13,100,10,10],[16,10,-6,11],[16,100,-4,12],[5,100,-3,13],[13,200,-4,14],[100,200,3,15],[7,300,1,16],[7,400,1,17],[5,300,-5,18],[5,400,3,19],[111,222,-4,20]],dtype=np.int16)
signed_directed_triangles_numba1(edge_list)

Here X refers to the edge list obtained by edge_list = np.genfromtxt(r'C:\alpha.csv', delimiter=",")
Y refers to the edge list obtained by edge_list = np.genfromtxt(r'C:\bitcoinotc.csv', delimiter=",")
Z refers to the edge list obtained by edge_list = np.genfromtxt(r'C:\epinions.csv', delimiter=",")
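
The type array(int16, 2d, C) mentioned above combines three properties of the array: its dtype, its number of dimensions and its memory layout. The following is a minimal sketch of how those properties look before and after the astype(np.int16) cast, using NumPy only. The CSV text and the helper describe are hypothetical stand-ins for the real files, not code from the original script.

```python
import io
import numpy as np

# Hypothetical CSV content standing in for a file such as alpha.csv.
csv_text = "7188,1,10,1\n430,1,10,2\n3134,1,10,3\n"

# genfromtxt returns float64 unless a dtype is requested.
raw = np.genfromtxt(io.StringIO(csv_text), delimiter=",")
edge_list = raw.astype(np.int16)

def describe(arr):
    """Return (dtype, ndim, layout), the parts shown in array(int16, 2d, C)."""
    if arr.flags.c_contiguous:
        layout = "C"
    elif arr.flags.f_contiguous:
        layout = "F"
    else:
        layout = "A"
    return (str(arr.dtype), arr.ndim, layout)

print(describe(raw))        # ('float64', 2, 'C')
print(describe(edge_list))  # ('int16', 2, 'C')
```

If X, Y and Z all go through the same cast, they should end up with the same dtype, dimensionality and layout, whatever their number of rows.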

Please note that I have shortened the code (and omitted some other functions) so that the listing stays short and the problem is easier to follow.

luk-f-a · October 21, 2020, 8:44pm

Hi @rohaniitj, I ran your example and cannot confirm the problem you describe. I can see the cache working: with the code in a standalone script, the first run took longer and subsequent runs were much faster.
Are you running this from a Jupyter notebook?

rohaniitj · October 21, 2020, 11:49pm

Hi @luk-f-a, I am running the program from Spyder in Anaconda, and I measure the time with

x=time.perf_counter()
signed_directed_triangles_numba1(edge_list)
y=time.perf_counter()
print(y-x)
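
A small sketch of how the first call could be timed separately from later calls in the same process, so that one-off costs such as compilation or loading a cache do not get mixed into the steady-state time. The helper time_calls is hypothetical and not part of the original script; the built-in sum stands in here for the jitted function.

```python
import time

def time_calls(fn, *args, repeats=3):
    """Time the first call of fn separately from `repeats` later calls."""
    start = time.perf_counter()
    fn(*args)
    first = time.perf_counter() - start

    later = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        later.append(time.perf_counter() - start)
    return first, later

# Stand-in workload; replace with the jitted function and its edge list.
first, later = time_calls(sum, range(1000))
print(f"first call: {first:.6f} s, best later call: {min(later):.6f} s")
```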

After seeing your reply, I ran it again and noted that, with

edge_list=np.array([[7188,1,10,1],[430,1,10,2],[3134,1,10,3],[3026,1,10,4],[3010,1,10,5],[7188,5,10,6],[7188,3,5,7],[430,5,2,8],[3134,6,10,9],[3134,100,10,10],[1,5,-2,11],[1,5,-2,12],[1,3,5,13],[6,100,-4,14]],dtype=np.int16)

the first execution gave

Output= [1 0 0 5 0 0 0 0 0 0]
Time= 14.676990995001688

On subsequent executions with the same input, the output stays the same but the time drops to

[1 0 0 5 0 0 0 0 0 0]
0.014539080999384169

When I re-executed the code with the second input, i.e.

edge_list=np.array([[6,2,4,1],[6,5,2,2],[1,15,1,4],[4,3,7,4],[13,16,8,5],[13,10,8,6],[7,5,1,7],[2,5,-1,8],[2,100,-10,9],[13,100,10,10],[16,10,-6,11],[16,100,-4,12],[5,100,-3,13],[13,200,-4,14],[100,200,3,15],[7,300,1,16],[7,400,1,17],[5,300,-5,18],[5,400,3,19],[111,222,-4,20]],dtype=np.int16)

the output of the first execution with this input was

[1 1 1 4 0 0 0 0 0 0]
15.15219955000066

On subsequent re-executions with the second input, the time dropped to

[1 1 1 4 0 0 0 0 0 0]
0.012335349001659779

My question is this: when I first ran the code with the first data set

edge_list=np.array([[7188,1,10,1],[430,1,10,2],[3134,1,10,3],[3026,1,10,4],[3010,1,10,5],[7188,5,10,6],[7188,3,5,7],[430,5,2,8],[3134,6,10,9],[3134,100,10,10],[1,5,-2,11],[1,5,-2,12],[1,3,5,13],[6,100,-4,14]],dtype=np.int16)

a cached copy was saved and is re-used by the jitted function for this same input, but when I change the input to the second data set (which has the same data type)

edge_list=np.array([[6,2,4,1],[6,5,2,2],[1,15,1,4],[4,3,7,4],[13,16,8,5],[13,10,8,6],[7,5,1,7],[2,5,-1,8],[2,100,-10,9],[13,100,10,10],[16,10,-6,11],[16,100,-4,12],[5,100,-3,13],[13,200,-4,14],[100,200,3,15],[7,300,1,16],[7,400,1,17],[5,300,-5,18],[5,400,3,19],[111,222,-4,20]],dtype=np.int16)

why does the first execution with the second data set not use the cached jitted function? Instead, it seems to re-compile the jitted function (already compiled for the first data set), taking 15.15219955000066 seconds on the first execution with the second data set and only 0.012335349001659779 seconds on re-execution.

It seems to me that the jitted function is bound to the data at compile time, so that giving it new data makes it compile again. As far as I know, in compiled languages such as C the code is compiled once and the data is supplied at run time (that is, compilation is independent of the data).

luk-f-a · October 22, 2020, 8:46am

Sorry, I just cannot reproduce your problem. I added this code to the end of yours

edge_list=np.array([[7188,1,10,1],[430,1,10,2],[3134,1,10,3],[3026,1,10,4],[3010,1,10,5],[7188,5,10,6],[7188,3,5,7],[430,5,2,8],[3134,6,10,9],[3134,100,10,10],[1,5,-2,11],[1,5,-2,12],[1,3,5,13],[6,100,-4,14]],dtype=np.int16)

x=time.perf_counter()
signed_directed_triangles_numba1(edge_list)
y=time.perf_counter()
print(y-x)

x=time.perf_counter()
signed_directed_triangles_numba1(edge_list)
y=time.perf_counter()
print(y-x)

edge_list=np.array([[6,2,4,1],[6,5,2,2],[1,15,1,4],[4,3,7,4],[13,16,8,5],[13,10,8,6],[7,5,1,7],[2,5,-1,8],[2,100,-10,9],[13,100,10,10],[16,10,-6,11],[16,100,-4,12],[5,100,-3,13],[13,200,-4,14],[100,200,3,15],[7,300,1,16],[7,400,1,17],[5,300,-5,18],[5,400,3,19],[111,222,-4,20]],dtype=np.int16)

x=time.perf_counter()
signed_directed_triangles_numba1(edge_list)
y=time.perf_counter()
print(y-x)

x=time.perf_counter()
signed_directed_triangles_numba1(edge_list)
y=time.perf_counter()
print(y-x)

and got these results

[0 0 0 0 0 0 0 0 0 0]
4.913718208
[0 0 0 0 0 0 0 0 0 0]
0.00013826200000011113
[0 0 0 0 0 0 0 0 0 0]
0.00012260000000097193
[0 0 0 0 0 0 0 0 0 0]
0.00010855700000078627

Clearly the compiled function is stored in memory and re-used for the second input.

After that, I ran the script a second time to check whether the on-disk cache also works. It does:

[0 0 0 0 0 0 0 0 0 0]
0.3244764819999997
[0 0 0 0 0 0 0 0 0 0]
7.669199999993381e-05
[0 0 0 0 0 0 0 0 0 0]
6.859100000022877e-05
[0 0 0 0 0 0 0 0 0 0]
6.319000000010178e-05

Are you working with the latest numba? Did you install it via conda or pip?

cheers,

rohaniitj · October 22, 2020, 9:31am

Hi @luk-f-a, your way of executing the code is a little different from mine, which is why your output differs. In my approach, I first run the code for the first data set with the second one commented out, e.g.

edge_list=np.array([[7188,1,10,1],[430,1,10,2],[3134,1,10,3],[3026,1,10,4],[3010,1,10,5],[7188,5,10,6],[7188,3,5,7],[430,5,2,8],[3134,6,10,9],[3134,100,10,10],[1,5,-2,11],[1,5,-2,12],[1,3,5,13],[6,100,-4,14]],dtype=np.int16)

x=time.perf_counter()
signed_directed_triangles_numba1(edge_list)
y=time.perf_counter()
print(y-x)

x=time.perf_counter()
signed_directed_triangles_numba1(edge_list)
y=time.perf_counter()
print(y-x)

# edge_list=np.array([[6,2,4,1],[6,5,2,2],[1,15,1,4],[4,3,7,4],[13,16,8,5],[13,10,8,6],[7,5,1,7],[2,5,-1,8],[2,100,-10,9],[13,100,10,10],[16,10,-6,11],[16,100,-4,12],[5,100,-3,13],[13,200,-4,14],[100,200,3,15],[7,300,1,16],[7,400,1,17],[5,300,-5,18],[5,400,3,19],[111,222,-4,20]],dtype=np.int16)

# x=time.perf_counter()
# signed_directed_triangles_numba1(edge_list)
# y=time.perf_counter()
# print(y-x)

# x=time.perf_counter()
# signed_directed_triangles_numba1(edge_list)
# y=time.perf_counter()
# print(y-x)

In this case, the output is

[1 0 0 5 0 0 0 0 0 0]
13.426606996000032
[1 0 0 5 0 0 0 0 0 0]
0.00022101300004351287

Then I comment out the first data set and re-run with the second, as follows.

# edge_list=np.array([[7188,1,10,1],[430,1,10,2],[3134,1,10,3],[3026,1,10,4],[3010,1,10,5],[7188,5,10,6],[7188,3,5,7],[430,5,2,8],[3134,6,10,9],[3134,100,10,10],[1,5,-2,11],[1,5,-2,12],[1,3,5,13],[6,100,-4,14]],dtype=np.int16)
# x=time.perf_counter()
# signed_directed_triangles_numba1(edge_list)
# y=time.perf_counter()
# print(y-x)

# x=time.perf_counter()
# signed_directed_triangles_numba1(edge_list)
# y=time.perf_counter()
# print(y-x)

edge_list=np.array([[6,2,4,1],[6,5,2,2],[1,15,1,4],[4,3,7,4],[13,16,8,5],[13,10,8,6],[7,5,1,7],[2,5,-1,8],[2,100,-10,9],[13,100,10,10],[16,10,-6,11],[16,100,-4,12],[5,100,-3,13],[13,200,-4,14],[100,200,3,15],[7,300,1,16],[7,400,1,17],[5,300,-5,18],[5,400,3,19],[111,222,-4,20]],dtype=np.int16)
x=time.perf_counter()
signed_directed_triangles_numba1(edge_list)
y=time.perf_counter()
print(y-x)

x=time.perf_counter()
signed_directed_triangles_numba1(edge_list)
y=time.perf_counter()
print(y-x)

In this case, the output is

[1 1 1 4 0 0 0 0 0 0]
13.651586431000055
[1 1 1 4 0 0 0 0 0 0]
0.0007291729999678864

This is why your output differs from mine. I had not considered your approach because I was afraid the machine would overheat or become unresponsive, but I shall follow it from now on.

But my approach is roughly equivalent to yours, because you also re-executed the same code. I think Numba compiles the code afresh if it finds that the source has been modified since the last compilation: in your case you re-execute the same code without modification, whereas I modify it by commenting out one input and un-commenting the other. So Numba treats it as fresh input.

I am now using the latest Numba (0.51.2), to which I updated from version 0.43 via Conda on 20 October. Since the update, however, I am facing a new problem. With the original code

@njit(cache=True)
def case1(edge_list,signs,i,j,rows,ppp,nnn,ppn_I,ppn_II,ppn_III,ppn_IV,pnn_I,pnn_II,pnn_III,pnn_IV):

which I posted in the first post of this thread, I was previously able to run in parallel using parallel=True and prange. Now it gives the following error.

TypingError: Internal error at <numba.core.typeinfer.CallConstraint object at 0x000001F463923C18>.
Failed in nopython mode pipeline (step: convert to parfors)
maximum recursion depth exceeded while calling a Python object
During: resolving callee type: type(CPUDispatcher(<function case1_flagi1_j1 at 0x000001F462EBC620>))

Enable logging at debug level for details.

This code ran successfully, with the desired output, on Numba 0.43 using parallel=True and prange, but now I get this error. I don’t know how to roll back from the latest version to 0.43. Please guide me on it.

luk-f-a · October 22, 2020, 10:10am

In conclusion, it seems that as long as you don’t comment code out, you’ll be able to enjoy the cache and lower compilation times.
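
One way to switch data sets without editing the source file is to pass the CSV path on the command line. This matters because Numba's caching notes state that the on-disk cache is invalidated when the corresponding source file is modified. The sketch below is only an illustration: the script name triangles.py, the helper load_edges and the command-line convention are hypothetical, and the call to the jitted function is left as a comment.

```python
import sys
import numpy as np

def load_edges(path):
    """Load a comma-separated edge list and cast it to int16, as in the thread."""
    return np.genfromtxt(path, delimiter=",").astype(np.int16)

def main(argv):
    # argv[0] is the script name, argv[1] the CSV path (hypothetical CLI).
    if len(argv) != 2:
        print("usage: python triangles.py <edge_list.csv>")
        return 2
    edge_list = load_edges(argv[1])
    print(edge_list.shape, edge_list.dtype)
    # The thread's jitted function would be called here:
    # signed_directed_triangles_numba1(edge_list)
    return 0

# In the real script, finish with: sys.exit(main(sys.argv))
```

Running the script once per file, for example with alpha.csv and then bitcoinotc.csv, leaves the source untouched between runs, so a cached compilation from the first run should still be valid for the second.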

rohaniitj · October 23, 2020, 6:26am

Please also guide me in running the code in parallel on the latest Numba, 0.51.2, to which I updated from 0.43. I described the error in my previous post. My code ran successfully on 0.43, but with the latest version I get that error.

luk-f-a · October 23, 2020, 7:34am

I don’t know what the problem is with the parfor. I also got the same error message.

rohaniitj · October 23, 2020, 8:06am

Then it may be a bug in the latest version of Numba. Thanks for your help. I shall install the old version of Anaconda, because I am not able to roll back the update.

luk-f-a · October 23, 2020, 8:26am

Please report it as a bug on GitHub. Thanks!
