How do I parallelize this code?

Hello,

I am learning numba and I am trying to parallelize code that calculates y = f(x, parameters) for a large number of x values, so x is chunked. The following code is an example that calculates y = a*exp(b*x) for 3 parameter sets (a, b) and a few x values, but without any parallelization yet. It is only there to make sure that the approach works.

import numpy as np
from numba import jit, prange

@jit(('float64[:], float64[:,:], int64'), nopython=True, parallel=False)
def myFoo(x, par, numthreads):
    result = np.empty((par.shape[0],len(x)), dtype=np.float64)
    chunklen = (len(x) + numthreads - 1) // numthreads
    for i in range(numthreads):
        result[:,i * chunklen:(i + 1) * chunklen] = par[:,0:1]*np.exp(x[i * chunklen:(i + 1) * chunklen]*par[:,0:1])
    return result


x=np.array(np.arange(0,10,1.0))
par=np.array([[1.0,0.5],[2.0,3.0],[3.0,5.0]])
myFoo(x, par, 4)
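
For reference, this serial loop is equivalent to a single NumPy broadcast, so its output can be sanity-checked directly (a quick check using the same column indexing as in the loop):

expected = par[:, 0:1] * np.exp(x * par[:, 0:1])   # shape (3, 10)
print(np.allclose(myFoo(x, par, 4), expected))     # True for the serial version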

This code works. In my real problem, I have a few parameter sets and a lot of x values, so I set the parallel flag to True and replaced range with prange:

import numpy as np
from numba import jit, prange

@jit(('float64[:], float64[:,:], int64'), nopython=True, parallel=True)
def myFoo(x, par, numthreads):
    result = np.empty((par.shape[0],len(x)), dtype=np.float64)
    chunklen = (len(x) + numthreads - 1) // numthreads
    for i in prange(numthreads):
        result[:,i * chunklen:(i + 1) * chunklen] = par[:,0:1]*np.exp(x[i * chunklen:(i + 1) * chunklen]*par[:,0:1])
    return result
    
    
x=np.array(np.arange(0,10,1.0))
par=np.array([[1.0,0.5],[2.0,3.0],[3.0,5.0]])
myFoo(x, par, 4)

And I got the following error:

AssertionError: Sizes of $80binary_subscr.12, $106binary_subscr.26 do not match on /tmp/ipykernel_17011/815571837.py (9)

How can I solve it?

Thanks for your answer.

Hi @stef1611,

could I briefly ask which version of numba you are using? I seem to be unable to reproduce the error you are experiencing.

Also, slightly off-topic: would you mind sharing what encouraged you to define the function signature in the jit decorator? I see a lot of people doing this, especially when they are new to numba, and since it is generally not needed, and maybe even discouraged, I am curious to understand why so many people define a signature when they could just be lazy instead :’)
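
Just for illustration, being lazy here means something as simple as this (a toy sketch, not your function):

import numpy as np
from numba import njit

# No signature: numba infers the argument types on the first call and
# compiles a specialization for exactly those types.
@njit
def scale(x, a):
    return a * np.exp(x)

scale(np.arange(3.0), 2.0)  # compiled here for (float64[:], float64)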

Cheers!

Hi Hannes,

My numba version is 0.53.1.
I agree with you about the function signature. I think I do this because I often program in C. But I am becoming lazy :slight_smile: and it is better.

Cheers,

Stephane

Hi Hannes,

The code I posted with parallel=True and prange gives no error on your config?

Cheers,

Stephane

Hi Stephane,

I first tested on 0.54.0 and now on 0.55.0rc1 and I was a bit careless yesterday. The parallel code runs, but produces different (and wrong) results.
I think there is a good chance you have hit a bug here; I will try to play around a little more. But prange is known to be a bit fragile in some cases; there have been a few issues raised on GitHub in the past.

Cheers

PS: Oh and thanks for sharing your signature motivation :slight_smile:

Some observations:

  1. Should be irrelevant for the problem, but I think you are indexing par incorrectly (you take the same column for both a and b).

  2. If I run with parallel=True but with range instead of prange, I can reproduce your error!

  3. After slightly refactoring your code and giving names to the various slices and chunks, the error message becomes a bit more helpful. It is complaining about a size mismatch between the parameter slice and the x-value slice. But this is only raised when trying to run in parallel, and it should really be valid NumPy broadcasting syntax.

  4. If I run with parallel=False and prange the result is correct (that is expected since prange should just behave like range in that case)

  5. There is an indexing problem in your loop: on the last iteration you are accessing elements that are out of bounds. Without compilation NumPy seems to take care of this (see the short sketch after this list), but I don’t know why, to be honest. Compiling with boundscheck=True does NOT catch this either, which is weird to me. I think it is because these are slices and not scalar indices.

  6. Fixing the indexing by limiting the column index for the last iteration does NOT solve the problem.

  7. If I change result = np.empty(...) to result = np.zeros(...), most of the (seemingly random) entries in the result turn to 0 when running in parallel. I think this points at some problem with the value assignment in the loop. It almost looks as if only the last chunk fills the result array correctly.

  8. If I extend the length of x by one (to make the last chunk have a length of 2 instead of 1), everything is broken.
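
To illustrate point 5 outside of numba: NumPy slices are clipped to the array bounds, so an over-long stop index does not raise, whereas a scalar index would (a tiny sketch):

import numpy as np

a = np.arange(10.0)
print(a[8:12])   # slice stop past the end is silently clipped -> [8. 9.]
# print(a[12])   # a scalar out-of-bounds index would raise IndexError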

So I get the vibe that there is more than one problem here, or it is some weird interaction:

  1. prange steamrolls the error messages I see with range, but the returned result is wrong

  2. There seems to be a broadcasting issue when running with parallel=True

  3. Probably because of the broadcasting issues, the result values are not filled into the output array

  4. boundscheck=True seems to behave like NumPy, but not necessarily as I expected for slices

Ping @stuartarchibald, @sklam, @sseibert: I feel like this is about as deep as I am able to dig; maybe one of you has an idea? :slight_smile:

Last but not least, here is my modified version of your example. Unless I made a mistake, it should do the same as your original code (EXCEPT for the out-of-bounds index and the initial output allocation). The lazy f-strings I used are not supported in compiled mode yet, but you can uncomment them in pure Python (>=3.8) mode for a little debug output.

import numpy as np
from numba import jit, prange, njit

@njit(parallel=True, boundscheck=True)
def myFoo(x, par, numthreads):
    n_par = par.shape[0]
    n_x = len(x)
    result = np.zeros((n_par, n_x), dtype=np.float64)
    chunklen = (len(x) + numthreads - 1) // numthreads
    
    a = par[:,0:1]
    b = par[:,0:1]  # This is probably wrong but should not have an effect on the problem
        
#     print(f"{n_par = }")
#     print(f"{n_x = }")
#     print(f"{result.shape = }")   
#     print(f"{chunklen = }")   
#     print(f"{a = }")
#     print(f"{b = }")
    
    for i in prange(numthreads):
        start_col = i * chunklen
        stop_col = min(n_x, (i + 1) * chunklen)  # Here I fixed the index issue

        x_chunk = x[start_col:stop_col]
        chunk = a*np.exp(x_chunk*b)    
        result[:,start_col:stop_col] = chunk
        
#         print(f"\nchunk_id = {i}")
#         print(f"{start_col = }")   
#         print(f"{stop_col = }")   
#         print(f"{chunk.shape = }")
    
    return result
    
    
x=np.array(np.arange(0,11,1.0))
par=np.array([[1.0,0.5],[2.0,3.0],[3.0,5.0]])
myFoo(x, par, 4)
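
To make the discrepancy easy to see, the parallel result can be compared against the pure-Python execution of the same function (a quick sketch; py_func runs the original, uncompiled Python code):

reference = myFoo.py_func(x, par, 4)     # plain Python run, no numba involved
compiled = myFoo(x, par, 4)              # parallel compiled run
print(np.allclose(reference, compiled))  # prints False when the parallel result is wrong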

Hi Hannes,

You are right. The second parameter must be par[:,1:2]. But I agree with you, it is irrelevant for the problem.

Concerning the problem with the size of the last chunk, it is also surprising to me that it works. This code is based on an example from Examples — Numba 0.50.1 documentation, and I am trying to use prange instead of using threading explicitly. I planned to correct this problem; thanks for having done it.

Something that could perhaps help you: this very simple code

from numba import jit
import numpy as np

@jit(nopython=True, parallel=True)
def myFoo(par,x):
    par[:,0:1]*x
    return 1

x=np.array(np.arange(0,10,2.0))
par=np.array([[i*10.0,i*4.0] for i in range(1,3)])
myFoo(par,x)

outputs the same error:

AssertionError: Sizes of $18binary_subscr.10, x do not match on /tmp/ipykernel_33664/2489146421.py (6)

But it works when parallel is set to False.
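
For reference, in plain NumPy the expression itself is valid broadcasting (a quick check outside numba):

x = np.arange(0, 10, 2.0)                                 # shape (5,)
par = np.array([[i*10.0, i*4.0] for i in range(1, 3)])    # shape (2, 2)
print((par[:, 0:1] * x).shape)                            # (2, 1) * (5,) broadcasts to (2, 5)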

Thanks for your help.

Cheers

Stephane


Hi,

In addition to my previous post, this code works:

@jit(nopython=True, parallel=True)
def myFoo(par,x):
    print(np.outer(par[:,0],x))
    return 1
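
For this particular expression np.outer is an equivalent formulation, which is perhaps why it avoids the failing broadcasting path (a plain-NumPy check, not run through numba):

x = np.arange(0, 10, 2.0)
par = np.array([[i*10.0, i*4.0] for i in range(1, 3)])
print(np.allclose(np.outer(par[:, 0], x), par[:, 0:1] * x))  # True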

Hi,

that’s a great simplification of the reproducer. To me this really looks like a binary operator broadcasting problem when working in parallel mode.

I think this should go to the issue tracker. Nevermind, Graham was faster <3

Cheers

Many thanks for the discussion of the issue so far. I’ve created issues for the items brought up in the thread:


Another addition: I tried to replace the binary operator * in my program with np.outer, np.dot, np.multiply, … and I ran into a new, but I think related, problem.
This is my new minimalist reproducer of the problem. This code works with parallel set to False.

@jit(nopython=True, parallel=True)
def myFoo(par,x):
    print(np.outer(par[:,0],x))   # Ok, as before
    print(np.multiply(par[:,0:1],x)) # Sizes of $52binary_subscr.27, x do not match
    return 1

Hi,

Thanks a lot for your help. I posted another minimalist reproducer.

Best

Stephane.

@Hannes, @gmarkall

Not sure it is related, but we never know (I am a new numba user, so perhaps this is out of context, but it concerns slicing).

In Python, to keep the second dimension of an array when I extract a column, I used

par[:,i,None] # or np.newaxis (it is the same)

But it does not work with numba (parallel or not parallel). So I used this syntax instead:

par[:,i:i+1]

Cheers

@stef1611

The newaxis thingy is a known limitation. NumPy provides quite a lot of indexing and slicing magic that goes beyond standard Python abilities, and I guess reimplementing all of it is not always easy.

I had actually stumbled over that way of indexing because I didn’t know that it preserves the shape; I think that is a great trick :slight_smile: In the past I always had to resort to using np.atleast_2d.
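
For completeness, a tiny plain-NumPy comparison of the three ways to keep the extracted column 2-D (a sketch run outside numba; the None/newaxis variant is the one numba does not support here):

import numpy as np

par = np.array([[1.0, 0.5], [2.0, 3.0], [3.0, 5.0]])
i = 0
print(par[:, i, None].shape)             # (3, 1) -- works in NumPy, not in numba
print(par[:, i:i+1].shape)               # (3, 1) -- the slicing workaround
print(np.atleast_2d(par[:, i]).T.shape)  # (3, 1) -- the atleast_2d alternative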