Advice in parallelizing

FrancescAlted · September 5, 2022, 11:59am

Hi,

I am trying to parallelize the following code:

import numpy as np
import numba as nb
import math


# Params for array construction
shape = (4_000, 4_000)
dtype = np.float32

@nb.jit
def circle_filter(val: float, row: int, col: int, nrows: int, ncols: int) -> float:
    x = (2. * row / nrows) - 1.
    y = (2. * col / ncols) - 1.
    if ((x ** 2 + y ** 2) <= 1) and val >= 0.5:
        return 1.
    return math.nan

@nb.jit
#@nb.jit(parallel=True)
def circle_fun(out, vals, nrows: int, ncols: int) -> int:
    n = out.shape[0]
    m = out.shape[1]
    for i in range(n):
        for j in range(m):
            out[i, j] = circle_filter(vals[i, j], i, j, nrows, ncols)
    return 0

#@nb.jit   # this works!
@nb.jit(parallel=True)
def circle_fun2(out, vals, nrows: int, ncols: int) -> int:
    n, m = out.shape
    for i in range(n):
        for j in range(m):
            x = (2. * i / nrows) - 1.
            y = (2. * j / ncols) - 1.
            if ((x ** 2 + y ** 2) <= 1) and vals[i, j] >= 0.5:
                out[i, j] = 1.
            else:
                out[i, j] = math.nan
    return 0


def numpy_rand():
    rng = np.random.default_rng()
    return rng.random(shape, dtype=dtype)

rand_data = numpy_rand()

circle = np.empty(shape, dtype)

def numba_computations():
    circle_fun2(circle, rand_data, *shape)
    circle_fun2.parallel_diagnostics(level=4)

numba_computations()

def numpy_reductions():
    area_circle = np.nansum(circle)
    area_square = np.nansum(rand_data)
    print(f"PI value: {4 * area_circle / area_square}")

numpy_reductions()

But when executing, I am getting this warning:

NumbaPerformanceWarning: 
The keyword argument 'parallel=True' was specified but no transformation for parallel execution was possible.

After activating the parallel_diagnostics(), I am getting this output:

/Users/faltet/miniconda3/envs/iron-array-python/bin/python /Users/faltet/iarray/iron-array-python/iron-array-notebooks/blogs/numba-unable-paralellize.py 
/Users/faltet/miniconda3/envs/iron-array-python/lib/python3.9/site-packages/numba/core/typed_passes.py:329: NumbaPerformanceWarning: 
The keyword argument 'parallel=True' was specified but no transformation for parallel execution was possible.

To find out why, try turning on parallel diagnostics, see https://numba.readthedocs.io/en/stable/user/parallel.html#diagnostics for help.

File "numba-unable-paralellize.py", line 30:
@nb.jit(parallel=True)
def circle_fun2(out, vals, nrows: int, ncols: int) -> int:
^

  warnings.warn(errors.NumbaPerformanceWarning(msg,
 
================================================================================
 Parallel Accelerator Optimizing:  Function circle_fun2, 
/Users/faltet/iarray/iron-array-python/iron-array-notebooks/blogs/numba-unable-
paralellize.py (29)  
================================================================================
No source available
--------------------------------- Fusing loops ---------------------------------
Attempting fusion of parallel loops (combines loops with similar properties)...
----------------------------- Before Optimisation ------------------------------
--------------------------------------------------------------------------------
------------------------------ After Optimisation ------------------------------
Parallel structure is already optimal.
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
 
---------------------------Loop invariant code motion---------------------------
Allocation hoisting:
No allocation hoisting found

Instruction hoisting:
No instruction hoisting found
--------------------------------------------------------------------------------
PI value: 3.1421794012211026

Process finished with exit code 0

which does not shed much light on the problem for me.

Actually, I was originally trying to parallelize circle_fun(), which calls circle_filter(), but I ran into the same problem. That is why I fused both into circle_fun2(), but the issue persists.

Any advice will be welcome!

gmarkall · September 5, 2022, 12:46pm

When using parallel=True, you need to use prange to specify which loops to parallelise; see "Automatic parallelization with @jit" in the Numba documentation. So maybe you would want to do this for your loop over j, for example.

FrancescAlted · September 5, 2022, 4:44pm

Sure. I tend to forget that Numba does not magically parallelize range; an explicit prange is actually needed. Thank you!

For what it's worth, I get the best results when using prange in the outer loop, which makes sense.
