Numba parallel loops do not execute in ascending index order, unlike CUDA

kherzieandal · May 26, 2023, 7:20am

When I run a parallel loop with Numba and print the indices i, j, the output is a random sequence of indices. Is there a way to make the iterations execute in ascending index order, such as when running with numba.cuda (all blocks and threads are done in ascending order)?

Code:

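from numba import njit, prange
import numpy as np

A = np.ones((3, 3))

@njit(parallel=True)
def sum(array):  # note: this name shadows Python's built-in sum()
    s = 0
    for i in prange(array.shape[0]):          # rows
        for j in prange(array.shape[1]):      # columns
            print(i, j)                       # print order is not deterministic
            s += array[i, j]
    return s

print(sum(A))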

Result: (output not preserved; as described above, the i, j pairs are printed in no fixed order)

Thanks!

gmarkall · May 26, 2023, 9:29pm

Not sure about the answer to your question, but:

"such as when running with numba.cuda (all blocks and threads are done in ascending order)."

This isn’t the case: blocks and threads are not scheduled in ascending order in CUDA, and you cannot rely on any particular scheduling order.
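Since no scheduling order can be relied on, one common workaround is to make each iteration write only to its own output slot and to do any ordered printing afterwards, outside the parallel loop. The following is a minimal sketch of that idea, not something taken from the thread; the function name `tag_cells` and the fallback used when Numba is not installed are assumptions made for illustration.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Assumption for this sketch only: run serially if Numba is unavailable.
    prange = range

    def njit(*args, **kwargs):
        return lambda func: func


@njit(parallel=True)
def tag_cells(array):  # hypothetical helper name
    # Each iteration writes only to its own cells of `out`, so the final
    # contents do not depend on which row a thread processes first.
    out = np.empty_like(array)
    for i in prange(array.shape[0]):
        for j in range(array.shape[1]):
            out[i, j] = array[i, j] + 10.0 * i + j
    return out


A = np.ones((3, 3))
result = tag_cells(A)

# Ordered output is produced here, after the parallel region has finished.
for i in range(result.shape[0]):
    for j in range(result.shape[1]):
        print(i, j, result[i, j])
```

The parallel part can then run in any order, while the printed lines always appear in ascending (i, j) order.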

kherzieandal · May 27, 2023, 4:30am

I think both my implementation and my understanding of CUDA were wrong. I was trying to use random_seed() with parallel=True so that each index in the array would get a corresponding random value generated from the random seed. I’ll try other ways to implement this. Thank you!
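One order-independent way to get "a random value per index" is to compute each value as a pure function of the seed and the index, instead of drawing from a shared generator whose state depends on execution order. The sketch below is only an illustration of that idea: `mix`, `index_random`, and `fill_random` are hypothetical names, the integer mixing function is a simple one chosen for brevity and its statistical quality has not been vetted, and the fallback for a missing Numba install is an assumption.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Assumption for this sketch only: run serially if Numba is unavailable.
    prange = range

    def njit(*args, **kwargs):
        if args and callable(args[0]):
            return args[0]
        return lambda func: func

MASK = 0xFFFFFFFF  # keep intermediate values within 32 bits


@njit
def mix(x):
    # Simple 32-bit integer mixing step (illustrative, quality not vetted).
    x = x & MASK
    x = ((x ^ (x >> 16)) * 0x45D9F3B) & MASK
    x = ((x ^ (x >> 16)) * 0x45D9F3B) & MASK
    return x ^ (x >> 16)


@njit
def index_random(seed, k):
    # Depends only on (seed, k), never on which thread runs first.
    return mix(mix(seed) ^ k) / 4294967296.0  # value in [0, 1)


@njit(parallel=True)
def fill_random(nrows, ncols, seed):
    out = np.empty((nrows, ncols))
    for i in prange(nrows):
        for j in range(ncols):
            # Explicit signed cast so the flat index is a plain integer.
            k = np.int64(i) * ncols + j
            out[i, j] = index_random(seed, k)
    return out


a = fill_random(3, 3, 42)
b = fill_random(3, 3, 42)
print(np.array_equal(a, b))  # same seed gives the same array on every run
```

Because every cell is derived from its own index, repeated runs with the same seed give the same array regardless of how the iterations are scheduled.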

cako · June 2, 2023, 6:41am

Also note that in nested pranges, all but the outermost prange are converted into standard range loops, that is, they are serialized. See "Loop serialization" in "Automatic parallelization with @jit" in the Numba 0.57.0 documentation.
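Given that behaviour, the loop from the original question can be written with the serialization made explicit: a prange over rows and an ordinary range over columns. This is a hedged sketch rather than a recommended implementation; the name `total`, the per-row accumulator, and the fallback for a missing Numba install are assumptions introduced here.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Assumption for this sketch only: run serially if Numba is unavailable.
    prange = range

    def njit(*args, **kwargs):
        return lambda func: func


@njit(parallel=True)
def total(array):  # hypothetical name, avoids shadowing the built-in sum()
    s = 0.0
    for i in prange(array.shape[0]):        # rows may run in parallel
        row_sum = 0.0
        for j in range(array.shape[1]):     # columns run serially per row
            row_sum += array[i, j]
        s += row_sum                        # scalar reduction across rows
    return s


A = np.ones((3, 3))
print(total(A))  # 9.0
```

The sum does not depend on the order in which rows are processed here, although for general floating-point data a different summation order can change the last few bits of the result.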
