Numba parallel loops do not execute in ascending index order, unlike CUDA

When I run a parallel loop using Numba and print the indices i, j, the result is a random sequence of indices. I’m wondering if there’s a way to make the loop run in ascending index order, such as when running with numba.cuda (all blocks and threads are done in ascending order).

Code:

from numba import njit, prange
import numpy as np

A = np.ones((3, 3))

@njit(parallel=True)
def sum(array):
    s = 0
    for i in prange(array.shape[0]):
        for j in prange(array.shape[1]):
            print(i, j)
            s += array[i,j]
    return s

sum(A)

Result:

01 0
1 1
1 2
 0
0 1
0 2
2 0
2 1
2 2

Thanks!

Not sure about the answer to your question, but:

such as when running with numba.cuda (all blocks and threads are done in ascending order).

This isn’t the case: blocks and threads are not scheduled in ascending order in CUDA, and you cannot rely on any particular scheduling order.

I think I misunderstood CUDA and implemented my code incorrectly. I was trying to use random_seed() with parallel=True so that each index in the array would get a corresponding random value generated from the random seed. I’ll try other ways to implement this. Thank you!

Also note that in nested pranges, all but the outermost prange are converted into standard ranges, that is, they are serialized. See the "Loop serialization" section of "Automatic parallelization with @jit" in the Numba 0.57.0 documentation.