How does Numba parallelize reductions (e.g. dot products)

burakmartin · January 26, 2021, 1:19pm

Hey,

I'm not sure how Numba parallelizes reductions. The docs state that reductions such as “+=” are supported, which means the following runs in parallel even though it would normally result in a race condition:

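```python
import numba
import numpy as np

@numba.njit(parallel=True)
def add():
    x = np.arange(0, 10000)
    res = 0
    for i in numba.prange(10000):
        res += x[i]  # "+=" on res is treated as a parallel reduction
    return res
```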

For my research I'd like to find an official Numba source that explains this, so I can cite it. I scanned the docs, but I couldn't find an explanation of what exactly happens here.

Thanks in advance

stuartarchibald · January 26, 2021, 1:51pm

Hi @burakmartin,

Perhaps have a read of the related developer docs here and here and see if that answers your question. I’m pretty sure that the description is in the latter link (former is for context)!

Hope this helps?

burakmartin · January 26, 2021, 2:25pm

Hey @stuartarchibald ,

Yes that helped! For anyone coming across the same question, this is the answer:

Parallel reductions are not natively provided by GUFuncs but the parfor lowering strategy allows us to use GUFuncs in a way that reductions can be performed in parallel. To accomplish this, for each reduction variable computed by a parfor, the parallel GUFunc and the code that calls it are modified to make the scalar reduction variable into an array of reduction variables whose length is equal to the number of Numba threads. In addition, the GUFunc still contains a scalar version of the reduction variable that is updated by the parfor body during each iteration. One time at the end of the GUFunc this local reduction variable is copied into the reduction array. In this way, false sharing of the reduction array is prevented. Code is also inserted into the main function after the parallel GUFunc has returned that does a reduction across this smaller reduction array and this final reduction value is then stored into the original scalar reduction variable.

Which, in my own words, means: Numba turns my reduction variable “res” from the example above into an array res = [0,0,0,…] with one slot per Numba thread. During the loop, each thread does not touch that array on every iteration. Instead, it accumulates into its own local scalar copy of the reduction variable and writes that partial result into its slot of the array only once, after finishing its share of the iterations. After the parallel part has returned, the values in this small array are reduced, and the result is stored into the real reduction variable.
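To make the described strategy concrete, here is a pure-Python emulation of it. This is a hypothetical sketch, not Numba's generated code: it does not use Numba at all, the helper name `parallel_sum_sketch` and the static chunking of iterations are assumptions, and Python threads do not actually run this arithmetic in parallel because of the GIL. It only mirrors the data flow: a reduction array with one slot per thread, a thread-local scalar updated on every iteration, a single write into the array per thread, and a final reduction over the small array once all workers are done.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_sum_sketch(x, num_threads=4):
    """Hypothetical emulation of the parfor reduction strategy (no Numba)."""
    n = len(x)
    # Reduction array: one slot per thread.
    partials = [0] * num_threads

    def worker(tid):
        # Assumed static chunking of the iteration space across threads.
        start = tid * n // num_threads
        stop = (tid + 1) * n // num_threads
        local = 0  # thread-local scalar, updated on every iteration
        for i in range(start, stop):
            local += x[i]
        # Copied into the shared reduction array exactly once, at the end.
        partials[tid] = local

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # list() waits for all workers and re-raises any exception.
        list(pool.map(worker, range(num_threads)))

    # Final reduction over the small array, after all workers returned.
    res = 0
    for p in partials:
        res += p
    return res


print(parallel_sum_sketch(list(range(10000))))  # 49995000
```

Because each thread only writes its own slot, and only once, no two threads ever perform a read-modify-write on the same shared value. That is why the `+=` loop avoids the race condition the question was worried about.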
