Parallelized code yields different results, what should I be careful about?

Hello everybody,

I have code I got from a PhD student that I am trying to parallelize using Numba. The result of the parallelized version is different from the non-parallel one (both in nopython and object mode). As I am just starting with Numba, what should I check to understand why it is failing? I have already checked for a race condition in the loop, but there doesn't seem to be one. Is there any checklist of "what could go wrong" during parallelization?
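
To give an idea of the kind of race I was looking for, here is a small made-up sketch (not my actual code) of the prange pattern the Numba docs warn about, next to a supported scalar reduction:

```python
import numpy as np
from numba import njit, prange

# Made-up example, not the real code: accumulating into the same array
# elements from different parallel iterations is a race condition, so the
# parallel result will not match the sequential one.
@njit(parallel=True)
def prange_race(x):
    y = np.zeros(4)
    for i in prange(x.shape[0]):
        y[:] += x[i]
    return y

# A plain scalar reduction, on the other hand, is a supported pattern:
# each thread keeps a private partial sum that is combined after the loop.
@njit(parallel=True)
def prange_reduction_ok(x):
    acc = 0.0
    for i in prange(x.shape[0]):
        acc += x[i]
    return acc
```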

Thanks in advance!

Hello :slight_smile:

Without an example of your code this will be very difficult to answer in general.

I am recalling an issue that was raised recently: Parallel loop prange gives wrong outcomes · Issue #6597 · numba/numba · GitHub

Could your problem be related to the bug discussed there?

Also: How different are the results? Significantly different, or more on the order of machine precision?
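
If it helps to pin that down, something rough like this (the argument names are just placeholders for the outputs of the two versions) shows whether the difference is anywhere near machine precision:

```python
import numpy as np

def compare_results(serial_result, parallel_result):
    """Crude check of how far apart the two outputs really are."""
    s = np.asarray(serial_result, dtype=np.float64)
    p = np.asarray(parallel_result, dtype=np.float64)
    abs_diff = np.abs(s - p)
    rel_diff = abs_diff / np.maximum(np.abs(s), np.finfo(np.float64).tiny)
    print("max abs diff:", abs_diff.max())
    print("max rel diff:", rel_diff.max())
    print("float64 eps :", np.finfo(np.float64).eps)

# A max relative difference within a few orders of magnitude of eps usually
# just reflects a different floating-point summation order; anything like
# 20% points at a real logic problem.
```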

Hi Hannes,

Thanks for the swift answer. So far I have not been able to reproduce the bug on a smaller scale, and the original code is quite long. I'll post it if I can get it down to something reasonably short.

We are talking about a 20% difference in the expected values, give or take.

Hi Hans,

What you describe in your post is a known problem in parallel programming: parallel programs are non-deterministic because of their unordered execution (two runs of the same parallel program, on the same input data, may give different results because of the non-sequential order in which their statements execute). To overcome this problem, the research group I am a member of, at our FlexComp Lab, developed what we call "Flexible Algorithms" (FAs) or "Flexible Computation" (FC). This approach to parallel computation ensures deterministic results, independent of the order of execution.

For more details about our work, see our website (www.flexcomp.jct.ac.il), where you will find all our publications and also two versions of the pre-compiler of our Embedded Flexible Language (EFL), which allows you to write your parallel programs according to the FC approach. Now that I know about Numba, it would be a good idea to have a Numba version of the EFL pre-compiler.
I hope all of the above helps you solve your problem. I am ready to collaborate with you, too.

All the best.
Moshe Goldstein

This sounds like it could be a Numba bug. Is it entirely and randomly wrong (on average out by X with a large deviation), systematically wrong (all out by about X with a small deviation), or is just part of it wrong (most values are OK but a subsection is massively out in some way)?
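
Something rough like the following (serial_result and parallel_result are placeholders for the outputs of the two versions) can help tell those three cases apart:

```python
import numpy as np

def classify_error(serial_result, parallel_result, rel_tol=1e-6):
    """Rough look at *how* the parallel output is wrong (placeholder inputs)."""
    s = np.asarray(serial_result, dtype=np.float64)
    p = np.asarray(parallel_result, dtype=np.float64)
    err = p - s
    rel = np.abs(err) / np.maximum(np.abs(s), np.finfo(np.float64).tiny)
    print("fraction of elements off by more than rel_tol:", np.mean(rel > rel_tol))
    print("mean of error:", err.mean(), "  std of error:", err.std())
    # roughly zero mean, large std       -> looks random
    # mean around X, small std           -> systematic
    # small 'off' fraction, huge errors  -> a localized subsection is wrong
```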

Just to try and narrow it down, you could…

  • See what happens if you turn on bounds checking; you can set it globally with the NUMBA_BOUNDSCHECK environment variable.
  • Check that the nopython-compiled version produces the same result as with no jit decorator at all (you can disable the JIT globally with the NUMBA_DISABLE_JIT environment variable).
  • Try changing to @njit(parallel=<dict>), where <dict> is key-value pairs for switching the various parts of the parallel transform on and off; the valid keys are listed in the Numba architecture documentation and the values are booleans (see the sketch after this list).
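
To make those concrete, here is a rough sketch; compute_py is a stand-in for your real function, and the exact set of valid parallel= keys should be checked against the docs for the Numba version you are using:

```python
import numpy as np
from numba import njit

# Stand-in for the real function being debugged.
def compute_py(a):
    return np.sin(a) * 2.0 + a

# The environment variables must be set before anything is compiled,
# e.g. on the command line:
#   NUMBA_BOUNDSCHECK=1 python run.py     # bounds checking on everywhere
#   NUMBA_DISABLE_JIT=1 python run.py     # run everything as pure Python

# Compare pure Python, serial nopython, and parallel nopython versions.
compute_serial   = njit(compute_py)
compute_parallel = njit(parallel=True)(compute_py)

# Pass a dict instead of True to switch off individual parts of the parallel
# transform (here: loop fusion); unspecified options keep their defaults.
compute_no_fusion = njit(parallel={'fusion': False})(compute_py)

a = np.linspace(0.0, 10.0, 1000)
ref = compute_py(a)
print(np.allclose(ref, compute_serial(a)))
print(np.allclose(ref, compute_parallel(a)))
print(np.allclose(ref, compute_no_fusion(a)))
```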

Finally, if you can, try splitting your code into functions/smaller parts and testing those individually.

The only way I could think of a 20% difference occurring as a result of machine precision issues would be if there are places in your function where effects like catastrophic cancellation occur. With the unordered execution in parallel, small errors could have big consequences for the final result. Just to throw another idea into the room :stuck_out_tongue:
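
A tiny standalone example of what I mean: floating-point addition is not associative, and once large cancellations are involved, the summation order (which is exactly what changes under a parallel reduction) can move the result by far more than machine epsilon.

```python
import numpy as np

# Floating-point addition is not associative: with large cancellations the
# grouping of the operations decides the answer.
a, b, c = 1e16, -1e16, 1.0
print((a + b) + c)   # 1.0  (the big terms cancel first, the 1.0 survives)
print(a + (b + c))   # 0.0  (the 1.0 is absorbed into -1e16 and lost)

# The same effect on array data whose exact mathematical sum is zero: two
# different summation orders generally leave different rounding residue.
rng = np.random.default_rng(0)
x = rng.normal(size=100_000) * 1e8
x = np.concatenate([x, -x])
rng.shuffle(x)                   # exact sum is still zero, order is arbitrary
print(np.sum(x))                 # pairwise summation order
print(float(sum(x.tolist())))    # plain left-to-right order
```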