Parallelized code yields different results, what should I be careful about?

Hello everybody,

I have code I got from a PhD student that I am trying to parallelize using Numba. The result of the parallelized version is different from the non-parallel one (both in nopython and object mode). As I am just starting with Numba, what should I check to understand why it is failing? I have already checked for a race condition in the loop, but there doesn't seem to be one. Is there any checklist of "what could go wrong" during parallelization?
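
To give an idea of the kind of race I was looking for, here is a small made-up sketch (not my actual code) of the prange pattern the Numba docs warn about, next to a supported scalar reduction:

```python
import numpy as np
from numba import njit, prange

# Made-up example, not the real code: accumulating into the same array
# elements from different parallel iterations is a race condition, so the
# parallel result will not match the sequential one.
@njit(parallel=True)
def prange_race(x):
    y = np.zeros(4)
    for i in prange(x.shape[0]):
        y[:] += x[i]
    return y

# A plain scalar reduction, on the other hand, is a supported pattern:
# each thread keeps a private partial sum that is combined after the loop.
@njit(parallel=True)
def prange_reduction_ok(x):
    acc = 0.0
    for i in prange(x.shape[0]):
        acc += x[i]
    return acc
```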

Thanks in advance!

Hello :slight_smile:

Without an example of your code this will be very difficult to answer in general.

I am recalling an issue that was raised recently: Parallel loop prange gives wrong outcomes · Issue #6597 · numba/numba · GitHub

Could your problem be related to the bug discussed there?

Also: How different are the results? Significantly different, or more on the order of machine precision?
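
If it helps to pin that down, something rough like this (the argument names are just placeholders for the outputs of the two versions) shows whether the difference is anywhere near machine precision:

```python
import numpy as np

def compare_results(serial_result, parallel_result):
    """Crude check of how far apart the two outputs really are."""
    s = np.asarray(serial_result, dtype=np.float64)
    p = np.asarray(parallel_result, dtype=np.float64)
    abs_diff = np.abs(s - p)
    rel_diff = abs_diff / np.maximum(np.abs(s), np.finfo(np.float64).tiny)
    print("max abs diff:", abs_diff.max())
    print("max rel diff:", rel_diff.max())
    print("float64 eps :", np.finfo(np.float64).eps)

# A max relative difference within a few orders of magnitude of eps usually
# just reflects a different floating-point summation order; anything like
# 20% points at a real logic problem.
```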

Hi Hannes,

Thanks for the swift answer. So far I have not been able to reproduce the bug on a smaller scale, and the original code is quite long. I'll post it if I can get it down to something reasonably short.

We are talking about a 20% difference in the expected values, give or take.

Hi Hans,

What you describe in your post is a known problem in parallel programming: parallel programs are non-deterministic because of their unordered execution (two runs of the same parallel program, on the same input data, may give different results because of the non-sequential order in which their statements execute). To overcome this problem, the research group I am a member of, at our FlexComp Lab, developed what we call "Flexible Algorithms" (FAs) or "Flexible Computation" (FC). This approach to parallel computation ensures deterministic results, independent of the order of execution.

For more details about our work, see our website (www.flexcomp.jct.ac.il), where you will find all our publications and also two versions of the pre-compiler of our Embedded Flexible Language (EFL), which allows you to write your parallel programs according to the FC approach. Now that I know about Numba, it would be a good idea to have a Numba version of the EFL pre-compiler.
I hope all of the above helps you solve your problem. I am ready to collaborate with you, too.

All the best.
Moshe Goldstein

This sounds like it could be a Numba bug. Is it entirely and randomly wrong (on average out by X with a large deviation), systematically wrong (all out by about X with a small deviation), or is just part of it wrong (most values are OK but a subsection is massively out in some way)?
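
Something rough like the following (serial_result and parallel_result are placeholders for the outputs of the two versions) can help tell those three cases apart:

```python
import numpy as np

def classify_error(serial_result, parallel_result, rel_tol=1e-6):
    """Rough look at *how* the parallel output is wrong (placeholder inputs)."""
    s = np.asarray(serial_result, dtype=np.float64)
    p = np.asarray(parallel_result, dtype=np.float64)
    err = p - s
    rel = np.abs(err) / np.maximum(np.abs(s), np.finfo(np.float64).tiny)
    print("fraction of elements off by more than rel_tol:", np.mean(rel > rel_tol))
    print("mean of error:", err.mean(), "  std of error:", err.std())
    # roughly zero mean, large std       -> looks random
    # mean around X, small std           -> systematic
    # small 'off' fraction, huge errors  -> a localized subsection is wrong
```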

Just to try and narrow it down, you could…

  • See what happens if you turn on bounds checking; you can set it globally with the NUMBA_BOUNDSCHECK environment variable.
  • Check that the nopython-compiled version produces the same result as with no jit decorator at all (you can disable the JIT globally with the NUMBA_DISABLE_JIT environment variable).
  • Try changing to @njit(parallel=<dict>), where <dict> is key-value pairs for switching the various parts of the parallel transform on and off; the valid keys are listed in the Numba architecture documentation and the values are booleans (see the sketch after this list).
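
To make those concrete, here is a rough sketch; compute_py is a stand-in for your real function, and the exact set of valid parallel= keys should be checked against the docs for the Numba version you are using:

```python
import numpy as np
from numba import njit

# Stand-in for the real function being debugged.
def compute_py(a):
    return np.sin(a) * 2.0 + a

# The environment variables must be set before anything is compiled,
# e.g. on the command line:
#   NUMBA_BOUNDSCHECK=1 python run.py     # bounds checking on everywhere
#   NUMBA_DISABLE_JIT=1 python run.py     # run everything as pure Python

# Compare pure Python, serial nopython, and parallel nopython versions.
compute_serial   = njit(compute_py)
compute_parallel = njit(parallel=True)(compute_py)

# Pass a dict instead of True to switch off individual parts of the parallel
# transform (here: loop fusion); unspecified options keep their defaults.
compute_no_fusion = njit(parallel={'fusion': False})(compute_py)

a = np.linspace(0.0, 10.0, 1000)
ref = compute_py(a)
print(np.allclose(ref, compute_serial(a)))
print(np.allclose(ref, compute_parallel(a)))
print(np.allclose(ref, compute_no_fusion(a)))
```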

Finally, if you can, try splitting your code into functions/smaller parts and testing those individually.

The only way I could think of a 20% difference occurring as a result of machine precision issues would be if there are places in your function where effects like catastrophic cancellation occur. With the unordered execution in parallel, small errors could have big consequences for the final result. Just to throw another idea into the room :stuck_out_tongue:
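
A tiny standalone example of what I mean: floating-point addition is not associative, and once large cancellations are involved, the summation order (which is exactly what changes under a parallel reduction) can move the result by far more than machine epsilon.

```python
import numpy as np

# Floating-point addition is not associative: with large cancellations the
# grouping of the operations decides the answer.
a, b, c = 1e16, -1e16, 1.0
print((a + b) + c)   # 1.0  (the big terms cancel first, the 1.0 survives)
print(a + (b + c))   # 0.0  (the 1.0 is absorbed into -1e16 and lost)

# The same effect on array data whose exact mathematical sum is zero: two
# different summation orders generally leave different rounding residue.
rng = np.random.default_rng(0)
x = rng.normal(size=100_000) * 1e8
x = np.concatenate([x, -x])
rng.shuffle(x)                   # exact sum is still zero, order is arbitrary
print(np.sum(x))                 # pairwise summation order
print(float(sum(x.tolist())))    # plain left-to-right order
```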