Why does numba.cuda give different results from a CPU for loop?

BretHart · March 22, 2024, 3:14pm

I’ve been working on CT reconstruction recently, which amounts to solving a linear system Ax=b. The weight matrix A has to be computed first, and because A is very large, it needs to be computed on the GPU.

I use two different methods to compute A, and their results are identical when run as CPU for loops. But when I replace the for loops with parallel numba.cuda kernels, the two GPU results differ from each other, and neither matches the CPU results.

So I’m asking for help here: how can I rewrite my code so that the GPU results are correct?

I use the following operators and functions in the CUDA kernel functions:

math.ceil()
int()
< 
!=
* / + -
**
math.cos(), math.sin()
max(), min()

In addition, I have two questions:

Why is the result of math.ceil() an int on the CPU, but a float in numba.cuda?
The CPU computation has precision problems (see below), so why does the CPU still get the correct result in the end?

-1.3 - 0.1 = -1.4000000000000001
1.3 + 0.1 = 1.4000000000000001
math.floor(-2.7755575615628914e-14) = -1

Should I use the round function on the GPU to truncate the result?
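To make the two questions concrete, here is a minimal CPU-only sketch in plain Python. It reproduces the values listed above and shows one possible way to guard floor and ceil against round-off. The helper names `snapped_floor` and `ceil_as_int` and the tolerance `EPS` are hypothetical, chosen only for illustration. They are not taken from my kernels, and a suitable tolerance depends on the geometry being computed.

```python
import math

# The values from the post: binary floating point cannot represent 0.1 or 1.3
# exactly, so sums pick up a tiny error.
print(-1.3 - 0.1)                            # -1.4000000000000001
print(1.3 + 0.1)                             # 1.4000000000000001
print(math.floor(-2.7755575615628914e-14))   # -1

EPS = 1e-9  # hypothetical tolerance; tune it to the scale of your coordinates


def snapped_floor(x, eps=EPS):
    """Floor x, but first snap values within eps of an integer onto that integer."""
    nearest = round(x)  # round() on a float returns an int in Python 3
    if abs(x - nearest) < eps:
        return int(nearest)
    return math.floor(x)


def ceil_as_int(x):
    """Wrap math.ceil in int() so the result is an int whatever ceil returns."""
    return int(math.ceil(x))


print(snapped_floor(-2.7755575615628914e-14))  # 0
print(ceil_as_int(2.3))                        # 3
```

Wrapping the result in int() makes the type explicit, so the same expression can be used on both sides without depending on what math.ceil returns there. Snapping with a tolerance is only one option; it hides round-off noise near integers but does not change values that are genuinely fractional.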

BretHart · March 27, 2024, 11:29pm

It was my mistake. I forgot to use atomic.add instead of + in the back projection.
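For anyone hitting the same problem, here is a rough CPU-only simulation of why a plain + loses updates when several threads write to the same pixel. It does not use a GPU or Numba, and the function names and contribution values are made up for illustration. It models the worst case, where every thread reads the pixel before any thread writes back; on a real GPU the interleaving varies from run to run. In a Numba kernel the corresponding fix is `cuda.atomic.add(array, index, value)`.

```python
# Hypothetical setup: four "threads" each add one ray's contribution
# to the same image pixel during back projection.
contributions = [0.5, 1.5, 2.0, 4.0]


def plain_add(pixel, contributions):
    """Model `image[i] += value` run by threads that overlap in time."""
    # Every thread reads the pixel before any thread has written back.
    stale_reads = [pixel for _ in contributions]
    # Each thread writes (stale value + its contribution); the last write wins.
    for stale, value in zip(stale_reads, contributions):
        pixel = stale + value
    return pixel


def atomic_add(pixel, contributions):
    """Model an atomic add: each read-modify-write finishes before the next starts."""
    for value in contributions:
        pixel = pixel + value
    return pixel


racy_result = plain_add(0.0, contributions)
atomic_result = atomic_add(0.0, contributions)
print(racy_result)    # 4.0, only the last thread's contribution survives
print(atomic_result)  # 8.0, all contributions are accumulated
```

Even with atomic adds, the order in which threads accumulate floating-point values is not fixed, so a GPU result may still differ from the CPU loop in the last few bits.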
