CUDA Atomic Operations On Multiple Values

I can’t think of a way to do this without a lock: you would compute and update both P and I inside the critical section.

You can either write a lock with CAS or, as you suggested:

Use .view to reinterpret the 32-bit float as a uint32. Widen it to uint64 and bitwise-join the two numbers. Then use CAS to update a single uint64 slot that holds both P and I.
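To make the packed approach concrete, here is a minimal sketch in CUDA C++ rather than Python. It rests on several assumptions that are not in the thread: P is a 32-bit float stored in the high half of the word, I is a 32-bit unsigned integer in the low half, and the update rule is "keep the pair with the larger P". The names pack_pi, unpack_pi, atomic_update_pi, reduce_kernel and argmax_demo are hypothetical; swap in your own update rule where marked.

```cuda
#include <cmath>
#include <cstdint>
#include <cstring>
#include <cuda_runtime.h>

// Pack P (float, high 32 bits) and I (uint32, low 32 bits) into one 64-bit word.
// This layout is an assumption made for illustration.
__host__ __device__ inline unsigned long long pack_pi(float p, uint32_t i) {
    uint32_t p_bits;
    memcpy(&p_bits, &p, sizeof(p_bits));  // reinterpret the bits, no numeric conversion
    return (static_cast<unsigned long long>(p_bits) << 32) | i;
}

// Split a packed word back into P and I.
__host__ __device__ inline void unpack_pi(unsigned long long w, float* p, uint32_t* i) {
    uint32_t p_bits = static_cast<uint32_t>(w >> 32);
    memcpy(p, &p_bits, sizeof(*p));
    *i = static_cast<uint32_t>(w & 0xFFFFFFFFull);
}

// CAS retry loop: P and I change together or not at all.
__device__ void atomic_update_pi(unsigned long long* slot, float p, uint32_t i) {
    unsigned long long observed = *slot;  // initial guess, validated by atomicCAS below
    while (true) {
        float cur_p;
        uint32_t cur_i;
        unpack_pi(observed, &cur_p, &cur_i);
        // Hypothetical update rule: only replace the pair when the new P is larger.
        if (!(p > cur_p)) return;
        unsigned long long prev = atomicCAS(slot, observed, pack_pi(p, i));
        if (prev == observed) return;     // our swap went through
        observed = prev;                  // another thread updated the slot; retry
    }
}

// Each thread proposes (vals[idx], idx) as a candidate (P, I) pair.
__global__ void reduce_kernel(const float* vals, int n, unsigned long long* slot) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) atomic_update_pi(slot, vals[idx], static_cast<uint32_t>(idx));
}

// Host-side demo; error checking is omitted to keep the sketch short.
uint32_t argmax_demo(const float* host_vals, int n, float* out_p) {
    float* d_vals = nullptr;
    unsigned long long* d_slot = nullptr;
    cudaMalloc(&d_vals, n * sizeof(float));
    cudaMalloc(&d_slot, sizeof(unsigned long long));
    cudaMemcpy(d_vals, host_vals, n * sizeof(float), cudaMemcpyHostToDevice);

    unsigned long long init = pack_pi(-INFINITY, 0u);  // any finite P beats this
    cudaMemcpy(d_slot, &init, sizeof(init), cudaMemcpyHostToDevice);

    reduce_kernel<<<(n + 255) / 256, 256>>>(d_vals, n, d_slot);

    unsigned long long result = 0;
    cudaMemcpy(&result, d_slot, sizeof(result), cudaMemcpyDeviceToHost);
    cudaFree(d_vals);
    cudaFree(d_slot);

    uint32_t i = 0;
    unpack_pi(result, out_p, &i);
    return i;
}
```

This only works because P and I together fit in 64 bits, so one atomicCAS covers both. If the pair does not fit in a single word, you are back to the lock.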
