@cuda.jit
def velocity_verlet(aa, dt_gpu, L_gpu, r_gpu, dr_gpu, a_gpu, v_gpu):
start_x = cuda.grid(1)
stride_x = cuda.gridsize(1)
tid = start_x
for i1 in range(start_x, r_gpu.shape[0], stride_x):
for i2 in range(i1+1, r_gpu.shape[0]):
for k in range(0,3): #Periodic boundary conditions
dr_gpu[k] = r_gpu[i1][k] - r_gpu[i2][k]
if dr_gpu[k] > L_gpu[k]/2:
dr_gpu[k] = dr_gpu[k] - L_gpu[k]
elif dr_gpu[k] < -L_gpu[k]/2:
dr_gpu[k] = dr_gpu[k] + L_gpu[k]
#LJ_potential_gpu(dr_gpu, aa)
#a_gpu[i1] = + aa # from i2 on i1
#a_gpu[i2] = - aa # from i1 on i2
for k in range(3):
v_gpu[i1][k] += a_gpu[i1][k] * dt_gpu/2;
The above is a cuda function and all of the inputs and outputs are device arrays defined as follows:
dt = 0.001
N = 400
r = np.zeros((N,3))
v = np.zeros((N,3))
dt_gpu = cuda.to_device(np.float32(dt))
L_gpu = cuda.to_device(np.float32(np.array([5, 5, 5])))
r_gpu = cuda.to_device(np.float32(r))
a_gpu = cuda.device_array(v.shape, dtype=np.float32) # Store calculated accelerations
v_gpu = cuda.device_array(v.shape, dtype=np.float32) # Store calculated speeds
dr_gpu = cuda.device_array(3, dtype=np.float32) # Store calculated dr
aa = cuda.device_array(3, dtype=np.float32)
velocity_verlet[128, 128](aa,dt_gpu, L_gpu, r_gpu, dr_gpu, a_gpu, v_gpu)
I however get an error related to the last line of my function where numba has trouble multiplying the scaler dt_gpu to the elements of a_gpu which it preceives as 0D float arrays (array(float32, 0d, C)) as per the following error. I am pretty sure this is a very simple issue and I’m a noob as I could not find anything on it online. Can someone tell me how to fix this?
TypingError: Failed in cuda mode pipeline (step: nopython frontend)
No implementation of function Function(<built-in function mul>) found for signature:
>>> mul(float32, array(float32, 0d, C))
There are 8 candidate implementations:
- Of which 2 did not match due to:
Operator Overload in function 'mul': File: unknown: Line unknown.
With argument(s): '(float32, array(float32, 0d, C))':
No match for registered cases:
* (int64, int64) -> int64
* (int64, uint64) -> int64
* (uint64, int64) -> int64
* (uint64, uint64) -> uint64
* (float32, float32) -> float32
* (float64, float64) -> float64
* (complex64, complex64) -> complex64
* (complex128, complex128) -> complex128
- Of which 6 did not match due to:
Overload of function 'mul': File: <numerous>: Line N/A.
With argument(s): '(float32, array(float32, 0d, C))':
No match.