I found that kernel functions also accept np.array. Is it recommended to use np.array as input to reduce the lines of code, or even to get a performance gain?
No, the opposite - it’s recommended to use device arrays so that you don’t force implicit data transfers and synchronization with the device.
If you do pass NumPy arrays, Numba emits a warning:
NumbaPerformanceWarning: Host array used in CUDA kernel will incur copy overhead to/from device.
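To make that concrete, here is a minimal sketch of the explicit-transfer pattern. It assumes a CUDA-capable GPU and a working Numba install. The names `blocks_for`, `add_one_on_device` and `add_one` are made up for illustration, and the block size of 128 is an arbitrary choice.

```python
def blocks_for(n, threads_per_block):
    # Ceiling division: enough blocks to cover all n elements.
    return (n + threads_per_block - 1) // threads_per_block


def add_one_on_device(host_arr, threads_per_block=128):
    # numba is imported here so this module can still be loaded on a
    # machine without a GPU; in real code a top-level import is more usual.
    from numba import cuda

    # Illustrative kernel. Defining it inside the function keeps the sketch
    # self-contained, but it is compiled again on every call; in real code
    # you would define it once at module level.
    @cuda.jit
    def add_one(arr):
        i = cuda.grid(1)
        if i < arr.shape[0]:  # guard threads past the end of the array
            arr[i] += 1.0

    # Explicit host -> device copy; the kernel then works on device memory.
    d_arr = cuda.to_device(host_arr)

    blocks = blocks_for(host_arr.shape[0], threads_per_block)
    add_one[blocks, threads_per_block](d_arr)

    # Explicit device -> host copy, done once, only when the result is needed.
    return d_arr.copy_to_host()
```

Because the kernel receives a device array, Numba has nothing to copy implicitly around the launch, so the performance warning should not appear. If you launch several kernels in a row, keep passing `d_arr` between them and call `copy_to_host()` only at the end.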
Thanks for the advice! By the way, is there any way to initialize a device array with a certain value? Right now I have to create the array with NumPy and then move the data to the GPU.