I want to write GPU-based simulation code for use in deep learning, so I am looking into Numba's CUDA support (numba.cuda).
The code below is a simple element-wise addition example that I wrote to check Numba's parallel computation performance.
Here is the CUDA code that I ran:
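import math
import time

import numpy as np
from numba import cuda

# Grid dimensions; (400, 400, 100) corresponds to result 1.
N_x, N_y, N_z = 400, 400, 100
grid_size = (N_x, N_y, N_z)
n_iter = 10  # number of timed launches (placeholder; the value was not given in the original post)

# Host arrays (float64) and their device copies
a = np.random.random(grid_size)
b = np.random.random(grid_size)
a_g = cuda.to_device(a)
b_g = cuda.to_device(b)
c_g = cuda.device_array_like(a)

@cuda.jit
def f(a, b, c):
    # One thread per element; the bounds check guards partially filled blocks
    xid, yid, zid = cuda.grid(3)
    size = a.shape
    if xid < size[0] and yid < size[1] and zid < size[2]:
        c[xid, yid, zid] = a[xid, yid, zid] + b[xid, yid, zid]

threads_per_block = (16, 16, 4)
blockspergrid_x = math.ceil(grid_size[0] / threads_per_block[0])
blockspergrid_y = math.ceil(grid_size[1] / threads_per_block[1])
blockspergrid_z = math.ceil(grid_size[2] / threads_per_block[2])
blocks_per_grid = (blockspergrid_x, blockspergrid_y, blockspergrid_z)
print(f"CUDA threads: {threads_per_block}, blocks: {blocks_per_grid}, grid_size: {grid_size}")

for i in range(n_iter):
    # Note: the first launch also includes Numba's JIT compilation time
    start_time_cuda = time.perf_counter()
    f[blocks_per_grid, threads_per_block](a_g, b_g, c_g)
    cuda.synchronize()  # wait for the kernel to finish before stopping the timer
    end_time_cuda = time.perf_counter()
    cuda_execution_time = end_time_cuda - start_time_cuda
    print(f"CUDA Execution Time: {cuda_execution_time:.6f} seconds")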
These are the results:
result 1
Used memory[GB]: 4.802609152
grid_size: (400, 400, 100), CUDA threads: (16, 16, 4), blocks: (25, 25, 25)
CUDA Execution Time: 0.004559 seconds
result 2
Used memory[GB]: 7.877230592
grid_size: (800, 800, 200), CUDA threads: (16, 16, 4), blocks: (50, 50, 50)
CUDA Execution Time: 0.038483 seconds
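To put these two timings side by side, here is a rough back-of-the-envelope check of how the time scales with the number of elements and what memory throughput it implies. It is only a sketch: it assumes float64 arrays (what np.random.random returns) and counts two reads and one write per element, and the helper name effective_bandwidth_gbs is just something I made up for illustration.

```python
def effective_bandwidth_gbs(grid_size, seconds, bytes_per_element=8, arrays_touched=3):
    """Estimate throughput in GB/s for an element-wise c = a + b kernel.

    Assumes float64 data (8 bytes) and three arrays touched per element
    (read a, read b, write c). Both defaults are assumptions.
    """
    n_elements = grid_size[0] * grid_size[1] * grid_size[2]
    total_bytes = n_elements * bytes_per_element * arrays_touched
    return total_bytes / seconds / 1e9


# Grid sizes and timings copied from result 1 and result 2
grid_1, time_1 = (400, 400, 100), 0.004559
grid_2, time_2 = (800, 800, 200), 0.038483

bw_1 = effective_bandwidth_gbs(grid_1, time_1)
bw_2 = effective_bandwidth_gbs(grid_2, time_2)

# How much larger the second problem is, and how much longer it took
size_ratio = (grid_2[0] * grid_2[1] * grid_2[2]) / (grid_1[0] * grid_1[1] * grid_1[2])
time_ratio = time_2 / time_1

print(f"result 1: {bw_1:.1f} GB/s")
print(f"result 2: {bw_2:.1f} GB/s")
print(f"elements: x{size_ratio:.1f}, time: x{time_ratio:.2f}")
```

Under those assumptions, the second run has 8 times as many elements and takes about 8.4 times as long, and the implied throughput is around 80 GB/s in both cases.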
The GPU execution time increases roughly in proportion to the matrix size. Is it reasonable to conclude that this is caused by the relatively small number of CUDA cores on the GPU I am running on?
I used a TITAN RTX (about 4,600 CUDA cores). Would an RTX 3090, which has more CUDA cores, fix this issue?
Or should I split the matrix into parts and compute them separately on multiple TITAN RTX cards?
Personally, I find that computing the matrix takes much longer than I expected for building a grid-based simulation solver.