Understanding memory usage of a kernel

Hi

I am trying to understand how to measure memory usage in Numba.

I am allocating an array of 10^9 float32 numbers, which I expect to take up about 4 GB of device memory.

However, when I try to measure the allocation, the reported usage is apparently much smaller:

MemoryInfo(free=6140526592, total=6241124352)
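
For context, here is a quick sanity check on those numbers (just arithmetic on the values above):

expected_bytes = 10**9 * 4            # 10^9 float32 values at 4 bytes each = 4 GB
used_bytes = 6241124352 - 6140526592  # total - free, roughly 100 MB actually in use

So the device reports only about 100 MB in use, nowhere near the expected 4 GB.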

Here is the code I use:

from numba import cuda
import numpy as np
import math
import time

@cuda.jit
def cuda_run(arr):
	thread = cuda.grid(1)
	if thread < arr.size:  # guard: the grid is rounded up beyond the array length
		arr[thread] = 1.0

tot_calc = 10**9
arr = np.zeros(tot_calc, dtype='float32')

print(cuda.select_device(0))

threadsperblock = 128
blockspergrid = math.ceil(tot_calc / threadsperblock)
start = time.time()
cuda_run[blockspergrid, threadsperblock](arr)
end = time.time()

print(cuda.current_context().get_memory_info())  # after the kernel call

del cuda_run  # drop the Python reference to the kernel
print(cuda.current_context().get_memory_info())

cuda.current_context().reset()  # clean up resources owned by the context
print(cuda.current_context().get_memory_info())

cuda.driver.driver.reset()  # reset the driver state
print(cuda.current_context().get_memory_info())

Is this the expected behavior? Is my expectation of a 4 GB allocation wrong, or is the way I measure it wrong?

Thanks and any help is appreciated!

OK, I think I managed to understand this myself. Apparently, when a kernel is launched with a host array, Numba transfers the array to the device behind the scenes and cleans up the device copy afterwards, so the numbers above do not reflect the allocation while the kernel is running. Explicitly transferring the array to the device with cuda.to_device gave the numbers I was expecting.
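
For anyone who hits the same thing, here is a minimal sketch of the measurement that worked, assuming a device with enough free memory for the 4 GB array (the names before, after and d_arr are just for illustration):

from numba import cuda
import numpy as np

arr = np.zeros(10**9, dtype='float32')

before = cuda.current_context().get_memory_info()
d_arr = cuda.to_device(arr)  # explicit host-to-device transfer
after = cuda.current_context().get_memory_info()

# the drop in free memory should be close to arr.nbytes (about 4 GB)
print((before.free - after.free) / 1e9, 'GB allocated on the device')

With the explicit transfer, the free memory drops by roughly arr.nbytes, which matches the expected allocation.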

Still, any confirmation here would be very appreciated! Thanks!

Your explanation sounds correct - thanks for following up with the answer!