This, amazingly, works:
from numba import cuda
import numpy as np

@cuda.jit
def kernel(arr):
    idx = cuda.grid(1)  # thread index (unused here)
    print(len(arr))

n = 100
arr = cuda.to_device(np.arange(n))
threadsperblock = 32
blockspergrid = (n + threadsperblock - 1) // threadsperblock
kernel[blockspergrid, threadsperblock](arr)
It prints 100 (once for each launched thread). How is this possible? Internally, how is the shape of the array arr passed into the kernel?
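One plausible mechanism, which I have not confirmed for every Numba version, is that the array is not passed as a single pointer. Instead, each array argument might be expanded at launch time into several scalar kernel parameters, following Numba's array data model: a meminfo pointer, a parent pointer, the element count, the item size, the data pointer, and then one shape entry and one stride entry per dimension. The helper flatten_array_arg below is hypothetical. It only mimics that assumed flattening on the host for a plain NumPy array, to show where a value like len(arr) could come from.

```python
import numpy as np

def flatten_array_arg(a):
    """Hypothetical helper: expand one NumPy array into the flat list of
    scalar values a kernel could receive in place of that array.

    Assumed layout (after Numba's array data model):
    meminfo, parent, nitems, itemsize, data pointer, shape..., strides...
    """
    meminfo = 0            # placeholder: no reference-counting info on the device
    parent = 0             # placeholder: no parent Python object on the device
    nitems = a.size        # total number of elements
    itemsize = a.itemsize  # bytes per element
    data = a.ctypes.data   # address of the first element
    return [meminfo, parent, nitems, itemsize, data, *a.shape, *a.strides]

arr = np.arange(100, dtype=np.int64)
args = flatten_array_arg(arr)
# For a 1-D array, len(arr) inside a kernel would map to args[5],
# the first (and only) shape entry.
print(args[2], args[5])  # 100 100
```

If this assumption holds, len(arr) never has to read device memory: the compiled kernel would simply return a scalar parameter that was filled in on the host when the kernel was launched.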