Hi @gmarkall, this is great. I hadn't considered defining the `cuda.jit`-decorated function within a `gen_kernel` function itself. (Edited to avoid the shadowing issue mentioned by @gmarkall.)

Slightly modifying this seems to work as well:
```python
import numpy as np
from numba import cuda, types

arr1 = np.zeros(2)
arr2 = np.zeros(3)

def gen_kernel(x):
    local_size = len(x)

    @cuda.jit
    def f():
        x = cuda.local.array(local_size, types.int32)
        print("Length of local array is", len(x))

    return f

gen_kernel(arr1)[1, 1]()
gen_kernel(arr2)[1, 1]()
cuda.synchronize()
```
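A side note on this pattern (my own addition, not part of gmarkall's suggestion): each call to `gen_kernel` triggers a fresh kernel compilation, even for a size that was already seen, so keying a cache on the size avoids redundant compiles. A plain-Python sketch of the idea, with a list standing in for `cuda.local.array` so it runs without a GPU:

```python
from functools import lru_cache

# Sketch only: a plain list stands in for cuda.local.array, so this runs
# anywhere; the point is the caching pattern, not the kernel body.
@lru_cache(maxsize=None)
def gen_kernel(local_size):
    # local_size is baked into the closure, as in the CUDA version.
    def f():
        x = [0] * local_size  # stand-in for cuda.local.array(local_size, ...)
        return len(x)

    return f

gen_kernel(2)()  # size 2
gen_kernel(3)()  # size 3
# A repeated size reuses the cached (already-"compiled") kernel:
assert gen_kernel(2) is gen_kernel(2)
```

Keying the cache on the integer size (rather than on the array itself) keeps the cache key hashable; the caller would pass `len(x)` instead of `x`.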
If I want to be able to reuse `f`'s logic in other contexts, I can also do the following with a device function:
```python
import numpy as np
from numba import cuda, types, jit

arr1 = np.zeros(2)
arr2 = np.zeros(3)

@jit
def f2(local_size):
    x = cuda.local.array(local_size, types.int32)
    print("Length of local array is", len(x))

def gen_kernel(x):
    local_size = len(x)

    @cuda.jit
    def f():
        return f2(local_size)

    return f

gen_kernel(arr1)[1, 1]()
gen_kernel(arr2)[1, 1]()
cuda.synchronize()
```
It seems to break down if I don't pass `local_size` in explicitly as a parameter to the device function:
```python
import numpy as np
from numba import cuda, types, jit

arr1 = np.zeros(2)
arr2 = np.zeros(3)

@jit
def f2(arr):
    local_size = len(arr)
    x = cuda.local.array(local_size, types.int32)
    print("Length of local array is", len(x))

def gen_kernel(x):
    local_size = len(x)

    @cuda.jit
    def f():
        return f2(x)

    return f

gen_kernel(arr1)[1, 1]()
gen_kernel(arr2)[1, 1]()
cuda.synchronize()
```
```
Failed in cuda mode pipeline (step: nopython frontend)
Internal error at <numba.core.typeinfer.CallConstraint object at 0x000001A7FD29EEE0>.
module, class, method, function, traceback, frame, or code object was expected, got CPUDispatcher
During: resolving callee type: Function(<numba.cuda.compiler.DeviceDispatcher object at 0x000001A7FD29E7F0>)
During: typing of call at c:\Users\sterg\Documents\GitHub\scratch\discover-size-check\mat_discover\ElM2D\local_array3.py (20)

Enable logging at debug level for details.

File "mat_discover\ElM2D\local_array3.py", line 20:
    def f():
        return f2(x)
        ^

File "C:\Users\sterg\Documents\GitHub\scratch\discover-size-check\mat_discover\ElM2D\local_array3.py", line 27, in <module> (Current frame)
    gen_kernel(x)[1, 1]()
```
It's not immediately obvious why this last one doesn't work. (My unverified guess: `cuda.local.array` needs its size as a compile-time constant, and the closure-frozen `local_size` is a constant to the compiler, whereas `len(arr)` computed inside `f2` is only a run-time value.) So the workaround would seem to be that `local_size` needs to be passed in explicitly as a parameter:
```python
import numpy as np
from numba import cuda, types, jit

arr1 = np.zeros(2)
arr2 = np.zeros(3)

@jit(debug=True)
def f2(arr, local_size):
    x = cuda.local.array(local_size, types.int32)
    print("Length of local array is", len(x))

def gen_kernel(x):
    local_size = len(x)

    @cuda.jit(debug=True)
    def f():
        return f2(x, local_size)

    return f

gen_kernel(arr1)[1, 1]()
gen_kernel(arr2)[1, 1]()
cuda.synchronize()
```
A minimal sacrifice though.