Dynamically adjust size of cuda.local.array without environment variables

Hi @gmarkall, this is great. I hadn’t considered defining the cuda.jit decorated function within a gen_kernel function itself.

(edited to avoid the shadowing issue mentioned by gmarkall)

Slightly modifying this seems to work as well:

import numpy as np
from numba import cuda, types

arr1 = np.zeros(2)
arr2 = np.zeros(3)

def gen_kernel(x):
    local_size = len(x)

    @cuda.jit
    def f():
        x = cuda.local.array(local_size, types.int32)
        print("Length of local array is", len(x))

    return f

gen_kernel(arr1)[1, 1]()
gen_kernel(arr2)[1, 1]()
cuda.synchronize()

If I want to be able to use f in other contexts, I can also do the following with a device function:

import numpy as np
from numba import cuda, types, jit

arr1 = np.zeros(2)
arr2 = np.zeros(3)

@jit
def f2(local_size):
    x = cuda.local.array(local_size, types.int32)
    print("Length of local array is", len(x))

def gen_kernel(x):
    local_size = len(x)

    @cuda.jit
    def f():
        return f2(local_size)
    return f

gen_kernel(arr1)[1, 1]()
gen_kernel(arr2)[1, 1]()
cuda.synchronize()

It seems to break down if I don’t explicitly pass local_size as a parameter to the device function:

import numpy as np
from numba import cuda, types, jit

arr1 = np.zeros(2)
arr2 = np.zeros(3)

@jit
def f2(arr):
    local_size = len(arr)
    x = cuda.local.array(local_size, types.int32)
    print("Length of local array is", len(x))

def gen_kernel(x):
    local_size = len(x)

    @cuda.jit
    def f():
        return f2(x)
    return f

gen_kernel(arr1)[1, 1]()
gen_kernel(arr2)[1, 1]()
cuda.synchronize()
Failed in cuda mode pipeline (step: nopython frontend)
Internal error at <numba.core.typeinfer.CallConstraint object at 0x000001A7FD29EEE0>.
module, class, method, function, traceback, frame, or code object was expected, got CPUDispatcher
During: resolving callee type: Function(<numba.cuda.compiler.DeviceDispatcher object at 0x000001A7FD29E7F0>)
During: typing of call at c:\Users\sterg\Documents\GitHub\scratch\discover-size-check\mat_discover\ElM2D\local_array3.py (20)

Enable logging at debug level for details.

File "mat_discover\ElM2D\local_array3.py", line 20:
    def f():
        return f2(x)
        ^
  File "C:\Users\sterg\Documents\GitHub\scratch\discover-size-check\mat_discover\ElM2D\local_array3.py", line 27, in <module> (Current frame)
    gen_kernel(x)[1, 1]()

It’s not immediately obvious why this last one doesn’t work (my guess is that cuda.local.array needs a size that is a compile-time constant, and len(arr) computed inside the device function isn’t one), so the workaround would seem to be that local_size needs to be passed in explicitly as a parameter:

import numpy as np
from numba import cuda, types, jit

arr1 = np.zeros(2)
arr2 = np.zeros(3)

@jit(debug=True)
def f2(arr, local_size):
    x = cuda.local.array(local_size, types.int32)
    print("Length of local array is", len(x))

def gen_kernel(x):
    local_size = len(x)

    @cuda.jit(debug=True)
    def f():
        return f2(x, local_size)
    return f

gen_kernel(arr1)[1, 1]()
gen_kernel(arr2)[1, 1]()
cuda.synchronize()

A minimal sacrifice though.
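One side note (an assumption on my part, not something from the examples above): since every call to gen_kernel builds and compiles a fresh kernel, it might be worth caching the generated kernels by size so that repeated calls for same-length arrays reuse the already-compiled version. A plain-Python sketch of the idea, with a stand-in closure in place of the @cuda.jit-decorated kernel:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def gen_kernel_cached(local_size):
    # Stand-in for the @cuda.jit-decorated kernel: in the real version,
    # compilation happens in this function body, so caching by size
    # avoids recompiling on every call.
    def f():
        # The real kernel would do:
        #   x = cuda.local.array(local_size, types.int32)
        return local_size
    return f

# Keyed on the integer size (hashable) rather than the array itself,
# e.g. gen_kernel_cached(len(arr1)) instead of gen_kernel(arr1).
k2 = gen_kernel_cached(2)
assert gen_kernel_cached(2) is k2       # same kernel object reused
assert gen_kernel_cached(3) is not k2   # new size, new kernel
```

In the real code that would look like gen_kernel_cached(len(x))[1, 1](), so only one kernel is compiled per distinct local array size.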