Need help with "cannot determine Numba type"

I have a Python newbie question about a compiler error I received while trying to refactor the CUDA atomic test cases to reduce code duplication.

In https://github.com/numba/numba/blob/master/numba/cuda/tests/cudapy/test_atomics.py there are two functions that illustrate the code duplication I want to reduce:

def atomic_add(ary):
    tid = cuda.threadIdx.x
    sm = cuda.shared.array(32, uint32)
    sm[tid] = 0
    cuda.syncthreads()
    bin = ary[tid] % 32
    cuda.atomic.add(sm, bin, 1)
    cuda.syncthreads()
    ary[tid] = sm[tid]

def atomic_sub(ary):
    tid = cuda.threadIdx.x
    sm = cuda.shared.array(32, uint32)
    sm[tid] = 0
    cuda.syncthreads()
    bin = ary[tid] % 32
    cuda.atomic.sub(sm, bin, 1)
    cuda.syncthreads()
    ary[tid] = sm[tid]
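For context, each kernel builds a 32-bin histogram in shared memory: every thread zeroes a shared slot, bins its element with `ary[tid] % 32`, and atomically bumps that bin's count. A serial Python sketch of what atomic_add computes (assuming a 32-element input and a 32-thread launch; this helper is illustrative, not part of the test file):

```python
import numpy as np

def atomic_add_serial(ary):
    # Serial equivalent of the atomic_add kernel: histogram the values
    # modulo 32, then write each "thread's" bin count back into ary.
    nbins = 32
    counts = np.zeros(nbins, dtype=np.uint32)
    for v in ary:
        counts[v % nbins] += 1
    for tid in range(len(ary)):
        ary[tid] = counts[tid]
    return ary

ary = np.arange(32, dtype=np.uint32)
print(atomic_add_serial(ary))  # every bin is hit exactly once, so all ones
```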

The only difference between the functions is the call to either "cuda.atomic.add" or "cuda.atomic.sub". Recognizing this, I wrote a function that captures the commonality and accepts the atomic operation as a function argument, then rewrote "atomic_add" in terms of it:

def atomic_binary_1dim_shared(ary, op2, ary_dtype, ary_nelements, binop_func):
    tid = cuda.threadIdx.x
    sm = cuda.shared.array(ary_nelements, ary_dtype)
    sm[tid] = 0
    cuda.syncthreads()
    bin = ary[tid] % ary_nelements
    binop_func(sm, bin, op2)
    cuda.syncthreads()
    ary[tid] = sm[tid]

def atomic_add(ary):
    atomic_binary_1dim_shared(ary, 1, uint32, 32, cuda.atomic.add)

However, when I try this I receive an error from the compiler:

Untyped global name 'atomic_binary_1dim_shared': cannot determine Numba type of <class 'function'>

File "numba/cuda/tests/cudapy/test_atomics.py", line 24:
def atomic_add(ary):
    atomic_binary_1dim_shared(ary, 1, uint32, 32, cuda.atomic.add)

I could use some help understanding and resolving this problem.

None of the functions here seem to have the cuda.jit decorator - could that be the problem?

Could you share a complete example, i.e. something that could be executed by someone else locally?

I will try the cuda.jit decorator. To reproduce the problem, simply substitute my code into test_atomics.py, then run the atomics tests from the command line:

python -m numba.runtests numba.cuda.tests.cudapy.test_atomics.TestCudaAtomics

If this is about Numba's own test cases, then you should wait for a core dev to reply. I thought this was about your own code.

Hi @testhound,

I think some changes may need to be made to the CUDA target to support this refactor, but I'm not sure yet. The following provides examples of what does and does not work, with short explanations.

from numba import cuda
import numpy as np

def try_print(func, *args):
    pyfn = getattr(func, 'py_func', func)
    name = getattr(pyfn, '__name__', pyfn)
    print("Trying function: {}".format(name))
    try:
        func[1, 1](*args)
    except Exception as e:
        print("Exception: {}: {}".format(type(e), str(e)))
    else:
        print("Success!")

@cuda.jit
def foo(x, fn):
    pass

def bar():
    pass

@cuda.jit
def baz():
    pass

# Fail, compiler doesn't know what to do with a pure python function, this is
# the case on the CPU target too.
try_print(foo, 1, bar)

# Fail, compiler doesn't know what to do with a dispatcher instance
# (can't pass function as arg in CUDA).
try_print(foo, 1, baz)

@cuda.jit
def boz(x):
    pass

@cuda.jit
def foo2(x):
    boz(x)

# Fail, this is a kernel launching a kernel, unsupported
try_print(foo2, 1)

@cuda.jit(device=True)
def buz(x):
    pass

@cuda.jit
def foo3(x):
    buz(x)

# Ok, can call a device function directly from a kernel
try_print(foo3, 1)

@cuda.jit
def foo4(x, fn):
    fn(x)

# Fail, cannot call a device function supplied as an arg
# (same can't pass function as arg in CUDA problem as above).
try_print(foo4, 1, buz) 

Hope this helps.
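One way to sidestep the "can't pass a function as an argument" limitation is to capture the operation in a closure and compile one kernel per operation: the function is then fixed at compile time rather than passed at call time. Here is a plain-Python sketch of that factory pattern (with Numba you would apply @cuda.jit to the inner function inside the factory; all names here are illustrative):

```python
import operator

def make_accumulator(binop):
    # binop is fixed when the inner function is created, so nothing
    # function-typed needs to be passed in at call time. With Numba,
    # decorating `accumulate` here would produce one compiled kernel
    # per operation.
    def accumulate(values, start):
        total = start
        for v in values:
            total = binop(total, v)
        return total
    return accumulate

add_all = make_accumulator(operator.add)
sub_all = make_accumulator(operator.sub)
print(add_all([1, 2, 3], 0))   # 6
print(sub_all([1, 2, 3], 0))   # -6
```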

The atomic_add and atomic_sub functions are compiled with the cuda.jit decorator (see TestCudaAtomics.test_atomic_add and TestCudaAtomics.test_atomic_sub) - i.e. they are used as CUDA kernels, which can only call other jitted functions. However, in your modified version, the function atomic_binary_1dim_shared has no decorator, so it is a pure Python function. If you add the @cuda.jit decorator with device=True (to create a device function rather than a kernel), then you can replace atomic_add and atomic_sub with:

@cuda.jit(device=True)
def atomic_binary_1dim_shared(ary, op2, ary_dtype, ary_nelements, binop_func):
    tid = cuda.threadIdx.x
    sm = cuda.shared.array(ary_nelements, ary_dtype)
    sm[tid] = 0
    cuda.syncthreads()
    bin = ary[tid] % ary_nelements
    binop_func(sm, bin, op2)
    cuda.syncthreads()
    ary[tid] = sm[tid]

def atomic_add(ary):
    atomic_binary_1dim_shared(ary, 1, uint32, 32, cuda.atomic.add)

def atomic_sub(ary):
    atomic_binary_1dim_shared(ary, 1, uint32, 32, cuda.atomic.sub)

with the tests continuing to pass:

$ python -m numba.runtests numba.cuda.tests.cudapy.test_atomics.TestCudaAtomics.test_atomic_add
.
----------------------------------------------------------------------
Ran 1 test in 0.140s

OK

$ python -m numba.runtests numba.cuda.tests.cudapy.test_atomics.TestCudaAtomics.test_atomic_sub
.
----------------------------------------------------------------------
Ran 1 test in 0.180s

OK

Thanks @gmarkall, much appreciated.