Can't use basic NumPy functions with CUDA, like zeros or empty

Hello.

I’m a newbie with Python and Numba, and I’m quite stuck trying to use even the most basic NumPy functions (things like empty, zeros or ones) inside a CUDA kernel. Take this very simple code, for example:

from numba import cuda
import numba
import numpy as np

@cuda.jit
def my_kernel(G):
    G = np.zeros((128, 128), dtype=np.float64)

G_shape = (128, 128)
G = np.empty(G_shape)
d_G = cuda.device_array(G_shape, dtype=np.float64)

threadsperblock = 32
blockspergrid = (G.size + (threadsperblock - 1)) // threadsperblock
my_kernel[blockspergrid, threadsperblock](d_G)
cuda.synchronize()

d_G.copy_to_host(G)

If I try to run it, Numba refuses to compile with this error:

Traceback (most recent call last):
  File "test.py", line 18, in <module>
    my_kernel[blockspergrid, threadsperblock](d_G)
  File "/home/marc/anaconda3/envs/mudpy/lib/python2.7/site-packages/numba/cuda/compiler.py", line 798, in __call__
    kernel = self.specialize(*args)
  File "/home/marc/anaconda3/envs/mudpy/lib/python2.7/site-packages/numba/cuda/compiler.py", line 809, in specialize
    kernel = self.compile(argtypes)
  File "/home/marc/anaconda3/envs/mudpy/lib/python2.7/site-packages/numba/cuda/compiler.py", line 825, in compile
    **self.targetoptions)
  File "/home/marc/anaconda3/envs/mudpy/lib/python2.7/site-packages/numba/compiler_lock.py", line 32, in _acquire_compile_lock
    return func(*args, **kwargs)
  File "/home/marc/anaconda3/envs/mudpy/lib/python2.7/site-packages/numba/cuda/compiler.py", line 61, in compile_kernel
    cres = compile_cuda(pyfunc, types.void, args, debug=debug, inline=inline)
  File "/home/marc/anaconda3/envs/mudpy/lib/python2.7/site-packages/numba/compiler_lock.py", line 32, in _acquire_compile_lock
    return func(*args, **kwargs)
  File "/home/marc/anaconda3/envs/mudpy/lib/python2.7/site-packages/numba/cuda/compiler.py", line 50, in compile_cuda
    locals={})
  File "/home/marc/anaconda3/envs/mudpy/lib/python2.7/site-packages/numba/compiler.py", line 551, in compile_extra
    return pipeline.compile_extra(func)
  File "/home/marc/anaconda3/envs/mudpy/lib/python2.7/site-packages/numba/compiler.py", line 331, in compile_extra
    return self._compile_bytecode()
  File "/home/marc/anaconda3/envs/mudpy/lib/python2.7/site-packages/numba/compiler.py", line 393, in _compile_bytecode
    return self._compile_core()
  File "/home/marc/anaconda3/envs/mudpy/lib/python2.7/site-packages/numba/compiler.py", line 373, in _compile_core
    raise e
numba.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Use of unsupported NumPy function 'numpy.zeros' or unsupported use of the function.

File "test.py", line 9:
def my_kernel(G):
    G = np.zeros(shape=(128, 128), dtype=np.float64)
    ^

[1] During: typing of get attribute at test.py (9)

File "test.py", line 9:
def my_kernel(G):
    G = np.zeros(shape=(128, 128), dtype=np.float64)
    ^

According to the documentation, these methods are supported and should work. I’ve tried with and without dtype, I’ve tried with different types (both from Numba and NumPy), different sizes… Nothing seems to work. What am I doing wrong? Can anybody help me, please?

By the way, I’m using Numba 0.47, because I’m working with a Python 2.7 code base.

Marc

The list of supported Python features, libraries, and functions differs for the CUDA target. The list supported by the CUDA target is in the Numba documentation (0.53.0 at the time of writing), on the page “Supported Python features in CUDA Python”.

For this particular example, you can’t do anything inside a kernel that allocates memory on the GPU, like np.zeros. In general, most NumPy functions that operate on arrays are also unavailable in kernels.

You will instead need to allocate G outside the kernel and pass it in (or you can use local / shared memory for small arrays that need only be used within the kernel).

Oh, OK. I didn’t know the subset of Python features was different for CUDA. Thanks!

At the very bottom of that doc is a list of supported NumPy features, which includes the following:

scalar ufuncs that have equivalents in the math module; i.e. np.sin(x[0]), where x is a 1D array.

However I tried running the following code block:

import numpy as np
from numba import cuda
import math

test_in2 = cuda.to_device(np.array([0, math.pi/3, math.pi]))
test_out2 = cuda.to_device(np.array([0,0,0]))

@cuda.jit
def test_math_sin_numpy(in_array, out_array):
    i = cuda.grid(1)
    if i < in_array.size:
        if np.sin(in_array[i]) < .5:
            out_array[i] = 100

test_math_sin_numpy.forall(test_in2.size)(test_in2, test_out2)
print(test_out2.copy_to_host())

And I got a similar error:

Traceback (most recent call last):
  File "test.py", line 16, in <module>
    test_math_sin_numpy.forall(test_in2.size)(test_in2, test_out2)
  File "/home/dev/workspace/numba/numba/cuda/compiler.py", line 493, in __call__
    kernel = self.kernel.specialize(*args)
  File "/home/dev/workspace/numba/numba/cuda/compiler.py", line 1051, in specialize
    specialization = Dispatcher(self.py_func, [types.void(*argtypes)],
  File "/home/dev/workspace/numba/numba/cuda/compiler.py", line 929, in __init__
    self.compile(sigs[0])
  File "/home/dev/workspace/numba/numba/cuda/compiler.py", line 1099, in compile
    kernel = _Kernel(self.py_func, argtypes, link=self.link,
  File "/home/dev/workspace/numba/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
    return func(*args, **kwargs)
  File "/home/dev/workspace/numba/numba/cuda/compiler.py", line 538, in __init__
    cres = compile_cuda(self.py_func, types.void, self.argtypes,
  File "/home/dev/workspace/numba/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
    return func(*args, **kwargs)
  File "/home/dev/workspace/numba/numba/cuda/compiler.py", line 163, in compile_cuda
    cres = compiler.compile_extra(typingctx=typingctx,
  File "/home/dev/workspace/numba/numba/core/compiler.py", line 675, in compile_extra
    return pipeline.compile_extra(func)
  File "/home/dev/workspace/numba/numba/core/compiler.py", line 419, in compile_extra
    return self._compile_bytecode()
  File "/home/dev/workspace/numba/numba/core/compiler.py", line 483, in _compile_bytecode
    return self._compile_core()
  File "/home/dev/workspace/numba/numba/core/compiler.py", line 462, in _compile_core
    raise e
  File "/home/dev/workspace/numba/numba/core/compiler.py", line 453, in _compile_core
    pm.run(self.state)
  File "/home/dev/workspace/numba/numba/core/compiler_machinery.py", line 343, in run
    raise patched_exception
  File "/home/dev/workspace/numba/numba/core/compiler_machinery.py", line 334, in run
    self._runPass(idx, pass_inst, state)
  File "/home/dev/workspace/numba/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
    return func(*args, **kwargs)
  File "/home/dev/workspace/numba/numba/core/compiler_machinery.py", line 289, in _runPass
    mutated |= check(pss.run_pass, internal_state)
  File "/home/dev/workspace/numba/numba/core/compiler_machinery.py", line 262, in check
    mangled = func(compiler_state)
  File "/home/dev/workspace/numba/numba/core/typed_passes.py", line 105, in run_pass
    typemap, return_type, calltypes, errs = type_inference_stage(
  File "/home/dev/workspace/numba/numba/core/typed_passes.py", line 83, in type_inference_stage
    errs = infer.propagate(raise_errors=raise_errors)
  File "/home/dev/workspace/numba/numba/core/typeinfer.py", line 1074, in propagate
    raise errors[0]
numba.core.errors.TypingError: Failed in cuda mode pipeline (step: nopython frontend)
Use of unsupported NumPy function 'numpy.sin' or unsupported use of the function.

File "test.py", line 13:
def test_math_sin_numpy(in_array, out_array):
    <source elided>
    if i < in_array.size:
        if np.sin(in_array[i]) < .5:
        ^

During: typing of get attribute at test.py (13)

File "test.py", line 13:
def test_math_sin_numpy(in_array, out_array):
    <source elided>
    if i < in_array.size:
        if np.sin(in_array[i]) < .5:
        ^

I am unsure if this is a bug or if this is incorrect usage. Replacing the call to np.sin with math.sin appears to make the code work properly.

I’m running Numba built from source, from the master branch on GitHub.

Thanks for reporting this - it’s a bug. I’ve created issue #7112 in numba/numba on GitHub, “NumPy Ufuncs on scalar values don't work in CUDA”, accordingly.