Can't use @cuda.reduce decorator inside Kernel

I have a sum reduction function:

```python
@cuda.reduce
def sum_reduce(a, b):
    return a + b
```

which I want to use inside a kernel:

```python
@cuda.jit
def IC(res):
    res[0] = sum_reduce(res)

```

However, it raises an error:

```
Failed in cuda mode pipeline (step: nopython frontend)
Untyped global name 'sum_reduce': Cannot determine Numba type of <class 'numba.cuda.kernels.reduction.Reduce'>
```

Is there a way to use @cuda.reduce-decorated functions inside a kernel?

Unfortunately you can't use a @cuda.reduce function inside a kernel, because the granularity of parallelism differs inside a kernel and outside of it: a @cuda.reduce function uses an entire grid to compute the reduction, whereas inside a kernel you program from the perspective of a single thread.
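For reference, this is roughly how a @cuda.reduce function is meant to be used: called from the host, where it launches its own grid. A minimal sketch:

```python
import numpy as np
from numba import cuda

@cuda.reduce
def sum_reduce(a, b):
    return a + b

a = np.arange(1024, dtype=np.float64)
total = sum_reduce(a)   # called from the host; launches a grid internally
assert total == a.sum()
```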

You would need to implement your own in-kernel reduction. A strategy that gives reasonable performance is to reduce within a warp, then within a block, then across the whole grid. This gist computes multiple reductions using warp-aggregated atomics; it's not exactly what you need, but it might be a useful reference.
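To make the block-level part of that strategy concrete, here is a minimal sketch (not the gist itself) of a shared-memory tree reduction per block, with each block's partial sum combined into the result via an atomic add. `TPB` and `block_sum` are made-up names for the example, and `result` must be zero-initialized before launch:

```python
import numpy as np
from numba import cuda, float64

TPB = 128  # threads per block; must match the launch configuration

@cuda.jit
def block_sum(arr, result):
    # Each block reduces its slice of arr in shared memory, then one
    # thread per block atomically adds the partial sum into result[0].
    sm = cuda.shared.array(TPB, dtype=float64)
    tid = cuda.threadIdx.x
    i = cuda.grid(1)

    sm[tid] = arr[i] if i < arr.size else 0.0
    cuda.syncthreads()

    # Tree reduction within the block.
    s = TPB // 2
    while s > 0:
        if tid < s:
            sm[tid] += sm[tid + s]
        cuda.syncthreads()
        s //= 2

    if tid == 0:
        cuda.atomic.add(result, 0, sm[0])

a = np.arange(1000, dtype=np.float64)
result = np.zeros(1, dtype=np.float64)   # must start at zero
blocks = (a.size + TPB - 1) // TPB
block_sum[blocks, TPB](a, result)
```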

If you need all threads to be able to synchronize and use the result of the reduction, you will need to use a grid group to synchronize once the reduction is complete. See the Cooperative Groups section of the Numba documentation.
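A minimal sketch of what that looks like, assuming a device and grid size that support cooperative launch (Numba launches the kernel cooperatively when `this_grid()` is used, and the grid must fit within the cooperative launch limits). The kernel name and inputs are hypothetical:

```python
import numpy as np
from numba import cuda

@cuda.jit
def normalize_by_sum(arr, total):
    # Every thread contributes to the sum, then uses the final result.
    i = cuda.grid(1)
    if i < arr.size:
        cuda.atomic.add(total, 0, arr[i])

    # Grid-wide barrier: no thread proceeds until every block has
    # finished contributing to total[0].
    g = cuda.cg.this_grid()
    g.sync()

    if i < arr.size:
        arr[i] /= total[0]

arr = np.arange(1.0, 9.0)            # sums to 36
total = np.zeros(1)                  # must start at zero
normalize_by_sum[2, 4](arr, total)
```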


I forgot to mention that better support for in-kernel reductions is a work in progress. Unfortunately I don't have anything I can share right now, but we would like it to be easy to write in-kernel reductions in the future.
