Hello everyone,
On Feb 25, Ashwin Srinath, senior software engineer at NVIDIA, will present a talk on cuda.cooperative. The talk will take place during the office-hour/dev-meeting time slot. The session will be recorded and published on our YouTube channel at https://www.youtube.com/@numba_jit. To join live, please see our calendar for instructions: Numba Dev Meetings.
Here’s the abstract for the talk:
Faster and simpler Numba CUDA kernels using CUB and cuda.cooperative
Numba has become the de facto standard tool for writing CUDA kernels in Python, providing easy access to many CUDA features.
However, a key feature set available in CUDA C++ is missing from Numba CUDA: the generic collectives provided by the CUB library. These enable CUDA C++ developers to easily author complex kernels that are optimized for their specific CUDA architecture.
This presentation will introduce cuda.cooperative, a set of Numba CUDA extensions that bring CUB’s functionality to Numba. I’ll demonstrate how it simplifies writing Numba CUDA kernels for complex algorithms like segmented sorts and scans, reducing hundreds of lines of code to just a few, while achieving up to a 2x speedup.