Numba for CUDA Programmers course released

gmarkall · April 23, 2021, 3:43pm

A new tutorial covering the use of Numba for CUDA programming is now available at:

This is an adapted version of a course delivered internally at NVIDIA. Its primary audience is people who are familiar with CUDA C/C++ programming but perhaps less so with Python and its ecosystem. That said, it should also be useful to those who know the Python and PyData ecosystem. Readers unfamiliar with CUDA may want to build a base understanding first by working through Mark Harris’s "An Even Easier Introduction to CUDA" blog post and briefly reading Chapters 1 and 2 of the CUDA Programming Guide (Introduction and Programming Model).

The course is broken into 5 sessions:

Session 1: An introduction to Numba and CUDA Python

Covers the basics:

An introduction to Numba
CUDA kernels and ufuncs
CUDA memory management basics

Session 2: Typing

Explains Numba’s type system and how type inference works:

How to understand what the typing is doing
CUDA-specific typing issues and performance optimization through typing

Session 3: Porting strategies, performance, interoperability, debugging

Various tools and techniques for going from unoptimized Python code to an optimized CUDA implementation.

Red flags: watching out for code that won’t port well to CUDA
Step-by-step porting process: pure Python → Object mode → Nopython mode → CUDA → Optimization
Dealing with NumPy array operations and using CuPy
Interoperability with other CUDA Python libraries
Managing data movement
Useful components in the CUDA target
Performance measurement
Debugging

Session 4: Extending Numba

How to write an extension for the Numba CUDA target so you can use your own data types and classes in CUDA kernels. The session covers the following for members, attributes, properties, methods, etc.:

Typing
Data models
Lowering

Session 5: Memory Management

Explains how Numba’s internal memory management works, and how to replace it with your own memory management:

Internals: garbage collection, finalizers, deferred deallocation
Using an External Memory Management (EMM) Plugin such as the RAPIDS Memory Manager (RMM)
Writing an EMM Plugin, with examples:
- Using CuPy’s memory pool
- A simple wrapper around the C runtime API
