Numba cuda slower than cuda c -- turn off memory safety checks?

UmerHA · May 6, 2024, 10:02pm

Hi all,

I’d like to use numba cuda to write fast cuda kernels, because of its nice dev experience.

However, I find that numba-cuda is consistently slower than cuda-c. Comparing the PTX, this seems to be because numba-cuda adds memory safety checks.

My question: Is it possible to get cuda-c speed (by disabling memory safety checks), or is numba-cuda not meant as an alternative to cuda-c?

For example, for this toy kernel

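from numba import cuda, float32

@cuda.jit()
def mul2(x):
    x[cuda.threadIdx.x] *= 2.0

signature = (float32[:],)
# Compile the underlying Python function; the call returns (ptx, return type)
ptx, _ = cuda.compile_ptx_for_current_device(mul2.py_func, signature)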

Comparing its PTX with the PTX of the equivalent cuda-c code, the section “compute memory address” is way larger in the numba-cuda version.
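To put a number on "way larger", one option is to count instructions per opcode in each PTX listing and diff the results. The sketch below is only an illustration: `count_ptx_opcodes` is a hypothetical helper (not part of numba), and `sample_ptx` is a hand-written stand-in for a small kernel, not actual compiler output from either toolchain.

```python
from collections import Counter


def count_ptx_opcodes(ptx: str) -> Counter:
    """Count PTX instructions by base opcode (e.g. 'ld', 'mul', 'st')."""
    counts = Counter()
    for raw in ptx.splitlines():
        line = raw.split("//", 1)[0].strip()  # drop trailing comments
        if not line.endswith(";"):
            continue  # blanks, labels, braces, entry headers
        if line.startswith("."):
            continue  # directives such as .reg or .local
        if line.startswith("@"):
            # predicated instruction, e.g. "@%p1 bra LBL;": drop the guard
            parts = line.split(None, 1)
            if len(parts) < 2:
                continue
            line = parts[1]
        opcode = line.split()[0].rstrip(";").split(".")[0]
        counts[opcode] += 1
    return counts


# Hand-written illustrative PTX (an assumption, not real compiler output).
sample_ptx = """
.visible .entry mul2(
    .param .u64 mul2_param_0
)
{
    .reg .f32 %f<3>;
    ld.param.u64 %rd1, [mul2_param_0];
    cvta.to.global.u64 %rd2, %rd1;
    mov.u32 %r1, %tid.x;
    mul.wide.u32 %rd3, %r1, 4;
    add.s64 %rd4, %rd2, %rd3;
    ld.global.f32 %f1, [%rd4];
    add.f32 %f2, %f1, %f1;   // x * 2.0
    st.global.f32 [%rd4], %f2;
    ret;
}
"""

counts = count_ptx_opcodes(sample_ptx)
print(sum(counts.values()), dict(counts))
```

Running the same function on the numba-generated `ptx` string and on the nvcc output, then subtracting one `Counter` from the other, should show which opcodes account for the extra address arithmetic.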

Thanks!
