CUDA: Experimental branch with support for -dlcm=cg

gmarkall · August 31, 2022, 3:44pm

I have an experimental (quickly hacked together) branch that adds support for an option like ptxas's -dlcm=cg option (-dlcm sets the default load cache modifier; see PTX Compiler APIs :: CUDA Toolkit Documentation). This is the branch: GitHub - gmarkall/numba at cuda-linker-options
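To show how a decorator keyword relates to the ptxas flag, here is a hypothetical sketch. It is not the code on the branch. The helper name ptxas_options is made up, and it accepts only the two values used in this post; the ptxas documentation lists the full set of accepted -dlcm values.

```python
# Hypothetical sketch: NOT the code on the cuda-linker-options branch.
# It only illustrates how a `dlcm` keyword maps onto ptxas's -dlcm option.

# Assumption: accept only the two values used in this post.
SUPPORTED_DLCM = ('ca', 'cg')


def ptxas_options(dlcm=None):
    """Return the extra ptxas command-line options implied by `dlcm`."""
    if dlcm is None:
        # No keyword given: add nothing, so ptxas keeps its default modifier.
        return []
    if dlcm not in SUPPORTED_DLCM:
        raise ValueError(f"unsupported dlcm value: {dlcm!r}")
    return [f'-dlcm={dlcm}']


print(ptxas_options('cg'))  # ['-dlcm=cg']
print(ptxas_options())      # []
```

Leaving the option out entirely when no keyword is given, rather than passing an explicit value, keeps the "no dlcm" build identical to what ptxas would produce by default.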

With that branch, you can pass the dlcm kwarg to the @cuda.jit decorator, like so:

# Use with https://github.com/gmarkall/numba/tree/cuda-linker-options

from numba import cuda, float32, void


def axpy(r, a, x, y):
    start = cuda.grid(1)
    step = cuda.gridsize(1)

    for i in range(start, len(r), step):
        r[i] = a * x[i] + y[i]


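sig = void(float32[::1], float32, float32[::1], float32[::1])

# Compile the same kernel three ways:
#   dlcm='cg' - global loads cached at the global level (L2), not in L1
#   dlcm='ca' - global loads cached at all levels
#   no dlcm   - whatever ptxas uses by default
with_dlcm_cg = cuda.jit(sig, dlcm='cg')(axpy)
with_dlcm_ca = cuda.jit(sig, dlcm='ca')(axpy)
with_dlcm_xx = cuda.jit(sig)(axpy)

# Dump each variant's SASS to a file for diffing. inspect_sass() with no
# argument returns a dict keyed by the argument types (needs nvdisasm).
with open('dlcm_cg.sass', 'w') as f:
    f.write(with_dlcm_cg.inspect_sass()[sig.args])

with open('dlcm_ca.sass', 'w') as f:
    f.write(with_dlcm_ca.inspect_sass()[sig.args])

with open('dlcm_xx.sass', 'w') as f:
    f.write(with_dlcm_xx.inspect_sass()[sig.args])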

Then you can inspect the differences between the SASS dumps, e.g.:

-        /*0c60*/                   LDG.E R9, [R8.64] ;
-        /*0c70*/                   LDG.E R16, [R16.64] ;
+        /*0c60*/                   LDG.E.STRONG.SM R9, [R8.64] ;
+        /*0c70*/                   LDG.E.STRONG.SM R16, [R16.64] ;
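Rather than reading the whole files, a small standard-library script can pull out just the changed lines and tally the global-load variants in each dump. This is a sketch: the function names are made up, and the regex assumes only that global loads appear as LDG followed by dot-separated modifiers, as in the excerpt above.

```python
import difflib
import re
from collections import Counter

# Global loads appear as LDG plus optional dot-separated modifiers,
# e.g. "LDG.E" or "LDG.E.STRONG.SM". The \b after LDG avoids matching
# other opcodes that merely start with "LDG".
_LDG_RE = re.compile(r'\bLDG\b(?:\.[A-Z0-9_]+)*')


def sass_diff(a_text, b_text, a_name='a', b_name='b'):
    """Return only the added/removed lines of a unified diff of two SASS dumps."""
    diff = difflib.unified_diff(a_text.splitlines(), b_text.splitlines(),
                                fromfile=a_name, tofile=b_name, lineterm='')
    # Drop the '---'/'+++' file headers; '@@' hunk markers and unchanged
    # context lines are dropped because they don't start with '+' or '-'.
    return [line for line in diff
            if line.startswith(('+', '-'))
            and not line.startswith(('---', '+++'))]


def count_global_loads(sass_text):
    """Count each LDG variant (opcode plus modifiers) in a SASS dump."""
    return Counter(_LDG_RE.findall(sass_text))


# Usage with one line from each side of the excerpt above.
before = "        /*0c60*/                   LDG.E R9, [R8.64] ;\n"
after = "        /*0c60*/                   LDG.E.STRONG.SM R9, [R8.64] ;\n"

for line in sass_diff(before, after, 'before.sass', 'after.sass'):
    print(line)
print(count_global_loads(before))  # Counter({'LDG.E': 1})
print(count_global_loads(after))   # Counter({'LDG.E.STRONG.SM': 1})
```

For the files written by the script above, you would pass the contents of, say, dlcm_xx.sass and dlcm_cg.sass instead of the two sample strings.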

(or even better, measure performance!)
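For the performance comparison, axpy is memory-bound, so effective bandwidth is a convenient metric. Each element reads x[i] and y[i] and writes r[i], which is 3 × 4 = 12 bytes for float32. The helper below is a sketch that converts a kernel time in milliseconds into GB/s. You could take that time from CUDA events, using cuda.event() and cuda.event_elapsed_time(). The helper counts only this compulsory traffic and ignores caching effects and the scalar a.

```python
def axpy_effective_bandwidth_gbs(n, elapsed_ms, itemsize=4):
    """Effective bandwidth in GB/s for one axpy launch over n elements.

    Per element: load x[i], load y[i], store r[i] -> 3 * itemsize bytes.
    itemsize=4 matches the float32 signature used above.
    """
    bytes_moved = 3 * itemsize * n
    seconds = elapsed_ms / 1e3
    return bytes_moved / seconds / 1e9


# e.g. 2**26 float32 elements processed in 1.0 ms
print(round(axpy_effective_bandwidth_gbs(2**26, 1.0), 1))  # 805.3
```

Because all three variants move the same bytes, comparing their bandwidths is equivalent to comparing their times. The GB/s figure is mainly useful for seeing how close each variant gets to the device's peak.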

Questions:

Is this useful for others / should it be made into a proper PR?
Would other ptxas / linker options be helpful? E.g. -dscm (the default store cache modifier)?
