CUDA: cache compiled device functions separately

shaunc · February 8, 2024, 8:10pm

I have CUDA kernels which take over a minute, sometimes two, to compile. I am using caching, but when I am debugging a kernel, any change to a library function requires invalidating the cache. This slows debugging down considerably.

Is there some way, under the hood, to reuse (relink) compiled device functions, so that only the particular device function I changed would need to be invalidated when a kernel is rebuilt?

At this point I am willing to mess around with the internals somewhat, but I probably don’t have the ability and/or time to rewrite the compiler, if that is what it would really take.
