Please provide full working, detailed examples for @cuda.jit — covering fast startup and fast math

Thanks for the feedback. We are aware that the documentation is short on examples and would like to add more, but time and resources are quite constrained at the moment alongside other ongoing work. Some of the feature documentation does contain examples, such as the RNG docs, the Cooperative Groups docs, and the Atomic operations docs, but I do think it would be good to have a collection of these all in one place.

Some other examples of using the CUDA target include:

  • Comcast Rapid IP Checker - this code is fairly short so it can be completely understood in a short amount of time, yet performs a useful function (checking whether a list of network addresses is part of a list of network ranges).
  • Convolution comparisons - These convolution benchmarks include CPU and GPU versions of the same operations, so it’s relatively easy to compare the differences between them.
  • GPU kernels in STUMPY - all the files prefixed with gpu_ implement very clearly-written and well-documented CUDA kernels that also have a CPU counterpart. They are a little long, but IMO they represent good examples of well-written CUDA kernels.

support gpu compute shaders, not just cuda

A significant amount of effort goes into supporting the CUDA target (I work on it full time, and it receives contributions from various other individuals). Supporting other compute shader APIs would be a very large task on top of this, and there isn't the resourcing to prioritize it at present. Can you outline your use cases and requirements for compute shader support?

resume amd driver support

There is some discussion of the issues around this in this thread.
