It seems like there have been quite a few issues with the CUDA matrix multiplication example. See python - How to generalize fast matrix multiplication on GPU using numba - Stack Overflow, among several GitHub issues over the last year or so. It's great to see examples (this was one of the first places I read in more depth when I started looking into Numba/CUDA), but I wonder if part of the problem is that the code is embedded in an .rst document rather than, say, a .py file. I've been thinking about whether there's a way to increase the number and quality of examples without running into the same problems this example has hit over the last few years. Might Jupyter notebooks be a good way to go? nbsphinx is a Sphinx extension that could perhaps be integrated into the docs workflow. I haven't used it (nor have I really used Sphinx before - I've been learning .rst so I can make some PRs).
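From what I can tell, wiring nbsphinx in looks fairly lightweight. Here's a minimal conf.py sketch of what I mean - just an illustration of the idea, not a tested change to the actual Numba docs configuration (the extension names other than nbsphinx are placeholders):

```python
# conf.py (Sphinx configuration) - hypothetical sketch, not the real Numba docs config
extensions = [
    "sphinx.ext.autodoc",   # whatever extensions the docs already use would stay as-is
    "nbsphinx",             # renders .ipynb files as documentation pages
]

# don't try to build checkpoint copies of notebooks
exclude_patterns = ["_build", "**.ipynb_checkpoints"]

# optionally execute notebooks at build time, so broken example code
# fails the docs build instead of shipping silently
nbsphinx_execute = "always"
```

If notebooks are executed at build time, an example like the matmul one couldn't drift out of sync with the library the way embedded .rst code can.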
There are great teaching examples out there (I tried to compile a list of several in my SO answer), which usually start with buggy or unoptimized code and then get fixed/optimized in ways that highlight common mistakes and how to avoid them. I think these could be converted into Jupyter notebooks in the examples section with pretty minimal effort.
Each notebook could end with a boilerplate comment (at least early on) linking to GitHub's "Create New Issue" page for readers who find a mistake in the code, plus a link to some basic Numba-specific instructions for writing and submitting a pull request.
I think the benefit would be two-fold: users get working, optimized Numba code for their needs faster and with less frustration, and developers won't need to spend as much time walking newcomers through the basics (something I've seen a lot, and it's awesome how responsive the developers have been). Both benefits seem well in line with the 2020 Roadmap Discussion and Planning. Developers, what are your thoughts?
Btw, I've been using Numba almost non-stop since I discovered it (about a week ago). Fantastic package. I was able to @njit a SciPy function with minimal code modification/effort and see a tremendous speed improvement. That got me hooked, but I've had a lot more trouble converting it to a CUDA implementation, and I'd like to help where I can.