Let me first say that I am a super-fan of Numba and I think you guys are super-heroes, so I hope you will take this as constructive feedback.
I need the nan (not-a-number) version of np.argmax, but it is currently not implemented in Numba. I figured it would be quite easy, so I took a look at your code-base to see if I could add it myself. The relevant Python module is here. But it is almost completely undocumented, so it takes great effort for outsiders to understand.
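For reference, the behaviour I am after can be sketched in plain Python and NumPy as follows. This is only a minimal sketch: the name nanargmax_sketch is made up for illustration, it is not Numba's implementation, and whether it compiles under Numba as written is untested. It assumes a float array and mirrors np.nanargmax in raising ValueError for empty or all-NaN input.

```python
import numpy as np


def nanargmax_sketch(arry):
    """Return the flat index of the largest non-NaN element of arry.

    Hypothetical helper for illustration only, not part of Numba.
    """
    # An empty array has no maximum, so fail early.
    if arry.size == 0:
        raise ValueError("attempt to get argmax of an empty sequence")

    # best_idx == -1 means "no non-NaN element seen yet".
    best_idx = -1
    best_value = 0.0
    idx = 0
    for v in arry.flat:
        # Skip NaN entirely; that is the difference from plain argmax.
        if not np.isnan(v):
            # Strict comparison keeps the first occurrence on ties.
            if best_idx == -1 or v > best_value:
                best_value = v
                best_idx = idx
        idx += 1

    # Every element was NaN, so there is nothing to return.
    if best_idx == -1:
        raise ValueError("all elements are NaN")
    return best_idx


example = np.array([[1.0, np.nan], [7.0, 7.0]])
example_idx = nanargmax_sketch(example)
```

The result is a flat index, like np.nanargmax gives when called without an axis argument.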
Furthermore, some of the code is very strange (buggy?), like this for-loop which exists in all of these argmax/argmin functions:
    for v in arry.flat:
        min_value = v
        min_idx = 0
        break
    if np.isnan(min_value):
        return min_idx
This looks bizarre to me. And as there is no comment on what you intend to do here, I am really confused when I see strange code like this. Because it looks like you are basically just doing the following:
    if np.isnan(arry.flat[0]):
        return 0
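A quick way to check that reading, at least in plain NumPy, is to put both forms side by side. This is a hedged sketch: the two function names are made up for illustration, it only covers non-empty arrays, and it says nothing about whether the loop form is needed for Numba's compiler to handle the code.

```python
import numpy as np


def first_is_nan_loop(arry):
    # Same shape as the quoted snippet: start iterating over the
    # flattened array and break after taking the first element.
    for v in arry.flat:
        first_value = v
        break
    return bool(np.isnan(first_value))


def first_is_nan_index(arry):
    # The simplified form: index the first flattened element directly.
    return bool(np.isnan(arry.flat[0]))


samples = [
    np.array([np.nan, 1.0]),
    np.array([[2.0, np.nan], [3.0, 4.0]]),
    np.array([5.0]),
]

# Each pair holds (loop result, index result) for one sample array.
results = [(first_is_nan_loop(a), first_is_nan_index(a)) for a in samples]
```

Both functions agree on these inputs, which is why the loop looks redundant without a comment explaining its purpose.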
I would now like to argue for more extensive documentation of the code-base itself, not just the user-facing API.
I actually program in two languages that are interleaved: I write clear English comments on what is going on for nearly all the Python-lines. This may seem excessive, but it means that practically anyone can read my code and understand what is going on in a few minutes. I can read code that I wrote 5-10 years ago and understand it very quickly, even though I’ve completely forgotten what it does. And if there is bizarre / buggy code, then my intention is explained clearly in English so it is more obvious what is wrong.
Writing good comments is actually a rare and very under-appreciated skill, just like writing good docs or good text-books. Not everyone can do it. But everyone can make a serious effort to explain what they are doing to the next person who will look at the code. It makes it much easier for others to understand what is going on in the code, and in the long run it is well worth the invested time. And writing code that is beautiful and easy for others to understand often means that problems get polished away, because ugly code is hard to explain.
Let me give you a few examples of my code.
The first example is from TensorFlow where I added a small function several years ago. Compare it to the other functions in that file, whose code-lines are only sparsely documented, if at all. Which do you find easier to understand?
The second example is more recent, where I use Numba to speed up a new algorithm I made. See e.g. this function and this function where I go into great detail explaining what happens in the algorithms, and even “trivial” code is explained in the comments as well. The idea is that you can read the English comments alone without reading any of the Python code, and you will understand exactly what is going on in the code.
I find it much easier to read the code when nearly everything is commented really well in English, rather than having to switch between reading a few short broken English comments and Python code. That is why I say that I program in two interleaved languages: English and Python.
I probably cannot convince you to go into “full Shakespeare mode” like I do with my comments. But please consider whether your code is easy or hard to understand for the next person who will have to read and maintain it.
A few of the functions in arraymath.py actually do have extensive code-comments, but some of them seem to have been written in a hurry and are quite confusing, like these that seem to belong to different code-lines.
As a bare minimum I would like to suggest that every source-file has a header that explains what it contains and how it fits in with the rest of the project. And each function has a doc-string that explains what the function does. Ideally the crucial / difficult code-lines would also be explained.
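To make that suggestion concrete, here is roughly what I have in mind, using an argmin-style function as the example. It is a plain NumPy sketch written for illustration: the module header text and the name argmin_float_documented are hypothetical, and this is not the code that is in Numba.

```python
"""Hypothetical module header.

This file contains index-returning reductions (argmin and friends) for
float arrays. A real header would also say how the file fits in with
the rest of the project.
"""
import numpy as np


def argmin_float_documented(arry):
    """Return the flat index of the smallest element of a float array.

    If the array contains NaN, the index of the first NaN is returned,
    which matches what np.argmin does. Raises ValueError on empty input.
    """
    # An empty array has no minimum, so fail early with a clear message.
    if arry.size == 0:
        raise ValueError("attempt to get argmin of an empty sequence")

    # Start with the first element as the best candidate so far.
    min_idx = 0
    min_value = arry.flat[0]

    # Walk the elements in flattened order, tracking the flat index.
    for idx, v in enumerate(arry.flat):
        # NaN propagates: the first NaN found decides the result.
        if np.isnan(v):
            return idx
        # Strict comparison keeps the first occurrence on ties.
        if v < min_value:
            min_value = v
            min_idx = idx
    return min_idx
```

With the doc-string and comments in place, the NaN handling is stated as an intention, so a reader can tell whether the code matches it.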
In my opinion, good code comments are almost as important as good code itself.
Once again, I meant for this to be constructive feedback from a Numba super-fan.