Hello,
this is my tech stack:
Python: 3.10.2
NumPy: 2.0.2 or 1.26.4
Numba: 0.60.0 (latest)
I have a standard Python class with a few methods that rely heavily on NumPy. The class does the following:
(1) it receives a set of data points, usually between 1,000 and 7,000, and a multi-dimensional lookup table (NumPy array)
(2) it loops over all points and then runs a minimizer algorithm for each point
(3) an objective function looks up the minimizer's current estimate in the lookup table and compares the result against the target for the current data point, repeating until the difference falls within a tolerance threshold. The maximum number of lookups per data point is 5,000. Lookups in the table use trilinear interpolation.
The above can lead to millions of loop cycles and lookups.
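To make the lookup step concrete, here is a minimal sketch of a trilinear interpolation written with scalar arithmetic only. It is not my actual code: the function name `trilinear`, the 3-D table shape, and the assumption that coordinates are fractional indices on a unit-spaced grid are all illustrative.

```python
import numpy as np


def trilinear(table, x, y, z):
    """Interpolate a 3-D table at fractional index (x, y, z).

    Assumption for this sketch: unit grid spacing, at least 2 entries per axis.
    """
    # Base cell index, clamped so that index + 1 stays inside the table.
    i = min(max(int(np.floor(x)), 0), table.shape[0] - 2)
    j = min(max(int(np.floor(y)), 0), table.shape[1] - 2)
    k = min(max(int(np.floor(z)), 0), table.shape[2] - 2)

    # Fractional position inside the cell.
    fx, fy, fz = x - i, y - j, z - k

    # Interpolate along x on the four cell edges parallel to the x axis.
    c00 = table[i, j, k] * (1 - fx) + table[i + 1, j, k] * fx
    c01 = table[i, j, k + 1] * (1 - fx) + table[i + 1, j, k + 1] * fx
    c10 = table[i, j + 1, k] * (1 - fx) + table[i + 1, j + 1, k] * fx
    c11 = table[i, j + 1, k + 1] * (1 - fx) + table[i + 1, j + 1, k + 1] * fx

    # Then along y, then along z.
    c0 = c00 * (1 - fy) + c10 * fy
    c1 = c01 * (1 - fy) + c11 * fy
    return c0 * (1 - fz) + c1 * fz
```

The body avoids array temporaries and fancy indexing on purpose, since this function is the one called millions of times.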
I’ve now converted the class to be Numba compatible, starting by making it a @jitclass.
After a long adventure (I’ll share details in another post), the code is finally functional, and it performs faster than the standard Python code. What I found along the way, though, is that certain NumPy functionality performs far worse in Numba than one would expect and slows the code down considerably. Naturally, I’m looking to optimize my code for best performance.
Here are my questions:
(A) NumPy
What are the best practices when adapting NumPy-based code for Numba, specifically for optimizing it for peak performance?
Which NumPy functions should be avoided in Numba (i.e. rewritten in plain Python) because they don’t perform well?
(B) Loops
What are the best practices for optimizing loops in Numba?
(C) Caching
Can I cache the entire @jitclass? If so, when a new @jitclass instance is created with different parameters (a different set of data points and a different lookup table) but the same data types, would it create a new cache entry or reuse the first one?
Or do I need to cache individual member functions?
(D) Parallelization
What is the best approach to integrate parallelization for this class? Can I run the entire class in parallel, or should I focus on the functions that include loops, or the functions that the loops are calling in each loop cycle?
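For reference, this is roughly the shape of parallelization I have in mind: a free function that loops over the data points with `prange` and calls a per-point solver. It is only a sketch under assumptions. `solve_point` is a hypothetical stand-in (a bisection on x³ = target) for my real minimizer and lookup, and the import fallback is only there so the snippet also runs where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:  # fallback: run as plain Python if Numba is missing
    prange = range

    def njit(*args, **kwargs):
        if args and callable(args[0]) and not kwargs:
            return args[0]
        return lambda func: func


@njit
def solve_point(target, tol, max_iter):
    # Hypothetical stand-in for the per-point minimizer:
    # bisection for x such that x**3 == target, with x in [0, 10].
    lo, hi = 0.0, 10.0
    mid = 0.5 * (lo + hi)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        err = mid * mid * mid - target
        if abs(err) < tol:
            break
        if err > 0.0:
            hi = mid
        else:
            lo = mid
    return mid


@njit(parallel=True)
def solve_all(targets, tol, max_iter):
    out = np.empty(targets.shape[0])
    # Each point is independent, so iterations can run on separate threads.
    for n in prange(targets.shape[0]):
        out[n] = solve_point(targets[n], tol, max_iter)
    return out
```

Calling `solve_all(np.array([1.0, 8.0, 27.0]), 1e-9, 5000)` should return values close to 1, 2 and 3. Whether this outer-loop approach is better than parallelizing inside the objective function is exactly what I am unsure about.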
Thanks!