Njit with np.argwhere, np.logical_and, and np.sum (Numba v0.50.0)

I tried to use @njit to speed up a function I call quite a bit. Originally the inputs were pandas arrays, but I rewrote everything in numpy so that it might work with numba. However, adding the decorator gives the following traceback. It could be that these are unsupported features. I upgraded to 0.50.0 and cross-checked with https://numba.pydata.org/numba-doc/dev/reference/numpysupported.html. I see that np.sum is not listed, so I am thinking that might be the issue. Perhaps a minor tweak could make the njit decorator work?

#Input: numpy arrays of same length
import numpy as np
from numba import njit
@njit
def get_result(args, a, b, c, d):
    args_one = np.argwhere( np.logical_and( a[args] == 1 , b[args] < d ) ) #.flatten() #this fixes it
    args_two = np.argwhere( np.logical_and( a[args] == 2 , b[args] > d ) ) #.flatten() #this fixes it
    one_val = np.sum( ( d - b[args_one] ) * c[args_one]  )
    two_val = np.sum( ( b[args_two] - d ) * c[args_two]  )
    return one_val + two_val

def test():
    a = np.random.randint(1,3, 100)
    b = np.random.uniform(50,100, 100)
    c = np.random.randint(1,2000, 100)
    d = 75
    args = np.asarray(np.arange(10,20))
    res = get_result(args, a, b, c, d)

if __name__ == '__main__':
    test()

# Traceback (most recent call last):
#   File "test_numba.py", line 24, in <module>
#     test()
#   File "test_numba.py", line 20, in test
#     res = get_result(args, a, b, c, d)
#   File "/home/user/.local/lib/python3.6/site-packages/numba/core/dispatcher.py", line 415, in _compile_for_args
#     error_rewrite(e, 'typing')
#   File "/home/user/.local/lib/python3.6/site-packages/numba/core/dispatcher.py", line 358, in error_rewrite
#     reraise(type(e), e, None)
#   File "/home/user/.local/lib/python3.6/site-packages/numba/core/utils.py", line 80, in reraise
#     raise value.with_traceback(tb)
# numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
# No implementation of function Function(<built-in function getitem>) found for signature:
#  >>> getitem(array(float64, 1d, C), array(int64, 2d, A))
# There are 16 candidate implementations:
#      - Of which 14 did not match due to:
#      Overload in function 'getitem': File: <built-in>: Line <N/A>.
#        With argument(s): '(array(float64, 1d, C), array(int64, 2d, A))':
#       No match.
#      - Of which 2 did not match due to:
#      Overload in function 'getitem': File: <built-in>: Line <N/A>.
#        With argument(s): '(array(float64, 1d, C), array(int64, 2d, A))':
#       Rejected as the implementation raised a specific error:
#         TypeError: unsupported array index type array(int64, 2d, A) in [array(int64, 2d, A)]
#   raised from /home/user/.local/lib/python3.6/site-packages/numba/core/typing/arraydecl.py:69
# During: typing of intrinsic-call at test_numba.py (9)
# File "test_numba.py", line 9:
# def get_result(args, a, b, c, d):
#     <source elided>
#     args_two = np.argwhere( np.logical_and( a[args] == 2 , b[args] > d ) )
#     one_val = np.sum( ( d - b[args_one] ) * c[args_one]  )
#     ^

hi there,

the key part of the error message is getitem(array(float64, 1d, C), array(int64, 2d, A)). The problem is the shape of args_one (and also args_two). Both are 2d, because np.argwhere returns one column per dimension of its input, but they are being used to index a 1d array. NumPy accepts that and returns a 2d result, but the traceback shows Numba rejecting a 2d index array as an unsupported index type.
Printing args_one.shape I saw that it’s (4,1), so adding the line args_one = args_one[:,0] should solve it.
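
To make the shape issue concrete, here is a small NumPy-only sketch. The arrays are made-up values, not the ones from the question. It only shows that np.argwhere gives an (n, 1) array for 1d input, and a few ways to get a 1d index out of it. np.flatnonzero is an alternative that is not mentioned in this thread.

```python
import numpy as np

# Made-up sample data, chosen so that two elements match the condition
a = np.array([1, 2, 1, 1, 2])
b = np.array([60.0, 80.0, 90.0, 70.0, 55.0])
d = 75

mask = np.logical_and(a == 1, b < d)

idx_2d = np.argwhere(mask)        # one row per match, one column per dimension
idx_1d = idx_2d[:, 0]             # take the only column to get a 1d index
idx_flat = np.flatnonzero(mask)   # 1d indices directly from the boolean mask

print(idx_2d.shape, idx_1d.shape)
# Indexing a 1d array with the 2d index also produces a 2d result in NumPy
print(b[idx_2d].shape, b[idx_1d].shape)
```

Either of the 1d forms can then be used as b[idx] and c[idx] inside the function.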



Thanks a bunch! I added .flatten() to the end of the first two lines in get_result and that fixed it. The script is now ~10x faster because of this fix.

glad to know that it worked well for you


You could consider writing the loop more explicitly yourself. For me that’s even a bit faster, about 3x. Perhaps Numba saves some time by avoiding the creation of intermediate arrays.

Note that in your example, with seed=0, the “two_val” part never contributes because the criteria aren’t met for any of the elements.

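@njit
def get_result(args, a, b, c, d):
    one_val = 0.0
    two_val = 0.0
    # i is the position within args (a 1-tuple), arg is the value stored there
    for i, arg in np.ndenumerate(args):
        if a[arg] == 1 and b[arg] < d:
            # b[i] and c[i] are indexed by position, like b[args_one] in the original
            one_val += (d-b[i]) * c[i]
        if a[arg] == 2 and b[arg] > d:
            two_val += (b[i]-d) * c[i]
    return one_val + two_val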
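
A quick way to check that the explicit loop gives the same number as the vectorized version is to run both on the same data. This is only a sketch: the names get_result_vectorized and get_result_loop are made up for the comparison, the data comes from np.random.default_rng(0) (my choice, not necessarily the seed used above), and both functions are left undecorated so the check runs in plain NumPy.

```python
import numpy as np

def get_result_vectorized(args, a, b, c, d):
    # .flatten() turns the (n, 1) output of np.argwhere into a 1d index
    args_one = np.argwhere(np.logical_and(a[args] == 1, b[args] < d)).flatten()
    args_two = np.argwhere(np.logical_and(a[args] == 2, b[args] > d)).flatten()
    one_val = np.sum((d - b[args_one]) * c[args_one])
    two_val = np.sum((b[args_two] - d) * c[args_two])
    return one_val + two_val

def get_result_loop(args, a, b, c, d):
    one_val = 0.0
    two_val = 0.0
    # i is the position within args, arg is the value stored at that position
    for i, arg in np.ndenumerate(args):
        if a[arg] == 1 and b[arg] < d:
            one_val += (d - b[i]) * c[i]
        if a[arg] == 2 and b[arg] > d:
            two_val += (b[i] - d) * c[i]
    return one_val + two_val

rng = np.random.default_rng(0)
a = rng.integers(1, 3, 100)
b = rng.uniform(50, 100, 100)
c = rng.integers(1, 2000, 100)
d = 75
args = np.arange(10, 20)

vec = get_result_vectorized(args, a, b, c, d)
loop = get_result_loop(args, a, b, c, d)
print(np.isclose(vec, loop))
```

Once the two agree, putting @njit on the loop version is the step that should give the speedup discussed here.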

If you call it a lot, the above function also makes it a little easier to move to a “guvectorize” version if you are able to “batch” the a,b,c in a way that allows you to run it in parallel over a dimension.


I appreciate your help @RutgerK !

Yup, your approach is definitely faster and more elegant. Thanks for the insight, I’ve never used ndenumerate before. It’s a really nice solution, thanks for the follow-up! It’s definitely batch-able, so I’ll have to look into guvectorize if I decide to optimize further. Right now, I’m plenty happy with this ~40x speedup over something that was already 20x faster (by going from pandas to numpy).

Since numpy doesn’t work well with strings, I did a neat thing you might be amused by: I convert the strings into numbers first. The args come from another list, in a loop outside of this function, where the strings are equal.

def STR_to_int(STR):
    return int( "".join(  [ str(ord(i)) for i in STR ]  ) )

numpy does not work well with python strings, but it has its own (more limited) strings that can be useful too.
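
One thing to watch with STR_to_int: for ASCII text each character adds two or three digits, so the result can exceed the int64 range once the string is more than a handful of characters long. A possible alternative, which is not what the post above does, is to let NumPy assign small integer codes with np.unique(..., return_inverse=True). The labels below are invented for the example.

```python
import numpy as np

# Invented labels standing in for whatever strings the real data uses
labels = np.array(["buy", "sell", "buy", "hold", "sell"])

# uniques holds the sorted distinct strings,
# codes holds, for each label, its index into uniques
uniques, codes = np.unique(labels, return_inverse=True)

print(uniques)
print(codes)

# Equal strings map to equal codes, so comparisons can be done on integers
same_as_first = np.flatnonzero(codes == codes[0])
print(same_as_first)
```

The codes array is a plain integer array, so it can be passed to a numba-compiled function like the other inputs.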


@randompast Bodo uses Numba to provide acceleration for Pandas code, so you wouldn’t need to rewrite your code. It supports strings as well. Also, you can scale across multiple cores for additional speedup. Could you post your original Pandas code without all these rewrites so I can suggest what it would look like in Bodo?