TypingError when using np.sum for 4D array

Hi~ I want to sum a 4D array along the last two axis but a TypingError happens during compiling. This is my code and I wonder how can I modify the code to avoid the error.

import numpy as np
from numba import njit, prange

@njit(['float64[:,:](float64[:,:,:,:],int64[:],int64[:])'],parallel=True)
def sum4d(A,ind_row,ind_col):
    ny = A.shape[0]
    nx = A.shape[1]
    im = np.empty((ny,nx))
    for i in prange(ny):
        for j in range(nx):
            im[i,j] = np.sum(A[i,j,ind_row,ind_col])
    return im

The error message is

TypingError: Invalid use of Function(<built-in function getitem>) with argument(s) of type(s): (array(float64, 4d, A), Tuple(int64, int64, array(int64, 1d, A), array(int64, 1d, A)))

A is the 4D data. ind_row and ind_col are 1D index arrays containing the row and column indices to be summed, respectively. The function I want to achieve is just like the following pure numpy code
im = np.sum(A[:,:,ind_row,ind_col],axis=2)

hi @shz.

The problem is related to the use of advanced indexing. Only one index is supported in Numba (https://numba.pydata.org/numba-doc/dev/reference/numpysupported.html#array-access).

I think I managed to find a workaround for your case (update: the code below only works if you can express your problem as array slicing, without advanced indexing)

import numpy as np
from numba import njit, prange

@njit(['float64[:,:](float64[:,:,:,:],int64[:],int64[:])'], parallel=True)
def sum4d(A,ind_row,ind_col):
    ny = A.shape[0]
    nx = A.shape[1]
    im = np.empty((ny,nx))
    for i in prange(ny):
        for j in range(nx):
            tmp = A[i,:,:,:][j,:,:][ind_row,:][:, ind_col]
            im[i,j] = np.sum(tmp)
    return im

Does this do what you want?

btw, if you upgrade to Numba 0.50 you’ll get better error messages.

update: if you must work with advanced indexing, Numba is limited to only one index (including the integer ones). Since what you need is the sum of those elements, and based on @randompast solution below, I think this might work for you

@njit(['float64[:,:](float64[:,:,:,:],int64[:],int64[:])'], parallel=True)
def sum4d(A,ind_row,ind_col):
    ny = A.shape[0]
    nx = A.shape[1]
    im = np.zeros((ny,nx))
    for i in prange(ny):
        for j in range(nx):
            for k, l in zip(ind_row, ind_col):
                im[i,j] += A[i, j, k, l]
    return im

How about using four for loops? I’m not sure if this is really your calculation, but often doing this sort of thing was a lot faster for me, especially when I used @cuda.jit.

@njit
def sum4d(A):
    #flipped nx and ny
    nx = A.shape[0]
    ny = A.shape[1]
    im = np.zeros((nx,ny))
    for i in range(nx):
        for j in range(ny):
            for k in range(A.shape[2]):
                for l in range(A.shape[3]):
                    im[i,j] += A[i,j,k,l]
    return im

Thanks a lot for your kind reply. This is exactly what I want. What’s more, the second solution is about 10 times faster than the first one.

Thanks a lot. The solution really helps.

Hey, I’m glad it helped, I found that the naive python solution without any numpy was often a lot better and was also easier to use @cuda.jit with for my use case. I was doing convolution which is very similar to this summation operation. I made a repo here comparing various implementations, maybe it’d be useful to you too: https://github.com/randompast/python-convolution-comparisons

OK, thank you for sharing it to me.