Batched Dot Products

Hi @fishbotics

Batched matrix-vector multiplication can always be done the naive way, like so:

```python
import numba as nb
import numpy as np

@nb.njit(fastmath=True, parallel=True)
def nb_multiply(a, b):
    n, m, l = a.shape
    out = np.empty((n, m))
    # Parallelize over the batch dimension (requires parallel=True,
    # otherwise prange silently falls back to a sequential range).
    for i in nb.prange(n):
        for j in range(m):
            # Accumulate the dot product of row (i, j) of a with b.
            val = 0.0
            for k in range(l):
                val += a[i, j, k] * b[k]
            out[i, j] = val
    return out
```

It will almost always perform best, because you have full control over the parallelization and there is no call overhead.
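
As a quick sanity check, the kernel should agree with NumPy's broadcasting matmul. A minimal usage sketch (the shapes here are just illustrative assumptions):

```python
import numpy as np

# Illustrative shapes (assumed): a batch of 8 matrices, each 5x3,
# multiplied by a single length-3 vector.
a = np.random.rand(8, 5, 3)
b = np.random.rand(3)

# np.matmul broadcasts over the leading batch dimension, so a @ b
# has shape (8, 5) and should match the Numba kernel.
np.testing.assert_allclose(nb_multiply(a, b), a @ b)
```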

Here is also something related that you might find interesting: Help needed to re-implement np.matmul for 4D and 5D matrix - #2 by sschaer