Hi @fishbotics
Matrix-vector multiplication can always be done the naive way, like so (here a has shape (n, m, l) and b has shape (l,)):
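import numba as nb
import numpy as np

# With parallel=False, prange behaves like range; set parallel=True to parallelize the outer loop.
@nb.njit(fastmath=True, parallel=False)
def nb_multiply(a, b):
    n, m, l = a.shape
    out = np.empty((n, m))
    for i in nb.prange(n):
        for j in range(m):
            val = 0.0  # float accumulator keeps the type stable
            for k in range(l):
                val += a[i, j, k] * b[k]
            out[i, j] = val
    return out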
It will almost always (maybe even always) perform best, because you have full control over parallelization and there is no overhead.
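If you want to sanity-check a hand-written loop like this, one option is to compare it against NumPy's built-ins. The sketch below is only an illustration: naive_multiply is a hypothetical, undecorated copy of the same loop (so it runs without Numba), and the array shapes are made up. The comparison uses np.allclose rather than exact equality, since fastmath may reorder floating-point operations in the compiled version.

```python
import numpy as np

def naive_multiply(a, b):
    # Same triple loop as in the post, without the Numba decorator (hypothetical name).
    n, m, l = a.shape
    out = np.empty((n, m))
    for i in range(n):
        for j in range(m):
            val = 0.0
            for k in range(l):
                val += a[i, j, k] * b[k]
            out[i, j] = val
    return out

# Small made-up inputs: a is (n, m, l), b is (l,).
rng = np.random.default_rng(0)
a = rng.random((4, 3, 5))
b = rng.random(5)

result = naive_multiply(a, b)

# a @ b treats the 1-D b as a vector and contracts it with the last axis of a.
matches_matmul = np.allclose(result, a @ b)
# The same contraction written explicitly with einsum.
matches_einsum = np.allclose(result, np.einsum("ijk,k->ij", a, b))
print(matches_matmul, matches_einsum)
```

The same check should work on the jitted function by swapping nb_multiply in for naive_multiply.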
Here is something related that you might find interesting: Help needed to re-implement np.matmul for 4D and 5D matrix - #2 by sschaer