Matrix multiplication with pre-allocated output array

I want to perform matrix multiplication with a pre-allocated output array inside a Numba-compiled function (i.e. the equivalent of the out parameter to numpy.matmul). However, it doesn’t seem possible when the output array is in Fortran order.

NumPy has two ways to perform a matrix multiply with an out argument: numpy.dot and numpy.matmul. matmul works with any combination of memory orders, while dot requires the output array to be C-order.

Numba duplicates the functionality of numpy.dot but doesn’t support numpy.matmul, so there seems to be no way to perform a multiplication with a Fortran-order output.

Here’s a gist with a minimal example. If the if statement at the end of the loop is removed, the numba_dot call will fail for any case where order_out is F.

Is there any workaround that I’m missing?

Edit: I wasn’t able to post the gist as a clickable link, so here is the full URL:

https://gist.github.com/joshayers/3db3315684442aa9c22fe7959cafeee3

Edit 2: Added the example code directly instead of relying on the gist link.

import itertools
import numba
import numpy as np
   
    
@numba.jit
def numba_dot(arr1, arr2, out):
    np.dot(arr1, arr2, out=out)


def main():
    orders = itertools.product('CF', 'CF', 'CF')

    for order1, order2, order_out in orders:
        arr1 = np.ones((3, 5), 'f8', order=order1)
        arr2 = np.ones((5, 3), 'f8', order=order2)
        out = np.zeros((3, 3), 'f8', order=order_out)

        # np.matmul works for any memory order
        np.matmul(arr1, arr2, out=out)
        
        # np.dot only works when order_out is 'C'
        try:
            np.dot(arr1, arr2, out=out)
        except ValueError:
            print((order1, order2, order_out))
        
        # numba_dot also only works when order_out is 'C'
        if order_out == 'F':
            continue
        numba_dot(arr1, arr2, out)


if __name__ == '__main__':
    main()
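
On the workaround question, one idea that might be worth trying is the transpose identity (A @ B).T == B.T @ A.T. For a 2-D Fortran-order array, out.T is a C-contiguous view of the same memory, so np.dot can write into it. The sketch below uses plain NumPy only, and dot_into_fortran is a hypothetical helper name, not part of NumPy or Numba. I have not confirmed that Numba's np.dot accepts out.T in the same way; that it treats the transposed view as C-contiguous is an assumption to check.

```python
import numpy as np


def dot_into_fortran(arr1, arr2, out):
    """Write arr1 @ arr2 into a 2-D Fortran-order `out` using np.dot.

    Hypothetical helper. It relies on (A @ B).T == B.T @ A.T: the
    transposed product is written through out.T, which is a
    C-contiguous view sharing memory with `out`.
    """
    if not out.flags.f_contiguous:
        raise ValueError("out must be Fortran-contiguous")
    # out.T is C-contiguous, so it satisfies np.dot's requirement on `out`.
    np.dot(arr2.T, arr1.T, out=out.T)
    return out


# Non-square shapes, to make sure the transposes line up.
rng = np.random.default_rng(0)
arr1 = np.asfortranarray(rng.random((3, 5)))
arr2 = rng.random((5, 4))
out = np.zeros((3, 4), 'f8', order='F')

dot_into_fortran(arr1, arr2, out)
```

If Numba handles the views the same way, the body of numba_dot could become np.dot(arr2.T, arr1.T, out=out.T) for the Fortran-order case, with no copy of the output needed.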