Does Numba support non-constant slicing?

I tried to use Numba to implement array multiplication with a symmetry, e.g., A[i,j] B[j,k,l] = C[i,k,l] where B[j,k,l] = B[j,l,k]. My idea was to use a loop over the last two indices of B and have Numba accelerate it. For the following code

import numpy as np
import time
from numba import njit

# Tensor shapes
i_dim = 20
j_dim = 20
k_dim = 20
l_dim = 20

# Create random tensors
np.random.seed(0)  # Fix the random seed for reproducibility
A = np.random.rand(i_dim, j_dim)

# Create a random tensor B_raw
B_raw = np.random.rand(j_dim, k_dim, l_dim)

# Symmetrize B_raw along the k and l axes
B = np.zeros_like(B_raw)
for k in range(k_dim):
    for l in range(l_dim):
        B[:, k, l] = 0.5 * (B_raw[:, k, l] + B_raw[:, l, k])

# Direct einsum method
start_time = time.time()
C_einsum = np.einsum("ij,jkl->ikl", A, B)
end_time = time.time()
print("Direct einsum method:")
print(f"Time taken: {end_time - start_time:.5f} seconds")

@njit
def symm_einsum(A, B, C):
    i_dim, j_dim = A.shape
    j_dim, k_dim, l_dim = B.shape

    for l in range(l_dim):
        C[:, :l+1, l] = np.einsum("ij,jk->ik", A, B[:, :l+1, l])
        for k in range(l):
            C[:, l, k] = C[:, k, l]

    return C

# Custom loop with symmetry constraint and Numba
C_symm = np.zeros((i_dim, k_dim, l_dim))

start_time = time.time()
C_symm = symm_einsum(A, B, C_symm)
end_time = time.time()
print("\nCustom loop with symmetry constraint and Numba:")
print(f"Time taken: {end_time - start_time:.5f} seconds")

# Check if the results are equal (within a tolerance)
print("\nAre the results equal?", np.allclose(C_einsum, C_symm, rtol=1e-05, atol=1e-08))

# Compare timings
einsum_time = end_time - start_time
symm_time = end_time - start_time
speedup_factor = einsum_time / symm_time
print(f"\nSpeed-up factor: {speedup_factor:.2f}")

I got the following error (commenting out @njit makes it work):

Use of unsupported NumPy function 'numpy.einsum' or unsupported use of the function.

File "slicing.py", line 37:
def symm_einsum(A, B, C):

    for l in range(l_dim):
        C[:, :l+1, l] = np.einsum("ij,jk->ik", A, B[:, :l+1, l])
        ^

ChatGPT told me Numba has limitations in its slicing/array support. Is there any solution within Numba? If not, would it be possible to add this feature?

The slicing doesn't seem to be the issue here. The error you got says pretty clearly that the use of np.einsum is not supported. An overview of what is supported can be found at:
https://numba.pydata.org/numba-doc/dev/reference/numpysupported.html#

I'm no expert on the einsum function, but I think its main strength in Python/NumPy is vectorizing matrix operations, avoiding explicit Python loops and thereby gaining performance.

Fortunately in Numba those explicit loops are no problem (and sometimes actually improve performance), so you should be able to explicitly write out the computation you want to perform.

The concise notation of the einsum function might still be nice-to-have of course.
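
If you do want to keep a single matrix product per slice instead of scalar loops, one option might be np.dot, which Numba does support on 2-D arrays (via SciPy's BLAS bindings, so SciPy needs to be installed). Below is a rough sketch of my own, with a hypothetical name symm_einsum_dot, that swaps np.dot in for the unsupported np.einsum call; note that Numba's np.dot needs contiguous inputs, and B[:, :l+1, l] is a non-contiguous view:

import numpy as np
from numba import njit

@njit
def symm_einsum_dot(A, B, C):
    l_dim = B.shape[2]
    for l in range(l_dim):
        # np.ascontiguousarray copies the strided slice so np.dot accepts it
        C[:, :l+1, l] = np.dot(A, np.ascontiguousarray(B[:, :l+1, l]))
        for k in range(l):
            C[:, l, k] = C[:, k, l]
    return C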

As said, I'm no expert at all, but I think a naive translation would be the following. Instead of using:

C[:, :l+1, l] = np.einsum("ij,jk->ik", A, B[:, :l+1, l])

use:

for i in range(i_dim):
    for j in range(j_dim):
        C[i, :l+1, l] += A[i,j] * B[j, :l+1, l]

That works but doesn't give great performance. Rearranging the order of the loops can help a lot if it improves how memory is accessed, which in turn depends on the arrays' memory layout.
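
For C-ordered (row-major) NumPy arrays, the last index varies fastest in memory, so keeping the last index in the innermost loop gives sequential accesses. A small snippet (illustration only, not from the code above) to inspect the layout:

import numpy as np

B = np.random.rand(20, 20, 20)
print(B.flags['C_CONTIGUOUS'])  # True: the last axis is contiguous in memory
print(B.strides)                # strides in bytes; the smallest stride marks the fastest axis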

This function gives me slightly better performance than using Python + np.einsum for the same calculation (excluding compilation time):

@njit
def symm_einsum(A, B, C):
    i_dim, j_dim = A.shape
    j_dim, k_dim, l_dim = B.shape

    for i in range(i_dim):
        for j in range(j_dim):
            for l in range(l_dim):
                # Accumulate only the upper triangle (k <= l)...
                for k in range(l+1):
                    C[i, k, l] += A[i, j] * B[j, k, l]

                # ...and mirror it into the lower triangle (k < l).
                for k in range(l):
                    C[i, l, k] = C[i, k, l]

    return C

I’m not at all sure if the above is optimal in terms of the access pattern. And of course check the results to make sure the output is still what you expect.

edit:
It's probably not a good idea to use time.time() for a performance benchmark; I would at least use time.perf_counter(). The timeit module (or the %timeit magic in a notebook) is probably even more convenient.

And given how your timing-related code is structured, the speedup_factor will always be 1x, because you use the exact same timings in the comparison:

einsum_time = end_time - start_time
symm_time = end_time - start_time
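
For example, a warm-up call before the measured one excludes the compilation time (a minimal sketch reusing A, B and the dimensions from your script):

import time

C_symm = np.zeros((i_dim, k_dim, l_dim))
symm_einsum(A, B, C_symm)   # warm-up: triggers the JIT compilation

C_symm[:] = 0.0             # reset, since the function accumulates with +=
start = time.perf_counter()
symm_einsum(A, B, C_symm)
end = time.perf_counter()
print(f"Time taken: {end - start:.5f} seconds")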

Thanks for your suggestions. I heard einsum wraps a couple of BLAS routines, which are optimized for matrix/vector multiplications; nested loops are unlikely to outperform BLAS. If Numba does not support einsum, perhaps I need to look at Julia or a C/Fortran wrapper for Python.
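
For what it's worth, Numba can reach BLAS as well: its np.dot on 2-D floating-point arrays is lowered to a BLAS gemm call (SciPy must be installed, since Numba uses SciPy's BLAS bindings). A sketch of the full contraction, ignoring the symmetry, with a hypothetical name blas_einsum:

import numpy as np
from numba import njit

@njit
def blas_einsum(A, B):
    # "ij,jkl->ikl" as one 2-D matmul: flatten the (k, l) axes of B,
    # multiply, and restore the shape. B must be C-contiguous to reshape.
    j_dim, k_dim, l_dim = B.shape
    B2 = B.reshape(j_dim, k_dim * l_dim)
    C2 = np.dot(A, B2)  # dispatched to BLAS gemm by Numba
    return C2.reshape(A.shape[0], k_dim, l_dim)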

If I use the nested-loop symm_einsum and time.perf_counter, I get:

Direct einsum method:
Time taken: 0.00407 seconds

Custom loop with symmetry constraint and Numba:
Time taken: 2.06986 seconds

Custom loop with symmetry constraint and Numba:
Time taken: 0.02790 seconds