Why this assignment of subarray is so slow?

pauljurczak · July 23, 2022, 5:27am

Functions f0 and f1 produce the same result, but differ in method of assignment of subarray:

import numpy as np
import numba as nb
import timeit as ti

@nb.njit(fastmath=True)
def f0(a, i, j):
  a[i, j] = [4, 2]


@nb.njit(fastmath=True)
def f1(a, i, j):
  a[i, j, 0] = 4
  a[i, j, 1] = 2


@nb.njit(fastmath=True)
def g0(a):
  for i in range(a.shape[0]):
    for j in range(a.shape[1]):
      f0(a, i, j)


@nb.njit(fastmath=True)
def g1(a):
  for i in range(a.shape[0]):
    for j in range(a.shape[1]):
      f1(a, i, j)


for i in range(2):
  a = np.zeros((100, 100, 2))
  fun = f'g{i}(a)'
  t = 1000 * np.array(ti.repeat(stmt=fun, setup=fun, globals=globals(), number=1, repeat=100))
  print(f'{fun}:  {np.amin(t):6.3f}ms  {np.median(t):6.3f}ms')

The performance difference is huge:

g0(a):   0.718ms   0.719ms
g1(a):   0.005ms   0.005ms

Any ideas why? Is there a more performant replacement of a[i, j] = [4, 2] other than the tedious method of f1?

I tried a[i, j, :] = [4, 2] and it is almost as slow.

nelson2005 · July 23, 2022, 9:56pm

I’d hazard a guess that there’s some memory allocation going on with f0, and f1 is a direct memory access through a pointer dereference. But I’m just guessing

If you’re an assembly or LLVM expert (I’m not) you could dig a little deeper with inspect_asm or inspect_llvm after the timing loop, like

print(f1.inspect_asm(f1.signatures[0]))

pauljurczak · July 24, 2022, 6:39am

I don’t know enough to make sense of it. Surprisingly, a[i, j] = (4, 2) is as fast or only 2x slower (depending on the context) than the a[i, j] = [4, 2]. That should work for me.

Hannes · July 25, 2022, 8:48am

Hi @pauljurczak

my best guess here is that the intermediate initialisation of a list in f0 is a performance trap.
Lists are inherently more complex than tuples, since mutation is expected by design.

In a tuple, my best guess is that the values or pointers to them are stored as direct neighbours in memory. Here you are even using constant literal entries in the tuple, which may allow for even more compilation time optimisations. For a list the same kind of promises cannot be made and their implementation is likely generally slower than an immutable data structure. (For f1 assigning scalar literal values directly, the same (maybe even more optimal) is true)

Cheers

Topic		Replies	Views
Slow Array Element Assignment to Variable Community Support	3	386	July 23, 2020
Curious performance loss of NumPy function with Numba Numba	11	520	December 26, 2023
Timings for arr[:, i] seem much slower in numba Community Support	5	176	March 14, 2024
Faster for loop - help pls Community Support	7	519	October 17, 2020
Why this function call is faster than inlined version? Numba	3	375	May 15, 2023

Why this assignment of subarray is so slow?

Related Topics