Are the results of this verification using NamedTuple correct?

# numba                  0.62.1
# llvmlite               0.45.1
# numpy                  2.3.4
# OS                     Windows10
# processor              AMD Ryzen 7 PRO 4750GE with Radeon Graphics   3.10 GHz
# RAM                    64.0 GB

from typing import NamedTuple

import numpy as np
from numba import njit, prange
from numpy.typing import NDArray

np.random.seed(42)

########### CASE 1 ###################
a = np.empty(10**6, dtype=np.float64)
b = np.empty(10**6, dtype=np.float64)
c = np.empty(10**6, dtype=np.float64)
d = np.empty(10**6, dtype=np.float64)
e = np.empty(10**6, dtype=np.float64)

@njit(parallel=True)
def f1(a, b, c, d, e):
    for i in prange(a.shape[0]):
        a[i] = a[i] * 100
        b[i] = b[i] * 100
        c[i] = c[i] * 100
        d[i] = d[i] * 100
        e[i] = e[i] * 100
    return a, b, c, d, e

f1(a, b, c, d, e)
%timeit f1(a, b, c, d, e) # 2.77 ms ± 52 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

########### CASE 2 ###################

A = NamedTuple(
    "A",
    [
        ("a", NDArray),
        ("b", NDArray),
        ("c", NDArray),
        ("d", NDArray),
        ("e", NDArray),
    ]
)

@njit(parallel=True)
def f1(a):
    for i in prange(a.a.shape[0]):
        a.a[i] = a.a[i] * 100
        a.b[i] = a.b[i] * 100
        a.c[i] = a.c[i] * 100
        a.d[i] = a.d[i] * 100
        a.e[i] = a.e[i] * 100

    return a

a = A(
    a = np.empty(10**6, dtype=np.float64),
    b = np.empty(10**6, dtype=np.float64),
    c = np.empty(10**6, dtype=np.float64),
    d = np.empty(10**6, dtype=np.float64),
    e = np.empty(10**6, dtype=np.float64),
)
f1(a)
%timeit f1(a) # 115 μs ± 1.12 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

I ran the above code to see how much speed difference there is when using NamedTuple for updating multiple array values.
I expected that using NamedTuple would slow things down, but the results were completely different: NamedTuple was overwhelmingly faster.
Is this verification result correct?

I believe the conclusion that using NamedTuple is overwhelmingly faster is incorrect.

I found it very hard to draw a conclusion from the benchmark code as-is, so I made some changes to tease the factors apart:

  • Initializing the data with np.random.random() instead of operating on arrays created with np.empty(), whose contents are whatever happened to be in that memory beforehand.
  • Using a separate copy of the data for the warmup call - this helps isolate cache behaviour from the measurement.
  • Using a separate copy of the data for each function under test - since the functions update their inputs in place, running both versions on the same data isn't measuring the same computation.
  • Validating that both versions of the benchmark produce the same output.
  • Making parallel execution optional, so it can be removed as a factor.
  • Renaming the A class and the a that is a named tuple - reusing the same name for multiple things makes the code harder to reason about and mistakes easier to make.
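As an aside on the in-place point: %timeit calls the kernel hundreds or thousands of times, and each call multiplies the same arrays by 100 in place, so float64 values quickly overflow to inf. That is why fresh copies and output validation matter for this benchmark. A minimal pure-NumPy sketch of the effect:

```python
import numpy as np

x = np.array([0.5])

# Simulate %timeit invoking an in-place kernel repeatedly:
# each call multiplies by 100, so the values grow without bound
# and overflow float64 (max ~1.8e308) after enough iterations.
with np.errstate(over="ignore"):
    for _ in range(200):
        x *= 100

print(x)  # [inf]
```

This is exactly the kind of state that makes an after-the-fact comparison of the two benchmarked arrays meaningless unless each version got its own copy.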

This resulted in the following code:

from typing import NamedTuple

import numpy as np
from numba import njit, prange
from numpy.typing import NDArray

PARALLEL = False

# Input data

np.random.seed(42)

a = np.random.random(10**6)
b = np.random.random(10**6)
c = np.random.random(10**6)
d = np.random.random(10**6)
e = np.random.random(10**6)

########### CASE 1 ###################

@njit(parallel=PARALLEL)
def f1(a, b, c, d, e):
    for i in prange(a.shape[0]):
        a[i] = a[i] * 100
        b[i] = b[i] * 100
        c[i] = c[i] * 100
        d[i] = d[i] * 100
        e[i] = e[i] * 100
    return a, b, c, d, e

a_warmup = a.copy()
b_warmup = b.copy()
c_warmup = c.copy()
d_warmup = d.copy()
e_warmup = e.copy()

f1(a_warmup, b_warmup, c_warmup, d_warmup, e_warmup)


a_f1 = a.copy()
b_f1 = b.copy()
c_f1 = c.copy()
d_f1 = d.copy()
e_f1 = e.copy()

%timeit f1(a_f1, b_f1, c_f1, d_f1, e_f1) 

########### CASE 2 ###################

NT = NamedTuple(
    "NT",
    [
        ("a", NDArray),
        ("b", NDArray),
        ("c", NDArray),
        ("d", NDArray),
        ("e", NDArray),
    ]
)

@njit(parallel=PARALLEL)
def f2(nt):
    for i in prange(nt.a.shape[0]):
        nt.a[i] = nt.a[i] * 100
        nt.b[i] = nt.b[i] * 100
        nt.c[i] = nt.c[i] * 100
        nt.d[i] = nt.d[i] * 100
        nt.e[i] = nt.e[i] * 100

    return nt

nt_warmup = NT(
    a = a.copy(),
    b = b.copy(),
    c = c.copy(),
    d = d.copy(),
    e = e.copy(),
)

f2(nt_warmup)

nt_f2 = NT(
    a = a.copy(),
    b = b.copy(),
    c = c.copy(),
    d = d.copy(),
    e = e.copy(),
)

%timeit f2(nt_f2)


np.testing.assert_equal(a_f1, nt_f2.a)
np.testing.assert_equal(b_f1, nt_f2.b)
np.testing.assert_equal(c_f1, nt_f2.c)
np.testing.assert_equal(d_f1, nt_f2.d)
np.testing.assert_equal(e_f1, nt_f2.e)

When run as-is, with PARALLEL = False, I get:

$ ipython repro.ipy
1.17 ms ± 3.88 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
1.17 ms ± 2.43 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

So the named tuple makes no difference in performance.

With PARALLEL = True, I get:

$ ipython repro.ipy
580 μs ± 3.79 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
15.3 μs ± 334 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

--> 100 np.testing.assert_equal(a_f1, nt_f2.a)

AssertionError: 
Arrays are not equal

+inf location mismatch:
 ACTUAL: array([inf, inf, inf, ..., inf, inf, inf], shape=(1000000,))
 DESIRED: array([0.37454 , 0.950714, 0.731994, ..., 0.418072, 0.428671, 0.929449],
      shape=(1000000,))

The named tuple version is much faster here, but only because enabling parallel compilation breaks it: the assertion output shows nt_f2.a still holding the original random input, so the parallel named-tuple version apparently never writes to the arrays at all (which would also explain the implausible 15.3 μs timing), while a_f1 has overflowed to inf after being multiplied by 100 on every %timeit iteration.

I think there is a bug here - either this pattern is not supported on the parallel target, in which case the attempt to compile should raise an error, or it is supported and the parallel transformation introduces the error.

I appreciate your thorough analysis.
Upon further testing, I have also observed this unexpected behavior when using @njit(parallel=True).
I have submitted an issue on GitHub under the username 'kuri-menu'.
Since it has been officially confirmed as a bug, I look forward to a fix in an upcoming release.