Are the results of this verification using NamedTuple correct?

# numba                  0.62.1
# llvmlite               0.45.1
# numpy                  2.3.4
# OS                     Windows10
# processor              AMD Ryzen 7 PRO 4750GE with Radeon Graphics   3.10 GHz
# RAM                    64.0 GB

from typing import NamedTuple

import numpy as np
from numba import njit, prange
from numpy.typing import NDArray

np.random.seed(42)

########### CASE 1 ###################
a = np.empty(10**6, dtype=np.float64)
b = np.empty(10**6, dtype=np.float64)
c = np.empty(10**6, dtype=np.float64)
d = np.empty(10**6, dtype=np.float64)
e = np.empty(10**6, dtype=np.float64)

@njit(parallel=True)
def f1(a, b, c, d, e):
    for i in prange(a.shape[0]):
        a[i] = a[i] * 100
        b[i] = b[i] * 100
        c[i] = c[i] * 100
        d[i] = d[i] * 100
        e[i] = e[i] * 100
    return a, b, c, d, e

f1(a, b, c, d, e)
%timeit f1(a, b, c, d, e) # 2.77 ms ± 52 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

########### CASE 2 ###################

A = NamedTuple(
    "A",
    [
        ("a", NDArray),
        ("b", NDArray),
        ("c", NDArray),
        ("d", NDArray),
        ("e", NDArray),
    ]
)

@njit(parallel=True)
def f1(a):
    for i in prange(a.a.shape[0]):
        a.a[i] = a.a[i] * 100
        a.b[i] = a.b[i] * 100
        a.c[i] = a.c[i] * 100
        a.d[i] = a.d[i] * 100
        a.e[i] = a.e[i] * 100

    return a

a = A(
    a = np.empty(10**6, dtype=np.float64),
    b = np.empty(10**6, dtype=np.float64),
    c = np.empty(10**6, dtype=np.float64),
    d = np.empty(10**6, dtype=np.float64),
    e = np.empty(10**6, dtype=np.float64),
)
f1(a)
%timeit f1(a) # 115 μs ± 1.12 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

I ran the above code to see how much speed difference there is when using NamedTuple for updating multiple array values.
I expected that using NamedTuple would slow things down, but the results were completely different: NamedTuple was overwhelmingly faster.
Is this verification result correct?

I believe the conclusion that using NamedTuple is overwhelmingly faster is incorrect.

I found it very hard to draw a conclusion from the benchmark code as-is, so I made some changes to tease the factors apart:

  • Initializing the data with np.random.random() instead of operating on arrays created with np.empty(), whose contents are whatever happened to be in that memory beforehand.
  • Using a separate copy of the data for the warmup call - this helps isolate cache behaviour from the measurement.
  • Using a separate copy of the data for each function under test - since the functions update their inputs in place, running both versions on the same data isn't measuring the same computation.
  • Validating that both versions of the benchmark produce the same output.
  • Making parallel execution optional, so it can be removed as a factor.
  • Renaming the A class and the a that is a named tuple - reusing the same name for multiple things makes the code harder to reason about and mistakes easier to make.
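As an aside on the in-place point: %timeit calls the kernel hundreds or thousands of times, and each call multiplies the same arrays by 100 in place, so float64 values quickly overflow to inf. That is why fresh copies and output validation matter for this benchmark. A minimal pure-NumPy sketch of the effect:

```python
import numpy as np

x = np.array([0.5])

# Simulate %timeit invoking an in-place kernel repeatedly:
# each call multiplies by 100, so the values grow without bound
# and overflow float64 (max ~1.8e308) after enough iterations.
with np.errstate(over="ignore"):
    for _ in range(200):
        x *= 100

print(x)  # [inf]
```

This is exactly the kind of state that makes an after-the-fact comparison of the two benchmarked arrays meaningless unless each version got its own copy.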

This resulted in the following code:

from typing import NamedTuple

import numpy as np
from numba import njit, prange
from numpy.typing import NDArray

PARALLEL = False

# Input data

np.random.seed(42)

a = np.random.random(10**6)
b = np.random.random(10**6)
c = np.random.random(10**6)
d = np.random.random(10**6)
e = np.random.random(10**6)

########### CASE 1 ###################

@njit(parallel=PARALLEL)
def f1(a, b, c, d, e):
    for i in prange(a.shape[0]):
        a[i] = a[i] * 100
        b[i] = b[i] * 100
        c[i] = c[i] * 100
        d[i] = d[i] * 100
        e[i] = e[i] * 100
    return a, b, c, d, e

a_warmup = a.copy()
b_warmup = b.copy()
c_warmup = c.copy()
d_warmup = d.copy()
e_warmup = e.copy()

f1(a_warmup, b_warmup, c_warmup, d_warmup, e_warmup)


a_f1 = a.copy()
b_f1 = b.copy()
c_f1 = c.copy()
d_f1 = d.copy()
e_f1 = e.copy()

%timeit f1(a_f1, b_f1, c_f1, d_f1, e_f1) 

########### CASE 2 ###################

NT = NamedTuple(
    "NT",
    [
        ("a", NDArray),
        ("b", NDArray),
        ("c", NDArray),
        ("d", NDArray),
        ("e", NDArray),
    ]
)

@njit(parallel=PARALLEL)
def f2(nt):
    for i in prange(nt.a.shape[0]):
        nt.a[i] = nt.a[i] * 100
        nt.b[i] = nt.b[i] * 100
        nt.c[i] = nt.c[i] * 100
        nt.d[i] = nt.d[i] * 100
        nt.e[i] = nt.e[i] * 100

    return nt

nt_warmup = NT(
    a = a.copy(),
    b = b.copy(),
    c = c.copy(),
    d = d.copy(),
    e = e.copy(),
)

f2(nt_warmup)

nt_f2 = NT(
    a = a.copy(),
    b = b.copy(),
    c = c.copy(),
    d = d.copy(),
    e = e.copy(),
)

%timeit f2(nt_f2)


np.testing.assert_equal(a_f1, nt_f2.a)
np.testing.assert_equal(b_f1, nt_f2.b)
np.testing.assert_equal(c_f1, nt_f2.c)
np.testing.assert_equal(d_f1, nt_f2.d)
np.testing.assert_equal(e_f1, nt_f2.e)

When run as-is, with PARALLEL = False, I get:

$ ipython repro.ipy
1.17 ms ± 3.88 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
1.17 ms ± 2.43 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

So the named tuple makes no difference in performance.

With PARALLEL = True, I get:

$ ipython repro.ipy
580 μs ± 3.79 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
15.3 μs ± 334 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

--> 100 np.testing.assert_equal(a_f1, nt_f2.a)

AssertionError: 
Arrays are not equal

+inf location mismatch:
 ACTUAL: array([inf, inf, inf, ..., inf, inf, inf], shape=(1000000,))
 DESIRED: array([0.37454 , 0.950714, 0.731994, ..., 0.418072, 0.428671, 0.929449],
      shape=(1000000,))

The named tuple version is much faster here, but only because enabling parallel compilation breaks it: the assertion output shows nt_f2.a still holding the original random input, so the parallel named-tuple version apparently never writes to the arrays at all (which would also explain the implausible 15.3 μs timing), while a_f1 has overflowed to inf after being multiplied by 100 on every %timeit iteration.

I think there is a bug here - either this pattern is not supported on the parallel target, in which case the attempt to compile should raise an error, or it is supported and the parallel transformation introduces the error.

I appreciate your thorough analysis.
Upon further testing, I have also observed this unexpected behavior when using @njit(parallel=True).
I have submitted an issue on GitHub under the username 'kuri-menu'.
Since it has been officially confirmed as a bug, I look forward to a fix in an upcoming release.