I have noticed that there is a significant performance difference when indexing a numpy array with np.uint64 vs np.int64. Take the following simple, contrived example:

import time
import numpy as np
from numba import njit

@njit
def some_func(a, idx):
    for _ in range(100_000_000_000):
        a[idx] = 10.0

a = np.random.rand(5)
a_copy = a.copy()
idx = np.array([5], dtype=np.int64)  # int64
uidx = np.array([5], dtype=np.uint64)  # uint64

lst = []
for _ in range(6):
    a_copy[:] = a
    tic = time.time()
    some_func(a_copy, idx)
    toc = time.time()
    lst.append(toc - tic)
print(np.mean(lst[1:]))

lst = []
for _ in range(6):
    a_copy[:] = a
    tic = time.time()
    some_func(a_copy, uidx)
    toc = time.time()
    lst.append(toc - tic)
print(np.mean(lst[1:]))

In the np.int64 case, it takes an average time of 43s while, in the np.uint64 case, it takes an average time of 35s. So, the np.uint64 case was around 19% faster!

This was extremely surprising as I would've expected a negligible difference using np.int64 to index into the numpy array.

My question is: is this normal/expected/known behavior?

This is known and expected behavior. Think about it: if an index could be negative, you have to check whether it is negative and, if so, do the index wraparound calculation. This adds time, and even if all the indices are positive, the presence of the checking code also prohibits vectorization. This is why parallel=True uses unsigned indices where they are known to be positive. People occasionally have problems when they add a signed number to an index and then use the result to index an array, which won't work because signed and unsigned unify as float.
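To make the two points above concrete, here is a rough sketch in plain Python and NumPy. It is not Numba's actual generated code: normalize_index is a hypothetical helper that only illustrates the wraparound logic a signed index requires, and the last line shows NumPy's promotion rule for mixing int64 and uint64, which matches the unification described in the reply.

```python
import numpy as np


def normalize_index(i, n):
    """Map a possibly negative index onto 0..n-1 (hypothetical helper)."""
    if i < 0:  # this branch is the extra work a signed index type requires
        i += n
    return i


print(normalize_index(-1, 5))  # 4
print(normalize_index(3, 5))   # 3

# No 64-bit integer type covers both the int64 and uint64 ranges,
# so NumPy promotes the pair to a float.
print(np.result_type(np.int64, np.uint64))  # float64
```

With an unsigned index type the branch is unnecessary, because the value can never be negative.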

@DrTodd13 Thank you so much for enlightening me! I learned something new. Would you happen to know what type i has in for i in range(10)? Is i a uint64 or an int64? I plan to add i to an element of another numpy array of dtype=np.uint64 and then use the resulting sum to index into another numpy array:

idx = np.array([1, 4, 9, 16, 25, 36, 49, 64, 81, 100], dtype=np.uint64)
a = np.random.rand(200)
for i in range(10):
    j = idx[i] + i
    a[j] = 5

Is there some way to ensure that i is also np.uint64?
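One possible approach, sketched here as an assumption rather than a confirmed recommendation from the thread, is to cast the loop variable explicitly with np.uint64(i) before adding it, so both operands are unsigned and the sum stays an integer. The function name fill_offsets is made up for illustration, and the try/except fallback is only there so the sketch also runs without Numba installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs as plain Python without Numba.
    def njit(func):
        return func


@njit
def fill_offsets(a, idx):
    # `i` comes from range(), so it is a signed integer. Cast it explicitly
    # so that uint64 + uint64 stays uint64 instead of unifying as float.
    for i in range(idx.shape[0]):
        j = idx[i] + np.uint64(i)
        a[j] = 5.0


idx = np.array([1, 4, 9, 16, 25, 36, 49, 64, 81, 100], dtype=np.uint64)
a = np.random.rand(200)
fill_offsets(a, idx)
```

Whether the cast is cheaper than the negative-index check is not established here, so it would need to be timed on the real workload.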

Thanks @seanlaw for bringing it up and @DrTodd13 for the explanation! I have a lot of code that uses indexing, and I now wonder if I can squeeze out some more performance using numba.uint64(i).

I am also wondering whether the type of range with 1 argument shouldn't be uint64, since it will never be negative. The 3-argument version can be negative, but I don't think the 1-argument version can be.

Thank you @DrTodd13. If I have many integers, I wonder if casting with numba.uint64(i) would create the same/similar cost as needing to check for negative integers before using the value as an array index?