Hi,
I am trying to use numba
to parallelize an operation on 2d array, where each row can be handled separately. However, it seems prange
causes wrong result. I took a look at some related topic on Numba’s github page and here, but I couldn’t find my answer (or, maybe there exist some related questions but I didn’t understand them!!!)
Below, I am providing a simple test so that it can be reproduced.
# python= 3.8.5, # numpy= 1.20.3, # numba 0.54.1
def naive_func(PA, PB, IA, IB):
"""
Merge two 2-dim arrays PA and PB, each with shape (n, m).
The values in each row is sorted in ascending ordrer.
This function update PA[i] by finding the top-m smallest values
in both PA[i] and PB[i], while prioritizing the values of PA[i].
Example:
For a row in PA and its corresponding row in PB:
Merging [0, 1, 2, 3, 4] and [0', 3', 5, 7, 10] is: [0, 0', 1, 2, 3]
(Note: the sign prime is used to just distinguish between the two zeros and two threes in two arrays
PA and PB)
Then we update IA based on the rearrangement that occured on
the merging process of PA and PB.
Parameters
----------
PA : numpy.ndarray
a 2-dim array, with shape (n, m), each row sorted in ascending order
PB : numpy.ndarray
a 2-dim array, with shape (n, m), each row sorted in ascending order
IA : numpy.ndarray
a 2-dim array, with shape (n, m)
IB : numpy.ndarray
a 2-dim array, with shape (n, m)
Returns
-------
None
"""
n, m = PA.shape
profile = np.column_stack((PA, PB))
indices = np.column_stack((IA, IB))
idx = np.argsort(profile, axis=1)
profile = np.take_along_axis(profile, idx, axis=1)
indices = np.take_along_axis(indices, idx, axis=1)
PA[:, :] = profile[:, : m]
IA[:, :] = profile[:, : m]
################
@njit(parallel=True)
def performant_func(PA, PB, IA, IB):
"""
n, m = PA.shape
for i in prange(n): # Updating rows in parallel mode
for j in range(m): # for each row, I think this inner loop is executed serially.
if PB[i, j] < PA[i, -1]:
idx = np.searchsorted(PA[i], PB[i, j], side="right")
PA[i, idx + 1 :] = PA[i, idx:-1]
PA[i, idx] = PB[i, j]
IA[i, idx + 1 :] = IA[i, idx:-1]
IA[i, idx] = IB[i, j]
################
def test_func():
PA = np.array(
[[0, 3, 3, 3, 4],
[0, 1, 2, 3, 4]]
)
IA = np.array(
[[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]]
)
PB = np.array(
[[0, 1, 3, 3, 4],
[0, 0, 1, 4, 4]]
)
IB = np.array(
[[250, 251, 252, 253, 254],
[255, 256, 257, 258, 259]]
)
ref_P = PA.copy()
ref_I = IA.copy()
comp_P = PA.copy()
comp_I = IA.copy()
naive_func(ref_P, PB, ref_I, IB)
performant_func(comp_P, PB, comp_I, IB)
ref = np.column_stack((ref_P, ref_I))
comp = np.column_stack((comp_P, comp_I))
npt.assert_array_equal(ref, comp)
test_func()
>>>
AssertionError:
Arrays are not equal
Mismatched elements: 9 / 20 (45%)
Max absolute difference: 256
Max relative difference: 2.
x: array([[0, 0, 1, 3, 3, 0, 0, 1, 3, 3],
[0, 0, 0, 1, 1, 0, 0, 0, 1, 1]])
y: array([[ 0, 0, 1, 3, 3, 0, 250, 251, 1, 2],
[ 0, 0, 0, 1, 1, 5, 255, 256, 6, 7]])
One solution is to add .copy()
operation when updating slice of PA and IA, as follows:
PA[i, idx + 1 :] = PA[i, idx:-1].copy()
###
IA[i, idx + 1 :] = IA[i, idx:-1].copy()
###
However, theoretically speaking, there should be no problem with `prange` even without use of `.copy()`.
Please let me know if further information is required.