I am trying to decouple filtering from the cumulative calculations that I have to perform over large arrays.
As there are different filtering strategies and also a variety of calculations that can be made, decoupling is important to simplify testing and allow composability.
I don’t want to generate an intermediate filtered array, as these can be very large. I would like to filter and calculate on the fly, but I have not been able to do this in a way that performs well.
This is a very simplified example:
import numba as nb

@nb.njit
def myacum(elements1, elements2, el1_max, el2_max):
    # Filtering and accumulation are coupled in a single loop.
    out = 0
    for el1, el2 in zip(elements1, elements2):
        if el1 < el1_max and el2 < el2_max:
            out += el1 * el2
    return out
which I then rewrote using generators:
def filter_by_max(elements1, elements2, el1_max, el2_max):
    # Returns a jitted generator function that closes over the inputs.
    @nb.njit
    def func():
        for el1, el2 in zip(elements1, elements2):
            if el1 < el1_max and el2 < el2_max:
                yield el1, el2
    return func

@nb.njit
def myacum2(it):
    # Accumulates over whatever pairs the generator function yields.
    out = 0
    for el1, el2 in it():
        out += el1 * el2
    return out
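To make the intended equivalence concrete, here is a minimal pure-Python sketch (no Numba, plain lists standing in for arrays) of the coupled and decoupled formulations. The names `myacum_ref`, `filter_by_max_ref`, and `accumulate_ref` are hypothetical and only serve as a correctness reference; the sketch says nothing about performance.

```python
def myacum_ref(elements1, elements2, el1_max, el2_max):
    """Coupled reference: filter and accumulate in one loop."""
    out = 0.0
    for el1, el2 in zip(elements1, elements2):
        if el1 < el1_max and el2 < el2_max:
            out += el1 * el2
    return out


def filter_by_max_ref(elements1, elements2, el1_max, el2_max):
    """Lazily yield only the pairs that pass both thresholds."""
    for el1, el2 in zip(elements1, elements2):
        if el1 < el1_max and el2 < el2_max:
            yield el1, el2


def accumulate_ref(pairs):
    """Sum of products over any iterable of (el1, el2) pairs."""
    out = 0.0
    for el1, el2 in pairs:
        out += el1 * el2
    return out


# Small illustrative inputs (assumed values, not real data).
a = [1.0, 2.0, 3.0, 4.0]
b = [4.0, 3.0, 2.0, 1.0]

coupled = myacum_ref(a, b, 4.0, 4.0)
decoupled = accumulate_ref(filter_by_max_ref(a, b, 4.0, 4.0))

# No intermediate filtered list is built, and both results agree.
assert coupled == decoupled
```

Any jitted variant of the filter or the accumulator could be checked against these references on small inputs.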
Is there a robust and performant way to achieve this decoupling?