Approach to parallel loops

Hello, I have an njit’ed function that processes one daily file at a time, and I would like to know the best way to run it in parallel over multiple days. The candidates I am considering are: (1) Parallel from Joblib, (2) Numba's own parallel mode with prange, or (3) Dask. The code looks like this:

import numpy as np
import pandas as pd
from numba import njit

@njit(cache=True)
def run_calc_one_day(col_1: np.ndarray, col_2: np.ndarray):
   """ loop logic"""
   return stats

def run_days(dates: list[str]):
   stat_container = []
   for day in dates:
      df = pd.read_parquet(f"date={day}")
      # pass NumPy arrays; Numba cannot type a pandas Series
      stats = run_calc_one_day(df["col_1"].to_numpy(), df["col_2"].to_numpy())
      stat_container.append({day: stats})
   return stat_container

run_days(["2024-01-01", "2024-01-02"])  # ...a few hundred days
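For reference, the day-level pattern I have in mind for options (1) and (3) is roughly the sketch below: one task per day, with the results gathered into a dict keyed by date. To keep it self-contained it uses the standard library's ThreadPoolExecutor as a stand-in rather than Joblib or Dask, and `load_day`, `calc_one_day`, `process_day` and `run_days_parallel` are hypothetical placeholders (the first two stand for the parquet read and the njit’ed `run_calc_one_day`), not my real code.

```python
from concurrent.futures import ThreadPoolExecutor


def load_day(day: str) -> tuple[list[float], list[float]]:
    """Hypothetical stand-in for reading one day's parquet file."""
    n = int(day[-2:]) + 3  # vary the length per day so results differ
    return [float(i) for i in range(n)], [2.0 * i for i in range(n)]


def calc_one_day(col_1: list[float], col_2: list[float]) -> float:
    """Hypothetical stand-in for the njit'ed per-day kernel."""
    return sum(a * b for a, b in zip(col_1, col_2))


def process_day(day: str) -> tuple[str, float]:
    """Load one day and compute its statistic; this is the unit of parallel work."""
    col_1, col_2 = load_day(day)
    return day, calc_one_day(col_1, col_2)


def run_days_parallel(dates: list[str], max_workers: int = 4) -> dict[str, float]:
    """Run process_day for every date on a thread pool and collect the results."""
    # pool.map yields results in the same order as `dates`
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(process_day, dates))


if __name__ == "__main__":
    print(run_days_parallel(["2024-01-01", "2024-01-02"]))
```

My understanding is that with threads the njit’ed function would also need nogil=True to actually run concurrently; otherwise a process-based pool would be needed, so please correct me if that is wrong.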

Many thanks!
