Hello
I am running into a performance issue with the _numba_unpickle function: it is taking a very long time. I have pinned the problem down to this function:
import numpy as np
from numba import njit

@njit
def kim_statistics_nb(arr, tau):
    nom = ((1 - tau) * len(arr)) ** (-2)
    denom = (tau * len(arr)) ** (-2)
    break_point = int(tau * len(arr))
    # pre-break segment: fit a linear trend, then take residuals
    x0 = np.arange(1, break_point + 1, 1, dtype=np.float64)
    a = np.vstack((x0, np.ones(len(x0), dtype=np.float64))).T
    m0, c0 = np.linalg.lstsq(a, arr[:break_point])[0]
    res0 = arr[:break_point] - (m0 * x0 + c0)
    # post-break segment: same fit on the remainder of the array
    x1 = np.arange(break_point + 1, len(arr), 1, dtype=np.float64)
    a = np.vstack((x1, np.ones(len(x1), dtype=np.float64))).T
    m1, c1 = np.linalg.lstsq(a, arr[break_point + 1:])[0]
    res1 = arr[break_point + 1:] - (m1 * x1 + c1)
    cusum0_sq = np.cumsum(res0) ** 2
    cusum1_sq = np.cumsum(res1) ** 2
    return (nom * np.sum(cusum1_sq)) / (denom * np.sum(cusum0_sq))
I call this function many times from another njit function; it is basically a rolling calculation of the statistic above. The arrays passed into it are of size 40, and the overall array I am rolling over contains roughly 100k elements.
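For context, the outer loop has roughly this shape (a minimal sketch, not my real code: the window size, the stand-in statistic, and the function names here are illustrative, and the numba import falls back to plain Python so the sketch runs anywhere):

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    def njit(f):
        # plain-Python fallback so the sketch also runs without numba installed
        return f

@njit
def stat_nb(arr, tau):
    # stand-in for kim_statistics_nb: any scalar statistic of one window
    break_point = int(tau * len(arr))
    return arr[:break_point].sum() / (arr[break_point:].sum() + 1e-12)

@njit
def rolling_stat_nb(series, window, tau):
    # slide a fixed-size window over the series, one statistic per position
    out = np.empty(len(series) - window + 1)
    for i in range(len(out)):
        out[i] = stat_nb(series[i:i + window], tau)
    return out

series = np.random.default_rng(0).standard_normal(1000)
vals = rolling_stat_nb(series, 40, 0.5)
```

In my actual code both the inner statistic and the rolling driver are decorated with @njit, and the driver runs over the full ~100k-element array.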
Any ideas?
Cheers