Moving/Rolling linear regression

I am trying to create a moving linear regression and I wanted to utilize numba. However, I am struggling with the latter part as I lack the relevant experience.

Here’s what I have so far using pure NumPy. It works, but without numba it is quite slow once you throw large arrays at it.

import numpy as np

def ols_1d(y, window):
    # View y as overlapping windows of length `window` (no copy).
    y_roll = np.lib.stride_tricks.sliding_window_view(y, window_shape=window)

    m = list()
    c = list()
    for row in np.arange(y_roll.shape[0]):
        # Design matrix: x = 1..window plus a constant column.
        A = np.vstack([np.arange(1, window + 1), np.ones(window)]).T
        tmp_m, tmp_c = np.linalg.lstsq(A, y_roll[row], rcond=None)[0]
        m.append(tmp_m)
        c.append(tmp_c)

    m, c = np.array([m, c])

    # Pad the first window - 1 positions with NaN and evaluate the
    # fitted line at the last point of each window (x = window).
    return np.hstack((np.full((window - 1), np.nan), m * window + c))

def ols_2d(y, window):
    # Apply the 1-D version column by column.
    out = list()
    for col in range(y.shape[1]):
        out.append(ols_1d(y=y[:, col], window=window))
    return np.array(out).T

if __name__ == "__main__":
    a = np.random.randn(
        10000, 10
    )  # function is slow once you really increase the number of columns

    print(ols_2d(a, 10))

I need those functions to become as fast as possible. I am working with arrays that are potentially quite large (10,000 by 1,000,000).

Any help in applying numba is highly appreciated.

I think because this code likely spends most of its time inside compiled NumPy functions, there may be only a limited speedup from using Numba here. That said, I have a couple of follow-up questions to try and understand what we can do:

When I run your code, I get:

$ python repro.py 
[[        nan         nan         nan ...         nan         nan
          nan]
 [        nan         nan         nan ...         nan         nan
          nan]
 [        nan         nan         nan ...         nan         nan
          nan]
 ...
 [ 0.01638578 -0.08973367 -0.24285798 ...  0.81616557  0.60287684
   0.27096057]
 [ 0.61282818  0.68256055 -0.76067196 ...  0.93851723 -0.23058742
   0.05134814]
 [ 0.89330888  1.02812995 -0.5556358  ...  0.82519863 -0.46228677
  -1.22771044]]

Is this what you’d expect to see? I’m not sure it will be easy to experiment with a Numba version and be confident I haven’t modified its operation, given that it produces nan values in the output.

Can you post the code you’ve tried using Numba too, and how you’re measuring performance?
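For reference, a minimal timing harness along these lines would make the numbers comparable (just a sketch, assuming your script is saved as repro.py so ols_2d can be imported):

import time

import numpy as np

from repro import ols_2d  # assumes the opening-post script is saved as repro.py

a = np.random.randn(10000, 10)

start = time.perf_counter()
result = ols_2d(a, window=10)
print(f"ols_2d took {time.perf_counter() - start:.3f} s")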

Many thanks in advance!

Hi @andi,

Some thoughts from a former transportation econometrician.

Remember that in OLS the coefficients (Beta) are calculated as follows in matrix form (Greene, 66-90):

Beta = inv(X^T * X) * (X^T * Y)

Where X is the matrix of m columns (explanatory variables) and n rows (observations):

x_0,0    x_0,1    ...    x_0,m
x_1,0    x_1,1    ...    x_1,m
  ...      ...    ...      ...
x_n,0    x_n,1    ...    x_n,m

And Y is the vector of the dependent variable with n rows (observations).

Thus Beta is equal to:

Beta =
inv(
  [
    sum_i_n(x_i,0^2)          sum_i_n(x_i,0 * x_i,1)    ...
    sum_i_n(x_i,1 * x_i,0)    sum_i_n(x_i,1^2)          ...
    ...                       ...                       sum_i_n(x_i,m^2)
  ]
) *
[ sum_i_n(x_i,0 * y_i),    sum_i_n(x_i,1 * y_i),    ...,    sum_i_n(x_i,m * y_i) ]
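As a quick illustration (a sketch, not part of your code), both routes give the same coefficients for the two-column X used in your ols_1d:

import numpy as np

window = 10
rng = np.random.default_rng(0)
y = rng.standard_normal(window)

# Two-column design matrix as in ols_1d: x = 1..window plus a constant.
X = np.vstack([np.arange(1, window + 1), np.ones(window)]).T

beta_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]
beta_normal = np.linalg.inv(X.T @ X) @ (X.T @ y)
print(np.allclose(beta_lstsq, beta_normal))  # True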

Thus when running regressions over a moving time window using off-the-shelf calls, you incur recalculations. For example, say you fit over the windows [i=0, n=10] and [i=1, n=11]. If you call numpy.linalg.lstsq for each window, you will have computed x_[1:10],0^2 twice. You will also have computed x_[1:10],1 * x_[1:10],0 twice, and so on.

You can avoid this recalculation by writing your own OLS implementation that carries the precomputed sums from one window to the next, as sketched below.
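For the simple-regression setup in the opening post (x fixed at 1..window), a minimal sketch of that idea might look like the following. Only the two y-dependent sums are carried from one window to the next, in O(1) per step; the function name rolling_ols_1d and the update formulas are mine, worked out from the sums above:

import numpy as np

def rolling_ols_1d(y, window):
    # Rolling simple OLS of y against x = 1..window, reusing the
    # window sums instead of refitting from scratch at each step.
    # Returns the fitted value at the last point of each window,
    # matching ols_1d from the opening post. Illustrative sketch.
    n = y.size
    w = window
    out = np.full(n, np.nan)

    # Sums over the fixed regressor x = 1..w never change.
    s_x = w * (w + 1) / 2.0
    s_xx = w * (w + 1) * (2 * w + 1) / 6.0
    denom = w * s_xx - s_x ** 2

    # y-dependent sums for the first full window.
    s_y = y[:w].sum()
    s_xy = (np.arange(1, w + 1) * y[:w]).sum()

    for i in range(w - 1, n):
        slope = (w * s_xy - s_x * s_y) / denom
        intercept = (s_y - slope * s_x) / w
        out[i] = slope * w + intercept  # fitted value at x = w

        if i + 1 < n:
            # Slide the window one step: every weight drops by one,
            # so s_xy loses s_y and gains w * y[i + 1].
            s_xy += w * y[i + 1] - s_y
            s_y += y[i + 1] - y[i - w + 1]

    return out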

References
Greene, William H. “The Linear Regression Model: Least Squares.” In Econometric Analysis, 66-90. Pearson, 2012.

Hope this helps!

BR,
Ryan

If all you’re after is the result of simple linear regression, you can gain a lot of performance by skipping the linear algebra needed for the general case.

Implementing this using guvectorize helps a little extra when compiling for the parallel target, but in my opinion the main benefit of guvectorize is the standard ufunc behaviour you get for free, such as the axis keyword to apply it along any axis of an n-D array. This lets you implement only the 1-D case, and it will work for any n-D input, removing the need for “wrappers” like your ols_2d function.

import numpy as np
import numba

@numba.guvectorize(
    [
        "void(float32[:], uint64, float32[:])",
        "void(float64[:], uint64, float64[:])",
    ],
    "(n),()->(n)",
    nopython=True,
    target="parallel",
)
def moving_ols_guvec(y, window, out):

    n = y.size

    # Force init with NaN, for clarity: the first window - 1
    # positions have no full window and stay NaN.
    out[:] = np.nan

    # The regressor x = 1..window is the same for every window,
    # so its mean and squared deviations are computed only once.
    x = np.arange(1, window + 1)
    x_mean = x.mean()
    x_diff = x - x_mean
    denom = (x_diff ** 2).sum()

    for i in numba.prange(window - 1, n):

        y_win = y[i - window + 1 : i + 1]
        y_mean = y_win.mean()

        # Closed-form simple-regression coefficients for this window.
        slope = (x_diff * (y_win - y_mean)).sum() / denom
        intercept = y_mean - slope * x_mean

        # Fitted value at the last point of the window, matching
        # m * window + c in the opening post.
        out[i] = slope * window + intercept

window_size = 10
numba_result = moving_ols_guvec(a, window_size, axis=0)  # `a` as in the opening post
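To check that it matches the original implementation, including the leading nan rows, comparing against ols_2d from the opening post should print True:

numpy_result = ols_2d(a, window_size)
print(np.allclose(numba_result, numpy_result, equal_nan=True))  # True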

You could probably gain a little extra performance by using something like Welford’s algorithm to update the window sums incrementally, instead of slicing the y input as in my example above. But I’m not sure that would be worth it: it’s more code and probably less readable for others.

Using the snippet above I get a little over a 300x speedup compared to your example in the opening post. This was with an input shape of (5000, 100) and a window of 25 on a relatively old i3-7100 CPU (2 cores, 4 threads). Your mileage may vary depending on those settings.

Regards,
Rutger