# Moving/Rolling linear regression

I am trying to create a moving linear regression and I wanted to utilize `numba`. However, I am struggling with the `numba` part as I lack the relevant experience.

Here’s what I have so far using pure `numpy`. It works, but without `numba` it is quite slow once you throw large arrays at it.

```python
import numpy as np

def ols_1d(y, window):
    # view of all overlapping windows: shape (len(y) - window + 1, window)
    y_roll = np.lib.stride_tricks.sliding_window_view(y, window_shape=window)

    m = list()
    c = list()
    for row in np.arange(y_roll.shape[0]):
        A = np.vstack([np.arange(1, window + 1), np.ones(window)]).T
        tmp_m, tmp_c = np.linalg.lstsq(A, y_roll[row], rcond=None)[0]
        m.append(tmp_m)
        c.append(tmp_c)

    m, c = np.array([m, c])

    # pad the warm-up period with NaN; return the fitted value at the window end
    return np.hstack((np.full((window - 1), np.nan), m * window + c))

def ols_2d(y, window):
    out = list()
    for col in range(y.shape[1]):
        out.append(ols_1d(y=y[:, col], window=window))
    return np.array(out).T

if __name__ == "__main__":
    a = np.random.randn(
        10000, 10
    )  # function is slow once you really increase the number of columns

    print(ols_2d(a, 10))
```

I need those functions to become as fast as possible. I am working with arrays that are potentially quite large (10,000 by 1,000,000).

Any help in applying `numba` is highly appreciated.

I think that because this code might spend a lot of its time in compiled NumPy functions, the speedup from Numba may be limited. That said, I have a couple of follow-on questions to try and understand what we can do here:

When I run your code, I get:

```
$ python repro.py
[[        nan         nan         nan ...         nan         nan
nan]
[        nan         nan         nan ...         nan         nan
nan]
[        nan         nan         nan ...         nan         nan
nan]
...
[ 0.01638578 -0.08973367 -0.24285798 ...  0.81616557  0.60287684
0.27096057]
[ 0.61282818  0.68256055 -0.76067196 ...  0.93851723 -0.23058742
0.05134814]
[ 0.89330888  1.02812995 -0.5556358  ...  0.82519863 -0.46228677
-1.22771044]]
```

Is this what you’d expect to see? I want to be sure that when experimenting with the code under Numba I can tell I haven’t changed its behaviour, given that it produces `nan` values in the output.

Can you post the code you’ve tried using Numba too, and how you’re measuring performance?
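For reference, here is a minimal way I’d time the NumPy baseline myself (assuming `ols_2d` from your post is in scope, and using a modest input size so it finishes quickly):

```python
import timeit

import numpy as np

# time the pure-NumPy baseline; best of 5 single runs
a = np.random.randn(10000, 10)
runs = timeit.repeat(lambda: ols_2d(a, 10), number=1, repeat=5)
print(f"best of 5: {min(runs):.3f} s")
```

For a Numba version it’s worth calling the function once before timing, so that the one-off compilation cost isn’t mixed into the measurement.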

Hi @andi,

Some thoughts from a former transportation econometrician.

Remember that in OLS the coefficients (Beta) are calculated as follows in matrix form (Greene, 66-90):

```
Beta = inv(X^T * X) * (X^T * Y)
```

Where X is the matrix with m columns (explanatory variables) and n rows (observations):

```
x_0,0    x_0,1    ...    x_0,m
x_1,0    x_1,1    ...    x_1,m
...      ...      ...    ...
x_n,0    x_n,1    ...    x_n,m
```

And Y is the vector of the dependent variable with n rows (observations).

Thus Beta is equal to:

```
Beta =
inv(
[
sum_i_n(x_i,0^2)          sum_i_n(x_i,0 * x_i,1)    ...
sum_i_n(x_i,1 * x_i,0)    sum_i_n(x_i,1^2)          ...
...                       ...                       sum_i_n(x_i,m^2)
]
) *
[
sum_i_n(x_i,0 * y_i),    sum_i_n(x_i,1 * y_i),    ...,    sum_i_n(x_i,m * y_i)
]
```
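In NumPy terms the two routes agree; here is a quick sanity check with a small made-up X and y (illustrative only):

```python
import numpy as np

# normal equations vs. lstsq on a toy problem: one regressor plus intercept
rng = np.random.default_rng(0)
X = np.column_stack([np.arange(1, 11), np.ones(10)])
y = rng.standard_normal(10)

beta_normal = np.linalg.inv(X.T @ X) @ (X.T @ y)  # Beta = inv(X'X) * (X'y)
beta_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(beta_normal, beta_lstsq))  # True
```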

Thus, when running regressions on a moving time window using basic calls, you incur recalculations. For example, say you are calculating over the time periods [i=0, n=10] and [i=1, n=11]. If you call `numpy.linalg.lstsq` for each, you will have computed `x_[1:10],0^2` twice. You will also have computed `x_[1:10],1 * x_[1:10],0` twice, and so on.

You can avoid this recalculation by writing your own OLS implementation that saves the precomputed values.
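For the moving regression in the opening post the saving is even bigger: x is 1..window for every window, so the whole X^T X matrix (and its inverse) can be computed once. A minimal sketch of that idea for the 1-D case (`ols_1d_precomputed` is just an illustrative name, not from the thread):

```python
import numpy as np

def ols_1d_precomputed(y, window):
    # x = 1..window is the same for every window, so the Gram matrix
    # X^T X never changes: invert it once, outside any loop
    x = np.arange(1, window + 1)
    X = np.vstack([x, np.ones(window)]).T          # shape (window, 2)
    XtX_inv = np.linalg.inv(X.T @ X)               # 2x2, computed once

    y_roll = np.lib.stride_tricks.sliding_window_view(y, window_shape=window)
    Xty = y_roll @ X                               # X^T y for every window at once
    m, c = (Xty @ XtX_inv).T                       # Beta = inv(X^T X) @ (X^T y)

    return np.hstack((np.full(window - 1, np.nan), m * window + c))
```

This should match `ols_1d` up to floating-point noise, while performing a single 2x2 inverse in total.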

References
Greene, William H. “The Linear Regression Model: Least Squares.” In *Econometric Analysis*, 66-90. Pearson, 2012.

Hope this helps!

BR,
Ryan

If all you’re after is the result of simple linear regression, you can gain a lot of performance by skipping the linear algebra needed for the general case.

Implementing this using `guvectorize` helps a little extra when compiling for the `parallel` target, but in my opinion the main benefit of `guvectorize` is the standard ufunc features you get for free, like the `axis` keyword to apply it along any axis of an n-D array. This allows you to implement only the 1-D case, and it will work for any n-D input, removing the need for “wrappers” like your `ols_2d` function.

```python
import numpy as np
import numba

@numba.guvectorize(
    [
        "void(float32[:], uint64, float32[:])",
        "void(float64[:], uint64, float64[:])",
    ],
    "(n),()->(n)",
    nopython=True,
    target="parallel",
)
def moving_ols_guvec(y, window, out):

    n = y.size

    # force init with NaN, for clarity
    out[:] = np.nan

    x = np.arange(1, window + 1)
    x_mean = x.mean()
    x_diff = x - x_mean
    denom = (x_diff ** 2).sum()

    for i in numba.prange(window - 1, n):

        y_win = y[i - window + 1 : i + 1]
        y_mean = y_win.mean()

        slope = (x_diff * (y_win - y_mean)).sum() / denom
        intercept = y_mean - slope * x_mean

        out[i] = slope * window + intercept

a = np.random.randn(10000, 10)  # example input, as in the opening post

window_size = 10
numba_result = moving_ols_guvec(a, window_size, axis=0)
```
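To address the earlier point about the `nan` rows: a quick way to confirm the result matches the original, assuming `ols_2d` from the opening post is also defined, is to compare with NaNs treated as equal:

```python
# compare against the pure-NumPy version; NaN == NaN counts as a match here
expected = ols_2d(a, window_size)
print(np.allclose(numba_result, expected, equal_nan=True))  # expect: True
```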

You could probably gain a little extra performance by using something like Welford’s algorithm to integrate over the window yourself, instead of slicing the y-input as in my example above. But I’m not sure that would be worth it; it’s more code and probably less readable for others. A rough sketch of the idea follows.
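For the curious, here is what that could look like in plain Python. This is not Welford’s algorithm proper (which maintains running means and variances with compensated updates); it is the simpler rolling-sum update the same idea points at, maintaining `sum(y)` and `sum(k * y)` in O(1) per step. The name `moving_ols_rolling` is just illustrative:

```python
import numpy as np

def moving_ols_rolling(y, window):
    # maintain s1 = sum(y_win) and sx = sum(k * y_win) for k = 1..window,
    # updating both in O(1) per step instead of re-slicing the window
    n = y.size
    x_mean = (window + 1) / 2.0
    denom = np.sum((np.arange(1, window + 1) - x_mean) ** 2)

    out = np.full(n, np.nan)

    # initialise both sums over the first full window
    s1 = y[:window].sum()
    sx = np.sum(np.arange(1, window + 1) * y[:window])

    for i in range(window - 1, n):
        if i > window - 1:
            # sliding shifts every weight down by one, so the weighted sum
            # update must use the OLD plain sum before s1 itself is updated
            sx = sx - s1 + window * y[i]
            s1 = s1 - y[i - window] + y[i]
        slope = (sx - x_mean * s1) / denom
        intercept = s1 / window - slope * x_mean
        out[i] = slope * window + intercept
    return out
```

One caveat: running sums accumulate floating-point error over very long inputs, which is exactly where Welford-style compensated updates would earn their keep.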

Using the snippet above I get a little over a 300x speedup compared to your example in the opening post. This was with an input shape of (5000, 100) and a window of 25 on a relatively old i3-7100 cpu (2 cores, 4 threads). Your mileage may vary depending on those settings.

Regards,
Rutger