skills/timeseries/rolling-refit-arima-forecast/SKILL.md
Walk-forward validation for ARIMA by refitting on history at each step, forecasting one step ahead, then appending the true observation — produces an honest one-step error distribution that mirrors nightly-retrained production forecasters
npx skillsauth add wenmin-wu/ds-skills timeseries-rolling-refit-arima-forecast
A single fit followed by forecast(steps=h) compounds error over the horizon and hides the cost of refitting. A rolling one-step loop (refit on the current history, predict the next point, append the true value) gives you the error distribution a nightly-retrained production model would actually see. The distinction matters in two directions. Multi-step ARIMA looks much worse than it really is, because its errors accumulate across the horizon. A static-fit model looks much better than it really is, because it never sees drift. The rolling refit is the honest middle ground.
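The first half of that contrast can be made concrete with a minimal sketch. To run without statsmodels, it swaps ARIMA for a hypothetical least-squares AR(1) fit (fit_ar1, built on numpy.polyfit). It compares two approaches on the same test window:

- one static fit, iterated across the whole test horizon;
- a rolling refit that predicts one step and then appends the true value.

The series is synthetic (phi = 0.8), and all helper names are illustrative, not part of any library.

```python
import numpy as np


def fit_ar1(x):
    """Least-squares AR(1) fit, x[t] = c + phi * x[t-1]; a stand-in for an ARIMA refit."""
    x = np.asarray(x, dtype=float)
    phi, c = np.polyfit(x[:-1], x[1:], 1)  # polyfit returns [slope, intercept]
    return float(c), float(phi)


def static_multi_step(train, h):
    """Fit once, then iterate the recursion h steps, feeding predictions back in."""
    c, phi = fit_ar1(train)
    preds, last = [], float(train[-1])
    for _ in range(h):
        last = c + phi * last
        preds.append(last)
    return np.array(preds)


def rolling_one_step(train, test):
    """Refit on history at every step, predict one step, then append the TRUE value."""
    history = list(train)
    preds = []
    for y in test:
        c, phi = fit_ar1(history)
        preds.append(c + phi * history[-1])
        history.append(float(y))
    return np.array(preds)


# Synthetic stationary AR(1) series (phi = 0.8), for illustration only.
rng = np.random.default_rng(0)
x = np.zeros(600)
for t in range(1, len(x)):
    x[t] = 0.8 * x[t - 1] + rng.normal()
train, test = x[:200], x[200:]

multi_mae = np.mean(np.abs(test - static_multi_step(train, len(test))))
rolling_mae = np.mean(np.abs(test - rolling_one_step(train, test)))
print(f"static multi-step MAE: {multi_mae:.3f}")
print(f"rolling one-step MAE:  {rolling_mae:.3f}")
```

On a stationary AR(1), the static multi-step forecast decays toward the fitted mean. Its errors therefore approach the series' unconditional spread, while the rolling one-step errors stay near the innovation noise.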
from statsmodels.tsa.arima.model import ARIMA
import pandas as pd
history = list(train)  # train, test: pd.Series split of the observed series
predictions = []
for t in range(len(test)):
    # default state-space MLE; method='css' was only accepted by the legacy arima_model.ARIMA
    fit = ARIMA(history, order=(5, 0, 5)).fit()
    yhat = float(fit.forecast(steps=1)[0])
    predictions.append(yhat)
    history.append(test.iloc[t])  # append TRUTH, not prediction
pred = pd.Series(predictions, index=test.index)
abs_err = (test - pred).abs()
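Before plotting, the absolute-error series from the loop can be summarized with a few quantiles. This is a minimal sketch, assuming test and pred are pandas Series that share the same index. error_summary is a hypothetical helper, and the tiny hand-made inputs only show the call shape.

```python
import pandas as pd


def error_summary(test: pd.Series, pred: pd.Series) -> pd.Series:
    """Summarize the one-step absolute-error distribution of a walk-forward run."""
    # pandas aligns on index labels, so pred must carry test.index; mismatched
    # labels yield NaN errors, which the summary statistics below silently skip.
    abs_err = (test - pred).abs()
    return pd.Series(
        {
            "mae": abs_err.mean(),
            "p50": abs_err.quantile(0.50),
            "p90": abs_err.quantile(0.90),
            "max": abs_err.max(),
        }
    )


idx = pd.date_range("2024-01-01", periods=4, freq="D")
test = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)
pred = pd.Series([1.5, 2.0, 2.0, 5.0], index=idx)
print(error_summary(test, pred))
```

The upper quantiles and the max are usually more informative than the MAE alone, since one-step errors from a refitted model tend to be heavy-tailed around regime changes.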
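1. Copy train into a mutable list, history.
2. At each test step, refit ARIMA on history, forecast one step, and record the forecast.
3. Append the true test value to history before the next iteration.
4. Wrap the predictions in a Series indexed by test.index; plot observed vs forecast and the absolute-error distribution.

- Fit method: method='css' (conditional sum of squares) was an option of the legacy statsmodels.tsa.arima_model.ARIMA. The current statsmodels.tsa.arima.model.ARIMA does not accept it and defaults to state-space MLE. To keep per-step refits cheap, warm-start each fit with the previous step's parameters (start_params). If (p,q) are large, raise maxiter, passed through method_kwargs.
- Order selection: choose (p,q) once with arma_order_select_ic, using BIC over a small grid on the train split, then freeze it. Re-selecting at every step wastes compute and rarely changes the order.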