skills/timeseries/kaggle-api-streaming-inference/SKILL.md
Predict day-by-day via Kaggle's iter_test API while maintaining a rolling history buffer for computing lag features online
npx skillsauth add wenmin-wu/ds-skills timeseries-kaggle-api-streaming-inference
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Kaggle time-series competitions often use a streaming API where test data arrives one day at a time. The model must predict using only data available up to that point. This requires maintaining a rolling history buffer, computing lag features on the fly, and appending each day's predictions back to the buffer for use as future lag inputs.
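import pandas as pd
from datetime import timedelta

# Assumed to be defined earlier: `competition` (the competition's API module),
# `train_df`, `TARGETS` (list of target column names), and a fitted `model`.
N_LAGS = 20
# Lag columns in merge order; this is assumed to match the training feature order.
lag_cols = [f"{t}_{lag}" for lag in range(1, N_LAGS + 1) for t in TARGETS]

history = train_df[["entity_id", "date"] + TARGETS].copy()
history["date"] = pd.to_datetime(history["date"])  # datetime dtype so the lag lookups match

env = competition.make_env()
for test_df, sample_sub in env.iter_test():
    eval_date = pd.to_datetime(test_df["date"].iloc[0])

    # Compute lag features from history
    lag_features = []
    for lag in range(1, N_LAGS + 1):
        lag_date = eval_date - timedelta(days=lag)
        lag_vals = history[history["date"] == lag_date][["entity_id"] + TARGETS]
        lag_vals = lag_vals.rename(columns={t: f"{t}_{lag}" for t in TARGETS})
        lag_features.append(lag_vals)

    features = test_df[["entity_id"]].copy()
    for lf in lag_features:
        features = features.merge(lf, on="entity_id", how="left")
    features = features.fillna(0)  # entities or days missing from history get 0

    preds = model.predict(features[lag_cols])
    sample_sub[TARGETS] = preds

    # Update history with today's predictions
    new_row = test_df[["entity_id", "date"]].copy()
    new_row["date"] = eval_date  # keep the same datetime dtype as history
    new_row[TARGETS] = preds
    history = pd.concat([history, new_row], ignore_index=True)

    env.predict(sample_sub)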
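The per-lag filter-and-merge loop above can also be written as a single pivot over the lag window. The following is a minimal sketch, not the skill's own implementation: the helper name `build_lag_features` and its parameters are hypothetical, and it assumes `history["date"]` is already a datetime column with at most one row per entity per day.

```python
import pandas as pd


def build_lag_features(history, entity_ids, eval_date, targets, n_lags=20):
    """Hypothetical helper: one row per entity with columns `<target>_<lag>`."""
    eval_date = pd.Timestamp(eval_date)
    # Keep only the days that can serve as lags 1..n_lags.
    window = history[
        (history["date"] < eval_date)
        & (history["date"] >= eval_date - pd.Timedelta(days=n_lags))
    ].copy()

    if window.empty:
        # No usable history yet: start from an empty frame and let reindex fill it.
        wide = pd.DataFrame(index=pd.Index([], name="entity_id"))
    else:
        window["lag"] = (eval_date - window["date"]).dt.days
        # Wide form: index = entity, columns = (target, lag).
        wide = window.pivot_table(
            index="entity_id", columns="lag", values=targets, aggfunc="last"
        )
        wide.columns = [f"{t}_{lag}" for t, lag in wide.columns]

    # Enforce a fixed, ordered schema even when some lags or entities are missing.
    lag_cols = [f"{t}_{lag}" for lag in range(1, n_lags + 1) for t in targets]
    wide = wide.reindex(index=pd.Index(entity_ids, name="entity_id"), columns=lag_cols)
    return wide.fillna(0).rename_axis("entity_id").reset_index()
```

Inside the loop this would be called as `build_lag_features(history, test_df["entity_id"], eval_date, TARGETS)`. The final `reindex` is what keeps the column set identical on every day, including the first test days when the buffer holds fewer than `n_lags` dates.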
For each batch yielded by iter_test():
a. Extract the evaluation date
b. Look up lag values from the history buffer
c. Construct the feature vector matching the training schema
d. Predict and submit
e. Append predictions to history for future lag computation
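Steps a through e can be dry-run offline before submitting. The sketch below uses a hypothetical `MockEnv` class (not part of any Kaggle package) that yields one day of toy data at a time, and a naive "model" that repeats yesterday's value. The check that `predict()` is called before the next batch is an assumption about how the real environment behaves.

```python
import pandas as pd


class MockEnv:
    """Hypothetical stand-in for a competition environment, for local testing only."""

    def __init__(self, test_df, targets):
        self.test_df = test_df
        self.targets = targets
        self.submissions = []
        self._awaiting_predict = False

    def iter_test(self):
        # Yield one (test_df, sample_sub) pair per date, in chronological order.
        ordered = self.test_df.sort_values(["date", "entity_id"])
        for _, day in ordered.groupby("date", sort=True):
            day = day.reset_index(drop=True)
            sample_sub = day[["entity_id", "date"]].copy()
            for t in self.targets:
                sample_sub[t] = 0.0
            self._awaiting_predict = True
            yield day, sample_sub
            if self._awaiting_predict:
                raise RuntimeError("call predict() before requesting the next batch")

    def predict(self, sample_sub):
        self.submissions.append(sample_sub.copy())
        self._awaiting_predict = False


# Toy data: two entities, three training days, two test days.
TARGETS = ["y"]
train_df = pd.DataFrame({
    "entity_id": [1, 2] * 3,
    "date": pd.to_datetime(["2021-01-01"] * 2 + ["2021-01-02"] * 2 + ["2021-01-03"] * 2),
    "y": [1.0, 10.0, 2.0, 20.0, 3.0, 30.0],
})
test_all = pd.DataFrame({
    "entity_id": [1, 2] * 2,
    "date": pd.to_datetime(["2021-01-04"] * 2 + ["2021-01-05"] * 2),
})

history = train_df[["entity_id", "date"] + TARGETS].copy()
env = MockEnv(test_all, TARGETS)
for test_df, sample_sub in env.iter_test():
    eval_date = pd.to_datetime(test_df["date"].iloc[0])                      # a
    lag1 = history[history["date"] == eval_date - pd.Timedelta(days=1)]      # b
    features = test_df[["entity_id"]].merge(                                 # c
        lag1[["entity_id"] + TARGETS], on="entity_id", how="left"
    ).fillna(0)
    preds = features[TARGETS].to_numpy()  # d: naive model, repeat yesterday's value
    sample_sub[TARGETS] = preds
    env.predict(sample_sub)
    new_rows = test_df[["entity_id", "date"]].copy()                         # e
    new_rows[TARGETS] = preds
    history = pd.concat([history, new_rows], ignore_index=True)
```

Because the second test day reads its lag-1 values from the rows appended on the first test day, a run like this quickly reveals whether predictions are actually being fed back into the buffer.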