skills/tabular/streaming-prediction-api/SKILL.md
Online inference pattern that processes test batches sequentially, updating feature dictionaries incrementally for time-series prediction APIs.
npx skillsauth add wenmin-wu/ds-skills tabular-streaming-prediction-api
Some competitions (and production systems) serve test data in sequential batches: you predict one batch, then receive the next batch together with the ground truth for the previous one. Keep in-memory feature dictionaries (for example, per-user running sums and counts) and update them after each batch, so that features for a batch are built only from state that existed before it. This pattern handles the Kaggle time-series API and carries over to any online-learning or streaming-prediction scenario.
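Per-user state can stay this small because a running sum and count reproduce the full-history mean exactly, without storing past events. Below is a minimal check in plain Python. The batches of `(user_id, outcome)` pairs are made up for illustration, and the `history` dict exists only to verify the incremental result.

```python
from collections import defaultdict

# Made-up batches of (user_id, outcome) pairs arriving in order
batches = [
    [(1, 1), (2, 0), (1, 0)],
    [(2, 1), (3, 1)],
    [(1, 1), (3, 0), (2, 1)],
]

user_sum = defaultdict(float)   # O(1) state per user
user_count = defaultdict(int)
history = defaultdict(list)     # full history, kept only for the comparison

for batch in batches:
    for uid, outcome in batch:
        user_sum[uid] += outcome
        user_count[uid] += 1
        history[uid].append(outcome)

running_mean = {u: user_sum[u] / user_count[u] for u in user_count}
full_mean = {u: sum(h) / len(h) for u, h in history.items()}
print(running_mean)
```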
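```python
from collections import defaultdict

import numpy as np
import pandas as pd


class StreamingPredictor:
    def __init__(self, model, features, content_mean=None, global_mean=0.5):
        self.model = model
        self.features = features
        self.global_mean = global_mean
        self.user_sum = defaultdict(float)   # running sum of outcomes per user
        self.user_count = defaultdict(int)   # number of observed outcomes per user
        # Per-content mean outcome, precomputed from training (dict or Series)
        self.content_mean = content_mean if content_mean is not None else {}

    def predict_batch(self, batch_df: pd.DataFrame) -> np.ndarray:
        """Build features from the current state and predict one batch.

        State is only read here, so a batch never sees its own outcomes.
        """
        batch_df = batch_df.copy()  # don't mutate the frame the API handed us
        counts = batch_df['user_id'].map(lambda u: self.user_count.get(u, 0))
        sums = batch_df['user_id'].map(lambda u: self.user_sum.get(u, 0.0))
        batch_df['user_count'] = counts
        # Users with no history fall back to the global mean
        batch_df['user_mean'] = np.where(
            counts > 0, sums / counts.clip(lower=1), self.global_mean
        )
        # Unseen content ids would map to NaN, so fill them with the global mean
        batch_df['content_mean'] = (
            batch_df['content_id'].map(self.content_mean).fillna(self.global_mean)
        )
        return self.model.predict(batch_df[self.features])

    def update(self, batch_df: pd.DataFrame, outcomes) -> None:
        """Fold observed outcomes for a batch into the running statistics."""
        for uid, outcome in zip(batch_df['user_id'], outcomes):
            self.user_sum[uid] += outcome
            self.user_count[uid] += 1


# Usage with the Kaggle API. `env`, `model`, `feature_cols` and `content_mean`
# come from the competition environment and your training code.
# `get_prior_outcomes` is a hypothetical, competition-specific helper that
# extracts the previous batch's true labels from the current batch.
predictor = StreamingPredictor(model, feature_cols, content_mean=content_mean)
prev_df = None
for test_df, sample_pred_df in env.iter_test():
    if prev_df is not None:
        # Update with true outcomes, not predictions: feeding preds back in
        # makes the features echo the model's own guesses.
        predictor.update(prev_df, get_prior_outcomes(test_df))
    preds = predictor.predict_batch(test_df)
    env.predict(sample_pred_df.assign(answered_correctly=preds))
    prev_df = test_df
```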
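To see the predict-then-update ordering without a live competition environment, here is a self-contained simulation in plain Python. It is a sketch under stated assumptions: `mock_stream` is a hypothetical stand-in for `env.iter_test()` that reveals each batch's labels one step late, the "model" is simply the user's running mean with a 0.5 prior, and the batches are made up.

```python
from collections import defaultdict


class RunningMeanStore:
    """Per-user running sum/count; the 'model' is just the user's mean."""

    def __init__(self, global_mean=0.5):
        self.global_mean = global_mean
        self.user_sum = defaultdict(float)
        self.user_count = defaultdict(int)

    def predict(self, user_ids):
        # Unseen users fall back to the global mean (prior)
        return [
            self.user_sum[u] / self.user_count[u] if self.user_count[u] else self.global_mean
            for u in user_ids
        ]

    def update(self, user_ids, outcomes):
        for u, y in zip(user_ids, outcomes):
            self.user_sum[u] += y
            self.user_count[u] += 1


def mock_stream(batches):
    """Hypothetical stand-in for env.iter_test(): yields the current batch's
    user ids plus the labels of the previous batch (None on the first step)."""
    prev_labels = None
    for batch in batches:
        yield [u for u, _ in batch], prev_labels
        prev_labels = [y for _, y in batch]


# Made-up (user_id, outcome) batches
batches = [
    [(1, 1), (2, 0)],
    [(1, 0), (2, 0), (3, 1)],
    [(1, 1), (3, 1)],
]

store = RunningMeanStore()
all_preds = []
prev_ids = None
for user_ids, prev_labels in mock_stream(batches):
    if prev_ids is not None:
        store.update(prev_ids, prev_labels)    # truth for the previous batch
    all_preds.append(store.predict(user_ids))  # reads only earlier outcomes
    prev_ids = user_ids

print(all_preds)
```

Because the update for batch t runs only after batch t+1 arrives, every prediction depends solely on outcomes from earlier batches, which mirrors the constraint the live API imposes. The labels of the final batch are never consumed, just as in a real submission loop.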