wenmin-wu/timeseries-kaggle-api-streaming-inference

Name: timeseries-kaggle-api-streaming-inference
Author: wenmin-wu

skills/timeseries/kaggle-api-streaming-inference/SKILL.md

npx skillsauth add wenmin-wu/ds-skills timeseries-kaggle-api-streaming-inference

Kaggle API Streaming Inference

Overview

Kaggle time-series competitions often use a streaming API where test data arrives one day at a time. The model must predict using only data available up to that point. This requires maintaining a rolling history buffer, computing lag features on the fly, and appending each day's predictions back to the buffer for use as future lag inputs.
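To make the streaming protocol concrete, here is a minimal sketch of an offline stand-in for the competition environment. `MockEnv` is a hypothetical class, not Kaggle's implementation: it assumes the test data has `entity_id` and `date` columns, yields one batch per date, and requires `predict()` to be called before the next batch is served.

```python
import pandas as pd


class MockEnv:
    """Hypothetical offline stand-in for a Kaggle streaming environment."""

    def __init__(self, test_df, targets):
        self._test = test_df
        self._targets = list(targets)
        self._awaiting_predict = False
        self.submissions = []  # every frame passed to predict(), in order

    def iter_test(self):
        # Yield one (test_df, sample_sub) pair per date, in chronological order.
        for _, day in self._test.groupby("date", sort=True):
            if self._awaiting_predict:
                raise RuntimeError("call predict() before requesting the next day")
            day = day.reset_index(drop=True)
            sample_sub = day[["entity_id", "date"]].copy()
            for t in self._targets:
                sample_sub[t] = 0.0  # placeholder values to be overwritten
            self._awaiting_predict = True
            yield day, sample_sub

    def predict(self, sub):
        # Record the submission and unlock the next batch.
        self.submissions.append(sub.copy())
        self._awaiting_predict = False
```

Swapping a stand-in like this for `competition.make_env()` lets the inference loop be exercised on held-out training days before submitting.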

Quick Start

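import pandas as pd
from datetime import timedelta

# Assumed to exist already: train_df, TARGETS (list of target column names),
# model (fitted on the lag columns below), and competition (the
# competition-specific Kaggle module that provides make_env()).
N_LAGS = 20
# Lag column names, in the order produced by the merges below.
# This order is assumed to match the one used at training time.
lag_cols = [f"{t}_{lag}" for lag in range(1, N_LAGS + 1) for t in TARGETS]

history = train_df[["entity_id", "date"] + TARGETS].copy()
# Store dates as datetimes so they compare equal to the Timestamp lag dates
history["date"] = pd.to_datetime(history["date"])

env = competition.make_env()
for test_df, sample_sub in env.iter_test():
    eval_date = pd.to_datetime(test_df["date"].iloc[0])

    # Compute lag features from history
    lag_features = []
    for lag in range(1, N_LAGS + 1):
        lag_date = eval_date - timedelta(days=lag)
        lag_vals = history[history["date"] == lag_date][["entity_id"] + TARGETS]
        lag_vals = lag_vals.rename(columns={t: f"{t}_{lag}" for t in TARGETS})
        lag_features.append(lag_vals)

    # One row per test entity; entities with no history on a lag date get NaN
    features = test_df[["entity_id"]].copy()
    for lf in lag_features:
        features = features.merge(lf, on="entity_id", how="left")
    features = features.fillna(0)

    preds = model.predict(features[lag_cols])
    sample_sub[TARGETS] = preds

    # Update history with today's predictions (dates kept as datetimes)
    new_row = test_df[["entity_id", "date"]].copy()
    new_row["date"] = pd.to_datetime(new_row["date"])
    new_row[TARGETS] = preds
    history = pd.concat([history, new_row], ignore_index=True)

    env.predict(sample_sub)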
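The loop above filters the history once per lag and merges each result separately. As an alternative sketch, the same lag columns can be built in one pass by computing each row's day offset from the evaluation date and pivoting. `build_lag_features` is a hypothetical helper; it assumes `history["date"]` already holds datetimes and that each (entity_id, date) pair appears at most once, since `pivot` raises on duplicates.

```python
import pandas as pd


def build_lag_features(history, entity_ids, eval_date, targets, n_lags=20):
    """Return one row per entity with columns f"{target}_{lag}" for lag 1..n_lags."""
    eval_date = pd.Timestamp(eval_date)
    # Days between the evaluation date and each history row.
    offset = (eval_date - history["date"]).dt.days
    window = history.assign(lag=offset)
    window = window[window["lag"].between(1, n_lags)].copy()
    window["lag"] = window["lag"].astype(int)

    # Long -> wide: rows are entities, columns are (target, lag) pairs.
    wide = window.pivot(index="entity_id", columns="lag", values=list(targets))
    wide.columns = [f"{t}_{lag}" for t, lag in wide.columns]

    # Guarantee every expected column and entity is present, in a fixed order.
    lag_cols = [f"{t}_{lag}" for lag in range(1, n_lags + 1) for t in targets]
    wide = wide.reindex(index=pd.Index(entity_ids, name="entity_id"), columns=lag_cols)
    return wide.fillna(0).reset_index()
```

Inside the loop, a call such as `build_lag_features(history, test_df["entity_id"], eval_date, TARGETS)` would stand in for the per-lag filtering and merging; missing lags are filled with 0 as in the Quick Start.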

Workflow

1. Initialize the history buffer with all available training data.
2. For each test day received from iter_test():
   a. Extract the evaluation date.
   b. Look up lag values from the history buffer.
   c. Construct the feature vector matching the training schema.
   d. Predict and submit.
   e. Append predictions to history for future lag computation.
3. Keep only the last N days in the buffer to limit memory usage.
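The pruning step can be sketched as a small helper called at the end of each iteration. `prune_history` and its `keep_days` parameter are hypothetical names; the sketch assumes `history["date"]` holds datetimes and that `keep_days` is at least as large as the longest lag the model uses.

```python
import pandas as pd


def prune_history(history, current_date, keep_days=30):
    """Drop rows dated more than keep_days before current_date."""
    cutoff = pd.Timestamp(current_date) - pd.Timedelta(days=keep_days)
    return history[history["date"] >= cutoff].reset_index(drop=True)
```

For example, `history = prune_history(history, eval_date, keep_days=30)` after appending the day's predictions keeps the buffer bounded no matter how many test days are streamed.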

Key Decisions

History pruning: keep only the last 30-60 days; the buffer must cover at least the longest lag (20 days in the Quick Start), and older rows are never queried.
Missing lags: fill with 0 or with the entity-level median from the training data.
Prediction feedback: feeding model predictions back in as future lags compounds errors; consider blending them with the entity median.
Deduplication: if a day is resubmitted, deduplicate history by (entity_id, date), keeping the latest entry.
Performance: precompute entity medians and yearly stats before the loop to avoid repeated groupby operations
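Several of these decisions can be sketched as small helpers. All function names below are hypothetical, and the blend weight of 0.8 is an arbitrary illustration rather than a recommended value; the sketch assumes lag columns are named `f"{target}_{lag}"` as in the Quick Start.

```python
import numpy as np
import pandas as pd


def entity_medians(train_df, targets):
    """Per-entity median of each target, computed once before the loop."""
    return train_df.groupby("entity_id")[list(targets)].median()


def fill_missing_lags(features, medians, targets, n_lags=20):
    """Replace NaN lag values with the entity's training median, then with 0."""
    out = features.copy()
    for t in targets:
        fallback = out["entity_id"].map(medians[t])  # NaN for unseen entities
        for lag in range(1, n_lags + 1):
            col = f"{t}_{lag}"
            out[col] = out[col].fillna(fallback).fillna(0)
    return out


def blend_with_median(preds, entity_ids, medians, targets, weight=0.8):
    """Shrink predictions toward the entity median before storing them as lags."""
    preds = np.asarray(preds, dtype=float)
    med = medians.reindex(list(entity_ids))[list(targets)].to_numpy()
    med = np.where(np.isnan(med), preds, med)  # unseen entity: keep the prediction
    return weight * preds + (1 - weight) * med


def dedupe_history(history):
    """Keep the most recently appended row for each (entity_id, date)."""
    deduped = history.drop_duplicates(["entity_id", "date"], keep="last")
    return deduped.reset_index(drop=True)
```

One possible design is to submit the raw predictions but append the blended values to the history, so that only the lag inputs are damped; whether that helps depends on the competition and should be validated offline.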

References

Getting Started with MLB Player Digital Engagement
[Fork of] LightGBM + CatBoost + ANN 2505f2

Predict day-by-day via Kaggle's iter_test API while maintaining a rolling history buffer for computing lag features online

31 stars

development

Updated Apr 23, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add wenmin-wu/ds-skills timeseries-kaggle-api-streaming-inference

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy: container and dependency vulnerability scanner. Clean (95%)
Semgrep: static code analysis for vulnerabilities. Clean (95%)
mcp-scan (Snyk): Model Context Protocol security validation. Clean (95%)
Snyk (dep): open source security scanning. Skipped (50%)
Socket.dev: supply chain security analysis. Skipped (50%)
VirusTotal: multi-engine malware detection. Skipped (50%)
CrowdStrike: advanced threat intelligence. Skipped (50%)
OSV-Scanner: Open Source Vulnerability database check. Skipped (50%)
OWASP Dep-Check. Skipped (50%)

Last scanned: Apr 23, 2026, 7:31 AM (138.7s, 1 file scanned)

Related Skills

wenmin-wu/timeseries-scaled-pinball-loss (data-ai): Scaled Pinball Loss (SPL) metric for evaluating quantile forecasts, normalized by mean absolute successive differences of training data. Updated Apr 23, 2026.

wenmin-wu/timeseries-retroactive-outlier-rescaling (data-ai): Walk backward through a time series and multiplicatively rescale segments when jumps exceed a fraction of the running mean to correct data collection anomalies. Updated Apr 23, 2026.

wenmin-wu/timeseries-ratio-target-for-smape (testing): Transform forecasting target to next/current ratio minus one so that optimizing MAE or squared error implicitly minimizes SMAPE. Updated Apr 23, 2026.

wenmin-wu/timeseries-quantile-ratio-scaling (tools): Convert point forecasts to prediction intervals by scaling with logit-transformed quantile ratios passed through a Normal CDF. Updated Apr 23, 2026.

Install manually

# Clone the repo
git clone https://github.com/wenmin-wu/ds-skills.git

# Copy into the Claude Code skills folder (global), creating it if needed
mkdir -p ~/.claude/skills
cp -r ds-skills/skills/timeseries/kaggle-api-streaming-inference ~/.claude/skills/

Repository

wenmin-wu/ds-skills

31 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT