skills/tabular/multi-seed-fold-averaging/SKILL.md
Trains multiple models per CV fold with different random seeds for augmentation, then averages their predictions to reduce variance from stochastic data generation.
npx skillsauth add wenmin-wu/ds-skills tabular-multi-seed-fold-averaging
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
When training involves stochastic data augmentation (column shuffling, random oversampling), a single run per fold is noisy: different seeds yield different OOF scores. Train N models per fold, each with a different augmentation seed, and average their predictions before computing the fold's validation score. This smooths out augmentation variance without increasing the number of folds, at the cost of roughly N times the training time per fold.
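To see why averaging helps, here is a toy simulation rather than a real training run. It assumes each seed's prediction equals a fixed signal plus independent Gaussian noise; that independence is an idealization, and all names and constants below are illustrative. Under this assumption the mean squared error of the N-seed average should be close to 1/N of the single-seed error.

```python
import numpy as np

rng = np.random.default_rng(0)
n_val, n_seeds, n_trials = 200, 5, 2000

# Stand-in for the "ideal" prediction on one validation fold.
signal = rng.uniform(0.0, 1.0, size=n_val)

# Assumption: every seed adds independent noise with the same spread.
noise = rng.normal(0.0, 0.1, size=(n_trials, n_seeds, n_val))
preds = signal + noise  # shape: (trials, seeds, validation rows)

# Error of one seed's prediction versus the seed-averaged prediction.
single_err = np.mean((preds[:, 0, :] - signal) ** 2)
avg_err = np.mean((preds.mean(axis=1) - signal) ** 2)

print(f"single-seed MSE:        {single_err:.5f}")
print(f"{n_seeds}-seed average MSE: {avg_err:.5f}")
print(f"ratio: {single_err / avg_err:.2f}")
```

In real training the seed models share the same underlying data, so their errors are correlated and the reduction is typically smaller than 1/N.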
```python
import numpy as np
import lightgbm as lgb

# Assumes X (DataFrame), y (Series), kfold, params, and augment() are defined elsewhere.
N_SEEDS = 5
oof = np.zeros(len(X))

for fold, (trn_idx, val_idx) in enumerate(kfold.split(X, y)):
    X_val, y_val = X.iloc[val_idx], y.iloc[val_idx]
    p_valid = np.zeros(len(val_idx))

    for seed in range(N_SEEDS):
        # Reseed so each model sees a different augmented training set.
        np.random.seed(seed)
        X_aug, y_aug = augment(X.iloc[trn_idx].values, y.iloc[trn_idx].values)

        trn_data = lgb.Dataset(X_aug, label=y_aug)
        val_data = lgb.Dataset(X_val, label=y_val)
        model = lgb.train(params, trn_data, valid_sets=[val_data],
                          callbacks=[lgb.early_stopping(500)])

        # Accumulate the mean of the seed models' predictions.
        p_valid += model.predict(X_val) / N_SEEDS

    # Store the seed-averaged predictions as this fold's OOF predictions.
    oof[val_idx] = p_valid
```
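The loop calls `augment`, which the source does not define. As a minimal sketch, assuming binary 0/1 labels and that "random oversampling" means resampling positive rows until the classes are balanced, a compatible helper might look like the following. The function body is hypothetical and not the author's implementation. It draws from NumPy's global random state so that the `np.random.seed(seed)` call in the loop controls it.

```python
import numpy as np

def augment(X, y):
    """Hypothetical augmentation: oversample positives, then shuffle rows.

    Uses NumPy's global RNG, so np.random.seed(seed) makes it reproducible.
    """
    pos_idx = np.flatnonzero(y == 1)
    neg_idx = np.flatnonzero(y == 0)

    # Number of extra positive rows needed to match the negative count.
    n_extra = max(len(neg_idx) - len(pos_idx), 0)
    if n_extra and len(pos_idx):
        extra = np.random.choice(pos_idx, size=n_extra, replace=True)
    else:
        extra = np.array([], dtype=np.intp)

    # Shuffle so resampled rows are not grouped at the end.
    idx = np.random.permutation(np.concatenate([neg_idx, pos_idx, extra]))
    return X[idx], y[idx]

# Tiny demonstration: 2 positives and 8 negatives become 8 and 8.
X_demo = np.arange(20, dtype=float).reshape(10, 2)
y_demo = np.array([1, 1, 0, 0, 0, 0, 0, 0, 0, 0])
np.random.seed(0)
X_aug, y_aug = augment(X_demo, y_demo)
print(X_aug.shape, int(y_aug.sum()))  # (16, 2) 8
```

Because the helper depends on the global seed, each of the N models in a fold trains on a different resampled set, which is the source of the variance that the averaging removes.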