skills/cv/kfold-model-averaging/SKILL.md
Average predictions from K independently trained fold models at inference time for variance reduction without stacking complexity
npx skillsauth add wenmin-wu/ds-skills cv-kfold-model-averaging
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Train K separate models, one per K-fold split, save each fold's best checkpoint, then average the K models' predictions at inference. This is simpler than stacking because no meta-learner is needed, yet as a rough rule of thumb it typically captures about 80% of the ensemble benefit. It works with any model type (CNN, transformer, tree-based).
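The variance-reduction claim can be made concrete with a short worked formula. This is a sketch under a simplifying assumption that is not in the original text: each fold model's prediction $\hat{y}_k$ has the same variance $\sigma^2$, and every pair of fold models has the same correlation $\rho$.

```latex
\operatorname{Var}\!\left(\frac{1}{K}\sum_{k=1}^{K}\hat{y}_k\right)
  = \frac{1}{K^2}\left[K\sigma^2 + K(K-1)\rho\sigma^2\right]
  = \rho\,\sigma^2 + \frac{1-\rho}{K}\,\sigma^2
```

Under that assumption, averaging shrinks only the uncorrelated part of the variance, by a factor of $K$. Fold models share most of their training data, so $\rho$ is usually well above zero, and the gain levels off as $K$ grows.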
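import numpy as np
import tensorflow as tf
from sklearn.model_selection import GroupKFold

# Training: save the best checkpoint per fold.
# build_model() and make_dataset() are placeholders for your own model
# factory and input pipeline; df and test_data are assumed to exist.
gkf = GroupKFold(n_splits=5)
for fold, (train_idx, val_idx) in enumerate(gkf.split(df, groups=df.patient_id)):
    # Build this fold's train/validation inputs from the split indices
    train_data = make_dataset(df.iloc[train_idx])
    val_data = make_dataset(df.iloc[val_idx])
    model = build_model()  # fresh, untrained model for every fold
    checkpoint = tf.keras.callbacks.ModelCheckpoint(
        f'model_fold{fold}.h5', save_best_only=True,
        monitor='val_loss', mode='min'
    )
    model.fit(train_data, epochs=20, callbacks=[checkpoint],
              validation_data=val_data)

# Inference: load all folds, average predictions
models = [tf.keras.models.load_model(f'model_fold{i}.h5') for i in range(5)]
predictions = np.mean([m.predict(test_data) for m in models], axis=0)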
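Because the technique is not tied to Keras, the same pattern can be sketched with a tree-based model. The example below is a minimal illustration, not part of the original skill: the synthetic data, the group layout, and the choice of `RandomForestRegressor` are all assumptions made so that the snippet runs on its own.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)

# Synthetic regression data (illustrative assumption): 30 groups of 10 rows.
X = rng.normal(size=(300, 5))
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=300)
groups = np.repeat(np.arange(30), 10)
X_test = rng.normal(size=(100, 5))
y_test = X_test[:, 0] - 2.0 * X_test[:, 1] + rng.normal(scale=0.5, size=100)

n_splits = 5
fold_models = []
val_scores = []
gkf = GroupKFold(n_splits=n_splits)
for fold, (train_idx, val_idx) in enumerate(gkf.split(X, y, groups=groups)):
    # One independent model per fold, trained on that fold's training rows only
    model = RandomForestRegressor(n_estimators=50, random_state=fold)
    model.fit(X[train_idx], y[train_idx])
    # R^2 on the held-out groups of this fold
    val_scores.append(model.score(X[val_idx], y[val_idx]))
    fold_models.append(model)

# Inference: stack per-fold predictions to shape (K, n_test), then average over K
fold_preds = np.stack([m.predict(X_test) for m in fold_models])
avg_pred = fold_preds.mean(axis=0)

# Compare the averaged prediction against the individual fold models
fold_mse = ((fold_preds - y_test) ** 2).mean(axis=1)
avg_mse = ((avg_pred - y_test) ** 2).mean()
print("per-fold test MSE:", np.round(fold_mse, 4))
print("averaged test MSE:", round(float(avg_mse), 4))
```

For squared error, the averaged prediction's MSE can never exceed the mean of the per-fold MSEs, because squared error is convex. How much lower it lands depends on how much the fold models disagree.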