skills/cv/stochastic-weight-averaging/SKILL.md
Keras callback implementing Stochastic Weight Averaging (SWA) — running mean of model weights over final training epochs
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add wenmin-wu/ds-skills cv-stochastic-weight-averaging
SWA averages a model's weights across the final epochs of training. The averaged weights often generalize better than the single best checkpoint because they tend to sit in a wider, flatter region of the loss landscape. Unlike a checkpoint ensemble, SWA adds no inference cost: there is still one model and one forward pass.
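To make the inference-cost point concrete, here is a minimal pure-Python sketch. The one-layer `forward` function and the three weight snapshots are made up for illustration and are not part of the skill. It contrasts a checkpoint ensemble, which averages predictions, with SWA, which averages weights.

```python
import math


def forward(w, x):
    """Toy nonlinear 'model': a dot product followed by tanh."""
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)))


# Hypothetical weight snapshots from the last three epochs.
checkpoints = [[0.9, -0.2], [1.1, -0.4], [1.0, -0.3]]
x = [0.5, 2.0]

# Checkpoint ensemble: one forward pass per snapshot, then average the predictions.
ensemble_pred = sum(forward(w, x) for w in checkpoints) / len(checkpoints)
ensemble_passes = len(checkpoints)

# SWA: average the weights once, then run a single forward pass.
swa_w = [sum(col) / len(checkpoints) for col in zip(*checkpoints)]
swa_pred = forward(swa_w, x)
swa_passes = 1

print(f"ensemble: {ensemble_passes} passes, SWA: {swa_passes} pass")
```

The two predictions are generally not identical for a nonlinear model. The sketch only shows that SWA collapses the snapshots into one set of weights before inference.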
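import keras


class SWA(keras.callbacks.Callback):
    """Keep a running mean of the model weights from swa_start_epoch onward."""

    def __init__(self, filepath, swa_start_epoch):
        super().__init__()
        self.filepath = filepath
        self.swa_start = swa_start_epoch  # 0-based epoch index, as passed by Keras
        self.swa_weights = None

    def on_epoch_end(self, epoch, logs=None):
        if epoch == self.swa_start:
            # First snapshot: start the running mean from the current weights.
            self.swa_weights = self.model.get_weights()
        elif epoch > self.swa_start:
            # n snapshots are already averaged; fold in snapshot n + 1.
            n = epoch - self.swa_start
            new = self.model.get_weights()
            for i in range(len(self.swa_weights)):
                self.swa_weights[i] = (self.swa_weights[i] * n + new[i]) / (n + 1)

    def on_train_end(self, logs=None):
        # If training stopped before swa_start (e.g. early stopping), keep the weights as-is.
        if self.swa_weights is None:
            return
        self.model.set_weights(self.swa_weights)
        self.model.save_weights(self.filepath)


# Usage (Keras 3 requires weight file names to end in ".weights.h5")
model.fit(..., callbacks=[SWA('swa.weights.h5', swa_start_epoch=10)])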
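The update inside `on_epoch_end` is an incremental mean. As a quick sanity check, the sketch below applies the same formula to plain Python lists and compares the result with a direct arithmetic mean. The helper name `update_running_mean` and the snapshot values are illustrative only.

```python
def update_running_mean(avg, new, n):
    """Fold `new` into `avg`, where `avg` is the mean of `n` earlier snapshots."""
    return [(a * n + w) / (n + 1) for a, w in zip(avg, new)]


# Made-up per-epoch weight snapshots (two parameters each).
snapshots = [[1.0, 4.0], [2.0, 6.0], [6.0, 2.0]]

avg = list(snapshots[0])
for n, snap in enumerate(snapshots[1:], start=1):
    avg = update_running_mean(avg, snap, n)

# Direct mean over all snapshots, for comparison.
direct = [sum(col) / len(snapshots) for col in zip(*snapshots)]

print(avg)     # [3.0, 4.0]
print(direct)  # [3.0, 4.0]
```

Because only the running mean is stored, memory use stays at one extra copy of the weights regardless of how many epochs are averaged.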
swa_start_epoch: the 0-based epoch index at which averaging starts (usually 60-80% of total epochs)
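One way to turn the 60-80% guideline into a concrete value is a small helper like the one below. The function name `swa_start_for` and its 0.75 default are assumptions made for this sketch, not part of the skill.

```python
def swa_start_for(total_epochs, fraction=0.75):
    """Return a 0-based start epoch so averaging covers the last (1 - fraction) of training."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return int(round(total_epochs * fraction))


total_epochs = 40
start = swa_start_for(total_epochs)   # 30
n_averaged = total_epochs - start     # epochs 30..39 contribute, so 10 snapshots
print(start, n_averaged)              # 30 10
```

A later start averages fewer snapshots that are closer to convergence, while an earlier start averages more snapshots but includes less-trained weights.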