skills/cv/ema-model-averaging/SKILL.md
Tracks an Exponential Moving Average of model weights during training and evaluates both live and EMA models for more stable predictions.
npx skillsauth add wenmin-wu/ds-skills cv-ema-model-averaging
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Exponential Moving Average (EMA) maintains a shadow copy of the model weights as a running average: `ema_w = decay * ema_w + (1 - decay) * w`. Because the EMA model averages over many training steps, it is smoother than the live model, with less variance from batch noise and learning-rate spikes. At inference, the EMA weights typically outperform any single checkpoint by about 0.002–0.01 in validation score. During training, evaluate both the live and EMA models to track both trajectories.
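To make the smoothing effect of the update rule concrete, here is a minimal pure-Python sketch on a single scalar "weight". The noise level, the step count, and the `decay=0.99` value are illustrative assumptions, not settings from this skill; the `ema_update` helper is hypothetical and simply restates the formula above.

```python
import random
import statistics


def ema_update(ema_w, w, decay=0.999):
    """One EMA step: ema_w = decay * ema_w + (1 - decay) * w."""
    return decay * ema_w + (1.0 - decay) * w


random.seed(0)
# Hypothetical "live" weight: a true value of 1.0 plus per-step batch noise.
live = [1.0 + random.gauss(0.0, 0.1) for _ in range(5000)]

ema_w = live[0]  # start the shadow copy from the initial weight
ema_trace = []
for w in live[1:]:
    ema_w = ema_update(ema_w, w, decay=0.99)
    ema_trace.append(ema_w)

# Compare the spread of both trajectories after a warm-up of 1000 steps.
live_std = statistics.pstdev(live[1000:])
ema_std = statistics.pstdev(ema_trace[1000:])
print(f"live std: {live_std:.4f}, EMA std: {ema_std:.4f}")
```

In this toy setting the EMA trajectory stays close to the underlying value while the live value keeps jittering, which is the behavior the shadow weights are meant to provide.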
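import torch
from copy import deepcopy


class ModelEMA:
    def __init__(self, model, decay=0.999):
        # Shadow copy of the model; kept in eval mode and never trained directly
        self.ema = deepcopy(model)
        self.ema.eval()
        self.decay = decay
        for p in self.ema.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model):
        # ema_w = decay * ema_w + (1 - decay) * w for every parameter
        for ema_p, model_p in zip(self.ema.parameters(), model.parameters()):
            ema_p.mul_(self.decay).add_(model_p, alpha=1 - self.decay)
        # Copy buffers (e.g. BatchNorm running stats), which parameters() skips
        for ema_b, model_b in zip(self.ema.buffers(), model.buffers()):
            ema_b.copy_(model_b)

    def state_dict(self):
        return self.ema.state_dict()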
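The `decay` argument determines how many recent steps the average effectively covers. As a rough guide, the sketch below computes two standard quantities for an exponential average: the approximate window `1 / (1 - decay)` and the half-life `ln(0.5) / ln(decay)`. The helper names `effective_window` and `half_life` are hypothetical, introduced only for this illustration.

```python
import math


def effective_window(decay):
    """Approximate number of recent steps the EMA averages over: 1 / (1 - decay)."""
    return 1.0 / (1.0 - decay)


def half_life(decay):
    """Steps until an old weight's contribution shrinks to half: ln(0.5) / ln(decay)."""
    return math.log(0.5) / math.log(decay)


for decay in (0.99, 0.999, 0.9999):
    print(f"decay={decay}: window ~{effective_window(decay):.0f} steps, "
          f"half-life ~{half_life(decay):.0f} steps")
```

If a run has far fewer optimizer steps than this window, the EMA copy is still dominated by the initial weights, so a smaller `decay` may be worth trying in short runs.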
# Training loop
# MyModel, criterion, optimizer, train_loader, val_loader, evaluate and num_epochs are defined elsewhere
model = MyModel().cuda()
ema = ModelEMA(model, decay=0.999)
for epoch in range(num_epochs):
    model.train()
    for batch in train_loader:
        # Move inputs to the same device as the model
        loss = criterion(model(batch['image'].cuda()), batch['target'].cuda())
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        ema.update(model)  # update EMA after each step

    # Evaluate both
    val_score = evaluate(model, val_loader)
    ema_score = evaluate(ema.ema, val_loader)
    print(f"Live: {val_score:.4f}, EMA: {ema_score:.4f}")

# Save EMA weights for inference
torch.save(ema.state_dict(), 'model_ema.pth')
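For the inference side, one way to use the saved file is to rebuild the architecture and load the averaged weights into it. The sketch below is an assumption-laden illustration: `build_model` is a hypothetical stand-in for `MyModel`, and a temporary directory replaces the real checkpoint location.

```python
import os
import tempfile

import torch
from torch import nn


def build_model():
    # Hypothetical stand-in for MyModel; any nn.Module with matching keys works.
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))


# Pretend this is the EMA copy at the end of training.
ema_model = build_model()
ema_model.eval()
path = os.path.join(tempfile.mkdtemp(), "model_ema.pth")
torch.save(ema_model.state_dict(), path)

# Inference side: rebuild the architecture, then load the averaged weights.
infer_model = build_model()
state = torch.load(path, map_location="cpu")
infer_model.load_state_dict(state)
infer_model.eval()

x = torch.randn(3, 4)
with torch.no_grad():
    same = torch.allclose(infer_model(x), ema_model(x))
print("outputs match:", same)
```

Because `ModelEMA.state_dict()` returns the state dict of the wrapped module, the file loads into a plain model instance without any EMA wrapper.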
Freeze the EMA copy's parameters (`requires_grad=False`) and call `ema.update(model)` after each optimizer step.