skills/cv/gradient-accumulation/SKILL.md
Accumulates gradients over multiple mini-batches before stepping the optimizer, simulating larger effective batch sizes.
npx skillsauth add wenmin-wu/ds-skills cv-gradient-accumulation
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
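When GPU memory limits the batch size, accumulate gradients over N mini-batches before calling optimizer.step(). The effective batch size becomes mini_batch × N. Dividing each mini-batch loss by N keeps the accumulated gradient equal to the average over the effective batch rather than N times larger. This matters most for large image models (ResNet200d, EfficientNet-B7), where only 4-8 images fit per GPU and batches that small would produce noisy gradients.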
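The relationship between the accumulated gradient and the large-batch gradient can be checked on a toy problem. The sketch below is plain Python with no PyTorch. It uses a one-parameter model and a mean-squared-error loss chosen only for illustration, and every name in it (`mse_grad`, `xs`, `ys`, `w`) is hypothetical.

```python
# Toy check in plain Python: one-parameter model y_hat = w * x with MSE loss.
# The model, data, and helper names are illustrative assumptions.

def mse_grad(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n

w = 0.5
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 14.0, 15.9]

mini_batch = 2
accum_steps = 4
effective_batch = mini_batch * accum_steps  # 8 samples per optimizer step

# Reference: gradient of the loss over the full batch of 8 samples.
full_grad = mse_grad(w, xs, ys)

# Accumulation: sum the gradients of (mini-batch loss / accum_steps).
accum_grad = 0.0
for k in range(accum_steps):
    lo, hi = k * mini_batch, (k + 1) * mini_batch
    accum_grad += mse_grad(w, xs[lo:hi], ys[lo:hi]) / accum_steps

# The two gradients agree up to floating-point rounding.
print(effective_batch, abs(full_grad - accum_grad) < 1e-9)
```

The match is exact here because the loss is a mean over samples and all mini-batches have the same size. With unequal mini-batches, the two gradients would differ slightly.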
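import torch
from torch.cuda.amp import autocast, GradScaler

# Assumes model, criterion, optimizer, and train_loader are already defined.
accum_steps = 4  # effective batch = mini_batch * 4
scaler = GradScaler()

optimizer.zero_grad()
for step, (images, labels) in enumerate(train_loader):
    images, labels = images.cuda(), labels.cuda()
    with autocast():
        logits = model(images)
        # Divide so the summed gradients average over the effective batch
        loss = criterion(logits, labels) / accum_steps
    scaler.scale(loss).backward()  # gradients add up in each parameter's .grad

    # Step every accum_steps mini-batches, and on the last one so nothing is left over
    if (step + 1) % accum_steps == 0 or (step + 1) == len(train_loader):
        scaler.unscale_(optimizer)  # unscale before clipping
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()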
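One detail worth checking with any accumulation schedule is what happens when the number of mini-batches per epoch is not a multiple of accum_steps. The sketch below is plain Python. It lists the mini-batch indices after which an optimizer update would fire, with and without a flush on the final mini-batch. The helper `update_steps` and its `flush_last` flag are hypothetical names for illustration, not part of PyTorch.

```python
def update_steps(num_batches, accum_steps, flush_last=False):
    """Return 0-based mini-batch indices after which an optimizer step fires."""
    fired = []
    for step in range(num_batches):
        boundary = (step + 1) % accum_steps == 0
        last = flush_last and (step + 1) == num_batches
        if boundary or last:
            fired.append(step)
    return fired

no_flush = update_steps(num_batches=10, accum_steps=4)
with_flush = update_steps(num_batches=10, accum_steps=4, flush_last=True)
print(no_flush)    # [3, 7]
print(with_flush)  # [3, 7, 9]
```

Without the flush, the gradients from the last two mini-batches are never applied in that epoch. Unless they are zeroed, they stay in `.grad` and leak into the first update of the next epoch. With the flush, the final update covers only two mini-batches, so a loss divided by accum_steps gives a smaller gradient than a full group would. Whether that matters depends on the training setup.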