skills/cv/mixed-precision-training/SKILL.md
Uses PyTorch AMP (autocast and GradScaler) for FP16 mixed-precision training, roughly halving activation memory and speeding up training on GPUs with hardware FP16 support.
npx skillsauth add wenmin-wu/ds-skills cv-mixed-precision-training
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Run the forward pass in FP16 (half precision) while keeping master weights in FP32. This roughly halves the GPU memory used for activations, which allows larger batch sizes or image resolutions. PyTorch's autocast selects the dtype for each operation, and GradScaler scales the loss so that small gradient values do not underflow to zero in FP16.
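To make the underflow problem concrete, the sketch below emulates FP16 rounding with Python's standard struct module instead of PyTorch. The gradient value 1e-8 is a made-up illustration, the helper to_fp16 is hypothetical, and 65536 is GradScaler's documented default init_scale.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


tiny_grad = 1e-8   # illustrative value, below FP16's smallest subnormal (~6e-8)
scale = 65536.0    # 2**16, GradScaler's default init_scale

unscaled = to_fp16(tiny_grad)        # stored directly in FP16: rounds to zero
scaled = to_fp16(tiny_grad * scale)  # scaled first: lands in FP16's normal range
recovered = scaled / scale           # unscaled again in higher precision

print(unscaled, scaled, recovered)
```

Stored directly, the value rounds to 0.0. Multiplied by the scale first, it survives the FP16 round trip and can be divided back out with a relative error below 0.1%.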
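import torch
from torch.cuda.amp import autocast, GradScaler
# Recent PyTorch releases deprecate torch.cuda.amp in favor of
# torch.amp.autocast("cuda") and torch.amp.GradScaler("cuda").

# model, optimizer, criterion, and train_loader are assumed to be defined elsewhere.
scaler = GradScaler()  # create once, before the training loop

for images, labels in train_loader:
    images, labels = images.cuda(), labels.cuda()
    optimizer.zero_grad()

    # Forward pass and loss under autocast: eligible ops run in FP16.
    with autocast():
        logits = model(images)
        loss = criterion(logits, labels)

    scaler.scale(loss).backward()   # backward pass on the scaled loss
    scaler.unscale_(optimizer)      # restore true gradient magnitudes before clipping
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    scaler.step(optimizer)          # skips the update if gradients contain inf/NaN
    scaler.update()                 # adjust the loss scale for the next iteration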
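The last two calls in the loop implement dynamic loss scaling. The toy class below is a pure-Python sketch of that policy, not PyTorch's actual implementation: ToyLossScaler and its method signatures are hypothetical, while the default values mirror GradScaler's documented defaults (init_scale=65536, growth_factor=2.0, backoff_factor=0.5, growth_interval=2000).

```python
import math


class ToyLossScaler:
    """Toy model of dynamic loss scaling (illustrative only)."""

    def __init__(self, init_scale=65536.0, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0
        self._found_inf = False

    def step(self, grads, apply_update):
        """Call apply_update only if every gradient is finite."""
        self._found_inf = not all(math.isfinite(g) for g in grads)
        if not self._found_inf:
            apply_update()
        return not self._found_inf

    def update(self):
        """Shrink the scale after an overflow, grow it after a clean streak."""
        if self._found_inf:
            self.scale *= self.backoff_factor
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps == self.growth_interval:
                self.scale *= self.growth_factor
                self._good_steps = 0


# Short growth interval so the behaviour is visible in four steps.
scaler = ToyLossScaler(growth_interval=2)
applied, scales = [], []
for grads in ([0.1, 0.2], [float("inf"), 0.3], [0.1, 0.1], [0.2, 0.2]):
    scaler.step(grads, lambda: applied.append(grads))
    scaler.update()
    scales.append(scaler.scale)

print(len(applied), scales)
```

In this run the second step is skipped because of the infinite gradient and the scale is halved; two clean steps later it doubles back. This is why the optimizer step is routed through the scaler rather than called directly.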
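- Create GradScaler() once, before the training loop.
- Run the forward pass and loss computation inside the autocast() context.
- Call scaler.scale(loss).backward() instead of loss.backward().
- scaler.step(optimizer) and scaler.update() replace optimizer.step().
- Call scaler.unscale_(optimizer) before clip_grad_norm_ so the clipping threshold applies to the true, unscaled gradients.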
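The ordering of unscaling and clipping matters because max_norm is meant for true gradient magnitudes. The sketch below works through the arithmetic in plain Python; clip_by_norm is a simplified stand-in for clip_grad_norm_ (it omits PyTorch's small epsilon), and the scale and gradient values are made up for illustration.

```python
import math


def l2_norm(grads):
    """L2 norm of a flat list of gradient values."""
    return math.sqrt(sum(g * g for g in grads))


def clip_by_norm(grads, max_norm):
    """Scale grads down so that their L2 norm is at most max_norm."""
    norm = l2_norm(grads)
    coef = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * coef for g in grads]


scale = 1024.0            # hypothetical current loss scale
true_grads = [0.3, 0.4]   # true L2 norm is 0.5, already under max_norm=1.0
scaled_grads = [g * scale for g in true_grads]

# Wrong order: clip the scaled gradients, then unscale.
wrong = [g / scale for g in clip_by_norm(scaled_grads, max_norm=1.0)]

# Right order: unscale first, then clip.
right = clip_by_norm([g / scale for g in scaled_grads], max_norm=1.0)

print(l2_norm(wrong), l2_norm(right))
```

Clipping before unscaling shrinks the gradient norm to 1/1024 even though the true norm of 0.5 needed no clipping at all; unscaling first leaves the gradients untouched.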