skills/cv/pack-padded-sequence-loss/SKILL.md
Uses pack_padded_sequence to exclude padding tokens from cross-entropy loss in variable-length sequence generation.
npx skillsauth add wenmin-wu/ds-skills cv-pack-padded-sequence-loss

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
In image captioning and seq2seq tasks, target sequences have different lengths and are padded to a common length. Computing cross-entropy on the padding positions wastes compute and adds gradient signal from tokens that carry no information. `pack_padded_sequence` strips out the padding positions before the loss is computed, so the loss reflects real tokens only.
```python
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

# Assumed shapes: predictions (batch, max_decode_len, vocab_size),
# targets (batch, max_decode_len) with <sos> already dropped,
# caption_lengths (batch, 1) counting <sos>.

# Sort batch by decreasing sequence length (required by pack_padded_sequence by default)
caption_lengths, sort_ind = caption_lengths.squeeze(1).sort(dim=0, descending=True)
predictions = predictions[sort_ind]
targets = targets[sort_ind]
decode_lengths = (caption_lengths - 1).tolist()  # exclude <sos>

# Pack to strip padding, then compute loss on real tokens only
packed_preds = pack_padded_sequence(predictions, decode_lengths, batch_first=True).data
packed_targets = pack_padded_sequence(targets, decode_lengths, batch_first=True).data
loss = nn.CrossEntropyLoss()(packed_preds, packed_targets)
```
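The snippet above depends on tensors produced elsewhere in a training loop. As a minimal sketch, the same steps can be run end to end on synthetic data. The batch size, vocabulary size, `pad_id`, and the random tensors here are all assumptions made for illustration, not values from the original skill.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

torch.manual_seed(0)

batch_size, max_len, vocab_size = 3, 6, 10
pad_id = 0  # hypothetical padding token ID

# Caption lengths count <sos>; shape (batch, 1) as in the snippet above
caption_lengths = torch.tensor([[4], [6], [3]])

# Synthetic decoder scores, one row per decoding step: (batch, max_len - 1, vocab)
predictions = torch.randn(batch_size, max_len - 1, vocab_size)

# Synthetic targets (captions without <sos>), padded with pad_id: (batch, max_len - 1)
targets = torch.full((batch_size, max_len - 1), pad_id, dtype=torch.long)
for i, n in enumerate(caption_lengths.squeeze(1).tolist()):
    targets[i, : n - 1] = torch.randint(1, vocab_size, (n - 1,))

# Sort by decreasing length and apply the same permutation to every tensor
caption_lengths, sort_ind = caption_lengths.squeeze(1).sort(dim=0, descending=True)
predictions, targets = predictions[sort_ind], targets[sort_ind]
decode_lengths = (caption_lengths - 1).tolist()  # [5, 3, 2]

# .data is the flattened tensor holding only the non-padded positions
packed_preds = pack_padded_sequence(predictions, decode_lengths, batch_first=True).data
packed_targets = pack_padded_sequence(targets, decode_lengths, batch_first=True).data

loss = nn.CrossEntropyLoss()(packed_preds, packed_targets)
print(packed_preds.shape, packed_targets.shape, loss.item())
```

The number of packed rows equals `sum(decode_lengths)`, and no packed target equals `pad_id`, which is a quick sanity check worth keeping in a real training loop.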
- `decode_lengths` = actual length of each target sequence
- Use the `.data` attribute of the `pack_padded_sequence` result (flattened real tokens)
- `pack_padded_sequence` requires a descending sort by default; re-sort encoder outputs too
- Alternatively, set `ignore_index` in `CrossEntropyLoss` to the padding token ID; simpler but slightly less efficient
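The sort requirement and the `ignore_index` alternative can be compared on synthetic data. The sketch below assumes a hypothetical `pad_id` of 0 that never appears as a real token, and uses the `enforce_sorted=False` argument of `pack_padded_sequence`, which lets PyTorch handle the sorting internally. Under those assumptions both approaches average over the same real tokens and should agree up to floating-point error.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

torch.manual_seed(0)

pad_id = 0  # hypothetical padding token ID; real tokens must never use it
vocab_size = 8
lengths = [2, 4, 3]  # deliberately unsorted
max_len = max(lengths)

logits = torch.randn(len(lengths), max_len, vocab_size)
targets = torch.full((len(lengths), max_len), pad_id, dtype=torch.long)
for i, n in enumerate(lengths):
    targets[i, :n] = torch.randint(1, vocab_size, (n,))

# Option A: pack without pre-sorting; enforce_sorted=False sorts internally
packed_logits = pack_padded_sequence(
    logits, lengths, batch_first=True, enforce_sorted=False
).data
packed_targets = pack_padded_sequence(
    targets, lengths, batch_first=True, enforce_sorted=False
).data
loss_packed = nn.CrossEntropyLoss()(packed_logits, packed_targets)

# Option B: keep the padded tensors and let the loss skip pad_id positions
loss_ignore = nn.CrossEntropyLoss(ignore_index=pad_id)(
    logits.reshape(-1, vocab_size), targets.reshape(-1)
)

print(loss_packed.item(), loss_ignore.item())
```

Option B still evaluates the softmax at every padded position before discarding it, which is where the "slightly less efficient" caveat comes from. Option A only works as a drop-in when the packed element order does not matter, as is the case for a mean-reduced loss.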