skills/nlp/bucket-batching-seq2seq/SKILL.md
Group variable-length sequences into length-sorted buckets before batching to minimize padding waste during seq2seq inference
npx skillsauth add wenmin-wu/ds-skills nlp-bucket-batching-seq2seq

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Naive batching pads every sequence to the longest one in its batch, which wastes compute on short inputs. Instead, sort the sequences by length, split the sorted order into buckets, and batch within each bucket so that every batch holds sequences of similar length. This reduces padding by 30-60% and stabilizes GPU memory usage.
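To make the padding cost concrete, the sketch below compares the fraction of padded positions for batches formed in arrival order against batches formed after sorting by length. The helper `padding_fraction` and the synthetic length distribution are assumptions made for illustration; actual savings depend on how sequence lengths are distributed in your data.

```python
import random


def padding_fraction(lengths, batch_size):
    """Fraction of positions that are padding when each batch is padded to its longest sequence."""
    padded_total = 0
    real_total = 0
    for i in range(0, len(lengths), batch_size):
        batch = lengths[i:i + batch_size]
        padded_total += max(batch) * len(batch)  # every row is padded to the batch maximum
        real_total += sum(batch)                 # positions holding real tokens
    return 1.0 - real_total / padded_total


random.seed(0)
# Synthetic sequence lengths (illustrative only, not measured data).
lengths = [random.randint(5, 200) for _ in range(1024)]

naive = padding_fraction(lengths, batch_size=32)             # batches in arrival order
bucketed = padding_fraction(sorted(lengths), batch_size=32)  # batches in length-sorted order

print(f"naive: {naive:.1%}  length-sorted: {bucketed:.1%}")
```

Fully sorting is the limiting case of bucketing. Using a small number of buckets with shuffling inside each bucket trades a little of that saving for more varied batch composition.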
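from torch.utils.data import Sampler
import numpy as np


class BucketBatchSampler(Sampler):
    """Yield batches of dataset indices whose sequences have similar lengths."""

    def __init__(self, lengths, batch_size, num_buckets=8, shuffle=False):
        # Indices ordered from the shortest to the longest sequence.
        sorted_idx = sorted(range(len(lengths)), key=lambda i: lengths[i])
        # Nominal bucket size; the last bucket absorbs any remainder.
        bsize = max(1, len(sorted_idx) // num_buckets)
        self.batches = []
        for b in range(num_buckets):
            start = b * bsize
            end = None if b == num_buckets - 1 else (b + 1) * bsize
            bucket = sorted_idx[start:end]
            if shuffle:
                # Shuffles within the bucket only, once at construction time.
                np.random.shuffle(bucket)
            # Cut the bucket into batches of at most batch_size indices.
            for i in range(0, len(bucket), batch_size):
                self.batches.append(bucket[i:i + batch_size])

    def __iter__(self):
        return iter(self.batches)

    def __len__(self):
        return len(self.batches)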
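One way to use such a sampler at inference time is to pass it to `DataLoader` through the `batch_sampler` argument and write predictions back to their original positions, since length-sorted batches arrive out of order. The sketch below is self-contained, so it builds a plain list of length-sorted index batches as a stand-in. A `BucketBatchSampler(lengths, batch_size)` instance could be passed in the same place. `ToyDataset`, `collate`, and the token-counting "model" are hypothetical names used only for illustration.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, Dataset


class ToyDataset(Dataset):
    """Hypothetical dataset that returns (original index, token tensor)."""

    def __init__(self, seqs):
        self.seqs = seqs

    def __len__(self):
        return len(self.seqs)

    def __getitem__(self, i):
        return i, torch.tensor(self.seqs[i], dtype=torch.long)


def collate(items, pad_id=0):
    """Pad only to the longest sequence in this batch and keep the original indices."""
    idx, seqs = zip(*items)
    padded = pad_sequence(list(seqs), batch_first=True, padding_value=pad_id)
    return torch.tensor(idx), padded


seqs = [[1] * n for n in (7, 2, 9, 3, 8, 1)]  # toy token ids; 0 is reserved for padding
lengths = [len(s) for s in seqs]

# Stand-in for the sampler: length-sorted indices cut into batches of 2.
order = sorted(range(len(seqs)), key=lambda i: lengths[i])
batches = [order[i:i + 2] for i in range(0, len(order), 2)]

loader = DataLoader(ToyDataset(seqs), batch_sampler=batches, collate_fn=collate)

outputs = [None] * len(seqs)
for idx, padded in loader:
    # Placeholder for real model inference: count the non-pad tokens in each row.
    preds = (padded != 0).sum(dim=1)
    # Scatter results back so outputs[i] corresponds to seqs[i].
    for i, p in zip(idx.tolist(), preds.tolist()):
        outputs[i] = p
```

Returning the original index from the dataset keeps the un-sorting step independent of how the batches were formed, so the same loop works whether or not buckets are shuffled.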