skills/nlp/length-sorted-batching/SKILL.md
Sort texts by length before batching with dynamic padding to minimize wasted padding tokens and speed up transformer inference
npx skillsauth add wenmin-wu/ds-skills nlp-length-sorted-batching
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
When using dynamic padding (padding='longest'), batch padding length equals the longest sequence in the batch. If a 512-token text lands in a batch of 20-token texts, every sample pads to 512. Sorting by length first ensures each batch contains similarly-sized texts, drastically reducing total padding tokens. Typical speedup: 1.5–3x on inference.
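To make the padding cost concrete, here is a minimal sketch that counts the tokens a model would process with and without sorting. The helper `padded_tokens` and the synthetic length distribution are illustrative assumptions, not part of the skill; actual savings depend on how skewed the real lengths are.

```python
import random


def padded_tokens(lengths, batch_size):
    """Total tokens processed when each batch pads to its longest sequence."""
    total = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        total += max(batch) * len(batch)
    return total


random.seed(0)
# Hypothetical token lengths: mostly short texts plus a few at the 512 limit.
lengths = [random.randint(10, 60) for _ in range(960)] + [512] * 64
random.shuffle(lengths)

real_tokens = sum(lengths)
unsorted_total = padded_tokens(lengths, batch_size=32)
sorted_total = padded_tokens(sorted(lengths), batch_size=32)

print(f"real tokens:          {real_tokens}")
print(f"padded, random order: {unsorted_total}")
print(f"padded, length order: {sorted_total}")
```

With equal-sized batches, sorting can never increase the padded total, so the third number is at most the second.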
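from datasets import Dataset
from torch.utils.data import DataLoader
from transformers import DataCollatorWithPadding

# Assumes `df` is a pandas DataFrame with a 'text' column and `tokenizer` is already loaded.
# Character count is a cheap proxy for token count.
df['length'] = df['text'].apply(len)
df = df.sort_values('length').reset_index(drop=True)
dataset = Dataset.from_pandas(df)
# Drop the raw columns so the collator only receives tokenizer outputs.
tokenized = dataset.map(lambda x: tokenizer(x['text'], truncation=True),
                        batched=True, remove_columns=dataset.column_names)
# shuffle=False keeps the length order; padding='longest' pads per batch.
loader = DataLoader(
    tokenized, batch_size=32, shuffle=False,
    collate_fn=DataCollatorWithPadding(tokenizer, padding='longest')
)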
Keep shuffle=False so batches stay in length order, and use padding='longest' so each batch pads only to its own longest sequence.
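Sorting changes the row order, and `reset_index(drop=True)` discards the original index. If predictions need to line up with the original rows, one option is to keep the sort permutation and scatter the outputs back. The sketch below uses plain Python lists; the names `order`, `sorted_preds`, and `restored` are hypothetical, and the "predictions" are stand-in values rather than model outputs.

```python
# Hypothetical character lengths of five texts, in their original order.
lengths = [300, 12, 95, 12, 510]

# Indices that sort the texts by length (Python's sort is stable, so ties keep their order).
order = sorted(range(len(lengths)), key=lambda i: lengths[i])

# Stand-in for model outputs, produced in sorted order.
sorted_preds = [f"pred_for_len_{lengths[i]}" for i in order]

# Scatter each prediction back to the row it came from.
restored = [None] * len(lengths)
for position, original_index in enumerate(order):
    restored[original_index] = sorted_preds[position]

print(restored)
```

In a pandas workflow the same effect can be had by keeping the original index as a column before sorting and re-sorting on it afterwards.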