skills/llm/batched-perplexity-scoring/SKILL.md
Batch-compute perplexity for multiple texts using a causal LM with proper padding, shifted labels, and pad-token masking for efficient GPU utilization
npx skillsauth add wenmin-wu/ds-skills llm-batched-perplexity-scoring
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Single-sequence perplexity computation underutilizes the GPU. Batch multiple texts together with left-padding, mask out pad tokens in the loss, and compute perplexity per sequence in one forward pass. This is 4-8x faster than sequential scoring and essential when evaluating thousands of candidate orderings or generations.
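The pad-token masking and per-sequence averaging described above can be illustrated without loading a model. The following is a minimal sketch in plain Python, assuming the per-token negative log-likelihoods are already available; `masked_perplexities` and the toy numbers are hypothetical and not part of the skill.

```python
import math

def masked_perplexities(token_nll, attention_mask):
    """Per-sequence perplexity from per-token negative log-likelihoods.

    token_nll[j][t] is the NLL (natural log) of the token at position t
    in sequence j; attention_mask[j][t] is 1 for real tokens, 0 for padding.
    """
    ppls = []
    for nll_row, mask_row in zip(token_nll, attention_mask):
        total = sum(n for n, m in zip(nll_row, mask_row) if m)
        count = sum(mask_row)
        # Perplexity is exp of the mean NLL over real tokens only.
        ppls.append(math.exp(total / count))
    return ppls

# Two left-padded rows: the first has 2 real tokens, the second has 4.
nll = [[9.9, 9.9, 1.0, 3.0],
       [2.0, 2.0, 2.0, 2.0]]
mask = [[0, 0, 1, 1],
        [1, 1, 1, 1]]

ppl = masked_perplexities(nll, mask)

# Averaging over pad positions as well would inflate the first score.
naive_first = math.exp(sum(nll[0]) / len(nll[0]))
```

Both rows have a mean loss of 2.0 over their real tokens, so they receive the same perplexity even though one of them is half padding. The unmasked average for the first row is much larger because the losses at pad positions are meaningless.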
import torch
import transformers
from math import exp

PAD_LABEL = -100  # default ignore_index of CrossEntropyLoss


class BatchPerplexityScorer:
    def __init__(self, model_path, device="cuda"):
        self.tokenizer = transformers.AutoTokenizer.from_pretrained(model_path)
        # Many causal LM tokenizers have no pad token, so reuse EOS for padding.
        self.tokenizer.pad_token = self.tokenizer.eos_token
        self.tokenizer.padding_side = "left"
        self.model = transformers.AutoModelForCausalLM.from_pretrained(
            model_path, device_map=device, torch_dtype=torch.float16)
        self.loss_fn = torch.nn.CrossEntropyLoss(reduction='none')

    @torch.no_grad()
    def score(self, texts, batch_size=8):
        all_ppl = []
        for i in range(0, len(texts), batch_size):
            # BOS/EOS are added by hand, so the tokenizer must not add them again.
            batch = [f"{self.tokenizer.bos_token}{t}{self.tokenizer.eos_token}"
                     for t in texts[i:i+batch_size]]
            inputs = self.tokenizer(batch, return_tensors='pt',
                                    padding=True, add_special_tokens=False)
            inputs = {k: v.to(self.model.device) for k, v in inputs.items()}
            logits = self.model(**inputs).logits
            mask = inputs['attention_mask'].bool()
            # Position t predicts token t+1; compute the loss in float32 for stability.
            shift_logits = logits[:, :-1].float()
            shift_labels = inputs['input_ids'][:, 1:].clone()
            # Mask via attention_mask, not pad_token_id: pad == EOS here, so an id
            # comparison would also drop the real EOS. A target is scored only if
            # it and the position predicting it are both real tokens.
            scored = mask[:, 1:] & mask[:, :-1]
            shift_labels[~scored] = PAD_LABEL
            loss = self.loss_fn(
                shift_logits.reshape(-1, shift_logits.size(-1)),
                shift_labels.reshape(-1)).view(len(batch), -1)
            valid = scored.sum(dim=-1)
            # Perplexity = exp(mean token loss over the scored positions).
            ppl = [exp((loss[j].sum() / valid[j]).item()) for j in range(len(batch))]
            all_ppl.extend(ppl)
        return all_ppl
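pad_token = eos_token: many causal LM tokenizers ship without a pad token, so the EOS token is reused for padding.
add_special_tokens=False ensures exact control over where BOS and EOS are placed, since both are added to each text manually.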
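One consequence of reusing EOS as the pad token is that a masking rule which compares `input_ids` against `pad_token_id` also matches the genuine EOS at the end of each text. The sketch below contrasts that rule with masking driven by the attention mask. It uses plain Python lists and hypothetical token ids (`BOS_ID`, `EOS_ID`, 17, 42) chosen only for illustration. The helper names are assumptions, not part of the skill.

```python
PAD_LABEL = -100
BOS_ID, EOS_ID = 1, 2   # hypothetical ids, for illustration only
PAD_ID = EOS_ID         # consequence of pad_token = eos_token

# One left-padded row: [pad, pad, bos, tok, tok, eos]
input_ids = [PAD_ID, PAD_ID, BOS_ID, 17, 42, EOS_ID]
attention_mask = [0, 0, 1, 1, 1, 1]

def targets_by_pad_id(ids):
    # Replace every id equal to the pad id, then shift by one position.
    labels = [PAD_LABEL if i == PAD_ID else i for i in ids]
    return labels[1:]

def targets_by_attention_mask(ids, mask):
    # Keep target t only if token t and the token predicting it (t-1) are real.
    return [ids[t] if mask[t] and mask[t - 1] else PAD_LABEL
            for t in range(1, len(ids))]

by_id = targets_by_pad_id(input_ids)
by_mask = targets_by_attention_mask(input_ids, attention_mask)
```

With the id comparison, the real EOS is dropped and BOS is scored from a pad position. With the attention mask, only tokens predicted from real context are scored, and the EOS is among them.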