skills/llm/perplexity-prompt-ranking/SKILL.md
Rank candidate prompts by computing LLM perplexity of the full conversation conditioned on each prompt, selecting the lowest-perplexity candidate as the best match
npx skillsauth add wenmin-wu/ds-skills llm-perplexity-prompt-ranking
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
When selecting among multiple candidate prompts (e.g., identifying which instruction produced a given output), format each candidate into a full conversation template together with the input and output text, compute the perplexity of each formatted sequence, and rank by lowest perplexity. A language model assigns lower perplexity to sequences that look more "natural" given its training, so the prompt that actually produced the output will typically score lowest.
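For reference, the quantity being ranked can be written out as follows. This is the standard definition of perplexity for an autoregressive model, not something stated in the skill itself; the symbols $x_t$ (the $t$-th token of the formatted sequence), $N$ (the number of scored tokens) and $p_\theta$ (the model's next-token distribution) are notation introduced here.

```latex
\mathrm{PPL}(x_{1:N}) = \exp\!\left( -\frac{1}{N} \sum_{t=1}^{N} \log p_\theta\!\left(x_t \mid x_{<t}\right) \right)
```

Because $\exp$ is monotonically increasing, ranking candidates by the mean negative log-likelihood inside the parentheses gives the same order as ranking by perplexity.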
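import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

def rank_prompts(candidates, context_template, model, tokenizer):
    # Many causal-LM tokenizers ship without a pad token; batching needs one.
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    # Per-token loss; positions labelled -100 are ignored by default.
    loss_fn = nn.CrossEntropyLoss(reduction='none')
    scores = []
    texts = [context_template.format(prompt=p) for p in candidates]
    inputs = tokenizer(texts, return_tensors="pt", padding=True,
                       truncation=True, add_special_tokens=False).to(model.device)
    with torch.no_grad():
        logits = model(**inputs).logits
    labels = inputs["input_ids"].clone()
    labels[~inputs["attention_mask"].bool()] = -100  # exclude padding from the loss
    # Shift so the logits at position t are scored against token t+1.
    shift_logits = logits[:, :-1].contiguous()
    shift_labels = labels[:, 1:].contiguous()
    for i in range(len(candidates)):
        loss = loss_fn(shift_logits[i], shift_labels[i])
        valid = (shift_labels[i] != -100).sum()
        # Mean negative log-likelihood per token. Perplexity is exp() of this,
        # so sorting by it yields the same ranking.
        scores.append((loss.sum() / valid).item())
    ranked = sorted(zip(candidates, scores), key=lambda x: x[1])
    return ranked[0][0], ranked

# model and tokenizer are assumed to be loaded already, e.g. with
# AutoModelForCausalLM.from_pretrained and AutoTokenizer.from_pretrained.
best_prompt, all_ranked = rank_prompts(
    ["Rewrite formally", "Make it rhyme", "Simplify"],
    "Instruction: {prompt}\nInput: ...\nOutput: ...",
    model, tokenizer)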
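The scores returned by `rank_prompts` are mean per-token cross-entropy values, not perplexities. If actual perplexity numbers are wanted for reporting, a minimal sketch of the conversion might look like the following. The helper names `perplexity` and `rank_by_perplexity` are hypothetical, and the per-token log-probabilities are invented for illustration rather than produced by a real model.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the mean negative log-likelihood per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)

def rank_by_perplexity(logprobs_by_candidate):
    """Return (candidate, perplexity) pairs sorted from lowest to highest."""
    scored = [(name, perplexity(lps))
              for name, lps in logprobs_by_candidate.items()]
    return sorted(scored, key=lambda pair: pair[1])

# Hypothetical natural-log token probabilities, made up for this example.
example = {
    "Rewrite formally": [-0.2, -0.4, -0.3],
    "Make it rhyme": [-1.5, -2.0, -1.0],
    "Simplify": [-0.9, -0.7, -0.8],
}
ranked = rank_by_perplexity(example)
best = ranked[0][0]
```

Exponentiating does not change the order of the candidates; it only puts the scores on the more familiar perplexity scale.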