skills/llm/kv-cache-prefix-scoring/SKILL.md
Shares KV cache from a common prefix (context+question) across multiple answer suffixes for efficient multi-choice scoring.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
`npx skillsauth add wenmin-wu/ds-skills llm-kv-cache-prefix-scoring`
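For multiple-choice QA, the context and question are shared across all answer options. Compute the KV cache once for this shared prefix, then score each answer suffix by running only the suffix tokens against the cached prefix, expanding the cache along the batch dimension when the suffixes are batched. This avoids redundant computation: when the suffixes are short relative to the prefix, scoring 5 options costs ~1.2x of a single forward pass instead of 5x.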
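The cost ratio quoted above depends on how long the prefix is relative to the suffixes. A rough way to check it is to count the tokens pushed through the model in each scheme. The sketch below does that; the function name `scoring_cost` and the 400/20 token sizes are illustrative assumptions, and token count is only a proxy for compute.

```python
def scoring_cost(prefix_len, suffix_lens):
    """Count tokens processed with and without prefix sharing.

    Token count is a rough proxy for compute: it ignores the extra
    attention work suffix tokens do when attending to the cached prefix.
    """
    # Without sharing, every option re-processes the whole prefix.
    naive = sum(prefix_len + s for s in suffix_lens)
    # With sharing, the prefix is processed once, then each suffix alone.
    shared = prefix_len + sum(suffix_lens)
    # Baseline: one forward pass over the prefix plus the longest suffix.
    single = prefix_len + max(suffix_lens)
    return {
        "naive_tokens": naive,
        "shared_tokens": shared,
        "naive_vs_single": naive / single,
        "shared_vs_single": shared / single,
    }


# Hypothetical sizes: a 400-token context+question and five 20-token answers.
cost = scoring_cost(400, [20] * 5)
print(cost)
```

With these assumed sizes the shared-prefix scheme processes 500 tokens against 2100 for the naive one, a ratio of about 1.19 versus 5.0 relative to a single pass. Longer suffixes or a shorter prefix shrink the saving.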
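```python
import torch

def score_options(model, prefix_ids, suffix_ids_list):
    """Score multiple suffixes against a shared prefix."""
    with torch.no_grad():
        # Forward pass on the shared prefix; cache the KV states
        prefix_out = model(prefix_ids, use_cache=True)
        kv_cache = prefix_out.past_key_values
        # Logits at the last prefix position predict the first suffix token
        first_logits = prefix_out.logits[:, -1:]
        scores = []
        for suffix_ids in suffix_ids_list:
            batch = suffix_ids.shape[0]
            # Expand the cache along the batch dimension to match the suffix batch.
            # Assumes the tuple-of-(key, value) cache layout with batch first;
            # cache objects that are updated in place would need copying instead.
            expanded_kv = tuple(
                (k.expand(batch, -1, -1, -1), v.expand(batch, -1, -1, -1))
                for k, v in kv_cache
            )
            out = model(suffix_ids, past_key_values=expanded_kv)
            # Position i of the combined logits predicts suffix token i
            logits = torch.cat(
                [first_logits.expand(batch, -1, -1), out.logits[:, :-1]], dim=1
            )
            logprobs = torch.log_softmax(logits, dim=-1)
            token_scores = logprobs.gather(2, suffix_ids.unsqueeze(-1))
            # Score = mean log-prob of suffix tokens
            scores.append(token_scores.mean().item())
    return scores
```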
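Reusing the prefix cache is valid because, under causal attention, the keys and values of prefix tokens never depend on what follows them. The toy check below illustrates this with a single attention layer in plain Python. It is a sketch under simplifying assumptions, not the model's actual attention: the helper names `attend` and `causal_attention` are hypothetical, and queries, keys, and values are taken to be the token vectors themselves (identity projections).

```python
import math
import random


def attend(q, keys, values):
    """Scaled dot-product attention for one query over the given keys/values."""
    scale = math.sqrt(len(q))
    logits = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
    top = max(logits)
    weights = [math.exp(x - top) for x in logits]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]


def causal_attention(xs, cache=None):
    """Causal self-attention over xs, optionally continuing from a KV cache.

    Toy assumption: identity projections, so each token vector serves as its
    own query, key, and value. The cache is copied so the caller's prefix
    cache is not mutated and can be reused for the next suffix.
    """
    keys, values = (list(cache[0]), list(cache[1])) if cache else ([], [])
    outputs = []
    for x in xs:
        keys.append(x)
        values.append(x)
        outputs.append(attend(x, keys, values))
    return outputs, (keys, values)


rng = random.Random(0)


def random_vec(dim=4):
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


prefix = [random_vec() for _ in range(6)]
suffixes = [[random_vec() for _ in range(n)] for n in (2, 3)]

# Build the prefix cache once, then reuse it for every suffix.
_, prefix_cache = causal_attention(prefix)

max_diff = 0.0
for suffix in suffixes:
    cached_out, _ = causal_attention(suffix, cache=prefix_cache)
    full_out, _ = causal_attention(prefix + suffix)
    # Compare suffix positions from the cached run and the full recomputation.
    for a_vec, b_vec in zip(cached_out, full_out[len(prefix):]):
        for a, b in zip(a_vec, b_vec):
            max_diff = max(max_diff, abs(a - b))

print(max_diff)
```

The suffix outputs computed from the cached prefix match the full recomputation, and the prefix cache keeps its original length after both suffixes because each call works on a copy. The same copy-or-expand discipline is what keeps one option's tokens from leaking into the next option's cache in `score_options`.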