skills/nlp/cls-null-score-span-reranking/SKILL.md
Reranks candidate answer spans by subtracting the CLS token's start+end logit sum as a null-answer baseline score.
npx skillsauth add wenmin-wu/ds-skills nlp-cls-null-score-span-reranking
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
In extractive QA, for models trained to point at the CLS token on unanswerable questions (the SQuAD 2.0 convention), the start and end logits at the CLS position (index 0) act as the model's score for "no answer exists." Subtracting the CLS score (start_logits[0] + end_logits[0]) from each candidate span's score gives a relative measure of how much better the span is than "no answer." This handles unanswerable questions directly: if no span's score exceeds the CLS baseline, the question is treated as unanswerable.
import numpy as np

def rerank_spans(start_logits, end_logits, start_indices, end_indices):
    """Score spans relative to CLS null-answer baseline.

    Args:
        start_logits: (seq_len,) start position logits
        end_logits: (seq_len,) end position logits
        start_indices: top-k start position candidates
        end_indices: top-k end position candidates

    Returns:
        list of (score, start, end) sorted descending
    """
    cls_score = start_logits[0] + end_logits[0]
    candidates = []
    for s in start_indices:
        for e in end_indices:
            if e >= s and (e - s) < 30:  # max span length
                span_score = start_logits[s] + end_logits[e]
                score = span_score - cls_score
                candidates.append((score, s, e))
    return sorted(candidates, reverse=True)

# Positive score = span beats the null baseline; score <= 0 = treat as unanswerable
ranked = rerank_spans(start_logits, end_logits, top_starts, top_ends)
if ranked and ranked[0][0] > 0:
    best_start, best_end = ranked[0][1], ranked[0][2]
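
The usage snippet above assumes `start_logits`, `end_logits`, `top_starts`, and `top_ends` already exist. As a rough end-to-end sketch of the same idea on toy data, the block below picks the top-k candidates with `np.argsort` and returns `None` when no span beats the CLS baseline. The helper names `top_k_indices` and `best_span_or_null`, the `k`, `max_span_len`, and `threshold` parameters, and the choice to skip index 0 as a span start are illustrative assumptions, not part of the original skill.

```python
import numpy as np


def top_k_indices(logits, k):
    """Return the indices of the k largest logits, highest first."""
    return np.argsort(logits)[::-1][:k]


def best_span_or_null(start_logits, end_logits, k=3, max_span_len=30, threshold=0.0):
    """Return (start, end, score) for the best span, or None if no span
    beats the CLS null baseline by more than `threshold` (hypothetical helper)."""
    start_logits = np.asarray(start_logits, dtype=float)
    end_logits = np.asarray(end_logits, dtype=float)
    cls_score = start_logits[0] + end_logits[0]  # null-answer baseline

    best = None
    for s in top_k_indices(start_logits, k):
        for e in top_k_indices(end_logits, k):
            # Assumption: index 0 is the CLS token, so it is never a real span start.
            if s == 0 or e < s or (e - s) >= max_span_len:
                continue
            score = start_logits[s] + end_logits[e] - cls_score
            if best is None or score > best[2]:
                best = (int(s), int(e), float(score))

    if best is None or best[2] <= threshold:
        return None  # treat the question as unanswerable
    return best


# Toy logits where tokens 2..3 form a clearly better span than CLS.
answerable = best_span_or_null(
    [1.0, 0.2, 4.0, 0.5, 0.1],
    [0.5, 0.1, 0.3, 3.5, 0.2],
)
print(answerable)  # (2, 3, 6.0)

# Toy logits where the CLS position dominates both heads.
unanswerable = best_span_or_null(
    [5.0, 0.2, 1.0, 0.5, 0.1],
    [4.0, 0.1, 0.3, 1.5, 0.2],
)
print(unanswerable)  # None
```

In the first case the CLS score is 1.0 + 0.5 = 1.5 and the best span scores 4.0 + 3.5 = 7.5, so the relative score is 6.0. In the second case the CLS score is 9.0 and every candidate span falls below it. A threshold other than 0 could be tuned on a validation set, though the original skill only compares against 0.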