skills/nlp/token-map-filtered-topk-spans/SKILL.md
Filters candidate span indices through a token map to skip special tokens, then cross-products top-k start/end indices with length constraints.
npx skillsauth add wenmin-wu/ds-skills nlp-token-map-filtered-topk-spans
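Extractive QA models output start/end logits over every token position, including special tokens (CLS, SEP, PAD) and non-context tokens such as the question, none of which should be answer candidates. A token map marks valid answer positions (map value >= 0) versus invalid ones (-1). Filtering through this map before taking the top-k indices ensures the k slots go to valid positions; truncating to top-k first and filtering afterwards can leave fewer than k candidates, or none. The cross-product of the filtered top-k starts and ends, pruned by ordering (start <= end) and a maximum span length, yields at most n_best × n_best candidate pairs instead of one per (start, end) pair in the sequence, which keeps candidate generation cheap.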
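A minimal sketch of why filtering has to come before truncation. The helper names `topk_then_filter` and `filter_then_topk` are hypothetical, and the toy logits and token map are made up for illustration. When a special token has a large logit (for example CLS in a model trained to point there when there is no answer), it can fill every top-k slot if truncation happens first.

```python
import numpy as np

def topk_then_filter(logits, token_map, k):
    """Take the k highest logits first, then drop invalid positions (may return < k)."""
    top = np.argsort(logits)[-k:]
    return top[token_map[top] >= 0]

def filter_then_topk(logits, token_map, k):
    """Drop invalid positions first, then take the k highest remaining logits."""
    valid = np.flatnonzero(token_map >= 0)
    return valid[np.argsort(logits[valid])[-k:]]

# Toy example: positions 0, 1 and 5 are special/question tokens with high logits.
logits = np.array([9.0, 8.0, 1.0, 3.0, 2.0, 7.0])
token_map = np.array([-1, -1, 0, 1, 2, -1])

print(topk_then_filter(logits, token_map, 3).tolist())  # → []
print(filter_then_topk(logits, token_map, 3).tolist())  # → [2, 4, 3]
```

Truncating first keeps only invalid positions here, so no span can be formed at all; filtering first returns all three valid positions (in ascending logit order, as `np.argsort` produces).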
```python
import numpy as np

def get_topk_spans(start_logits, end_logits, token_map, n_best=20, max_len=30):
    """Generate candidate spans from filtered top-k start/end indices.

    Args:
        start_logits: (seq_len,) logits for start positions
        end_logits: (seq_len,) logits for end positions
        token_map: (seq_len,) array; -1 for invalid positions
        n_best: number of top positions to consider
        max_len: maximum span length in tokens

    Returns:
        List of (score, start, end) tuples, highest score first.
    """
    token_map = np.asarray(token_map)

    def topk_filtered(logits):
        # argsort is ascending; skip position 0 (CLS) and shift indices back by 1
        indices = np.argsort(logits[1:]) + 1
        # Keep only valid answer positions
        indices = indices[token_map[indices] != -1]
        # The n_best highest-scoring valid positions are at the end
        return indices[-n_best:]

    starts = topk_filtered(start_logits)
    ends = topk_filtered(end_logits)

    # Cross-product of starts and ends, keeping spans with start <= end
    # and length (e - s + 1) <= max_len
    candidates = []
    for s in starts:
        for e in ends:
            if s <= e and (e - s) < max_len:
                score = start_logits[s] + end_logits[e]
                candidates.append((score, int(s), int(e)))
    return sorted(candidates, reverse=True)
```
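The double loop above runs O(n_best²) Python iterations, which is fine for n_best=20. As a hedged sketch, the same cross-product and pruning can be written with NumPy broadcasting. `get_topk_spans_vectorized` is a hypothetical name, not part of the skill; it filters before sorting rather than after, which selects the same top positions up to ties, and ordering among exactly tied scores may differ from the loop version.

```python
import numpy as np

def get_topk_spans_vectorized(start_logits, end_logits, token_map, n_best=20, max_len=30):
    """Hypothetical broadcasting variant of the filtered top-k cross-product."""
    start_logits = np.asarray(start_logits, dtype=float)
    end_logits = np.asarray(end_logits, dtype=float)
    token_map = np.asarray(token_map)

    def topk_filtered(logits):
        idx = np.arange(1, len(logits))   # skip position 0 (CLS)
        idx = idx[token_map[idx] != -1]   # keep valid answer positions only
        order = np.argsort(logits[idx])   # ascending by logit
        return idx[order[-n_best:]]       # n_best highest valid positions

    s = topk_filtered(start_logits)[:, None]  # shape (k_s, 1)
    e = topk_filtered(end_logits)[None, :]    # shape (1, k_e)

    # Ordering and length constraints evaluated for all pairs at once
    valid = (s <= e) & (e - s < max_len)
    scores = start_logits[s] + end_logits[e]  # shape (k_s, k_e)
    ss, ee = np.broadcast_arrays(s, e)

    sc, ss, ee = scores[valid], ss[valid], ee[valid]
    order = np.argsort(-sc, kind="stable")    # highest score first
    return [(float(sc[i]), int(ss[i]), int(ee[i])) for i in order]

# Toy example: CLS, one question token, three context tokens, SEP.
start = np.array([5.0, 4.0, 1.0, 3.0, 0.5, 2.0])
end = np.array([5.0, 0.2, 0.1, 2.5, 3.0, 4.5])
tmap = np.array([-1, -1, 0, 1, 2, -1])
print(get_topk_spans_vectorized(start, end, tmap, n_best=2))
# → [(6.0, 3, 4), (5.5, 3, 3), (4.0, 2, 4), (3.5, 2, 3)]
```

Broadcasting builds a (k_s, k_e) score matrix and a boolean mask in one step, so the only Python-level loop left is the final list construction.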