skills/nlp/token-to-char-span-prediction/SKILL.md
Map token-level classifier outputs back to character-level spans via offset mapping, thresholding, and contiguous grouping
npx skillsauth add wenmin-wu/ds-skills nlp-token-to-char-span-prediction
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
End-to-end pipeline for extracting text spans from token-level classifiers. Token probabilities are projected onto character positions using offset mappings, thresholded, and grouped into contiguous spans. Used in NER, clinical note annotation, and extractive QA.
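    import itertools

    import numpy as np


    def get_char_probs(texts, predictions, tokenizer):
        """Project per-token probabilities onto character positions."""
        # One zero-initialised array per text; characters not covered by any token stay at 0.
        results = [np.zeros(len(t)) for t in texts]
        for i, (text, pred) in enumerate(zip(texts, predictions)):
            # return_offsets_mapping requires a fast tokenizer.
            enc = tokenizer(text, add_special_tokens=True, return_offsets_mapping=True)
            # zip stops at the shorter sequence, so padded predictions are ignored.
            for (start, end), p in zip(enc["offset_mapping"], pred):
                # Special tokens map to (0, 0), an empty slice, so they write nothing.
                results[i][start:end] = p
        return results


    def get_spans(char_probs, threshold=0.5):
        """Threshold char probs and group consecutive indices into spans."""
        spans = []
        for probs in char_probs:
            # Sorted character indices whose probability clears the threshold.
            indices = np.where(probs >= threshold)[0]
            # index - running count is constant within a run of consecutive indices.
            groups = [list(g) for _, g in itertools.groupby(
                indices, key=lambda n, c=itertools.count(): n - next(c))]
            # Spans are half-open: (start, end) with end exclusive.
            spans.append([(int(min(g)), int(max(g)) + 1) for g in groups])
        return spans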
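As a rough illustration of the two steps above, here is a dependency-free sketch that runs the same projection and grouping on a toy input. Everything in it is an assumption made for the example: `toy_tokenize` is a hypothetical whitespace tokenizer standing in for a real one, the helper names are invented, and the token probabilities are made up rather than produced by a model.

```python
import itertools
import re


def toy_tokenize(text):
    """Hypothetical stand-in tokenizer: one token per whitespace-delimited word.

    Returns (start, end) character offsets, similar in shape to an
    offset mapping without special tokens.
    """
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def char_probs_from_tokens(text, offsets, token_probs):
    """Copy each token's probability onto every character it covers."""
    probs = [0.0] * len(text)
    for (start, end), p in zip(offsets, token_probs):
        for i in range(start, end):
            probs[i] = p
    return probs


def spans_from_char_probs(probs, threshold=0.5):
    """Group consecutive above-threshold character indices into half-open spans."""
    idx = [i for i, p in enumerate(probs) if p >= threshold]
    groups = [list(g) for _, g in itertools.groupby(
        idx, key=lambda n, c=itertools.count(): n - next(c))]
    return [(g[0], g[-1] + 1) for g in groups]


text = "patient reports chest pain today"
offsets = toy_tokenize(text)
token_probs = [0.1, 0.2, 0.9, 0.8, 0.3]  # made-up values, one per token

probs = char_probs_from_tokens(text, offsets, token_probs)
spans = spans_from_char_probs(probs)
extracted = [text[s:e] for s, e in spans]

print(spans)      # [(16, 21), (22, 26)]
print(extracted)  # ['chest', 'pain']
```

In this toy setup the space between "chest" and "pain" is not covered by any token, so it keeps probability 0 and the phrase comes out as two spans. Whether the same split appears with a real tokenizer likely depends on whether its offsets include the whitespace before each token.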
offset_mapping gives (char_start, char_end) for each token: assign the token's probability to all characters in that range.
itertools.groupby with a counter detects gaps between consecutive indices.
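The counter trick can be easier to follow in isolation. The sketch below uses a hypothetical helper, `group_consecutive`, and hand-picked indices to show why subtracting a running count from each index yields a key that stays constant inside a run and changes at every gap.

```python
import itertools


def group_consecutive(indices):
    """Split a sorted list of ints into runs of consecutive values."""
    counter = itertools.count()
    # For consecutive integers, value - position is constant; at a gap the
    # value jumps by more than 1 while the position advances by 1, so the
    # key changes and groupby starts a new group.
    return [list(g) for _, g in itertools.groupby(
        indices, key=lambda n: n - next(counter))]


indices = [2, 3, 4, 8, 9, 12]
runs = group_consecutive(indices)
# Convert each run into a half-open (start, end) span.
spans = [(run[0], run[-1] + 1) for run in runs]

print(runs)   # [[2, 3, 4], [8, 9], [12]]
print(spans)  # [(2, 5), (8, 10), (12, 13)]
```

The keys for this input are 2, 2, 2, 5, 5, 7, which is why three groups come out. The approach assumes the indices are sorted and unique, which holds for the output of a threshold mask.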