skills/nlp/space-aware-span-postprocessing/SKILL.md
Clean up character-level span predictions by removing isolated space characters at span boundaries
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add wenmin-wu/ds-skills nlp-space-aware-span-postprocessing
Tokenizers (especially byte-level BPE, as in RoBERTa) often include leading or trailing spaces in token offsets, so character-level span predictions pick up stray space characters at span boundaries. Post-process in two steps: clear any predicted space whose left or right neighbor is not predicted, and mark as positive any space that sits between two positive characters.
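To make the artifact concrete, here is a minimal sketch of how a leading space can end up inside a predicted span. The offsets and token probabilities are hand-written, hypothetical values (no tokenizer is called), and whether a real tokenizer reports offsets this way depends on its configuration.

```python
import numpy as np

text = "chest pain today"

# Hypothetical (start, end) offsets in which each token after the first
# covers its leading space: "chest", " pain", " today".
offsets = [(0, 5), (5, 10), (10, 16)]
# Made-up token-level probabilities, for illustration only.
token_probs = [0.1, 0.9, 0.2]

# Spread each token's probability over every character it covers.
char_probs = np.zeros(len(text))
for (start, end), p in zip(offsets, token_probs):
    char_probs[start:end] = p

# Threshold to get a 0/1 prediction per character.
char_preds = (char_probs > 0.5).astype(int)

# The space at index 5 inherits the probability of " pain",
# so the extracted text starts with a stray space.
predicted = "".join(c for c, keep in zip(text, char_preds) if keep)
print(repr(predicted))  # ' pain'
```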
import numpy as np

def postprocess_spaces(predictions, text):
    """Fix space artifacts in character-level span predictions."""
    preds = np.copy(predictions)
    for i in range(1, len(text) - 1):
        if text[i] == " ":
            if preds[i] and not preds[i - 1]:  # leading space
                preds[i] = 0
            if preds[i] and not preds[i + 1]:  # trailing space
                preds[i] = 0
            if preds[i - 1] and preds[i + 1]:  # bridge gap
                preds[i] = 1
    return preds
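A short usage sketch, assuming predictions arrive as a 0/1 NumPy array with one entry per character. The function is repeated so the snippet runs on its own; `preds_to_spans` is a hypothetical helper added only to show the effect on the extracted spans, and the input mask is made up.

```python
import numpy as np

def postprocess_spaces(predictions, text):
    # Same rules as the function above, repeated so this snippet is runnable.
    preds = np.copy(predictions)
    for i in range(1, len(text) - 1):
        if text[i] == " ":
            if preds[i] and not preds[i - 1]:  # leading space
                preds[i] = 0
            if preds[i] and not preds[i + 1]:  # trailing space
                preds[i] = 0
            if preds[i - 1] and preds[i + 1]:  # bridge gap
                preds[i] = 1
    return preds

def preds_to_spans(preds):
    """Hypothetical helper: 0/1 character mask -> list of (start, end), end exclusive."""
    spans, start = [], None
    for i, p in enumerate(preds):
        if p and start is None:
            start = i                      # a span opens here
        elif not p and start is not None:
            spans.append((start, i))       # the span closed before i
            start = None
    if start is not None:
        spans.append((start, len(preds)))  # span runs to the end
    return spans

text = "chest pain today"
raw = np.zeros(len(text), dtype=int)
raw[0:5] = 1   # "chest"
raw[6:11] = 1  # "pain" plus the trailing space at index 10

cleaned = postprocess_spaces(raw, text)
raw_spans = preds_to_spans(raw)
cleaned_spans = preds_to_spans(cleaned)

print([text[s:e] for s, e in raw_spans])      # ['chest', 'pain ']
print([text[s:e] for s, e in cleaned_spans])  # ['chest pain']
```

Because the loop covers `range(1, len(text) - 1)`, a predicted space at the very first or last character of the text is left unchanged; handle those two positions separately if your data can contain them.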
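BPE tokenizers fold the preceding space into the token (shown as Ġword), so offset mapping assigns space characters the same probability as the word.
Bridging the gap between two positive characters avoids splitting "chest pain" into "chest" + "pain".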