skills/nlp/bio-tag-span-reconstruction/SKILL.md
Reconstructs named entity spans from BIO token-level tags, handling B/I/O transitions and sentence boundaries.
npx skillsauth add wenmin-wu/ds-skills nlp-bio-tag-span-reconstruction
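Token-classification NER models output one BIO tag per token: B marks the beginning of an entity, I marks a token inside it, and O marks a token outside any entity. To convert these tags back into entity text spans, track the state transitions. B starts a new entity, I extends the open one, and O or another B closes the current span. This is a standard post-processing step in BERT-style transformer NER pipelines. The function below expects one tag per word and bare B/I/O labels without entity-type suffixes.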
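```python
def reconstruct_spans(words, tags):
    """Group words into entity strings using bare "B"/"I"/"O" tags.

    An "I" with no open span is treated like "O" and dropped.
    """
    entities = []
    current = []  # words of the span being built
    for word, tag in zip(words, tags):
        if tag == "B":
            # B closes any open span and starts a new one
            if current:
                entities.append(" ".join(current))
            current = [word]
        elif tag == "I" and current:
            current.append(word)  # I extends the open span
        else:
            # O (or an orphan I) closes the open span
            if current:
                entities.append(" ".join(current))
            current = []
    if current:  # flush a span that runs to the end of the sequence
        entities.append(" ".join(current))
    return entities


# Usage with word-level NER output; `model.predict` stands in for any
# wrapper that returns one B/I/O tag per word:
# words = sentence.split()
# tags = model.predict(words)  # ["O", "B", "I", "I", "O", ...]
# entities = reconstruct_spans(words, tags)

words = "Barack Obama visited Paris".split()
tags = ["B", "I", "O", "B"]
print(reconstruct_spans(words, tags))  # ['Barack Obama', 'Paris']
```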
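Real NER label sets usually attach an entity type to each tag (`B-PER`, `I-LOC`, ...), and the plain function above discards that type. The sketch below shows one possible extension. It assumes `B-<TYPE>`/`I-<TYPE>`/`O` labels and input that is already split into sentences, given as a list of word lists with matching tag lists. It returns `(sentence_index, start, end, type, text)` tuples with an exclusive `end`. It closes a span when the type changes between consecutive tags and never lets a span cross a sentence boundary. The name `reconstruct_typed_spans` and the tuple layout are illustrative choices, not part of the original skill.

```python
from typing import List, Optional, Tuple

# (sentence_index, start, end_exclusive, entity_type, text)
Span = Tuple[int, int, int, str, str]


def reconstruct_typed_spans(
    sentences: List[List[str]], sentence_tags: List[List[str]]
) -> List[Span]:
    """Extract typed spans from tags like "B-PER", "I-PER", "O".

    Each sentence is decoded independently, so no span crosses a sentence
    boundary. Assumes each tag list has the same length as its word list.
    An I- tag with no open span, or whose type differs from the open span,
    is treated like "O" (the same strict policy as reconstruct_spans).
    """
    spans: List[Span] = []
    for s_idx, (words, tags) in enumerate(zip(sentences, sentence_tags)):
        start: Optional[int] = None  # index where the open span began
        ent_type: Optional[str] = None  # type of the open span
        # A trailing sentinel "O" closes any span still open at sentence end.
        for i, tag in enumerate(list(tags) + ["O"]):
            prefix, _, label = tag.partition("-")
            continues = prefix == "I" and start is not None and label == ent_type
            if start is not None and not continues:
                spans.append((s_idx, start, i, ent_type, " ".join(words[start:i])))
                start, ent_type = None, None
            if prefix == "B":
                start, ent_type = i, label
    return spans


sentences = [["Angela", "Merkel", "visited", "New", "York"], ["Apple", "hired", "Tim"]]
tags = [["B-PER", "I-PER", "O", "B-LOC", "I-LOC"], ["B-ORG", "O", "B-PER"]]
for span in reconstruct_typed_spans(sentences, tags):
    print(span)
```

Returning token indices, not just the joined text, makes it easier to map entities back to character offsets or to score them against gold spans. The `" ".join` text is kept only for readability and will not reproduce the original spacing around punctuation.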