skills/nlp/adjacent-span-merging/SKILL.md
Merges nearby predicted NER spans of the same class within a word-distance threshold into single coherent segments.
npx skillsauth add wenmin-wu/ds-skills nlp-adjacent-span-mergingInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Token-level NER models often fragment a single discourse element into multiple adjacent spans — a long "Evidence" paragraph gets split into two or three pieces. Adjacent span merging fixes this by joining same-class spans that are within a configurable word-distance gap. This is especially important for long-document NER where discourse elements span many sentences.
def merge_adjacent_spans(predictions, max_gap=1, merge_classes=None):
"""Merge nearby same-class spans within max_gap words.
Args:
predictions: list of (doc_id, class, word_indices_str)
max_gap: max word-index gap to merge across
merge_classes: set of classes to merge (None = all)
Returns:
list of merged (doc_id, class, word_indices_str)
"""
from collections import defaultdict
by_doc_class = defaultdict(list)
for doc_id, cls, pred_str in predictions:
words = sorted(int(x) for x in pred_str.split())
by_doc_class[(doc_id, cls)].append(words)
merged = []
for (doc_id, cls), spans in by_doc_class.items():
if merge_classes and cls not in merge_classes:
for s in spans:
merged.append((doc_id, cls, " ".join(map(str, s))))
continue
# Sort spans by start position
all_words = sorted(set(w for s in spans for w in s))
# Split into groups by gap
groups = [[all_words[0]]]
for w in all_words[1:]:
if w - groups[-1][-1] <= max_gap:
groups[-1].append(w)
else:
groups.append([w])
for g in groups:
merged.append((doc_id, cls, " ".join(map(str, g))))
return merged
data-ai
Scaled Pinball Loss (SPL) metric for evaluating quantile forecasts, normalized by mean absolute successive differences of training data
data-ai
Walk backward through a time series and multiplicatively rescale segments when jumps exceed a fraction of the running mean to correct data collection anomalies
testing
Transform forecasting target to next/current ratio minus one so that optimizing MAE or squared error implicitly minimizes SMAPE
tools
Convert point forecasts to prediction intervals by scaling with logit-transformed quantile ratios passed through a Normal CDF