skills/nlp/text-overlap-features/SKILL.md
Computes n-gram overlap counts/ratios and NER entity overlap between reference and generated text as features.
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
npx skillsauth add wenmin-wu/ds-skills nlp-text-overlap-features
For tasks comparing two texts (summary vs. source, answer vs. reference), compute lexical overlap at multiple granularities: word-level, bigram, trigram, and named entity overlap. These features capture surface-level relevance that complements transformer embeddings.
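import spacy
from collections import Counter

# Small English pipeline; install with: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")


def ngram_overlap(text_a, text_b, n=2):
    # Lowercase and split on whitespace; punctuation stays attached to tokens.
    tokens_a = text_a.lower().split()
    tokens_b = text_b.lower().split()
    # Zipping n shifted copies of the token list yields consecutive n-grams.
    ngrams_a = set(zip(*[tokens_a[i:] for i in range(n)]))
    ngrams_b = set(zip(*[tokens_b[i:] for i in range(n)]))
    overlap = ngrams_a & ngrams_b
    # Fraction of text_b's distinct n-grams that also occur in text_a.
    ratio = len(overlap) / max(len(ngrams_b), 1)
    return len(overlap), ratio


def ner_overlap(text_a, text_b):
    # Entities match only when both surface text and label are identical.
    ents_a = {(e.text, e.label_) for e in nlp(text_a).ents}
    ents_b = {(e.text, e.label_) for e in nlp(text_b).ents}
    common = ents_a & ents_b
    # Return the shared-entity count and a per-label breakdown.
    return len(common), dict(Counter(e[1] for e in common))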
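To turn these functions into model inputs, one option is to collect the word-level, bigram, and trigram overlap into a single flat feature row. The sketch below is a minimal illustration and not part of the skill itself: `ngram_set`, `overlap_features`, the `max_n` parameter, and the feature key names are all hypothetical. It repeats the same set-based n-gram logic as `ngram_overlap` so that it runs without spaCy, and it assumes the second text is the ratio denominator, as in the code above.

```python
def ngram_set(text, n):
    """Return the set of lowercase, whitespace-tokenized n-grams in `text`."""
    tokens = text.lower().split()
    return set(zip(*[tokens[i:] for i in range(n)]))


def overlap_features(reference, generated, max_n=3):
    """Build a flat dict of n-gram overlap counts and ratios for n = 1..max_n.

    Hypothetical helper: the key names and the choice of `generated` as the
    ratio denominator are assumptions made for illustration.
    """
    features = {}
    for n in range(1, max_n + 1):
        ref_ngrams = ngram_set(reference, n)
        gen_ngrams = ngram_set(generated, n)
        common = ref_ngrams & gen_ngrams
        features[f"overlap_{n}gram_count"] = len(common)
        # max(..., 1) avoids division by zero when a text has fewer than n tokens.
        features[f"overlap_{n}gram_ratio"] = len(common) / max(len(gen_ngrams), 1)
    return features


row = overlap_features("The cat sat on the mat", "the cat sat on a mat")
print(row)
```

For this pair the shared n-gram count drops from 5 unigrams to 3 bigrams to 2 trigrams, which is the kind of graded signal the multiple granularities are meant to provide. The entity counts from `ner_overlap` could be merged into the same dict before handing the row to a downstream model.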