skills/nlp/fill-mask-entity-classification/SKILL.md
Uses a fine-tuned masked language model to classify candidate entity spans by comparing fill-mask probabilities for positive vs negative marker tokens.
npx skillsauth add wenmin-wu/ds-skills nlp-fill-mask-entity-classification
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
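Instead of training a separate classifier to decide whether a candidate span is an entity, reuse masked language modeling. Replace the candidate span with the model's mask token (e.g., [MASK]), then compare the fill-mask probabilities the model assigns to a positive marker (e.g., "$") and a negative marker (e.g., "#"). The simplest rule classifies the span as an entity when the positive marker is more likely; requiring a margin, such as a positive score more than twice the negative score, typically trades recall for precision. For this to work, the MLM must first be fine-tuned on domain text in which the positive marker replaces known entities and the negative marker replaces non-entities.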
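The fine-tuning text described above could be prepared along these lines. This is a minimal sketch under stated assumptions: spans are given as character offsets with a known entity/non-entity label, and `mark_spans` is a hypothetical helper name, not part of the skill. The source does not specify how its training data was built.

```python
ENTITY_SYMBOL = "$"
NON_ENTITY_SYMBOL = "#"

def mark_spans(text, spans):
    """Replace each (start, end, is_entity) character span with its marker.

    Hypothetical helper: assumes spans do not overlap and `end` is exclusive.
    """
    # Work right to left so earlier offsets stay valid after each replacement.
    for start, end, is_entity in sorted(spans, reverse=True):
        marker = ENTITY_SYMBOL if is_entity else NON_ENTITY_SYMBOL
        text = text[:start] + marker + text[end:]
    return text

text = "We trained on the Adult Census dataset using a large batch size."
spans = [
    (18, 38, True),   # "Adult Census dataset": a known entity
    (47, 63, False),  # "large batch size": a known non-entity
]
marked = mark_spans(text, spans)
print(marked)
```

Text marked this way can then be fed to an ordinary masked-language-modeling fine-tuning run, so the model learns which marker fits each kind of context.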
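from transformers import pipeline

ENTITY_SYMBOL = "$"      # positive marker: the masked span is an entity
NON_ENTITY_SYMBOL = "#"  # negative marker: the masked span is not an entity

# "fine-tuned-mlm-model" is a placeholder for the fine-tuned checkpoint.
mlm = pipeline("fill-mask", model="fine-tuned-mlm-model", device=0)

def classify_candidates(sentences_with_mask, candidate_phrases):
    # Expects one masked sentence per candidate phrase, in the same order.
    results = mlm(sentences_with_mask, targets=[f" {ENTITY_SYMBOL}", f" {NON_ENTITY_SYMBOL}"])
    predictions = []
    for result, phrase in zip(results, candidate_phrases):
        # Results are ordered by score, not by target order, so look up each marker by token.
        scores = {r["token_str"].strip(): r["score"] for r in result}
        # Keep the phrase only if the entity marker is more than twice as likely.
        if scores[ENTITY_SYMBOL] > scores[NON_ENTITY_SYMBOL] * 2:
            predictions.append(phrase)
    return predictions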
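The classifier above expects one masked sentence per candidate phrase. One way to produce those inputs is sketched below. `build_masked_inputs` is a hypothetical helper, and the default `"[MASK]"` is an assumption: the actual mask token depends on the tokenizer (for example `<mask>` for RoBERTa-style models) and can be read from `mlm.tokenizer.mask_token`.

```python
def build_masked_inputs(sentence, candidate_phrases, mask_token="[MASK]"):
    """Return (masked_sentences, kept_phrases), aligned by index.

    Hypothetical helper: masks only the first occurrence of each phrase and
    skips phrases that do not appear in the sentence.
    """
    masked_sentences, kept_phrases = [], []
    for phrase in candidate_phrases:
        if phrase in sentence:
            masked_sentences.append(sentence.replace(phrase, mask_token, 1))
            kept_phrases.append(phrase)
    return masked_sentences, kept_phrases

sentence = "We trained on the Adult Census dataset using a large batch size."
candidates = ["Adult Census dataset", "large batch size", "not in the sentence"]
masked, kept = build_masked_inputs(sentence, candidates)
for m in masked:
    print(m)
```

The two returned lists can be passed on as the masked sentences and their candidate phrases. Keeping them aligned matters because each prediction is matched back to its phrase by position.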