skills/nlp/domain-special-token-embedding/SKILL.md
Add domain-specific categorical values as new special tokens, resize embeddings, and prepend them to input so the model learns domain-aware representations
npx skillsauth add wenmin-wu/ds-skills nlp-domain-special-token-embeddingInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
When a categorical feature (patent section, medical specialty, product category) strongly conditions the semantics of the text, encode it as a new special token rather than a text prefix. The model learns a dedicated embedding vector for each category that participates in self-attention. More parameter-efficient and expressive than prepending the category name as text.
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained('microsoft/deberta-v3-base')
categories = ['[A]', '[B]', '[C]', '[D]', '[E]', '[F]', '[G]', '[H]']
tokenizer.add_special_tokens({'additional_special_tokens': categories})
model = AutoModelForSequenceClassification.from_pretrained(
'microsoft/deberta-v3-base', num_labels=1)
model.resize_token_embeddings(len(tokenizer))
df['input'] = df['category_token'] + ' ' + df['text_field_1'] + ' [SEP] ' + df['text_field_2']
[A], [B], etc.additional_special_tokens to the tokenizermodel.resize_token_embeddings(len(tokenizer))[X] to avoid collision with existing vocab — tokenizer treats brackets as specialdata-ai
Scaled Pinball Loss (SPL) metric for evaluating quantile forecasts, normalized by mean absolute successive differences of training data
data-ai
Walk backward through a time series and multiplicatively rescale segments when jumps exceed a fraction of the running mean to correct data collection anomalies
testing
Transform forecasting target to next/current ratio minus one so that optimizing MAE or squared error implicitly minimizes SMAPE
tools
Convert point forecasts to prediction intervals by scaling with logit-transformed quantile ratios passed through a Normal CDF