skills/nlp/taxonomy-context-enrichment/SKILL.md
Enrich model input by mapping categorical codes to human-readable taxonomy descriptions and concatenating them as context for transformer models
npx skillsauth add wenmin-wu/ds-skills nlp-taxonomy-context-enrichment
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
When inputs contain categorical codes (patent CPC codes, ICD medical codes, industry SIC codes), the raw code string carries little semantic information that a language model can use. Map each code to its human-readable description from an external taxonomy and concatenate that description with the input text as context. This gives the model domain knowledge without it having to learn the meaning of each code during fine-tuning.
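import pandas as pd
# Assumes `df` (columns: category_code, anchor, target) and a Hugging Face `tokenizer` already exist.
taxonomy = pd.read_csv('taxonomy_descriptions.csv')  # columns: code, taxonomy_title
df = df.merge(taxonomy, left_on='category_code', right_on='code', how='left')
# A left merge leaves NaN for unmapped codes, and NaN + str stays NaN; fall back to the raw code.
df['taxonomy_title'] = df['taxonomy_title'].fillna(df['category_code'])
df['text'] = df['taxonomy_title'] + '[SEP]' + df['anchor'] + '[SEP]' + df['target']
tokenized = tokenizer(df['text'].tolist(), truncation=True,
                      max_length=128, padding=True)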
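The mapping and concatenation step can also be sketched without any dependencies, which makes the handling of unmapped codes easier to see. This is a minimal illustration, not the skill's own implementation: the `TAXONOMY` dictionary, its codes and titles, the `build_text` helper, and the choice to fall back to the raw code are all hypothetical.

```python
# Hypothetical toy taxonomy; real titles would come from an external table
# such as the CPC, ICD, or SIC descriptions.
TAXONOMY = {
    "X01": "example furniture category",
    "X02": "example communication category",
}


def build_text(code, anchor, target, taxonomy=TAXONOMY, sep="[SEP]"):
    """Join taxonomy title, anchor, and target with a separator.

    Falls back to the raw code when the code is missing from the taxonomy,
    so the model input is never empty or None for that field.
    """
    title = taxonomy.get(code, code)
    return sep.join([title, anchor, target])


rows = [
    ("X01", "abatement", "abatement of pollution"),
    ("Z99", "abatement", "forest region"),  # code missing from the toy taxonomy
]

texts = [build_text(code, anchor, target) for code, anchor, target in rows]

# Track which codes had no description so coverage gaps are visible.
unmapped = sorted({code for code, _, _ in rows if code not in TAXONOMY})
```

Reporting `unmapped` before training is a cheap check that the taxonomy file actually covers the codes in the data.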
[SEP] delimiters between fields
max_length or use the adaptive max-length pattern
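The text does not spell out the adaptive max-length pattern. One plausible reading, offered here only as an assumption, is to choose `max_length` from a high percentile of the observed token counts rather than hard-coding it, since the added taxonomy titles lengthen every input. The `adaptive_max_length` helper, its parameters, and the sample counts below are hypothetical.

```python
import math


def adaptive_max_length(token_counts, percentile=0.99, cap=512, special_tokens=2):
    """Return a max_length that covers `percentile` of inputs, bounded by `cap`.

    `special_tokens` reserves room for tokens the tokenizer adds itself;
    the default of 2 is an assumption, not a property of any given model.
    """
    if not token_counts:
        raise ValueError("token_counts must not be empty")
    ordered = sorted(token_counts)
    # Nearest-rank percentile: the smallest value with at least
    # `percentile` of the data at or below it.
    rank = max(1, math.ceil(percentile * len(ordered)))
    return min(cap, ordered[rank - 1] + special_tokens)


# Hypothetical per-example token counts, e.g. measured with the real
# tokenizer on the enriched text before batching.
counts = [12, 18, 25, 40, 130]

median_based = adaptive_max_length(counts, percentile=0.5)
capped = adaptive_max_length(counts, percentile=1.0, cap=128)
```

A percentile below 1.0 trades truncation of a few long outliers for less padding on every other example.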