skills/tabular/brand-name-recovery-from-title/SKILL.md
Recover missing categorical values by matching words in a related text field against a known vocabulary built from the full dataset
npx skillsauth add wenmin-wu/ds-skills tabular-brand-name-recovery-from-title
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
When a high-cardinality categorical (e.g. brand) has many missing values but a related free-text field (e.g. product title) often contains the category value, recover missing entries by matching title words against the known vocabulary of that categorical. This is a lightweight, domain-agnostic imputation that needs no model; how many missing values it recovers depends on how often the text field actually mentions the value. Matching is exact and per word, so it is case-sensitive and only finds single-word values.
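import pandas as pd

# Build the brand vocabulary from train and test combined.
full = pd.concat([train, test], ignore_index=True)
known_brands = set(full['brand_name'].dropna().unique()) - {'missing'}

def recover_brand(row):
    # Only fill rows whose brand is NaN or the 'missing' placeholder.
    if pd.isna(row['brand_name']) or row['brand_name'] == 'missing':
        # Guard against NaN titles, which have no .split().
        if isinstance(row['name'], str):
            for word in row['name'].split():
                if word in known_brands:
                    return word
    return row['brand_name']

train['brand_name'] = train.apply(recover_brand, axis=1)
test['brand_name'] = test.apply(recover_brand, axis=1)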
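Word-by-word matching cannot find brands that span several words. One possible extension, sketched here as an assumption rather than part of the original skill, is to test word n-grams from longest to shortest and to report how many missing rows were filled. The helper `recover_brand_ngrams`, the `max_words` parameter, and the toy data are all hypothetical names introduced for illustration.

```python
import pandas as pd

def recover_brand_ngrams(title, known_brands, max_words=3):
    """Return the first known brand found in `title`, trying longer n-grams first.

    Hypothetical helper: `max_words` caps the n-gram length that is tested.
    """
    if not isinstance(title, str):  # NaN/None titles cannot be split
        return None
    words = title.split()
    for n in range(min(max_words, len(words)), 0, -1):
        for i in range(len(words) - n + 1):
            candidate = " ".join(words[i:i + n])
            if candidate in known_brands:
                return candidate
    return None

# Toy data, invented for illustration only.
df = pd.DataFrame({
    "name": ["Kate Spade leather wallet", "Nike running shoes size 9",
             "vintage lamp", None],
    "brand_name": [None, "missing", None, None],
})
known_brands = {"Kate Spade", "Nike", "Kate"}

# Rows that need a brand: NaN or the 'missing' placeholder.
missing = df["brand_name"].isna() | (df["brand_name"] == "missing")
recovered = df.loc[missing, "name"].apply(recover_brand_ngrams,
                                          known_brands=known_brands)

# Overwrite only the rows where a brand was actually found.
hit = recovered.notna()
df.loc[recovered[hit].index, "brand_name"] = recovered[hit]

# Share of originally missing rows that were filled.
recovery_rate = hit.mean()
print(df)
print(f"recovered {recovery_rate:.0%} of missing brands")
```

Trying longer n-grams first means "Kate Spade" is preferred over the shorter vocabulary entry "Kate". Matching is still exact and case-sensitive, so normalizing case on both sides may be worth considering if titles are inconsistently capitalized.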