skills/tabular/frequency-encoding/SKILL.md
Adds each feature's value-count frequency as a new column, enabling tree models to split on how common or rare a value is.
npx skillsauth add wenmin-wu/ds-skills tabular-frequency-encodingInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
For categorical or high-cardinality numeric features, the raw value itself may be less informative than how often it appears. Frequency encoding replaces (or augments) each value with its count in the dataset. Tree models can then split on "values appearing fewer than N times" — a pattern that captures rarity without target leakage. Compute counts on train + real test combined for consistency.
import pandas as pd
def frequency_encode(train, test, col):
# Count on combined data for consistent frequencies
combined = pd.concat([train[[col]], test[[col]]], axis=0)
freq = combined[col].value_counts()
train[col + "_FE"] = train[col].map(freq)
test[col + "_FE"] = test[col].map(freq).fillna(0)
for col in feature_cols:
frequency_encode(train, test, col)
value_counts() for each feature_FE columnuint8/uint16 for memory savingsnp.log1p(freq) compresses skewed distributionsmax(freq)data-ai
Scaled Pinball Loss (SPL) metric for evaluating quantile forecasts, normalized by mean absolute successive differences of training data
data-ai
Walk backward through a time series and multiplicatively rescale segments when jumps exceed a fraction of the running mean to correct data collection anomalies
testing
Transform forecasting target to next/current ratio minus one so that optimizing MAE or squared error implicitly minimizes SMAPE
tools
Convert point forecasts to prediction intervals by scaling with logit-transformed quantile ratios passed through a Normal CDF