skills/tabular/relative-deviation-features/SKILL.md
Computes differences and ratios between group-level aggregates and raw values to capture how each sample deviates from its group.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add wenmin-wu/ds-skills tabular-relative-deviation-features
After computing group aggregates (mean, max, min, std), create features that measure how each row deviates from its group. The difference (value - group_mean) captures absolute deviation; the ratio (value / group_mean) captures relative deviation. These features help tree models split on "is this value unusual for its group?" rather than just raw magnitude.
import pandas as pd


def add_relative_features(df, group_cols, value_col):
    """Add deviation features: diff and ratio vs group stats."""
    prefix = '_'.join(group_cols) + '_' + value_col
    grouped = df.groupby(group_cols)[value_col]
    grp_mean = grouped.transform('mean')
    grp_std = grouped.transform('std')  # sample std; NaN for single-row groups
    grp_max = grouped.transform('max')
    grp_min = grouped.transform('min')

    # clip(lower=1e-9) is only safe for non-negative denominators: a negative
    # group mean would be clipped up to 1e-9. Mask near-zero means instead,
    # so those rows get NaN rather than a huge, wrong-signed ratio.
    safe_mean = grp_mean.where(grp_mean.abs() > 1e-9)

    df[f'{prefix}_mean_diff'] = df[value_col] - grp_mean
    df[f'{prefix}_mean_ratio'] = df[value_col] / safe_mean
    df[f'{prefix}_max_diff'] = grp_max - df[value_col]
    # std and (max - min) are never negative, so clipping is a valid guard here.
    df[f'{prefix}_zscore'] = (df[value_col] - grp_mean) / grp_std.clip(lower=1e-9)
    df[f'{prefix}_range_pos'] = (df[value_col] - grp_min) / (grp_max - grp_min).clip(lower=1e-9)
    return df  # note: columns are added to the input frame in place


# Usage
df = add_relative_features(df, ['molecule_name', 'atom_index_0'], 'dist')
df = add_relative_features(df, ['molecule_name'], 'electronegativity')
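
To make the difference-versus-ratio distinction concrete, here is a minimal sketch on made-up data. The `group` and `value` column names and the numbers are illustrative assumptions, not part of the original skill. It computes the deviation features inline with `groupby(...).transform(...)` instead of calling the function above, so it runs on its own.

```python
import pandas as pd

# Hypothetical toy data: two groups on very different scales.
toy = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b"],
    "value": [1.0, 2.0, 3.0, 10.0, 30.0],
})

grouped = toy.groupby("group")["value"]
grp_mean = grouped.transform("mean")  # a -> 2.0, b -> 20.0
grp_std = grouped.transform("std")    # sample std (ddof=1), broadcast per row

toy["mean_diff"] = toy["value"] - grp_mean   # absolute deviation
toy["mean_ratio"] = toy["value"] / grp_mean  # relative deviation
toy["zscore"] = toy["mean_diff"] / grp_std   # deviation in group-std units

print(toy)
```

In this toy frame the rows with values 1.0 (group `a`) and 10.0 (group `b`) both get a `mean_ratio` of 0.5, while their `mean_diff` values are -1.0 and -10.0. The ratio treats them as equally unusual for their groups; the difference keeps the raw scale. Which of the two is more useful likely depends on the dataset.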