skills/tabular/row-aggregate-features/SKILL.md
Engineers row-wise statistical features (sum, mean, std, skew, kurtosis, median, min, max) across all numeric columns per sample.
npx skillsauth add wenmin-wu/ds-skills tabular-row-aggregate-features
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
When a dataset has many anonymous or homogeneous numeric columns (e.g., 200 columns named var_0 through var_199), row-level aggregates summarize each sample's overall distribution. Tree models can then split on "samples with high variance" or "samples with extreme values", patterns that no single feature exposes on its own.
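As a rough illustration of that idea, the toy frame below (made-up values, with hypothetical columns var_0 to var_3) holds two samples with the same row mean but very different spread. It is only a sketch of why a row-level statistic can separate samples that look alike on other summaries.

```python
import pandas as pd

# Toy data (invented for illustration): both rows average to 5.0.
toy = pd.DataFrame(
    {
        "var_0": [5.0, 1.0],
        "var_1": [5.0, 9.0],
        "var_2": [5.0, 1.0],
        "var_3": [5.0, 9.0],
    }
)
toy_cols = [c for c in toy.columns if c.startswith("var_")]

toy_mean = toy[toy_cols].mean(axis=1)  # identical for both rows
toy_std = toy[toy_cols].std(axis=1)    # zero for the flat row, large for the spread-out row
```

A tree that sees only `toy_mean` cannot tell the two samples apart, while a split on `toy_std` can.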
import pandas as pd
feature_cols = [c for c in df.columns if c.startswith("var_")]
df["row_sum"] = df[feature_cols].sum(axis=1)
df["row_mean"] = df[feature_cols].mean(axis=1)
df["row_std"] = df[feature_cols].std(axis=1)
df["row_min"] = df[feature_cols].min(axis=1)
df["row_max"] = df[feature_cols].max(axis=1)
df["row_skew"] = df[feature_cols].skew(axis=1)
df["row_kurt"] = df[feature_cols].kurtosis(axis=1)
df["row_med"] = df[feature_cols].median(axis=1)
np.percentile(row, [25, 75]) can capture spread better than min/max
Keep skipna=True (the default) to handle missing values gracefully
data-ai
Scaled Pinball Loss (SPL) metric for evaluating quantile forecasts, normalized by mean absolute successive differences of training data
data-ai
Walk backward through a time series and multiplicatively rescale segments when jumps exceed a fraction of the running mean to correct data collection anomalies
testing
Transform forecasting target to next/current ratio minus one so that optimizing MAE or squared error implicitly minimizes SMAPE
tools
Convert point forecasts to prediction intervals by scaling with logit-transformed quantile ratios passed through a Normal CDF