skills/tabular/per-partition-variance-filtering/SKILL.md
Apply VarianceThreshold within each data partition on combined train+test to select informative features per subgroup
npx skillsauth add wenmin-wu/ds-skills tabular-per-partition-variance-filtering
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
When a dataset contains a categorical variable that defines distinct subpopulations, features that are informative in one partition may be constant (zero-variance) in another. Applying VarianceThreshold per partition rather than globally keeps only the features that actually vary within each subgroup. Fitting the selector on the combined train and test rows of a partition (a transductive step that uses feature values only, never labels) gives a more stable variance estimate, especially for small partitions.
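To make the difference between global and per-partition filtering concrete, here is a minimal sketch on hypothetical toy data. The column names `f0`, `f1`, and `group`, and the threshold of 0.0, are assumptions chosen for illustration and are not taken from the skill itself.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

rng = np.random.default_rng(0)
n = 200

# Hypothetical toy data: two partitions, two features.
group = np.repeat([0, 1], n)
f0 = np.where(group == 0, rng.normal(0, 2, 2 * n), 0.0)  # varies only in group 0
f1 = np.where(group == 1, rng.normal(0, 2, 2 * n), 0.0)  # varies only in group 1
df = pd.DataFrame({"group": group, "f0": f0, "f1": f1})
cols = ["f0", "f1"]

# Global filter: both features have nonzero variance overall, so both survive.
global_keep = VarianceThreshold(threshold=0.0).fit(df[cols]).get_support()

# Per-partition filter: each group keeps only the feature that varies inside it.
per_group_keep = {
    g: VarianceThreshold(threshold=0.0).fit(part[cols]).get_support()
    for g, part in df.groupby("group")
}

print("global:", dict(zip(cols, global_keep)))
for g, mask in per_group_keep.items():
    print(f"group {g}:", dict(zip(cols, mask)))
```

The global selector cannot tell that `f0` is useless inside group 1, because the variance it sees is pooled over all rows; the per-group selectors can.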
from sklearn.feature_selection import VarianceThreshold
import pandas as pd
import numpy as np
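# Feature columns: everything except the identifier, label, and partition key.
cols = [c for c in train.columns if c not in ['id', 'target', 'group']]
oof = np.zeros(len(train))    # out-of-fold predictions, one per train row
preds = np.zeros(len(test))   # test predictions, one per test row

for g in train['group'].unique():
    tr = train[train['group'] == g]
    te = test[test['group'] == g]
    # Fit the selector on this partition's train and test rows together.
    combined = pd.concat([tr[cols], te[cols]])
    sel = VarianceThreshold(threshold=1.5)  # keep features with variance > 1.5
    combined_t = sel.fit_transform(combined)
    # Train rows come first in `combined`, so split by position.
    X_train = combined_t[:len(tr)]
    X_test = combined_t[len(tr):]
    # ... train model on X_train, predict X_test ...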
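The loop above leaves the modelling step open. One way it might be completed is sketched below, under several assumptions that do not come from the skill: the model is a `LogisticRegression`, out-of-fold predictions come from a 3-fold `StratifiedKFold`, the target is binary, and the helper names `make_toy` and `fit_predict_per_partition` are hypothetical. Rows are addressed by position so the sketch does not depend on the DataFrame index.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold


def make_toy(n_per_group, seed):
    """Hypothetical data: each group has its own pair of high-variance features."""
    rng = np.random.default_rng(seed)
    frames = []
    for g, informative in enumerate([[0, 1], [2, 3]]):
        X = rng.normal(0, 0.5, size=(n_per_group, 4))        # variance ~0.25
        X[:, informative] = rng.normal(0, 2.0, size=(n_per_group, 2))  # ~4.0
        part = pd.DataFrame(X, columns=[f"f{i}" for i in range(4)])
        part["group"] = g
        part["target"] = (X[:, informative].sum(axis=1) > 0).astype(int)
        frames.append(part)
    return pd.concat(frames, ignore_index=True)


def fit_predict_per_partition(train, test, cols, threshold=1.5, n_splits=3):
    oof = np.zeros(len(train))
    preds = np.zeros(len(test))
    y = train["target"].to_numpy()
    for g in train["group"].unique():
        # Positional row indices of this partition in train and test.
        tr_pos = np.flatnonzero(train["group"].to_numpy() == g)
        te_pos = np.flatnonzero(test["group"].to_numpy() == g)

        # Fit the variance filter on this partition's train + test features.
        combined = pd.concat([train.iloc[tr_pos][cols], test.iloc[te_pos][cols]])
        combined_t = VarianceThreshold(threshold=threshold).fit_transform(combined)
        X_tr, X_te = combined_t[:len(tr_pos)], combined_t[len(tr_pos):]
        y_tr = y[tr_pos]

        skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
        for fit_idx, val_idx in skf.split(X_tr, y_tr):
            model = LogisticRegression(max_iter=1000)
            model.fit(X_tr[fit_idx], y_tr[fit_idx])
            # Out-of-fold predictions go back to the original train positions.
            oof[tr_pos[val_idx]] = model.predict_proba(X_tr[val_idx])[:, 1]
            if len(te_pos):
                # Average the fold models' test predictions.
                preds[te_pos] += model.predict_proba(X_te)[:, 1] / n_splits
    return oof, preds


cols = ["f0", "f1", "f2", "f3"]
train = make_toy(150, seed=0)
test = make_toy(150, seed=1)
oof, preds = fit_predict_per_partition(train, test, cols)
print("OOF accuracy:", ((oof > 0.5) == train["target"].to_numpy()).mean())
```

Note that `VarianceThreshold` raises a `ValueError` when no feature in a partition exceeds the threshold, so a very small or degenerate partition may need a lower threshold or a fallback.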