skills/tabular/transductive-train-test-transform/SKILL.md
Fit unsupervised transforms (scaler, PCA, variance filter) on combined train+test data for more stable statistics, especially on small datasets
npx skillsauth add wenmin-wu/ds-skills tabular-transductive-train-test-transform
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
When the training set is small, fitting a scaler or dimensionality reduction on train alone produces noisy statistics. Unsupervised transforms (StandardScaler, PCA, VarianceThreshold, NMF) never see the target, so fitting them on combined train+test introduces no label leakage and gives more stable estimates, provided the test features are available at fit time. This is especially valuable when the data is partitioned into many small subgroups, each with its own fit.
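To make the "noisy statistics" point concrete, here is a minimal simulation sketch. It assumes train and test are drawn from the same distribution, and the sizes (20 train rows, 200 test rows) are hypothetical values chosen only for illustration. It compares how much the estimated mean, the statistic a scaler would use for centering, varies across repeated draws when computed on train alone versus on train+test.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, n_trials = 20, 200, 2000  # hypothetical sizes

train_only_means = np.empty(n_trials)
combined_means = np.empty(n_trials)
for i in range(n_trials):
    # Assumption: train and test come from the same distribution
    tr = rng.normal(loc=0.0, scale=1.0, size=n_train)
    te = rng.normal(loc=0.0, scale=1.0, size=n_test)
    train_only_means[i] = tr.mean()
    combined_means[i] = np.concatenate([tr, te]).mean()

# Spread of the estimate across trials: smaller means a more stable statistic
train_only_spread = train_only_means.std()
combined_spread = combined_means.std()
print(f"spread of train-only mean estimate: {train_only_spread:.3f}")
print(f"spread of combined mean estimate:   {combined_spread:.3f}")
```

The spread of a sample mean shrinks roughly as 1/sqrt(n), so the combined estimate should be noticeably tighter here. If train and test differ in distribution, the combined statistics describe the mixture rather than either set, which is worth checking before relying on this.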
import pandas as pd
from sklearn.feature_selection import VarianceThreshold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# `train` and `test` are DataFrames that share the feature columns and a 'group' column
cols = [c for c in train.columns if c not in ['id', 'target', 'group']]

for g in train['group'].unique():
    tr = train[train['group'] == g][cols]
    te = test[test['group'] == g][cols]
    # Stack train rows first so the split below can recover them by position
    combined = pd.concat([tr, te])

    # Fit the unsupervised steps on the train+test rows of this group
    pipe = Pipeline([
        ('vt', VarianceThreshold(threshold=1.5)),
        ('scaler', StandardScaler())
    ])
    combined_t = pipe.fit_transform(combined)

    # The first len(tr) rows are train, the remainder are test
    X_train = combined_t[:len(tr)]
    X_test = combined_t[len(tr):]
    # ... train model ...
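The same pattern extends to the other transforms mentioned above, such as PCA. The following is a self-contained sketch on synthetic data. The helper name `fit_transform_combined`, the data shapes, and `n_components=3` are illustrative assumptions, not part of the original skill.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def fit_transform_combined(tr, te, transformer):
    """Fit an unsupervised transformer on train+test rows, then split back.

    Hypothetical helper: `transformer` must not use the target.
    """
    combined = pd.concat([tr, te], ignore_index=True)
    combined_t = transformer.fit_transform(combined)
    # Train rows were stacked first, so slice by position
    return combined_t[:len(tr)], combined_t[len(tr):]


# Synthetic stand-in data: 30 train rows, 120 test rows, 10 features
rng = np.random.default_rng(42)
cols = [f"f{i}" for i in range(10)]
tr = pd.DataFrame(rng.normal(size=(30, 10)), columns=cols)
te = pd.DataFrame(rng.normal(size=(120, 10)), columns=cols)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA(n_components=3)),
])
X_train, X_test = fit_transform_combined(tr, te, pipe)
print(X_train.shape, X_test.shape)  # (30, 3) (120, 3)
```

Wrapping the concat-fit-split steps in one function keeps the positional split next to the concatenation, which makes it harder to misalign train and test rows when the pattern is repeated per group.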