skills/tabular/prior-rebalancing-oversampling/SKILL.md
Rebalances training data by oversampling the majority class to match a known test-set class prior, reducing prediction miscalibration.
npx skillsauth add wenmin-wu/ds-skills tabular-prior-rebalancing-oversampling
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
When training data has a different class ratio than the test set (e.g., 37% positive in train vs 16.5% in test), models trained on the raw distribution produce miscalibrated probabilities. Instead of post-hoc calibration, resample the training set to match the known test prior. This is especially effective for log-loss metrics where calibration directly affects the score.
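As a rough worked example of the arithmetic, using the 37% and 16.5% figures above: if every positive row is kept, the negative class has to be scaled by the ratio of train odds to test odds for the positive rate to land on the test prior. The helper name `negative_multiplier` is hypothetical and only illustrates the calculation.

```python
def negative_multiplier(train_prior: float, test_prior: float) -> float:
    """Factor by which to scale the negative count (positives kept as-is)
    so that the positive rate moves from train_prior to test_prior."""
    if not (0 < train_prior < 1 and 0 < test_prior < 1):
        raise ValueError("priors must lie strictly between 0 and 1")
    train_odds = train_prior / (1 - train_prior)
    test_odds = test_prior / (1 - test_prior)
    return train_odds / test_odds


factor = negative_multiplier(0.37, 0.165)
# Positive rate that results from scaling the 63% negative share by `factor`
new_rate = 0.37 / (0.37 + 0.63 * factor)
print(round(factor, 2), round(new_rate, 3))  # prints: 2.97 0.165
```

In this example the negatives need to be roughly tripled. A factor below 1 would mean the negatives have to be undersampled instead.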
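import numpy as np
import pandas as pd

test_prior = 0.165  # known or estimated test positive rate

# Assumes X_train is a DataFrame and y_train holds aligned 0/1 labels
labels = np.asarray(y_train)
pos = X_train[labels == 1]
neg = X_train[labels == 0]

# Negatives needed so that len(pos) / (len(pos) + n_neg) == test_prior
n_neg = int(round(len(pos) * (1 - test_prior) / test_prior))
n_full, n_extra = divmod(n_neg, len(neg))
neg_resampled = pd.concat([neg] * n_full + [neg.iloc[:n_extra]])

# Build labels before shuffling so rows and labels stay aligned
X_train = pd.concat([pos, neg_resampled], ignore_index=True)
y_train = np.array([1] * len(pos) + [0] * len(neg_resampled))
perm = np.random.default_rng(42).permutation(len(X_train))
X_train, y_train = X_train.iloc[perm].reset_index(drop=True), y_train[perm]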
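One way to sanity-check a resampling step like this is to run it on synthetic data and confirm two things: the positive rate lands on the target prior, and each row still carries its own label after shuffling. The sketch below is illustrative only. `rebalance_to_prior` and the `label_copy` column are hypothetical names, and the negative count is derived as `len(pos) * (1 - test_prior) / test_prior`, which keeps every positive row.

```python
import numpy as np
import pandas as pd


def rebalance_to_prior(X, y, test_prior, seed=42):
    """Resample negatives so the positive rate is approximately test_prior.

    Hypothetical helper: keeps every positive row and repeats (or truncates)
    the negative rows to reach the required count.
    """
    y = np.asarray(y)
    pos, neg = X[y == 1], X[y == 0]
    n_neg = int(round(len(pos) * (1 - test_prior) / test_prior))
    n_full, n_extra = divmod(n_neg, len(neg))
    neg_new = pd.concat([neg] * n_full + [neg.iloc[:n_extra]])
    X_new = pd.concat([pos, neg_new], ignore_index=True)
    y_new = np.array([1] * len(pos) + [0] * len(neg_new))
    # Apply one permutation to both features and labels
    perm = np.random.default_rng(seed).permutation(len(X_new))
    return X_new.iloc[perm].reset_index(drop=True), y_new[perm]


# Synthetic data: 370 positives and 630 negatives (37% positive)
rng = np.random.default_rng(0)
y = np.array([1] * 370 + [0] * 630)
X = pd.DataFrame({"feature": rng.normal(size=1000), "label_copy": y})

X_bal, y_bal = rebalance_to_prior(X, y, test_prior=0.165)
print(len(X_bal), round(float(y_bal.mean()), 3))

# The copied label column must still match the label array row by row
aligned = bool((X_bal["label_copy"].to_numpy() == y_bal).all())
```

Because rows are duplicated, any validation split should be made before resampling so that copies of the same row do not end up on both sides of the split.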
CalibratedClassifierCV