skills/tabular/predicted-class-mass-reweighting/SKILL.md
Rescales ensemble probabilities post hoc by the inverse of each class's estimated total mass across the test set, correcting for class imbalance in the predictions.
npx skillsauth add wenmin-wu/ds-skills tabular-predicted-class-mass-reweighting
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Even after ensembling, predicted probabilities can be systematically biased toward the majority class: the model assigns too much total probability mass to class 0 and too little to class 1. This technique estimates each class's total mass as the sum of that class's predicted probabilities over all test samples, divides every sample's probability for a class by that class's mass, and renormalizes each row to sum to 1. The rescaling shifts probability toward under-predicted classes so that the predicted class distribution better matches the expected true distribution, which can improve balanced metrics such as balanced log loss.
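The rescaling described above can be written compactly. The symbols here are notation introduced for illustration, not taken from the original: $p_{ik}$ is the predicted probability of class $k$ for test sample $i$, $m_k$ is the class mass, and $\tilde{p}_{ik}$ is the reweighted probability.

$$
m_k = \sum_{i=1}^{n} p_{ik}, \qquad
\tilde{p}_{ik} = \frac{p_{ik} / m_k}{\sum_{j} p_{ij} / m_j}
$$

The denominator is the per-sample renormalization, so each row of $\tilde{p}$ still sums to 1.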
import numpy as np


def reweight_by_class_mass(probs):
    """Rescale probabilities by inverse class mass.

    Args:
        probs: (n_samples, n_classes) predicted probabilities
    Returns:
        Reweighted and renormalized probabilities
    """
    class_mass = probs.sum(axis=0)  # total mass per class
    inv_mass = 1.0 / class_mass
    reweighted = probs * inv_mass[np.newaxis, :]
    # Renormalize to sum to 1 per sample
    reweighted /= reweighted.sum(axis=1, keepdims=True)
    return reweighted


# After ensemble averaging
ensemble_probs = np.mean([m.predict_proba(X_test) for m in models], axis=0)
calibrated_probs = reweight_by_class_mass(ensemble_probs)
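To see the effect on concrete numbers, here is a minimal sketch on a hand-made 4-sample, 2-class array. The toy values, the function name `reweight_by_class_mass_safe`, and the `eps` floor are assumptions added for illustration; the original function has no guard against a class whose total mass is zero.

```python
import numpy as np


def reweight_by_class_mass_safe(probs, eps=1e-12):
    """Hypothetical variant of reweight_by_class_mass with a zero-mass guard."""
    probs = np.asarray(probs, dtype=float)
    # Total predicted mass per class; floor it at eps (an assumed safeguard)
    # so a class that never receives probability does not divide by zero.
    class_mass = np.maximum(probs.sum(axis=0), eps)
    reweighted = probs / class_mass[np.newaxis, :]
    # Renormalize so each sample's probabilities sum to 1 again.
    return reweighted / reweighted.sum(axis=1, keepdims=True)


# Toy predictions skewed toward class 0 (illustrative values only).
toy_probs = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.7, 0.3],
    [0.6, 0.4],
])

mass_before = toy_probs.sum(axis=0)   # class 0 holds most of the mass
toy_reweighted = reweight_by_class_mass_safe(toy_probs)
mass_after = toy_reweighted.sum(axis=0)

print("mass before:", mass_before)
print("mass after: ", mass_after)
print("first row:  ", toy_reweighted[0])
```

In this toy case the class masses start at 3.0 and 1.0, so the first row moves from [0.9, 0.1] to about [0.75, 0.25], and the total mass assigned to class 1 increases.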