skills/tabular/rank-calibrated-blending/SKILL.md
Blends predictions from multiple models by converting to ranks, weighting, and calibrating back to probabilities via rank-group means from a reference model. Ensures monotonic calibrated output.
npx skillsauth add wenmin-wu/ds-skills tabular-rank-calibrated-blending
Standard prediction averaging can produce poorly calibrated blends when models have different score distributions. Rank-calibrated blending sidesteps this by operating in rank space, then mapping back to calibrated probabilities using a reference model's predictions grouped by rank bucket.
```python
import numpy as np
from scipy.stats import rankdata
from sklearn.isotonic import IsotonicRegression

# pred_xgb, pred_cb: 1-D arrays of scores from two models on the same rows

# 1. Rank each model's predictions
rank_a = rankdata(pred_xgb)
rank_b = rankdata(pred_cb)

# 2. Weighted blend in rank space
rank_blend = rank_a * 0.99 + rank_b * 0.01

# 3. Calibrate: map rank groups to the reference model's mean prediction
n_bins = 1000
rank_bins = np.digitize(rank_blend, np.linspace(rank_blend.min(), rank_blend.max(), n_bins))
calibrated = np.zeros_like(rank_blend)
for b in np.unique(rank_bins):
    mask = rank_bins == b
    calibrated[mask] = pred_xgb[mask].mean()  # reference model = xgb

# 4. Enforce monotonicity via isotonic regression
calibrated = IsotonicRegression(out_of_bounds='clip').fit_transform(rank_blend, calibrated)
```
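
The snippet above assumes `pred_xgb` and `pred_cb` already exist. As a self-contained sketch, the same four steps can be wrapped in a function that accepts any number of models. This is an illustrative variant, not the skill's own code: the function name `rank_calibrated_blend`, the weight normalization, the vectorized per-bin mean via `np.bincount`, and the synthetic data are all assumptions.

```python
import numpy as np
from scipy.stats import rankdata
from sklearn.isotonic import IsotonicRegression


def rank_calibrated_blend(preds, weights, reference, n_bins=1000):
    """Hypothetical helper: blend scores in rank space, calibrate to `reference`.

    preds: list of 1-D score arrays (same length, any scale)
    weights: one float per model
    reference: 1-D array of probabilities from the calibration reference model
    """
    preds = [np.asarray(p, dtype=float) for p in preds]
    reference = np.asarray(reference, dtype=float)

    # Normalize weights (assumption) so the blend stays on the 1..n rank scale
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()

    # Steps 1-2: rank each model, then take the weighted sum of ranks
    rank_blend = sum(wi * rankdata(p) for wi, p in zip(w, preds))

    # Step 3: equal-width bins over the blended-rank range, indices 0..n_bins-1
    edges = np.linspace(rank_blend.min(), rank_blend.max(), n_bins + 1)
    bins = np.digitize(rank_blend, edges[1:-1])

    # Mean reference prediction per bin, broadcast back to each row
    sums = np.bincount(bins, weights=reference, minlength=n_bins)
    counts = np.bincount(bins, minlength=n_bins)
    bin_mean = sums[bins] / counts[bins]

    # Step 4: isotonic fit makes the output non-decreasing in blended rank
    iso = IsotonicRegression(out_of_bounds="clip")
    return iso.fit_transform(rank_blend, bin_mean)


# Usage on synthetic data (two models with very different score scales)
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=5000)
pred_xgb = 1 / (1 + np.exp(-(2.0 * y - 1 + rng.normal(size=5000))))
pred_cb = 3.0 * (y + rng.normal(scale=1.2, size=5000))
blend = rank_calibrated_blend([pred_xgb, pred_cb], [0.99, 0.01], reference=pred_xgb, n_bins=500)
```

Isotonic regression only pools adjacent values, so the output stays within the range of the reference model's predictions and is non-decreasing in blended rank, whatever scale the minor model's scores are on.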
Ranks are computed with `scipy.stats.rankdata`.

| Decision | Guidance |
|---|---|
| Weight selection | Start with the better model at ~0.99, minor model at ~0.01. Tune on CV. |
| Number of bins | 500-1000 bins works well for ~100k samples. With more data, use more bins while keeping each bin populated enough for a stable mean. |
| Reference model | Use the single best model (highest CV AUC) as the calibration reference. |
| Monotonicity | Always enforce via `IsotonicRegression`: the rank-to-probability map must be non-decreasing. |
| When to use | When models have similar AUC but different score distributions or ranking behavior. |
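
The weight-selection and reference-model rows both call for tuning on cross-validation. One way to do that, sketched under assumptions: given out-of-fold predictions for the two models, sweep the minor model's weight and score the rank blend with `sklearn.metrics.roc_auc_score`. Because AUC depends only on ordering and the calibration map is monotone, scoring the uncalibrated rank blend is a reasonable proxy, though binning can introduce ties that shift AUC slightly. The function name `tune_minor_weight` and the 0 to 0.2 search grid are hypothetical choices, and the data below is synthetic.

```python
import numpy as np
from scipy.stats import rankdata
from sklearn.metrics import roc_auc_score


def tune_minor_weight(y_true, oof_main, oof_minor, grid=None):
    """Hypothetical helper: pick the minor-model weight maximizing OOF AUC."""
    if grid is None:
        # Assumed search range around the suggested 0.01 starting point
        grid = np.linspace(0.0, 0.2, 21)
    r_main = rankdata(oof_main)
    r_minor = rankdata(oof_minor)
    scores = {}
    for w in grid:
        blend = (1.0 - w) * r_main + w * r_minor  # weighted blend in rank space
        scores[float(w)] = roc_auc_score(y_true, blend)
    best_w = max(scores, key=scores.get)
    return best_w, scores


# Usage on synthetic out-of-fold predictions
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=4000)
oof_main = y + rng.normal(scale=1.0, size=4000)
oof_minor = y + rng.normal(scale=1.5, size=4000)
best_w, scores = tune_minor_weight(y, oof_main, oof_minor)
print(f"best minor weight: {best_w:.2f}, OOF AUC: {scores[best_w]:.4f}")
```

The same loop can compare candidate reference models: the one whose out-of-fold predictions score the highest AUC on their own is the natural calibration reference.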

- `scipy.stats.rankdata` documentation
- `sklearn.isotonic.IsotonicRegression` documentation