skills/cv/per-label-platt-isotonic-calibration/SKILL.md
Fit a per-label probability calibrator on out-of-fold scores using Platt scaling (logistic regression on the raw scores), falling back to isotonic regression for labels where the logistic fit does not converge. Pickle the dict of fitted calibrators and apply it at inference for a small leaderboard lift on multi-label classification that requires no retraining.
npx skillsauth add wenmin-wu/ds-skills cv-per-label-platt-isotonic-calibration
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Multi-label deep nets tend to be systematically miscalibrated: rare labels are pushed toward extremely low probabilities, while common labels saturate at high ones. Calibration corrects this without retraining. The recipe: collect out-of-fold scores per label across the training set, fit a LogisticRegression per label (Platt scaling), and fall back to IsotonicRegression for labels where the Platt fit fails (for example, when it diverges). Labels that cannot be calibrated at all (constant scores, or a single class in the ground truth) are stored as None. Save the dict {label: ('platt'|'isotonic', model)} to disk and apply it at inference. The lift is usually 0.001-0.003 on macro metrics, but it needs no retraining, is deterministic, and stacks with other tricks.
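The out-of-fold collection step in the recipe above can be sketched as follows. This is a minimal illustration under stated assumptions, not the skill's own pipeline: the data is synthetic, a one-vs-rest logistic model stands in for the deep net, and the `label_*` column names are hypothetical. It produces `oof_df` and `gt_df` in the shape the calibrator-fitting function expects.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold
from sklearn.multiclass import OneVsRestClassifier

# Synthetic multi-label data; a real pipeline would use its own features and labels.
X, Y = make_multilabel_classification(
    n_samples=600, n_features=20, n_classes=4, random_state=0
)
label_cols = [f"label_{i}" for i in range(Y.shape[1])]  # hypothetical names

oof = np.zeros(Y.shape, dtype=float)
kf = KFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, valid_idx in kf.split(X):
    # Stand-in for the deep net (assumption): one logistic model per label.
    model = OneVsRestClassifier(LogisticRegression(max_iter=1000))
    model.fit(X[train_idx], Y[train_idx])
    # Each row is scored only by a model that never trained on it.
    oof[valid_idx] = model.predict_proba(X[valid_idx])

oof_df = pd.DataFrame(oof, columns=label_cols)  # out-of-fold scores
gt_df = pd.DataFrame(Y, columns=label_cols)     # ground-truth labels
```

Because every row is predicted by a model that did not see it, the calibrators are fitted on scores that resemble what the model produces on unseen data rather than on overfit training scores.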
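import warnings

import joblib  # used to persist the fitted dict, e.g. joblib.dump(cal, path)
import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression


def fit_calibrators(oof_df, gt_df, label_cols):
    """Fit one calibrator per label on out-of-fold scores."""
    cal = {}
    for col in label_cols:
        s = oof_df[col].values
        y = gt_df[col].values
        # Nothing to fit for single-class labels or constant scores.
        if np.unique(y).size < 2 or np.allclose(s, s[0]):
            cal[col] = None
            continue
        try:
            lr = LogisticRegression(max_iter=2000)
            # scikit-learn only warns on non-convergence; promote the
            # warning to an error so the isotonic fallback can trigger.
            with warnings.catch_warnings():
                warnings.simplefilter('error', ConvergenceWarning)
                lr.fit(s.reshape(-1, 1), y)
            cal[col] = ('platt', lr)
        except Exception:
            iso = IsotonicRegression(out_of_bounds='clip')
            iso.fit(s, y)
            cal[col] = ('isotonic', iso)
    return cal


def apply_calibrators(scores, cal, label_cols):
    """Calibrate an (n_samples, n_labels) score array column by column."""
    out = scores.copy()
    for i, col in enumerate(label_cols):
        c = cal.get(col)
        if c is None:
            continue  # no calibrator: leave the raw scores as they are
        kind, m = c
        if kind == 'platt':
            out[:, i] = m.predict_proba(scores[:, i].reshape(-1, 1))[:, 1]
        else:
            out[:, i] = m.transform(scores[:, i])
    return out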
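Saving and reloading the calibrator dict might look like the sketch below. It is self-contained, so it builds a small dict by hand in the same `{label: (kind, model)}` layout instead of calling the functions above. The synthetic scores, the `label_a`/`label_b`/`label_c` names, and the `calibrators.joblib` file name are all assumptions made for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
scores = rng.normal(size=200)                         # synthetic raw scores
y = (scores + rng.normal(size=200) > 0).astype(int)   # noisy synthetic labels

# Hand-built dict in the {label: (kind, model)} layout; names are hypothetical.
cal = {
    "label_a": ("platt", LogisticRegression(max_iter=2000).fit(scores.reshape(-1, 1), y)),
    "label_b": ("isotonic", IsotonicRegression(out_of_bounds="clip").fit(scores, y)),
    "label_c": None,  # skipped label: scores pass through unchanged
}

path = os.path.join(tempfile.mkdtemp(), "calibrators.joblib")  # hypothetical file name
joblib.dump(cal, path)
restored = joblib.load(path)

# The restored models should reproduce the original predictions.
new_scores = rng.normal(size=5)
same_platt = np.allclose(
    cal["label_a"][1].predict_proba(new_scores.reshape(-1, 1))[:, 1],
    restored["label_a"][1].predict_proba(new_scores.reshape(-1, 1))[:, 1],
)
same_iso = np.allclose(
    cal["label_b"][1].transform(new_scores),
    restored["label_b"][1].transform(new_scores),
)
```

Pickled scikit-learn models are tied to the library version that wrote them, so it is probably safest to load the file with the same scikit-learn version used for fitting.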
Labels with a single class or constant scores are stored as None and pass scores through unchanged.
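One way to check whether calibration helps on a given label is to compare the Brier score before and after on rows that were not used for fitting. The sketch below does this for a single synthetic label. It assumes the raw scores are logits that are deliberately overconfident and biased, which is a case Platt scaling can undo exactly; real model scores will not be this clean, and the size of the improvement here says nothing about the lift on a real leaderboard.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss

rng = np.random.default_rng(0)
n = 4000
z = rng.normal(scale=2.0, size=n)                        # "true" logits (synthetic)
y = (rng.random(n) < 1 / (1 + np.exp(-z))).astype(int)   # labels drawn from the true probabilities
raw_logit = 3.0 * z + 1.0                                # overconfident, biased scores (assumption)
raw_prob = 1 / (1 + np.exp(-raw_logit))

# Fit on the first half, evaluate on the second half.
fit_idx, eval_idx = np.arange(n // 2), np.arange(n // 2, n)
platt = LogisticRegression(max_iter=2000)
platt.fit(raw_logit[fit_idx].reshape(-1, 1), y[fit_idx])
cal_prob = platt.predict_proba(raw_logit[eval_idx].reshape(-1, 1))[:, 1]

brier_raw = brier_score_loss(y[eval_idx], raw_prob[eval_idx])
brier_cal = brier_score_loss(y[eval_idx], cal_prob)
print(f"Brier raw={brier_raw:.4f} calibrated={brier_cal:.4f}")
```

A lower Brier score after calibration on held-out rows suggests the calibrator is worth keeping for that label; if it is not lower, passing the raw scores through is a reasonable choice.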