skills/tabular/distribution-matching-postprocess/SKILL.md
Reshapes model predictions to match the known label distribution from training data using rank-based mapping.
npx skillsauth add wenmin-wu/ds-skills tabular-distribution-matching-postprocess
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
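When the target distribution is known (e.g., from training labels), reshape raw predictions to match it: convert the predictions to ranks, then map each rank position to the corresponding quantile of the target distribution. The mapping is monotonic, so the relative ordering of distinct predictions is preserved and only their marginal distribution changes. This corrects systematic distribution shift without retraining.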
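import numpy as np
from scipy.stats import rankdata


def distribution_match(preds, target_distribution):
    """Map predictions to match target label distribution.

    Args:
        preds: array of raw predictions, shape (n_samples,)
        target_distribution: sorted array of target values to match

    Returns:
        array with predictions reshaped to target distribution
    """
    preds = np.asarray(preds)
    # Sort defensively so index i always holds the i-th smallest target value.
    target = np.sort(np.asarray(target_distribution))
    n_preds = len(preds)
    n_target = len(target)
    # Zero-based ordinal ranks; ties are broken by order of appearance.
    ranks = rankdata(preds, method="ordinal") - 1
    # Stretch rank positions 0..n_preds-1 onto target indices 0..n_target-1.
    # max(..., 1) avoids dividing by zero when there is a single prediction.
    indices = (ranks * (n_target - 1)) // max(n_preds - 1, 1)
    return target[indices.astype(int)]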
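The ordinal-rank mapping above returns only values that already occur in the target array, and tied predictions receive different values. Where that matters, one possible variant (not part of the original skill) is to average tied ranks and interpolate between target quantiles with `np.quantile`. The sketch below assumes this variant; the name `quantile_match` and the synthetic data are hypothetical and serve only as illustration.

```python
import numpy as np
from scipy.stats import rankdata


def quantile_match(preds, train_labels):
    """Hypothetical interpolating variant of rank-based distribution matching."""
    preds = np.asarray(preds, dtype=float)
    labels = np.asarray(train_labels, dtype=float)
    n = len(preds)
    if n > 1:
        # Average ranks so tied predictions share one quantile level.
        ranks = rankdata(preds, method="average") - 1
        q = ranks / (n - 1)          # quantile levels in [0, 1]
    else:
        # Assumption: a lone prediction is mapped to the median label.
        q = np.full(n, 0.5)
    # np.quantile sorts the labels internally and interpolates linearly.
    return np.quantile(labels, q)


# Synthetic illustration: skewed labels, narrow symmetric raw predictions.
rng = np.random.default_rng(0)
train_labels = rng.exponential(scale=2.0, size=1000)
raw_preds = rng.normal(loc=2.0, scale=0.5, size=200)

matched = quantile_match(raw_preds, train_labels)
```

Comparing `np.percentile(matched, [5, 50, 95])` with the same percentiles of `train_labels` is a quick sanity check that the reshaped predictions follow the target distribution, while `np.argsort(matched)` should agree with `np.argsort(raw_preds)` when there are no ties.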
data-ai
Scaled Pinball Loss (SPL) metric for evaluating quantile forecasts, normalized by mean absolute successive differences of training data
data-ai
Walk backward through a time series and multiplicatively rescale segments when jumps exceed a fraction of the running mean to correct data collection anomalies
testing
Transform forecasting target to next/current ratio minus one so that optimizing MAE or squared error implicitly minimizes SMAPE
tools
Convert point forecasts to prediction intervals by scaling with logit-transformed quantile ratios passed through a Normal CDF