skills/cv/quantile-threshold-prevalence-matching/SKILL.md
Sets the binary classification threshold as a prediction quantile matching the expected positive prevalence rate, avoiding manual threshold tuning.
npx skillsauth add wenmin-wu/ds-skills cv-quantile-threshold-prevalence-matching
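When the positive-class prevalence is known (e.g., a ~2% cancer rate estimated from training data or domain knowledge), the classification threshold can be set to the (1 - prevalence) quantile of the test predictions. The threshold depends only on how the predictions are ranked, so it adapts automatically to the model's calibration. With a 2% prevalence, the 98th percentile of the scores flags roughly the top 2% of cases whether the model is well calibrated, over-confident, or under-confident. No validation set is needed for threshold tuning. The approach does assume that the test set has the same prevalence as the training data. If the true positive rate shifts, the predicted positive rate stays fixed at the assumed prevalence and will not follow it.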
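The rank-based property described above can be checked on synthetic data. The sketch below makes two assumptions for illustration: uniform random scores stand in for patient-level predictions, and `scores ** 4` stands in for a miscalibrated model. Applying the same (1 - prevalence) quantile rule to both versions moves the threshold value but flags the same patients.

```python
import numpy as np

# Synthetic, hypothetical scores standing in for patient-level predictions
rng = np.random.default_rng(0)
scores = rng.random(10_000)

# Assumed monotone distortion (a stand-in for a miscalibrated model):
# the ranking is unchanged, but the score scale is very different
distorted = scores ** 4

prevalence = 0.02
masks = []
for s in (scores, distorted):
    t = np.quantile(s, 1.0 - prevalence)  # (1 - prevalence) quantile
    mask = s > t                          # flag scores above the threshold
    masks.append(mask)
    print(f"threshold={t:.4f}  positive rate={mask.mean():.3%}")

# The same patients are flagged in both cases; only the threshold value moves
print(np.array_equal(masks[0], masks[1]))
```

Any strictly increasing transform of the scores gives the same result. This is why the rule needs no calibration step.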
```python
import numpy as np
import pandas as pd

def prevalence_threshold(predictions, prevalence_rate=0.02):
    """Set threshold as the quantile matching the expected positive rate."""
    quantile = 1.0 - prevalence_rate
    threshold = np.quantile(predictions, quantile)
    return threshold

# Aggregate image-level predictions to patient level.
# `prediction_ids` and `image_predictions` are assumed to come from the
# inference step: one prediction ID and one cancer probability per image.
pred_df = pd.DataFrame({
    'prediction_id': prediction_ids,
    'cancer_prob': image_predictions,
})
patient_preds = pred_df.groupby('prediction_id')['cancer_prob'].mean()

# Set the threshold so that ~2% of patients are predicted positive
threshold = prevalence_threshold(patient_preds.values, prevalence_rate=0.02)
binary_preds = (patient_preds > threshold).astype(int)

print(f"Threshold: {threshold:.4f}")
print(f"Positive rate: {binary_preds.mean():.3%}")
```
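The quantile rule has one caveat. When many patients share the same score, `patient_preds > threshold` can flag noticeably fewer cases than the target fraction. Quantized or clipped probabilities are one way this can happen. One possible workaround is a rank-based top-k selection, which is not part of the original skill. The sketch below assumes that exactly `round(prevalence_rate * n)` positives are wanted and that ties may be broken arbitrarily by position. The helper `top_fraction_mask` is hypothetical.

```python
import numpy as np

def top_fraction_mask(scores, prevalence_rate=0.02):
    """Hypothetical helper: flag exactly round(prevalence_rate * n) highest scores.

    Ties at the cut-off are broken by position (stable sort), which is an
    assumption; the quantile rule would instead flag all or none of them.
    """
    scores = np.asarray(scores, dtype=float)
    k = int(round(prevalence_rate * scores.size))
    mask = np.zeros(scores.size, dtype=bool)
    if k > 0:
        # With a stable ascending sort, the last k indices hold the largest scores
        top_idx = np.argsort(scores, kind="stable")[-k:]
        mask[top_idx] = True
    return mask

# Heavily tied synthetic scores: 900 patients at 0.1, 100 patients at 0.9
scores = np.array([0.1] * 900 + [0.9] * 100)
threshold = np.quantile(scores, 0.98)          # 0.9, a tied value
print((scores > threshold).mean())             # 0.0: strict > flags nobody
print(top_fraction_mask(scores, 0.02).mean())  # 0.02
```

With continuous, untied scores, both approaches flag essentially the same patients. The top-k version only matters when ties sit at the cut-off.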