skills/cv/multilabel-auc-evaluation/SKILL.md
Computes per-class ROC-AUC then macro-averages for multi-label classification evaluation and model selection.
npx skillsauth add wenmin-wu/ds-skills cv-multilabel-auc-evaluation
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
For multi-label classification (multiple binary labels per sample), compute ROC-AUC independently for each label, then take the unweighted mean. This macro-averaged AUC weights rare and common labels equally, so weak performance on a minority label lowers the score as much as weak performance on a frequent one. Use it as the primary metric for model selection and early stopping.
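import numpy as np
from sklearn.metrics import roc_auc_score

def multilabel_auc(y_true, y_pred, label_names=None):
    """Compute per-label and macro-averaged AUC.

    Args:
        y_true: array (n_samples, n_labels), binary ground truth
        y_pred: array (n_samples, n_labels), predicted probabilities
        label_names: optional list of label names for reporting

    Returns:
        (macro_auc, per_label): mean AUC over the scored labels, and a
        dict mapping label name to AUC. Skipped labels are absent.
    """
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    per_label = {}
    for i in range(y_true.shape[1]):
        # Skip labels with a single class in this split (AUC is undefined)
        if len(np.unique(y_true[:, i])) < 2:
            continue
        # "is not None" also works when label_names is an array or pandas Index
        name = label_names[i] if label_names is not None else f'label_{i}'
        per_label[name] = roc_auc_score(y_true[:, i], y_pred[:, i])
    # No label was scorable: return NaN rather than averaging an empty list
    macro_auc = float(np.mean(list(per_label.values()))) if per_label else float('nan')
    return macro_auc, per_label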
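To make the "rare and common labels count equally" point concrete, here is a minimal sketch on made-up toy data (the arrays are illustrative, not from any real dataset). It repeats the per-label loop inline, compares the result with scikit-learn's built-in `average="macro"` option, and contrasts it with micro-averaging, which pools every (sample, label) pair and so lets the common label dominate.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Toy data: label 0 is common (4 positives), label 1 is rare (1 positive).
y_true = np.array([[1, 0], [1, 0], [1, 0], [0, 0], [1, 0], [0, 1]])
y_pred = np.array([
    [0.9, 0.1], [0.8, 0.3], [0.7, 0.6],
    [0.2, 0.2], [0.6, 0.7], [0.4, 0.5],
])

# Per-label AUC, one binary problem per column.
per_label = [roc_auc_score(y_true[:, i], y_pred[:, i]) for i in range(y_true.shape[1])]
macro = float(np.mean(per_label))  # label 0 scores 1.0, label 1 scores 0.6

# scikit-learn computes the same unweighted mean for multilabel input,
# provided every column contains both classes.
sk_macro = roc_auc_score(y_true, y_pred, average="macro")

# Micro-averaging flattens all entries into one binary problem.
micro = roc_auc_score(y_true, y_pred, average="micro")

print(f"per-label: {per_label}")
print(f"macro: {macro:.4f}  sklearn macro: {sk_macro:.4f}  micro: {micro:.4f}")
```

The explicit loop remains useful because it can skip single-class labels in a small validation split and report each label's AUC separately.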

# Usage in validation loop
# detach() drops the autograd graph so the tensor can be converted to NumPy
y_preds = model(images).sigmoid().detach().cpu().numpy()
macro_auc, per_label = multilabel_auc(y_true, y_preds, target_cols)
print(f"Macro AUC: {macro_auc:.4f}")
for name, auc in per_label.items():
    print(f"  {name}: {auc:.4f}")
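For the model-selection and early-stopping use mentioned above, one possible pattern is to track the best macro AUC across epochs and stop after a fixed number of epochs without improvement. The sketch below is only an illustration: `EarlyStopper`, `patience`, and `min_delta` are hypothetical names, and the `history` values are invented stand-ins for the macro AUC returned by each validation pass.

```python
class EarlyStopper:
    """Hypothetical helper: stop when macro AUC stops improving."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # minimum gain that counts as improvement
        self.best = float("-inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch, macro_auc):
        """Record one epoch; return True when training should stop."""
        # A NaN score fails this comparison and counts as a non-improving epoch.
        if macro_auc > self.best + self.min_delta:
            self.best = macro_auc
            self.best_epoch = epoch
            self.bad_epochs = 0       # a checkpoint would be saved here
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Invented per-epoch validation scores, for illustration only.
history = [0.70, 0.75, 0.74, 0.73, 0.76]
stopper = EarlyStopper(patience=2)
stopped_at = None
for epoch, auc in enumerate(history):
    if stopper.step(epoch, auc):
        stopped_at = epoch
        break

print(f"best macro AUC {stopper.best:.2f} at epoch {stopper.best_epoch}, "
      f"stopped at epoch {stopped_at}")
```

With `patience=2` the loop stops before the later, higher score is reached, which shows the trade-off: a larger patience tolerates noisy validation AUC at the cost of more training time.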