skills/tabular/multiclass-to-binary-collapse/SKILL.md
Trains on a finer-grained multiclass target (subtypes), then collapses non-baseline classes into a single positive class for binary submission.
npx skillsauth add wenmin-wu/ds-skills tabular-multiclass-to-binary-collapse
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
When auxiliary labels provide finer-grained categories than the competition's binary target (e.g., disease subtypes A/B/C/D vs. healthy/sick), training on the multiclass target gives the model more signal for learning decision boundaries. At prediction time, sum the probabilities of all non-baseline classes to get the binary positive probability; because the class probabilities sum to 1, this equals 1 minus the baseline probability. This typically improves binary classification by 0.5-2% because the model learns distinct patterns per subtype instead of lumping them together.
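import numpy as np
from sklearn.model_selection import StratifiedKFold

# y_multi_train holds subtypes: ['A', 'B', 'C', 'D', 'E']
# 'A' = healthy (baseline); 'B', 'C', 'D', 'E' = different conditions
# Binary target: class_0 = 'A', class_1 = everything else

# Train on the multiclass target (model is any classifier exposing predict_proba)
model.fit(X_train, y_multi_train)

# Predict multiclass probabilities; columns follow the order of model.classes_
probs = model.predict_proba(X_test)  # shape: (n, 5)

# Look up the baseline column instead of assuming it is column 0
baseline_idx = list(model.classes_).index('A')

# Collapse: class_0 = P(A), class_1 = P(B) + P(C) + P(D) + P(E)
class_0_prob = probs[:, baseline_idx]  # baseline class
class_1_prob = np.delete(probs, baseline_idx, axis=1).sum(axis=1)  # all non-baseline

# submission is the output DataFrame, one row per test sample
submission['class_0'] = class_0_prob
submission['class_1'] = class_1_prob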
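One way to check whether the multiclass target actually helps on a given dataset is to score out-of-fold collapsed probabilities against the binary label, next to a model trained directly on the binary label. The sketch below is illustrative only and rests on assumptions that are not part of the original skill: synthetic data from make_classification, LogisticRegression as a stand-in model, log loss as the metric, and integer class 0 playing the baseline.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in data (assumption): 5 subtypes, class 0 plays the baseline.
X, y_multi = make_classification(
    n_samples=600, n_features=10, n_informative=6,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)
y_binary = (y_multi != 0).astype(int)  # 1 = any non-baseline subtype

oof_collapsed = np.zeros(len(y_multi))  # multiclass model, collapsed to binary
oof_direct = np.zeros(len(y_multi))     # plain binary model, for comparison

# Stratify on the finer-grained target so every subtype appears in each fold.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, valid_idx in skf.split(X, y_multi):
    multi_model = LogisticRegression(max_iter=1000)
    multi_model.fit(X[train_idx], y_multi[train_idx])
    probs = multi_model.predict_proba(X[valid_idx])
    baseline_col = list(multi_model.classes_).index(0)
    # Each row sums to 1, so P(positive) = 1 - P(baseline).
    oof_collapsed[valid_idx] = 1.0 - probs[:, baseline_col]

    binary_model = LogisticRegression(max_iter=1000)
    binary_model.fit(X[train_idx], y_binary[train_idx])
    positive_col = list(binary_model.classes_).index(1)
    oof_direct[valid_idx] = binary_model.predict_proba(X[valid_idx])[:, positive_col]

print("collapsed multiclass log loss:", log_loss(y_binary, oof_collapsed))
print("direct binary log loss:       ", log_loss(y_binary, oof_direct))
```

Which variant scores better depends on the data, so treat the comparison as a diagnostic to rerun with your own features, labels, and model.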
Check model.classes_ to verify which predict_proba column corresponds to the baseline class.
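The alignment check matters because scikit-learn orders predict_proba columns to match model.classes_, which holds the labels in sorted order, so the baseline is not necessarily column 0. A minimal sketch of an alignment-safe collapse follows; the helper name collapse_to_binary and the labels 'cond_b', 'cond_c', and 'healthy' are hypothetical, chosen so that the baseline sorts last.

```python
import numpy as np

def collapse_to_binary(probs, classes, baseline):
    """Return (P(baseline), P(any non-baseline class)) for each row of probs."""
    classes = np.asarray(classes)
    matches = np.flatnonzero(classes == baseline)
    if matches.size != 1:
        raise ValueError(f"baseline {baseline!r} must appear exactly once in classes")
    baseline_col = matches[0]
    class_0_prob = probs[:, baseline_col]
    class_1_prob = np.delete(probs, baseline_col, axis=1).sum(axis=1)
    return class_0_prob, class_1_prob

# Hypothetical labels: sorted order puts 'healthy' in the last column.
classes = np.array(['cond_b', 'cond_c', 'healthy'])
probs = np.array([[0.1, 0.2, 0.7],
                  [0.5, 0.3, 0.2]])

class_0_prob, class_1_prob = collapse_to_binary(probs, classes, baseline='healthy')
print(class_0_prob)  # baseline probabilities taken from the last column
print(class_1_prob)  # summed probabilities of the two conditions
```

With a fitted model, the call would look like collapse_to_binary(model.predict_proba(X_test), model.classes_, baseline='A'). Raising on a missing baseline label is a design choice here: a silent fallback to column 0 would reproduce the very misalignment the check is meant to catch.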