skills/tabular/majority-vote-submission-blend/SKILL.md
Blend multiple submission CSVs by row-wise majority voting on discrete predictions to produce a more robust final output
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add wenmin-wu/ds-skills tabular-majority-vote-submission-blend
When you have multiple independently trained pipelines producing discrete class predictions, majority voting is among the simplest and most robust blending strategies. Unlike rank averaging or weighted blending (which require continuous scores), majority voting works directly on integer labels. It is especially effective for ordinal classification, where averaging can land between classes.
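To make the contrast with averaging concrete, here is a minimal sketch on made-up data. The three `pred_*` columns and their values are hypothetical and stand in for ordinal labels 0 to 3 from three models.

```python
import pandas as pd

# Hypothetical predictions from three models for four rows (ordinal labels 0-3).
preds = pd.DataFrame({
    "pred_0": [0, 1, 3, 2],
    "pred_1": [0, 2, 3, 2],
    "pred_2": [3, 2, 0, 1],
})

# Averaging labels can yield values that are not valid classes,
# or a class that no model actually predicted.
averaged = preds.mean(axis=1)

# The row-wise mode always returns a label that at least one model predicted.
voted = preds.mode(axis=1)[0].astype(int)

print(pd.DataFrame({"averaged": averaged, "voted": voted}))
```

In the second row the mean falls between classes 1 and 2, while the vote returns class 2, which two of the three models chose.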
import pandas as pd

def majority_vote_blend(submission_paths, id_col='id', target_col='target'):
    # Load each submission and sort by id so rows line up across files.
    subs = [pd.read_csv(p).sort_values(id_col).reset_index(drop=True)
            for p in submission_paths]
    combined = pd.DataFrame({id_col: subs[0][id_col]})
    for i, s in enumerate(subs):
        combined[f'pred_{i}'] = s[target_col].values
    pred_cols = [c for c in combined.columns if c.startswith('pred_')]
    # Row-wise mode; column 0 holds the smallest of any tied classes.
    combined[target_col] = (combined[pred_cols]
                            .mode(axis=1)[0].astype(int))
    return combined[[id_col, target_col]]

final = majority_vote_blend([
    'sub_lgbm.csv', 'sub_xgb.csv', 'sub_catboost.csv'
])
final.to_csv('submission.csv', index=False)
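The blend relies on every submission containing exactly the same ids, since rows are matched by position after sorting. One possible safeguard, sketched below, joins the frames on the id column and raises if anything is missing or duplicated. The helper name `align_submissions` is hypothetical and not part of the skill; it takes already-loaded DataFrames rather than paths.

```python
import pandas as pd

def align_submissions(frames, id_col="id", target_col="target"):
    """Join prediction frames on id_col, raising on duplicate or mismatched ids."""
    n_rows = len(frames[0])
    merged = None
    for i, frame in enumerate(frames):
        if frame[id_col].duplicated().any():
            raise ValueError(f"submission {i} has duplicate ids")
        part = frame[[id_col, target_col]].rename(
            columns={target_col: f"pred_{i}"})
        merged = part if merged is None else merged.merge(
            part, on=id_col, how="inner")
        # With unique ids, equal lengths before and after the inner join
        # mean the id sets are identical.
        if len(frame) != n_rows or len(merged) != n_rows:
            raise ValueError(f"submission {i} does not cover the same ids")
    return merged

# Small in-memory example with hypothetical data; row order differs on purpose.
a = pd.DataFrame({"id": [1, 2, 3], "target": [0, 1, 2]})
b = pd.DataFrame({"id": [3, 1, 2], "target": [2, 1, 1]})
aligned = align_submissions([a, b])
```

With real files, the frames could be loaded with `[pd.read_csv(p) for p in submission_paths]` before calling the helper, and the resulting `pred_*` columns passed to the same row-wise `mode(axis=1)` step.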
- Take the row-wise `mode()` across predictions.
- When classes tie, `mode()[0]` takes the lowest class; add a tiebreaker model to avoid this.
- Use `pd.Series([preds]).value_counts()` with weights for unequal-quality models.
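As a rough illustration of weighted voting: `Series.value_counts()` has no weights argument, so the sketch below instead accumulates a per-class score with NumPy. The function name `weighted_vote` and the weight values are assumptions made for this example. Ties still resolve to the lowest class, so choosing distinct weights doubles as a tiebreaker.

```python
import numpy as np
import pandas as pd

def weighted_vote(preds, weights):
    """Row-wise weighted vote over integer label columns.

    preds: DataFrame with one column of labels per model.
    weights: one non-negative weight per column, in column order.
    """
    classes = np.unique(preds.to_numpy())        # sorted distinct labels
    rows = np.arange(len(preds))
    scores = np.zeros((len(preds), len(classes)))
    for col, w in zip(preds.columns, weights):
        # Map each label to its position in `classes`, then add the weight.
        idx = np.searchsorted(classes, preds[col].to_numpy())
        scores[rows, idx] += w
    # argmax returns the first maximum, so exact ties go to the lowest class.
    return pd.Series(classes[scores.argmax(axis=1)], index=preds.index)

# Hypothetical predictions and weights (e.g. derived from validation scores).
preds = pd.DataFrame({
    "pred_0": [0, 1, 2],
    "pred_1": [1, 1, 0],
    "pred_2": [1, 2, 0],
})
blended = weighted_vote(preds, weights=[0.5, 0.3, 0.3])
dominant = weighted_vote(preds, weights=[0.7, 0.2, 0.2])
```

With weights `[0.5, 0.3, 0.3]` two weaker models can still outvote the strongest one; with `[0.7, 0.2, 0.2]` the first model always wins, which reduces the blend to a single model.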