skills/tabular/iterative-pseudo-label-refinement/SKILL.md
Multi-round pseudo labeling with progressively confident test predictions merged into training plus OOF-based train label correction
npx skillsauth add wenmin-wu/ds-skills tabular-iterative-pseudo-label-refinement
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Standard pseudo labeling adds confident test predictions to the training set once. This technique iterates over multiple rounds: each round retrains on the expanded dataset, produces better predictions, and raises the confidence bar. It also corrects noisy train labels using out-of-fold (OOF) predictions: if a train sample is predicted with extreme confidence as the class opposite to its label, the label is flipped. Each round typically gains 1-3% AUC.
import numpy as np
import pandas as pd
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold
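
# Assumes `train` (with a binary 'target' column) and `test` are DataFrames, and that
# `oof` (len(train)) and `preds` (len(test)) already hold probabilities from an initial fit.
for itr in range(4):
    # Add confident test predictions as pseudo labels
    test['target'] = preds
    test.loc[test['target'] > 0.955, 'target'] = 1
    test.loc[test['target'] < 0.045, 'target'] = 0
    usable = test[(test['target'] == 1) | (test['target'] == 0)]
    new_train = pd.concat([train, usable]).reset_index(drop=True)

    # Correct noisy train labels using OOF. `oof` covers only the original train rows,
    # which occupy positions 0..len(train)-1 of new_train after reset_index, so index
    # by position; a boolean mask of length len(train) would not match new_train.
    new_train.loc[np.flatnonzero(oof > 0.995), 'target'] = 1
    new_train.loc[np.flatnonzero(oof < 0.005), 'target'] = 0

    # Retrain and collect new OOF + test predictions
    oof = np.zeros(len(train))
    preds = np.zeros(len(test))
    # ... model training loop ...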
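
The snippet above leaves the retraining step elided. As one possible way to fill it in, here is a minimal end-to-end sketch on synthetic data. Everything beyond the thresholds and the QDA / StratifiedKFold imports is an assumption made for illustration: the helper names `fit_round` and `iterative_pseudo_label`, the use of NumPy arrays instead of DataFrames, `reg_param=0.5`, five folds, and the `make_classification` toy data are not taken from the original skill. Pseudo-labeled rows are appended only to the fitting side of each fold, so OOF predictions are always produced for original train rows by a model that never saw them.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold


def fit_round(X, y, X_pseudo, y_pseudo, X_test, n_splits=5, seed=0):
    """Hypothetical helper: one retraining round.

    Returns OOF probabilities for the original train rows and fold-averaged
    probabilities for the test rows.
    """
    oof = np.zeros(len(X))
    preds = np.zeros(len(X_test))
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fit_idx, val_idx in skf.split(X, y):
        # Pseudo-labeled test rows join the fitting data only, never validation.
        X_fit = np.vstack([X[fit_idx], X_pseudo])
        y_fit = np.concatenate([y[fit_idx], y_pseudo])
        model = QuadraticDiscriminantAnalysis(reg_param=0.5).fit(X_fit, y_fit)
        oof[val_idx] = model.predict_proba(X[val_idx])[:, 1]
        preds += model.predict_proba(X_test)[:, 1] / n_splits
    return oof, preds


def iterative_pseudo_label(X, y, X_test, n_rounds=4,
                           pl_hi=0.955, pl_lo=0.045,
                           fix_hi=0.995, fix_lo=0.005):
    """Hypothetical driver: repeat pseudo labeling and label correction."""
    y_work = y.copy()
    # Initial fit with an empty pseudo-label set.
    oof, preds = fit_round(X, y_work, X_test[:0], y_work[:0], X_test)
    for _ in range(n_rounds):
        # Select confident test rows and harden their predictions to 0/1.
        confident = (preds > pl_hi) | (preds < pl_lo)
        X_pseudo = X_test[confident]
        y_pseudo = (preds[confident] > 0.5).astype(y.dtype)
        # Overwrite train labels that OOF predictions contradict very strongly.
        y_work[oof > fix_hi] = 1
        y_work[oof < fix_lo] = 0
        oof, preds = fit_round(X, y_work, X_pseudo, y_pseudo, X_test)
    return oof, preds, y_work


# Toy data with 5% label noise; the first 400 rows act as train, the rest as test.
X_all, y_all = make_classification(n_samples=600, n_features=6, n_informative=6,
                                   n_redundant=0, flip_y=0.05, random_state=0)
oof, preds, y_fixed = iterative_pseudo_label(X_all[:400], y_all[:400], X_all[400:])
print("train labels changed:", int((y_fixed != y_all[:400]).sum()))
```

When judging whether a round helped, it is safer to score `oof` against the original labels `y` rather than the corrected `y_work`, since the corrected labels were chosen to agree with the model.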