skills/tabular/pseudo-labeling/SKILL.md
Augments training data with high-confidence test predictions as pseudo labels, retrains the model, and keeps the result only if OOF AUC improves. A semi-supervised technique for tabular competitions.
`npx skillsauth add wenmin-wu/ds-skills tabular-pseudo-labeling`
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Pseudo labeling leverages unlabeled test data by treating high-confidence model predictions as ground truth. After a base model is trained with KFold CV, test samples where the model is very confident (near 0 or 1) are added to training with their predicted labels. The model is retrained and the result is kept only if OOF AUC improves.
```python
import xgboost as xgb
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score
import numpy as np

# train_kfold is a helper you supply (not defined here). It is expected to
# return (oof_preds, test_preds) as positive-class probabilities.

# 1. Train base model with KFold, collect OOF and test predictions
oof_preds, test_preds = train_kfold(X_train, y_train, X_test, n_splits=5)
base_auc = roc_auc_score(y_train, oof_preds)

# 2. Select high-confidence test samples
THRESHOLD = 0.995
mask_pos = test_preds > THRESHOLD
mask_neg = test_preds < (1 - THRESHOLD)
pseudo_mask = mask_pos | mask_neg
X_pseudo = X_test[pseudo_mask]
y_pseudo = (test_preds[pseudo_mask] > 0.5).astype(int)

# 3. Augment training set (original rows first, pseudo-labeled rows after)
X_aug = np.concatenate([X_train, X_pseudo])
y_aug = np.concatenate([y_train, y_pseudo])

# 4. Retrain and compare, scoring only the original training rows
oof_preds_aug, _ = train_kfold(X_aug, y_aug, X_test, n_splits=5)
new_auc = roc_auc_score(y_aug[:len(y_train)], oof_preds_aug[:len(y_train)])

# 5. Keep only if improved
if new_auc > base_auc:
    print(f"Pseudo labeling improved AUC: {base_auc:.5f} -> {new_auc:.5f}")
```
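The snippet above calls `train_kfold` without defining it. A minimal sketch of what such a helper might look like follows, assuming it returns out-of-fold and fold-averaged test probabilities for the positive class. The `make_model` and `seed` parameters are hypothetical additions, and scikit-learn's `GradientBoostingClassifier` is used only as a stand-in learner; an XGBoost, LightGBM, or CatBoost classifier with the same `fit`/`predict_proba` interface could be passed in instead.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold


def train_kfold(X, y, X_test, n_splits=5, make_model=None, seed=42):
    """Return (oof_preds, test_preds) as positive-class probabilities."""
    X, y, X_test = np.asarray(X), np.asarray(y), np.asarray(X_test)
    if make_model is None:
        # Stand-in learner (assumption); swap in a boosting library of choice.
        make_model = lambda: GradientBoostingClassifier(random_state=seed)

    oof_preds = np.zeros(len(X))
    test_preds = np.zeros(len(X_test))
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)

    for train_idx, valid_idx in skf.split(X, y):
        model = make_model()
        model.fit(X[train_idx], y[train_idx])
        # Each training row is scored by the one fold model that never saw it.
        oof_preds[valid_idx] = model.predict_proba(X[valid_idx])[:, 1]
        # Test predictions are averaged across the fold models.
        test_preds += model.predict_proba(X_test)[:, 1] / n_splits

    return oof_preds, test_preds
```

Because the predictions are written back by row index, the first `len(y_train)` entries of the augmented OOF array still line up with the original training rows, which is what step 4 relies on.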
| Decision | Guidance |
|---|---|
| Confidence threshold | Start high (0.995). Lower thresholds add more data but risk label noise. |
| How many rounds | One round is usually enough. Iterating risks confirmation bias. |
| Validation | Always compare AUC on the original train labels, not on pseudo labels. |
| Class balance | Check that pseudo labels don't skew class ratio excessively. |
| Model choice | Works best with gradient boosting (XGBoost, LightGBM, CatBoost). |
| When to skip | If base model AUC is low (<0.80), pseudo labels are too noisy to help. |
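The threshold and class-balance guidance can be checked before retraining. The sketch below is one possible way to do that: the hypothetical helper `pseudo_label_report` (not part of the skill) counts how many test rows each candidate threshold would select and compares the positive rate among the pseudo labels with the positive rate in the training labels.

```python
import numpy as np


def pseudo_label_report(test_preds, y_train, thresholds=(0.999, 0.995, 0.99, 0.95)):
    """Summarize pseudo-label volume and class ratio for each threshold."""
    test_preds = np.asarray(test_preds, dtype=float)
    train_pos_rate = float(np.mean(y_train))
    rows = []
    for t in thresholds:
        n_pos = int((test_preds > t).sum())        # confident positives
        n_neg = int((test_preds < (1 - t)).sum())  # confident negatives
        n_selected = n_pos + n_neg
        # Positive share among selected rows; NaN when nothing is selected.
        pos_rate = n_pos / n_selected if n_selected else float("nan")
        rows.append({
            "threshold": t,
            "n_selected": n_selected,
            "pseudo_pos_rate": pos_rate,
            "train_pos_rate": train_pos_rate,
        })
    return rows


if __name__ == "__main__":
    preds = np.array([0.9999, 0.997, 0.97, 0.5, 0.03, 0.002, 0.0001, 0.00001])
    for row in pseudo_label_report(preds, y_train=[0, 0, 0, 1], thresholds=(0.995, 0.95)):
        print(row)
```

A large gap between `pseudo_pos_rate` and `train_pos_rate` suggests the pseudo labels would shift the class ratio, in which case a stricter threshold or subsampling the over-represented class may be worth trying.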