skills/cv/class-balanced-epoch-undersampling/SKILL.md
Dynamically undersample the majority class each epoch, using per-class keep probabilities, to achieve stochastic class balance.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):

npx skillsauth add wenmin-wu/ds-skills cv-class-balanced-epoch-undersampling
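Instead of fixed undersampling, which permanently discards the same samples, resample at the start of each epoch using per-class keep probabilities. Each sample of class c is kept with probability p_c and dropped with probability 1 - p_c. Because the majority (negative) class gets the lower p_c, the model sees a different random subset of negatives every epoch while the class ratio stays approximately balanced. Note that the default keep_probs of {0: 0.35, 1: 0.5} also drops about half of the positives each epoch. Drawing a fresh subset every epoch preserves data diversity across epochs while still controlling imbalance.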
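To make "approximate class balance" concrete: assume each of the $n_c$ samples of class $c$ is kept independently with probability $p_c$, which is what the per-sample random draw in the code does. The number of class-$c$ samples kept in one epoch, $K_c$, is then binomial:

$$
K_c \sim \operatorname{Binomial}(n_c, p_c), \qquad \mathbb{E}[K_c] = p_c\, n_c, \qquad \frac{\mathbb{E}[K_1]}{\mathbb{E}[K_0]} = \frac{p_1\, n_1}{p_0\, n_0}.
$$

With the default $p_0 = 0.35$ and $p_1 = 0.5$, the expected positive-to-negative ratio is $\frac{0.5}{0.35}\cdot\frac{n_1}{n_0} = \frac{10}{7}\cdot\frac{n_1}{n_0}$, so an exactly balanced expectation requires $p_1 n_1 = p_0 n_0$, i.e. $p_c \propto 1/n_c$. The chance that a given negative is skipped in all of $E$ epochs is $(1 - p_0)^E$, for example $0.65^{10} \approx 0.013$.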
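```python
import numpy as np
import pandas as pd

class BalancedGenerator:
    # Same interface as tf.keras.utils.Sequence: __len__, __getitem__, on_epoch_end.
    def __init__(self, ids, labels, batch_size, keep_probs=None):
        self.ids = np.asarray(ids)
        self.labels = pd.Series(np.asarray(labels))
        self.batch_size = batch_size
        # Per-class probability of KEEPING a sample in a given epoch.
        self.keep_probs = keep_probs or {0: 0.35, 1: 0.5}
        self.on_epoch_end()  # draw the first subset so __len__ works before epoch 1

    def on_epoch_end(self):
        # Map each label to its keep probability, then keep sample i with prob keep_prob[i].
        keep_prob = self.labels.map(self.keep_probs).to_numpy()
        keep = np.random.rand(len(keep_prob)) < keep_prob
        self.indices = np.where(keep)[0]  # positions of the kept samples
        np.random.shuffle(self.indices)

    def __len__(self):
        return len(self.indices) // self.batch_size  # last partial batch is dropped

    def __getitem__(self, idx):
        batch_pos = self.indices[idx * self.batch_size:(idx + 1) * self.batch_size]
        return self._load_batch(self.ids[batch_pos])  # pass ids, not positions

    def _load_batch(self, batch_ids):
        # Project-specific: load inputs and targets for batch_ids.
        raise NotImplementedError
```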
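One way to choose keep_probs for an exactly balanced expectation, sketched here as an assumption rather than the skill's own rule, is to set p_c = n_min / n_c so that every class contributes roughly n_min samples per epoch. The helpers balanced_keep_probs and sample_epoch are hypothetical. sample_epoch repeats the per-sample Bernoulli draw from on_epoch_end, but uses NumPy's np.random.Generator so the epochs are reproducible.

```python
import numpy as np


def balanced_keep_probs(labels, max_prob=1.0):
    # Hypothetical helper (not part of the original skill): p_c = max_prob * n_min / n_c,
    # so every class contributes about max_prob * n_min samples per epoch in expectation.
    classes, counts = np.unique(labels, return_counts=True)
    n_min = counts.min()
    return {int(c): float(max_prob * n_min / n) for c, n in zip(classes, counts)}


def sample_epoch(labels, keep_probs, rng):
    # Same per-sample Bernoulli draw as BalancedGenerator.on_epoch_end, but with an
    # explicit np.random.Generator so epochs are reproducible.
    labels = np.asarray(labels)
    keep_prob = np.array([keep_probs[int(label)] for label in labels])
    keep = rng.random(len(labels)) < keep_prob
    idx = np.flatnonzero(keep)
    rng.shuffle(idx)
    return idx


rng = np.random.default_rng(0)
labels = np.array([0] * 9000 + [1] * 1000)   # 9:1 imbalance
probs = balanced_keep_probs(labels)            # class 0 -> 1/9, class 1 -> 1.0
for epoch in range(3):
    idx = sample_epoch(labels, probs, rng)     # a fresh subset each epoch
    print(epoch, np.bincount(labels[idx], minlength=2))
```

The resulting dict can be passed as keep_probs to BalancedGenerator. Here class 1 has p = 1.0, so all 1,000 positives are kept every epoch. Roughly 1,000 of the 9,000 negatives are drawn each epoch, and the subset changes every time.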