skills/cv/coverage-stratified-split/SKILL.md
Stratify train/validation split by binned mask coverage percentage to ensure balanced foreground representation in segmentation tasks
npx skillsauth add wenmin-wu/ds-skills cv-coverage-stratified-split
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
In segmentation tasks, a naive random split can produce folds with unbalanced foreground/background ratios: some folds get mostly empty masks, others mostly full ones. To avoid this, compute each image's mask coverage (the fraction of foreground pixels), bin it into discrete classes, and stratify the split on those bins. Each fold then sees the full range of mask densities.
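import numpy as np
from sklearn.model_selection import train_test_split

# masks is assumed to be a binary array of shape (N, H, W);
# coverage is the fraction of foreground pixels in each mask
coverage = masks.sum(axis=(1, 2)) / (masks.shape[1] * masks.shape[2])

def coverage_to_class(val):
    # Map coverage in [0, 1] to a class in 0..10:
    # 0 for empty masks, then one class per 10% band of coverage
    for i in range(0, 11):
        if val * 10 <= i:
            return i
    return 10

coverage_classes = np.array([coverage_to_class(c) for c in coverage])

# Stratify on the coverage class so that train and validation
# share the same distribution of mask densities
X_train, X_val, y_train, y_val = train_test_split(
    images, masks, test_size=0.2,
    stratify=coverage_classes, random_state=42
)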
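The per-image binning loop above can also be written as a single vectorized expression. The following is a minimal sketch, assuming binary masks stored as an array of shape (N, H, W); the helper name `coverage_classes_from_masks` is hypothetical and not part of the original skill. It swaps the floating-point comparison for integer ceiling division, so it is a variant of the binning rather than the original function, though it is intended to produce the same classes: 0 for an empty mask, then one class per 10% band of coverage.

```python
import numpy as np

def coverage_classes_from_masks(masks):
    """Return one integer class in 0..10 per mask: ceil(10 * foreground fraction)."""
    masks = np.asarray(masks)
    pixels_per_mask = masks.shape[1] * masks.shape[2]
    # Count foreground pixels per mask as int64 to avoid small-dtype overflow
    foreground = masks.reshape(len(masks), -1).astype(np.int64).sum(axis=1)
    # Integer ceiling division: ceil(10 * foreground / pixels_per_mask)
    return (10 * foreground + pixels_per_mask - 1) // pixels_per_mask

# Tiny synthetic example: four 10x10 masks with 0, 1, 10, and 100 foreground pixels
demo_masks = np.zeros((4, 10, 10), dtype=np.uint8)
demo_masks[1, 0, 0] = 1
demo_masks[2, 0, :] = 1
demo_masks[3, :, :] = 1
demo_classes = coverage_classes_from_masks(demo_masks)
print(demo_classes.tolist())
```

Working from integer pixel counts sidesteps rounding surprises at bin edges, which may matter when many masks sit exactly on a 10% boundary.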
Pass the coverage classes to the stratify parameter in train_test_split, or to StratifiedKFold.
Use StratifiedKFold(n_splits=5).split(X, coverage_classes) for cross-validation.
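For cross-validation, the same coverage classes can drive `StratifiedKFold`. The sketch below is illustrative only: the `images` and `masks` arrays are synthetic stand-ins (an assumption, built so that each coverage class has exactly 10 members), and `shuffle=True` with `random_state=42` are choices made here rather than settings taken from the skill.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in data (assumption): 110 binary 10x10 masks,
# 10 masks for each coverage level 0%, 10%, ..., 100%
n_images, side = 110, 10
masks = np.zeros((n_images, side, side), dtype=np.uint8)
flat_masks = masks.reshape(n_images, -1)  # view onto masks
for k in range(n_images):
    flat_masks[k, : (k % 11) * 10] = 1
images = np.random.default_rng(0).random((n_images, side, side))

# Coverage class per mask via integer ceiling of 10 * foreground fraction
foreground = flat_masks.astype(np.int64).sum(axis=1)
coverage_classes = (10 * foreground + side * side - 1) // (side * side)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
fold_class_counts = []
for fold, (train_idx, val_idx) in enumerate(skf.split(images, coverage_classes)):
    X_train, X_val = images[train_idx], images[val_idx]
    y_train, y_val = masks[train_idx], masks[val_idx]
    # Count validation images per coverage class to check the balance
    counts = np.bincount(coverage_classes[val_idx], minlength=11)
    fold_class_counts.append(counts)
    print(f"fold {fold}: validation images per coverage class = {counts.tolist()}")
```

Note that `StratifiedKFold` warns when the least populated class has fewer members than `n_splits`, so sparse coverage bins on real data may need to be merged before splitting.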