skills/cv/spatial-rect-train-val-split/SKILL.md
Split large-image segmentation data into train/val by spatial rectangle regions with border buffer exclusion to prevent patch leakage
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
npx skillsauth add wenmin-wu/ds-skills cv-spatial-rect-train-val-split
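For large-image segmentation (satellite, medical, scroll fragments), random pixel splits leak spatial context between train and val, because neighboring pixels share most of their surrounding patch. Instead, designate a rectangular region as validation and use everything else for training. Exclude a buffer zone around the rectangle boundary so that train and val patches do not overlap. Note that `spatial_split` below applies `buffer` only at the outer image border, which keeps patches inside the image; on its own it leaves no gap around the rectangle.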
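import numpy as np

def spatial_split(mask, val_rect, buffer=32):
    """Split pixels into train/val by spatial region.

    mask: 2D array, nonzero where the pixel is usable.
    val_rect: (x, y, width, height)
    Returns (train_pixels, val_pixels) as arrays of (row, col) coordinates.
    """
    x, y, w, h = val_rect
    # Keep only usable pixels at least `buffer` away from the image edge.
    valid = np.zeros_like(mask, dtype=bool)
    valid[buffer:mask.shape[0]-buffer, buffer:mask.shape[1]-buffer] = True
    valid &= mask.astype(bool)
    # Mark the validation rectangle (rows are y, columns are x).
    val_region = np.zeros_like(mask, dtype=bool)
    val_region[y:y+h, x:x+w] = True
    val_pixels = np.argwhere(valid & val_region)
    train_pixels = np.argwhere(valid & ~val_region)
    return train_pixels, val_pixels

# label_mask is your own 2D mask of usable pixels.
train_px, val_px = spatial_split(label_mask, val_rect=(1100, 3500, 700, 950))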
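
The function above trims only the image border. If patches are cut around each returned pixel, a train patch centered just outside the rectangle can still overlap val patches. The following is a minimal sketch of one way to close that gap, assuming square patches that extend `buffer` pixels on each side of their center pixel: grow the rectangle by `2 * buffer` per side and drop train pixels inside the grown area. The name `spatial_split_with_gap` and the synthetic mask are hypothetical and not part of the original skill.

```python
import numpy as np


def spatial_split_with_gap(mask, val_rect, buffer=32):
    """Variant of spatial_split that also drops train pixels near val_rect.

    mask: 2D array, nonzero where the pixel is usable.
    val_rect: (x, y, width, height) in pixel coordinates.
    buffer: assumed patch half-width (patch spans center +/- buffer).
    """
    x, y, w, h = val_rect
    rows, cols = mask.shape

    # Usable pixels far enough from the image edge for a full patch to fit.
    valid = np.zeros(mask.shape, dtype=bool)
    valid[buffer:rows - buffer, buffer:cols - buffer] = True
    valid &= mask.astype(bool)

    # Validation rectangle (rows are y, columns are x).
    val_region = np.zeros(mask.shape, dtype=bool)
    val_region[y:y + h, x:x + w] = True

    # Rectangle grown by 2 * buffer per side. A train patch centered outside
    # it shares no pixel with any patch centered inside val_rect.
    gap = 2 * buffer
    grown = np.zeros(mask.shape, dtype=bool)
    grown[max(y - gap, 0):y + h + gap, max(x - gap, 0):x + w + gap] = True

    val_pixels = np.argwhere(valid & val_region)   # (row, col) pairs
    train_pixels = np.argwhere(valid & ~grown)     # (row, col) pairs
    return train_pixels, val_pixels


# Illustrative call on a synthetic, fully labeled mask (hypothetical sizes).
demo_mask = np.ones((400, 300), dtype=np.uint8)
demo_train, demo_val = spatial_split_with_gap(
    demo_mask, val_rect=(100, 150, 80, 60), buffer=16
)
```

The trade-off is that the band of width `2 * buffer` around the rectangle is used for neither training nor validation, so a smaller patch size or a rectangle placed near an unlabeled area wastes fewer labeled pixels.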