skills/cv/soft-threshold-count-aggregation/SKILL.md
Aggregate patch-level count-regression predictions into image-level totals by multiplying each prediction by a boolean mask (x * (x > tau)) instead of rounding, retaining fractional evidence above threshold while killing background noise
npx skillsauth add wenmin-wu/ds-skills cv-soft-threshold-count-aggregation
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Patch count-regression produces noisy fractional outputs everywhere — even empty patches return small positive values that sum to spurious counts over thousands of tiles. Hard rounding throws away real signal on patches with 0.4 objects. The right aggregation is soft thresholding: preds * (preds > tau) zeros out any patch below tau with a boolean mask but keeps the full fractional value for surviving patches, then sums. This is not the same as preds.clip(tau) or hard rounding — it's a masked sum that preserves the confidence signal.
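import numpy as np

# model and test_patches come from the surrounding pipeline
preds = model.predict(test_patches)  # (n_patches, n_classes) float
tau = 0.30  # tuned on validation
# mask out sub-threshold patches, sum per class, then round to the nearest integer
# (a bare .astype(int) would truncate, e.g. 2.9 -> 2)
counts = np.rint(np.sum(preds * (preds > tau), axis=0)).astype(int)
# equivalent, explicit:
# counts = np.where(preds > tau, preds, 0).sum(axis=0).round().astype(int)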
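To make the difference between the three operators concrete, here is a small sketch on made-up numbers. The patch values and the helper name `soft_threshold_sum` are illustrative assumptions, not part of the original skill.

```python
import numpy as np

def soft_threshold_sum(preds, tau):
    """Zero out predictions at or below tau, keep the rest unchanged, then sum."""
    return np.sum(preds * (preds > tau), axis=0)

# Hypothetical predictions for one class over six patches (made-up values):
# three near-empty patches and three with real fractional evidence.
preds = np.array([0.05, 0.08, 0.45, 0.40, 0.02, 0.90])
tau = 0.30

soft = soft_threshold_sum(preds, tau)   # 0.45 + 0.40 + 0.90
hard = np.round(preds).sum()            # 0.45 and 0.40 both round down to 0
clipped = preds.clip(tau).sum()         # every small value is raised to tau

print(f"soft={soft:.2f} hard={hard:.2f} clipped={clipped:.2f}")
```

On this toy input the masked sum keeps the 0.45 and 0.40 patches that rounding drops, while clipping adds 0.30 for each near-empty patch.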
1. Sweep tau over [0.1, 0.2, 0.3, 0.4, 0.5] on a held-out validation set
2. Pick the tau (per-class if needed) minimizing validation RMSE against ground-truth image counts
3. Multiply predictions by (preds > tau) and sum across patches

- np.round(preds).sum() discards 0.4-per-patch signal that legitimately sums to objects; soft mask keeps it.
- Per-class tau: rare classes want lower thresholds to not zero out their weak-but-real signal; dense classes want higher to suppress noise.
- preds.clip(tau) would floor everything at tau, inflating counts. The mask-and-keep is the right operator.
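The threshold sweep described above could be sketched roughly as follows. The function names `soft_count` and `tune_tau_per_class`, the data layout (one prediction array per validation image), and the toy values are all assumptions made for illustration; ties between grid values are broken in favor of the smallest tau, which is an arbitrary choice.

```python
import numpy as np

def soft_count(preds, tau):
    """Masked sum over patches. preds: (n_patches, n_classes); tau: scalar or (n_classes,)."""
    return np.sum(preds * (preds > tau), axis=0)

def tune_tau_per_class(val_preds, val_counts, grid=(0.1, 0.2, 0.3, 0.4, 0.5)):
    """Pick, for each class, the grid value with the lowest validation RMSE.

    val_preds:  list of (n_patches_i, n_classes) arrays, one per validation image.
    val_counts: (n_images, n_classes) ground-truth image-level counts.
    """
    val_counts = np.asarray(val_counts, dtype=float)
    grid = np.asarray(grid, dtype=float)
    rmse = np.empty((len(grid), val_counts.shape[1]))
    for g, tau in enumerate(grid):
        est = np.stack([soft_count(p, tau) for p in val_preds])  # (n_images, n_classes)
        rmse[g] = np.sqrt(np.mean((est - val_counts) ** 2, axis=0))
    return grid[np.argmin(rmse, axis=0)]  # argmin returns the first (smallest) tau on ties

# Hypothetical validation image: column 0 is a dense class with 0.15 background
# noise, column 1 is a rare class whose real signal is only 0.25 per patch.
img = np.array([
    [0.15, 0.25],
    [0.15, 0.25],
    [0.50, 0.25],
    [0.50, 0.25],
    [0.15, 0.05],
    [0.15, 0.05],
])
taus = tune_tau_per_class([img], [[1, 1]])
counts = np.rint(soft_count(img, taus)).astype(int)  # per-class tau broadcasts over columns
```

In this toy case a threshold of 0.3 or higher would erase the rare class entirely, which is the failure mode that per-class tuning is meant to avoid.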