skills/cv/combinatorial-deletion-point-matching/SKILL.md
Match two unequal-length point sets by enumerating which points to drop from the longer set and keeping the deletion combination with the smallest distance.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add wenmin-wu/ds-skills cv-combinatorial-deletion-point-matching
When detector output and ground-truth point sets differ in length (missing detections, extra false positives), standard Hungarian matching forces a 1-to-1 assignment and silently mis-pairs points. A cleaner approach when the length difference is small: enumerate every way to delete k = |A| - |B| points from the longer set A, compute the distance between each remaining subset and the shorter set B, and pick the minimum. When C(N, k) is too large to enumerate (N = |A|), evaluate a random sample capped at a fixed max_iter budget (e.g. 2000). This handles false positives and missing detections in a single pass without tuning Hungarian cost matrices.
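To see when exhaustive enumeration is affordable, the candidate count C(N, k) can be checked up front with `math.comb`. This is a minimal sketch: the helper name `n_candidates` and the example sizes are illustrative assumptions, not part of the original skill.

```python
import math


def n_candidates(n_long, n_short):
    """Number of deletion subsets: C(N, k) with k = n_long - n_short."""
    k = n_long - n_short
    if k < 0:
        raise ValueError("n_long must be >= n_short")
    return math.comb(n_long, k)


# Illustrative sizes only: a 30-point set matched against shorter sets.
for n_short in (29, 28, 27, 25):
    print(30 - n_short, n_candidates(30, n_short))
```

For 30 points, k = 3 already gives 4060 candidates, which exceeds a budget of 2000, so sampling would apply from that point on.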
import itertools
import random

import numpy as np


def norm_arr(a):
    # Min-max scale to [0, 1]; the epsilon avoids division by zero.
    a = a - a.min()
    return a / (a.max() + 1e-9)


def match_unequal(a1, a2, max_iter=2000):
    """a1 is the longer set. Returns (distance, deletion_indices)."""
    a1 = np.asarray(a1, dtype=float)
    len_diff = len(a1) - len(a2)
    if len_diff < 0:
        raise ValueError("len(a1) must be >= len(a2)")
    a2n = norm_arr(np.asarray(a2, dtype=float))
    if len_diff == 0:
        return np.linalg.norm(norm_arr(a1) - a2n), ()
    # Every way to delete len_diff points from a1 (materialized in full).
    del_list = list(itertools.combinations(range(len(a1)), len_diff))
    if len(del_list) > max_iter:
        del_list = random.sample(del_list, max_iter)
    best = (float('inf'), None)
    for idx in del_list:
        # Drop the candidate points, then compare position by position.
        kept = norm_arr(np.delete(a1, list(idx), axis=0))
        d = np.linalg.norm(kept - a2n)
        if d < best[0]:
            best = (d, idx)
    return best
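The function above builds the full list of combinations before sampling from it, so memory still grows with C(N, k). One possible alternative, sketched here under the assumption that distinct uniformly drawn subsets are an acceptable substitute, is to draw random index subsets directly. The helper name `sample_deletions` and its `seed` parameter are hypothetical and not part of the original skill.

```python
import itertools
import math
import random


def sample_deletions(n, k, max_iter=2000, seed=None):
    """Return up to max_iter distinct k-subsets of range(n) as sorted tuples.

    Enumerates exhaustively when C(n, k) <= max_iter; otherwise draws random
    subsets without building the full list of combinations.
    """
    rng = random.Random(seed)
    if math.comb(n, k) <= max_iter:
        return list(itertools.combinations(range(n), k))
    seen = set()
    # C(n, k) > max_iter here, so enough distinct subsets exist to fill the set.
    while len(seen) < max_iter:
        seen.add(tuple(sorted(rng.sample(range(n), k))))
    return list(seen)
```

The returned tuples could stand in for `del_list` in the matching loop. Rejecting duplicates keeps every evaluated candidate distinct, at the cost of a few extra draws when C(n, k) is only slightly above the budget.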
- len_diff = |longer| - |shorter|: this is the number of deletions per candidate.
- Enumerate the C(N, len_diff) deletion subsets; cap at max_iter via random sampling.
- When C(N, k) > max_iter, uniform sampling preserves the minimum well enough in practice for k ≤ 3.
- Large len_diff: if the detector drops more than 5 points, pre-filter by confidence before matching.
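The confidence pre-filter mentioned above could look roughly like the following. This is a sketch under two assumptions not stated in the original: the longer set is the detector output, and each detection carries a confidence score. The function name `prefilter_by_confidence` and the `max_diff` parameter are hypothetical.

```python
import numpy as np


def prefilter_by_confidence(points, scores, n_target, max_diff=3):
    """Keep the highest-confidence points so at most max_diff extras remain.

    The original order of the kept points is preserved, since the matcher
    compares the two sets position by position.
    """
    points = np.asarray(points, dtype=float)
    scores = np.asarray(scores, dtype=float)
    n_keep = min(len(points), n_target + max_diff)
    # Indices of the n_keep highest scores, re-sorted to restore input order.
    keep = np.sort(np.argsort(scores)[::-1][:n_keep])
    return points[keep], keep
```

After filtering, the remaining length difference is at most `max_diff`, which keeps the number of deletion subsets small enough to enumerate or sample.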