skills/cv/iou-weighted-assignment-metric/SKILL.md
Evaluation scorer that merges predictions with GT per frame, takes the top-IoU match per GT box, and computes weighted accuracy with an IoU threshold gate.
npx skillsauth add wenmin-wu/ds-skills cv-iou-weighted-assignment-metric

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
For multi-object ID assignment problems (NFL helmets, pedestrians, cells), the natural metric is "did the predicted ID match the GT ID, for the correct box?". Counting IoU alone misses identity errors; counting ID alone misses localization errors. The combined metric works per frame: for each GT box, find the predicted box with the highest IoU, gate by an IoU threshold (e.g. 0.35), check ID equality, then compute weighted accuracy in which high-importance rows (impact plays, critical events) get a 1000× weight. The IoU computation is vectorized over all GT-prediction pairs, which keeps scoring fast enough to run on every training step.
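Written as a formula, the description above amounts to a weighted accuracy over GT boxes. The symbols are notation introduced here for illustration, not taken from the original skill: $i$ indexes GT boxes, $y_i$ is the GT ID, $\hat{y}_i$ is the ID of the highest-IoU prediction in the same frame, $\mathrm{IoU}_i$ is the overlap of that pair, $\tau$ is the threshold (e.g. 0.35), and $w_i$ is 1000 for high-importance rows and 1 otherwise.

$$
\text{score} = \frac{\sum_i w_i \,\mathbf{1}\!\left[\hat{y}_i = y_i \ \wedge\ \mathrm{IoU}_i \ge \tau\right]}{\sum_i w_i}
$$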
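```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score


def vectorized_iou(df):
    """Add an 'iou' column for every (GT, prediction) pair in a merged frame."""
    # Intersection rectangle. The +1 terms treat coordinates as inclusive pixel indices.
    ixmin = df[['x1_sub', 'x1_gt']].max(axis=1)
    iymin = df[['y1_sub', 'y1_gt']].max(axis=1)
    ixmax = df[['x2_sub', 'x2_gt']].min(axis=1)
    iymax = df[['y2_sub', 'y2_gt']].min(axis=1)
    iw = np.maximum(ixmax - ixmin + 1, 0)  # clip to 0 when boxes do not overlap
    ih = np.maximum(iymax - iymin + 1, 0)
    inter = iw * ih
    area_sub = (df['x2_sub'] - df['x1_sub'] + 1) * (df['y2_sub'] - df['y1_sub'] + 1)
    area_gt = (df['x2_gt'] - df['x1_gt'] + 1) * (df['y2_gt'] - df['y1_gt'] + 1)
    df['iou'] = inter / (area_sub + area_gt - inter)
    return df


def score(sub, gt, iou_thr=0.35, impact_weight=1000):
    # Pair every GT box with every prediction in the same frame. The left join
    # keeps GT boxes from frames with no predictions, so they count as misses.
    # 'isImpact' is expected in gt only; shared columns get the suffixes below.
    merged = gt.merge(sub, on='video_frame', how='left', suffixes=('_gt', '_sub'))
    merged = vectorized_iou(merged)
    merged['iou'] = merged['iou'].fillna(0)  # unmatched GT rows get IoU 0
    # Keep the highest-IoU prediction for each GT box.
    top = (merged.sort_values('iou', ascending=False)
                 .groupby(['video_frame', 'label_gt']).first().reset_index())
    top['correct'] = (top['label_gt'] == top['label_sub']) & (top['iou'] >= iou_thr)
    top['weight'] = np.where(top['isImpact'], impact_weight, 1)
    # Every GT row should be correct, so y_true is all ones.
    return accuracy_score(np.ones(len(top), dtype=int), top['correct'].astype(int),
                          sample_weight=top['weight'])
```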
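To make the pixel convention concrete, here is a minimal scalar sketch of the same IoU arithmetic, including the `+1` terms. The helper name `box_iou_inclusive` and the toy boxes are hypothetical and are not part of the skill; the example only illustrates how a half-overlapping box lands relative to the 0.35 gate.

```python
def box_iou_inclusive(a, b):
    """IoU of two (x1, y1, x2, y2) boxes with inclusive pixel coordinates."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw = max(ix2 - ix1 + 1, 0)  # clipped to 0 for disjoint boxes
    ih = max(iy2 - iy1 + 1, 0)
    inter = iw * ih
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)


gt_box = (0, 0, 9, 9)          # 10 x 10 pixels
half_shift = (5, 0, 14, 9)     # same size, shifted right by half a width
disjoint = (20, 20, 29, 29)    # no overlap at all

iou_same = box_iou_inclusive(gt_box, gt_box)      # 100 / 100
iou_half = box_iou_inclusive(gt_box, half_shift)  # 50 / 150
iou_none = box_iou_inclusive(gt_box, disjoint)    # 0 / 200

print(iou_same, round(iou_half, 3), iou_none)
print(iou_half >= 0.35)  # a half-width shift falls just under the 0.35 gate
```

In this toy case a prediction shifted by half a box width scores an IoU of 1/3, so it would be rejected by the gate even if its ID were right.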
- Top-IoU row per (frame, gt_label): this is the GT's best match
- correct = (label matches) AND (iou ≥ threshold)
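The gate and the weighting can be sketched on a hand-built table of best matches. Everything in this example is made up for illustration: the labels, the IoU values, and the helper name `gated_weighted_accuracy` are assumptions, and the table is taken to be already reduced to one best-match row per GT box.

```python
import numpy as np
import pandas as pd

# Hypothetical best-match table: one row per GT box.
top = pd.DataFrame({
    "label_gt":  ["H1", "H2", "H3", "H4"],
    "label_sub": ["H1", "H2", "H9", "H4"],
    "iou":       [0.80, 0.20, 0.90, 0.60],
    "isImpact":  [False, False, False, True],
})


def gated_weighted_accuracy(top, iou_thr=0.35, impact_weight=1000):
    # A row is correct only if the ID matches AND the IoU clears the gate.
    correct = (top["label_gt"] == top["label_sub"]) & (top["iou"] >= iou_thr)
    weight = np.where(top["isImpact"], impact_weight, 1)
    return np.average(correct.to_numpy(dtype=float), weights=weight)


# H1 is correct, H2 fails the IoU gate, H3 has the wrong ID, H4 (impact) is correct.
score_hit = gated_weighted_accuracy(top)      # (1 + 1000) / 1003

# Same table, but the impact row now carries the wrong ID.
missed = top.assign(label_sub=["H1", "H2", "H9", "H7"])
score_miss = gated_weighted_accuracy(missed)  # 1 / 1003

print(round(score_hit, 4), round(score_miss, 4))
```

With a 1000× weight, a single impact row dominates the result: getting it right puts the score near 1 despite two ordinary misses, and getting it wrong puts the score near 0.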