skills/cv/dot-annotation-blob-diff-extraction/SKILL.md
Recover (x, y, class) point labels from color-coded dot-annotation image pairs via absdiff + blackout masking + Laplacian-of-Gaussian blob detection + center-pixel RGB classification
npx skillsauth add wenmin-wu/ds-skills cv-dot-annotation-blob-diff-extraction
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Many wildlife, cell, and object-counting datasets ship annotations as a second copy of each image with a colored dot painted over every instance, so you have to recover the point list yourself. The canonical recipe has four steps: (1) cv2.absdiff(dotted, raw) to isolate the dots; (2) mask out any blacked-out exclusion regions present in either image; (3) run skimage.feature.blob_log with sigma matched to the known dot radius; (4) classify each blob by reading the center pixel color from the dotted image. Read the color from the dotted image rather than the diff: each diff pixel is the absolute difference between the dot color and the underlying raw pixel, so it no longer shows the painted color. Used in top kernels for the NOAA Steller Sea Lion Population Count.
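import cv2
import skimage.feature

img_raw = cv2.imread(raw_path)     # BGR, uint8
img_dot = cv2.imread(dotted_path)  # same scene with colored dots painted on
if img_raw is None or img_dot is None:
    raise FileNotFoundError("could not read raw_path or dotted_path")

# 1. Isolate the dots: only pixels that differ between the two copies survive
diff = cv2.absdiff(img_dot, img_raw)

# 2. Zero out blacked-out regions (annotator exclusions) present in either image
m1 = cv2.cvtColor(img_dot, cv2.COLOR_BGR2GRAY); m1[m1 < 20] = 0; m1[m1 > 0] = 255
m2 = cv2.cvtColor(img_raw, cv2.COLOR_BGR2GRAY); m2[m2 < 20] = 0; m2[m2 > 0] = 255
diff = cv2.bitwise_or(diff, diff, mask=m1)
diff = cv2.bitwise_or(diff, diff, mask=m2)

# 3. Detect dots as LoG blobs. With num_sigma=1 only min_sigma is evaluated;
#    a 2-D LoG blob has radius of about sqrt(2) * sigma, so sigma=3 targets ~4 px dots.
gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
blobs = skimage.feature.blob_log(gray, min_sigma=3, max_sigma=4, num_sigma=1, threshold=0.02)

# 4. Classify each blob from the center pixel of the dotted image (OpenCV order is B, G, R)
points = []
for y, x, _ in blobs:
    b, g, r = img_dot[int(y), int(x)]
    if r > 200 and b < 50 and g < 50: cls = 'adult_male'            # red
    elif r > 200 and b > 200 and g < 50: cls = 'subadult_male'      # magenta
    elif r < 100 and g > 100 and b < 100: cls = 'pup'               # green
    elif r < 100 and g < 100 and 150 < b < 200: cls = 'juvenile'    # blue
    elif r < 150 and g < 50 and b < 100: cls = 'adult_female'       # brown
    else: cls = 'error'  # unrecognized color: keep it so labeling noise can be counted
    points.append((int(x), int(y), cls))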
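Once the point list exists, the usual next step in a counting task is a per-class tally. The sketch below is one way to do that, assuming points is a list of (x, y, cls) tuples as built above. The helper name count_by_class, the CLASSES list, and the sample points are illustrative and not part of the original skill.

```python
from collections import Counter

# Class names as used in the extraction loop; adjust to your dataset.
CLASSES = ["adult_male", "subadult_male", "adult_female", "juvenile", "pup"]

def count_by_class(points, classes=CLASSES):
    """Count (x, y, cls) tuples per class.

    Any label not listed in `classes` is pooled under 'error' so that
    unclassifiable dots stay visible in the totals.
    """
    tally = Counter(cls if cls in classes else "error" for _, _, cls in points)
    return {name: tally.get(name, 0) for name in [*classes, "error"]}

# Hypothetical sample, standing in for the output of the extraction loop.
sample_points = [
    (10, 12, "adult_male"),
    (40, 8, "pup"),
    (41, 30, "pup"),
    (5, 5, "unknown"),
]
counts = count_by_class(sample_points)
print(counts)
```

Comparing these totals against any per-image counts shipped with the dataset is a quick way to spot images where the thresholds or the blob detector misfire.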
- cv2.absdiff(dotted, raw) to leave only the dot pixels
- blob_log with min/max_sigma bracketing the expected dot radius and num_sigma=1 for speed
- error bucket to quantify noise

- num_sigma=1 with tight sigma range: single-scale LoG is fast and precise when dot radius is fixed.
- img_dot, not diff: the diff fades saturated colors, wrecking the decision tree.
- error fallback: catches out-of-gamut dots and quantifies labeling noise rather than silently mislabeling.
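The notes on num_sigma=1 and on reading colors from img_dot can be checked with a little arithmetic. The following is a minimal NumPy-only sketch with two assumptions: blob_log spaces its scales linearly between min_sigma and max_sigma (mimicked here with np.linspace), and the background pixel value is made up for illustration.

```python
import numpy as np

# Single-scale LoG: one linearly spaced sigma between 3 and 4 is just min_sigma.
sigmas = np.linspace(3, 4, 1)
# For a 2-D image, a LoG blob's radius is approximately sqrt(2) * sigma.
radius = np.sqrt(2) * sigmas[0]
print(sigmas, round(float(radius), 2))

# Why the class color is read from the dotted image and not from the diff:
raw_px = np.array([120, 130, 140], dtype=np.uint8)  # hypothetical background (B, G, R)
dot_px = np.array([0, 0, 255], dtype=np.uint8)      # pure red dot (B, G, R)

# Same result as cv2.absdiff on these pixels: per-channel absolute difference.
diff_px = np.abs(dot_px.astype(np.int16) - raw_px.astype(np.int16)).astype(np.uint8)
print(diff_px)

# The 'red' rule (r > 200, b < 50, g < 50) holds on the dotted pixel, not on the diff.
is_red = lambda px: px[2] > 200 and px[0] < 50 and px[1] < 50
print(is_red(dot_px), is_red(diff_px))
```

If the dots in your data have a different radius, dividing that radius by sqrt(2) gives a starting value for min_sigma.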