skills/computer-vision-pipeline/SKILL.md
Build production computer vision pipelines for object detection, tracking, and video analysis. Handles drone footage, wildlife monitoring, and real-time detection. Supports YOLO, Detectron2, TensorFlow, PyTorch. Use for archaeological surveys, conservation, security. Activate on "object detection", "video analysis", "YOLO", "tracking", "drone footage". NOT for simple image filters, photo editing, or face recognition APIs.
npx skillsauth add curiositech/windags-skills computer-vision-pipeline

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Expert in building production-ready computer vision systems for object detection, tracking, and video analysis.
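✅ Use for:
- Object detection, tracking, and video analysis pipelines
- Drone footage and wildlife monitoring
- Archaeological surveys, conservation, security

❌ NOT for:
- Simple image filters
- Photo editing
- Face recognition APIs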
| Model | Speed (FPS) | Accuracy (mAP) | Use Case |
|-------|-------------|----------------|----------|
| YOLOv8 | 140 | 53.9% | Real-time detection |
| Detectron2 | 25 | 58.7% | High accuracy, research |
| EfficientDet | 35 | 55.1% | Mobile deployment |
| Faster R-CNN | 10 | 42.0% | Legacy systems |
Timeline:
Decision tree:
Need real-time (>30 FPS)? → YOLOv8
Need highest accuracy? → Detectron2 Mask R-CNN
Need mobile deployment? → YOLOv8-nano or EfficientDet
Need instance segmentation? → Detectron2 or YOLOv8-seg
Need custom objects? → Fine-tune YOLOv8
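The decision tree above can be read as a first-match lookup. A minimal sketch, assuming a hypothetical `choose_model` helper whose name and flag names are invented for illustration, and assuming the questions are checked in the order listed:

```python
def choose_model(realtime=False, highest_accuracy=False, mobile=False,
                 instance_segmentation=False, custom_objects=False):
    """Return the first recommendation from the decision tree that matches.

    Hypothetical helper: the flag names are not part of the skill.
    """
    if realtime:                  # Need real-time (>30 FPS)?
        return "YOLOv8"
    if highest_accuracy:          # Need highest accuracy?
        return "Detectron2 Mask R-CNN"
    if mobile:                    # Need mobile deployment?
        return "YOLOv8-nano or EfficientDet"
    if instance_segmentation:     # Need instance segmentation?
        return "Detectron2 or YOLOv8-seg"
    if custom_objects:            # Need custom objects?
        return "Fine-tune YOLOv8"
    return None                   # No rule matched; the tree gives no default


print(choose_model(mobile=True))
print(choose_model(realtime=True, instance_segmentation=True))
```

Because the first matching rule wins, a request that is both real-time and needs segmentation resolves to plain YOLOv8 here; reorder the checks if a different priority fits the project better.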
Novice thinking: "Just run detection on raw video frames"
Problem: Poor detection accuracy, wasted GPU cycles.
Wrong approach:
```python
# ❌ No preprocessing - poor results
import cv2
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
video = cv2.VideoCapture('drone_footage.mp4')

while True:
    ret, frame = video.read()
    if not ret:
        break

    # Raw frame detection - no normalization, no resizing
    results = model(frame)
    # Poor accuracy, slow inference
```
Why wrong:
Correct approach:
```python
# ✅ Proper preprocessing pipeline
import cv2
import numpy as np
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
video = cv2.VideoCapture('drone_footage.mp4')

# Model expects 640x640 input
TARGET_SIZE = 640

def preprocess_frame(frame):
    """Letterbox a BGR frame to TARGET_SIZE x TARGET_SIZE.

    Returns the padded frame, the resize scale, and the (left, top) padding.
    """
    # Resize while maintaining aspect ratio
    h, w = frame.shape[:2]
    scale = TARGET_SIZE / max(h, w)
    new_w, new_h = int(w * scale), int(h * scale)
    resized = cv2.resize(frame, (new_w, new_h), interpolation=cv2.INTER_LINEAR)

    # Pad to square
    pad_w = (TARGET_SIZE - new_w) // 2
    pad_h = (TARGET_SIZE - new_h) // 2
    padded = cv2.copyMakeBorder(
        resized,
        pad_h, TARGET_SIZE - new_h - pad_h,
        pad_w, TARGET_SIZE - new_w - pad_w,
        cv2.BORDER_CONSTANT,
        value=(114, 114, 114)  # Gray padding
    )

    # Normalize to 0-1 (if model expects it)
    # normalized = padded.astype(np.float32) / 255.0

    return padded, scale, (pad_w, pad_h)

while True:
    ret, frame = video.read()
    if not ret:
        break

    preprocessed, scale, (pad_w, pad_h) = preprocess_frame(frame)
    # Note: Ultralytics also letterboxes inputs internally, so an explicit
    # step like this matters most when frames feed other backends
    results = model(preprocessed)

    # Map bounding boxes back to original coordinates:
    # remove the padding offset first, then undo the resize
    for box in results[0].boxes:
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        x1, x2 = (x1 - pad_w) / scale, (x2 - pad_w) / scale
        y1, y2 = (y1 - pad_h) / scale, (y2 - pad_h) / scale
```
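Mapping a box from the padded 640x640 frame back to the source frame takes two steps: subtract the padding offset, then divide by the resize scale. A minimal sketch on plain numbers, assuming the same resize-then-center-pad scheme as the preprocessing code; `letterbox_params` and `to_original_coords` are hypothetical helper names, not part of any library:

```python
def letterbox_params(width, height, target=640):
    """Scale and (left, top) padding for a resize-then-center-pad letterbox."""
    scale = target / max(width, height)
    new_w, new_h = int(width * scale), int(height * scale)
    pad_w = (target - new_w) // 2
    pad_h = (target - new_h) // 2
    return scale, pad_w, pad_h


def to_original_coords(box, scale, pad_w, pad_h):
    """Map an (x1, y1, x2, y2) box from letterboxed space to source pixels."""
    x1, y1, x2, y2 = box
    return ((x1 - pad_w) / scale, (y1 - pad_h) / scale,
            (x2 - pad_w) / scale, (y2 - pad_h) / scale)


# A 1280x720 frame shrinks by 0.5 to 640x360 and gets 140 px of padding
# above and below.
scale, pad_w, pad_h = letterbox_params(1280, 720)
print(scale, pad_w, pad_h)
print(to_original_coords((100, 240, 200, 340), scale, pad_w, pad_h))
```

Skipping the padding subtraction would shift every box by `pad_h / scale` pixels vertically in this example, which is why dividing by the scale alone is not enough for non-square frames.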
Performance comparison:
Timeline context:
Novice thinking: "Run detection on every single frame"
Problem: 99% of frames are redundant, wasting compute.
Wrong approach:
```python
# ❌ Process every frame (30 FPS video = 1800 frames/min)
import cv2
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
video = cv2.VideoCapture('drone_footage.mp4')

detections = []

while True:
    ret, frame = video.read()
    if not ret:
        break

    # Run detection on EVERY frame
    results = model(frame)
    detections.append(results)

# 10-minute video = 18,000 inferences (15 minutes on GPU)
```
Why wrong:
Correct approach 1: Frame sampling
```python
# ✅ Sample every Nth frame
import cv2
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
video = cv2.VideoCapture('drone_footage.mp4')

SAMPLE_RATE = 30  # Process 1 frame per second (if 30 FPS video)
# Read the real frame rate for timestamps; fall back to 30 if it is reported as 0
fps = video.get(cv2.CAP_PROP_FPS) or 30.0

frame_count = 0
detections = []

while True:
    ret, frame = video.read()
    if not ret:
        break

    frame_count += 1

    # Only process every SAMPLE_RATE-th frame
    if frame_count % SAMPLE_RATE == 0:
        results = model(frame)
        detections.append({
            'frame': frame_count,
            'timestamp': frame_count / fps,
            'results': results
        })

# 10-minute video = 600 inferences (30 seconds on GPU)
```
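The inference counts quoted in the comments follow directly from duration, frame rate, and sampling interval. A small worked sketch of that arithmetic; `inference_count` and `sample_rate_for` are hypothetical helpers introduced only for illustration:

```python
def inference_count(duration_s, fps=30.0, sample_rate=1):
    """Number of frames that reach the model when every Nth frame is processed."""
    total_frames = int(duration_s * fps)
    return total_frames // sample_rate


def sample_rate_for(video_fps, detections_per_second=1.0):
    """Sampling interval (in frames) that yields the requested detection rate."""
    return max(1, round(video_fps / detections_per_second))


ten_minutes = 10 * 60
print(inference_count(ten_minutes, fps=30, sample_rate=1))    # every frame
print(inference_count(ten_minutes, fps=30, sample_rate=30))   # one per second
print(sample_rate_for(60, detections_per_second=2))           # 60 FPS source
```

Deriving the interval from the measured frame rate keeps the "one detection per second" intent intact when the footage is not 30 FPS.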
Correct approach 2: Adaptive sampling with scene change detection
# ✅ Only process when scene changes significantly
import cv2
import numpy as np
from ultralytics import YOLO
model = YOLO('yolov8n.pt')
video = cv2.VideoCapture('drone_footage.mp4')
def scene_changed(prev_frame, curr_frame, threshold=0.3):
"""Detect scene change using histogram comparison"""
if prev_frame is None:
return True
# Convert to grayscale
prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
# Calculate histograms
prev_hist = cv2.calcHist([prev_gray], [0], None, [256], [0, 256])
curr_hist = cv2.calcHist([curr_gray], [0], None, [256], [0, 256])
# Compare histograms
correlation = cv2.compareHist(prev_hist, curr_hist, cv2.HISTCMP_CORREL)
return correlation < (1 - threshold)
prev_frame = None
detections = []
while True:
ret, frame = video.read()
if not ret:
break
# Only run detection if scene changed
if scene_changed(prev_frame, frame):
results = model(frame)
detections.append(results)
prev_frame = frame.copy()
# Adapts to video content - static shots skip frames, action scenes process more
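`cv2.HISTCMP_CORREL` scores two histograms by their Pearson correlation, so identical intensity distributions score 1.0 and very different ones score near or below 0. A pure-Python sketch of that measure on toy 4-bin histograms, assuming hypothetical helper names and treating two flat histograms as identical (a choice made for this sketch):

```python
import math


def histogram_correlation(h1, h2):
    """Pearson correlation between two equal-length histograms."""
    n = len(h1)
    mean1, mean2 = sum(h1) / n, sum(h2) / n
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(h1, h2))
    var1 = sum((a - mean1) ** 2 for a in h1)
    var2 = sum((b - mean2) ** 2 for b in h2)
    denom = math.sqrt(var1 * var2)
    return cov / denom if denom else 1.0  # flat histograms: treat as identical


def scene_changed_from_hists(prev_hist, curr_hist, threshold=0.3):
    """Same decision rule as the pipeline: low correlation means a new scene."""
    return histogram_correlation(prev_hist, curr_hist) < (1 - threshold)


dark = [10, 0, 0, 0]     # all pixels in the darkest bin
bright = [0, 0, 0, 10]   # all pixels in the brightest bin

print(histogram_correlation(dark, dark))
print(histogram_correlation(dark, bright))
print(scene_changed_from_hists(dark, bright))
```

With `threshold=0.3`, detection only runs when the correlation drops below 0.7. Histogram comparison ignores where pixels are, so an object moving across an otherwise unchanged background may not trigger it.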
Savings:
Novice thinking: "Process one image at a time"
Problem: GPU sits idle 80% of the time waiting for data.
Wrong approach:
```python
# ❌ Sequential processing - GPU underutilized
import cv2
from ultralytics import YOLO
import time

model = YOLO('yolov8n.pt')

# 100 images to process
image_paths = [f'frame_{i:04d}.jpg' for i in range(100)]

start = time.time()

for path in image_paths:
    frame = cv2.imread(path)
    results = model(frame)  # Process one at a time
    # GPU utilization: ~20%

elapsed = time.time() - start
print(f"Processed {len(image_paths)} images in {elapsed:.2f}s")
# Output: 45 seconds
```
Why wrong:
Correct approach:
```python
# ✅ Batch inference - GPU fully utilized
import cv2
from ultralytics import YOLO
import time

model = YOLO('yolov8n.pt')

image_paths = [f'frame_{i:04d}.jpg' for i in range(100)]

BATCH_SIZE = 16  # Process 16 images at once

start = time.time()

for i in range(0, len(image_paths), BATCH_SIZE):
    batch_paths = image_paths[i:i+BATCH_SIZE]

    # Load batch
    frames = [cv2.imread(path) for path in batch_paths]

    # Batch inference (single GPU call)
    results = model(frames)  # Pass list of images
    # GPU utilization: ~85%

elapsed = time.time() - start
print(f"Processed {len(image_paths)} images in {elapsed:.2f}s")
# Output: 8 seconds (5.6x faster!)
```
Performance comparison:

| Method | Time (100 images) | GPU Util | Throughput |
|--------|-------------------|----------|------------|
| Sequential | 45s | 20% | 2.2 img/s |
| Batch (16) | 8s | 85% | 12.5 img/s |
| Batch (32) | 6s | 92% | 16.7 img/s |
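The throughput column is simply images divided by elapsed seconds, and the speedup is the ratio of elapsed times. A short sketch that recomputes both from the timings in the table (the timings come from the table; the `throughput` helper is hypothetical):

```python
def throughput(images, seconds):
    """Images processed per second."""
    return images / seconds


timings = {"Sequential": 45, "Batch (16)": 8, "Batch (32)": 6}  # seconds per 100 images
baseline = timings["Sequential"]

for name, seconds in timings.items():
    rate = throughput(100, seconds)
    speedup = baseline / seconds
    print(f"{name}: {rate:.1f} img/s, {speedup:.1f}x vs sequential")
```

Running the same calculation on measurements from the target GPU is a quick way to confirm whether a larger batch is still paying off.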
Batch size tuning:
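```python
# Find optimal batch size for your GPU
import time

import torch
from ultralytics import YOLO

model = YOLO('yolov8n.pt')

def find_optimal_batch_size(model, image_size=(640, 640)):
    for batch_size in [1, 2, 4, 8, 16, 32, 64]:
        try:
            # Tensor input should be float in [0, 1] with shape (B, 3, H, W)
            dummy_input = torch.rand(batch_size, 3, *image_size).cuda()

            torch.cuda.synchronize()  # Finish the host-to-GPU copy first
            start = time.time()
            with torch.no_grad():
                _ = model(dummy_input)
            torch.cuda.synchronize()  # Wait for GPU work before stopping the clock
            elapsed = time.time() - start

            # The first iteration includes model warm-up, so treat it as a lower bound
            throughput = batch_size / elapsed
            print(f"Batch {batch_size}: {throughput:.1f} img/s")
        except RuntimeError:
            # CUDA out-of-memory errors surface as RuntimeError
            print(f"Batch {batch_size}: OOM (out of memory)")
            break

# Find optimal batch size before production
find_optimal_batch_size(model)
```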
Problem: Duplicate detections, missed objects, slow post-processing.
Wrong approach:
```python
# ❌ Use default NMS settings for everything
from ultralytics import YOLO

model = YOLO('yolov8n.pt')

# Default settings (Ultralytics predict defaults: iou=0.7, conf=0.25)
results = model('crowded_scene.jpg')

# Result: 50 bounding boxes, 30 are duplicates!
```
Why wrong:
Correct approach:
```python
# ✅ Tune NMS for your use case
from ultralytics import YOLO

model = YOLO('yolov8n.pt')

# Sparse objects (dolphins in ocean)
sparse_results = model(
    'ocean_footage.jpg',
    iou=0.5,   # Higher IoU = allow closer boxes
    conf=0.4   # Higher confidence = fewer false positives
)

# Dense objects (crowd, flock of birds)
dense_results = model(
    'crowded_scene.jpg',
    iou=0.3,   # Lower IoU = suppress more duplicates
    conf=0.5   # Higher confidence = filter noise
)

# High precision needed (legal evidence)
precise_results = model(
    'evidence.jpg',
    iou=0.5,
    conf=0.7,   # Very high confidence
    max_det=50  # Limit max detections
)
```
NMS parameter guide:

| Use Case | IoU | Conf | Max Det |
|----------|-----|------|---------|
| Sparse objects (wildlife) | 0.5 | 0.4 | 100 |
| Dense objects (crowd) | 0.3 | 0.5 | 300 |
| High precision (evidence) | 0.5 | 0.7 | 50 |
| Real-time (speed priority) | 0.45 | 0.3 | 100 |
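To see what the three parameters do, it helps to look at a simplified greedy NMS. The sketch below is a textbook version in pure Python, not the Ultralytics implementation; `iou` and `nms` are hypothetical helpers and boxes are plain `(x1, y1, x2, y2)` tuples:

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_thresh=0.45, conf_thresh=0.25, max_det=300):
    """Greedy NMS over (box, score) pairs; returns kept pairs, best score first."""
    # 1. Drop low-confidence detections, then rank the rest by score
    candidates = sorted((d for d in detections if d[1] >= conf_thresh),
                        key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in candidates:
        # 2. Keep a box only if it does not overlap a kept box too much
        if all(iou(box, k[0]) <= iou_thresh for k in kept):
            kept.append((box, score))
        # 3. Stop once the detection cap is reached
        if len(kept) == max_det:
            break
    return kept


dets = [((0, 0, 10, 10), 0.9), ((2, 0, 12, 10), 0.8), ((50, 50, 60, 60), 0.6)]
print(len(nms(dets, iou_thresh=0.5)))   # overlapping pair (IoU 0.67) merged
print(len(nms(dets, iou_thresh=0.7)))   # overlapping pair both kept
print(len(nms(dets, iou_thresh=0.5, conf_thresh=0.7)))
```

The first two boxes overlap with an IoU of about 0.67, so they count as duplicates at a threshold of 0.5 but as separate objects at 0.7. That is the trade-off the table encodes: a lower IoU threshold removes more duplicates at the risk of merging genuinely adjacent objects.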
Novice thinking: "Run detection on each frame independently"
Problem: Can't count unique objects, track movement, or build trajectories.
Wrong approach:
```python
# ❌ Independent frame detection - no object identity
from ultralytics import YOLO
import cv2

model = YOLO('yolov8n.pt')
video = cv2.VideoCapture('dolphins.mp4')

detections = []

while True:
    ret, frame = video.read()
    if not ret:
        break

    results = model(frame)
    detections.append(results)

# Result: Can't tell if frame 10 dolphin is same as frame 20 dolphin
# Can't count unique dolphins
# Can't track trajectories
```
Why wrong:
Correct approach: Use tracking (ByteTrack)
```python
# ✅ Multi-object tracking with ByteTrack
from ultralytics import YOLO
import cv2

# YOLO with tracking
model = YOLO('yolov8n.pt')
video = cv2.VideoCapture('dolphins.mp4')

# Track objects across frames
tracks = {}
frame_idx = 0

while True:
    ret, frame = video.read()
    if not ret:
        break
    frame_idx += 1

    # Run detection + tracking
    results = model.track(
        frame,
        persist=True,             # Maintain IDs across frames
        tracker='bytetrack.yaml'  # ByteTrack algorithm
    )

    boxes = results[0].boxes
    if boxes.id is None:
        continue  # No tracked objects in this frame

    # Each detection now has persistent ID
    for box in boxes:
        track_id = int(box.id[0])  # Unique ID across frames
        x1, y1, x2, y2 = box.xyxy[0].tolist()

        # Store trajectory
        if track_id not in tracks:
            tracks[track_id] = []

        tracks[track_id].append({
            'frame': frame_idx,  # Video frame number, not position in the track
            'bbox': (x1, y1, x2, y2),
            'conf': float(box.conf[0])
        })

# Now we can analyze:
print(f"Unique dolphins detected: {len(tracks)}")

# Trajectory analysis
for track_id, trajectory in tracks.items():
    if len(trajectory) > 30:  # Only long tracks
        print(f"Dolphin {track_id} appeared in {len(trajectory)} frames")
        # Calculate movement, speed, etc.
```
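The "calculate movement, speed" step is left open in the code above. One possible sketch, assuming each trajectory entry stores the video frame number under `'frame'` and a pixel-space `'bbox'`, and that the frame rate is known; `center`, `path_length`, and `mean_speed` are hypothetical helpers:

```python
import math


def center(bbox):
    """Center point of an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = bbox
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def path_length(trajectory):
    """Total distance in pixels travelled by the box center."""
    centers = [center(point['bbox']) for point in trajectory]
    return sum(math.dist(a, b) for a, b in zip(centers, centers[1:]))


def mean_speed(trajectory, fps=30.0):
    """Average speed in pixels per second between first and last sighting."""
    frames = trajectory[-1]['frame'] - trajectory[0]['frame']
    if frames <= 0:
        return 0.0
    return path_length(trajectory) / (frames / fps)


trajectory = [
    {'frame': 0, 'bbox': (0, 0, 10, 10)},    # center (5, 5)
    {'frame': 30, 'bbox': (3, 4, 13, 14)},   # center (8, 9)
]
print(path_length(trajectory))
print(mean_speed(trajectory, fps=30.0))
```

These figures are in image pixels. Converting them to real-world distance would need camera calibration or a known ground sampling distance, which is outside this sketch.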
Tracking benefits:
Tracking algorithms:

| Algorithm | Speed | Robustness | Occlusion Handling |
|-----------|-------|------------|---------------------|
| ByteTrack | Fast | Good | Excellent |
| SORT | Very Fast | Fair | Fair |
| DeepSORT | Medium | Excellent | Good |
| BoT-SORT | Medium | Excellent | Excellent |
□ Preprocess frames (resize, pad, normalize)
□ Sample frames intelligently (1 FPS or scene change detection)
□ Use batch inference (16-32 images per batch)
□ Tune NMS thresholds for your use case
□ Implement tracking if analyzing video
□ Log inference time and GPU utilization
□ Handle edge cases (empty frames, corrupted video)
□ Save results in structured format (JSON, CSV)
□ Visualize detections for debugging
□ Benchmark on representative data
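For the "structured format" item, one option is JSON Lines: one record per processed frame, which also handles frames with no detections cleanly. A minimal sketch using only the standard library; the record layout and the `detection_record` / `to_jsonl` helpers are assumptions for illustration, not a format defined by the skill:

```python
import json


def detection_record(frame, timestamp, boxes):
    """Build a JSON-serializable record from (bbox, conf, label) tuples."""
    return {
        "frame": frame,
        "timestamp_s": round(timestamp, 3),
        "detections": [
            {
                "bbox": [round(float(v), 1) for v in bbox],
                "conf": round(float(conf), 3),
                "label": label,
            }
            for bbox, conf, label in boxes
        ],
    }


def to_jsonl(records):
    """Serialize records as JSON Lines (one JSON object per line)."""
    return "\n".join(json.dumps(record) for record in records)


records = [
    detection_record(30, 1.0, [((10, 20, 110, 220), 0.91, "dolphin")]),
    detection_record(60, 2.0, []),  # empty frame: still logged, no detections
]
print(to_jsonl(records))
```

Converting values with `float()` before serializing matters in practice, because tensor or NumPy scalars coming out of a detector are not JSON-serializable as-is.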
| Scenario | Appropriate? |
|----------|--------------|
| Analyze drone footage for archaeology | ✅ Yes - custom object detection |
| Track wildlife in video | ✅ Yes - detection + tracking |
| Count people in crowd | ✅ Yes - dense object detection |
| Real-time security camera | ✅ Yes - YOLOv8 real-time |
| Filter vacation photos | ❌ No - use photo management apps |
| Face recognition login | ❌ No - use AWS Rekognition API |
| Read license plates | ❌ No - use specialized OCR |
- /references/yolo-guide.md - YOLOv8 setup, training, inference patterns
- /references/video-processing.md - Frame extraction, scene detection, optimization
- /references/tracking-algorithms.md - ByteTrack, SORT, DeepSORT comparison
- scripts/video_analyzer.py - Extract frames, run detection, generate timeline
- scripts/model_trainer.py - Fine-tune YOLO on custom dataset, export weights

This skill guides: Computer vision | Object detection | Video analysis | YOLO | Tracking | Drone footage | Wildlife monitoring