Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

curiositech/computer-vision-pipeline

Name: computer-vision-pipeline
Author: curiositech

skills/computer-vision-pipeline/SKILL.md

npx skillsauth add curiositech/windags-skills computer-vision-pipeline

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

Computer Vision Pipeline

Expert in building production-ready computer vision systems for object detection, tracking, and video analysis.

When to Use

✅ Use for:

Drone footage analysis (archaeological surveys, conservation)
Wildlife monitoring and tracking
Real-time object detection systems
Video preprocessing and analysis
Custom model training and inference
Multi-object tracking (MOT)

❌ NOT for:

Simple image filters (use Pillow/PIL)
Photo editing (use Photoshop/GIMP)
Face recognition APIs (use AWS Rekognition)
Basic OCR (use Tesseract)

Technology Selection

Object Detection Models

| Model | Speed (FPS) | Accuracy (mAP) | Use Case | |-------|-------------|----------------|----------| | YOLOv8 | 140 | 53.9% | Real-time detection | | Detectron2 | 25 | 58.7% | High accuracy, research | | EfficientDet | 35 | 55.1% | Mobile deployment | | Faster R-CNN | 10 | 42.0% | Legacy systems |

Timeline:

2015: Faster R-CNN (two-stage detection)
2016: YOLO v1 (one-stage, real-time)
2020: YOLOv5 (PyTorch, production-ready)
2023: YOLOv8 (state-of-the-art)
2024: YOLOv8 is industry standard for real-time

Decision tree:

Need real-time (&gt;30 FPS)? → YOLOv8
Need highest accuracy? → Detectron2 Mask R-CNN
Need mobile deployment? → YOLOv8-nano or EfficientDet
Need instance segmentation? → Detectron2 or YOLOv8-seg
Need custom objects? → Fine-tune YOLOv8

Common Anti-Patterns

Anti-Pattern 1: Not Preprocessing Frames Before Detection

Novice thinking: "Just run detection on raw video frames"

Problem: Poor detection accuracy, wasted GPU cycles.

Wrong approach:

# ❌ No preprocessing - poor results
import cv2
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
video = cv2.VideoCapture('drone_footage.mp4')

while True:
    ret, frame = video.read()
    if not ret:
        break

    # Raw frame detection - no normalization, no resizing
    results = model(frame)
    # Poor accuracy, slow inference

Why wrong:

Video resolution too high (4K = 8.3 megapixels per frame)
No normalization (pixel values 0-255 instead of 0-1)
Aspect ratio not maintained
GPU memory overflow on high-res frames

Correct approach:

# ✅ Proper preprocessing pipeline
import cv2
import numpy as np
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
video = cv2.VideoCapture('drone_footage.mp4')

# Model expects 640x640 input
TARGET_SIZE = 640

def preprocess_frame(frame):
    # Resize while maintaining aspect ratio
    h, w = frame.shape[:2]
    scale = TARGET_SIZE / max(h, w)
    new_w, new_h = int(w * scale), int(h * scale)

    resized = cv2.resize(frame, (new_w, new_h), interpolation=cv2.INTER_LINEAR)

    # Pad to square
    pad_w = (TARGET_SIZE - new_w) // 2
    pad_h = (TARGET_SIZE - new_h) // 2

    padded = cv2.copyMakeBorder(
        resized,
        pad_h, TARGET_SIZE - new_h - pad_h,
        pad_w, TARGET_SIZE - new_w - pad_w,
        cv2.BORDER_CONSTANT,
        value=(114, 114, 114)  # Gray padding
    )

    # Normalize to 0-1 (if model expects it)
    # normalized = padded.astype(np.float32) / 255.0

    return padded, scale

while True:
    ret, frame = video.read()
    if not ret:
        break

    preprocessed, scale = preprocess_frame(frame)
    results = model(preprocessed)

    # Scale bounding boxes back to original coordinates
    for box in results[0].boxes:
        x1, y1, x2, y2 = box.xyxy[0]
        x1, y1, x2, y2 = x1/scale, y1/scale, x2/scale, y2/scale

Performance comparison:

Raw 4K frames: 5 FPS, 72% mAP
Preprocessed 640x640: 45 FPS, 89% mAP

Timeline context:

2015: Manual preprocessing required
2020: YOLOv5 added auto-resize
2023: YOLOv8 has smart preprocessing but explicit control is better

Anti-Pattern 2: Processing Every Frame in Video

Novice thinking: "Run detection on every single frame"

Problem: 99% of frames are redundant, wasting compute.

Wrong approach:

# ❌ Process every frame (30 FPS video = 1800 frames/min)
import cv2
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
video = cv2.VideoCapture('drone_footage.mp4')

detections = []

while True:
    ret, frame = video.read()
    if not ret:
        break

    # Run detection on EVERY frame
    results = model(frame)
    detections.append(results)

# 10-minute video = 18,000 inferences (15 minutes on GPU)

Why wrong:

Adjacent frames are nearly identical
Wasting 95% of compute on duplicate work
Slow processing time
Massive storage for results

Correct approach 1: Frame sampling

# ✅ Sample every Nth frame
import cv2
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
video = cv2.VideoCapture('drone_footage.mp4')

SAMPLE_RATE = 30  # Process 1 frame per second (if 30 FPS video)

frame_count = 0
detections = []

while True:
    ret, frame = video.read()
    if not ret:
        break

    frame_count += 1

    # Only process every 30th frame
    if frame_count % SAMPLE_RATE == 0:
        results = model(frame)
        detections.append({
            'frame': frame_count,
            'timestamp': frame_count / 30.0,
            'results': results
        })

# 10-minute video = 600 inferences (30 seconds on GPU)

Correct approach 2: Adaptive sampling with scene change detection

# ✅ Only process when scene changes significantly
import cv2
import numpy as np
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
video = cv2.VideoCapture('drone_footage.mp4')

def scene_changed(prev_frame, curr_frame, threshold=0.3):
    """Detect scene change using histogram comparison"""
    if prev_frame is None:
        return True

    # Convert to grayscale
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)

    # Calculate histograms
    prev_hist = cv2.calcHist([prev_gray], [0], None, [256], [0, 256])
    curr_hist = cv2.calcHist([curr_gray], [0], None, [256], [0, 256])

    # Compare histograms
    correlation = cv2.compareHist(prev_hist, curr_hist, cv2.HISTCMP_CORREL)

    return correlation < (1 - threshold)

prev_frame = None
detections = []

while True:
    ret, frame = video.read()
    if not ret:
        break

    # Only run detection if scene changed
    if scene_changed(prev_frame, frame):
        results = model(frame)
        detections.append(results)

    prev_frame = frame.copy()

# Adapts to video content - static shots skip frames, action scenes process more

Savings:

Every frame: 18,000 inferences
Sample 1 FPS: 600 inferences (97% reduction)
Adaptive: ~1,200 inferences (93% reduction)

Anti-Pattern 3: Not Using Batch Inference

Novice thinking: "Process one image at a time"

Problem: GPU sits idle 80% of the time waiting for data.

Wrong approach:

# ❌ Sequential processing - GPU underutilized
import cv2
from ultralytics import YOLO
import time

model = YOLO('yolov8n.pt')

# 100 images to process
image_paths = [f'frame_{i:04d}.jpg' for i in range(100)]

start = time.time()

for path in image_paths:
    frame = cv2.imread(path)
    results = model(frame)  # Process one at a time
    # GPU utilization: ~20%

elapsed = time.time() - start
print(f"Processed {len(image_paths)} images in {elapsed:.2f}s")
# Output: 45 seconds

Why wrong:

GPU has to wait for CPU to load each image
No parallelization
GPU utilization ~20%
Slow throughput

Correct approach:

# ✅ Batch inference - GPU fully utilized
import cv2
from ultralytics import YOLO
import time

model = YOLO('yolov8n.pt')

image_paths = [f'frame_{i:04d}.jpg' for i in range(100)]

BATCH_SIZE = 16  # Process 16 images at once

start = time.time()

for i in range(0, len(image_paths), BATCH_SIZE):
    batch_paths = image_paths[i:i+BATCH_SIZE]

    # Load batch
    frames = [cv2.imread(path) for path in batch_paths]

    # Batch inference (single GPU call)
    results = model(frames)  # Pass list of images
    # GPU utilization: ~85%

elapsed = time.time() - start
print(f"Processed {len(image_paths)} images in {elapsed:.2f}s")
# Output: 8 seconds (5.6x faster!)

Performance comparison: | Method | Time (100 images) | GPU Util | Throughput | |--------|-------------------|----------|------------| | Sequential | 45s | 20% | 2.2 img/s | | Batch (16) | 8s | 85% | 12.5 img/s | | Batch (32) | 6s | 92% | 16.7 img/s |

Batch size tuning:

# Find optimal batch size for your GPU
import torch

def find_optimal_batch_size(model, image_size=(640, 640)):
    for batch_size in [1, 2, 4, 8, 16, 32, 64]:
        try:
            dummy_input = torch.randn(batch_size, 3, *image_size).cuda()

            start = time.time()
            with torch.no_grad():
                _ = model(dummy_input)
            elapsed = time.time() - start

            throughput = batch_size / elapsed
            print(f"Batch {batch_size}: {throughput:.1f} img/s")
        except RuntimeError as e:
            print(f"Batch {batch_size}: OOM (out of memory)")
            break

# Find optimal batch size before production
find_optimal_batch_size(model)

Anti-Pattern 4: Ignoring Non-Maximum Suppression (NMS) Tuning

Problem: Duplicate detections, missed objects, slow post-processing.

Wrong approach:

# ❌ Use default NMS settings for everything
from ultralytics import YOLO

model = YOLO('yolov8n.pt')

# Default settings (iou_threshold=0.45, conf_threshold=0.25)
results = model('crowded_scene.jpg')

# Result: 50 bounding boxes, 30 are duplicates!

Why wrong:

Default IoU=0.45 is too permissive for dense objects
Default conf=0.25 includes low-quality detections
No adaptation to use case

Correct approach:

# ✅ Tune NMS for your use case
from ultralytics import YOLO

model = YOLO('yolov8n.pt')

# Sparse objects (dolphins in ocean)
sparse_results = model(
    'ocean_footage.jpg',
    iou=0.5,    # Higher IoU = allow closer boxes
    conf=0.4    # Higher confidence = fewer false positives
)

# Dense objects (crowd, flock of birds)
dense_results = model(
    'crowded_scene.jpg',
    iou=0.3,    # Lower IoU = suppress more duplicates
    conf=0.5    # Higher confidence = filter noise
)

# High precision needed (legal evidence)
precise_results = model(
    'evidence.jpg',
    iou=0.5,
    conf=0.7,   # Very high confidence
    max_det=50  # Limit max detections
)

NMS parameter guide: | Use Case | IoU | Conf | Max Det | |----------|-----|------|---------| | Sparse objects (wildlife) | 0.5 | 0.4 | 100 | | Dense objects (crowd) | 0.3 | 0.5 | 300 | | High precision (evidence) | 0.5 | 0.7 | 50 | | Real-time (speed priority) | 0.45 | 0.3 | 100 |

Anti-Pattern 5: No Tracking Between Frames

Novice thinking: "Run detection on each frame independently"

Problem: Can't count unique objects, track movement, or build trajectories.

Wrong approach:

# ❌ Independent frame detection - no object identity
from ultralytics import YOLO
import cv2

model = YOLO('yolov8n.pt')
video = cv2.VideoCapture('dolphins.mp4')

detections = []

while True:
    ret, frame = video.read()
    if not ret:
        break

    results = model(frame)
    detections.append(results)

# Result: Can't tell if frame 10 dolphin is same as frame 20 dolphin
# Can't count unique dolphins
# Can't track trajectories

Why wrong:

No object identity across frames
Can't count unique objects
Can't analyze movement patterns
Can't build trajectories

Correct approach: Use tracking (ByteTrack)

# ✅ Multi-object tracking with ByteTrack
from ultralytics import YOLO
import cv2

# YOLO with tracking
model = YOLO('yolov8n.pt')
video = cv2.VideoCapture('dolphins.mp4')

# Track objects across frames
tracks = {}

while True:
    ret, frame = video.read()
    if not ret:
        break

    # Run detection + tracking
    results = model.track(
        frame,
        persist=True,     # Maintain IDs across frames
        tracker='bytetrack.yaml'  # ByteTrack algorithm
    )

    # Each detection now has persistent ID
    for box in results[0].boxes:
        track_id = int(box.id[0])  # Unique ID across frames
        x1, y1, x2, y2 = box.xyxy[0]

        # Store trajectory
        if track_id not in tracks:
            tracks[track_id] = []

        tracks[track_id].append({
            'frame': len(tracks[track_id]),
            'bbox': (x1, y1, x2, y2),
            'conf': box.conf[0]
        })

# Now we can analyze:
print(f"Unique dolphins detected: {len(tracks)}")

# Trajectory analysis
for track_id, trajectory in tracks.items():
    if len(trajectory) > 30:  # Only long tracks
        print(f"Dolphin {track_id} appeared in {len(trajectory)} frames")
        # Calculate movement, speed, etc.

Tracking benefits:

Count unique objects (not just detections per frame)
Build trajectories and movement patterns
Analyze behavior over time
Filter out brief false positives

Tracking algorithms: | Algorithm | Speed | Robustness | Occlusion Handling | |-----------|-------|------------|---------------------| | ByteTrack | Fast | Good | Excellent | | SORT | Very Fast | Fair | Fair | | DeepSORT | Medium | Excellent | Good | | BotSORT | Medium | Excellent | Excellent |

Production Checklist

□ Preprocess frames (resize, pad, normalize)
□ Sample frames intelligently (1 FPS or scene change detection)
□ Use batch inference (16-32 images per batch)
□ Tune NMS thresholds for your use case
□ Implement tracking if analyzing video
□ Log inference time and GPU utilization
□ Handle edge cases (empty frames, corrupted video)
□ Save results in structured format (JSON, CSV)
□ Visualize detections for debugging
□ Benchmark on representative data

When to Use vs Avoid

| Scenario | Appropriate? | |----------|--------------| | Analyze drone footage for archaeology | ✅ Yes - custom object detection | | Track wildlife in video | ✅ Yes - detection + tracking | | Count people in crowd | ✅ Yes - dense object detection | | Real-time security camera | ✅ Yes - YOLOv8 real-time | | Filter vacation photos | ❌ No - use photo management apps | | Face recognition login | ❌ No - use AWS Rekognition API | | Read license plates | ❌ No - use specialized OCR |

References

/references/yolo-guide.md - YOLOv8 setup, training, inference patterns
/references/video-processing.md - Frame extraction, scene detection, optimization
/references/tracking-algorithms.md - ByteTrack, SORT, DeepSORT comparison

Scripts

scripts/video_analyzer.py - Extract frames, run detection, generate timeline
scripts/model_trainer.py - Fine-tune YOLO on custom dataset, export weights

curiositech/computer-vision-pipeline

skills/computer-vision-pipeline/SKILL.md

Build production computer vision pipelines for object detection, tracking, and video analysis. Handles drone footage, wildlife monitoring, and real-time detection. Supports YOLO, Detectron2, TensorFlow, PyTorch. Use for archaeological surveys, conservation, security. Activate on "object detection", "video analysis", "YOLO", "tracking", "drone footage". NOT for simple image filters, photo editing, or face recognition APIs.

development

Updated Apr 4, 2026

$ install --global

skillsauth

npx skillsauth add curiositech/windags-skills computer-vision-pipeline

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Apr 4, 2026, 2:00 PM226.5s6 files scanned

SKILL.md

license:: Apache-2.0
name:: computer-vision-pipeline
description:: Build production computer vision pipelines for object detection, tracking, and video analysis. Handles drone footage, wildlife monitoring, and real-time detection. Supports YOLO, Detectron2, TensorFlow, PyTorch. Use for archaeological surveys, conservation, security. Activate on "object detection", "video analysis", "YOLO", "tracking", "drone footage". NOT for simple image filters, photo editing, or face recognition APIs.
allowed-tools:: Read,Write,Edit,Bash(python*,pip*,ffmpeg*)
category:: AI & Machine Learning

Computer Vision Pipeline

Expert in building production-ready computer vision systems for object detection, tracking, and video analysis.

When to Use

✅ Use for:

Drone footage analysis (archaeological surveys, conservation)
Wildlife monitoring and tracking
Real-time object detection systems
Video preprocessing and analysis
Custom model training and inference
Multi-object tracking (MOT)

❌ NOT for:

Simple image filters (use Pillow/PIL)
Photo editing (use Photoshop/GIMP)
Face recognition APIs (use AWS Rekognition)
Basic OCR (use Tesseract)

Technology Selection

Object Detection Models

Timeline:

2015: Faster R-CNN (two-stage detection)
2016: YOLO v1 (one-stage, real-time)
2020: YOLOv5 (PyTorch, production-ready)
2023: YOLOv8 (state-of-the-art)
2024: YOLOv8 is industry standard for real-time

Decision tree:

Need real-time (&gt;30 FPS)? → YOLOv8
Need highest accuracy? → Detectron2 Mask R-CNN
Need mobile deployment? → YOLOv8-nano or EfficientDet
Need instance segmentation? → Detectron2 or YOLOv8-seg
Need custom objects? → Fine-tune YOLOv8

Common Anti-Patterns

Anti-Pattern 1: Not Preprocessing Frames Before Detection

Novice thinking: "Just run detection on raw video frames"

Problem: Poor detection accuracy, wasted GPU cycles.

Wrong approach:

# ❌ No preprocessing - poor results
import cv2
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
video = cv2.VideoCapture('drone_footage.mp4')

while True:
    ret, frame = video.read()
    if not ret:
        break

    # Raw frame detection - no normalization, no resizing
    results = model(frame)
    # Poor accuracy, slow inference

Why wrong:

Video resolution too high (4K = 8.3 megapixels per frame)
No normalization (pixel values 0-255 instead of 0-1)
Aspect ratio not maintained
GPU memory overflow on high-res frames

Correct approach:

# ✅ Proper preprocessing pipeline
import cv2
import numpy as np
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
video = cv2.VideoCapture('drone_footage.mp4')

# Model expects 640x640 input
TARGET_SIZE = 640

def preprocess_frame(frame):
    # Resize while maintaining aspect ratio
    h, w = frame.shape[:2]
    scale = TARGET_SIZE / max(h, w)
    new_w, new_h = int(w * scale), int(h * scale)

    resized = cv2.resize(frame, (new_w, new_h), interpolation=cv2.INTER_LINEAR)

    # Pad to square
    pad_w = (TARGET_SIZE - new_w) // 2
    pad_h = (TARGET_SIZE - new_h) // 2

    padded = cv2.copyMakeBorder(
        resized,
        pad_h, TARGET_SIZE - new_h - pad_h,
        pad_w, TARGET_SIZE - new_w - pad_w,
        cv2.BORDER_CONSTANT,
        value=(114, 114, 114)  # Gray padding
    )

    # Normalize to 0-1 (if model expects it)
    # normalized = padded.astype(np.float32) / 255.0

    return padded, scale

while True:
    ret, frame = video.read()
    if not ret:
        break

    preprocessed, scale = preprocess_frame(frame)
    results = model(preprocessed)

    # Scale bounding boxes back to original coordinates
    for box in results[0].boxes:
        x1, y1, x2, y2 = box.xyxy[0]
        x1, y1, x2, y2 = x1/scale, y1/scale, x2/scale, y2/scale

Performance comparison:

Raw 4K frames: 5 FPS, 72% mAP
Preprocessed 640x640: 45 FPS, 89% mAP

Timeline context:

2015: Manual preprocessing required
2020: YOLOv5 added auto-resize
2023: YOLOv8 has smart preprocessing but explicit control is better

Anti-Pattern 2: Processing Every Frame in Video

Novice thinking: "Run detection on every single frame"

Problem: 99% of frames are redundant, wasting compute.

Wrong approach:

# ❌ Process every frame (30 FPS video = 1800 frames/min)
import cv2
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
video = cv2.VideoCapture('drone_footage.mp4')

detections = []

while True:
    ret, frame = video.read()
    if not ret:
        break

    # Run detection on EVERY frame
    results = model(frame)
    detections.append(results)

# 10-minute video = 18,000 inferences (15 minutes on GPU)

Why wrong:

Adjacent frames are nearly identical
Wasting 95% of compute on duplicate work
Slow processing time
Massive storage for results

Correct approach 1: Frame sampling

# ✅ Sample every Nth frame
import cv2
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
video = cv2.VideoCapture('drone_footage.mp4')

SAMPLE_RATE = 30  # Process 1 frame per second (if 30 FPS video)

frame_count = 0
detections = []

while True:
    ret, frame = video.read()
    if not ret:
        break

    frame_count += 1

    # Only process every 30th frame
    if frame_count % SAMPLE_RATE == 0:
        results = model(frame)
        detections.append({
            'frame': frame_count,
            'timestamp': frame_count / 30.0,
            'results': results
        })

# 10-minute video = 600 inferences (30 seconds on GPU)

Correct approach 2: Adaptive sampling with scene change detection

# ✅ Only process when scene changes significantly
import cv2
import numpy as np
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
video = cv2.VideoCapture('drone_footage.mp4')

def scene_changed(prev_frame, curr_frame, threshold=0.3):
    """Detect scene change using histogram comparison"""
    if prev_frame is None:
        return True

    # Convert to grayscale
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)

    # Calculate histograms
    prev_hist = cv2.calcHist([prev_gray], [0], None, [256], [0, 256])
    curr_hist = cv2.calcHist([curr_gray], [0], None, [256], [0, 256])

    # Compare histograms
    correlation = cv2.compareHist(prev_hist, curr_hist, cv2.HISTCMP_CORREL)

    return correlation < (1 - threshold)

prev_frame = None
detections = []

while True:
    ret, frame = video.read()
    if not ret:
        break

    # Only run detection if scene changed
    if scene_changed(prev_frame, frame):
        results = model(frame)
        detections.append(results)

    prev_frame = frame.copy()

# Adapts to video content - static shots skip frames, action scenes process more

Savings:

Every frame: 18,000 inferences
Sample 1 FPS: 600 inferences (97% reduction)
Adaptive: ~1,200 inferences (93% reduction)

Anti-Pattern 3: Not Using Batch Inference

Novice thinking: "Process one image at a time"

Problem: GPU sits idle 80% of the time waiting for data.

Wrong approach:

# ❌ Sequential processing - GPU underutilized
import cv2
from ultralytics import YOLO
import time

model = YOLO('yolov8n.pt')

# 100 images to process
image_paths = [f'frame_{i:04d}.jpg' for i in range(100)]

start = time.time()

for path in image_paths:
    frame = cv2.imread(path)
    results = model(frame)  # Process one at a time
    # GPU utilization: ~20%

elapsed = time.time() - start
print(f"Processed {len(image_paths)} images in {elapsed:.2f}s")
# Output: 45 seconds

Why wrong:

GPU has to wait for CPU to load each image
No parallelization
GPU utilization ~20%
Slow throughput

Correct approach:

# ✅ Batch inference - GPU fully utilized
import cv2
from ultralytics import YOLO
import time

model = YOLO('yolov8n.pt')

image_paths = [f'frame_{i:04d}.jpg' for i in range(100)]

BATCH_SIZE = 16  # Process 16 images at once

start = time.time()

for i in range(0, len(image_paths), BATCH_SIZE):
    batch_paths = image_paths[i:i+BATCH_SIZE]

    # Load batch
    frames = [cv2.imread(path) for path in batch_paths]

    # Batch inference (single GPU call)
    results = model(frames)  # Pass list of images
    # GPU utilization: ~85%

elapsed = time.time() - start
print(f"Processed {len(image_paths)} images in {elapsed:.2f}s")
# Output: 8 seconds (5.6x faster!)

Batch size tuning:

# Find optimal batch size for your GPU
import torch

def find_optimal_batch_size(model, image_size=(640, 640)):
    for batch_size in [1, 2, 4, 8, 16, 32, 64]:
        try:
            dummy_input = torch.randn(batch_size, 3, *image_size).cuda()

            start = time.time()
            with torch.no_grad():
                _ = model(dummy_input)
            elapsed = time.time() - start

            throughput = batch_size / elapsed
            print(f"Batch {batch_size}: {throughput:.1f} img/s")
        except RuntimeError as e:
            print(f"Batch {batch_size}: OOM (out of memory)")
            break

# Find optimal batch size before production
find_optimal_batch_size(model)

Anti-Pattern 4: Ignoring Non-Maximum Suppression (NMS) Tuning

Problem: Duplicate detections, missed objects, slow post-processing.

Wrong approach:

# ❌ Use default NMS settings for everything
from ultralytics import YOLO

model = YOLO('yolov8n.pt')

# Default settings (iou_threshold=0.45, conf_threshold=0.25)
results = model('crowded_scene.jpg')

# Result: 50 bounding boxes, 30 are duplicates!

Why wrong:

Default IoU=0.45 is too permissive for dense objects
Default conf=0.25 includes low-quality detections
No adaptation to use case

Correct approach:

# ✅ Tune NMS for your use case
from ultralytics import YOLO

model = YOLO('yolov8n.pt')

# Sparse objects (dolphins in ocean)
sparse_results = model(
    'ocean_footage.jpg',
    iou=0.5,    # Higher IoU = allow closer boxes
    conf=0.4    # Higher confidence = fewer false positives
)

# Dense objects (crowd, flock of birds)
dense_results = model(
    'crowded_scene.jpg',
    iou=0.3,    # Lower IoU = suppress more duplicates
    conf=0.5    # Higher confidence = filter noise
)

# High precision needed (legal evidence)
precise_results = model(
    'evidence.jpg',
    iou=0.5,
    conf=0.7,   # Very high confidence
    max_det=50  # Limit max detections
)

Anti-Pattern 5: No Tracking Between Frames

Novice thinking: "Run detection on each frame independently"

Problem: Can't count unique objects, track movement, or build trajectories.

Wrong approach:

# ❌ Independent frame detection - no object identity
from ultralytics import YOLO
import cv2

model = YOLO('yolov8n.pt')
video = cv2.VideoCapture('dolphins.mp4')

detections = []

while True:
    ret, frame = video.read()
    if not ret:
        break

    results = model(frame)
    detections.append(results)

# Result: Can't tell if frame 10 dolphin is same as frame 20 dolphin
# Can't count unique dolphins
# Can't track trajectories

Why wrong:

No object identity across frames
Can't count unique objects
Can't analyze movement patterns
Can't build trajectories

Correct approach: Use tracking (ByteTrack)

# ✅ Multi-object tracking with ByteTrack
from ultralytics import YOLO
import cv2

# YOLO with tracking
model = YOLO('yolov8n.pt')
video = cv2.VideoCapture('dolphins.mp4')

# Track objects across frames
tracks = {}

while True:
    ret, frame = video.read()
    if not ret:
        break

    # Run detection + tracking
    results = model.track(
        frame,
        persist=True,     # Maintain IDs across frames
        tracker='bytetrack.yaml'  # ByteTrack algorithm
    )

    # Each detection now has persistent ID
    for box in results[0].boxes:
        track_id = int(box.id[0])  # Unique ID across frames
        x1, y1, x2, y2 = box.xyxy[0]

        # Store trajectory
        if track_id not in tracks:
            tracks[track_id] = []

        tracks[track_id].append({
            'frame': len(tracks[track_id]),
            'bbox': (x1, y1, x2, y2),
            'conf': box.conf[0]
        })

# Now we can analyze:
print(f"Unique dolphins detected: {len(tracks)}")

# Trajectory analysis
for track_id, trajectory in tracks.items():
    if len(trajectory) > 30:  # Only long tracks
        print(f"Dolphin {track_id} appeared in {len(trajectory)} frames")
        # Calculate movement, speed, etc.

Tracking benefits:

Count unique objects (not just detections per frame)
Build trajectories and movement patterns
Analyze behavior over time
Filter out brief false positives

Production Checklist

□ Preprocess frames (resize, pad, normalize)
□ Sample frames intelligently (1 FPS or scene change detection)
□ Use batch inference (16-32 images per batch)
□ Tune NMS thresholds for your use case
□ Implement tracking if analyzing video
□ Log inference time and GPU utilization
□ Handle edge cases (empty frames, corrupted video)
□ Save results in structured format (JSON, CSV)
□ Visualize detections for debugging
□ Benchmark on representative data

When to Use vs Avoid

References

/references/yolo-guide.md - YOLOv8 setup, training, inference patterns
/references/video-processing.md - Frame extraction, scene detection, optimization
/references/tracking-algorithms.md - ByteTrack, SORT, DeepSORT comparison

Scripts

scripts/video_analyzer.py - Extract frames, run detection, generate timeline
scripts/model_trainer.py - Fine-tune YOLO on custom dataset, export weights

Related Skills

curiositech/revisiting-interview-data-analysing-turn

data-ai

VerifiedTrustedCommunity

license: Apache-2.0 NOT for unrelated tasks outside this domain.

8SKILL.mdUpdated Jul 19, 2026

curiositech/revisiting-interview-data-analysing-turn

curiositech/redis-patterns-expert

development

VerifiedTrustedCommunity

Use when designing caching strategies (cache-aside, write-through, write-behind), implementing distributed locks, building rate limiters, leaderboards, real-time streams (XADD/consumer groups), pub/sub, or tuning eviction policies. Triggers: thundering-herd on cache miss, dogpile on key expiry, Redlock vs SET-NX-PX choice, sliding-window rate limiter, hot-key on a single cluster slot, big-key blowup, MULTI/EXEC across slots, KEYS in production. NOT for Redis Cluster operations/admin (different domain), embedded KV (SQLite, leveldb), in-process LRU caches, or Memcached.

8SKILL.mdUpdated Jul 19, 2026

curiositech/redis-patterns-expert

curiositech/react-server-components-boundary

tools

VerifiedTrustedCommunity

Drawing the `'use client'` boundary correctly in React Server Components apps (Next.js App Router, RSC frameworks) — leaf-pushing, slot composition, serialization rules, and environment poisoning prevention. Grounded in react.dev and Next.js 16 docs.

8SKILL.mdUpdated Jul 19, 2026

curiositech/react-server-components-boundary

curiositech/rate-limiting-strategy

development

VerifiedTrustedCommunity

Use when designing rate limiting for an API, choosing between token bucket / sliding window / leaky bucket / fixed window, implementing it in Redis, deciding edge (Cloudflare/Upstash) vs origin enforcement, sizing per-user vs per-IP vs per-endpoint quotas, returning the right 429 response with Retry-After, or fixing the boundary-burst bug in fixed-window limiters. Triggers: 429 too many requests, INCR + EXPIRE, ZADD + ZREMRANGEBYSCORE + ZCARD, X-RateLimit-Remaining header, Cloudflare WAF rate limiting rules, Upstash @upstash/ratelimit, leaky bucket shaping vs policing, distributed rate limiter consistency. NOT for DDoS mitigation specifically (different scale), CAPTCHA / bot management, full WAF design, or per-user quota billing.

8SKILL.mdUpdated Jul 19, 2026

curiositech/rate-limiting-strategy

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/curiositech/windags-skills.git

# Copy into Claude Code skills folder (global)
cp -r windags-skills/skills/computer-vision-pipeline ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

curiositech/windags-skills

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT