skills/3d-cv-labeling-2026/SKILL.md
Expert in 3D computer vision labeling tools, workflows, and AI-assisted annotation for LiDAR, point clouds, and sensor fusion. Covers SAM4D/Point-SAM, human-in-the-loop architectures, and vertical-specific training strategies. Activate on '3D labeling', 'point cloud annotation', 'LiDAR labeling', 'SAM 3D', 'SAM4D', 'sensor fusion annotation', '3D bounding box', 'semantic segmentation point cloud'. NOT for 2D image labeling (use clip-aware-embeddings), general ML training (use ml-engineer), video annotation without 3D (use computer-vision-pipeline), or VLM prompt engineering (use prompt-engineer).
Expert guidance on 3D annotation tools, AI-assisted labeling workflows, and training architectures for LiDAR/point cloud computer vision in autonomous vehicles, robotics, infrastructure inspection, and geospatial applications.
✅ Use for:
❌ NOT for:
| Tool | Strength | Best For | Key AI Feature |
|------|----------|----------|----------------|
| BasicAI | One-click detection | Autonomous driving | Pre-labeling models fine-tuned for AV |
| Supervisely | Customization | R&D teams | AI tracking, 2D→3D single-click |
| Segments.ai | 2D+3D sync | Robotics perception | Sequential propagation |
| Deepen AI | Sensor calibration | In-house perception | Pixel-perfect multi-sensor |
| Dataloop | Enterprise MLOps | Large annotation teams | Model-assisted + Point Cloud Focus |
| Encord | Full workflow | Multi-modal projects | Track-ID management |
| Ango Hub (iMerit) | Dense annotation | Complex multi-modal | Frame-to-frame propagation |
| Tool | Maturity | Limitations |
|------|----------|-------------|
| CVAT | Stable | 3D bounding boxes only, limited interpolation |
| 3D BAT | Good | Full-surround annotation, semi-auto tracking |
| Label Studio | Partial 3D | Better for multi-format, not specialized 3D |
Key innovation: Unified Multi-modal Positional Encoding (UMPE) aligns camera and LiDAR in shared 3D space.
Camera Stream → Feature Extraction → ┐
                                     ├→ UMPE Alignment → Promptable 3D Segmentation
LiDAR Stream  → Point Encoding     → ┘
Data engine breakthrough: Automatic pseudo-label generation, 100x+ faster than human annotation, using:
Dataset: Waymo-4DSeg (300k+ camera-LiDAR aligned masklets)
Architecture: Efficient transformer designed specifically for point clouds (not adapted from 2D).
Knowledge distillation: 2D SAM → 3D Point-SAM via data engine that generates:
Benchmarks: Outperforms state-of-the-art on indoor (ScanNet) and outdoor (nuScenes, Waymo) datasets.
Two-stage approach:
Best for: UAV/drone workflows where colorized point clouds from L1 LiDAR + RGB cameras are available.
Old approach: Human labels → Train model → Deploy
New approach: Model assists → Human validates → Rapid iteration
┌─────────────────────────────────────────────────────────┐
│                    LABELING PIPELINE                    │
├─────────────────────────────────────────────────────────┤
│  Raw Data → AI Pre-label → Human Review → QA Check      │
│     │            │              │            │          │
│     │       SAM4D/VLM      Corrections    Consensus     │
│     │       generates      only where     sampling      │
│     │       proposals      AI uncertain                 │
└─────────────────────────────────────────────────────────┘
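The pipeline above can be sketched as a small routing step. This is a minimal illustration, not the workflow of any particular tool: the `Proposal` type, the 0.8 review threshold, and the 10% QA sampling fraction are all assumptions chosen for the example.

```python
import random
from dataclasses import dataclass


@dataclass
class Proposal:
    frame_id: int
    label: str
    confidence: float  # model's self-reported score in [0, 1]


def route_proposals(proposals, review_threshold=0.8, qa_fraction=0.1, seed=0):
    """Send uncertain proposals to human review and sample the rest for QA.

    review_threshold and qa_fraction are hypothetical values; tune them
    against measured label quality on your own data.
    """
    needs_review = [p for p in proposals if p.confidence < review_threshold]
    auto_accepted = [p for p in proposals if p.confidence >= review_threshold]

    # QA check: draw a random sample of auto-accepted labels for spot checks.
    qa_count = 0
    if auto_accepted:
        qa_count = min(len(auto_accepted),
                       max(1, round(len(auto_accepted) * qa_fraction)))
    qa_sample = random.Random(seed).sample(auto_accepted, qa_count)
    return needs_review, auto_accepted, qa_sample


proposals = [Proposal(i, "car", c)
             for i, c in enumerate([0.95, 0.5, 0.85, 0.3, 0.99])]
needs_review, auto_accepted, qa_sample = route_proposals(proposals)
```

Humans only touch the low-confidence queue plus the QA sample, which is where the time savings come from.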
| Approach | Time for 10k frames | Annotation Quality |
|----------|---------------------|--------------------|
| Manual only | 400 hours | 95% (expert) |
| AI pre-label + review | 50 hours | 97% (AI+human) |
| SAM4D data engine | 4 hours | 92% (pseudo) |
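To make the table concrete, the arithmetic below converts each row into seconds per frame and a speedup over manual labeling. It uses only the figures from the table; the dictionary keys are illustrative names.

```python
FRAMES = 10_000

# Hours for 10k frames, taken from the table above.
HOURS = {
    "manual_only": 400,
    "ai_prelabel_review": 50,
    "sam4d_data_engine": 4,
}


def seconds_per_frame(total_hours, frames=FRAMES):
    """Average labeling time per frame, in seconds."""
    return total_hours * 3600 / frames


def speedup_vs_manual(approach):
    """How many times faster an approach is than manual-only labeling."""
    return HOURS["manual_only"] / HOURS[approach]


for name, hours in HOURS.items():
    print(f"{name}: {seconds_per_frame(hours):.2f} s/frame, "
          f"{speedup_vs_manual(name):.0f}x vs manual")
```

Manual labeling works out to 144 s per frame, AI pre-label plus review to 18 s (8x), and the SAM4D data engine to 1.44 s (100x).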
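The 80/20 rule: roughly 80% of ML project time goes to data preparation. Model-in-the-loop labeling attacks that share directly: in the comparison above, AI pre-labeling with human review needs 50 hours for 10k frames against 400 hours for manual labeling.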
| Aspect | Specialized (YOLO, PointPillars) | VLMs (GPT-4V, Gemini) |
|--------|----------------------------------|-----------------------|
| Latency | 10-50ms (real-time) | 500-2000ms |
| 3D precision | Strong geometric priors | Noisy text-3D alignment |
| Novel objects | Closed-set (what you train) | Open-vocabulary |
| Compute | Edge-deployable | GPU cluster required |
| Hallucinations | None (deterministic) | Yes (safety-critical risk) |
| Domain shift | Struggles (fog, night) | Better generalization |
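The latency row matters most for real-time use. As a rough check, compare worst-case latency against the time between sensor frames. The 10 Hz sensor rate below is an assumption for illustration (a common LiDAR spin rate), not a figure from the table.

```python
def frame_budget_ms(sensor_rate_hz):
    """Time available per frame before the next one arrives."""
    return 1000.0 / sensor_rate_hz


def fits_budget(latency_ms, sensor_rate_hz):
    """True if inference finishes within one sensor frame period."""
    return latency_ms <= frame_budget_ms(sensor_rate_hz)


SENSOR_RATE_HZ = 10  # assumed sensor rate; substitute your own

specialized_ok = fits_budget(50, SENSOR_RATE_HZ)   # worst case from the table
vlm_ok = fits_budget(500, SENSOR_RATE_HZ)          # best case from the table
```

At 10 Hz the budget is 100 ms per frame, so the specialized detector fits even at its worst case, while the VLM misses even at its best case.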
Use Specialized Models When:
Use VLMs/Foundation Models When:
                 ┌───────────────────────┐
                 │   VLM (Slow Brain)    │
                 │  • Scene understanding│
                 │  • Open vocabulary    │
                 │  • Anomaly detection  │
                 └──────────┬────────────┘
                            │ High-level context
                            ▼
┌──────────────────────────────────────────────────────────┐
│            Specialized Detector (Fast Brain)             │
│   • Real-time inference (YOLO, PointPillars, CenterPoint)│
│   • Known object detection & tracking                    │
│   • Safety-critical decisions                            │
└──────────────────────────────────────────────────────────┘
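A minimal sketch of the slow-brain/fast-brain loop, under stated assumptions: `slow_context` and `fast_detect` are hypothetical stand-ins for a VLM call and a specialized detector, and the slow path is called synchronously every `slow_every` frames for simplicity. A production system would more likely run the slow path asynchronously so it never blocks the fast path.

```python
def run_hybrid(frames, fast_detect, slow_context, slow_every=10):
    """Run the fast detector on every frame, refreshing VLM context occasionally."""
    context = {}
    outputs = []
    for i, frame in enumerate(frames):
        if i % slow_every == 0:
            # Slow path: high-level scene context, refreshed every N frames.
            context = slow_context(frame)
        # Fast path: runs on every frame and owns safety-critical decisions.
        outputs.append(fast_detect(frame, context))
    return outputs


# Hypothetical stubs standing in for real models.
slow_calls = []


def stub_vlm(frame):
    slow_calls.append(frame)
    return {"watch_for": ["debris"]}


def stub_detector(frame, context):
    return {"frame": frame, "watch_for": context.get("watch_for", [])}


results = run_hybrid(list(range(25)), stub_detector, stub_vlm, slow_every=10)
```

With 25 frames and `slow_every=10`, the stub VLM runs on frames 0, 10, and 20, while the detector runs on all 25.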
Examples:
Objects: Utility poles, insulators, conductors, vegetation, damage types
Sensor fusion: RGB + thermal + LiDAR
Training data needs:
Architecture:
LiDAR   → Point cloud encoder → ┐
Thermal → 2D encoder          → ├→ Fusion → Multi-task head
RGB     → 2D encoder          → ┘              ├→ Object detection
                                               ├→ Defect classification
                                               └→ Clearance regression
Objects: Vehicles, pedestrians, cyclists, traffic signs, lane markings
Key requirement: Temporal consistency (track-IDs across frames)
Training data needs:
Architecture: CenterPoint, PointPillars, or Voxel-based detectors with BEV (Bird's Eye View) representation.
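For readers new to BEV, the sketch below shows the basic idea: drop the height axis and bin points into a top-down grid. It is a simplified occupancy count in plain Python, not the learned pillar or voxel encoding those detectors use; the ranges and cell size are illustrative assumptions.

```python
def points_to_bev(points, x_range=(0.0, 4.0), y_range=(-2.0, 2.0), cell=1.0):
    """Count points per top-down grid cell, ignoring height (z).

    points: iterable of (x, y, z) tuples in metres, sensor frame.
    Returns a list of rows indexed as grid[ix][iy].
    """
    nx = int((x_range[1] - x_range[0]) / cell)
    ny = int((y_range[1] - y_range[0]) / cell)
    grid = [[0] * ny for _ in range(nx)]
    for x, y, _z in points:
        if x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]:
            # Clamp guards against floating-point rounding at the upper edge.
            ix = min(int((x - x_range[0]) / cell), nx - 1)
            iy = min(int((y - y_range[0]) / cell), ny - 1)
            grid[ix][iy] += 1
    return grid


cloud = [(0.5, -1.5, 0.2), (0.6, -1.2, 1.0), (3.9, 1.9, 0.0), (5.0, 0.0, 0.0)]
bev = points_to_bev(cloud)
```

The last point falls outside the x range and is dropped; the first two share a cell.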
Objects: Crop rows, canopy height, fuel load, fire spread boundaries
Sensor fusion: RGB + multispectral + LiDAR
Training data needs:
Why not just VLM? VLMs can't:
Novice thinking: "SAM segments anything, so I'll just run it on my LiDAR data"
Reality:
Correct approach: Use Point-SAM for native 3D, or project to 2D for SAM → lift back to 3D.
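The project-and-lift route can be sketched with a pinhole camera model. This assumes the points are already in the camera frame (z forward), that intrinsics `fx, fy, cx, cy` are known, and that `mask` is a 2D boolean grid such as a 2D segmenter would produce. All names here are illustrative, and occlusion is not handled: a point hidden behind a foreground object still inherits that object's mask.

```python
import math


def project_point(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point to pixel coordinates."""
    x, y, z = point
    if z <= 0:
        return None  # behind the camera, not visible
    return fx * x / z + cx, fy * y / z + cy


def lift_mask_to_points(points, mask, fx, fy, cx, cy):
    """Return indices of 3D points whose projection lands inside the 2D mask."""
    height, width = len(mask), len(mask[0])
    selected = []
    for idx, point in enumerate(points):
        uv = project_point(point, fx, fy, cx, cy)
        if uv is None:
            continue
        u, v = math.floor(uv[0]), math.floor(uv[1])
        if 0 <= u < width and 0 <= v < height and mask[v][u]:
            selected.append(idx)
    return selected


# 4x4 mask with the central 2x2 block set.
mask = [[1 <= r <= 2 and 1 <= c <= 2 for c in range(4)] for r in range(4)]
points = [(0, 0, 1), (-1, -1, 1), (0, 0, -1), (5, 0, 1), (-0.5, -0.5, 2)]
labeled = lift_mask_to_points(points, mask, fx=2, fy=2, cx=2, cy=2)
```

In practice a depth or visibility check is needed before trusting lifted labels, which is one reason native 3D models such as Point-SAM are attractive.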
Novice thinking: "AI pre-labels are 95% accurate, we can skip review"
Reality:
Correct approach: Tier 1 (safety-critical) always human-validated. Use confidence thresholds for Tier 2/3.
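One way to encode the tiering rule is shown below. The threshold values are hypothetical placeholders; the only rule taken from the text is that Tier 1 always goes to a human.

```python
# Hypothetical per-tier confidence thresholds; calibrate on held-out data.
TIER_THRESHOLDS = {2: 0.90, 3: 0.70}


def needs_human_review(tier, confidence):
    """Tier 1 (safety-critical) is always reviewed; others only when uncertain."""
    if tier == 1:
        return True
    if tier not in TIER_THRESHOLDS:
        raise ValueError(f"unknown tier: {tier}")
    return confidence < TIER_THRESHOLDS[tier]
```

For example, `needs_human_review(1, 0.99)` is still `True`, while `needs_human_review(3, 0.75)` is `False`.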
Novice thinking: "GPT-4V can identify damage in my photos"
Reality:
Correct approach: Use VLM for data generation/exploration, specialized model for deployment.
Novice thinking: "LiDAR is enough for 3D detection"
Reality:
Correct approach: Sensor fusion from day one. SAM4D shows that pseudo-labels from fused camera and LiDAR outperform single-modality pseudo-labels.
            Do you need real-time inference?
               /                    \
             YES                     NO
              |                       |
       Use specialized        Is this exploration?
       detector (YOLO,           /          \
       CenterPoint)            YES           NO
              |                 |             |
      Have labeled data?     Use VLM      Generate
         /         \         for zero-    pseudo-labels
       YES          NO       shot         with SAM4D
        |            |
   Train model   Use SAM4D/
                 Point-SAM for
                 auto-labeling
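The same tree can be written as a small function. This reflects one reading of the diagram (real-time needs lead to a specialized detector, with auto-labeling first if no labeled data exists); the function name and return strings are illustrative.

```python
def recommend_approach(real_time, has_labeled_data=False, exploration=False):
    """Walk the decision tree and return a recommendation string."""
    if real_time:
        # Real-time inference calls for a specialized detector.
        if has_labeled_data:
            return "Train a specialized detector (YOLO, CenterPoint)"
        return "Auto-label with SAM4D/Point-SAM, then train a specialized detector"
    if exploration:
        return "Use a VLM for zero-shot exploration"
    return "Generate pseudo-labels with SAM4D"
```

For example, `recommend_approach(real_time=True)` points to auto-labeling first, since a real-time detector cannot be trained without labels.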
| Requirement | Recommended Tool |
|-------------|------------------|
| Autonomous driving at scale | Deepen AI or BasicAI |
| R&D/research flexibility | Supervisely or Segments.ai |
| Multi-modal (camera+LiDAR+radar) | Ango Hub or Dataloop |
| Self-hosted/open source | CVAT + 3D plugins or 3D BAT |
| Robotics perception | Segments.ai (2D+3D sync) |
| Budget-conscious | Label Studio + custom scripts |
- /references/sam4d-architecture.md - Deep dive on SAM4D UMPE and data engine
- /references/tool-comparison-matrix.md - Detailed feature comparison of all tools
- /references/hybrid-architecture-examples.md - VOLTRON, DrivePI implementation patterns
- /references/vertical-training-recipes.md - Infrastructure, AV, agriculture specifics