skills/personal/jetson-deploy/SKILL.md
Deploy and optimize applications on Jetson Orin Nano with TensorRT. Use when setting up Jetson environments, converting models to TensorRT, managing power modes, and containerizing edge AI applications. Do NOT use when the target hardware is not a Jetson device; Do NOT use when deploying to Raspberry Pi — use edge-cv-pipeline and sensor-integration instead.
"The future of AI is at the edge. Every robot, every camera, every sensor will have AI processing locally." — Dustin Franklin, NVIDIA Jetson AI Developer
This skill orchestrates the full lifecycle of deploying AI applications to NVIDIA Jetson Orin Nano devices. Every decision is constrained by thermal limits, power budgets, and memory ceilings that do not exist in cloud or desktop environments.
Non-Negotiable Constraints:
jetson-containers from dustynv to build reproducible deployment environments. Bare-metal installs create fragile, unreproducible setups.

| Principle | Description | Priority |
|-----------|-------------|----------|
| Power Mode Awareness | Select and validate power mode before benchmarking or deploying; results are meaningless without a fixed power profile | Critical |
| TensorRT First | Convert all inference models to TensorRT engines before deployment; never ship raw ONNX or PyTorch models to production | Critical |
| JetPack Compatibility | Verify JetPack version, L4T version, and CUDA version before installing any package or building any container | Critical |
| Container Reproducibility | Use jetson-containers for all deployments; pin base images to specific L4T versions; never rely on bare-metal installs | High |
| Thermal Management | Profile thermal behavior under sustained load; set power mode and fan policy before benchmarking; monitor with tegrastats | High |
| Memory Budget Discipline | The Orin Nano has 8GB unified memory shared between CPU and GPU; account for OS overhead (~1.5GB), display server, and framework footprint | High |
| On-Device Validation | Never trust desktop or cloud benchmarks; always validate latency, throughput, and accuracy on the target Jetson device | High |
| Precision-Accuracy Tradeoff | FP16 is the default for Orin Nano; INT8 requires calibration data and accuracy validation; never assume precision reduction is lossless | Medium |
| Incremental Deployment | Deploy one component at a time; validate each stage before adding the next pipeline element | Medium |
| Telemetry from Day One | Instrument with tegrastats and jtop from the first deployment; do not wait for production to add monitoring | Medium |
| Query | When to Call |
|-------|--------------|
| search_knowledge("TensorRT FP16 INT8 quantization Jetson") | During CONVERT/OPTIMIZE — selecting precision and quantization strategy |
| search_knowledge("Jetson JetPack CUDA cuDNN compatibility") | During SETUP — verifying version compatibility before any installation |
| search_knowledge("Docker container NVIDIA GPU runtime") | During CONTAINERIZE — configuring nvidia-docker runtime |
| search_knowledge("TensorRT ONNX model conversion trtexec") | During CONVERT — converting ONNX models to TensorRT engines |
| search_knowledge("Jetson power mode thermal monitoring tegrastats") | During BENCHMARK — measuring thermal behavior and power draw |
| search_code_examples("TensorRT Python inference engine") | Before writing inference code — find TensorRT Python API patterns |
| search_code_examples("Docker Compose systemd service autostart") | During DEPLOY — configuring auto-start and restart policies |
Search edge_ai and robotics collections for Jetson and TensorRT guidance. Search automation for containerization and fleet deployment context.
The deployment lifecycle flows: SETUP → CONTAINERIZE → CONVERT → OPTIMIZE → BENCHMARK → DEPLOY. Iterate between OPTIMIZE and BENCHMARK until performance targets and thermal stability are met.
Verify before beginning any deployment step:
- `cat /etc/nv_tegra_release`
- `dpkg -l nvidia-l4t-core`
- `nvcc --version`
- `dpkg -l tensorrt`
- `df -h`
- `docker info | grep -i runtime`
- `sudo nvpmodel -q`
- `sudo jetson_clocks --show`

If ANY item is unchecked, STOP. Resolve before proceeding.
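The checklist commands can be run in one pass. The following is a minimal sketch, not part of the original skill: the dictionary labels, `run_shell`, and `preflight` are hypothetical names, and a zero exit code is assumed to be enough to count a check as passed (it does not inspect the output).

```python
import subprocess
from typing import Callable, Dict, List

# Commands copied from the checklist; the labels on the left are invented
# for this sketch and do not come from the original document.
PREFLIGHT_COMMANDS: Dict[str, str] = {
    "l4t_release": "cat /etc/nv_tegra_release",
    "l4t_core_package": "dpkg -l nvidia-l4t-core",
    "cuda_compiler": "nvcc --version",
    "tensorrt_package": "dpkg -l tensorrt",
    "disk_space": "df -h",
    "docker_runtime": "docker info | grep -i runtime",
    "power_mode": "sudo nvpmodel -q",
    "clock_state": "sudo jetson_clocks --show",
}


def run_shell(command: str) -> bool:
    """Run one command through the shell; True means it exited with code 0."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.returncode == 0


def preflight(runner: Callable[[str], bool] = run_shell) -> List[str]:
    """Return the labels of all failed checks; an empty list means proceed."""
    return [label for label, cmd in PREFLIGHT_COMMANDS.items() if not runner(cmd)]
```

On the device, calling `preflight()` and stopping whenever the returned list is non-empty mirrors the STOP rule. The `sudo` commands may prompt for a password unless the session is already authorized.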
Confirm the Jetson device is properly configured for deployment.
- `cat /etc/nv_tegra_release`: confirm L4T version
- `sudo nvpmodel -q`: check current power mode
- `sudo nvpmodel -m <MODE>`: set target power mode (0=MAXN, 1=15W, 2=7W for Orin Nano; mode IDs differ between JetPack releases, so confirm them against `/etc/nvpmodel.conf` on the device)
- `sudo jetson_clocks`: lock clock frequencies for consistent benchmarking
- `sudo pip3 install jetson-stats`: install jtop
- `docker run --rm --runtime nvidia --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi`: verify Docker nvidia runtime

Exit Criteria: JetPack version documented, power mode confirmed, Docker nvidia runtime functional, jtop installed and running.
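To document the L4T version programmatically, the release file can be parsed. This sketch assumes the first line of `/etc/nv_tegra_release` looks like `# R36 (release), REVISION: 3.0, ...`; that format and the function name `parse_l4t_version` are assumptions, so check the file on your own device before relying on it.

```python
import re
from typing import Optional, Tuple

# Assumed line format (verify on the target device):
#   "# R36 (release), REVISION: 3.0, GCID: ..., BOARD: generic, ..."
_RELEASE_PATTERN = re.compile(r"R(\d+)\s+\(release\),\s+REVISION:\s+([\d.]+)")


def parse_l4t_version(release_text: str) -> Optional[Tuple[int, str]]:
    """Return (major, revision), e.g. (36, "3.0"), or None if no match."""
    match = _RELEASE_PATTERN.search(release_text)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)
```

A typical call would pass the file contents, for example `parse_l4t_version(open("/etc/nv_tegra_release").read())`, and record the result in the deployment state.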
Build a reproducible container environment using jetson-containers.
- `git clone https://github.com/dusty-nv/jetson-containers`
- `jetson-containers build` or `docker build`
- `python3 -c "import tensorrt; print(tensorrt.__version__)"`: verify GPU access inside container

Exit Criteria: Container runs with `--runtime nvidia`, GPU accessible inside container, model files accessible via volume mount.
Convert model from training format to TensorRT engine.
- `python3 -c "import onnx; model = onnx.load('model.onnx'); onnx.checker.check_model(model)"`: validate ONNX
- `trtexec` or Python API; specify FP16 precision (default for Orin Nano)

Exit Criteria: `.engine` or `.trt` file created, loads without errors, output shape matches expected dimensions.
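One way to keep the conversion step repeatable is to assemble the `trtexec` invocation in code. The helper below is a sketch: `build_trtexec_args` is a hypothetical name, and it assumes `trtexec` is on the `PATH` inside the container (on some images it lives under the TensorRT install directory instead).

```python
from typing import List, Optional


def build_trtexec_args(
    onnx_path: str,
    engine_path: str,
    precision: str = "fp16",
    workspace_mib: Optional[int] = None,
) -> List[str]:
    """Build an argument list for converting an ONNX model with trtexec."""
    args = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if precision == "fp16":
        args.append("--fp16")
    elif precision == "int8":
        # INT8 still needs calibration data and accuracy validation.
        args.append("--int8")
    elif precision != "fp32":
        raise ValueError(f"unsupported precision: {precision}")
    if workspace_mib is not None:
        args.append(f"--memPoolSize=workspace:{workspace_mib}MiB")
    return args
```

Run the result on the target Jetson itself, for example with `subprocess.run(build_trtexec_args("model.onnx", "model.engine"), check=True)`, so the engine is built for the device that will load it.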
Tune the TensorRT engine and pipeline for target performance.
- `trtexec --loadEngine=model.engine --iterations=100 --avgRuns=50`: profile baseline
- `--memPoolSize=workspace:1024MiB`

Exit Criteria: Latency meets target at specified power mode, memory leaves headroom for OS and other processes, no thermal throttling under sustained load.
Produce reliable, reproducible performance measurements.
- `sudo nvpmodel -m <MODE>`: set power mode explicitly
- `sudo jetson_clocks`: lock clocks
- `tegrastats --interval 1000`: monitor GPU utilization, memory usage, temperature, power draw during benchmark

Exit Criteria: Benchmark results documented with power mode and clock state, latency distribution captured (not just mean), no throttling during measurement, accuracy validated against golden reference.
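Capturing the latency distribution rather than only the mean can be done with the standard library. This is a sketch under the assumption that per-inference latencies have already been collected in milliseconds; `latency_summary` is a hypothetical helper name.

```python
import statistics
from typing import Dict, Sequence


def latency_summary(latencies_ms: Sequence[float]) -> Dict[str, float]:
    """Summarize latencies as mean, P50, P95, and P99 (all in milliseconds)."""
    if len(latencies_ms) < 2:
        raise ValueError("need at least two latency samples")
    # quantiles(n=100) returns 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(latencies_ms),
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
    }
```

The returned values map directly onto the P50, P95, and P99 fields of the deployment report. Record the power mode and clock state alongside them so the numbers stay reproducible.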
Finalize the deployment for production operation.
Exit Criteria: Application starts automatically on boot, restart policy handles crashes gracefully, monitoring active, end-to-end pipeline validated with real data.
<jetson-deploy-state>
step: [SETUP | CONTAINERIZE | CONVERT | OPTIMIZE | BENCHMARK | DEPLOY]
jetpack_version: [e.g., "6.0", "5.1.2"]
power_mode: [MAXN | 15W | 7W]
inference_engine: [tensorrt | onnxruntime | tflite]
last_action: [what was just done]
next_action: [what should happen next]
blockers: [any issues]
</jetson-deploy-state>
Example: step: CONVERT | jetpack_version: 6.0 | power_mode: 15W | last_action: Exported YOLOv8n to ONNX | next_action: Convert ONNX to TensorRT FP16 engine | blockers: none
## Jetson Deployment Report: [Model Name]
**Device**: Jetson Orin Nano 8GB | **JetPack**: [N] | **Power Mode**: [mode]
**Precision**: [FP16/INT8] | **Mean Latency**: [ms] | **Throughput**: [fps]
**GPU Util**: [%] | **Memory**: [MB/8192 MB] | **Peak Temp**: [C] | **Power**: [W]
**Benchmark**: P50=[ms] P95=[ms] P99=[ms] over N=[iterations] | Accuracy: [mAP/acc]
**FP32 vs FP16 vs INT8**: [latency and accuracy comparison]
Full templates (Deployment Report, Benchmark Results with latency distribution and precision comparison): references/edge-profiling.md
Always verify JetPack version before any installation, container build, or model conversion. Run cat /etc/nv_tegra_release and document the L4T version before proceeding. Mixing packages from different JetPack versions corrupts the system — there is no recovery short of reflashing the device.
Never deploy raw ONNX or PyTorch models to production. TensorRT engines deliver 2–10x better performance on Jetson. TensorRT engines are architecture-specific: an engine built on x86 will NOT run on ARM, and an engine built on one JetPack version may not run on another. Always build engines on the target Jetson device itself.
Profile thermal behavior for at least 5 minutes of sustained load before declaring production-ready. The Orin Nano thermal-throttles at approximately 85–90°C. Passive cooling is insufficient for sustained MAXN workloads. If temperatures exceed 80°C during benchmarking, adjust power mode, add cooling, or reduce model complexity.
All production deployments must run inside containers with pinned L4T base images. Never pip install application dependencies on the Jetson host system; this creates version conflicts with JetPack system packages that break CUDA and TensorRT. The one host-level exception in this workflow is jetson-stats (jtop), installed during SETUP. Use jetson-containers build, which handles JetPack compatibility automatically.
| Anti-Pattern | Why It's Wrong | Correct Approach |
|--------------|----------------|------------------|
| Benchmarking on desktop GPU | Results are meaningless for edge deployment; different architecture, memory, and power | Always benchmark on the target Jetson device |
| Skipping TensorRT conversion | 2-10x performance left on the table; latency targets will not be met | Convert all production models to TensorRT engines |
| Building engine on x86 | TensorRT engines are architecture-specific; x86 engines do not run on ARM | Build engines on the Jetson device itself |
| Ignoring power mode during benchmark | Results are not reproducible; different runs use different power profiles | Set power mode explicitly before every benchmark |
| Installing pip packages bare-metal | Creates version conflicts with JetPack system packages; breaks CUDA/TensorRT | Use containers; never pip install on the host system |
| Using FP32 without trying FP16 | Orin Nano has dedicated FP16 tensor cores; FP32 wastes half the compute capability | Default to FP16; only use FP32 if accuracy requires it |
| Deploying without thermal profiling | Device throttles or shuts down under sustained load in production | Run sustained load test with tegrastats for 10+ minutes |
| Hardcoding paths in containers | Breaks when deploying to different devices or updating models | Use volume mounts for models, data, and configuration |
CUDA version mismatch ("no kernel image is available" or "CUDA driver version insufficient"): Check JetPack version (cat /etc/nv_tegra_release) and CUDA version (nvcc --version). Reinstall the mismatched package for your specific JetPack/CUDA version. If using a container, verify the base image L4T tag matches the device. Migrate to containers if running bare-metal.
Out of memory on model load ("CUDA out of memory"): Check usage with free -h and tegrastats. Kill unnecessary processes (desktop environment uses ~800MB). Reduce TensorRT workspace: --memPoolSize=workspace:512MiB. Switch FP32→FP16 to halve weight memory. Reduce batch size to 1. Consider sudo systemctl set-default multi-user.target to disable the GUI (~800MB freed).
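The memory arithmetic behind these steps can be made explicit. The sketch below assumes the ~1.5GB OS overhead and ~800MB desktop figures quoted in this document are additive (the text does not say whether they overlap); `memory_headroom_mb` and the engine size in the usage note are hypothetical.

```python
def memory_headroom_mb(
    engine_mb: float,
    workspace_mb: float,
    *,
    os_overhead_mb: float = 1536.0,  # ~1.5GB OS overhead quoted in this document
    desktop_mb: float = 800.0,       # ~800MB desktop environment; 0 when headless
    total_mb: float = 8192.0,        # Orin Nano 8GB unified memory
) -> float:
    """Estimate the memory left after the OS, desktop, engine, and workspace."""
    reserved = os_overhead_mb + desktop_mb + engine_mb + workspace_mb
    return total_mb - reserved
```

For a hypothetical 200 MB engine with a 1024 MiB workspace, this estimate leaves 4632 MB with the desktop running and 5432 MB after switching to `multi-user.target`. Treat the output as a rough planning figure and confirm actual usage with `free -h` and `tegrastats`.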
Thermal throttling (performance degrades after minutes, temperature >80°C): Lower power mode (sudo nvpmodel -m 1), add active cooling, or reduce model complexity. Poll tegrastats inside the application and throttle workload before the hardware does it for you. Re-benchmark at the sustainable power mode.
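Polling tegrastats from the application might look like the sketch below. It assumes temperature readings appear in the output as `name@valueC` tokens (for example `tj@82.5C`); that token format, the sample line in the usage note, and the helper names are assumptions to verify against the output on your device.

```python
import re
from typing import Dict

# Matches tokens such as "cpu@45.5C" or "tj@82C" (assumed tegrastats format).
_TEMP_PATTERN = re.compile(r"(\w+)@(-?\d+(?:\.\d+)?)C\b")


def parse_temperatures(line: str) -> Dict[str, float]:
    """Extract sensor temperatures in degrees Celsius from one tegrastats line."""
    return {name: float(value) for name, value in _TEMP_PATTERN.findall(line)}


def should_throttle(line: str, limit_c: float = 80.0) -> bool:
    """True when any parsed sensor exceeds the limit (80 C default, per the text)."""
    temps = parse_temperatures(line)
    return bool(temps) and max(temps.values()) > limit_c
```

Feeding each line of `tegrastats --interval 1000` into `should_throttle` lets the application reduce its frame rate or batch rate before the hardware throttles on its own.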
Container build failures (package conflicts or missing dependencies): Verify device JetPack version and confirm the Dockerfile FROM line uses the matching L4T tag (JetPack 6.x → L4T r36.x, JetPack 5.x → L4T r35.x). Use jetson-containers build which handles compatibility automatically. Pin all package versions explicitly in custom Dockerfiles.
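The JetPack-to-L4T mapping stated above (JetPack 6.x to r36.x, JetPack 5.x to r35.x) can be checked before a build. This is a sketch: `base_image_matches` is a hypothetical helper, it assumes the L4T release appears in the image tag as `rNN.x`, and the image references in the usage note are illustrative only.

```python
import re

# Mapping taken from the text: JetPack 6.x -> L4T r36.x, JetPack 5.x -> L4T r35.x.
JETPACK_TO_L4T_MAJOR = {6: 36, 5: 35}


def base_image_matches(jetpack_version: str, image_ref: str) -> bool:
    """Check whether an image tag carries the L4T major for this JetPack."""
    jetpack_major = int(jetpack_version.split(".")[0])
    if jetpack_major not in JETPACK_TO_L4T_MAJOR:
        raise ValueError(f"no L4T mapping known for JetPack {jetpack_version}")
    # Text after the last ":" is treated as the tag.
    tag = image_ref.rsplit(":", 1)[-1]
    match = re.search(r"r(\d+)\.\d+", tag)
    return match is not None and int(match.group(1)) == JETPACK_TO_L4T_MAJOR[jetpack_major]
```

For example, `base_image_matches("6.0", "nvcr.io/nvidia/l4t-base:r36.2.0")` returns `True`, while the same image checked against JetPack `"5.1.2"` returns `False`, which flags the mismatch before a long build fails.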
TensorRT engine build failures ("Unsupported ONNX opset" or "Layer not supported"): Check TensorRT version (dpkg -l tensorrt) and ONNX opset (python3 -c "import onnx; print(onnx.load('model.onnx').opset_import)"). Try python3 -m onnxsim model.onnx model_simplified.onnx. Use ONNX-GraphSurgeon to replace unsupported operations. Try an older opset when exporting from PyTorch.
- edge-cv-pipeline to build the complete vision pipeline (camera capture, preprocessing, inference, postprocessing, output). Jetson deployment handles infrastructure; CV pipeline handles application logic.
- sensor-integration for device configuration and data acquisition. Mount device files (/dev/video0, /dev/i2c-*, /dev/spidev*) into containers as needed.