skills/professional/model-optimization/SKILL.md
Optimize ML models for edge deployment through quantization, pruning, format conversion (TensorRT/TFLite/ONNX), and accuracy/latency benchmarking. Use when preparing models for resource-constrained devices.
npx skillsauth add michaelalber/ai-toolkit model-optimization

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
"Quantization is not about making models worse. It is about finding the representation that preserves what matters while discarding what does not." -- adapted from Benoit Jacob, Google Quantization Team
This skill covers the complete model optimization pipeline: profiling baseline performance, applying quantization and pruning, converting between inference formats, and benchmarking the results. Every optimization decision is driven by measurement, not intuition — optimize for speed subject to an accuracy floor, never the other way around.
Non-Negotiable Constraints:
Full principle table, KB lookups, pre-flight checklist, decision trees, discipline rules,
anti-patterns, and error recovery live in references/conventions.md.
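PROFILE → OPTIMIZE → BENCHMARK → VALIDATE → PACKAGE

| Phase | Outcome |
|-------|---------|
| PROFILE | baseline metrics |
| OPTIMIZE | quantize / convert |
| BENCHMARK | speedup + compression |
| VALIDATE | accuracy within tolerance |
| PACKAGE | deploy-ready artifact |
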
PROFILE: Run the pre-flight checklist (conventions.md). Measure baseline latency (100+ iterations after warmup), accuracy on the test set, model size, and memory. Record all of them in the state block.

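
A baseline latency measurement of the kind PROFILE describes could be sketched as follows. This is a minimal illustration, not the skill's own harness: `measure_latency_ms` and `fake_inference` are hypothetical names, the warmup count of 10 is an assumption, and the stand-in workload should be replaced with a real model call.

```python
import time
from statistics import mean


def measure_latency_ms(run_inference, warmup=10, iters=100):
    """Return per-call latencies in milliseconds for a zero-argument callable."""
    for _ in range(warmup):
        run_inference()  # warmup calls are executed but not timed
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        run_inference()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def fake_inference():
    """Hypothetical stand-in workload; replace with a real model call."""
    return sum(i * i for i in range(10_000))


samples = measure_latency_ms(fake_inference, warmup=10, iters=100)
baseline_latency_ms = mean(samples)
print(f"baseline_latency_ms: {baseline_latency_ms:.3f} over {len(samples)} runs")
```

Returning the raw samples rather than only the mean keeps the data available for the percentile reporting that BENCHMARK asks for.
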
OPTIMIZE: Pick the path from the quantization/pruning decision trees (conventions.md) by target device. Apply ONE optimization at a time. (Strategy: quantization-workflows.md; conversions: conversion-pipelines.md.) Validate preprocessing compatibility after each conversion.

BENCHMARK: Measure on target hardware when available (set the power mode, lock clocks, and run 5+ minutes sustained to expose thermal throttling). Report P50/P95/P99, not just the mean. Label host-only runs as estimates.

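
One way to produce the P50/P95/P99 summary is with the standard library's `statistics.quantiles`. The sketch below assumes latencies have already been collected in milliseconds; `latency_report` and its dictionary keys are hypothetical names, and the sample data is synthetic, used only to show the call.

```python
from statistics import quantiles


def latency_report(samples_ms, on_target_hardware):
    """Summarize latency samples as percentiles plus a provenance label."""
    # n=100 yields 99 cut points; cuts[k - 1] is the k-th percentile.
    cuts = quantiles(samples_ms, n=100, method="inclusive")
    return {
        "p50_ms": cuts[49],
        "p95_ms": cuts[94],
        "p99_ms": cuts[98],
        "mean_ms": sum(samples_ms) / len(samples_ms),
        # Host-only runs are labeled as estimates, per the BENCHMARK rule.
        "label": "measured" if on_target_hardware else "estimate (host-only)",
    }


# Synthetic samples for illustration only, not real measurements.
synthetic_ms = [float(x) for x in range(1, 102)]
report = latency_report(synthetic_ms, on_target_hardware=False)
print(report)
```

The `inclusive` method treats the samples as the full observed range, which is a judgment call; the default `exclusive` method extrapolates slightly beyond the extremes and gives a somewhat higher P99 on small sample counts.
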
VALIDATE: Compare accuracy against the floor. If it falls outside tolerance, STOP, report exact numbers, present alternatives, and let the user decide. Never proceed silently past a violation.

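
The accuracy gate can be sketched as a small check that returns exact numbers instead of silently continuing. This sketch makes two assumptions that the skill text does not settle: accuracies are fractions in [0, 1], and a tolerance such as "2%" means an absolute drop of 2 percentage points (not a relative 2%). `validate_accuracy` and `ValidationResult` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ValidationResult:
    passed: bool
    drop_points: float  # accuracy drop in percentage points
    message: str


def validate_accuracy(baseline, candidate, tolerance="2%"):
    """Compare candidate accuracy to the baseline under an absolute tolerance."""
    # Assumption: "2%" is read as 2 percentage points of absolute drop.
    tol_points = float(tolerance.rstrip("%"))
    drop_points = (baseline - candidate) * 100.0
    if drop_points <= tol_points:
        msg = f"OK: drop of {drop_points:.2f} points is within {tol_points:.2f}"
        return ValidationResult(True, drop_points, msg)
    msg = (
        f"STOP: accuracy fell from {baseline:.4f} to {candidate:.4f} "
        f"({drop_points:.2f} points, tolerance {tol_points:.2f}); "
        "user decision required"
    )
    return ValidationResult(False, drop_points, msg)


# Illustrative numbers only.
print(validate_accuracy(0.912, 0.897).message)
print(validate_accuracy(0.912, 0.885).message)
```

Returning a result object rather than raising leaves the decision with the caller, which matches the rule that the user decides whether to accept the tradeoff.
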
PACKAGE: Emit the deployment artifact with benchmark report, preprocessing config, and provenance.

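
A packaging manifest along these lines might look like the sketch below. The field names and the use of SHA-256 digests for provenance are assumptions for illustration; the skill's actual templates live in references/output-templates.md and may differ. `sha256_of` and `build_manifest` are hypothetical helpers.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large model files are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(original_model, optimized_model, benchmark, preprocessing,
                   optimizations_applied):
    """Collect artifact metadata; reads both files but modifies neither."""
    return {
        "artifact": Path(optimized_model).name,
        "artifact_sha256": sha256_of(optimized_model),
        "provenance": {
            "original_model_path": str(Path(original_model).resolve()),
            "original_sha256": sha256_of(original_model),
            "optimizations_applied": list(optimizations_applied),
        },
        "benchmark": benchmark,
        "preprocessing": preprocessing,
    }


def write_manifest(manifest, destination):
    """Write the manifest as JSON next to the deployment artifact."""
    Path(destination).write_text(json.dumps(manifest, indent=2))
```

Recording the original model's digest gives a cheap way to confirm later that the original file was left untouched.
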
Exit criteria: baseline measured and recorded; optimizations applied one at a time and benchmarked; accuracy within the stated tolerance (or the tradeoff explicitly accepted by the user); deployment artifact packaged with metadata. The original model is untouched.
<model-opt-state>
phase: PROFILE | OPTIMIZE | BENCHMARK | VALIDATE | PACKAGE
model_name: [name]
source_format: pytorch | tensorflow | onnx | tflite | tensorrt
target_device: jetson-orin-nano | raspberry-pi-5 | raspberry-pi-4 | cpu-generic
baseline_latency_ms: [number or "unmeasured"]
baseline_accuracy: [number or "unmeasured"]
accuracy_tolerance: [percentage, e.g., "2%"]
optimizations_applied: [comma-separated list or "none"]
current_best_latency_ms: [number or "unmeasured"]
current_best_accuracy: [number or "unmeasured"]
original_model_path: [absolute path to original model file]
last_action: [what was just done]
next_action: [what should happen next]
blockers: [any issues]
</model-opt-state>
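
For tooling that needs to emit this block consistently, a renderer could be sketched as below. The field order and the `phase` values come from the template above; the defaults ("unmeasured" for metrics, "none" for `optimizations_applied` and `blockers`) and the function name `render_state` are assumptions.

```python
PHASES = ("PROFILE", "OPTIMIZE", "BENCHMARK", "VALIDATE", "PACKAGE")

STATE_FIELDS = (
    "phase", "model_name", "source_format", "target_device",
    "baseline_latency_ms", "baseline_accuracy", "accuracy_tolerance",
    "optimizations_applied", "current_best_latency_ms",
    "current_best_accuracy", "original_model_path",
    "last_action", "next_action", "blockers",
)

# Assumed defaults for fields that may be unknown early in the workflow.
DEFAULTS = {
    "baseline_latency_ms": "unmeasured",
    "baseline_accuracy": "unmeasured",
    "current_best_latency_ms": "unmeasured",
    "current_best_accuracy": "unmeasured",
    "optimizations_applied": "none",
    "blockers": "none",
}


def render_state(**values):
    """Render a <model-opt-state> block with fields in template order."""
    unknown = set(values) - set(STATE_FIELDS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    merged = {**DEFAULTS, **values}
    missing = [name for name in STATE_FIELDS if name not in merged]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if merged["phase"] not in PHASES:
        raise ValueError(f"invalid phase: {merged['phase']!r}")
    lines = ["<model-opt-state>"]
    lines += [f"{name}: {merged[name]}" for name in STATE_FIELDS]
    lines.append("</model-opt-state>")
    return "\n".join(lines)


print(render_state(
    phase="PROFILE",
    model_name="demo",                         # illustrative values only
    source_format="onnx",
    target_device="cpu-generic",
    accuracy_tolerance="2%",
    original_model_path="/models/demo.onnx",
    last_action="created state block",
    next_action="measure baseline latency",
))
```

Rejecting unknown or missing fields up front keeps the block from drifting away from the template as the workflow moves between phases.
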
- references/output-templates.md
- references/quantization-workflows.md
- references/conversion-pipelines.md
- references/conventions.md

| Skill | Relationship |
|-------|-------------|
| edge-cv-pipeline | After optimizing, build the full inference pipeline (camera capture, pre/postprocessing, result publishing). The optimized model becomes the CV pipeline's inference engine. |
| jetson-deploy | After optimizing for Jetson, containerize the deployment, build the TensorRT engine on-device, configure power modes, and set up tegrastats/jtop monitoring. |