michaelalber/model-optimization

Name: model-optimization
Author: michaelalber

skills/professional/model-optimization/SKILL.md

npx skillsauth add michaelalber/ai-toolkit model-optimization

Model Optimization for Edge Deployment

"Quantization is not about making models worse. It is about finding the representation that preserves what matters while discarding what does not." -- adapted from Benoit Jacob, Google Quantization Team

Core Philosophy

This skill covers the complete model optimization pipeline: profiling baseline performance, applying quantization and pruning, converting between inference formats, and benchmarking the results. Every optimization decision is driven by measurement, not intuition — optimize for speed subject to an accuracy floor, never the other way around.

Non-Negotiable Constraints:

BASELINE FIRST — measure the original (latency, accuracy, size, memory) before touching it; without a baseline you cannot quantify improvement or regression.
ACCURACY IS THE CONSTRAINT, LATENCY THE OBJECTIVE — optimize speed subject to an accuracy floor, never the reverse.
ONE CHANGE AT A TIME — apply optimizations sequentially, benchmark after each; compound changes hide regressions.
FORMAT FOLLOWS HARDWARE — TensorRT for Jetson, TFLite for Raspberry Pi, ONNX Runtime for general CPU; never deploy the wrong format.
PRESERVE THE ORIGINAL — never modify or delete the source model; all outputs are new files.

Full principle table, KB lookups, pre-flight checklist, decision trees, discipline rules, anti-patterns, and error recovery live in references/conventions.md.
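The FORMAT FOLLOWS HARDWARE rule can be read as a simple lookup. The sketch below is one possible encoding, assuming the `target_device` values from the state block; the table name `TARGET_FORMAT` and the helper `pick_format` are hypothetical and not part of the skill. The authoritative decision trees remain in references/conventions.md.

```python
# Hypothetical lookup table derived from the FORMAT FOLLOWS HARDWARE rule:
# TensorRT for Jetson, TFLite for Raspberry Pi, ONNX Runtime for general CPU.
TARGET_FORMAT = {
    "jetson-orin-nano": "tensorrt",
    "raspberry-pi-5": "tflite",
    "raspberry-pi-4": "tflite",
    "cpu-generic": "onnx",
}


def pick_format(target_device: str) -> str:
    """Return the inference format for a device, failing loudly on unknowns."""
    try:
        return TARGET_FORMAT[target_device]
    except KeyError:
        # Refuse to guess: an unknown device should never get a default format.
        raise ValueError(f"no format rule for target device: {target_device!r}") from None
```

Raising on an unknown device, rather than falling back to a default, is a design choice that keeps the "never deploy the wrong format" constraint explicit.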

Workflow

            PROFILE → OPTIMIZE → BENCHMARK → VALIDATE → PACKAGE
            baseline  quantize/  speedup +   accuracy   deploy-ready
            metrics   convert    compression within tol  artifact

PROFILE     Run the pre-flight checklist (conventions.md). Measure baseline latency (100+ iters +
            warmup), accuracy on the test set, size, and memory. Record all in the state block.

OPTIMIZE    Pick the path from the quantization/pruning decision trees (conventions.md) by target
            device. Apply ONE optimization at a time. (Strategy: quantization-workflows.md;
            conversions: conversion-pipelines.md.) Validate preprocessing compatibility after each conversion.

BENCHMARK   Measure on target hardware when available (set power mode, lock clocks, 5+ min sustained
            for thermal throttling). Report P50/P95/P99, not just mean. Label host-only runs as estimates.

VALIDATE    Compare accuracy against the floor. If outside tolerance → STOP, report exact numbers,
            present alternatives, let the user decide. Never proceed silently past a violation.

PACKAGE     Emit the deployment artifact with benchmark report, preprocessing config, and provenance.
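The PROFILE and BENCHMARK steps both call for timed runs after a warmup and for percentile reporting. A minimal sketch of such a harness follows, using only the Python standard library. The function name `benchmark_latency`, the `run_once` callable, and the default of 10 warmup runs are assumptions for illustration; the skill only specifies "100+ iters + warmup". It does not set power modes or lock clocks, so numbers from a host machine are estimates.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark_latency(run_once: Callable[[], object],
                      warmup: int = 10,
                      iters: int = 100) -> Dict[str, float]:
    """Time run_once() and return latency statistics in milliseconds."""
    for _ in range(warmup):          # warmup runs are executed but not timed
        run_once()

    samples_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    # quantiles(n=100) returns the 99 cut points P1..P99.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(samples_ms),
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
    }
```

In use, `run_once` would wrap a single inference call on a fixed input, for example `benchmark_latency(lambda: session_run(sample))`, where `session_run` and `sample` stand in for whatever runtime and input the model uses.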

Exit criteria: baseline measured and recorded; optimizations applied one at a time and benchmarked; accuracy within the stated tolerance (or the tradeoff explicitly accepted by the user); deployment artifact packaged with metadata. The original model is untouched.
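The VALIDATE gate can be sketched as a small check against the accuracy floor. This is an illustration under stated assumptions, not the skill's own implementation: accuracies are taken to be percentages (0 to 100), and `accuracy_tolerance` is interpreted as an absolute drop in percentage points. The source does not say whether the tolerance is absolute or relative, so adjust if the project defines it differently. The names `ValidationResult` and `validate_accuracy` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationResult:
    passed: bool   # True when optimized accuracy is at or above the floor
    floor: float   # lowest acceptable accuracy
    drop: float    # baseline minus optimized accuracy


def validate_accuracy(baseline: float, optimized: float,
                      tolerance_pct: str) -> ValidationResult:
    """Compare optimized accuracy with the floor implied by the tolerance.

    Assumption: the tolerance string (e.g. "2%") is an absolute drop in
    percentage points, so "2%" allows 91.0 to fall to 89.0.
    """
    tolerance = float(tolerance_pct.strip().rstrip("%"))
    floor = baseline - tolerance
    drop = baseline - optimized
    return ValidationResult(passed=optimized >= floor, floor=floor, drop=drop)
```

When `passed` is false, the workflow says to stop and report exact numbers, so a caller would surface `floor` and `drop` to the user instead of continuing to PACKAGE.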

State Block

<model-opt-state>
phase: PROFILE | OPTIMIZE | BENCHMARK | VALIDATE | PACKAGE
model_name: [name]
source_format: pytorch | tensorflow | onnx | tflite | tensorrt
target_device: jetson-orin-nano | raspberry-pi-5 | raspberry-pi-4 | cpu-generic
baseline_latency_ms: [number or "unmeasured"]
baseline_accuracy: [number or "unmeasured"]
accuracy_tolerance: [percentage, e.g., "2%"]
optimizations_applied: [comma-separated list or "none"]
current_best_latency_ms: [number or "unmeasured"]
current_best_accuracy: [number or "unmeasured"]
original_model_path: [absolute path to original model file]
last_action: [what was just done]
next_action: [what should happen next]
blockers: [any issues]
</model-opt-state>
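Because the state block is plain `key: value` text inside a tag pair, it can be read back mechanically. The sketch below shows one way to do that with the standard library; `parse_state` and `STATE_RE` are hypothetical names, and the skill does not prescribe any parser.

```python
import re
from typing import Dict

# Capture everything between the opening and closing state tags.
STATE_RE = re.compile(
    r"<model-opt-state>\s*\n(.*?)\n\s*</model-opt-state>", re.DOTALL
)


def parse_state(text: str) -> Dict[str, str]:
    """Extract key: value pairs from the first <model-opt-state> block."""
    match = STATE_RE.search(text)
    if match is None:
        raise ValueError("no <model-opt-state> block found")
    state: Dict[str, str] = {}
    for line in match.group(1).splitlines():
        if not line.strip():
            continue
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed state line: {line!r}")
        state[key.strip()] = value.strip()
    return state
```

Values are kept as strings on purpose, since fields such as `baseline_latency_ms` may hold either a number or the literal "unmeasured".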

Output Template

Optimization summary report, tradeoff table — references/output-templates.md.
INT8/FP16 strategy, PTQ vs QAT, calibration requirements, per-layer sensitivity — references/quantization-workflows.md.
PyTorch→ONNX→TensorRT, TF→TFLite, ONNX Runtime optimization, dynamic batching — references/conversion-pipelines.md.
Principle table, KB lookups, pre-flight, decision trees, discipline rules, anti-patterns, error recovery — references/conventions.md.

Integration with Other Skills

| Skill | Relationship |
|-------|-------------|
| edge-cv-pipeline | After optimizing, build the full inference pipeline (camera capture, pre/postprocessing, result publishing). The optimized model becomes the CV pipeline's inference engine. |
| jetson-deploy | After optimizing for Jetson, containerize the deployment, build the TensorRT engine on-device, configure power modes, and set up tegrastats/jtop monitoring. |

Optimize ML models for edge deployment through quantization, pruning, format conversion (TensorRT/TFLite/ONNX), and accuracy/latency benchmarking. Use when preparing models for resource-constrained devices.

Category: development
Updated: Jun 4, 2026

npx skillsauth add michaelalber/ai-toolkit model-optimization

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Status | Scanner | Description | Percentage (as listed) |
|--------|---------|-------------|------------------------|
| Clean | Trivy | Container and dependency vulnerability scanner | 95% |
| Clean | Semgrep | Static code analysis for vulnerabilities | 95% |
| Clean | mcp-scan (Snyk) | Model Context Protocol security validation | 95% |
| Skipped | Snyk (dep) | Open source security scanning | 50% |
| Skipped | Socket.dev | Supply chain security analysis | 50% |
| Skipped | VirusTotal | Multi-engine malware detection | 50% |
| Skipped | CrowdStrike | Advanced threat intelligence | 50% |
| Skipped | OSV-Scanner | Open Source Vulnerability database check | 50% |
| Skipped | OWASP Dep-Check | | 50% |

Last scanned: Jun 4, 2026, 8:40 AM (140.0s, 5 files scanned)

SKILL.md

name: model-optimization
audience: professional
description: Optimize ML models for edge deployment through quantization, pruning, format conversion (TensorRT/TFLite/ONNX), and accuracy/latency benchmarking. Use when preparing models for resource-constrained devices.

Install manually

# Clone the repo
git clone https://github.com/michaelalber/ai-toolkit.git

# Create the global Claude Code skills folder if it does not exist, then copy the skill into it
mkdir -p ~/.claude/skills
cp -r ai-toolkit/skills/professional/model-optimization ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository: michaelalber/ai-toolkit

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT