skills/perf-ops/SKILL.md
Performance profiling and optimization orchestrator - diagnoses symptoms, dispatches skill-preloaded profiling agents, manages before/after comparisons. Triggers on: performance, profiling, flamegraph, pprof, py-spy, clinic.js, memray, heaptrack, bundle size, webpack analyzer, load testing, k6, artillery, vegeta, locust, benchmark, hyperfine, criterion, slow query, EXPLAIN ANALYZE, N+1, caching, optimization, latency, throughput, p99, memory leak, CPU spike, bottleneck.
npx skillsauth add 0xDarkMatter/claude-mods perf-opsInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Orchestrator for cross-language performance profiling and optimization. Classifies symptoms inline, dispatches profiling to general-purpose agents preloaded with the relevant language -ops skill (background), and manages optimization with confirmation.
User describes performance issue or requests profiling
|
+---> T1: Diagnose (inline, fast)
| +---> Classify symptom (decision tree)
| +---> Detect language/runtime from project
| +---> Check installed profiling tools
| +---> Determine production vs development
| +---> Gather system baseline (CPU/mem/disk)
| +---> Present: diagnosis + recommended profiling approach
|
+---> T2: Profile (dispatch general-purpose + skill preload, background)
| +---> Select skill preload from routing table
| +---> Build perf-focused dispatch prompt
| +---> Agent runs profiler, collects data, interprets results
| | +---> Fallback: tool commands inlined (no skill preload)
| +---> Returns: findings + bottleneck identification + suggestions
| |
| +---> [Optional parallel dispatch]:
| +---> CPU profiling agent ---+
| +---> Memory profiling agent --+--> Consolidate findings
| +---> Baseline benchmark ------+
|
+---> T3: Optimize (dispatch general-purpose + skill preload, foreground + confirm)
+---> Agent proposes specific code changes
+---> Preflight: what changes, expected impact, risks
+---> User confirms
+---> Apply changes
+---> Re-benchmark for before/after delta
No agent needed. Execute directly via Bash for instant results.
| Operation | Command / Method |
|-----------|-----------------|
| Detect Python profilers | which py-spy && which memray && which scalene |
| Detect Go profilers | which go && go tool pprof -h 2>/dev/null |
| Detect Rust profilers | which cargo-flamegraph && which samply |
| Detect Node profilers | which clinic && which 0x |
| Detect benchmarking tools | which hyperfine && which k6 && which vegeta |
| System CPU baseline | top -bn1 -o %CPU \| head -20 (Linux) or wmic cpu get loadpercentage (Win) |
| System memory baseline | free -h (Linux) or wmic OS get FreePhysicalMemory (Win) |
| Disk I/O check | iostat -x 1 3 (Linux) |
| Identify language | Check for package.json, go.mod, Cargo.toml, pyproject.toml, requirements.txt |
| Production vs dev | Ask user or detect from environment (NODE_ENV, FLASK_ENV, etc.) |
| Read existing profiles | Parse .prof, .svg, .bin files in project |
Production safety rule: In production environments, only recommend sampling profilers (py-spy, pprof HTTP endpoint, perf). Never suggest attaching debuggers, tracing profilers, or tools that require process restart.
Gather context from T1 diagnosis, then dispatch a general-purpose agent preloaded with the relevant language -ops skill plus the perf-ops references below.
Language Routing:
| Detected Language | Dispatch | Preload | Key Profiling Tools |
|-------------------|----------|---------|---------------------|
| Python (.py, pyproject.toml, requirements.txt) | general-purpose | relevant skills/python-*/SKILL.md + perf-ops references | py-spy, memray, scalene, tracemalloc |
| Go (go.mod, .go files) | general-purpose | skills/go-ops/SKILL.md + perf-ops references | pprof (CPU/heap/goroutine/mutex), benchstat |
| Rust (Cargo.toml, .rs files) | general-purpose | skills/rust-ops/SKILL.md + perf-ops references | cargo-flamegraph, samply, DHAT, criterion |
| TypeScript/JavaScript (backend, package.json + server) | general-purpose | skills/javascript-ops/SKILL.md + perf-ops references | clinic flame/doctor/bubbleprof, 0x |
| TypeScript/JavaScript (frontend, bundle issues) | general-purpose | skills/typescript-ops/SKILL.md + perf-ops references | webpack-bundle-analyzer, Lighthouse, source-map-explorer |
| SQL / PostgreSQL | general-purpose | skills/postgres-ops/SKILL.md + perf-ops references | EXPLAIN ANALYZE, pg_stat_statements, pgbench |
| General / unknown / CLI benchmarking | general-purpose | perf-ops references | hyperfine, perf, strace |
Dispatch template (T2):
You are handling a performance profiling task dispatched by the perf-ops orchestrator.
## Diagnosis (from T1)
- Symptom: {classified symptom from decision tree}
- Language/Runtime: {detected language}
- Environment: {production | development}
- Installed tools: {list from tool detection}
- System baseline: {CPU/memory/disk metrics}
## Profiling Task
{specific profiling request - e.g., "CPU profile the API server under load"}
## Target
- Process/file: {target application or endpoint}
- Expected workload: {how to generate representative load if needed}
## Domain Knowledge
Before starting, read the relevant profiling reference for this language:
- Read: skills/perf-ops/references/cpu-memory-profiling.md
For load testing tasks, also read:
- Read: skills/perf-ops/references/load-testing.md
For database profiling, also read:
- Read: skills/postgres-ops/SKILL.md (if PostgreSQL)
## Instructions
1. Run the appropriate profiler for this language and symptom
2. Collect sufficient samples (minimum 30 seconds for CPU, multiple snapshots for memory)
3. Interpret the results - identify the top 3-5 bottlenecks
4. For each bottleneck: explain what it is, why it's slow, and suggest a fix
5. Report findings in structured format with metrics
Execution mode:
| Scenario | Mode | Why |
|----------|------|-----|
| User waiting for results | run_in_background=False | They need findings before continuing |
| User continuing other work | run_in_background=True | Don't block the main session |
| Quick benchmark (hyperfine) | run_in_background=False | Fast enough to wait |
| Load test (k6, artillery) | run_in_background=True | Takes minutes |
Dispatch a general-purpose agent (preloaded per the language routing table) with explicit instruction to produce a preflight report before any code changes.
Dispatch template (T3 preflight):
You are handling a performance optimization dispatched by the perf-ops orchestrator.
## Profiling Results (from T2)
{bottleneck findings, metrics, flamegraph interpretation}
## Optimization Request
{specific optimization - e.g., "Fix the N+1 query in UserController.list"}
IMPORTANT: Do NOT apply changes yet. Produce a Preflight Report:
1. Exactly what code/config changes you will make
2. Expected performance improvement (with reasoning)
3. Risks (correctness, side effects, edge cases)
4. How to verify the improvement (specific benchmark or test)
5. How to revert if the optimization causes issues
After user confirms: Re-dispatch with execute authority plus the before/after protocol.
Dispatch template (T3 execute + before/after):
User confirmed the optimization. Proceed with execution.
## Approved Changes
{exact changes from preflight report}
## Before/After Protocol
1. Record the current benchmark baseline: {specific command from T2}
2. Apply the approved changes
3. Run the same benchmark again
4. Report comparison:
- Metric: before value -> after value (% change)
- Include statistical confidence if tool supports it
5. If regression detected: revert and report
When multiple independent symptoms are detected, or the user requests comprehensive profiling, dispatch parallel agents.
Parallelizable combinations:
| Agent 1 | Agent 2 | Why Independent | |---------|---------|-----------------| | CPU profiler | Memory profiler | Different tools, different data | | CPU profiler | Baseline benchmark | Read vs measurement | | Backend profiler | Frontend bundle analysis | Different runtimes | | Service A profiler | Service B profiler | Different processes |
NOT parallelizable:
| Operation A | Operation B | Why Sequential | |-------------|-------------|----------------| | Profile | Interpret results | Dependency | | Before benchmark | After benchmark | Requires code change between | | Load test | CPU profile same process | Tool interference |
Dispatch pattern for parallel profiling:
# Example: CPU + memory profiling in parallel
Agent(
subagent_type="general-purpose",
model="sonnet",
run_in_background=True,
prompt="First read skills/perf-ops/references/cpu-memory-profiling.md "
"and the relevant language skill (e.g. skills/python-pytest-ops/SKILL.md). "
"Then: CPU profiling task: {cpu_prompt}"
)
Agent(
subagent_type="general-purpose",
model="sonnet",
run_in_background=True,
prompt="First read skills/perf-ops/references/cpu-memory-profiling.md "
"and the relevant language skill (e.g. skills/python-pytest-ops/SKILL.md). "
"Then: Memory profiling task: {memory_prompt}"
)
# Both run simultaneously, consolidate findings when both complete
If no language -ops skill covers the target, dispatch general-purpose with profiling commands inlined instead of a skill preload.
Agent(
subagent_type="general-purpose",
model="sonnet",
run_in_background=True,
prompt="""You are acting as a performance profiling agent for {language}.
Use these specific tools and commands:
{tool commands from diagnosis-quickref.md for the detected language}
{original dispatch prompt}
"""
)
For simple benchmarks (hyperfine, single command timing), skip agent dispatch entirely and run inline via Bash.
When a performance-related request arrives:
1. Classify the request:
- Symptom description? -> Start at T1 (diagnose)
- "Profile my app"? -> T1 (detect language + tools) then T2 (profile)
- "Benchmark X vs Y"? -> T2 directly (hyperfine or language benchmark)
- "Optimize this"? -> T2 (profile first) then T3 (optimize)
- "Why is X slow"? -> T1 (diagnose) then T2 (targeted profile)
2. T1 Diagnose (always runs first for new issues):
- Detect language/runtime
- Check installed profiling tools
- Classify symptom using decision tree (see diagnosis-quickref.md)
- Determine production vs development
- Present findings + recommend next step
3. T2 Profile (when diagnosis points to a specific bottleneck):
- Route per the language routing table (general-purpose + skill preload)
- Decide foreground vs background
- Consider parallel dispatch if multiple symptoms
- Consolidate findings from all agents
4. T3 Optimize (only when user wants changes applied):
- Always produce preflight report first
- Wait for explicit user confirmation
- Execute with before/after comparison
- Report delta with statistical confidence
| Task | Tier | Execution | |------|------|-----------| | Detect tools | T1 | Inline | | Check system metrics | T1 | Inline | | Classify symptom | T1 | Inline | | Identify language | T1 | Inline | | Run CPU profiler | T2 | Agent (bg) | | Run memory profiler | T2 | Agent (bg) | | Run load test | T2 | Agent (bg) | | Run benchmark | T2 | Agent (bg or inline for hyperfine) | | Bundle analysis | T2 | Agent (bg) | | EXPLAIN ANALYZE | T2 | Agent (fg) | | Before/after comparison | T2 | Agent (fg) | | Apply optimization | T3 | Agent + confirm | | Add index | T3 | Agent + confirm | | Refactor hot path | T3 | Agent + confirm |
| File | Contents |
|------|----------|
| references/diagnosis-quickref.md | Decision tree, tool selection matrix, quick references for all profiling domains, common gotchas |
| references/cpu-memory-profiling.md | Deep flamegraph interpretation, language-specific CPU/memory profiling guides |
| references/load-testing.md | k6, Artillery, vegeta, wrk, Locust methodology and CI integration |
| references/optimization-patterns.md | Caching, database, frontend, API, concurrency, memory optimization strategies |
| references/ci-integration.md | Performance budgets, regression detection, CI pipeline patterns, benchmark baselines |
Load reference files when deeper tool-specific guidance is needed beyond what the dispatch prompt provides.
| Skill | When to Combine |
|-------|----------------|
| debug-ops | Root cause analysis for performance regressions |
| monitoring-ops | Production metrics, alerting on latency/throughput |
| testing-ops | Performance regression tests in CI, benchmark suites |
| code-stats | Identify complex code that may be performance-sensitive |
| postgres-ops | PostgreSQL-specific query optimization, indexing, EXPLAIN |
| container-orchestration | Resource limits, pod scaling, container performance |
tools
yt-dlp operations - the media ACQUISITION layer that feeds ffmpeg-ops: format selection (-S sort vs -f filters) that avoids post-download transcodes, --download-sections clip-at-download, audio-only extraction for STT pipelines (-x --audio-format opus), playlists + --download-archive incremental channel syncs, cookies/auth (--cookies-from-browser), rate limiting and politeness, SponsorBlock mark/remove, output templates (-o), subtitle download (--write-subs/--write-auto-subs), remux-vs-recode doctrine, and failure triage (403s, throttling, geo blocks, the nsig-extraction class that means yt-dlp is outdated). Triggers on: yt-dlp, ytdlp, youtube-dl, download video, download youtube, download from youtube, download playlist, download channel, archive channel, channel sync, rip audio, youtube to mp3, youtube to mp4, save video, grab video, video downloader, download subtitles, download transcript, clip from youtube, download section, sponsorblock, cookies-from-browser, download-archive, nsig, requested format is not available, sign in to confirm, download livestream, record stream, live-from-start, premiere, impersonate.
tools
Comprehensive ffmpeg/ffprobe operations - probe-first media processing: transcode and compress (H.264/H.265/AV1/Opus), frame-accurate cut/trim/concat, EDL-driven editing, color grading and .cube LUTs, audio loudnorm and mixing, STT/Whisper audio prep, subtitles, GIF and thumbnails, HLS packaging, hardware encoding (NVENC/QSV/AMF/VideoToolbox), restoration, scene and silence detection, VMAF quality gates, screen capture, yt-dlp interop. Triggers on: ffmpeg, ffprobe, transcode, convert video, compress video, encode video, extract audio, trim video, cut video, concat videos, video to gif, thumbnail, contact sheet, burn subtitles, watermark, resize video, crop video, change fps, slow motion, timelapse, loudnorm, normalize audio, audio for whisper, transcription prep, scene detection, silence detection, remove silence, color grade, LUT, tonemap HDR, vmaf, nvenc, hardware encode, hls, remux, faststart, deinterlace, stabilize video, denoise video, screen record, EDL, keyframes.
development
Payload CMS 3 (Next.js-native) architecture - collections, globals, fields, access control, hooks, Local API, storage adapters, and database (Postgres/MongoDB/SQLite). Use for: payload, payloadcms, payload cms, payload 3, collection config, access control, payload hooks, local api, payload fields, multi-tenant payload, payload nextjs, payload s3, payload r2, payloadcms architecture, headless cms typescript.
testing
Cypress end-to-end and component testing operations - selector/retry-ability strategy, cy.intercept network stubbing, cy.session auth, component vs e2e, flake diagnosis, CI, Test Replay. Use for: cypress, e2e test, component test, cy.get, cy.intercept, cy.session, data-cy, data-test, retry-ability, flake, flaky test, cypress.config, cy.mount, Test Replay, custom commands, fixtures.