Performance Operations

Orchestrator for cross-language performance profiling and optimization. Classifies symptoms inline, dispatches profiling to general-purpose agents preloaded with the relevant language -ops skill (background), and manages optimization with confirmation.

Architecture

User describes performance issue or requests profiling
    |
    +---> T1: Diagnose (inline, fast)
    |       +---> Classify symptom (decision tree)
    |       +---> Detect language/runtime from project
    |       +---> Check installed profiling tools
    |       +---> Determine production vs development
    |       +---> Gather system baseline (CPU/mem/disk)
    |       +---> Present: diagnosis + recommended profiling approach
    |
    +---> T2: Profile (dispatch general-purpose + skill preload, background)
    |       +---> Select skill preload from routing table
    |       +---> Build perf-focused dispatch prompt
    |       +---> Agent runs profiler, collects data, interprets results
    |       |       +---> Fallback: tool commands inlined (no skill preload)
    |       +---> Returns: findings + bottleneck identification + suggestions
    |       |
    |       +---> [Optional parallel dispatch]:
    |             +---> CPU profiling agent  ---+
    |             +---> Memory profiling agent --+--> Consolidate findings
    |             +---> Baseline benchmark ------+
    |
    +---> T3: Optimize (dispatch general-purpose + skill preload, foreground + confirm)
            +---> Agent proposes specific code changes
            +---> Preflight: what changes, expected impact, risks
            +---> User confirms
            +---> Apply changes
            +---> Re-benchmark for before/after delta

Safety Tiers

T1: Diagnose - Run Inline

No agent needed. Execute directly via Bash for instant results.

| Operation | Command / Method | |-----------|-----------------| | Detect Python profilers | which py-spy && which memray && which scalene | | Detect Go profilers | which go && go tool pprof -h 2>/dev/null | | Detect Rust profilers | which cargo-flamegraph && which samply | | Detect Node profilers | which clinic && which 0x | | Detect benchmarking tools | which hyperfine && which k6 && which vegeta | | System CPU baseline | top -bn1 -o %CPU \| head -20 (Linux) or wmic cpu get loadpercentage (Win) | | System memory baseline | free -h (Linux) or wmic OS get FreePhysicalMemory (Win) | | Disk I/O check | iostat -x 1 3 (Linux) | | Identify language | Check for package.json, go.mod, Cargo.toml, pyproject.toml, requirements.txt | | Production vs dev | Ask user or detect from environment (NODE_ENV, FLASK_ENV, etc.) | | Read existing profiles | Parse .prof, .svg, .bin files in project |

Production safety rule: In production environments, only recommend sampling profilers (py-spy, pprof HTTP endpoint, perf). Never suggest attaching debuggers, tracing profilers, or tools that require process restart.

T2: Profile - Dispatch to Profiling Agent

Gather context from T1 diagnosis, then dispatch a general-purpose agent preloaded with the relevant language -ops skill plus the perf-ops references below.

Language Routing:

| Detected Language | Dispatch | Preload | Key Profiling Tools | |-------------------|----------|---------|---------------------| | Python (.py, pyproject.toml, requirements.txt) | general-purpose | relevant skills/python-*/SKILL.md + perf-ops references | py-spy, memray, scalene, tracemalloc | | Go (go.mod, .go files) | general-purpose | skills/go-ops/SKILL.md + perf-ops references | pprof (CPU/heap/goroutine/mutex), benchstat | | Rust (Cargo.toml, .rs files) | general-purpose | skills/rust-ops/SKILL.md + perf-ops references | cargo-flamegraph, samply, DHAT, criterion | | TypeScript/JavaScript (backend, package.json + server) | general-purpose | skills/javascript-ops/SKILL.md + perf-ops references | clinic flame/doctor/bubbleprof, 0x | | TypeScript/JavaScript (frontend, bundle issues) | general-purpose | skills/typescript-ops/SKILL.md + perf-ops references | webpack-bundle-analyzer, Lighthouse, source-map-explorer | | SQL / PostgreSQL | general-purpose | skills/postgres-ops/SKILL.md + perf-ops references | EXPLAIN ANALYZE, pg_stat_statements, pgbench | | General / unknown / CLI benchmarking | general-purpose | perf-ops references | hyperfine, perf, strace |

Dispatch template (T2):

You are handling a performance profiling task dispatched by the perf-ops orchestrator.

## Diagnosis (from T1)
- Symptom: {classified symptom from decision tree}
- Language/Runtime: {detected language}
- Environment: {production | development}
- Installed tools: {list from tool detection}
- System baseline: {CPU/memory/disk metrics}

## Profiling Task
{specific profiling request - e.g., "CPU profile the API server under load"}

## Target
- Process/file: {target application or endpoint}
- Expected workload: {how to generate representative load if needed}

## Domain Knowledge
Before starting, read the relevant profiling reference for this language:
- Read: skills/perf-ops/references/cpu-memory-profiling.md

For load testing tasks, also read:
- Read: skills/perf-ops/references/load-testing.md

For database profiling, also read:
- Read: skills/postgres-ops/SKILL.md (if PostgreSQL)

## Instructions
1. Run the appropriate profiler for this language and symptom
2. Collect sufficient samples (minimum 30 seconds for CPU, multiple snapshots for memory)
3. Interpret the results - identify the top 3-5 bottlenecks
4. For each bottleneck: explain what it is, why it's slow, and suggest a fix
5. Report findings in structured format with metrics

Execution mode:

| Scenario | Mode | Why | |----------|------|-----| | User waiting for results | run_in_background=False | They need findings before continuing | | User continuing other work | run_in_background=True | Don't block the main session | | Quick benchmark (hyperfine) | run_in_background=False | Fast enough to wait | | Load test (k6, artillery) | run_in_background=True | Takes minutes |

T3: Optimize - Preflight Required

Dispatch a general-purpose agent (preloaded per the language routing table) with explicit instruction to produce a preflight report before any code changes.

Dispatch template (T3 preflight):

You are handling a performance optimization dispatched by the perf-ops orchestrator.

## Profiling Results (from T2)
{bottleneck findings, metrics, flamegraph interpretation}

## Optimization Request
{specific optimization - e.g., "Fix the N+1 query in UserController.list"}

IMPORTANT: Do NOT apply changes yet. Produce a Preflight Report:
1. Exactly what code/config changes you will make
2. Expected performance improvement (with reasoning)
3. Risks (correctness, side effects, edge cases)
4. How to verify the improvement (specific benchmark or test)
5. How to revert if the optimization causes issues

After user confirms: Re-dispatch with execute authority plus the before/after protocol.

Dispatch template (T3 execute + before/after):

User confirmed the optimization. Proceed with execution.

## Approved Changes
{exact changes from preflight report}

## Before/After Protocol
1. Record the current benchmark baseline: {specific command from T2}
2. Apply the approved changes
3. Run the same benchmark again
4. Report comparison:
   - Metric: before value -> after value (% change)
   - Include statistical confidence if tool supports it
5. If regression detected: revert and report

Parallel Profiling

When multiple independent symptoms are detected, or the user requests comprehensive profiling, dispatch parallel agents.

Parallelizable combinations:

| Agent 1 | Agent 2 | Why Independent | |---------|---------|-----------------| | CPU profiler | Memory profiler | Different tools, different data | | CPU profiler | Baseline benchmark | Read vs measurement | | Backend profiler | Frontend bundle analysis | Different runtimes | | Service A profiler | Service B profiler | Different processes |

NOT parallelizable:

| Operation A | Operation B | Why Sequential | |-------------|-------------|----------------| | Profile | Interpret results | Dependency | | Before benchmark | After benchmark | Requires code change between | | Load test | CPU profile same process | Tool interference |

Dispatch pattern for parallel profiling:

# Example: CPU + memory profiling in parallel
Agent(
    subagent_type="general-purpose",
    model="sonnet",
    run_in_background=True,
    prompt="First read skills/perf-ops/references/cpu-memory-profiling.md "
           "and the relevant language skill (e.g. skills/python-pytest-ops/SKILL.md). "
           "Then: CPU profiling task: {cpu_prompt}"
)
Agent(
    subagent_type="general-purpose",
    model="sonnet",
    run_in_background=True,
    prompt="First read skills/perf-ops/references/cpu-memory-profiling.md "
           "and the relevant language skill (e.g. skills/python-pytest-ops/SKILL.md). "
           "Then: Memory profiling task: {memory_prompt}"
)
# Both run simultaneously, consolidate findings when both complete

Fallback: When No Language Skill Matches

If no language -ops skill covers the target, dispatch general-purpose with profiling commands inlined instead of a skill preload.

Agent(
    subagent_type="general-purpose",
    model="sonnet",
    run_in_background=True,
    prompt="""You are acting as a performance profiling agent for {language}.

Use these specific tools and commands:
{tool commands from diagnosis-quickref.md for the detected language}

{original dispatch prompt}
"""
)

For simple benchmarks (hyperfine, single command timing), skip agent dispatch entirely and run inline via Bash.

Decision Logic

When a performance-related request arrives:

1. Classify the request:
   - Symptom description? -> Start at T1 (diagnose)
   - "Profile my app"? -> T1 (detect language + tools) then T2 (profile)
   - "Benchmark X vs Y"? -> T2 directly (hyperfine or language benchmark)
   - "Optimize this"? -> T2 (profile first) then T3 (optimize)
   - "Why is X slow"? -> T1 (diagnose) then T2 (targeted profile)

2. T1 Diagnose (always runs first for new issues):
   - Detect language/runtime
   - Check installed profiling tools
   - Classify symptom using decision tree (see diagnosis-quickref.md)
   - Determine production vs development
   - Present findings + recommend next step

3. T2 Profile (when diagnosis points to a specific bottleneck):
   - Route per the language routing table (general-purpose + skill preload)
   - Decide foreground vs background
   - Consider parallel dispatch if multiple symptoms
   - Consolidate findings from all agents

4. T3 Optimize (only when user wants changes applied):
   - Always produce preflight report first
   - Wait for explicit user confirmation
   - Execute with before/after comparison
   - Report delta with statistical confidence

Quick Reference

| Task | Tier | Execution | |------|------|-----------| | Detect tools | T1 | Inline | | Check system metrics | T1 | Inline | | Classify symptom | T1 | Inline | | Identify language | T1 | Inline | | Run CPU profiler | T2 | Agent (bg) | | Run memory profiler | T2 | Agent (bg) | | Run load test | T2 | Agent (bg) | | Run benchmark | T2 | Agent (bg or inline for hyperfine) | | Bundle analysis | T2 | Agent (bg) | | EXPLAIN ANALYZE | T2 | Agent (fg) | | Before/after comparison | T2 | Agent (fg) | | Apply optimization | T3 | Agent + confirm | | Add index | T3 | Agent + confirm | | Refactor hot path | T3 | Agent + confirm |

Reference Files

| File | Contents | |------|----------| | references/diagnosis-quickref.md | Decision tree, tool selection matrix, quick references for all profiling domains, common gotchas | | references/cpu-memory-profiling.md | Deep flamegraph interpretation, language-specific CPU/memory profiling guides | | references/load-testing.md | k6, Artillery, vegeta, wrk, Locust methodology and CI integration | | references/optimization-patterns.md | Caching, database, frontend, API, concurrency, memory optimization strategies | | references/ci-integration.md | Performance budgets, regression detection, CI pipeline patterns, benchmark baselines |

Load reference files when deeper tool-specific guidance is needed beyond what the dispatch prompt provides.

Performance Operations

Architecture

User describes performance issue or requests profiling
    |
    +---> T1: Diagnose (inline, fast)
    |       +---> Classify symptom (decision tree)
    |       +---> Detect language/runtime from project
    |       +---> Check installed profiling tools
    |       +---> Determine production vs development
    |       +---> Gather system baseline (CPU/mem/disk)
    |       +---> Present: diagnosis + recommended profiling approach
    |
    +---> T2: Profile (dispatch general-purpose + skill preload, background)
    |       +---> Select skill preload from routing table
    |       +---> Build perf-focused dispatch prompt
    |       +---> Agent runs profiler, collects data, interprets results
    |       |       +---> Fallback: tool commands inlined (no skill preload)
    |       +---> Returns: findings + bottleneck identification + suggestions
    |       |
    |       +---> [Optional parallel dispatch]:
    |             +---> CPU profiling agent  ---+
    |             +---> Memory profiling agent --+--> Consolidate findings
    |             +---> Baseline benchmark ------+
    |
    +---> T3: Optimize (dispatch general-purpose + skill preload, foreground + confirm)
            +---> Agent proposes specific code changes
            +---> Preflight: what changes, expected impact, risks
            +---> User confirms
            +---> Apply changes
            +---> Re-benchmark for before/after delta

Safety Tiers

T1: Diagnose - Run Inline

No agent needed. Execute directly via Bash for instant results.

T2: Profile - Dispatch to Profiling Agent

Gather context from T1 diagnosis, then dispatch a general-purpose agent preloaded with the relevant language -ops skill plus the perf-ops references below.

Language Routing:

Dispatch template (T2):

You are handling a performance profiling task dispatched by the perf-ops orchestrator.

## Diagnosis (from T1)
- Symptom: {classified symptom from decision tree}
- Language/Runtime: {detected language}
- Environment: {production | development}
- Installed tools: {list from tool detection}
- System baseline: {CPU/memory/disk metrics}

## Profiling Task
{specific profiling request - e.g., "CPU profile the API server under load"}

## Target
- Process/file: {target application or endpoint}
- Expected workload: {how to generate representative load if needed}

## Domain Knowledge
Before starting, read the relevant profiling reference for this language:
- Read: skills/perf-ops/references/cpu-memory-profiling.md

For load testing tasks, also read:
- Read: skills/perf-ops/references/load-testing.md

For database profiling, also read:
- Read: skills/postgres-ops/SKILL.md (if PostgreSQL)

## Instructions
1. Run the appropriate profiler for this language and symptom
2. Collect sufficient samples (minimum 30 seconds for CPU, multiple snapshots for memory)
3. Interpret the results - identify the top 3-5 bottlenecks
4. For each bottleneck: explain what it is, why it's slow, and suggest a fix
5. Report findings in structured format with metrics

Execution mode:

T3: Optimize - Preflight Required

Dispatch a general-purpose agent (preloaded per the language routing table) with explicit instruction to produce a preflight report before any code changes.

Dispatch template (T3 preflight):

You are handling a performance optimization dispatched by the perf-ops orchestrator.

## Profiling Results (from T2)
{bottleneck findings, metrics, flamegraph interpretation}

## Optimization Request
{specific optimization - e.g., "Fix the N+1 query in UserController.list"}

IMPORTANT: Do NOT apply changes yet. Produce a Preflight Report:
1. Exactly what code/config changes you will make
2. Expected performance improvement (with reasoning)
3. Risks (correctness, side effects, edge cases)
4. How to verify the improvement (specific benchmark or test)
5. How to revert if the optimization causes issues

After user confirms: Re-dispatch with execute authority plus the before/after protocol.

Dispatch template (T3 execute + before/after):

User confirmed the optimization. Proceed with execution.

## Approved Changes
{exact changes from preflight report}

## Before/After Protocol
1. Record the current benchmark baseline: {specific command from T2}
2. Apply the approved changes
3. Run the same benchmark again
4. Report comparison:
   - Metric: before value -> after value (% change)
   - Include statistical confidence if tool supports it
5. If regression detected: revert and report

Parallel Profiling

When multiple independent symptoms are detected, or the user requests comprehensive profiling, dispatch parallel agents.

Parallelizable combinations:

NOT parallelizable:

Dispatch pattern for parallel profiling:

# Example: CPU + memory profiling in parallel
Agent(
    subagent_type="general-purpose",
    model="sonnet",
    run_in_background=True,
    prompt="First read skills/perf-ops/references/cpu-memory-profiling.md "
           "and the relevant language skill (e.g. skills/python-pytest-ops/SKILL.md). "
           "Then: CPU profiling task: {cpu_prompt}"
)
Agent(
    subagent_type="general-purpose",
    model="sonnet",
    run_in_background=True,
    prompt="First read skills/perf-ops/references/cpu-memory-profiling.md "
           "and the relevant language skill (e.g. skills/python-pytest-ops/SKILL.md). "
           "Then: Memory profiling task: {memory_prompt}"
)
# Both run simultaneously, consolidate findings when both complete

Fallback: When No Language Skill Matches

If no language -ops skill covers the target, dispatch general-purpose with profiling commands inlined instead of a skill preload.

Agent(
    subagent_type="general-purpose",
    model="sonnet",
    run_in_background=True,
    prompt="""You are acting as a performance profiling agent for {language}.

Use these specific tools and commands:
{tool commands from diagnosis-quickref.md for the detected language}

{original dispatch prompt}
"""
)

For simple benchmarks (hyperfine, single command timing), skip agent dispatch entirely and run inline via Bash.

Decision Logic

When a performance-related request arrives:

1. Classify the request:
   - Symptom description? -> Start at T1 (diagnose)
   - "Profile my app"? -> T1 (detect language + tools) then T2 (profile)
   - "Benchmark X vs Y"? -> T2 directly (hyperfine or language benchmark)
   - "Optimize this"? -> T2 (profile first) then T3 (optimize)
   - "Why is X slow"? -> T1 (diagnose) then T2 (targeted profile)

2. T1 Diagnose (always runs first for new issues):
   - Detect language/runtime
   - Check installed profiling tools
   - Classify symptom using decision tree (see diagnosis-quickref.md)
   - Determine production vs development
   - Present findings + recommend next step

3. T2 Profile (when diagnosis points to a specific bottleneck):
   - Route per the language routing table (general-purpose + skill preload)
   - Decide foreground vs background
   - Consider parallel dispatch if multiple symptoms
   - Consolidate findings from all agents

4. T3 Optimize (only when user wants changes applied):
   - Always produce preflight report first
   - Wait for explicit user confirmation
   - Execute with before/after comparison
   - Report delta with statistical confidence

Quick Reference

Reference Files

Load reference files when deeper tool-specific guidance is needed beyond what the dispatch prompt provides.

Adoption

0xDarkMatter/perf-ops

$ install --global

Security Scan Results

SKILL.md

Performance Operations

Architecture

Safety Tiers

T1: Diagnose - Run Inline

T2: Profile - Dispatch to Profiling Agent

T3: Optimize - Preflight Required

Parallel Profiling

Fallback: When No Language Skill Matches

Decision Logic

Quick Reference

Reference Files

See Also

Related Skills

0xDarkMatter/repo-doctor

0xDarkMatter/parallel-ops

0xDarkMatter/fleetflow

0xDarkMatter/threejs-ops

0xDarkMatter/perf-ops

$ install --global

Security Scan Results

SKILL.md

Performance Operations

Architecture

Safety Tiers

T1: Diagnose - Run Inline

T2: Profile - Dispatch to Profiling Agent

T3: Optimize - Preflight Required

Parallel Profiling

Fallback: When No Language Skill Matches

Decision Logic

Quick Reference

Reference Files

See Also

Related Skills

0xDarkMatter/repo-doctor

0xDarkMatter/parallel-ops

0xDarkMatter/fleetflow

0xDarkMatter/threejs-ops