.claude/skills/perf-pipeline/SKILL.md
Use when /bench-compare or /perf-regression identifies a regression needing root cause, when multiple performance dimensions need simultaneous triage, or when optimization work should be dispatched automatically. Two-phase diagnose-then-optimize pipeline.
Two-phase performance team: triage from multiple angles in parallel, then dispatch the right specialist skill for each finding.
/performance-analyzer pass
/perf-pipeline [<target>]
- `--bench <name>`: start from Criterion benchmark results
- `--crate <name>`: analyze an entire crate's hot paths
Launch three diagnostic agents in parallel using the Agent tool. Each agent approaches performance diagnosis from a different angle.
Analyze Criterion benchmark data for regressions, outliers, and trends.
Agent prompt template:
You are a Rust performance analyst specializing in benchmark interpretation.
Analyze the target code and any available Criterion benchmark results.
Look for:
- Statistical regressions (>5% median change)
- High variance indicating measurement instability
- Outlier samples suggesting allocator pressure or system noise
- Benchmark gaps (hot code paths with no benchmarks)
- Comparison opportunities (before/after data available)
For each finding, report:
- Category: benchmark-regression | benchmark-gap | measurement-instability
- Location: file:line or benchmark name
- Evidence: numbers, percentages, statistical significance
- Impact estimate: how much latency/throughput is affected
- Recommended next step: which tool or skill to use
Target: {target_description}
Run `cargo bench --bench <relevant> -- --list` to discover available benchmarks.
Run benchmarks if needed to gather data.
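The ">5% median change" rule in the prompt above can be illustrated with a small sketch. It assumes raw timing samples are available as `f64` slices; the `Verdict` type and the function names are hypothetical and not part of this skill. Criterion performs its own statistical analysis, so this shows only the threshold comparison, not a replacement for that analysis.

```rust
/// Outcome of comparing two benchmark runs (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub enum Verdict {
    Regression,
    Improvement,
    WithinThreshold,
}

/// Median of a set of timing samples; `None` if the set is empty.
pub fn median(samples: &[f64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 0 {
        Some((sorted[mid - 1] + sorted[mid]) / 2.0)
    } else {
        Some(sorted[mid])
    }
}

/// Percent change of the candidate median relative to the baseline median.
/// Positive means the candidate is slower (times went up).
pub fn median_change_pct(baseline: &[f64], candidate: &[f64]) -> Option<f64> {
    let base = median(baseline)?;
    let cand = median(candidate)?;
    if base == 0.0 {
        return None; // avoid dividing by zero
    }
    Some((cand - base) / base * 100.0)
}

/// Apply a symmetric threshold, e.g. 5.0 for the ">5% median change" rule.
pub fn classify(change_pct: f64, threshold_pct: f64) -> Verdict {
    if change_pct > threshold_pct {
        Verdict::Regression
    } else if change_pct < -threshold_pct {
        Verdict::Improvement
    } else {
        Verdict::WithinThreshold
    }
}
```

A finding would be reported as `benchmark-regression` only when `classify` returns `Verdict::Regression`. High variance in the samples is a reason to report `measurement-instability` instead of trusting the median.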
Analyze code patterns for performance anti-patterns without running anything.
Agent prompt template:
You are a Rust performance analyst specializing in static code analysis for
performance issues. Analyze the target code for anti-patterns.
Check for:
Memory & Allocation:
- Unnecessary allocations in loops (Vec, String, Box)
- Missing with_capacity() for known-size collections
- Cloning where borrowing would suffice
- Large structs passed by value
CPU & Cache:
- False sharing in concurrent data structures
- Cache-unfriendly access patterns (strided, random)
- Branch-heavy code amenable to branchless alternatives
- Missing #[inline] on small hot functions
Async & Concurrency:
- Blocking operations in async contexts
- Lock contention patterns
- Oversized futures
- Unnecessary Arc when ownership would work
Project-specific patterns:
- NONE_U32 = u32::MAX sentinels (avoid Option overhead)
- Allocation tier violations (HOT paths must be allocation-silent)
- ByteSlab/InlineVec/RingBuffer usage opportunities
For each finding, report:
- Category: allocation-hotspot | cache-hostile | lock-contention |
async-blocking | codegen-issue | vectorization-opportunity
- Location: file:line
- Evidence: the specific code pattern
- Severity: Critical (measurable impact) | High (likely impact) |
Medium (potential impact) | Low (minor)
- Recommended fix: actionable change with code sketch
Target: {target_description}
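Two items from the checklist above, sketched in Rust: a per-iteration allocation with an unsized output `Vec`, and the `NONE_U32` sentinel used in place of `Option<u32>`. Only the constant name `NONE_U32` comes from this document; every function name here is hypothetical.

```rust
use std::mem::size_of;

/// Sentinel named in the checklist above; the helpers below are hypothetical.
pub const NONE_U32: u32 = u32::MAX;

/// Anti-pattern: unsized output Vec plus an owned String per iteration.
pub fn trimmed_lengths_slow(lines: &[&str]) -> Vec<usize> {
    let mut out = Vec::new(); // may reallocate several times as it grows
    for line in lines {
        // Allocates on the heap for every non-empty line.
        let owned: String = line.trim().to_string();
        out.push(owned.len());
    }
    out
}

/// Same result with one up-front allocation and borrowed data only.
pub fn trimmed_lengths_fast(lines: &[&str]) -> Vec<usize> {
    let mut out = Vec::with_capacity(lines.len());
    for line in lines {
        out.push(line.trim().len()); // `trim` returns a borrowed &str
    }
    out
}

/// Index of the first occurrence of `needle`, or NONE_U32 if absent.
/// Assumes the haystack has fewer than u32::MAX elements.
pub fn find_index(haystack: &[u32], needle: u32) -> u32 {
    haystack
        .iter()
        .position(|&v| v == needle)
        .map_or(NONE_U32, |i| i as u32)
}

/// Size in bytes of the two ways to store "maybe an index".
pub fn slot_sizes() -> (usize, usize) {
    (size_of::<u32>(), size_of::<Option<u32>>())
}
```

`slot_sizes()` returns `(4, 8)`: `u32` has no spare bit pattern for `None`, so `Option<u32>` needs a separate tag plus padding. The sentinel halves the footprint of dense index arrays, at the cost of the compiler no longer enforcing the "absent" check.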
Pre-profiling heuristic scan for likely performance bottlenecks.
Agent prompt template:
You are a Rust performance analyst specializing in hotspot detection. Scan
the target code to find functions and code paths most likely to be performance
bottlenecks, without running profilers.
Heuristics:
- Loop nesting depth and iteration counts
- Allocation density (allocs per iteration)
- Call graph depth in hot paths
- Data structure choice vs access pattern mismatch
- Serialization/deserialization in request paths
- Redundant computation (same value computed multiple times)
- Missed opportunities for short-circuit evaluation
For each hotspot, report:
- Risk level: High | Medium | Low
- Location: file:line (function name)
- Why it's likely hot: evidence from code structure
- Impact estimate: order-of-magnitude guess
- Ease of fix: Easy | Medium | Hard
- Recommended Phase 2 skill:
* /heap-profile — for allocation attribution
* /simd-optimize — for vectorizable loops
* /asm-forge — for codegen quality issues
* /bench-compare — for before/after measurement
* /perf-topdown — for CPU microarchitecture bottlenecks
* /pgo-bolt — for binary layout optimization
* /causal-profile — for critical-path ambiguity
* /linux-perf-profile — for PMU counter evidence
Target: {target_description}
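As an illustration of the "missed opportunities for short-circuit evaluation" heuristic above, the sketch below runs the same filter with the cheap test last and then first. All names are hypothetical. The call counter exists only to make the difference observable and would not appear in real code.

```rust
use std::cell::Cell;

/// Stand-in for a costly predicate (trial-division primality).
/// Increments `calls` so the effect of operand order can be observed.
pub fn expensive_check(v: u64, calls: &Cell<usize>) -> bool {
    calls.set(calls.get() + 1);
    if v < 2 {
        return false;
    }
    let mut d = 2;
    while d * d <= v {
        if v % d == 0 {
            return false;
        }
        d += 1;
    }
    true
}

/// Hotspot shape: the expensive predicate runs for every element,
/// even those the cheap comparison would reject.
pub fn count_primes_below_slow(values: &[u64], limit: u64, calls: &Cell<usize>) -> usize {
    values
        .iter()
        .filter(|&&v| expensive_check(v, calls) && v < limit)
        .count()
}

/// Cheap comparison first: `&&` short-circuits, so the expensive
/// predicate runs only for elements below the limit.
pub fn count_primes_below_fast(values: &[u64], limit: u64, calls: &Cell<usize>) -> usize {
    values
        .iter()
        .filter(|&&v| v < limit && expensive_check(v, calls))
        .count()
}
```

Both functions return the same count. A static scan can flag the first form when the right-hand operand is visibly cheaper than the left and neither has side effects that depend on evaluation order.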
After all three agents complete, merge and classify findings:
| Category | Phase 2 Skill | Description |
|----------|---------------|-------------|
| allocation-hotspot | /heap-profile | Excessive heap allocations in hot path |
| vectorization-opportunity | /simd-optimize | Loop pattern amenable to SIMD |
| codegen-issue | /asm-forge | Missed optimization visible in assembly |
| benchmark-regression | /bench-compare | Needs before/after measurement |
| microarch-bottleneck | /perf-topdown | Cache misses, branch misprediction |
| pgo-candidate | /pgo-bolt | Binary layout optimization opportunity |
| critical-path-unclear | /causal-profile | Hot function may not be on critical path |
| needs-pmu-data | /linux-perf-profile | Need hardware counter evidence |
| general-optimization | /asm-forge | Default: assembly-guided optimization |
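The routing table above can be written as a lookup. This is a minimal sketch: the `Category` enum and its methods are hypothetical, and sending unlisted labels (for example `lock-contention` or `benchmark-gap`, which the Phase 1 prompts can emit) to the `general-optimization` default is an assumption based on the table's "Default" row.

```rust
/// Finding categories from the classification table (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Category {
    AllocationHotspot,
    VectorizationOpportunity,
    CodegenIssue,
    BenchmarkRegression,
    MicroarchBottleneck,
    PgoCandidate,
    CriticalPathUnclear,
    NeedsPmuData,
    GeneralOptimization,
}

impl Category {
    /// Parse a category label as emitted by a Phase 1 agent.
    /// Assumption: labels absent from the table use the default row.
    pub fn parse(label: &str) -> Category {
        match label {
            "allocation-hotspot" => Category::AllocationHotspot,
            "vectorization-opportunity" => Category::VectorizationOpportunity,
            "codegen-issue" => Category::CodegenIssue,
            "benchmark-regression" => Category::BenchmarkRegression,
            "microarch-bottleneck" => Category::MicroarchBottleneck,
            "pgo-candidate" => Category::PgoCandidate,
            "critical-path-unclear" => Category::CriticalPathUnclear,
            "needs-pmu-data" => Category::NeedsPmuData,
            _ => Category::GeneralOptimization,
        }
    }

    /// Phase 2 skill assigned to this category, per the table.
    pub fn phase2_skill(self) -> &'static str {
        match self {
            Category::AllocationHotspot => "/heap-profile",
            Category::VectorizationOpportunity => "/simd-optimize",
            Category::CodegenIssue => "/asm-forge",
            Category::BenchmarkRegression => "/bench-compare",
            Category::MicroarchBottleneck => "/perf-topdown",
            Category::PgoCandidate => "/pgo-bolt",
            Category::CriticalPathUnclear => "/causal-profile",
            Category::NeedsPmuData => "/linux-perf-profile",
            Category::GeneralOptimization => "/asm-forge",
        }
    }
}
```

Using an exhaustive `match` means that adding a category without assigning it a skill fails to compile.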
Present findings to the user:
## Perf Pipeline — Phase 1 Complete
Found {N} performance findings across {M} files.
### Findings (ranked by impact + convergence)
| # | Risk | Location | Issue | Category | Phase 2 Skill | Type |
|---|------|----------|-------|----------|---------------|------|
| 1 | High | src/engine/core.rs:42 | Vec alloc in per-claim loop | allocation-hotspot | /heap-profile | Diagnostic |
| 2 | High | src/shard/split.rs:88 | Branchless opportunity | codegen-issue | /asm-forge | Optimization |
| 3 | Med | src/stdx/inline_vec.rs | Loop amenable to NEON SIMD | vectorization-opportunity | /simd-optimize | Optimization |
| 4 | Med | bench: acquire_restore | 12% regression vs baseline | benchmark-regression | /bench-compare | Measurement |
Approve all? Enter numbers to select, or modify skill assignments:
The user can approve all findings, select a subset by number, or change the skill assigned to a finding.
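This document does not define the "impact + convergence" ranking any further. One possible reading, sketched below, merges findings that share a location, keeps the highest risk reported for that location, and counts how many distinct agents flagged it. The types, field names, and tie-break order are all assumptions.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Risk levels ordered so that `High` compares greatest.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Risk {
    Low,
    Medium,
    High,
}

/// One raw finding from a Phase 1 agent (hypothetical shape).
#[derive(Debug, Clone)]
pub struct Finding {
    pub agent: &'static str,
    pub location: String,
    pub risk: Risk,
}

/// A merged finding: highest risk seen, and how many agents agreed.
#[derive(Debug, PartialEq, Eq)]
pub struct Merged {
    pub location: String,
    pub risk: Risk,
    pub convergence: usize,
}

/// Group by location, then sort by risk, then convergence (both
/// descending), with location as a deterministic final tie-break.
pub fn merge_and_rank(findings: &[Finding]) -> Vec<Merged> {
    let mut by_location: BTreeMap<&str, (Risk, BTreeSet<&str>)> = BTreeMap::new();
    for f in findings {
        let entry = by_location
            .entry(f.location.as_str())
            .or_insert((f.risk, BTreeSet::new()));
        entry.0 = entry.0.max(f.risk);
        entry.1.insert(f.agent); // a set, so one agent counts once
    }
    let mut merged: Vec<Merged> = by_location
        .into_iter()
        .map(|(location, (risk, agents))| Merged {
            location: location.to_string(),
            risk,
            convergence: agents.len(),
        })
        .collect();
    merged.sort_by(|a, b| {
        b.risk
            .cmp(&a.risk)
            .then(b.convergence.cmp(&a.convergence))
            .then_with(|| a.location.cmp(&b.location))
    });
    merged
}
```

Under this reading, a location flagged by both the static analyzer and the hotspot finder outranks an equally risky location that only one agent reported.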
For each approved finding, launch an Agent whose prompt embeds the relevant skill's methodology:
You are a performance specialist applying {skill_name} methodology.
Finding to address:
- Category: {category}
- Location: {file:line}
- Issue: {description}
- Evidence from triage: {evidence}
{Skill-specific methodology and checklist inlined here}
Files you own (only modify these): {file list}
After any code changes, run:
cargo fmt --all && cargo check && cargo clippy --all-targets --all-features -- -D warnings
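Filling the Phase 2 template above is plain string substitution. The sketch below assumes a hypothetical `Phase2Task` struct whose fields mirror the template placeholders. How the skill methodology text is loaded is not covered in this document, so it is passed in as an already prepared string.

```rust
/// Inputs for one Phase 2 agent prompt (hypothetical struct; the
/// fields mirror the placeholders in the template).
pub struct Phase2Task<'a> {
    pub skill_name: &'a str,
    pub category: &'a str,
    pub location: &'a str,
    pub issue: &'a str,
    pub evidence: &'a str,
    pub methodology: &'a str,
    pub owned_files: &'a [&'a str],
}

/// Render the agent prompt by substituting each placeholder.
pub fn render_prompt(task: &Phase2Task<'_>) -> String {
    format!(
        "You are a performance specialist applying {skill} methodology.\n\
         Finding to address:\n\
         - Category: {category}\n\
         - Location: {location}\n\
         - Issue: {issue}\n\
         - Evidence from triage: {evidence}\n\
         {methodology}\n\
         Files you own (only modify these): {files}\n",
        skill = task.skill_name,
        category = task.category,
        location = task.location,
        issue = task.issue,
        evidence = task.evidence,
        methodology = task.methodology,
        files = task.owned_files.join(", "),
    )
}
```

Keeping the owned file list as structured data rather than free text also lets the orchestrator compare the file lists of different agents before launching them.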
If diagnostic Phase 2 agents (heap-profile, perf-topdown, etc.) produce findings that change the optimization picture, present an intermediate gate:
## Perf Pipeline — Diagnostic Phase 2 Complete
/heap-profile found: Top allocator is `ShardMap::resize` at 4.2MB/s
/perf-topdown found: 38% of cycles are backend-bound (L3 cache misses)
Updated recommendations:
| # | Location | Original Skill | Updated Skill | Reason |
|---|----------|---------------|---------------|--------|
| 2 | split.rs:88 | /asm-forge | /simd-optimize | Cache-line alignment more impactful |
Proceed with updated plan? Or modify:
## Perf Pipeline — Complete
### Results
| Finding | Phase 2 Skill | Status | Result |
|---------|---------------|-----------|-------------------------------------|
| #1 | /heap-profile | Diagnosed | ShardMap::resize is top allocator |
| #2 | /asm-forge | Optimized | Eliminated branch in split loop |
| #3 | /simd-optimize| Optimized | NEON vectorized InlineVec scan |
| #4 | /bench-compare| Measured | 8% improvement vs baseline |
### Verification
Run to confirm:
cargo fmt --all && cargo check && cargo clippy --all-targets --all-features -- -D warnings
cargo bench --bench <relevant>
- /rust-perf-triage: Phase 1 methodology (benchmark data)
- /performance-analyzer: Phase 1 methodology (static analysis)
- /rust-hotspot-finder: Phase 1 methodology (hotspot heuristics)
- /heap-profile /simd-optimize /asm-forge /bench-compare /perf-topdown /pgo-bolt /causal-profile /linux-perf-profile: Phase 2 specialists
- /review-pipeline: Code quality team pipeline
- /test-pipeline: Testing team pipeline