skills/rust-profiling/SKILL.md
Rust profiling skill for performance analysis. Use when generating flamegraphs from Rust binaries, measuring monomorphization bloat with cargo-llvm-lines, analysing binary size with cargo-bloat, microbenchmarking with Criterion, or interpreting inlined frames in profiles. Activates on queries about cargo flamegraph, cargo-bloat, cargo-llvm-lines, Criterion benchmarks, Rust performance profiling, or binary size analysis.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add awfixers-stuff/opencode-config rust-profiling
Guide agents through Rust performance profiling: flamegraphs via cargo-flamegraph, binary size analysis, monomorphization bloat measurement, Criterion microbenchmarks, and interpreting profiling results with inlined Rust frames.
# Release with debug symbols (needed for readable profiles)
# Cargo.toml:
[profile.release-with-debug]
inherits = "release"
debug = true
cargo build --profile release-with-debug
# Or, as a quick alternative, enable debug info for the release profile via an environment variable
CARGO_PROFILE_RELEASE_DEBUG=true cargo build --release
# Install
cargo install flamegraph
# Linux: uses perf (requires perf_event_paranoid ≤ 1)
sudo sh -c 'echo 1 > /proc/sys/kernel/perf_event_paranoid'
cargo flamegraph --bin myapp -- arg1 arg2
# macOS: uses DTrace (requires sudo)
sudo cargo flamegraph --bin myapp -- arg1 arg2
# Profile tests
cargo flamegraph --test mytest -- test_filter
# Profile benchmarks
cargo flamegraph --bench mybench -- --bench
# Output
# Generates flamegraph.svg in current directory
# Open in browser: firefox flamegraph.svg
Custom flamegraph options:
# Higher sampling frequency (in Hz)
cargo flamegraph --freq 1000 --bin myapp
# Discard stderr noise from the profiling run
cargo flamegraph --bin myapp -- args 2>/dev/null
# Using perf directly for more control
perf record -g -F 999 ./target/release-with-debug/myapp args
perf script | stackcollapse-perf.pl | flamegraph.pl > out.svg
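Release builds inline aggressively, so a hot function can disappear into its caller's frame in the flamegraph. The following is a minimal sketch of a workload to profile, assuming hypothetical function names (`checksum`, `run_workload`) that are not part of the original skill. `#[inline(never)]` asks the compiler to keep each function as its own frame, and `std::hint::black_box` keeps the input opaque to the optimizer.

```rust
use std::hint::black_box;

/// Hypothetical hot function. `#[inline(never)]` asks the compiler to keep it
/// as a separate stack frame so it appears under its own name in the profile.
#[inline(never)]
fn checksum(data: &[u8]) -> u64 {
    data.iter()
        .fold(0u64, |acc, &b| acc.wrapping_mul(31).wrapping_add(u64::from(b)))
}

/// Hypothetical caller: builds a 64 KiB buffer and checksums it `rounds` times.
#[inline(never)]
fn run_workload(rounds: usize) -> u64 {
    let data: Vec<u8> = (0..=255u8).cycle().take(64 * 1024).collect();
    let mut total = 0u64;
    for _ in 0..rounds {
        // black_box hides the input from the optimizer so the call is not hoisted.
        total = total.wrapping_add(checksum(black_box(&data)));
    }
    total
}

fn main() {
    println!("total = {}", run_workload(2_000));
}
```

Built with the release-with-debug profile and run under cargo flamegraph, `checksum` should show up as a distinct frame above `run_workload`. Removing the attribute is a quick way to see how inlining changes the picture.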
# Install
cargo install cargo-bloat
# Show top functions by size
cargo bloat --release -n 20
# Show per-crate size breakdown
cargo bloat --release --crates
# Include only specific crate
cargo bloat --release --filter myapp
# Compare before/after a change
cargo bloat --release --crates > before.txt
# make changes
cargo bloat --release --crates > after.txt
diff before.txt after.txt
Typical output:
File   .text  Size     Crate  Name
2.4%   3.0%   47.0KiB  std    <std macros>
1.8%   2.3%   35.5KiB  myapp  myapp::heavy_module::process
1.2%   1.5%   23.1KiB  serde  serde::de::...
# Install
cargo install cargo-llvm-lines
# Show LLVM IR line counts (proxy for monomorphization)
cargo llvm-lines --release | head -40
# Filter to your crate only (each row starts with the counts, so do not anchor with ^)
cargo llvm-lines --release | grep 'myapp::'
Typical output:
Lines  Copies  Function name
85330  1       [LLVM passes]
7761   92      core::fmt::write
4672   11      myapp::process::<impl MyTrait for T>
3201   47      <alloc::vec::Vec<T> as core::ops::Drop>::drop
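A high Copies count means the function was monomorphized once per concrete type it is used with. To reduce it, keep the generic part thin and move the body into a non-generic inner function: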
// Before: generic, gets monomorphized for every T
fn process<T: AsRef<[u8]>>(data: T) -> usize {
do_work(data.as_ref())
}
// After: thin generic wrapper + concrete inner
fn process<T: AsRef<[u8]>>(data: T) -> usize {
fn inner(data: &[u8]) -> usize { do_work(data) }
inner(data.as_ref())
}
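A self-contained version of the thin-wrapper pattern might look like the sketch below. The body of `do_work` is a hypothetical stand-in (the original does not define it). Only the `as_ref()` call is duplicated per type, while `inner` is compiled once.

```rust
/// Hypothetical stand-in for `do_work`: counts the non-zero bytes.
fn do_work(data: &[u8]) -> usize {
    data.iter().filter(|&&b| b != 0).count()
}

/// Thin generic wrapper: only the `as_ref()` conversion is instantiated per `T`.
pub fn process<T: AsRef<[u8]>>(data: T) -> usize {
    // Non-generic inner function: compiled once and shared by every instantiation.
    fn inner(data: &[u8]) -> usize {
        do_work(data)
    }
    inner(data.as_ref())
}

fn main() {
    // Four different `T`s, but a single copy of `inner`.
    let from_str = process("abc");
    let from_string = process(String::from("abc"));
    let from_vec = process(vec![1u8, 0, 3]);
    let from_array = process([0u8, 0, 7, 9]);
    println!("{from_str} {from_string} {from_vec} {from_array}");
}
```

The payoff grows with the size of the inner body: the larger the non-generic part, the less LLVM IR is repeated per instantiation, which should show up as a lower Lines total for `process` in cargo llvm-lines.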
# Cargo.toml
[dev-dependencies]
criterion = { version = "0.5", features = ["html_reports"] }
[[bench]]
name = "my_bench"
harness = false
// benches/my_bench.rs
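use criterion::{black_box, criterion_group, criterion_main, BenchmarkId, Criterion};
// Assumption: the crate under test is a library named `myapp` that exports `process`.
use myapp::process;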
fn bench_process(c: &mut Criterion) {
// Simple benchmark
c.bench_function("process 1000 items", |b| {
let data: Vec<i32> = (0..1000).collect();
b.iter(|| process(black_box(&data))) // black_box prevents optimization
});
}
fn bench_sizes(c: &mut Criterion) {
let mut group = c.benchmark_group("process_sizes");
for size in [100, 1000, 10000].iter() {
let data: Vec<i32> = (0..*size).collect();
group.bench_with_input(
BenchmarkId::from_parameter(size),
&data,
|b, data| b.iter(|| process(black_box(data))),
);
}
group.finish();
}
criterion_group!(benches, bench_process, bench_sizes);
criterion_main!(benches);
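The benchmark file above calls a `process` function that the source never shows. As a rough sketch, assuming a hypothetical sum-of-squares body, the function and the role of `black_box` can be tried with the standard library alone. The `time_it` helper is illustrative only and does none of Criterion's warm-up, outlier detection, or statistics.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical function under test: sums the squares of the items.
pub fn process(data: &[i32]) -> i64 {
    data.iter().map(|&x| i64::from(x) * i64::from(x)).sum()
}

/// Crude timing loop: returns the last result and the mean time per call.
/// Assumes `iters > 0` (dividing a Duration by zero panics).
fn time_it(iters: u32, data: &[i32]) -> (i64, Duration) {
    let start = Instant::now();
    let mut last = 0;
    for _ in 0..iters {
        // black_box on the input keeps the compiler from precomputing the result;
        // black_box on the output keeps the call from being discarded as unused.
        last = black_box(process(black_box(data)));
    }
    (last, start.elapsed() / iters)
}

fn main() {
    let data: Vec<i32> = (0..1000).collect();
    let (result, per_call) = time_it(10_000, &data);
    println!("result = {result}, ~{per_call:?} per call");
}
```

A loop like this is only good for a sanity check. For numbers worth comparing across changes, the Criterion setup above remains the better tool.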
# Run all benchmarks
cargo bench
# Run specific benchmark
cargo bench --bench my_bench
# Run with filter
cargo bench -- process_sizes
# Compare with baseline (save/load)
cargo bench -- --save-baseline before
# make changes
cargo bench -- --baseline before
# View HTML report
open target/criterion/report/index.html
# Record
perf record -g ./target/release-with-debug/myapp args
perf record -g -F 999 ./target/release-with-debug/myapp args # higher freq
# Report
perf report # interactive TUI
perf report --stdio --no-call-graph | head -40 # text
# Annotate specific function
perf annotate myapp::hot_function
# stat (quick counters)
perf stat ./target/release/myapp args
Rust-specific perf tips:
- Use debug = "line-tables-only" (or debug = 1 for limited debug info) for faster builds with line-level attribution
- Set RUSTFLAGS="-C force-frame-pointers=yes" for better call graphs without DWARF unwinding
- Disable ASLR for reproducible addresses: setarch $(uname -m) -R ./myapp
# heaptrack (Linux)
heaptrack ./target/release/myapp args
heaptrack_print heaptrack.myapp.*.zst | head -50
# DHAT via Valgrind
valgrind --tool=dhat ./target/debug/myapp args
# Open the resulting dhat.out.<pid> file with dh_view.html
For flamegraph setup and Criterion configuration, see references/cargo-flamegraph-setup.md.
- skills/rust/rustc-basics for build configuration (debug symbols, profiles)
- skills/profilers/linux-perf for perf fundamentals
- skills/profilers/flamegraphs for reading and interpreting flamegraph SVGs
- skills/profilers/valgrind for allocation profiling with massif/DHAT