ahrav

96 verified skills · 96 total stars

plan-review

Use when a markdown plan file exists and needs validation before implementation — catches design flaws, logic holes, footguns, unnecessary complexity, and performance concerns while changes are still cheap

devops · 1 star

tla-spec

Use when writing or reviewing TLA+ specifications for coordination protocols, when verifying safety/liveness properties of distributed algorithms, or when TLC model checking fails and you need diagnostic guidance. Evidence-backed TLA+ correctness methodology.

development · 1 star

archive-src

Use when the user wants a minimal source-only archive for upload or checkpoint. Creates a tar.gz with just source and test files from all workspace crates.

testing · 1 star

bench-compare

Use when measuring optimization impact against a baseline, when validating that a code change didn't regress performance, or when comparing two implementation approaches. Criterion benchmark baseline comparison workflow.

development · 1 star

deep-research

Use when designing safety-critical code, distributed protocols, or novel algorithms where getting the design wrong is expensive. Parallel research agents survey papers, production systems, and prior art, then synthesize into an evidence-backed codebase plan.

development · 1 star

execute-review-findings

Use when you have code review findings, PR comments, or review reports that need to be systematically addressed — especially when there are multiple findings across different files and severities

development · 1 star

guide-sync

Use when the learning guide may be stale after codebase changes, when types have been renamed or modules deleted, or when verifying guide code examples still compile. Synchronizes gossip-rs-learning-guide with current codebase state.

development · 1 star

jepsen-test

Use when validating coordination correctness under real network partitions, when DST passes but you suspect distributed bugs, or before releasing coordination protocol changes. Runs Jepsen-style cluster tests via Maelstrom or full Jepsen.

testing · 1 star

perf-topdown

Use when you need to classify why code is slow (front-end vs back-end vs speculation), when hunting branch misprediction sites, after /bench-compare or /perf-regression finds a regression needing root cause, or when building an isolated hot-loop harness. Cross-arch TMA and branch tracing.

development · 1 star

plan-forge

Use when a task needs an implementation plan that is iteratively created and stress-tested through review-and-revise cycles before implementation begins — catches blind spots, incorrect codebase assumptions, unnecessary complexity, and performance pitfalls while changes are still cheap

development · 1 star

pr-comment-response

Use when PR review comments claim a bug or incorrect behavior, when multiple reviewer comments need systematic triage, or when a correctness claim needs proof before changing code. Verify-first PR comment response with evidence-based fixes.

development · 1 star

rule-optimize

Use when adding or modifying rules in default_rules.yaml, when benchmarking rule performance against test corpuses, or when validating regex anchors and keyword choices. Detection rule edit-bench-compare workflow.

testing · 1 star

run-runbook

Use when the user wants to launch a runbook, run an audit, review, or analysis task via Jetty, or says "run the X runbook". Also triggers on "launch runbook", "execute runbook", "run dedup-audit", "run security-reviewer", or any request to run a task from runbooks/.

testing · 1 star

sim-run

Use when sanity-checking coordination changes before commit, before merging coordination PRs, when a coordination bug is suspected, or when verifying a simulation-found fix. Progressive DST execution with seed management and fault injection.

testing · 1 star

test-consolidate

Use when a test module has many similar unit tests, when repetitive assertions could be replaced by property-based or parameterized tests, or when test maintenance cost is high. Consolidates verbose suites into rstest, proptest, or fuzz tests.

testing · 1 star

bench-compare

Run Criterion benchmarks with baseline comparison for performance optimization work

tools · 1 star

create-task

Use when creating any beads task — auto-researches the codebase, links related tasks, and produces a rich self-contained description from a structured template. Accepts minimal intent and outputs a complete task ready for agent implementation.

development · 1 star

deeper-research

Comprehensive 6-phase research funnel — 8-10 parallel survey agents sweep wide, a synthesizer compiles evidence, deep-dive and adversarial agents run in parallel to elaborate and challenge findings, a final synthesizer reconciles everything, and an integrator maps verified findings to a concrete codebase plan with full traceability

development · 1 star

design-tournament

Run a parallel diverge-then-converge design tournament — 3-5 independent agents explore a problem, then 2 ranking agents evaluate and stack-rank the results with confidence scores

data-ai · 1 star

interface-design-review

Review Rust interfaces for ease of correct use and resistance to misuse, applying "make interfaces easy to use correctly and hard to use incorrectly"

development · 1 star

jepsen-test

Run Jepsen-style cluster tests using Maelstrom (lightweight) or full Jepsen (heavyweight) — validates correctness of the deployed gossip-rs system with real network behavior, complementing in-process DST

development · 1 star

performance-analyzer

Analyze Rust code for performance issues, allocation hot spots, and optimization opportunities

development · 1 star

pr-comment-response

Respond to PR review comments by building the smallest proof that confirms or refutes the claim before changing code or docs — never blindly trust a reviewer

development · 1 star

review-task

Use when a beads task exists and needs validation before implementation — verifies codebase references, identifies edge cases and design flaws, assesses scope and feasibility, splits oversized tasks, dispatches domain-specific skills (test-strategy, unsafe-review, dist-sys-auditor, simd-optimize, asm-forge, performance-analyzer, security-reviewer, interface-design-review, sim-review, safe-over-unsafe) for specialized enrichment, and dispatches /deep-research or /deeper-research for ambiguous areas. The complement of /create-task — ensures tasks are buttoned up and ready for mechanical implementation.

development · 1 star

run-fuzz

Run cargo-fuzz targets with proper nightly toolchain and options

tools · 1 star

safe-over-unsafe

Use when designing safe public APIs that wrap unsafe Rust code, adding unsafe blocks to existing types, reviewing unsafe code for soundness, or creating new types backed by raw pointers, MaybeUninit, or FFI

development · 1 star
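One common shape for a safe API over `MaybeUninit`, sketched with a hypothetical `InlineVec` type (not code from this repository): every `unsafe` operation relies on a single documented invariant (slots `[..len]` are initialized), and every public method upholds it, so callers never write `unsafe` themselves.

```rust
use std::mem::MaybeUninit;

/// Hypothetical fixed-capacity vector stored inline.
/// Invariant: `slots[..len]` are initialized, `slots[len..]` are not.
pub struct InlineVec<T, const N: usize> {
    slots: [MaybeUninit<T>; N],
    len: usize,
}

impl<T, const N: usize> InlineVec<T, N> {
    pub fn new() -> Self {
        // Every slot starts uninitialized and len = 0, so the invariant holds.
        Self { slots: std::array::from_fn(|_| MaybeUninit::uninit()), len: 0 }
    }

    pub fn len(&self) -> usize {
        self.len
    }

    pub fn is_empty(&self) -> bool {
        self.len == 0
    }

    /// Hands the value back instead of panicking when full.
    pub fn push(&mut self, value: T) -> Result<(), T> {
        if self.len == N {
            return Err(value);
        }
        self.slots[self.len].write(value);
        self.len += 1; // the slot is initialized before len covers it
        Ok(())
    }

    pub fn get(&self, index: usize) -> Option<&T> {
        if index < self.len {
            // SAFETY: index < len, and slots[..len] are initialized.
            Some(unsafe { self.slots[index].assume_init_ref() })
        } else {
            None
        }
    }

    pub fn pop(&mut self) -> Option<T> {
        if self.len == 0 {
            return None;
        }
        self.len -= 1;
        // SAFETY: slot `len` was initialized; len was lowered first, so the
        // slot is now treated as uninitialized and is read exactly once.
        Some(unsafe { self.slots[self.len].assume_init_read() })
    }
}

impl<T, const N: usize> Drop for InlineVec<T, N> {
    fn drop(&mut self) {
        for slot in &mut self.slots[..self.len] {
            // SAFETY: slots[..len] are initialized and are dropped only here.
            unsafe { slot.assume_init_drop() };
        }
    }
}
```

The soundness argument lives in one place (the invariant comment), which is what makes each `SAFETY` note short and checkable.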

sim-review

Simulation-testability code review — enforces DST-compatible patterns in coordination, gossip, and pipeline code based on FoundationDB, TigerBeetle, sled, and Firezone evidence

development · 1 star

sim-scaffold

Scaffold simulation-testable modules with sans-IO pattern, proptest state machine tests, and fault injection points — prevents retrofitting costs by making code DST-ready from the start

development · 1 star
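As a rough illustration of the sans-IO pattern, the sketch below uses hypothetical names (`FailureDetector`, `Input`, `Output`; not code from gossip-rs). All protocol logic sits in a deterministic state machine: the driver passes in events together with the current time and receives the effects to perform, so a simulator can drive the same code with a virtual clock and injected faults.

```rust
/// Events the driver (real runtime or simulator) feeds in. Time is passed
/// explicitly, so the state machine never reads a clock.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Input {
    HeartbeatReceived { now_ms: u64 },
    Tick { now_ms: u64 },
}

/// Effects the driver must carry out; the state machine performs no IO.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Output {
    MarkPeerSuspect,
}

/// Hypothetical single-peer failure detector in sans-IO style.
pub struct FailureDetector {
    timeout_ms: u64,
    last_seen_ms: u64,
    suspect: bool,
}

impl FailureDetector {
    pub fn new(timeout_ms: u64, start_ms: u64) -> Self {
        Self { timeout_ms, last_seen_ms: start_ms, suspect: false }
    }

    pub fn is_suspect(&self) -> bool {
        self.suspect
    }

    /// Deterministic transition: consumes one input, returns the effects.
    pub fn handle(&mut self, input: Input) -> Vec<Output> {
        match input {
            Input::HeartbeatReceived { now_ms } => {
                self.last_seen_ms = now_ms;
                self.suspect = false;
                Vec::new()
            }
            Input::Tick { now_ms } => {
                let silent_for = now_ms.saturating_sub(self.last_seen_ms);
                // Emit the effect only on the transition into "suspect".
                if !self.suspect && silent_for > self.timeout_ms {
                    self.suspect = true;
                    vec![Output::MarkPeerSuspect]
                } else {
                    Vec::new()
                }
            }
        }
    }
}
```

A proptest state-machine test can then generate arbitrary `Input` sequences and check invariants against the returned outputs without any sockets or timers.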

test-consolidate

Consolidate verbose test suites by replacing repetitive unit tests with property-based tests, parameterized tests (rstest), or fuzz tests. Less code to maintain, same or better coverage.

development · 1 star
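A std-only sketch of the consolidation idea, using a hypothetical `parse_port` function: one table of cases replaces a row of near-identical `#[test]` functions. With rstest, the same table would become `#[case(...)]` attributes on a single test function; proptest would instead generate inputs from a strategy.

```rust
/// Hypothetical function under test: parses a non-zero TCP port.
pub fn parse_port(s: &str) -> Option<u16> {
    let n: u16 = s.trim().parse().ok()?;
    if n == 0 { None } else { Some(n) }
}

/// Each row was previously its own test function.
pub const CASES: &[(&str, Option<u16>)] = &[
    ("80", Some(80)),
    (" 443 ", Some(443)),
    ("65535", Some(65535)),
    ("0", None),
    ("65536", None), // overflows u16
    ("-1", None),
    ("", None),
];

/// Runs every case and reports the first mismatch with its input.
pub fn check_all() -> Result<(), String> {
    for &(input, expected) in CASES {
        let got = parse_port(input);
        if got != expected {
            return Err(format!("parse_port({input:?}) = {got:?}, expected {expected:?}"));
        }
    }
    Ok(())
}
```

Adding a boundary case becomes a one-line table change rather than a new function.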

test-strategy

Assess and recommend the appropriate testing strategy for Rust code: unit tests, parameterized tests (rstest), property-based tests, fuzz tests, Kani model checking, or simulation testing

development · 1 star

unsafe-review

Comprehensive review of unsafe code — audits safety invariants, demands benchmark+ASM proof of performance benefit, and verifies Miri/Kani/fuzz/property test coverage for every unsafe block

development · 1 star

archive-full

Use when the user wants to package all source code into a tar.gz archive for upload or checkpoint. Creates a comprehensive archive of all workspace crates, docs, and config excluding binaries.

development · 1 star

asm-forge

ASM-guided deep performance optimization. Collects assembly, audits codegen quality, applies targeted transforms, validates with benchmarks. Uses cargo-show-asm + Criterion as ground truth.

development · 1 star

causal-profile

Use when flamegraph/perf profiling identified hot functions but you are unsure which are on the critical path, when optimizing a hot function yields no measurable improvement, when concurrent code has hidden contention or pipeline imbalance, or when you need to prioritize optimization effort across multiple hot spots. Linux-only, synchronous code paths only (not async/Tokio).

development · 1 star

design-doc-audit

Use when design documents in docs/ may be stale after code changes, when verifying boundary specs match current types and APIs, when checking for missing documentation coverage of new crates or features, or before merging branches that touch documented subsystems.

development · 1 star

dist-sys-auditor

Distributed systems design and implementation auditor — enforces evidence-backed coordination decisions, citation requirements, invariant tracking, and correctness verification against academic literature, battle-tested systems, and the project's locked architectural decisions

testing · 1 star

doc-rigor

Write-then-verify documentation pipeline. Use when a user asks to improve comments or docs, explain algorithms or design choices, write or upgrade docstrings, or raise documentation quality for a codebase (especially Rust crates). Writes docs, then automatically verifies every claim against code reality using a fresh agent to eliminate confirmation bias.

development · 1 star

doc-verify

Verify documentation accuracy against code reality and external claims — runs as a fresh agent after /doc-rigor to prevent confirmation bias

development · 1 star

invariant-test-review

Review tests to ensure they actually prove the claimed invariant, especially state-machine, simulation, oracle, and regression tests where extra setup, missing negative paths, or order-sensitive comparisons can hide the real signal

testing · 1 star
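An example of the order-sensitivity pitfall, assuming the invariant under test is "every sent message is delivered exactly once, in any order" (a hypothetical invariant, chosen for illustration). Comparing `Vec`s directly fails on harmless reorderings, so the check below compares sorted copies instead; the accompanying assertions also exercise a negative path, confirming the check rejects a duplicated delivery rather than passing vacuously.

```rust
/// Order-insensitive multiset equality: sort copies, then compare.
/// Duplicates still count, so [1, 1, 2] and [1, 2, 2] are different.
pub fn same_multiset<T: Ord + Clone>(a: &[T], b: &[T]) -> bool {
    let mut a = a.to_vec();
    let mut b = b.to_vec();
    a.sort();
    b.sort();
    a == b
}
```

If the protocol does promise an order, the plain `Vec` comparison is the right check; the point is to match the comparison to the invariant actually claimed.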

linux-perf-profile

Deep Linux perf profiling — PMU counters, topdown analysis, flamegraphs, and annotated hotspot drill-down on ARM/Graviton

research · 1 star

perf-regression

Performance regression testing workflow for hot path changes

testing · 1 star

pgo-bolt

Use when optimizing Rust binary performance via profile-guided compilation and post-link layout — squeezing 10-30% from I-cache, branch prediction, and function placement without source changes

development · 1 star

review-dispatch

Parallel specialist code review — 6 focused agents (correctness, design, performance, safety, docs, complexity) diverge independently, then a single ranker merges findings into an importance-ranked report with confidence scores

development · 1 star

rule-optimize

Workflow for modifying and benchmarking detection rules

tools · 1 star

security-reviewer

Audit memory safety and security in unsafe code blocks, buffer handling, and security-sensitive operations

development · 1 star

sim-run

Run deterministic simulation tests with progressive difficulty levels (sunny/stormy/radioactive) inspired by TigerBeetle VOPR — orchestrates seed management, workload selection, and invariant verification

testing · 1 star
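A minimal sketch of seed management, assuming a SplitMix64 generator and made-up per-mille fault rates for the sunny/stormy/radioactive levels (the skill's actual rates and fault types may differ). Every fault decision derives from one seed, so replaying a failing seed reproduces the exact schedule.

```rust
/// SplitMix64: a small, well-known deterministic PRNG
/// (fine for fault scheduling, not for cryptography).
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Level {
    Sunny,
    Stormy,
    Radioactive,
}

impl Level {
    /// Hypothetical fault probability per step, in parts per thousand.
    fn fault_per_mille(self) -> u64 {
        match self {
            Level::Sunny => 0,
            Level::Stormy => 50,
            Level::Radioactive => 200,
        }
    }
}

/// For each simulated step, whether a fault is injected. The same
/// (seed, level, steps) always yields the same schedule; the slight
/// modulo bias of `% 1000` is ignored here.
pub fn fault_schedule(seed: u64, level: Level, steps: usize) -> Vec<bool> {
    let mut rng = SplitMix64::new(seed);
    let p = level.fault_per_mille();
    (0..steps).map(|_| rng.next_u64() % 1000 < p).collect()
}
```

On failure, logging the seed and level is enough to rerun the identical scenario under a debugger.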

sqlite-review

Review and tune SQLite schemas, queries, indexes, and pragmas. Connects to the actual database to gather concrete evidence (EXPLAIN QUERY PLAN, page counts, table stats) before recommending changes.

data-ai · 1 star

tla-spec

TLA+ specification correctness guide — evidence-backed methodology for writing correct temporal logic specs, covering canonical form, abstraction selection, safety/liveness decomposition, fairness, TLC soundness, and distributed systems patterns, with every rule grounded in literature

testing · 1 star

asm-forge

Use when /performance-analyzer identifies a hot function, when /bench-compare shows regression and you need instruction-level analysis, or when you suspect bounds checks or register spills in a tight loop. ASM-guided optimization with cargo-show-asm + Criterion.

development · 1 star

.claude/skills/autoresearch

Claude Autoresearch: autonomous goal-directed iteration, inspired by Karpathy's autoresearch (https://github.com/karpathy/autoresearch). Applies constraint-driven autonomous iteration to any work, not just ML research: loops autonomously (modify, verify, keep/discard, repeat). Supports bounded iteration via the `Iterations: N` inline config. Version 1.9.11.

development · 1 star

dist-sys-auditor

Use when adding or modifying coordination protocols, implementing consensus or gossip mechanisms, or changing distributed state management. Audits designs against academic literature and battle-tested systems with citation requirements.

development · 1 star

doc-verify

Use when /doc-rigor has written or updated documentation and you need independent accuracy verification, or when existing docs may contain stale API claims or wrong command examples. Fresh-agent verification against code reality.

development · 1 star

doc-rigor-verify

Use when writing or updating documentation that makes API claims, includes command examples, or states platform-specific behavior. Write-then-verify pipeline where a fresh agent checks accuracy against code reality with zero confirmation bias.

development · 1 star

first-principles

Deep first-principles code explanation that builds real understanding through phased walkthroughs with diagrams. Covers algorithms, data structures, memory layout, concurrency patterns, and performance tricks — especially for systems code in Rust. Use whenever the user asks to explain, walk through, break down, deep dive into, or understand code. Trigger on "how does this work", "what's happening here", "teach me about this", "why is it done this way", or when the user references a file with @ and wants to understand it. Proactively use when examining code involving lock-free algorithms, atomics/CAS, memory ordering,

development · 1 star

linux-perf-profile

Use when profiling on Linux/ARM/Graviton targets, when you need PMU counter data beyond what flamegraphs show, or when /perf-topdown identifies a bottleneck class that needs source-level drill-down. Deep perf profiling with annotated hotspot analysis.

data-ai · 1 star

perf-pipeline

Use when /bench-compare or /perf-regression identifies a regression needing root cause, when multiple performance dimensions need simultaneous triage, or when optimization work should be dispatched automatically. Two-phase diagnose-then-optimize pipeline.

devops · 1 star

perf-regression

Use when modifying hot-path code in coordination or scanner engine, before merging performance-sensitive changes, or when CI benchmarks flag a regression. Performance regression testing with before/after comparison.

development · 1 star

postgres-review

Use when designing or auditing PostgreSQL schemas, reviewing migrations for lock safety, investigating query performance, or optimizing indexes and partitioning. Connects to the database for concrete evidence via EXPLAIN ANALYZE and pg_stat_* views.

testing · 1 star

review-pipeline

Use when you want review AND automated fixes in one pass, when /review-dispatch alone would leave findings unaddressed, or before merging a feature branch that needs thorough diagnosis and remediation. Two-phase diagnose-then-fix pipeline.

devops · 1 star

run-fuzz

Use when testing gossip-contracts or gossip-stdx data structures for crashes, when verifying new Arbitrary impls, or when reproducing a fuzz crash artifact. Runs cargo-fuzz targets with nightly toolchain.

tools · 1 star

security-reviewer

Use when modifying unsafe blocks, adding parsing or decoding logic, changing buffer pool or scratch internals, or before merging changes to data structure implementations with raw pointers. Memory safety and security audit.

testing · 1 star

sqlite-review

Use when designing or auditing SQLite schemas, investigating slow queries, tuning indexes or pragmas, or reviewing WAL/journal configuration. Connects to the database for concrete evidence via EXPLAIN QUERY PLAN and page stats.

testing · 1 star

task-forge

Use when creating implementation-ready beads tasks that need testing strategy, optimal implementation approach, and documentation requirements baked in — composes /create-task with parallel enrichment agents that analyze the codebase and produce concrete test specifications, algorithm/data-structure guidance, and doc quality standards so implementing agents don't need to re-research

development · 1 star

test-strategy

Use when writing tests for new code and unsure which test type fits, when choosing between unit/rstest/proptest/fuzz/kani/sim, or when coordination or unsafe code changes need test coverage guidance. Recommends the optimal testing approach per code characteristic.

development · 1 star

unsafe-review

Use when adding or modifying unsafe blocks, when reviewing code that uses raw pointers or transmute, or before merging changes to types with unsafe internals. Audits safety invariants and demands benchmark+ASM proof of performance benefit.

development · 1 star

simd-optimize

Use when /asm-forge shows autovectorization missed opportunities, when hot loops process arrays of bytes or integers, or when porting x86 SIMD to ARM NEON/SVE. Generates platform-specific intrinsics with correctness and performance validation.

testing · 1 star

doc-rigor-verify

Write-then-verify documentation pipeline — a doc-rigor agent writes/improves docs, then a separate fresh doc-verify agent checks accuracy of API claims, command examples, units, and platform assumptions against code reality with zero confirmation bias

development · 1 star

sim-scaffold

Use when creating a new module in gossip-coordination, adding a gossip protocol component, or building a pipeline stage that touches distributed state. Generates DST-ready boilerplate with sans-IO pattern and proptest harnesses.

development · 1 star

test-pipeline

Use when implementing a new feature and assessing coverage gaps, during periodic test hygiene, when test suites feel bloated, or before merging code that changes coordination or hot paths. Two-phase assess-then-improve testing pipeline.

development · 1 star

sim-review

Use when modifying gossip-coordination or coordination contracts, after adding trait methods to CoordinationBackend, after changing state machine transitions, or before marking coordination PRs ready. DST-compatibility code review.

development · 1 star

review-dispatch

Use when preparing to merge a feature branch, after completing a significant implementation, or when critical code paths need deeper review than a single pass. Six parallel specialist agents plus ranked synthesis.

development · 1 star

test-dedup

Review test suites for duplicate, redundant, or low-value tests — especially unit tests already subsumed by property-based tests. Remove noise, keep signal.

testing · 1 star

semantic-search

Use when exploring the codebase conceptually — semantic search via claude-context MCP for queries like "how does X work", "find implementation of Y pattern", "where is the architecture for Z", understanding unfamiliar code, finding code by description rather than exact identifier

tools · 1 star

test-dedup

Use when test suites feel bloated, when unit tests duplicate coverage already provided by property-based or simulation tests, or during periodic test hygiene. Identifies and removes redundant tests while keeping signal.

testing · 1 star

invariant-test-review

Use when writing or reviewing state-machine tests, simulation tests, oracle tests, or regression tests to verify they actually prove the claimed invariant. Catches hidden weaknesses like missing negative paths and order-sensitive comparisons.

testing · 1 star

design-tournament

Use when facing a design decision with multiple viable approaches, when you want competing proposals evaluated objectively, or when brainstorming needs structured diverge-then-converge evaluation. 3-5 independent design agents plus ranked synthesis.

testing · 1 star

pr-explainer

Explain a PR's purpose, motivation, and architectural context with ASCII diagrams. Use when the user wants to understand what a PR does, why it exists, how it fits into the system, or asks for a visual summary of changes. Triggers on "explain this PR", "what does this PR do", "summarize this branch", "show me what changed", or `/pr-explainer`.

testing · 1 star

performance-analyzer

Use when writing hot-path code in coordination or scanner engine, before committing changes to scanner-engine modules, when benchmarks show unexpected regressions, or during optimization of gossip-stdx data structures. Static performance analysis.

development · 1 star

interface-design-review

Use when designing new public APIs, adding trait methods, or refactoring type signatures to ensure they are easy to use correctly and hard to use incorrectly. Reviews Rust interfaces for misuse resistance.

development · 1 star
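A small example of the "easy to use correctly, hard to use incorrectly" principle, using hypothetical `ShardId`, `NodeId`, and `ReplicationFactor` types (not APIs from this codebase): distinct newtypes turn swapped arguments into a compile error, and a validating constructor makes out-of-range values unrepresentable outside the defining module. The 1..=7 bound is arbitrary.

```rust
/// Distinct newtypes: passing a NodeId where a ShardId is expected no longer compiles.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct ShardId(pub u32);

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct NodeId(pub u32);

/// The field is private, so outside this module `new` is the only way in.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ReplicationFactor(u8);

impl ReplicationFactor {
    pub const MAX: u8 = 7; // arbitrary illustrative bound

    pub fn new(n: u8) -> Option<Self> {
        (1..=Self::MAX).contains(&n).then_some(Self(n))
    }

    pub fn get(self) -> u8 {
        self.0
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Assignment {
    pub shard: ShardId,
    pub node: NodeId,
    pub replicas: ReplicationFactor,
}

/// With plain `u32, u32, u8` parameters, `assign(node, shard, 0)` would
/// compile silently; with these types it cannot.
pub fn assign(shard: ShardId, node: NodeId, replicas: ReplicationFactor) -> Assignment {
    Assignment { shard, node, replicas }
}
```

Validation happens once at the boundary, so code that receives a `ReplicationFactor` never has to re-check it.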

doc-code-audit

Use when design docs in docs/ may be stale after code changes, when verifying diagrams match current types, or when checking that prose claims about invariants and APIs still hold. Audits documentation against code reality with incremental and full modes.

development · 1 star

dedup-audit

Systematic multi-pass code deduplication audit for Rust workspaces. Use when duplication has accumulated across crates, when error boilerplate is excessive, when repeated From/Display/Error impls appear across modules, when onboarding thiserror, or when establishing CI duplication gates. Triggers on "find duplicates", "reduce duplication", "dedup audit", "thiserror migration", "error boilerplate".

development · 1 star

create-runbook

Use when the user wants to create a new runbook, write agent instructions for a repeatable task, or says "create a runbook for X", "write a runbook", "new runbook". Also triggers on "make this into a runbook" or converting an existing skill into a runbook.

documentation · 1 star

simd-optimize

SIMD vectorization for Rust — detects ISA features, identifies vectorizable patterns, generates platform-specific intrinsics (ARM NEON/SVE, x86 SSE/AVX/AVX-512), validates correctness and performance. Uses tiered research with baked-in references and /deep-research fallback.

development · 1 star

deep-research

Deep research before design — 3-5 parallel research agents survey papers, production systems, failure modes, and prior art, then a synthesizer compiles evidence, and an integrator maps findings to a concrete codebase plan with citations

development · 1 star

heap-profile

Use when AllocGuard trips and you need the call site, when /performance-analyzer flags allocations but you need attribution, when verifying HOT-tier allocation silence, or when /bench-compare shows regression and you suspect allocation overhead. Heap allocation profiling with DHAT.

testing · 1 star

deeper-research

Use when /deep-research isn't thorough enough, when a topic needs adversarial challenge and deep-dive elaboration, or when producing a polished research report for a complex design decision. 6-phase funnel with 8-10 parallel survey agents plus adversarial review.

testing · 1 star