.codex/skills/doc-verify/SKILL.md
Verify documentation accuracy against code reality and external claims — runs as a fresh agent after /doc-rigor to prevent confirmation bias
Verify that documentation is factually correct against what the code actually does and what external sources actually say. The verifier has zero knowledge of author intent — every claim is checked against code reality and external sources. Trust nothing, verify everything.
Run after `/doc-rigor` generates or updates documentation. Related skills: `/doc-rigor`, `/review-dispatch`, `/dist-sys-auditor`, `/interface-design-review`.

Usage: `/doc-verify [files]`. If a directory is given, verify all `.rs` files in that directory.

**CRITICAL: Non-negotiable.** Every verification runs inside a fresh Task agent with `subagent_type="general-purpose"`. The invoking agent resolves scope and dispatches; the fresh agent performs all verification.
Why: The invoking agent may have written or reviewed the documentation. A fresh agent has zero prior context about what the documentation should say — it can only check what it does say against what the code does do. This eliminates confirmation bias.
Never run verification inline. Always dispatch to a fresh agent.
The invoking agent performs these steps before dispatching the verification agent:
If no files are given, run `git diff --name-only HEAD` and `git diff --cached --name-only` to find recently changed `.rs` files.

Parse any flags from the invocation:
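A minimal sketch of the default scope-resolution step, assuming the output of both `git diff` commands has already been captured as strings. The function name `changed_rs_files` is hypothetical. It merges both lists, keeps only `.rs` paths, and removes duplicates, since a file can appear in both lists when it has staged and unstaged changes.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: merge the output of `git diff --name-only HEAD`
/// and `git diff --cached --name-only` into a sorted, de-duplicated list
/// of Rust source files.
fn changed_rs_files(unstaged: &str, staged: &str) -> Vec<String> {
    // BTreeSet keeps paths unique and sorted.
    let mut files = BTreeSet::new();
    for line in unstaged.lines().chain(staged.lines()) {
        let path = line.trim();
        if path.ends_with(".rs") {
            files.insert(path.to_string());
        }
    }
    files.into_iter().collect()
}
```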
| Flag | Effect |
|------|--------|
| --code-only | Skip external claim verification (Step D) |
| --external-only | Skip code-level verification (Step B), only check external claims |
| --strict | Promote all WARNs to BLOCKs |
| --summary | Output only the summary table, suppress detailed write-ups |
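One way to model these flags, sketched under assumptions: the `Flags` struct and `parse_args` function are hypothetical, and rejecting `--code-only` together with `--external-only` is an assumption (combined, they would skip both verification passes), not something the flag table states.

```rust
/// Hypothetical representation of the parsed invocation flags.
#[derive(Debug, Default, PartialEq)]
struct Flags {
    code_only: bool,     // skip external claim verification (Step D)
    external_only: bool, // skip code-level verification (Step B)
    strict: bool,        // promote all WARNs to BLOCKs
    summary: bool,       // output only the summary table
}

/// Splits invocation arguments into recognized flags and target paths.
/// Unknown `--` arguments are reported as errors rather than ignored.
fn parse_args<'a>(args: &[&'a str]) -> Result<(Flags, Vec<&'a str>), String> {
    let mut flags = Flags::default();
    let mut targets = Vec::new();
    for &arg in args {
        match arg {
            "--code-only" => flags.code_only = true,
            "--external-only" => flags.external_only = true,
            "--strict" => flags.strict = true,
            "--summary" => flags.summary = true,
            other if other.starts_with("--") => return Err(format!("unknown flag: {other}")),
            path => targets.push(path),
        }
    }
    // Assumption: these two flags together would verify nothing, so reject them.
    if flags.code_only && flags.external_only {
        return Err("--code-only and --external-only are mutually exclusive".into());
    }
    Ok((flags, targets))
}
```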
Launch a Task agent with subagent_type="general-purpose". The prompt must include:
These instructions are sent to the fresh verification agent.
Read every doc comment in the target files. Extract every testable claim — any statement that can be verified as true or false against the code or an external source.
Categorize each claim:
| Category | What to look for |
|----------|------------------|
| Behavioral | "this function returns X when Y", "panics if Z", "calls W internally" |
| Invariant | "this value is always positive", "the list is sorted", "monotonically increasing" |
| Type/Structural | "contains N fields", "implements trait T", "generic over X" |
| Count | "there are N variants", "supports M strategies", "N phases" |
| Relationship | "A calls B", "X depends on Y", "Z wraps W" |
| Precondition/Postcondition | "caller must ensure X", "on return, Y holds" |
| Complexity | "O(n log n)", "amortized O(1)", "linear in the number of X" |
| Performance | "zero-copy", "no allocation", "lock-free", "wait-free" |
| Safety | "safe because X", "unsafe requires Y", "SAFETY: Z" |
| External | "uses algorithm from [paper]", "follows RFC N", "compatible with library X" |
| Negative | "does NOT do X", "never Y", "no Z" |
For each claim, record:
For each claim category, verify against the actual code:
Behavioral: Trace the code path. Does the function actually return X when Y? Does it actually panic if Z? Read the implementation, not just the signature.
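A small illustration of why the body matters more than the signature, using hypothetical functions: both have the same signature and the same doc claim, but only one upholds it.

```rust
/// Doc claim: "returns `None` on empty input".
/// WRONG body: indexing an empty slice panics, so the claim is false.
fn first_byte_stale(input: &[u8]) -> Option<u8> {
    Some(input[0])
}

/// Same doc claim, upheld: `first()` returns `None` for an empty slice.
fn first_byte(input: &[u8]) -> Option<u8> {
    input.first().copied()
}
```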
Invariant: Check constructors, mutation methods, and all code paths that touch the value. Is the invariant enforced everywhere, or only in some paths?
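A sketch of an invariant enforced in some paths but not others. The `Positive` type is hypothetical: its constructor and `checked_sub` enforce "always positive", while `sub_unchecked` is the kind of mutation path the verifier should flag.

```rust
/// Documented invariant: "the value is always positive".
#[derive(Debug, Clone, Copy, PartialEq)]
struct Positive(i64);

impl Positive {
    /// Constructor enforces the invariant: rejects zero and negatives.
    fn new(v: i64) -> Option<Self> {
        if v > 0 { Some(Positive(v)) } else { None }
    }

    /// Mutation path that also enforces it: overflow yields `None`,
    /// and `new` rejects any result <= 0.
    fn checked_sub(self, d: i64) -> Option<Self> {
        Positive::new(self.0.checked_sub(d)?)
    }

    /// Mutation path that BREAKS the invariant: nothing stops the
    /// stored value from reaching zero or going negative.
    fn sub_unchecked(&mut self, d: i64) {
        self.0 -= d;
    }
}
```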
Type/Structural: Count the actual fields, variants, or implementations. Open the
struct/enum definition and count. Check impl blocks for trait implementations.
Count: Count the actual items. If the doc says "5 variants", count the variants. If it says "3 phases", trace the phases. Off-by-one counts are a WARN.
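A sketch of how a count claim drifts, using a hypothetical `Phase` enum: the doc still says three phases after a fourth variant was added, which is an off-by-one count (a WARN). Keeping an explicit `ALL` list and an exhaustive `match` next to the enum makes the real count easy to check.

```rust
/// Phases of a hypothetical sync round.
///
/// Stale doc claim a verifier should catch: "There are 3 phases."
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Phase {
    Handshake,
    Exchange,
    Verify,
    Finalize, // added in a later refactor; the doc was never updated
}

impl Phase {
    /// Every variant, in order; `Phase::ALL.len()` is the real count.
    const ALL: [Phase; 4] = [Phase::Handshake, Phase::Exchange, Phase::Verify, Phase::Finalize];

    /// Exhaustive match with no wildcard arm: adding a variant forces this
    /// function to be updated, a cue to recheck count claims in the docs.
    fn index(self) -> usize {
        match self {
            Phase::Handshake => 0,
            Phase::Exchange => 1,
            Phase::Verify => 2,
            Phase::Finalize => 3,
        }
    }
}
```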
Relationship: Verify call graphs. Does A actually call B? Use Grep to check. Does X actually depend on Y? Check imports and usage.
Precondition/Postcondition: Check if the precondition is enforced (assert/debug_assert) or just documented. Check if the postcondition actually holds by reading the return paths.
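A sketch of a precondition that is enforced rather than merely documented, with a postcondition checkable from the return path. The function `chunk_count` is hypothetical. With `debug_assert!` instead of `assert!`, the check would only run in debug builds, and a claim such as "always checked" would not hold in release.

```rust
/// Returns how many chunks of `chunk_size` are needed to cover `len` items.
///
/// Precondition: `chunk_size` must be non-zero. Enforced by `assert!`,
/// so "panics if `chunk_size` is 0" holds in every build profile.
/// Postcondition: `result * chunk_size >= len` (ceiling division).
fn chunk_count(len: usize, chunk_size: usize) -> usize {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    // Ceiling division without the overflow risk of `len + chunk_size - 1`.
    len / chunk_size + usize::from(len % chunk_size != 0)
}
```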
Complexity: Verify the algorithm matches the claimed complexity. Count nested loops, check data structure operations. A HashMap lookup claimed as O(1) is fine; a Vec scan claimed as O(1) is a BLOCK.
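The rule above, illustrated with two hypothetical lookups that carry the same "O(1) lookup" claim. The linear scan is the BLOCK case; the `HashMap` lookup is expected (average-case) O(1), so the claim is acceptable there.

```rust
use std::collections::HashMap;

/// Doc claim: "O(1) lookup". The body scans the whole slice, so it is O(n).
fn lookup_scan(entries: &[(u64, &'static str)], key: u64) -> Option<&'static str> {
    entries.iter().find(|(k, _)| *k == key).map(|(_, v)| *v)
}

/// Doc claim: "O(1) lookup". A `HashMap::get` is expected O(1).
fn lookup_hashed(index: &HashMap<u64, &'static str>, key: u64) -> Option<&'static str> {
    index.get(&key).copied()
}
```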
Performance: "Zero-copy": check for .clone(), .to_vec(), .to_string() in the path. "No allocation": check for Box::new(), String::from(), non-empty vec![...], Vec::with_capacity(n), and pushes that can grow a Vec (Vec::new() by itself does not allocate). "Lock-free": check for Mutex, RwLock, or any blocking operations.
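A sketch contrasting a zero-copy claim that holds with one that does not. The `Record` type and both parsers are hypothetical. The borrowed version returns a slice of the input; the owned version calls `.to_string()`, which allocates.

```rust
/// A parsed record that borrows its field from the input buffer.
struct Record<'a> {
    name: &'a str,
}

/// Doc claim: "zero-copy". Holds: `name` is a sub-slice of `line`;
/// no `.clone()`, `.to_vec()`, or `.to_string()` on this path.
fn parse_borrowed(line: &str) -> Option<Record<'_>> {
    let name = line.split(',').next()?.trim();
    Some(Record { name })
}

/// Doc claim: "zero-copy". Fails: `.to_string()` allocates a new String.
fn parse_owned(line: &str) -> Option<String> {
    Some(line.split(',').next()?.trim().to_string())
}
```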
Safety: For unsafe blocks, verify the stated safety justification. Does the code
actually uphold the invariants claimed in the // SAFETY: comment?
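A sketch of a `// SAFETY:` comment a verifier can check mechanically. The function `sum_prefix` is hypothetical. The justification holds only because the bounds check runs on every path that reaches the `unsafe` block; deleting that check would make the comment false.

```rust
/// Sums the first `n` elements, or returns `None` if `n > data.len()`.
fn sum_prefix(data: &[u32], n: usize) -> Option<u64> {
    if n > data.len() {
        return None;
    }
    let mut total = 0u64;
    for i in 0..n {
        // SAFETY: `i < n` and `n <= data.len()` (checked above), so `i` is in bounds.
        total += u64::from(unsafe { *data.get_unchecked(i) });
    }
    Some(total)
}
```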
Negative: These are often the most important claims. "Never panics": check for .unwrap(), .expect(), indexing, division, and arithmetic that can overflow (overflow panics in debug builds). "No allocation": check for heap allocations. Verify the absence of what the doc says is absent.
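A sketch of a "never panics" claim that survives this check. The function `average_of_first` is hypothetical; each operation a verifier would flag (slice indexing, `+`, `/`) is replaced by a non-panicking equivalent that returns `None`.

```rust
/// Doc claim: "never panics". Returns the integer average of the first
/// `n` values, or `None` if `n` is 0, exceeds the length, or the sum overflows.
fn average_of_first(values: &[i64], n: usize) -> Option<i64> {
    // Not `&values[..n]`, which panics if `n > values.len()`.
    let head = values.get(..n)?;
    // Not `+`, which panics on overflow in debug builds.
    let sum = head.iter().try_fold(0i64, |acc, &v| acc.checked_add(v))?;
    // Not `/`, which panics when `n == 0`.
    sum.checked_div(n as i64)
}
```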
Extract all references to:
For each external claim, determine if web verification is needed:
Always verify:
Skip web research for:
For claims requiring verification, use WebFetch or WebSearch to find
authoritative sources:
Record the verification result:
Assemble the complete findings report using the output format below.
When the verification agent returns:
Three-tier severity system (BLOCK / WARN / INFO) matching /dist-sys-auditor:
The verification agent must produce this structured report:
# Documentation Verification Report
## Summary
| Metric | Count |
|--------|-------|
| Files verified | N |
| Claims extracted | N |
| Verified correct | N |
| BLOCK | N |
| WARN | N |
| INFO | N |
**Verdict**: PASS / PASS WITH WARNINGS / FAIL
(FAIL if any BLOCKs exist. PASS WITH WARNINGS if WARNs but no BLOCKs.)
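The verdict rule reduces to a small function. This is a hypothetical sketch: `Verdict` and `verdict` are illustrative names, the `strict` parameter applies the `--strict` promotion of WARNs to BLOCKs, and INFO findings are assumed not to affect the verdict.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Verdict {
    Pass,
    PassWithWarnings,
    Fail,
}

/// FAIL if any BLOCK; PASS WITH WARNINGS if WARNs but no BLOCKs; else PASS.
/// With `strict` (the `--strict` flag), WARNs are promoted to BLOCKs first.
/// Assumption: INFO findings never change the verdict, so they are not inputs.
fn verdict(blocks: usize, warns: usize, strict: bool) -> Verdict {
    let (blocks, warns) = if strict { (blocks + warns, 0) } else { (blocks, warns) };
    if blocks > 0 {
        Verdict::Fail
    } else if warns > 0 {
        Verdict::PassWithWarnings
    } else {
        Verdict::Pass
    }
}
```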
## Findings
| # | Severity | File:Line | Category | Claim | Verdict | Evidence |
|---|----------|-----------|----------|-------|---------|----------|
| 1 | BLOCK | path:42 | Behavioral | "returns None on empty input" | WRONG — panics on empty input | `fn foo()` at line 45: `input[0]` with no empty check |
| 2 | WARN | path:88 | Count | "5 variants" | STALE — now 6 | `enum Bar` at line 12 has 6 variants |
| ... | | | | | | |
## Detailed Findings
### BLOCK-1: [title]
- **Location**: file:line
- **Claim**: "[quoted claim]"
- **Reality**: [what the code actually does]
- **Evidence**: [specific code reference]
- **Suggested fix**: [concrete wording correction]
### WARN-1: [title]
...
## External Claims Verification
| # | Claim | Source Checked | Result | Notes |
|---|-------|---------------|--------|-------|
| 1 | "SipHash resists hash-flooding (HashDoS) attacks" | [source URL] | Confirmed | ... |
| 2 | "follows RFC 7519 section 4" | [RFC URL] | Partially correct | Section 4.1, not 4 |
## Verification Coverage
| File | Claims Found | Verified | BLOCK | WARN | INFO |
|------|-------------|----------|-------|------|------|
| path/a.rs | 12 | 12 | 0 | 1 | 0 |
| path/b.rs | 8 | 8 | 1 | 0 | 2 |
Documentation most commonly becomes stale after refactoring in these areas. Pay extra attention to:

- … `pub` was added
- … `crate::old::path` that was moved
- … `FooError` when the error type was renamed

Recommended pipeline for documentation quality:
1. write code
2. /doc-rigor (write/improve documentation)
3. /doc-verify (verify accuracy in a fresh agent, no bias)
4. fix findings (manual or /execute-review-findings)
5. /doc-verify (re-verify to confirm fixes are correct)
The key property: /doc-rigor and /doc-verify run in separate agents.
The verifier never sees what the writer intended — only what the writer wrote
and what the code does.
- /doc-rigor: writes documentation. Doc-verify checks what doc-rigor wrote.
- /dist-sys-auditor: verifies distributed systems claims with citations. Use dist-sys-auditor for coordination patterns; doc-verify for general doc accuracy.
- /review-dispatch: multi-lens code review. Doc-verify focuses exclusively on documentation accuracy, going deeper than review-dispatch's docs lens.
- /interface-design-review: checks API ergonomics. Doc-verify checks whether the API docs accurately describe the API.
- /execute-review-findings: implements fixes. Feed doc-verify BLOCK findings to execute-review-findings for automated correction.
- /deep-research: gathers evidence. Use before doc-verify when external claims reference obscure papers or niche standards.