verifying-claims/SKILL.md
--- name: verifying-claims description: Check that a document's claims about code are actually true by reading the prose, the code, and the tests and reporting (or fixing) where they disagree. Use whenever the user wants to verify a README, guide, spec, or docstring still matches the code; whenever they mention documentation drift, doc-code sync, "is this still accurate", stale docs, or keeping docs/tests/code consistent; before publishing or merging a docs change; or as a periodic doc-accuracy
npx skillsauth add oaustegard/claude-skills verifying-claimsInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Check that what a document says about code is true, by reading the document, the code, and the tests together and reporting where they disagree.
v0.1 was a comment-DSL: you hand-wrote <!-- claim: ... --> next to prose and
a script checked the comment against the code. That had a fatal gap — the
comment and the prose were two artifacts stapled together, and only the comment
was checked, while humans read the prose. The prose could lie with a green run.
v0.2 drops the DSL. The reviewer is the agent: it reads the prose's meaning directly and compares it to what the code does and what the tests assert. No shadow copy, because the thing being checked is the thing the human reads. (Existing tools already own the alternatives — Gherkin binds executable scenarios, Lean's Verso transcludes facts into prose, TDD couples code to tests. This fills the remaining slot: free-prose documentation, judged.)
This skill does NOT gate merges and is NOT a test framework.
Tests are the anchor. The docs are correct when they agree with what the tests assert about the code. So write/keep good tests first; this skill keeps the prose pinned to them.
scripts/gather_context.py --doc DOC --src SRC --tests TESTS. It ast-parses source (no imports, no execution) and
bundles the document text, the public API surface, and the test inventory.docs/api.md still match pkg/?"Run it at moments that matter — pre-publish, post-refactor, on a docs PR — not on every commit. The deterministic gate is the test suite; this is the layer tests can't reach.
scripts/gather_context.py — deterministic input bundler (doc + API surface +
test inventory), ast-only, no imports.references/drift-report-example.md — what a review report looks like.tools
Query, filter, and transform Markdown structurally with mq — a jq-like CLI for Markdown. Use to extract headings/sections/code-blocks/links from .md files, build a table of contents, pull code blocks of a given language, slice or reshape LLM prompt/output Markdown, or batch-transform docs. Triggers on "extract sections from this markdown", "get all the code blocks", "jq for markdown", "mq", or any structural query over Markdown that grep/Read can't do cleanly.
development
Composes single-file HTML artifacts (PR review writeups, status reports, incident postmortems, slide decks, design systems, prototypes, flowcharts, module maps, feature explainers, kanban boards, prompt tuners) from a small JSON spec instead of hand-written HTML/CSS/JS. Use when the user asks to "compare options side-by-side", requests an HTML version of a report or review or deck, asks for a flowchart, status update, postmortem, design system reference, interactive prototype, custom editor — or explicitly says "HTML artifact", "single HTML file", "self-contained HTML". Skip for ad-hoc HTML snippets (forms, emails, embedded widgets) where there's no template fit.
development
DAG workflow runner that encodes control flow in code, not prose. Use when a procedure has 3+ steps with branching, retries, or validation that must be enforced — gates as `when=`, edge contracts as `validate=`, predicate loops as `retry_until=`. The runner owns the graph; the LLM provides leaves. Also covers parallel execution, checkpoint resume, detached side-effects.
testing
Disciplined, validation-gated revision of an EXISTING skill so each edit is a measured improvement rather than a guess. Use when editing, revising, or tuning a skill that already exists and there is evidence it underperforms (observed failures, drift, complaints) — invoke by name, or have versioning-skills / creating-skill defer to it before applying edits. Not for authoring a brand-new skill from scratch (use creating-skill) or one-off prose.