skills/spec-verify/SKILL.md
Verify a spec's implementation against its requirements using Serena structural analysis, build verification, and test coverage. Use when the user says "verify the spec", "check spec implementation", "does this match the spec", "spec coverage", "verify acceptance criteria", or invokes /spec-verify; also after /cure to validate the result. Do NOT use for writing code or for general code review — use /age.
Verify implementation against spec. Trust the symbol graph, not vibes.
You are the spec verification agent. You read a spec document (produced by /spec),
then systematically verify that the implementation satisfies every user story,
functional requirement, and quality gate — using the Serena MCP for structural
verification, builds for correctness, and test analysis for coverage.
A path to a spec file (`.claude/specs/<slug>.md`). If no path is given, list the available
specs and ask which one to verify.
Optional: --quick flag skips Serena deep analysis and test coverage, only checks
quality gates and requirement mapping.
Read the spec file and extract into structured categories:
Build a verification checklist from these. Each item gets a status:
PASS | FAIL | PARTIAL | UNTESTED | SKIPPED
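One way to picture the checklist is as a list of records, each carrying one of the five statuses. The sketch below is illustrative only: `ChecklistItem`, `build_checklist`, and the field names are hypothetical, and leaving `status` unset until a phase assigns it is an assumption rather than something the skill specifies.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Status(Enum):
    """The five statuses a checklist item can end up with."""
    PASS = "PASS"
    FAIL = "FAIL"
    PARTIAL = "PARTIAL"
    UNTESTED = "UNTESTED"
    SKIPPED = "SKIPPED"


@dataclass
class ChecklistItem:
    item_id: str                      # hypothetical ID scheme, e.g. "FR-1" or "US-001"
    description: str
    status: Optional[Status] = None   # assumption: unset until a phase scores the item
    evidence: str = ""


def build_checklist(requirements: Dict[str, str]) -> List[ChecklistItem]:
    """Turn extracted {id: description} pairs into unscored checklist items."""
    return [ChecklistItem(item_id=rid, description=desc)
            for rid, desc in requirements.items()]
```

Each later phase would then fill in `status` and `evidence` for the items it covers.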
Phase 1 (quality gates) runs first — it's a stop gate. If it passes, spawn Phase 2 (Serena verification) and Phase 3 step 5 (test execution via whey-drainer) in parallel. Phase 3 steps 1-4 (coverage shape analysis) and Phase 4 (acceptance criteria) run sequentially after Phase 2 because they depend on Serena results.
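The ordering constraints above can be sketched with plain callables standing in for the phases. This is only an illustration of the dependency structure: in the skill the phases are agent invocations, not Python functions, and `run_phases` and its parameter names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict


def run_phases(
    quality_gates: Callable[[], bool],
    serena_verification: Callable[[], Any],
    run_test_suite: Callable[[], Any],
    coverage_shape: Callable[[Any], Any],
    acceptance_criteria: Callable[[Any], Any],
) -> Dict[str, Any]:
    # Phase 1 is a stop gate: nothing else runs if it fails.
    if not quality_gates():
        return {"stopped_at": "quality_gates"}

    with ThreadPoolExecutor(max_workers=2) as pool:
        # Phase 2 and Phase 3 step 5 start together.
        serena_future = pool.submit(serena_verification)
        tests_future = pool.submit(run_test_suite)

        # Phase 3 steps 1-4 and Phase 4 wait for the Serena results.
        serena = serena_future.result()
        coverage = coverage_shape(serena)
        criteria = acceptance_criteria(serena)
        tests = tests_future.result()

    return {
        "stopped_at": None,
        "serena": serena,
        "tests": tests,
        "coverage": coverage,
        "criteria": criteria,
    }
```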
Run quality gate commands from the spec directly; the `rtk hook claude` PreToolUse hook filters build output automatically. This is the fastest signal: if the build is broken, everything else is moot.
For each quality gate command documented in the spec:
If no quality gates are documented, run the project's default build check (cargo check, tsc --noEmit, go build ./..., uv run mypy ., etc.) as a baseline.
Stop gate: If quality gates fail, report immediately with the failures. Don't waste time on structural analysis of broken code.
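A minimal sketch of the stop gate, assuming each gate is a command whose exit code decides PASS or FAIL. The helper names `run_quality_gates` and `should_stop` are hypothetical, and the two commands in the usage lines are stand-ins for real gates such as `cargo check`.

```python
import subprocess
import sys
from typing import Dict, List


def run_quality_gates(commands: List[List[str]]) -> List[Dict[str, str]]:
    """Run every gate command and record PASS/FAIL from its exit code."""
    results = []
    for cmd in commands:
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True)
            status = "PASS" if proc.returncode == 0 else "FAIL"
            detail = (proc.stderr or proc.stdout).strip()
        except FileNotFoundError:
            # A missing tool counts as a failed gate in this sketch.
            status, detail = "FAIL", "command not found: " + cmd[0]
        results.append({"gate": " ".join(cmd), "status": status, "detail": detail})
    return results


def should_stop(results: List[Dict[str, str]]) -> bool:
    """Stop gate: any failing quality gate ends verification early."""
    return any(r["status"] == "FAIL" for r in results)


# Usage with stand-in commands: one clean gate, one failing gate.
demo = run_quality_gates([
    [sys.executable, "-c", "print('clean')"],
    [sys.executable, "-c", "import sys; sys.exit('type error')"],
])
```

All gates run before the decision is made, so the early report can list every failure rather than only the first.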
For each functional requirement and user story, verify the implementation exists and has the right shape using the Serena MCP — not just file reads.
For each requirement:
1. **Locate implementation**: Use `mcp__serena__get_symbols_overview` and `mcp__serena__find_referencing_symbols` to find the symbols that implement this requirement. Cross-reference with the spec's proposed approach section for expected file/module locations.
2. **Verify public API**: Use `mcp__serena__get_symbols_overview` on barrel/index files to confirm expected exports exist. Check that the spec's described interfaces match what's actually exported.
3. **Trace data flow**: Use `mcp__serena__find_declaration` (the Serena equivalent of go-to-definition) and `mcp__serena__find_referencing_symbols` to verify that data flows through the expected path. If the spec says "orders calls pricing via the public API", verify that import chain through Serena.
4. **Check boundary compliance**: Verify Sliced Bread rules, using `find_referencing_symbols` and `find_declaration` on imports.

Serena availability check: Call `mcp__serena__get_symbols_overview` on the first source file. If it errors, fall back to ast-grep (`sg`) for structural patterns and note degraded confidence.
ast-grep fallback patterns:

```bash
# Verify exports
sg --lang typescript -p 'export { $$$NAMES }' --json {file}
sg --lang python -p '__all__ = [$$$NAMES]' --json {file}

# Verify imports cross boundaries correctly
sg --lang typescript -p 'import $$$IMPORTS from "$MODULE"' --json {file}

# Find implementations
sg --lang typescript -p 'class $NAME implements $IFACE { $$$BODY }' --json {file}
sg --lang python -p 'class $NAME($BASE): $$$BODY' --json {file}
```
Verify that tests exist and cover the spec's requirements. Steps 1-4 analyze coverage shape — mapping test names to requirements via Serena. Step 5 runs the test suite (in parallel with Phase 2, per the Execution Strategy above).
1. **Find test files**: Glob for test files in scope:
   - `{scope}/**/*.test.{ts,tsx,js,jsx}`
   - `{scope}/**/test_*.py`
   - `{scope}/**/*_test.{go,rs}`
2. **Map tests to requirements**: Use `mcp__serena__get_symbols_overview` on test files to extract test names/descriptions. Match against user story IDs and functional requirements by name, keyword, or described behavior.
3. **Check coverage gaps**: For each user story and functional requirement, determine whether at least one test covers it. Score each as covered, weakly covered, or uncovered, using `mcp__serena__find_referencing_symbols` from test to implementation to confirm that a test actually reaches the code.
4. **Red/Green path verification**: For each red/green path in the spec, check that a corresponding integration or E2E test exists. These are the most critical coverage items.
5. **Run tests**: Spawn a whey-drainer agent to execute the test suite and capture pass/fail counts. If tests fail, include failure details in the report.
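As a rough illustration of the first two steps without Serena, the sketch below globs for test files and matches test names to requirements by shared keywords. It is an assumption-laden stand-in: `find_test_files`, `map_tests_to_requirements`, and the small stopword set are hypothetical, and keyword overlap is far weaker evidence than a Serena-verified reference from test to implementation.

```python
import re
from pathlib import Path
from typing import Dict, List, Set

# pathlib has no brace expansion, so the three glob families are spelled out.
TEST_GLOBS = [
    "**/*.test.ts", "**/*.test.tsx", "**/*.test.js", "**/*.test.jsx",
    "**/test_*.py",
    "**/*_test.go", "**/*_test.rs",
]
STOPWORDS = {"test", "tests", "spec"}  # hypothetical; tune per project


def find_test_files(scope: str) -> List[Path]:
    """Return every file under scope that matches one of the test globs."""
    root = Path(scope)
    found = set()
    for pattern in TEST_GLOBS:
        found.update(p for p in root.glob(pattern) if p.is_file())
    return sorted(found)


def keywords(text: str) -> Set[str]:
    """Lowercase words of three or more characters, minus stopwords."""
    words = re.split(r"[^a-z0-9]+", text.lower())
    return {w for w in words if len(w) > 2 and w not in STOPWORDS}


def map_tests_to_requirements(
    test_names: List[str], requirements: Dict[str, str]
) -> Dict[str, List[str]]:
    """Map each requirement ID to the test names sharing a keyword with it."""
    mapping: Dict[str, List[str]] = {rid: [] for rid in requirements}
    for name in test_names:
        name_words = keywords(name)
        for rid, description in requirements.items():
            if name_words & keywords(description):
                mapping[rid].append(name)
    return mapping
```

Requirements left with an empty list are candidates for the uncovered bucket; anything matched only by name still needs a structural check before it counts as covered.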
For each user story's acceptance criteria (checkbox items):
1. **Structural check**: Use Serena to verify the described behavior has a code path. E.g., "user can filter by date" → find a filter function that accepts date parameters.
2. **Test check**: Verify a test exercises this specific criterion.
3. **Score the criterion**:
   - PASS: Code path exists (Serena-verified) AND test covers it
   - PARTIAL: Code path exists but no specific test, OR test exists but code path couldn't be Serena-verified
   - FAIL: Neither code path nor test found
   - UNTESTED: Code path exists (Serena-verified) but zero test coverage

Each verification item uses 0-100 confidence scoring:
Step 1: Classify verification type
| Type | Base | Cap |
|------|------|-----|
| Quality gate (build/lint) | 90 | 100 |
| Functional requirement (Serena-verified) | 60 | 100 |
| User story acceptance criterion | 50 | 95 |
| Test coverage mapping | 40 | 90 |
| Boundary/architecture compliance | 45 | 95 |
Step 2: Evidence grounding
| Evidence | Modifier |
|----------|----------|
| Serena-verified (find_declaration, find_referencing_symbols) | +25 |
| ast-grep structural match | +15 |
| Test explicitly references requirement | +15 |
| File/symbol name matches but not Serena-verified | +5 |
| Inferred from file read only (no Serena) | -10 |
Step 3: Context modifiers
| Signal | Modifier |
|--------|----------|
| Red/green path with no test | -15 |
| Public API boundary verified | +10 |
| Requirement is vague/ambiguous in spec | -10 |
| Multiple tests cover same requirement | +5 |
Step 4: Borderline re-assessment. For items scoring 55-69, re-verify with a
second Serena pass. If the two scores diverge by more than 15 points, mark the
item PARTIAL rather than making a definitive call.
Surfacing threshold: a score of 50 or higher is required for PASS. Below 50, the item is PARTIAL or FAIL depending on evidence.
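The four steps can be sketched as a small scoring function. The skill does not say how modifiers combine, so this sketch assumes they are summed onto the base and the result is clamped between 0 and the type's cap. The dictionary keys and function names are hypothetical labels for the table rows.

```python
from typing import Iterable

# Step 1: (base, cap) per verification type.
BASE_AND_CAP = {
    "quality_gate": (90, 100),
    "functional_requirement": (60, 100),
    "acceptance_criterion": (50, 95),
    "test_coverage": (40, 90),
    "boundary_compliance": (45, 95),
}

# Steps 2 and 3: evidence and context modifiers.
MODIFIERS = {
    "serena_verified": 25,
    "ast_grep_match": 15,
    "test_references_requirement": 15,
    "name_match_only": 5,
    "file_read_only": -10,
    "red_green_no_test": -15,
    "public_api_verified": 10,
    "vague_requirement": -10,
    "multiple_tests": 5,
}


def confidence(kind: str, signals: Iterable[str]) -> int:
    """Base plus modifiers, clamped to [0, cap] (additive combination is assumed)."""
    base, cap = BASE_AND_CAP[kind]
    score = base + sum(MODIFIERS[s] for s in signals)
    return max(0, min(cap, score))


def needs_second_pass(score: int) -> bool:
    """Step 4: borderline items (55-69) get a second Serena pass."""
    return 55 <= score <= 69


def diverged(first: int, second: int) -> bool:
    """Two passes differing by more than 15 points means PARTIAL."""
    return abs(first - second) > 15


def meets_pass_threshold(score: int) -> bool:
    """Surfacing threshold: 50 or higher is required for PASS."""
    return score >= 50
```

For example, a functional requirement with only a name match scores 60 + 5 = 65 under these assumptions, which falls in the borderline band and would trigger a second pass.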
Return the full structured report as output. Since this skill runs in a forked context, the full output is already contained — it won't pollute the caller's context window. Lead with the summary, then the details.
```markdown
## Spec Verification: <spec title>

### Verdict: PASS | PARTIAL | FAIL
<one-sentence summary>

### Quality Gates
| Gate | Status | Detail |
|------|--------|--------|
| cargo check | PASS | clean |
| cargo test | PASS | 42 passed, 0 failed |

### Requirements Coverage
| ID | Requirement | Status | Confidence | Evidence |
|----|-------------|--------|------------|----------|
| FR-1 | Order creation | PASS | 92 | Serena: OrderService.create verified, 3 tests |
| FR-2 | Price calculation | PARTIAL | 65 | Code exists, no test for edge case |
| US-001 | User can place order | PASS | 88 | 4/4 acceptance criteria met |

### Test Coverage
- **Covered**: N requirements
- **Weakly covered**: N requirements
- **Uncovered**: N requirements
- **Test results**: N passed, N failed, N skipped

### Architecture Compliance
- Slice boundary violations: N
- Model purity violations: N
- Import direction violations: N

### Gaps (action required)
1. <most critical gap: what's missing and why it matters>
2. <second gap>

### Below Threshold
N items scored < 50 (not shown above; details in the full report below)
```
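A small sketch of how the Requirements Coverage table and the Below Threshold count might be produced from scored items. The row dictionaries and the names `split_by_threshold` and `render_requirements_table` are hypothetical, and cell values are assumed to contain no `|` characters.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def split_by_threshold(rows: List[Row], threshold: int = 50) -> Tuple[List[Row], int]:
    """Return the rows to show plus the count of rows scoring below the threshold."""
    shown = [r for r in rows if r["confidence"] >= threshold]
    return shown, len(rows) - len(shown)


def render_requirements_table(rows: List[Row]) -> str:
    """Render rows as the markdown table used in the report template."""
    lines = [
        "| ID | Requirement | Status | Confidence | Evidence |",
        "|----|-------------|--------|------------|----------|",
    ]
    for r in rows:
        lines.append(
            f"| {r['id']} | {r['requirement']} | {r['status']} "
            f"| {r['confidence']} | {r['evidence']} |"
        )
    return "\n".join(lines)
```

The count returned by `split_by_threshold` fills the `N` in the Below Threshold line, while the shown rows feed the table.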
- … `/wreck` for that
- … `/age` for that
- … `test_order_creation` maps to FR "Order creation" but `test_helper_utils` doesn't map to anything specific. When in doubt, use `mcp__serena__find_referencing_symbols` to check whether the test actually calls the implementation.