plugins/coordinator/skills/verification-before-completion/SKILL.md
Use when about to claim work is complete, fixed, or passing, before committing or creating PRs - requires running verification commands and confirming output before making any success claims; evidence before assertions always
npx skillsauth add oduffy-delphi/coordinator-claude verification-before-completionInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Claiming work is complete without verification is dishonesty, not efficiency.
Core principle: Evidence before claims, always.
Violating the letter of this rule is violating the spirit of this rule.
NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE
If you haven't run the verification command in this message, you cannot claim it passes.
BEFORE claiming any status or expressing satisfaction:
1. IDENTIFY: What command proves this claim?
2. RUN: Execute the FULL command (fresh, complete)
3. READ: Full output, check exit code, count failures
4. VERIFY: Does output confirm the claim?
- If NO: State actual status with evidence
- If YES: State claim WITH evidence
5. ONLY THEN: Make the claim
Skip any step = lying, not verifying
| Claim | Requires | Not Sufficient | |-------|----------|----------------| | Tests pass | Test command output: 0 failures | Previous run, "should pass" | | Linter clean | Linter output: 0 errors | Partial check, extrapolation | | Build succeeds | Build command: exit 0 | Linter passing, logs look good | | Bug fixed | Test original symptom: passes | Code changed, assumed fixed | | Regression test works | Red-green cycle verified | Test passes once | | Agent completed | VCS diff shows changes | Agent reports "success" | | Requirements met | Line-by-line checklist | Tests passing |
| Excuse | Reality | |--------|---------| | "Should work now" | RUN the verification | | "I'm confident" | Confidence ≠ evidence | | "Just this once" | No exceptions | | "Linter passed" | Linter ≠ compiler | | "Agent said success" | Verify independently | | "I'm tired" | Exhaustion ≠ excuse | | "Partial check is enough" | Partial proves nothing | | "Different words so rule doesn't apply" | Spirit over letter |
Two universal rules that apply after any executor or apply-agent dispatch:
Executor and apply-agents consistently under-count their own work in chat (observed repeatedly in distill and architecture-audit runs). After any multi-file executor dispatch:
git diff --stat <expected-path-glob> — treat the diff as ground truth, not the agent's completion report.grep -l "<canonical phrase>" <target-files>. File count alone is not proof — the canonical content must actually appear.After a sequence of Edit calls — especially before claiming a fix is in or before commit — run git diff <file> (or git diff --stat) to confirm the bytes actually moved. Edit returns success on no-ops where the new_string already matched.
Subagents conflate "this is correct now" with "I made it correct." Before reporting fixes applied, executor prompts should include git status --short + git diff --stat; report actual diff stats, not self-narrated counts of intended changes. "No-op, target was already correct" is a valid outcome — and an honest one.
Verification must target the actual side effects of YOUR action. Running an unrelated expensive process ("ran the full test suite, all green") as "verification" of a one-line change is cargo-cult.
grep -l the pattern across those files."I made the edit" without re-Read is an assertion, not evidence.
| Verification Claim | Must Run | Not Sufficient |
|--------------------|----------|----------------|
| N files updated by executor | git diff --stat showing N files | Agent chat summary |
| Canonical phrase applied across files | grep -l "<phrase>" <targets> | "All files processed" |
| One-line bug fix | Re-Read file + grep for change | Full test suite passing |
| Pattern applied consistently | Targeted grep on changed files | Build success |
For batch outputs with a known schema, existence checks are not enough. Prefer a sweep confirming each file contains the canonical block before reporting completion.
Why this is distinct from existence checks: A file can exist and still be schema-nonconformant. In a 64-nation pipeline, 3 nations produced prose-only syntheses (no JSON block) and 2 had non-standard JSON root keys — 5/64 files would have silently passed an existence check.
Sweep pattern:
# Confirm JSON block present
grep -l '```json' outputs/*.md
# Confirm expected root key (jq)
for f in outputs/*.json; do jq -e '.expected_root_key' "$f" > /dev/null || echo "FAIL: $f"; done
Failure modes to check explicitly:
data instead of results, output instead of expected key)Run this sweep before reporting batch completion — not after.
Tests:
✅ [Run test command] [See: 34/34 pass] "All tests pass"
❌ "Should pass now" / "Looks correct"
Regression tests (TDD Red-Green):
✅ Write → Run (pass) → Revert fix → Run (MUST FAIL) → Restore → Run (pass)
❌ "I've written a regression test" (without red-green verification)
Build:
✅ [Run build] [See: exit 0] "Build passes"
❌ "Linter passed" (linter doesn't check compilation)
Requirements:
✅ Re-read plan → Create checklist → Verify each → Report gaps or completion
❌ "Tests pass, phase complete"
Agent delegation:
✅ Agent reports success → Check VCS diff → Verify changes → Report actual state
❌ Trust agent report
From 24 failure memories:
ALWAYS before:
Rule applies to:
Before staging any executor output: (1) run git diff --stat to enumerate changed paths, (2) confirm each path is within the dispatch's declared scope, (3) stash or revert any out-of-scope edits.
Out-of-scope edits are common failure modes: test file deletions, unrelated refactors, autonomous commits the executor made despite instructions. The check is mechanical and must happen before the coordinator reads the diff semantically.
See commands/delegate-execution.md → "Scope-Conformance Check" for the dispatch-prompt clause that enforces this on the executor side.
For any plan with declared acceptance criteria, "done" means more than green tests. Before claiming completion or moving to merge:
agents/vp-product.md) has been dispatched and findings are integrated. YK verdict line is staged for the PR body.coordinator:merging-to-main).This is the bridge between engineering output and PM confidence. "The agent says it's done" is not the gate; this is.
No shortcuts for verification.
Run the command. Read the output. THEN claim the result.
This is non-negotiable.
tools
Orient session — preflight, load context, choose work
documentation
Wrap up finished work — capture lessons, update docs
testing
Use before commit, /merge-to-main, /workday-complete, or to validate repo state. Resolves and runs the project's configured fast-test command.
development
Root-cause discipline for ONE identified bug, test failure, or unexpected behavior — pin the premise, reproduce, trace to source, fix at source, verify. For a single known issue, not a codebase sweep.