skills/verification-before-completion/SKILL.md
Use when about to claim work is complete, fixed, or passing, before committing or creating PRs - requires running verification commands and confirming output before making any success claims; evidence before assertions always
Claiming work is complete without verification is dishonesty, not efficiency.
Core principle: Evidence before claims, always.
Violating the letter of this rule is violating the spirit of this rule.
Not every change needs a full verification ceremony:
| Tier | Change Type | Verification | Example |
|------|------------|-------------|---------|
| 1 | Config/text/style only | Mental trace, no command needed | Changed a color, updated a string, edited docs |
| 2 | Single function, clear logic | One targeted test or quick run | Added a null check, fixed a typo in logic |
| 3 | Multi-file feature or bug fix | Full verification (run tests, check output) | New endpoint, scoring change, data pipeline fix |
| 4 | Deploy, migration, auth | Full verification + smoke test | Anything user-facing in production |
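The Iron Law still applies to Tiers 3-4 (multi-file changes, deploys, migrations, auth): no completion claims without fresh verification evidence. Tiers 1-2 (config, text, style, or single-function changes) don't need a full command run; a mental trace or one quick targeted check is sufficient.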
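The tiering above can be sketched as a small decision function. This is only an illustration: the inputs (`files_changed`, `logic_changed`, `touches_production`) and the names `Tier`, `pick_tier`, and `requires_fresh_evidence` are assumptions made for the example, not part of the skill.

```python
from enum import IntEnum


class Tier(IntEnum):
    MENTAL_TRACE = 1      # config/text/style only
    TARGETED_CHECK = 2    # single function, clear logic
    FULL = 3              # multi-file feature or bug fix
    FULL_PLUS_SMOKE = 4   # deploy, migration, auth


def pick_tier(files_changed: int, logic_changed: bool, touches_production: bool) -> Tier:
    """Map a change description to a verification tier (hypothetical heuristic)."""
    if touches_production:
        return Tier.FULL_PLUS_SMOKE
    if logic_changed and files_changed > 1:
        return Tier.FULL
    if logic_changed:
        return Tier.TARGETED_CHECK
    return Tier.MENTAL_TRACE


def requires_fresh_evidence(tier: Tier) -> bool:
    """Tiers 3-4 need fresh command output before any completion claim."""
    return tier >= Tier.FULL
```

When in doubt between two tiers, picking the higher one is the cheaper mistake.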
Before modifying ANY data, values, or calculations, record the current state.
BEFORE: ML = +107.89u (355W-142L), Method = +171.76u (205W-150L), ...
CHANGING: [what you're about to change]
AFTER: [verify these match or improve on BEFORE — any decrease = regression]
If you don't record the BEFORE values, you cannot detect regressions. This is how "fixing combo" breaks ML — you never checked what ML was before your edit.
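A minimal sketch of the BEFORE/AFTER comparison, assuming each metric is a single number where higher is better (as with the unit totals in the template). The function name `find_regressions` is hypothetical, and the AFTER values in the usage note are made up purely for illustration.

```python
from typing import Dict, Optional, Tuple


def find_regressions(
    before: Dict[str, float], after: Dict[str, float]
) -> Dict[str, Tuple[float, Optional[float]]]:
    """Return every metric that decreased or disappeared, as name -> (old, new)."""
    regressions = {}
    for name, old in before.items():
        new = after.get(name)  # None means the metric vanished after the edit
        if new is None or new < old:
            regressions[name] = (old, new)
    return regressions
```

For example, with `before = {"ML": 107.89, "Method": 171.76}` and an illustrative `after = {"ML": 101.50, "Method": 180.00}`, the function reports `ML` as a regression even though `Method` improved. An empty result is the only state in which "no regressions" can be claimed.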
Before sending ANY response that contains data (tables, stats, P/L, numbers):
Before sending, check that your WORDS match your DATA:
The rule: Triumphant claim + contradicting evidence = don't send it. Fix the data first, or retract the claim.
Quick scan before sending: Does my summary text accurately describe what the data/code/output actually shows? If not, fix one or the other.
When the user's request contains N distinct items (bugs, features, fixes, questions):
The failure pattern: User lists 5 things. Claude fixes #1 and #2, gets absorbed in the work, says "Fixed!" User replies: "What about #3, #4, and #5?" This is entirely preventable by counting.
If you run out of context or hit a blocker before completing all items, say: "Completed items 1-3 of 5. Items 4-5 (X, Y) still need to be addressed." Never claim done when you're partially done.
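Counting items can be made mechanical. The sketch below builds the status message from the user's original list and the set of items actually verified as done; `completion_status` and its parameters are hypothetical names for this example.

```python
from typing import List, Set


def completion_status(items: List[str], done: Set[int]) -> str:
    """items: requested items in order; done: 1-based indices verified complete."""
    total = len(items)
    remaining = [i for i in range(1, total + 1) if i not in done]
    if not remaining:
        return f"Completed all {total} of {total} items."
    names = ", ".join(f"#{i} {items[i - 1]}" for i in remaining)
    return f"Completed {total - len(remaining)} of {total} items. Still open: {names}."
```

The message is derived from the full list, so an item that was never started cannot silently drop out of the report.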
BEFORE claiming any status or expressing satisfaction:
1. IDENTIFY: What command proves this claim?
2. RUN: Execute the FULL command (fresh, complete)
3. READ: Full output, check exit code, count failures
4. VERIFY: Does output confirm the claim?
   - If NO: State actual status with evidence
   - If YES: State claim WITH evidence
5. ONLY THEN: Make the claim
Skip any step = lying, not verifying
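One way to make the gate hard to skip is to produce the claim only from a fresh command run. The sketch below uses Python's standard `subprocess.run`; the function name `verify` and the five-line evidence tail are assumptions for illustration. An exit code of 0 is necessary but not always sufficient, so the captured output still has to be read.

```python
import subprocess
import sys
from typing import List


def verify(claim: str, command: List[str]) -> str:
    """Run the command fresh and return the claim together with its evidence."""
    result = subprocess.run(command, capture_output=True, text=True)
    output = (result.stdout + result.stderr).strip()
    tail = "\n".join(output.splitlines()[-5:])  # last lines serve as evidence
    if result.returncode == 0:
        return f"{claim} (exit 0)\n{tail}"
    return f"NOT verified: {claim} (exit {result.returncode})\n{tail}"


if __name__ == "__main__":
    # Stand-in command so the example runs anywhere; substitute the real test command.
    print(verify("Script runs", [sys.executable, "-c", "print('ok')"]))
```

The returned string always carries the exit code and output tail, so the claim cannot be stated without its evidence.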
"I edited the code" is NOT verification. Verification means checking the OUTPUT.
For web/UI changes:
For data/table changes:
For algorithm/pipeline changes:
The "I edited the code" trap: This is the #1 failure mode. Claude edits 3 files, says "Fixed! The table now shows all bet types with correct units." But never actually looked at the rendered table. The user opens the page and finds 2 missing columns and wrong math. THIS IS WHAT WE'RE PREVENTING.
| Claim | Requires | Not Sufficient |
|-------|----------|----------------|
| Tests pass | Test command output: 0 failures | Previous run, "should pass" |
| Linter clean | Linter output: 0 errors | Partial check, extrapolation |
| Build succeeds | Build command: exit 0 | Linter passing, logs look good |
| Bug fixed | Test original symptom: passes | Code changed, assumed fixed |
| UI looks right | Screenshot or DOM check of actual page | "I updated the component" |
| Table is correct | Verify actual cell values against expected | "I added the column" |
| Requirements met | Line-by-line checklist | Tests passing |
| Excuse | Reality |
|--------|---------|
| "Should work now" | RUN the verification |
| "I'm confident" | Confidence ≠ evidence |
| "Just this once" | No exceptions |
| "Linter passed" | Linter ≠ compiler |
| "Agent said success" | Verify independently |
| "I'm tired" | Exhaustion ≠ excuse |
| "Partial check is enough" | Partial proves nothing |
| "Different words so rule doesn't apply" | Spirit over letter |
Tests:
✅ [Run test command] [See: 34/34 pass] "All tests pass"
❌ "Should pass now" / "Looks correct"
Regression tests (TDD Red-Green):
✅ Write → Run (pass) → Revert fix → Run (MUST FAIL) → Restore → Run (pass)
❌ "I've written a regression test" (without red-green verification)
Build:
✅ [Run build] [See: exit 0] "Build passes"
❌ "Linter passed" (linter doesn't check compilation)
Requirements:
✅ Re-read plan → Create checklist → Verify each → Report gaps or completion
❌ "Tests pass, phase complete"
Agent delegation:
✅ Agent reports success → Check VCS diff → Verify changes → Report actual state
❌ Trust agent report
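The red-green cycle for regression tests listed above can be modeled in-process: a regression test only counts if it passes with the fix and fails without it. This sketch swaps implementations instead of reverting through version control, and every name in it (`red_green_verified`, `safe_len_fixed`, `safe_len_broken`, `none_counts_as_empty`) is hypothetical.

```python
from typing import Callable


def red_green_verified(
    test: Callable[[Callable], bool], fixed_impl: Callable, broken_impl: Callable
) -> bool:
    """True only if the test passes with the fix AND fails without it."""
    passes_with_fix = test(fixed_impl)
    fails_without_fix = not test(broken_impl)
    return passes_with_fix and fails_without_fix


def safe_len_broken(value):
    return len(value)  # raises TypeError when value is None


def safe_len_fixed(value):
    return 0 if value is None else len(value)


def none_counts_as_empty(impl: Callable) -> bool:
    """Regression test for the original symptom: None must yield 0."""
    try:
        return impl(None) == 0
    except TypeError:
        return False
```

A test that passes against both versions, such as one that never exercises the `None` case, returns `False` here, which is exactly the situation the "MUST FAIL" step is meant to catch.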
From 24 failure memories:
ALWAYS before:
Rule applies to:
For major completions (features, PRs, deploys), run the full 6-phase verification:
Phase 1 (Build):
npm run build 2>&1 | tail -20
If the build fails → STOP. Fix it before continuing.
Phase 2 (Types):
npx tsc --noEmit 2>&1 | head -30 # TypeScript
pyright . 2>&1 | head -30 # Python
Phase 3 (Lint):
npm run lint 2>&1 | head -30 # JS/TS
ruff check . 2>&1 | head -30 # Python
Phase 4 (Tests):
npm run test -- --coverage 2>&1 | tail -50
Report: total, passed, failed, coverage %.
Phase 5 (Security):
Check for hardcoded secrets, console.log statements, and debug artifacts.
Phase 6 (Diff):
git diff --stat
Review each changed file for unintended changes, missing error handling, and edge cases.
VERIFICATION REPORT
==================
Build: [PASS/FAIL]
Types: [PASS/FAIL] (X errors)
Lint: [PASS/FAIL] (X warnings)
Tests: [PASS/FAIL] (X/Y passed, Z% coverage)
Security: [PASS/FAIL] (X issues)
Diff: [X files changed]
Overall: [READY/NOT READY] for PR
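A report in this shape can be assembled from recorded phase results instead of typed by hand. The following is a sketch under assumptions: `PhaseResult` and `render_report` are hypothetical names, and the detail strings are whatever the phase actually printed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PhaseResult:
    name: str          # e.g. "Build", "Types", "Lint", "Tests", "Security"
    passed: bool
    detail: str = ""   # e.g. "33/34 passed, 81% coverage"


def render_report(phases: List[PhaseResult], files_changed: int) -> str:
    """Render the verification report; READY requires every phase to pass."""
    lines = ["VERIFICATION REPORT", "=================="]
    for phase in phases:
        status = "PASS" if phase.passed else "FAIL"
        suffix = f" ({phase.detail})" if phase.detail else ""
        lines.append(f"{phase.name}: {status}{suffix}")
    lines.append(f"Diff: {files_changed} files changed")
    # No recorded phases means no evidence, so the result is never READY.
    ready = bool(phases) and all(phase.passed for phase in phases)
    lines.append(f"Overall: {'READY' if ready else 'NOT READY'} for PR")
    return "\n".join(lines)
```

Deriving the overall line from the phase results keeps the summary from contradicting the rows above it.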
For quick fixes and single-file changes, the standard Gate Function is sufficient.
When the user reports the SAME issue again after you claimed it was fixed:
| Attempt | What Happened | Required Response |
|---------|--------------|-------------------|
| 1st claim → user says "still broken" | Your verification was insufficient | Re-verify with ACTUAL testing, not mental trace. Reproduce the failure first. |
| 2nd claim → user says "STILL broken" | You failed to test the real code path | Full reproduce → fix → prove-fixed cycle. Show exact evidence. |
| 3rd+ claim → user is frustrated | Something fundamental is wrong with your approach | STOP fixing. Step back. Re-read the entire flow. You're likely fixing symptoms, not the root cause. |
The escalation rule: Each repeated bug report DOUBLES your verification obligation. If mental trace was enough the first time and failed, real testing is required. If real testing failed, reproduce-first is required. Never apply the same level of verification that already failed.
Escalation cap: After 3 escalations (8x base verification), stop escalating and flag to user: "This may be a flaky test or infrastructure issue rather than a code bug. Recommend investigating the test environment." External failures (network, CI infra, flaky tests) do NOT trigger escalation — only genuine code bugs do.
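The doubling rule and its cap reduce to a little arithmetic. This sketch takes one reading of the cap: the multiplier stops at 8x, and the environment flag is raised once a report arrives after three escalations. The names are hypothetical, and `repeat_reports` is assumed to count only genuine code-bug reports, with external failures already excluded.

```python
MAX_ESCALATIONS = 3  # cap from the rule above: 2**3 = 8x base verification


def verification_multiplier(repeat_reports: int) -> int:
    """Each repeated report doubles the obligation, capped at 8x."""
    escalations = min(max(repeat_reports, 0), MAX_ESCALATIONS)
    return 2 ** escalations


def should_flag_environment(repeat_reports: int) -> bool:
    """After the cap is exhausted, stop escalating and flag a possible infra issue."""
    return repeat_reports > MAX_ESCALATIONS
```

So the first repeat report calls for 2x the original verification, the third for 8x, and a fourth triggers the flag to the user instead of another round of fixes.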
No shortcuts for verification.
Run the command. Read the output. THEN claim the result.
This is non-negotiable.