skills/dev-verify/SKILL.md
This skill should be used when the user asks to 'verify completion', 'check that tests pass', 'confirm feature works', or 'verify the feature is done'.
Announce: "Using dev-verify (Phase 7) to confirm completion with fresh evidence."
Iteration topology: one-shot fresh-subagent verifier (read-only)
Before starting this phase, check remaining context:
| Level | Remaining | Action |
|-------|-----------|--------|
| Normal | >35% | Proceed |
| Warning | 25-35% | Finish the current step, then invoke dev-handoff |
| Critical | ≤25% | Invoke dev-handoff immediately; resume fresh |
At Warning/Critical: Read ${CLAUDE_SKILL_DIR}/../../skills/dev-handoff/SKILL.md and follow its instructions.
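The thresholds in the table above can be sketched as a small lookup. This is only an illustration: `context_action` is a hypothetical helper, how the remaining-context percentage is obtained is not specified here, and the overlapping boundary at exactly 25% is assumed to resolve to Critical.

```python
def context_action(remaining_pct: float) -> tuple[str, str]:
    """Map remaining context (in percent) to (level, action) per the table."""
    if remaining_pct > 35:
        return ("Normal", "Proceed")
    if remaining_pct > 25:
        return ("Warning", "Finish the current step, then invoke dev-handoff")
    # Assumption: exactly 25% is treated as Critical, the more cautious reading.
    return ("Critical", "Invoke dev-handoff immediately")


for pct in (50, 30, 25):
    print(pct, context_action(pct)[0])
```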
Load shared enforcement:
Auto-load all constraints matching applies-to: dev-verify:
!uv run python3 ${CLAUDE_SKILL_DIR}/../../scripts/load-constraints.py dev-verify
You MUST have these constraints loaded before proceeding. No claiming you "remember" them.
Dynamic plan re-read: Before starting verification, re-read .planning/SPEC.md to verify against the latest requirements. Do not rely on cached state from prior phases.
The automated test IS your deliverable. The implementation just makes the test pass.
Reframe your task:
The test proves value. The implementation is a means to an end.
Without a REAL automated test (executes code, verifies behavior), you have delivered NOTHING. </EXTREMELY-IMPORTANT>
<EXTREMELY-IMPORTANT>

## The Iron Law of Verification

NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE. This is not negotiable.
Before claiming ANYTHING is complete, you MUST:
This applies even when:
If you catch yourself about to claim completion without fresh evidence, STOP. </EXTREMELY-IMPORTANT>
| Thought | Why It's Wrong | Do Instead |
|---------|----------------|------------|
| "It should work" | "Should" isn't evidence | Run the command |
| "I'm pretty sure it passes" | Confidence isn't verification | Run the command |
| "The agent reported success" | Agent reports need confirmation | Run it yourself |
| "I ran it earlier" | Earlier isn't fresh | Run it again |
| "The code exists" | Existing ≠ working | Run and check output |
| "Grep shows the function" | Pattern match ≠ runtime test | Run the function |
Checkpoint type: decision (user confirms requirements met — cannot auto-advance)
Before making ANY status claim:
1. IDENTIFY → Which command proves your assertion?
2. RUN → Execute the command fresh
3. READ → Review full output and exit codes
4. VERIFY → Confirm output supports your claim
5. CLAIM → Only after steps 1-4
Skipping any step is not verification — it's shipping unverified work the user will have to debug.
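The five steps above might look like this in code. It is a minimal sketch, not part of the skill: `Evidence`, `run_fresh`, and `may_claim` are hypothetical names, and the demo command is a stand-in for the project's real test command.

```python
import re
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Evidence:
    command: list[str]
    exit_code: int
    output: str


def run_fresh(command: list[str]) -> Evidence:
    """RUN + READ: execute the command now and keep its full output and exit code."""
    proc = subprocess.run(command, capture_output=True, text=True)
    return Evidence(command, proc.returncode, proc.stdout + proc.stderr)


def may_claim(evidence: Evidence, expected_pattern: str) -> bool:
    """VERIFY: allow the claim only on exit code 0 and matching output."""
    matched = re.search(expected_pattern, evidence.output) is not None
    return evidence.exit_code == 0 and matched


# IDENTIFY: stand-in command (assumption); replace with the real test command.
demo_command = [sys.executable, "-c", "print('3 passed, 0 failed')"]
evidence = run_fresh(demo_command)

# CLAIM: only after the checks above. \b keeps "10 failed" from matching.
print("claim allowed:", may_claim(evidence, r"\b0 failed\b"))
```

The pattern is passed in by the caller because each claim needs its own evidence; a plain substring check would accept "10 failed" as "0 failed".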
| Claim | Required Evidence |
|-------|-------------------|
| "Tests pass" | Test output showing 0 failures |
| "Build succeeds" | Exit code 0 from build command |
| "Linter clean" | Linter output showing 0 errors |
| "Bug fixed" | Test that failed now passes |
| "Feature complete" | All acceptance criteria verified |
| "User-facing feature works" | E2E test output showing PASS |
<EXTREMELY-IMPORTANT>

## The E2E Evidence Gate

USER-FACING CLAIMS REQUIRE E2E EVIDENCE. Unit tests are insufficient.

| Claim | Unit Test Evidence | E2E Evidence Required |
|-------|--------------------|-----------------------|
| "API works" | ❌ Insufficient | ✅ Full request/response test |
| "UI renders" | ❌ Insufficient | ✅ Playwright snapshot/interaction |
| "Feature complete" | ❌ Insufficient | ✅ User flow simulation |
| "No regressions" | ❌ Insufficient | ✅ E2E suite passes |
These are NOT E2E tests. They are observability, not verification.
| ❌ Fake E2E | ✅ Real E2E |
|-------------|-------------|
| "Log shows function was called" | "Screenshot shows correct UI rendered" |
| "grep papirus in logs" | "grim screenshot + visual diff confirms icon changed" |
| "Console output contains 'success'" | "Playwright assertion: element.textContent === 'Success'" |
| "File was created" | "E2E test opens file and verifies contents" |
| "Process exited 0" | "Functional test verifies actual output matches spec" |
| "Mock returned expected value" | "Real integration returns expected value" |
Red Flag: If you catch yourself thinking "logs prove it works" - STOP, you're about to claim false verification. Logs prove code executed, not that it produced correct results. E2E means verifying the actual output users see.
| Thought | Reality |
|---------|---------|
| "Unit tests cover it" | Unit tests don't simulate users. Where's YOUR E2E? |
| "E2E would be redundant" | YOU'LL catch bugs with redundancy. Write E2E. |
| "No time for E2E" | YOU don't have time to fix production bugs? Write E2E. |
| "Feature is internal" | Does it affect user output? Then YOU need E2E. |
| "I manually tested" | YOU provided no evidence. Automate it. |
| "Log checking verifies it works" | YOUR log checking only verifies code executed, not results. Not E2E. |
| "E2E with screenshots is too complex" | If YOU can't verify it simply, your feature isn't done. Complexity = bugs hiding. |
| "Implementation is done, testing is just verification" | Testing IS YOUR implementation. Untested code is unfinished code. |
For user-facing changes, add to verification:
1. IDENTIFY → Which E2E test proves user-facing behavior?
2. RUN → Execute E2E test fresh
3. READ → Review full output (screenshots if visual)
4. VERIFY → User flow works as specified
5. CLAIM → Only after E2E evidence exists
"Unit tests pass" is not "feature complete" for user-facing changes.
GATE 1: BUILD
GATE 2: LAUNCH (with file-based logging)
GATE 3: WAIT
GATE 4: CHECK PROCESS
GATE 5: READ LOGS ← MANDATORY, CANNOT SKIP
GATE 6: VERIFY LOGS
THEN AND ONLY THEN: E2E tests/screenshots
You cannot skip GATE 5 (READ LOGS). If you catch yourself about to take screenshots without reading logs first, STOP.
For the full gate sequence with examples, discover and read skills/dev-tdd/SKILL.md via cache lookup.
</EXTREMELY-IMPORTANT>
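The gate sequence above could be scripted roughly as follows. This is a sketch under assumptions: `launch_and_read_logs` and `DEMO_APP` are hypothetical, GATE 1 (BUILD) is assumed to have passed already, and the "server ready" marker stands in for whatever start-up line the real application logs.

```python
import subprocess
import sys
import tempfile
import time
from pathlib import Path

# Hypothetical stand-in for the real application: it logs one start-up line
# and then stays alive until it is terminated.
DEMO_APP = [sys.executable, "-c",
            "import time; print('server ready', flush=True); time.sleep(30)"]


def launch_and_read_logs(app_cmd, log_path, ready_marker, timeout=10.0):
    """Run GATES 2-6 for an already-built app; returns (still_running, logs_ok, logs)."""
    # GATE 2: LAUNCH with stdout and stderr redirected to a log file.
    with open(log_path, "w") as log_file:
        proc = subprocess.Popen(app_cmd, stdout=log_file, stderr=subprocess.STDOUT)
    try:
        # GATE 3: WAIT until the marker appears, the process exits, or time runs out.
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            if ready_marker in Path(log_path).read_text() or proc.poll() is not None:
                break
            time.sleep(0.1)
        # GATE 4: CHECK PROCESS is still alive.
        still_running = proc.poll() is None
        # GATE 5: READ LOGS (the whole file, not a summary).
        logs = Path(log_path).read_text()
    finally:
        proc.terminate()
        proc.wait()
    # GATE 6: VERIFY LOGS show a clean start-up.
    logs_ok = ready_marker in logs and "Traceback" not in logs
    return still_running, logs_ok, logs


with tempfile.TemporaryDirectory() as tmp:
    still_running, logs_ok, logs = launch_and_read_logs(
        DEMO_APP, Path(tmp) / "app.log", "server ready")

print("process alive:", still_running)
print("logs clean:", logs_ok)
```

Only when both values are true would E2E tests or screenshots follow; the log check alone is not E2E evidence.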
If verification discovers stale or fabricated evidence in LEARNINGS.md, DELETE the contaminated entries. Do not amend false claims — remove them entirely and re-run the verification from scratch.
These do NOT count as verification:
When you say "Feature complete", you are asserting:
Saying "complete" based on stale data or agent reports is not "summarizing" - it is creating a false sense of completion that wastes the user's time.
"Still verifying" protects the user. "Complete" without evidence creates rework. </EXTREMELY-IMPORTANT>
These thoughts mean STOP—you're about to claim falsely:
| Thought | Reality |
|---------|---------|
| "I just ran it" | "Just" = stale. YOU must run it AGAIN. |
| "The agent said it passed" | Agent reports need YOUR confirmation. YOU run it. |
| "It should work" | "Should" is hope. YOU run and see output. |
| "I'm confident" | YOUR confidence ≠ verification. YOU run the command. |
| "We already verified earlier" | Earlier ≠ now. YOU need fresh evidence only. |
| "User will verify it" | NO. YOU verify before claiming. User trusts YOUR claim. |
| "Close enough" | Close ≠ complete. YOU verify fully. |
| "Time to move on" | YOU only move on after FRESH verification. |
STRUCTURAL VERIFICATION IS NOT RUNTIME VERIFICATION:
| ❌ NOT Verification | ✅ IS Verification |
|---------------------|-------------------|
| "Code exists in file" | "Code ran and produced output X" |
| "Function is defined" | "Function was called and returned Y" |
| "Grep found the pattern" | "Program output shows expected behavior" |
| "ast-grep found the code" | "Test executed and passed with output" |
| "Diff shows the change" | "Change tested with actual input/output" |
| "Implementation looks correct" | "Ran test, saw PASS in logs" |
The key difference:
If you find yourself saying "the code exists" or "I verified the implementation" without running it, STOP - you're doing structural analysis, not verification.
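A toy illustration of the distinction above: the structural check succeeds on code that is wrong, while the runtime check catches the bug. The `add` function and its defect are invented for this example.

```python
import re

# Invented source under test: the function exists but subtracts instead of adding.
SOURCE = '''
def add(a, b):
    return a - b
'''

# Structural check: a pattern match proves only that the text is present.
structurally_present = re.search(r"def add\(", SOURCE) is not None

# Runtime check: execute the code and compare real output with the expected value.
namespace = {}
exec(SOURCE, namespace)
runtime_correct = namespace["add"](2, 3) == 5

print("structural check:", structurally_present)  # True
print("runtime check:", runtime_correct)          # False
```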
| Your Drive | Why You Skip | What Actually Happens | The Drive You Failed |
|------------|-------------|----------------------|---------------------|
| Helpfulness | "Report 'verified' to unblock the user" | The user discovers the failure; your shortcut created rework | Anti-helpful |
| Competence | "I'm confident it works without running it" | Confidence without evidence is delusion, not competence | Incompetent |
| Efficiency | "Prior tests still pass, skip fresh evidence" | They don't; your assumption is the bug the user discovers | Inefficient |
The protocol is not overhead you pay. It is the service you provide.
```bash
# Run tests (e.g., npm test, pytest, cargo test)
npm test
# Check results: "34/34 pass" = can claim tests pass
# "33/34 pass" = cannot claim success (partial fail)
```
Tool description: Run automated test suite to verify all tests pass
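The pass/fail rule in the comments above can be made mechanical. The sketch below assumes the runner prints a summary of the form "N/M pass", as in the example; real runners format their summaries differently, so the pattern would need adapting. `all_tests_passed` is a hypothetical helper.

```python
import re


def all_tests_passed(summary: str) -> bool:
    """Return True only when every test in an 'N/M pass' summary passed."""
    match = re.search(r"(\d+)/(\d+) pass", summary)
    if match is None:
        # No recognizable summary means no evidence, so no claim.
        return False
    passed, total = int(match.group(1)), int(match.group(2))
    return total > 0 and passed == total


print(all_tests_passed("34/34 pass"))  # True
print(all_tests_passed("33/34 pass"))  # False
```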
```bash
# 1. Write test → run (should fail initially)
# 2. Apply fix → run (should pass)
# 3. Revert fix → run (must fail again to confirm fix)
# 4. Restore fix → run (must pass to confirm success)
```
Tool description: Execute regression test cycle to validate bug fix reproducibility
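The four-step cycle above, simulated on a toy function. In a real project the revert and restore would be done in the working tree (how is left open here); in this sketch two invented implementations, `buggy_slugify` and `fixed_slugify`, stand in for the code before and after the fix.

```python
def buggy_slugify(title: str) -> str:
    # Invented pre-fix behaviour: forgets to trim and lowercase.
    return title.replace(" ", "-")


def fixed_slugify(title: str) -> str:
    # Invented post-fix behaviour.
    return title.strip().lower().replace(" ", "-")


def regression_test(slugify) -> bool:
    """The test written in step 1; it must fail without the fix."""
    return slugify("  Hello World ") == "hello-world"


# (step, implementation in place, whether the test is expected to pass)
cycle = [
    ("1. write test, run", buggy_slugify, False),
    ("2. apply fix, run", fixed_slugify, True),
    ("3. revert fix, run", buggy_slugify, False),
    ("4. restore fix, run", fixed_slugify, True),
]

outcomes = [regression_test(impl) for _, impl, _ in cycle]
fix_confirmed = outcomes == [expected for _, _, expected in cycle]

for (step, _, _), passed in zip(cycle, outcomes):
    print(step, "PASS" if passed else "FAIL")
print("fix confirmed:", fix_confirmed)
```

Step 3 is what separates a real fix from a test that passes regardless: if the test still passes with the fix reverted, it never exercised the bug.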
```bash
# Use ';' rather than '&&' so the exit code is printed even when the build fails
npm run build; echo "Exit code: $?"
# Must see "Exit code: 0" to claim success
```
Tool description: Build application and verify exit code is 0
Before spawning the goal-backward verifier, run the auto-discovering constraint runner:
```bash
uv run python3 ${CLAUDE_SKILL_DIR}/../../references/constraints/check-all.py .
```
If any constraint FAILS: Address the failure before proceeding. Constraint failures are hard blocks — do not proceed to goal-backward verification with failing constraints.
If all constraints PASS: Proceed to goal-backward verification below.
Record the evidence (do not assert coverage from memory): copy the runner's summary line — e.g. N/M passed, K failed, J conventions — into .planning/LEARNINGS.md as the verification record. Coverage is proven by the runner's actual output, not by a count documented elsewhere.
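Recording the summary line could be done along these lines. This is a sketch with stated assumptions: `record_summary` is a hypothetical helper, the summary is assumed to be the last non-empty line of the runner's output, and the sample output is made up for illustration rather than taken from a real run.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def record_summary(runner_output: str, learnings_path: Path) -> str:
    """Append the runner's summary line, copied verbatim, to LEARNINGS.md."""
    lines = [line.strip() for line in runner_output.splitlines() if line.strip()]
    if not lines:
        raise ValueError("runner produced no output; nothing to record")
    summary = lines[-1]  # assumption: the summary is the last non-empty line
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    learnings_path.parent.mkdir(parents=True, exist_ok=True)
    with learnings_path.open("a", encoding="utf-8") as handle:
        handle.write(f"- [{stamp}] check-all.py: {summary}\n")
    return summary


# Made-up runner output, for illustration only.
sample_output = "checking constraints...\n12/12 passed, 0 failed, 3 conventions\n"
with tempfile.TemporaryDirectory() as tmp:
    learnings = Path(tmp) / ".planning" / "LEARNINGS.md"
    recorded = record_summary(sample_output, learnings)
    written = learnings.read_text(encoding="utf-8")

print(recorded)
```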
After technical tests pass, spawn the dev-verifier agent to check that phase GOALS were achieved, not just tasks completed:
Tool Restrictions: The verifier is READ-ONLY. It runs tests via Bash and reads output but MUST NOT use Write or Edit.
Agent(subagent_type="workflows:dev-verifier",
allowed_tools=["Read", "Glob", "Grep", "Bash(read-only)"],
prompt="""
Verify that the dev workflow goals have been achieved for this feature.
**Tool Restrictions:** You are READ-ONLY. You MUST NOT use Write or Edit tools. You run tests and read output to verify goals — you do NOT modify code. If you find gaps, you report them — the main chat fixes them.
Read these files:
- .planning/SPEC.md (requirements and success criteria)
- .planning/PLAN.md (implementation plan)
- .planning/STATE.md (workflow state)
For each success criterion in SPEC.md, verify with FRESH runtime evidence that the goal was met.
Task completion ≠ goal achievement. A file existing ≠ feature working.
**Trace to Requirements:** For each success criterion, reference its requirement ID (e.g., "AUTH-01: Login returns JWT token — VERIFIED with test output showing..."). This creates end-to-end traceability from SPEC.md through PLAN.md through VALIDATION.md through verification.
Report:
- GOAL: [from SPEC.md success criteria]
- REQUIREMENT: [REQ-ID from SPEC.md]
- STATUS: MET | NOT_MET | PARTIAL
- EVIDENCE: [fresh runtime output proving it]
If ANY goal is NOT_MET, list the specific gaps.
""")
If dev-verifier finds gaps: Return to dev-implement to address them before proceeding to user acceptance. If all goals MET: Proceed to user acceptance below.
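The routing rule above, sketched over a structured form of the verifier's report. `GoalResult` and `route` are hypothetical names, the report is assumed to have been parsed already, and treating a MET status with empty evidence as a gap is an assumption in the spirit of the fresh-evidence rule. AUTH-02 and both evidence strings are invented.

```python
from dataclasses import dataclass


@dataclass
class GoalResult:
    goal: str
    requirement: str  # REQ-ID from SPEC.md, e.g. "AUTH-01"
    status: str       # "MET", "NOT_MET", or "PARTIAL"
    evidence: str     # fresh runtime output backing the status


def route(results: list[GoalResult]) -> tuple[str, list[str]]:
    """Return the next phase and the requirement IDs that still have gaps."""
    if not results:
        raise ValueError("verifier reported no goals; nothing was verified")
    gaps = [r.requirement for r in results
            if r.status != "MET" or not r.evidence.strip()]
    if gaps:
        return ("dev-implement", gaps)
    return ("user-acceptance", [])


report = [
    GoalResult("Login returns JWT token", "AUTH-01", "MET", "test_login PASSED"),
    GoalResult("Logout clears session", "AUTH-02", "PARTIAL", "test_logout FAILED"),
]
next_phase, gaps = route(report)
print(next_phase, gaps)  # dev-implement ['AUTH-02']
```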
Post-subagent boundary (the highest-risk moment). After the verifier returns, main chat is verifying, not investigating — stay inside this line:
| Main chat CAN (verification) | Main chat CANNOT (investigation) |
|------------------------------|----------------------------------|
| Read the verifier's report + LEARNINGS.md | Re-read source files to "double-check" the finding |
| Re-run the test command / check-all.py | Grep/explore the codebase to form a new theory |
| Route gaps back to dev-implement | Edit code to "quickly fix" what the verifier flagged |
If you catch yourself opening source files to re-litigate the verifier's verdict, STOP — that is investigation. Route the gap to dev-implement. (Full rule: auto-loaded verification-vs-investigation constraint, C1b.)
Checkpoint type: decision (user confirms completion — cannot auto-advance)
After technical verification and goal-backward verification pass, confirm with user. Use the AskUserQuestion pattern:
Tool description: Request user confirmation that implementation meets specified requirements
```yaml
question: "Does this implementation meet your requirements?"
options:
  - label: "Yes, requirements met"
    description: "Feature works as designed, ready to merge"
  - label: "Partially"
    description: "Core works but missing some requirements"
  - label: "No"
    description: "Does not meet requirements, needs more work"
```
Reference .planning/SPEC.md when asking—remind user of the success criteria they defined.
Log the review pattern (observe → record → offer): after the user answers this acceptance decision, append one line to .planning/LEARNINGS.md recording what the user inspected before deciding — e.g. "ran the app and watched the GUI", "read the test summary only", "asked for a before/after diff", "checked specific acceptance criteria". If the same artifact is requested 3+ times across episodes, offer to bundle a generator script under skills/dev-verify/scripts/. Observe first, automate after the 3rd occurrence — never build speculatively.
When the offer triggers, map the observed request to a concrete artifact:
| If the user keeps asking to… | Consider building |
|------------------------------|-------------------|
| "see it actually run" | a launch/screenshot script (use visual-verify) |
| "see a before/after diff" | tracked-changes / redline view |
| "confirm all criteria are met" | acceptance-criteria → evidence coverage table |
| "see the test results" | test-summary renderer from the suite output |
The phase offers to run the script — it never forces it.
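The "observe first, automate after the 3rd occurrence" rule can be expressed as a simple count. In this sketch `artifacts_to_offer` is a hypothetical helper, and the review log is assumed to have been reduced to one short label per episode; how free-text LEARNINGS.md entries get normalized into labels is left open.

```python
from collections import Counter

OFFER_THRESHOLD = 3  # offer a generator script from the 3rd request onward


def artifacts_to_offer(review_log: list[str]) -> list[str]:
    """Return artifact labels requested at least OFFER_THRESHOLD times."""
    counts = Counter(review_log)
    return sorted(label for label, n in counts.items() if n >= OFFER_THRESHOLD)


# Invented review history, one label per episode.
history = [
    "before/after diff",
    "test summary",
    "before/after diff",
    "before/after diff",
]
print(artifacts_to_offer(history))  # ['before/after diff']
```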
If user responds "Partially" or "No":
/dev-implement to address gaps

NO COMPLETION CLAIMS WITHOUT RE-VERIFICATION AFTER USER FEEDBACK. This is not negotiable.
If the user says "Partially" or "No":
.planning/VERIFY_STATE.md:
```yaml
iteration: 1
max_iterations: 3
user_feedback: "Partially - missing X"
```
Escalation: After 3 iterations without "Yes", escalate to user:
Claiming 'verified' after user said 'Partially' without re-running verification is NOT HELPFUL — you're telling the user their problem is solved when it isn't. </EXTREMELY-IMPORTANT>
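The iteration tracking and escalation described above might be modelled like this. The field names mirror VERIFY_STATE.md; `VerifyState`, `record_feedback`, and the returned action strings are hypothetical, and reading or writing the state file itself is omitted.

```python
from dataclasses import dataclass


@dataclass
class VerifyState:
    iteration: int = 1
    max_iterations: int = 3
    user_feedback: str = ""


def record_feedback(state: VerifyState, feedback: str) -> str:
    """Store the user's answer and return the next action."""
    state.user_feedback = feedback
    if feedback.startswith("Yes"):
        return "complete"
    if state.iteration >= state.max_iterations:
        # Third answer without "Yes": stop looping and escalate to the user.
        return "escalate"
    state.iteration += 1
    # Gaps go back to dev-implement, then verification is re-run from scratch.
    return "fix-and-reverify"


state = VerifyState()
actions = [
    record_feedback(state, "Partially - missing X"),
    record_feedback(state, "No"),
    record_feedback(state, "Partially - missing Y"),
]
print(actions)  # ['fix-and-reverify', 'fix-and-reverify', 'escalate']
```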
Only claim COMPLETE when:
Two types of verification required:
Both must pass. No shortcuts exist.
Phase summary (append to LEARNINGS.md):
## Phase: Verify
---
phase: verify
status: completed
implements: [<all v1 REQ-IDs traced to passing evidence>]
requires: [REVIEW_STATE.md, all-tests-passing]
provides: [user-acceptance, workflow-complete]
affects: [] # verification is read-only; no files modified
constraint-check: PASS
goal-backward-verification: all-goals-met
user-verdict: "Yes, requirements met"
---
When user confirms "Yes, requirements met":
Announce: "Dev workflow complete. All 7 phases passed."
The /dev workflow is now finished. Offer to:
.planning/ files
/dev

Fresh Evidence Always: Every claim requires proof from a fresh command execution, not cached results or agent reports.
Runtime Over Structural: Verify code works by running it, not by checking if code exists. Structural analysis cannot prove behavior.
E2E for User-Facing: User-visible features require end-to-end evidence (screenshots, user flow tests), not unit tests alone.
Drive-Aligned Framing: Claiming completion without fresh evidence creates rework for the user. Only advance when fully verified.