skills/skill-verification-gate/SKILL.md
Evidence before claims — run verification commands before declaring work complete, fixed, or passing
npx skillsauth add nyldn/claude-octopus skill-verification-gateInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Host: Codex CLI — This skill was designed for Claude Code and adapted for Codex. Cross-reference commands use installed skill names in Codex rather than
/octo:*slash commands. Use the active Codex shell and subagent tools. Do not claim a provider, model, or host subagent is available until the current session exposes it. For host tool equivalents, seeskills/blocks/codex-host-adapter.md.
NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE
If you haven't run the verification command in this turn, you cannot claim it passes.
Before claiming any success or expressing satisfaction:
Skip any step = the claim is unverified.
| Claim | Requires | NOT Sufficient |
|-------|----------|----------------|
| Tests pass | Test command output showing 0 failures | Previous run, "should pass" |
| Build succeeds | Build command exit 0 | Linter passing |
| Bug fixed | Reproduce original symptom: now passes | "Code changed, should work" |
| Regression test works | Red (fail without fix) → Green (pass with fix) | Test passes once |
| Subagent completed task | git diff shows expected changes | Subagent says "done" |
| Requirements met | Line-by-line checklist against spec | Tests passing |
| Provider dispatch worked | Output contains expected content | No error ≠ success |
If you catch yourself thinking any of these, STOP:
| Thought | What to do instead | |---------|-------------------| | "Should work now" | Run the verification | | "I'm confident" | Confidence ≠ evidence | | "Just this once" | No exceptions | | "The linter passed" | Linter ≠ tests ≠ build | | "The agent said it worked" | Verify independently | | "It's a small change" | Small changes cause big bugs |
In Claude Octopus workflows, verification is especially critical because:
After any multi-provider workflow:
# Verify synthesis file exists and is recent
ls -la ~/.claude-octopus/results/*-synthesis-*.md | tail -1
# Verify it has content (not just headers)
wc -l ~/.claude-octopus/results/*-synthesis-*.md | tail -1
ALWAYS before:
In orchestrate.sh workflows:
probe (discover) — verify synthesis file existsgrasp (define) — verify consensus score meets thresholdtangle (develop) — verify tests pass, not just that code was writtenink (deliver) — verify review actually ran, not just that it was dispatched$ npm test
✓ user.create() saves to database (45ms)
✓ user.create() validates email (12ms)
Tests: 2 passed, 2 total
All 2 tests pass. ← Claim backed by output.
I've implemented the feature. It should work now. The tests should pass.
← No test was run. "Should" is not evidence.
1. Write test → run → FAIL (expected, proves test detects the bug)
2. Implement fix → run → PASS (proves fix works)
3. Revert fix → run → FAIL (proves test isn't false-positive)
4. Restore fix → run → PASS (final confirmation)
This skill is referenced by:
flow-develop.md — verification gate after implementationflow-deliver.md — verification gate before deliveryskill-code-review.md — verify review findings before reportingskill-tdd.md — red-green cycle requires evidence at each stepskill-factory.md — autonomous pipeline must verify at every phasetesting
Run a configurable multi-LLM council with personas, budget caps, synthesis, veto gates, and optional implementation handoff.
testing
Evidence before claims — run verification commands before declaring work complete, fixed, or passing
development
Structured four-way AI debates between Claude, Sonnet, Gemini, and Codex — use for critical decisions
development
System architecture and API design with multi-AI consensus — use for design reviews and new subsystems