skills/verification-discipline/SKILL.md
Use when verifying that completed work actually works. Auto-surface during /verify mode, post-implementation review, or before claiming a task is done. Teaches the discipline of testing outcomes vs implementation, the unit/integration/smoke gradient, and what "done" actually means.
npx skillsauth add microsoft/amplifier-bundle-skills verification-discipline

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Unit tests verify that code-as-written behaves as-written. Smoke and integration tests verify that the system achieves the intended outcome. Those are different questions. You need both.
"All unit tests pass" is necessary. It is rarely sufficient. A finding from the field: four consecutive integration-blocking bugs, all of which passed unit tests, all of which would have been caught by a five-minute smoke test on a fresh environment. The bugs were not exotic — they were the cost of declaring "done" too early.
Treat verification as a ladder. Skip a rung and you discover its bugs in production.
| Tier | What it verifies | Example |
|---|---|---|
| 1. Unit | Code does what I wrote it to do | `pytest tests/unit/` |
| 2. Integration | Component pairs interact correctly | `pytest tests/integration/`, real DB |
| 3. Smoke / E2E | System achieves the user-visible outcome | Fresh DTU launch, run real pipeline, observe artifacts |
| 4. Production-equivalent | Real environment, real load, real data | Staging deployment, canary, replay traces |
Each tier catches bugs the tier below it cannot. Each tier costs more time than the tier below it. The economic choice is not "skip the expensive tiers." The economic choice is "spend five minutes on tier 3 to avoid five hours of rollback."
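One way to make the ladder hard to skip is to run the tiers cheapest-first and stop at the first failure. The following is a minimal sketch, not a prescribed tool: the two `pytest` commands come from the table above, while `TIERS`, `run_command`, and `first_failing_tier` are hypothetical names introduced for illustration. A smoke-tier command would be appended to the list in the same shape.

```python
import subprocess
from typing import Callable, Optional, Sequence, Tuple

# Tiers in ascending cost. The pytest paths are taken from the table above;
# everything else in this sketch is a hypothetical illustration.
TIERS: Sequence[Tuple[str, Sequence[str]]] = [
    ("unit", ["pytest", "tests/unit/"]),
    ("integration", ["pytest", "tests/integration/"]),
]


def run_command(cmd: Sequence[str]) -> bool:
    """Return True when the command exits with status 0."""
    return subprocess.run(list(cmd)).returncode == 0


def first_failing_tier(
    tiers: Sequence[Tuple[str, Sequence[str]]],
    run: Callable[[Sequence[str]], bool] = run_command,
) -> Optional[str]:
    """Run tiers in order and return the name of the first one that fails.

    Returns None when every tier passes. Later tiers are not run after a
    failure, so the cheap rungs gate the expensive ones.
    """
    for name, cmd in tiers:
        if not run(cmd):
            return name
    return None
```

Calling `first_failing_tier(TIERS)` returns `None` only when every rung has passed, which gives a single value to check before claiming the work is done.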
Before claiming a task is done, satisfy this checklist:
AGENTS.md and .github/PULL_REQUEST_TEMPLATE.md are satisfied.

If any box is unchecked, the work is not done. Say so explicitly.
Tests-from-outcomes differs from classic TDD. TDD writes unit tests first; tests-from-outcomes writes the outcome assertion first.
1. Before writing implementation, write down the user-observable outcome.
"After running this pipeline, events.jsonl contains a `branch_completed`
event for each branch and no `contract_violation` events."
2. Write a test asserting that outcome. The test runs the real pipeline,
inspects the real events.jsonl, checks the real conditions.
3. Implement code until the test passes.
Both patterns are valuable. Unit-level TDD verifies internal correctness. Outcome-level testing verifies that the system behaves as the user expects. Use both.
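The outcome assertion from the steps above could be sketched roughly as follows. This assumes the pipeline has already run and written `events.jsonl` as JSON Lines, and that each event carries a `type` field and, for branch events, a `branch` field. Those field names, along with `load_events` and `check_outcome`, are assumptions made for illustration; the source only names the file and the two event kinds.

```python
import json
from pathlib import Path


def load_events(path):
    """Parse a JSON Lines file into a list of dicts, skipping blank lines."""
    events = []
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                events.append(json.loads(line))
    return events


def check_outcome(events, expected_branches):
    """Return a list of problems; an empty list means the outcome holds.

    Assumed event shape (hypothetical): {"type": ..., "branch": ...}.
    """
    completed = {
        e.get("branch") for e in events if e.get("type") == "branch_completed"
    }
    # Every expected branch must have a branch_completed event.
    problems = [
        f"branch {b!r} never completed"
        for b in sorted(set(expected_branches) - completed)
    ]
    # No contract_violation events are allowed at all.
    violations = [e for e in events if e.get("type") == "contract_violation"]
    if violations:
        problems.append(f"{len(violations)} contract_violation event(s)")
    return problems
```

A pytest test would then run the real pipeline, call `load_events` on the real `events.jsonl`, and assert that `check_outcome(...)` returns `[]`. Returning a list of problems rather than a bare boolean keeps the failure message readable when the outcome does not hold.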
Four integration-blocking bugs in four consecutive releases. All would have been caught by a smoke test on a fresh environment. None were caught by the existing unit tests, because those tests asked the wrong question.
The fix:
Cultural change is hard. Changing the form is easy. Change the form first.
- skills/per-repo-conventions/: how to discover the specific gates a given repo requires (AGENTS.md, PR template, CONTRIBUTING.md).
- foundation:docs/PER_REPO_CONVENTIONS.md: canonical principle for per-repo discovery.
- skills/integration-testing-discipline/: concrete tactics for running the integration tier (observe first, fix in batches, expect long durations).