skills/test-first-development/SKILL.md
Acceptance tests as immutable scope fence — loaded by test-architect and orchestrator to enforce test-before-implementation discipline and completion contracts
Acceptance tests define the scope fence. They are written before implementation begins and remain immutable throughout. Implementation is complete when — and only when — all acceptance tests pass.
Write ALL acceptance tests from the plan BEFORE any implementation code.
This is non-negotiable. Tests are the contract between the plan and the code. They answer two questions with certainty:
Follow skills/progress-tracking/SKILL.md: when this procedure has two or more steps, seed one todo item per step before starting and mark each complete as you go.
The plan's Tests section enumerates every acceptance test by name and description. Write all of them. Use the exact names from the plan. Do not add tests. Do not omit tests. Do not rename tests.
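The exact-names rule above can be checked mechanically. The following is a minimal sketch, assuming the planned and written test names are available as plain lists of strings; the helper name `diff_test_names` and the sample names are hypothetical, not part of this skill.

```python
def diff_test_names(planned, written):
    """Return (missing, extra) test names, each as a sorted list.

    missing: names in the plan that have no written test.
    extra:   written tests whose names are not in the plan.
    """
    planned_set, written_set = set(planned), set(written)
    missing = sorted(planned_set - written_set)
    extra = sorted(written_set - planned_set)
    return missing, extra


# Hypothetical plan and test file contents, for illustration only.
planned = ["rejects_negative_deposit", "sends_email_when_balance_is_low"]
written = ["sends_email_when_balance_is_low", "rejects_negative_deposits"]

missing, extra = diff_test_names(planned, written)
print("missing:", missing)
print("extra:", extra)
```

A renamed test shows up as one missing name plus one extra name, so an added, omitted, or renamed test each leaves a non-empty result.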
Edge cases are part of the contract. The test list is expected to include boundary values, invalid inputs, failure paths, concurrency, auth, and resource-limit scenarios identified during design. If the test list reads as happy-path only, that is a plan defect — return to the PLAN phase rather than silently filling the gap during implementation.
Run the full test suite after writing all acceptance tests. Every new test must:
A test that errors is not a failing test — it is a broken test. The distinction matters.
When a test errors instead of fails:
These stubs exist solely to make the test runner execute the test. They must be obviously incomplete — empty functions that return nothing, interfaces with no implementation.
Never write implementation code to fix a test error. If the test needs real implementation to even run, the test is testing at the wrong level of abstraction.
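The difference between a failing test and an erroring test can be illustrated with a small sketch. Everything here is hypothetical: `classify` is a toy stand-in for a test runner's outcome reporting, and `apply_discount` is an invented function under test.

```python
def classify(check):
    """Run one check and report 'pass', 'fail', or 'error'."""
    try:
        check()
    except AssertionError:
        return "fail"   # reached the assertion; the behavior is missing
    except Exception:
        return "error"  # never reached the assertion; the test is broken
    return "pass"


def apply_discount(price_cents, percent):
    """Hypothetical stub: obviously incomplete, returns nothing."""
    return None


def check_discount_with_stub():
    # The stub lets the check run up to its assertion, which then fails.
    assert apply_discount(2000, 10) == 1800


def check_discount_without_stub():
    # missing_apply_discount is deliberately undefined, so this raises
    # NameError before any assertion is evaluated.
    assert missing_apply_discount(2000, 10) == 1800


print(classify(check_discount_with_stub))
print(classify(check_discount_without_stub))
```

The stub contains no logic at all. Its only job is to move the outcome from "error" to "fail".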
Once all tests fail correctly, the test list is IMMUTABLE for the duration of implementation. This means:
If the test list needs to change, that is a plan change. Return to the PLAN phase, update the plan, and get re-approval.
These are the tests written in the TEST-FIRST phase. They verify the feature's external behavior as described in the plan. They are coarse-grained, testing observable outcomes rather than internal mechanics.
The acceptance test list is immutable during implementation.
During IMPLEMENT, the developer may use traditional TDD cycles to build up the implementation:
These step-level tests are implementation details. They can be freely added, modified, or removed during implementation. They are not part of the scope fence.
The distinction: acceptance tests define what must work. Step-level TDD helps build how it works.
These rules govern every acceptance test written under this skill. Each rule catches a different class of test-suite decay.
Tests assert externally observable outcomes. A refactor that preserves behavior must leave every acceptance test green. Tests that break on internal restructuring are change-detectors — they produce noise on every refactor without proving the system works.
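As a rough illustration of asserting on behavior rather than internals, consider the sketch below. The `Cart` class and its methods are hypothetical and exist only for this example.

```python
class Cart:
    """Hypothetical class under test."""

    def __init__(self):
        self._lines = []  # internal storage; free to change in a refactor

    def add(self, sku, unit_cents, quantity=1):
        self._lines.append((sku, unit_cents, quantity))

    def total_cents(self):
        return sum(unit * qty for _, unit, qty in self._lines)


def check_total_reflects_added_items():
    cart = Cart()
    cart.add("pen", 150, quantity=2)
    cart.add("pad", 400)
    # Observable outcome: survives a switch from list to dict storage.
    assert cart.total_cents() == 700
    # Change-detector to avoid:
    #   assert cart._lines == [("pen", 150, 2), ("pad", 400, 1)]


check_total_reflects_added_items()
```

The commented-out assertion would break as soon as the storage format changes, even though the cart still computes the same total.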
Test code is read far more than it is run. Inline the setup a reader needs to understand a failing test. Tolerate duplication; favor a linear arrange-act-assert story.
- Prefer `create_account(BALANCE)` over a `create_account()` that hides the value the assertion checks.
- No `if`, no loops, no string-building inside a test body: they can carry the same bug as the code under test.

Assert on the specific field the test cares about (`account.balance == 2000`), not full equality on a complex object. Reserve full-snapshot assertions for at most one default test per common case.

Use subset matchers when available (`comparingExpectedFieldsOnly`, `UnorderedElementsAre`, `protocmp.FilterField`). Brittle failures are a signal that the test toolkit is missing a matcher; propose one rather than blaming the test author.
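The readability and specific-field rules can be sketched together. The `Account` type, `create_account`, and `deposit` below are hypothetical names chosen to match the `account.balance == 2000` example. They are not a real API.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """Hypothetical domain object with more fields than the test cares about."""
    owner: str
    balance: int = 0
    history: list = field(default_factory=list)


def create_account(balance):
    # The helper takes the value the assertion depends on as an argument.
    return Account(owner="test-owner", balance=balance)


def deposit(account, amount):
    account.balance += amount
    account.history.append(("deposit", amount))


def check_deposit_increases_balance():
    account = create_account(1500)   # arrange: starting value is visible
    deposit(account, 500)            # act
    assert account.balance == 2000   # assert: only the field under test


check_deposit_increases_balance()
```

The check has no branches or loops. It also ignores `owner` and `history`, so adding a field to `Account` would not break it.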
A failing test must be diagnosable from name + assertion output alone, without rerunning.
- `EXPECT_OK(loadMetadata())` beats `EXPECT_TRUE(loadMetadata().ok())` because the failure prints the actual error.
- `assertEqual(actual, expected)` beats `assert(predicate)` because it prints both values.
- `sendsEmailWhenBalanceIsLow`, not `testProcessTransaction_1`.

Replace every fixed `sleep(N)` with a wait-for-condition primitive (exist, not-exist, wait-to-exist with timeout + interval). A fixed sleep both masks race-condition bugs and pads runtime when the system is fast.
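A wait-for-condition primitive of the kind described above might look like the following sketch. The name `wait_until` and its default timeout and interval are assumptions; many test frameworks ship an equivalent helper that should be preferred where available.

```python
import time


def wait_until(condition, timeout=5.0, interval=0.05):
    """Poll `condition` until it returns truthy or `timeout` seconds elapse.

    Returns True as soon as the condition holds, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Usage sketch: the condition becomes true on the third poll.
polls = {"count": 0}


def job_finished():
    polls["count"] += 1
    return polls["count"] >= 3


assert wait_until(job_finished, timeout=1.0, interval=0.001)
```

The helper returns as soon as the condition holds, so a fast system does not pay the full timeout, and a slow one fails with a clear timeout rather than a flaky assertion. A "not-exist" wait can be expressed by passing the negated condition.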
When a test needs a collaborator, prefer in this order:
Default-to-mocking collapses fidelity and produces mock chains that mirror production graphs without surfacing real bugs.
If a vendor type needs to be substituted, wrap it behind your own interface and mock the wrapper. Upstream API changes then ripple through one boundary instead of through every test.
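The wrap-then-mock advice can be sketched as follows. `VendorSdkClient`, `PaymentGateway`, `VendorPaymentGateway`, and `checkout` are all hypothetical names; the vendor class is a stand-in, not a real SDK.

```python
from abc import ABC, abstractmethod
from unittest.mock import Mock


class VendorSdkClient:
    """Stand-in for a third-party SDK type (hypothetical)."""

    def create_charge(self, amount_minor_units, currency):
        raise RuntimeError("network call; not available in tests")


class PaymentGateway(ABC):
    """Our own interface: the only payment type the rest of the code sees."""

    @abstractmethod
    def charge(self, cents: int) -> bool: ...


class VendorPaymentGateway(PaymentGateway):
    """Thin adapter: the single place that touches the vendor API."""

    def __init__(self, client):
        self._client = client

    def charge(self, cents):
        return bool(self._client.create_charge(cents, "USD"))


def checkout(gateway, cents):
    return "paid" if gateway.charge(cents) else "declined"


def check_checkout_reports_declined_charge():
    gateway = Mock(spec=PaymentGateway)  # mock the wrapper, not the vendor
    gateway.charge.return_value = False
    assert checkout(gateway, 2000) == "declined"
    gateway.charge.assert_called_once_with(2000)


check_checkout_reports_declined_charge()
```

If the vendor renames `create_charge`, only `VendorPaymentGateway` changes. Tests written against `PaymentGateway` are unaffected.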
End-to-end tests are expensive: budget about one engineer-week per quarter per E2E test to keep it stable. Reserve them for a small list of user goal-plus-task workflows. Do not chase exhaustive E2E coverage.
Features ship into a system; bugs live at the seams. When the design introduces a feature whose behavior overlaps an existing one, the acceptance-test list must include at least one cross-feature interaction test.
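A cross-feature interaction test could look like the sketch below. The two features (a percentage coupon and a free-shipping threshold), the constants, and the rule that the threshold is evaluated after the coupon are all assumptions invented for illustration; in practice the plan would state which behavior is intended.

```python
FREE_SHIPPING_THRESHOLD_CENTS = 5000  # hypothetical existing feature
SHIPPING_FEE_CENTS = 500


def order_total(subtotal_cents, coupon_percent=0):
    """Hypothetical pricing: coupon first, then the shipping rule."""
    discounted = subtotal_cents - subtotal_cents * coupon_percent // 100
    if discounted >= FREE_SHIPPING_THRESHOLD_CENTS:
        shipping = 0
    else:
        shipping = SHIPPING_FEE_CENTS
    return discounted + shipping


def check_coupon_can_remove_free_shipping():
    # Each feature alone: 5000 qualifies for free shipping, and a 20%
    # coupon takes 1000 off. Together, the discounted 4000 falls below
    # the threshold, so the shipping fee applies again.
    assert order_total(5000, coupon_percent=20) == 4500


check_coupon_can_remove_free_shipping()
```

Neither feature's own tests would catch a change in the order of these two rules. Only a test that exercises both together pins it down.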
Implementation is done when:
If all three conditions hold, the feature is complete as specified. Proceed to VERIFY.