plugins/tdd/skills/design-testing-strategy/SKILL.md
Use before writing any type of tests. Distills 14 industry sources into deterministic decision gates, schemas, and worked test examples.
npx skillsauth add NeoLabHQ/context-engineering-kit design-testing-strategyInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
A reference manual for designing a fit-for-purpose, fit-for-criticality testing strategy.
This skill is decision-oriented, not philosophical: every gate is deterministic (ON when X / OFF when Y), every schema is enforced (field ordering matters), every example is worked end-to-end.
selected_types entry) — the field order is load-bearing.rejected_types and deliberate skips in deliberately_skipped.Apply gates in numeric order. Each gate produces an independent boolean (applies: true|false). Gates do NOT veto each other — a single artifact may have unit + integration + contract + property-based all ON.
| # | Type | ON when | OFF when | Source |
|---|------|---------|----------|--------|
| 0 | Skip All | Criticality is NONE (docs-only, comments, formatting, generated code, config without logic, throwaway prototypes) | Anything with branching, computed output, side effects, or user-visible behavior | Pragmatic Programmer — "Test ruthlessly and effectively" implies effective skipping when ROI is zero |
| 1 | Unit | Code contains any logic: branches, loops, conditionals, computation, transformation, parsing, validation, formatting | Pure declarative wiring (DI registration, route table) with no behavior | Test Pyramid (Vocke) base layer + Beck TDD Red-Green-Refactor unit |
| 2 | Integration | Boundary crossing: HTTP call, DB query, external SDK, message queue, filesystem I/O, OR collaboration with >=2 distinct collaborators where unit doubles distort behavior | Pure function with no I/O and 0-1 stable collaborators | Testing Trophy (Dodds) — integration is the highest-ROI layer; Google "Follow the User" |
| 3 | Component or E2E | UI surface AND criticality >= MEDIUM-HIGH AND user-facing critical path (signup, checkout, auth, payment, primary CTA) | Internal admin-only screens, dev tooling, or non-critical UI | Test Pyramid top + ISO/IEC/IEEE 29119 risk ranking + Google e2e principles |
| 4 | Contract | Public API consumed by >=1 distinct clients (mobile + web, multiple internal services, external partners) AND independent deploy cadence | API where consumer and provider deploy together | Pact / CDC + Pactflow CDC explainer |
| 5 | Smoke | Deployable surface (web app, API, service) AND a deploy/CI pipeline exists where post-deploy validation is meaningful | Library, internal helper, or no deploy pipeline | Google "What Makes a Good End-to-End Test" — smoke = minimal e2e for deploy gate |
| 6 | Property-Based | Input domain is large or unbounded (numeric ranges, strings, lists, parsers, serializers, encoders, math) AND invariants are stable (round-trip, idempotency, monotonicity, commutativity) AND criticality >= MEDIUM-HIGH | Small finite input domain, unstable invariants, or LOW criticality | Hypothesis / QuickCheck |
for gate in [Gate 0, Gate 1, ..., Gate 6]:
if gate.ON_condition_met(artifact):
result[gate.type] = applies: true
else:
result[gate.type] = applies: false
if Gate 0 is true:
short-circuit: emit empty selected_types, document criticality=NONE, stop
Criticality Scale (used by Gates 3 and 6):
| Level | Definition |
|-------|------------|
| NONE | Docs, formatting, generated code, throwaway code, configs without logic |
| LOW | Internal dev tooling, admin-only screens, logging formatters |
| MEDIUM | Standard CRUD, internal APIs with a single team consumer, non-critical UI, helpers and utilities |
| MEDIUM-HIGH | User-facing UI on critical paths, public APIs with multiple consumers, business workflows |
| HIGH | Money movement, auth/authz decisions, security-critical validation, data integrity, regulated domains |
| Type | Use when | Do NOT use when | Frameworks | Typical dependencies | Google Size | |------|----------|-----------------|------------|----------------------|-------------| | unit | Pure logic, single function/method/class, deterministic inputs | Code is just I/O orchestration with no logic | vitest, jest, pytest, go test, JUnit, xUnit, RSpec | None (or in-memory fakes) | Small | | integration | Boundary crossing (DB, HTTP, queue, FS); multiple collaborators where mocking distorts behavior | Pure function with no boundary | vitest, jest, pytest, go test, JUnit + Testcontainers, supertest, TestRestTemplate | Real Postgres/Redis/Kafka via Testcontainers, in-process HTTP server, real FS in tmpdir | Medium (single machine, localhost OK) | | component | UI rendering + interaction within a single component, no full app context | Backend-only logic; multi-page user flow | React Testing Library, Vue Test Utils, Angular TestBed, Storybook interaction tests | jsdom or happy-dom, mocked network at fetch/axios level | Small to Medium | | e2e | Full user path through running app: real browser, real backend, real DB | Internal helper, single component, non-critical UI | Playwright, Cypress, Selenium | Real running app + Testcontainers-backed DB or seeded staging | Large (multi-process, possibly multi-machine) | | smoke | Post-deploy go/no-go: hit / health, key endpoints respond, login works | Detailed correctness; smoke is shallow by design | Playwright (1-3 critical paths), HTTP probe scripts, k6 minimal scenarios | Real deployed environment | Large | | contract | Public API consumed by 2+ distinct clients with independent deploy cadence | Single-consumer internal API; provider and consumer deploy together | Pact, Spring Cloud Contract, OpenAPI schema validators | Pact broker or contract files in repo | Medium | | property-based | Large/unbounded input domain with stable invariants (parser, serializer, encoder, math) | Small finite input space; unstable invariants | Hypothesis (Python), fast-check (TS), QuickCheck (Haskell), jqwik (Java), proptest (Rust) | Same as unit | Small |
Google Test Sizes (Bland) and SWE at Google Ch.11 classify tests by resources (size), independent of scope (paths covered):
| Size | Process model | Network | Filesystem | Time budget | Notes |
|------|---------------|---------|------------|-------------|-------|
| small | Single process, single thread | None | None (in-memory only) | < 100ms | Fast, hermetic, parallelizable |
| medium | Single machine, multiple processes allowed | localhost only | tmpdir allowed | < 1s | Testcontainers fits here |
| large | Multi-machine | External network allowed | Persistent FS allowed | < 15min | Full e2e |
| enormous | Distributed | Wide network | Anywhere | longer | Cluster / chaos |
A test's type (unit/integration/e2e) and size (small/medium/large) are orthogonal: a small integration test (Testcontainers Postgres in same process via JDBC) is legitimate.
| Dimension | Playwright | Cypress | |-----------|---------------------------------------|-----------------------------------| | Browsers | Chromium, Firefox, WebKit | Chromium, Firefox, WebKit (limited) | | Multi-tab / multi-origin | Yes | Limited | | Parallelism | Built-in shards | Paid dashboard or external | | Network interception | Robust route-level | cy.intercept | | Default | Choose Playwright for new projects unless team already standardized on Cypress | Choose Cypress when team has heavy investment |
Use ISTQB Foundation Level black-box techniques to derive what to test inside each chosen test type. References: ISTQB BVA white paper, ASTQB black-box techniques.
Divide input domain into partitions where the system is expected to behave the same way; ONE test per partition is sufficient.
Worked example — discount(orderTotal: number) -> number:
| Partition | Range | Representative test input | Expected |
|-----------|-------|---------------------------|----------|
| Below threshold | 0 <= total < 100 | 50 | 0% discount |
| Mid tier | 100 <= total < 500 | 250 | 5% discount |
| Top tier | total >= 500 | 1000 | 10% discount |
| Invalid (negative) | total < 0 | -1 | throw / error |
Four tests cover all partitions. EP alone misses boundaries — combine with BVA.
Bugs cluster at boundaries. For every boundary value B, test B-1, B, B+1 (or for floats, the smallest representable step).
Worked example — same discount function, boundary at 100:
| Test input | Why | Expected |
|------------|-----|----------|
| 99 (= B-1) | Last value of "below threshold" partition | 0% discount |
| 100 (= B) | First value of "mid tier" partition | 5% discount |
| 101 (= B+1) | Confirms not off-by-two | 5% discount |
Repeat for boundary at 500: test 499, 500, 501. Total: 6 boundary tests + 4 EP tests = 10 cases.
The B-1 / B / B+1 triplet has the same shape across boundaries (vary input, vary expected output, identical assertion); this is a natural fit for a table-driven test (see sub-section 5 below).
When output depends on combinations of conditions. Each column is a rule.
Worked example — canCheckout(cartHasItems, paymentValid, addressOnFile):
| Condition / Rule | R1 | R2 | R3 | R4 | |------------------|----|----|----|----| | cartHasItems | T | T | T | F | | paymentValid | T | T | F | * | | addressOnFile | T | F | * | * | | Result | allow | block:address | block:payment | block:cart |
Four tests, one per rule (* = don't care, dropped via merging).
When behavior depends on history. Identify states, events, and forbidden transitions.
Worked example — Order state machine with states {draft, submitted, paid, shipped, cancelled}:
| From | Event | To | Test | |------|-------|----|----| | draft | submit | submitted | happy path | | submitted | pay | paid | happy path | | paid | ship | shipped | happy path | | draft | cancel | cancelled | early cancel | | paid | cancel | reject | forbidden — refund flow required, NOT direct cancel | | shipped | submit | reject | forbidden |
Cover one test per legal transition + one per forbidden transition (negative path).
When EP, BVA, or decision-table analysis yields 3+ cases with the same shape (same setup, same assertion, only inputs and expected outputs differ — e.g., parsing valid/invalid date formats; computing tax across brackets; routing rules) collapse them into a single table-driven test. The cases become rows in a data table; the test body iterates the rows and runs one assertion per row. References: Dave Cheney, Prefer table-driven tests; Go wiki: TableDrivenTests.
Do NOT force a table when setup, framework calls, or the assertion shape varies substantially across cases. Forced uniformity hides real differences behind a single name and produces obscure failure messages — keep those as separate, individually named tests.
Worked example — six EP+BVA cases for discount(orderTotal) (boundary at 100) collapsed into one table-driven unit test (TS / vitest syntax; the same pattern applies to Go t.Run, JUnit @ParameterizedTest, pytest parametrize):
describe("discount", () => {
const cases: Array<{ name: string; input: number; expected: number }> = [
{ name: "EP: below threshold (typical)", input: 50, expected: 0 },
{ name: "BVA: B-1 at boundary 100", input: 99, expected: 0 },
{ name: "BVA: B at boundary 100", input: 100, expected: 0.05 },
{ name: "BVA: B+1 at boundary 100", input: 101, expected: 0.05 },
{ name: "EP: mid tier (typical)", input: 250, expected: 0.05 },
{ name: "EP: top tier (typical)", input: 1000, expected: 0.10 },
];
for (const c of cases) {
it(c.name, () => {
expect(discount(c.input)).toBe(c.expected);
});
}
});
The name column is mandatory: each row must produce an individually addressable test so failures point to the specific case, not "row 3 of 6". Rows that need a different assertion (e.g., the negative-input case throws) stay as separate tests outside the table.
For Gate 2 (Integration) and Gate 3 (Component/E2E), choose dependencies deliberately. The goal is maximum realism that still runs deterministically in CI.
| Dependency style | Use when | Avoid when | Notes | |------------------|----------|------------|-------| | Real infra via Testcontainers | DB/Redis/Kafka/Browser, dev needs real driver behavior, hermetic CI required | Cold-start budget < 1s, no Docker available | Default for integration tests on Postgres / Redis / Kafka / Localstack | | In-memory fake | Owned interface, semantics are simple (key-value, list), test speed critical | Fake diverges from real — silent bugs at integration boundary | Acceptable for repository ports in hexagonal architectures, IF the port has its own contract test against real infra | | Mock (test double) | Single collaborator with pure interface; test focuses on protocol (was X called with Y) | You're mocking >2 collaborators or mocking data structures (anti-pattern: incomplete mocks) | Mocks are tools to isolate, not things to test | | Stubbed HTTP | Calling external SaaS where Testcontainers / Localstack option doesn't exist | When Pact / CDC is needed (use contract tests instead) | nock (Node), responses (Python), WireMock (JVM) | | Real external service | Smoke test in staging only | Unit / integration / CI — always non-deterministic | Reserve for smoke tests against staging |
Tradeoff summary: Testcontainers > in-memory fake > mock, but cost goes the same direction. Pick the cheapest level that doesn't lie about the boundary's behavior.
Explicit "don't bother" rules. Skipping these is not laziness — it is risk-adjusted ROI per ISO/IEC/IEEE 29119 risk-based testing and Risk-Based Testing.
| Skip | Rule |
|------|------|
| No e2e for internal helpers | If artifact has no UI surface and no user-facing path, skip e2e. Unit + integration is sufficient. |
| No contract test for bound by deploy consumer API | If only one client consumes the API and they deploy together, contract testing adds maintenance with no decoupling benefit. |
| No property-based on small finite domains | If input space is enum {A, B, C}, EP + BVA already covers it; property-based adds infra without finding more bugs. |
| No integration test for pure functions | Adding a Postgres container to test a formatCurrency helper is waste. Unit only. |
| No component test for static markup | If the component has no state, no events, no conditional rendering, a snapshot is enough — or skip entirely. |
| No unit test for declarative wiring | DI bindings, route registration, schema declarations: assert at integration level (does the route serve the right handler) instead. |
| No e2e for things integration covers reliably | Per Google e2e principles: the smaller the test you can use to cover a behavior, the better. e2e is the exception, not the default. |
| No tests for spike/throwaway code | Per Beck TDD: if the artifact will be deleted within hours, document the exception with the human partner. Then write tests on the kept version. |
| No "and" tests | If a test name contains "and", split it into separate tests (one assertion per behavior). |
Every test strategy MUST be expressed as the YAML block below. Field ordering inside each list entry is load-bearing — judges and downstream tools parse the first key as the critical one (rationale / reason / why), and the second key as the categorical one (type / what).
test_strategy:
artifact: "<path or short identifier>"
rationale: "Why this test strategy is being applied to this artifact (specific, evidence-based)"
criticality: "NONE | LOW | MEDIUM | MEDIUM-HIGH | HIGH"
selected_types:
- rationale: "Why this type is being applied to this artifact (specific, evidence-based)"
type: "unit | integration | component | e2e | smoke | contract | property-based"
size: "small | medium | large | enormous"
framework: "vitest | jest | pytest | go test | JUnit | playwright | cypress | pact | hypothesis | ..."
dependencies:
- "List of dependencies: real Postgres via Testcontainers, in-memory fake, mocked HTTP via nock, etc."
gate: "Gate N (the gate that triggered this selection)"
rejected_types:
- reason: "Why this type does NOT apply to this artifact (cite Strategic Skip Heuristic or gate that did not trigger)"
type: "unit | integration | component | e2e | smoke | contract | property-based"
deliberately_skipped:
- why: "Cost / risk justification for skipping despite a partial signal"
what: "A specific category of test cases being skipped (e.g., 'browser compatibility on IE11', 'load testing beyond 100 RPS')"
test_strategy:
artifact: "POST /users (user registration endpoint)"
rationale: "User registration is a critical user-facing path; can be used by web and mobile apps independently of each other."
criticality: "MEDIUM-HIGH"
selected_types:
- rationale: "Endpoint contains validation logic (email format, password rules, uniqueness) — Gate 1 ON for branch coverage"
type: "unit"
size: "small"
framework: "vitest"
dependencies: ["in-memory user repository fake"]
gate: "Gate 1"
- rationale: "Endpoint writes to Postgres and emits user.created event to Kafka — Gate 2 ON, real boundary behavior matters"
type: "integration"
size: "medium"
framework: "vitest + supertest + Testcontainers"
dependencies: ["Postgres 15 via Testcontainers", "Kafka via Testcontainers"]
gate: "Gate 2"
- rationale: "Consumed by mobile app and web app on independent deploy cadences — Gate 4 ON, prevents drift"
type: "contract"
size: "medium"
framework: "Pact"
dependencies: ["Pact broker"]
gate: "Gate 4"
rejected_types:
- reason: "No UI surface in this artifact — Gate 3 OFF"
type: "component"
- reason: "No UI surface — Gate 3 OFF; e2e covered by web/mobile apps separately"
type: "e2e"
- reason: "Input domain (email, password) is large but invariants are well-covered by EP+BVA at unit level — property-based ROI is low at MEDIUM-HIGH criticality, only triggers Gate 6 partially"
type: "property-based"
deliberately_skipped:
- why: "Project does not have post-deploy probe pipeline yet; smoke would be no-op"
what: "Smoke test for /users after deploy"
- why: "Non-functional load testing is out of scope for this task; tracked separately in performance backlog"
what: "Load test verifying p99 < 200ms at 1000 RPS"
Field ordering checklist (judges check this verbatim):
test_strategy: artifact BEFORE rationale BEFORE criticality.selected_types[*]: rationale BEFORE type BEFORE size BEFORE framework BEFORE dependencies BEFORE gate.rejected_types[*]: reason BEFORE type.deliberately_skipped[*]: why BEFORE what.After the matrix, produce a flat markdown bullet list of test cases to be implemented. This is separate from the YAML matrix because:
## Test Cases to Cover
### AC-N: [criterion title]
- [type] description
- [type] description
### AC-N: [criterion title]
- [type] description
- [type] description
Where:
type matches one of selected_types[*].type from the matrixdescription follows AAA / Given-When-Then (Dan North BDD) shape — see Bill Wake AAA (2001)AC-N references the acceptance criterion the case verifies (omit if non-AC-bound, e.g., infrastructure smoke)## Test Cases to Cover
### AC-1: Discount returns the correct percentage based on the total
- [unit] discount returns 0% when total = 0 [EP partition: below threshold]
- [unit] discount returns 0% when total = 99 [BVA: B-1 at boundary 100]
- [unit] discount returns 5% when total = 100 [BVA: B at boundary 100]
- [unit] discount returns 5% when total = 101 [BVA: B+1 at boundary 100]
### AC-2: Discount fails when total is invalid
- [unit] discount throws when total = -1 [EP partition: invalid]
### AC-3: /orders saves the order to the database
- [integration] POST /orders persists order to Postgres and returns 201 with order id
### AC-4: /orders rejects duplicate idempotency key
- [integration] POST /orders rejects duplicate idempotency key with 409
### AC-5: /orders/:id returns order by id
- [contract] GET /orders/:id returns schema matching mobile-app pact
These 14 sources back every gate and rule above. When in doubt, consult the source linked at that gate.
Each example shows:
test_strategy YAML following the schemaTest Cases to Cover listformatCurrency(amount: number, code: string): stringArtifact
function formatCurrency(amount: number, code: string): string;
// e.g. formatCurrency(1234.5, "USD") -> "$1,234.50"
// formatCurrency(1234.5, "EUR") -> "€1.234,50"
Acceptance criteria:
$ prefix, comma thousands, period decimal, two decimal places.€ prefix, period thousands, comma decimal, two decimal places.Error("Unknown currency code") for unsupported codes.amount = 0 formats as "$0.00" / "€0,00".Criticality: LOW (helper used in display only, no money movement here).
Gate Walkthrough
| Gate | Decision | Reason | |------|----------|--------| | 0 Skip | OFF | Has logic | | 1 Unit | ON | Pure logic with branches per currency code — Test Pyramid base | | 2 Integration | OFF | No I/O, no boundary — Skip Heuristic: no integration for pure functions | | 3 Component/E2E | OFF | No UI surface | | 4 Contract | OFF | Not a public API | | 5 Smoke | OFF | Not deployable | | 6 Property-Based | ON (partial) | Numeric input is unbounded, but invariants exist (round-trip via parse, monotonicity in amount) — Hypothesis. Promote at MEDIUM-HIGH; here LOW criticality means we apply it sparingly (1-2 properties) |
test_strategy YAML
test_strategy:
artifact: "src/util/formatCurrency.ts"
rationale: "Pure helper function used in display only; no money movement here."
criticality: "LOW"
selected_types:
- rationale: "Pure logic with currency-specific branches and number formatting; EP+BVA on amount, decision table on currency code"
type: "unit"
size: "small"
framework: "vitest"
dependencies: []
gate: "Gate 1"
- rationale: "Amount domain is unbounded floats; invariant 'parseCurrency(formatCurrency(x, c)) ~= x' is stable; sparingly applied (1-2 properties) at LOW criticality"
type: "property-based"
size: "small"
framework: "fast-check"
dependencies: []
gate: "Gate 6"
rejected_types:
- reason: "No I/O, no boundary, no collaborators - Gate 2 OFF"
type: "integration"
- reason: "No UI surface - Gate 3 OFF"
type: "component"
- reason: "No UI surface - Gate 3 OFF"
type: "e2e"
- reason: "Internal helper, not consumed across deploys - Gate 4 OFF"
type: "contract"
- reason: "Library helper, no deploy pipeline target - Gate 5 OFF"
type: "smoke"
deliberately_skipped:
- why: "Locale list is finite (USD, EUR); exhaustive enumeration via decision table is sufficient and more maintainable than i18n property tests"
what: "Property-based fuzzing of currency code beyond known list"
Test Cases to Cover
### AC-1: USD output uses `$` prefix, comma thousands, period decimal, two decimal places.
- [unit] formatCurrency(1234.5, "USD") returns "$1,234.50" [EP: typical USD]
- [unit] formatCurrency(0.01, "USD") returns "$0.01" [BVA: B+1 smallest non-zero]
- [unit] formatCurrency(-0.01, "USD") returns "-$0.01" [BVA: B-1 negative side]
### AC-2: EUR output uses `€` prefix, period thousands, comma decimal, two decimal places.
- [unit] formatCurrency(1234.5, "EUR") returns "€1.234,50" [EP: typical EUR]
- [property-based] for any non-NaN finite x in [-1e9, 1e9] and code in {USD, EUR}: parseCurrency(formatCurrency(x, code)) is within 0.005 of x [round-trip invariant]
### AC-3: Throws `Error("Unknown currency code")` for unsupported codes.
- [unit] formatCurrency(1, "XYZ") throws Error("Unknown currency code") [Decision table: unknown code]
### AC-4: `amount = 0` formats as `"$0.00"` / `"€0,00"`.
- [unit] formatCurrency(0, "USD") returns "$0.00" [BVA: B at amount=0]
- [unit] formatCurrency(0, "EUR") returns "€0,00" [BVA: B at amount=0 for EUR]
Why types were rejected: Helper has no boundaries (no integration), no UI (no component/e2e), is internal and library-style (no contract/smoke), and at LOW criticality the cost of additional test types far exceeds the benefit.
POST /usersArtifact
A user-registration endpoint that:
user.created event to Kafka.201 with {id, email, createdAt}.400 for invalid input, 409 for duplicate email.Consumed by: mobile app (iOS/Android) and web app on independent deploy cadences.
Acceptance criteria:
201 and persists user.400 with field-level error.400.409.user.created event.Criticality: MEDIUM-HIGH (auth surface, identity domain, multi-consumer public API).
Gate Walkthrough
| Gate | Decision | Reason |
|------|----------|--------|
| 0 Skip | OFF | Has substantial logic |
| 1 Unit | ON | Validators (email, password, age) are pure logic — Test Pyramid base |
| 2 Integration | ON | Boundary crossing: HTTP, Postgres, Kafka — Testing Trophy ROI sweet spot |
| 3 Component/E2E | OFF (here) | No UI in this artifact; UI lives in mobile + web repos and tests itself |
| 4 Contract | ON | Two distinct consumers (mobile + web) on independent deploy cadences — Pact CDC |
| 5 Smoke | ON | Deployable HTTP service; post-deploy probe of /users registration is meaningful — Google e2e |
| 6 Property-Based | OFF | Input domain (email, password, age) is constrained and well-covered by EP+BVA at unit; criticality is MEDIUM-HIGH but Gate 6 OFF on bounded inputs — Skip Heuristic |
test_strategy YAML
test_strategy:
artifact: "POST /users (user registration endpoint)"
rationale: "User registration is a critical user-facing path; can be used by web and mobile apps independently of each other."
criticality: "MEDIUM-HIGH"
selected_types:
- rationale: "Validators (email, password, age) are pure logic; EP+BVA on each field; one test per partition"
type: "unit"
size: "small"
framework: "vitest"
dependencies: ["in-memory user repository fake (for service-level unit if needed)"]
gate: "Gate 1"
- rationale: "Endpoint writes to Postgres and emits to Kafka; mocking these distorts transactional and ordering behavior - Testcontainers gives real boundary fidelity"
type: "integration"
size: "medium"
framework: "vitest + supertest + Testcontainers"
dependencies: ["Postgres 15 via Testcontainers", "Kafka via Testcontainers"]
gate: "Gate 2"
- rationale: "Public API consumed by mobile + web on independent deploy cadences; contract testing prevents schema drift breaking either consumer"
type: "contract"
size: "medium"
framework: "Pact (provider verification)"
dependencies: ["Pact broker", "consumer-published pacts from mobile and web"]
gate: "Gate 4"
- rationale: "Deployable HTTP service with a post-deploy pipeline; one minimal smoke verifies /users responds 201 in the deployed environment"
type: "smoke"
size: "large"
framework: "Playwright (1 critical path)"
dependencies: ["deployed environment URL", "test account seeding"]
gate: "Gate 5"
rejected_types:
- reason: "No UI surface in this artifact - Gate 3 OFF; mobile and web repos own their own component tests"
type: "component"
- reason: "No UI surface - Gate 3 OFF; consumer e2e lives in mobile/web repos"
type: "e2e"
- reason: "Input domain is bounded and EP+BVA at unit level covers it; property-based on this glue endpoint adds infra without finding more bugs - Gate 6 OFF"
type: "property-based"
deliberately_skipped:
- why: "Performance/load testing is out of scope here; tracked in dedicated performance backlog"
what: "Load test verifying p99 < 200ms at 1000 RPS"
- why: "Cross-region failover is owned by infrastructure team, not this endpoint"
what: "Multi-region availability test"
Test Cases to Cover
### AC-1: Valid request returns `201` and persists user.
- [unit] validateEmail accepts "[email protected]" [EP: well-formed]
- [integration] POST /users with valid body returns 201 and persists row in Postgres
- [smoke] POST /users in deployed environment returns 201 for a synthetic test account
### AC-2: Invalid email format returns `400` with field-level error.
- [unit] validateEmail rejects "alice@" [EP: missing domain]
- [unit] validateEmail rejects "" [BVA: empty boundary]
- [integration] POST /users with invalid email returns 400 and does NOT persist
### AC-3: Password not meeting policy returns `400`.
- [unit] validatePassword rejects 7-char password [BVA: B-1 at min length 8]
- [unit] validatePassword accepts 8-char password meeting policy [BVA: B at min length]
- [unit] validatePassword accepts 9-char password [BVA: B+1]
- [unit] validateAge rejects 12 [BVA: B-1 at boundary 13]
- [unit] validateAge accepts 13 [BVA: B at boundary 13]
### AC-4: Duplicate email returns `409`.
- [integration] POST /users with duplicate email returns 409 and does NOT emit event
### AC-5: Successful registration emits exactly one `user.created` event.
- [integration] POST /users emits exactly one user.created event to Kafka on success
- [integration] POST /users transaction rolls back when Kafka publish fails [State Transition: failure path]
### AC-6: Response schema is stable for mobile + web consumers.
- [contract] Provider satisfies mobile pact: POST /users response shape matches mobile contract
- [contract] Provider satisfies web pact: POST /users response shape matches web contract
Why types were rejected: No UI surface (component/e2e belong to consumer apps), bounded input space (property-based ROI low), out-of-scope concerns (load, multi-region) deliberately skipped with rationale.
<RegistrationForm /> (web)Artifact
A React form component:
POST /users.aria-live, focus moves to first error on validation failure.Acceptance criteria:
/welcome."Enter a valid email"."Passwords must match"."This email is already registered" at form level.aria-live="polite".Criticality: MEDIUM-HIGH (registration is a critical user-facing path; accessibility is regulated in many jurisdictions).
Gate Walkthrough
| Gate | Decision | Reason |
|------|----------|--------|
| 0 Skip | OFF | Behavior + accessibility logic |
| 1 Unit | ON | Validation helpers (validateEmail, passwordsMatch, parseAge) are pure logic |
| 2 Integration | OFF (here) | The component itself does not cross a real boundary; network is mocked at fetch level. Network integration is owned by POST /users (Example B) |
| 3 Component/E2E | ON (component) + ON (e2e for the registration path) | UI surface, criticality MEDIUM-HIGH, user-facing critical path — Test Pyramid top + Follow the User |
| 4 Contract | OFF | UI consumes API; provider-side contract tests live in Example B |
| 5 Smoke | ON | Web app is deployed; smoke for "registration page renders and submits" is meaningful |
| 6 Property-Based | OFF | Bounded form inputs; EP+BVA covers them |
test_strategy YAML
test_strategy:
artifact: "src/components/RegistrationForm.tsx"
rationale: "React form component used in web app; registration is a business-critical user-facing path."
criticality: "MEDIUM-HIGH"
selected_types:
- rationale: "Validation helpers (validateEmail, passwordsMatch, parseAge) are pure logic; EP+BVA per field"
type: "unit"
size: "small"
framework: "vitest"
dependencies: []
gate: "Gate 1"
- rationale: "UI rendering + interaction within a single component; network mocked at fetch level - tests focus on user-facing behavior per Follow the User"
type: "component"
size: "small"
framework: "vitest + React Testing Library"
dependencies: ["happy-dom", "msw (mock service worker) for fetch"]
gate: "Gate 3"
- rationale: "Registration is a critical user-facing path; one e2e covers the full happy path with real backend (Testcontainers-backed)"
type: "e2e"
size: "large"
framework: "Playwright"
dependencies: ["app server running locally", "Postgres via Testcontainers", "Kafka via Testcontainers"]
gate: "Gate 3"
- rationale: "Web app deploys to staging/prod; smoke verifies /register page loads and form submits in deployed env"
type: "smoke"
size: "large"
framework: "Playwright (1 critical path)"
dependencies: ["deployed environment URL", "test account seeding"]
gate: "Gate 5"
rejected_types:
- reason: "Component does not own a real boundary; network integration is owned by POST /users (provider) - Gate 2 OFF for this artifact"
type: "integration"
- reason: "UI consumes the API; provider contract tests live with the provider (POST /users) - Gate 4 OFF for the consumer"
type: "contract"
- reason: "Bounded input space; EP+BVA at unit level is sufficient - Gate 6 OFF"
type: "property-based"
deliberately_skipped:
- why: "Cross-browser e2e on legacy browsers (IE11) is out of support per project browser matrix"
what: "Browser compatibility e2e on IE11 / Edge Legacy"
- why: "Visual regression (pixel diff) is owned by a separate Storybook chromatic pipeline"
what: "Pixel-level visual regression assertions"
Test Cases to Cover
### AC-1: User can submit a valid form and is navigated to `/welcome`.
- [unit] validateEmail accepts "[email protected]" [EP: well-formed]
- [unit] parseAge rejects 12 [BVA: B-1 at boundary 13]
- [unit] parseAge accepts 13 [BVA: B at boundary 13]
- [e2e] user fills valid form, submits, and lands on /welcome page
- [smoke] /register page loads and form submits in deployed environment
### AC-2: Invalid email shows inline `"Enter a valid email"`.
- [unit] validateEmail rejects "" [BVA: empty boundary]
- [unit] validateEmail rejects "alice@" [EP: missing domain]
- [component] entering invalid email and blurring shows "Enter a valid email" inline
### AC-3: Mismatched passwords show inline `"Passwords must match"`.
- [unit] passwordsMatch returns true when both equal "Abcd1234"
- [unit] passwordsMatch returns false when one is "" [BVA: empty]
- [component] entering mismatched passwords shows "Passwords must match" inline
### AC-4: Submit is disabled while request is in flight.
- [component] submit is disabled when password and confirmPassword differ
- [component] submit click disables button while request is pending [State Transition: idle -> pending]
### AC-5: 409 response from server shows `"This email is already registered"` at form level.
- [component] 409 response shows form-level "This email is already registered"
### AC-6: Form is keyboard navigable; focus moves to first error on validation failure.
- [component] validation failure moves focus to first error field [a11y]
### AC-7: All inputs have programmatic labels; errors are announced via `aria-live="polite"`.
- [component] form renders email, password, confirmPassword, age, submit [happy path render]
- [component] all inputs have programmatic labels and errors live in aria-live="polite" region [a11y]
Why types were rejected: This artifact is a UI consumer — its real boundary is the API, which is tested as integration in Example B (provider side). Property-based testing is not justified for bounded UI input handling. Cross-browser legacy and visual-regression are out of scope and explicitly skipped with rationale.
Before declaring a strategy complete, the loading verify:
selected_types[*] order is rationale -> type -> size -> framework -> dependencies -> gate.rejected_types[*] order is reason -> type.deliberately_skipped[*] order is why -> what.B-1, B, B+1 for each numeric boundary.If any check fails, revise the strategy before delivering.
Mutation testing and other coverage-analysis methods (used after tests are written to assess test-suite quality) are documented in the companion test-coverage skill.
development
Use after writing tests to assess coverage quality across structural, mutation, requirements, and API/integration dimensions; organized knowledge for choosing and interpreting coverage analyses.
testing
Refine, parallelize, and verify a draft task specification into a fully planned implementation-ready task
data-ai
Implement a task with automated LLM-as-Judge verification per step
development
Use when creating or developing, before writing code or implementation plans - refines rough ideas into fully-formed designs through collaborative questioning, alternative exploration, and incremental validation. Don't use during clear 'mechanical' processes