nWave/skills/nw-tdd-review-enforcement/SKILL.md
Test design mandate enforcement, test budget validation, TDD phase validation (3-phase canon per ADR-025), and external validity checks for the software crafter reviewer
npx skillsauth add nwave-ai/nwave nw-tdd-review-enforcementInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Domain knowledge for reviewing TDD implementations against 5 test design mandates, test budget, TDD phase compliance (3-phase canon per ADR-025), and external validity.
All assertions validate observable outcomes, never internal structure.
Observable: return values from driving ports | state changes via queries | side effects at driven port boundaries | exceptions from public API | business invariants
Violations: asserting private fields | verifying internal method call order | inspecting intermediate calculations | checking internal class instantiation
Severity: Blocker. Rewrite to assert observable outcomes only.
Zero unit tests of domain entities/value objects/services directly. Test indirectly through application service (driving port) tests.
Violations: imports domain entity (Order, Customer) | instantiates value object (Money, Email) | invokes domain service method
Exception: complex standalone algorithm with stable public interface (rare). Severity: Blocker. Delete domain tests, add application service test.
All unit tests enter through driving ports (application services, controllers, CLI handlers, event handlers). Never internal classes.
Detection: grep for internal imports (from domain.entity, from internal.validator).
Severity: Blocker. Rewrite through driving port.
Adapters have integration tests with real infrastructure (testcontainers, test servers). No mocked unit tests.
Violations: mocking IDbConnection | mocking SMTP client | stubs instead of real infrastructure Acceptable: in-memory implementations if behavior-complete. Severity: Blocker. Convert to integration test.
Input variations of same behavior use parametrized tests, not duplicates.
Violations: test_valid_email_1/test_valid_email_2 | copy-pasted tests with only inputs changed Severity: High. Consolidate into @pytest.mark.parametrize.
Formula: max_unit_tests = 2 x number_of_distinct_behaviors
A behavior = ONE observable outcome from a driving port action. Edge cases of same behavior = ONE (use parametrized).
One behavior: happy path for one operation | error handling for one error type | validation for one rule Not a behavior: testing internal class | same behavior different inputs | testing getters/setters | testing framework code
budget = 2 x countTEST BUDGET VALIDATION: FAILED
Acceptance Criteria Analysis:
- "User can register with valid email" = 1 behavior
- "Invalid email format rejected" = 1 behavior
- "Duplicate email rejected" = 1 behavior
Budget: 3 behaviors x 2 = 6 unit tests maximum
Actual: 14 unit tests
Violations:
1. Budget exceeded: 14 > 6 (Blocker)
2. Internal class testing: test_user_validator.py tests UserValidator directly (Blocker)
3. Parameterization missing: 5 separate tests for valid email variations
Required: delete internal tests, consolidate via parametrize, re-submit
Verify TDD phases in execution-log.json. Current canonical (ADR-025, 2026-05-07): 3-phase cycle — RED → GREEN → COMMIT. RED absorbs the legacy PREPARE / RED_ACCEPTANCE / RED_UNIT phases — it unskips the AT scaffold authored by DISTILL, verifies fail-for-right-reason, and writes PBT unit tests ONLY when the AT requires them to reach GREEN.
Legacy 5-phase contract (ADR-024 era): PREPARE / RED_ACCEPTANCE / RED_UNIT / GREEN / COMMIT — preserved for audit-log replay of pre-2026-05-07 commits. Existing execution-log.json files using the 5-phase contract remain valid; the gates below apply equivalently to merged phases under the 3-phase canon.
| Gate | Description | Phase (3-phase canon) | Legacy phase (5-phase) | |------|-------------|-----------------------|------------------------| | G1 | Exactly one acceptance test active | RED | PREPARE | | G2 | Acceptance test fails for valid reason | RED | RED_ACCEPTANCE | | G3 | Unit test fails on assertion (when authored) | RED | RED_UNIT | | G4 | No mocks inside hexagon | RED | RED_UNIT | | G5 | Business language in tests | GREEN | GREEN | | G6 | All tests green | GREEN | GREEN | | G7 | 100% passing before commit | COMMIT | COMMIT | | G8 | Test count within budget | RED | RED_UNIT | | G9 | No test modifications to accommodate implementation | GREEN | GREEN |
Gates G2, G4, G7, G8, G9 are Blockers if not verified.
Note: Review/refactoring quality verified at deliver-level Phase 4 (Adversarial Review).
When is_walking_skeleton: true: don't flag missing unit tests | verify exactly one E2E test | thinnest slice OK (hardcoded values) | unit-test authoring inside RED may be skipped (3-phase canon) or RED_UNIT/GREEN entries SKIPPED with "NOT_APPLICABLE: walking skeleton" (5-phase legacy logs).
Verify features are invocable through entry points, not just existing in code.
Question: "If I follow these steps, will the feature WORK or just EXIST?"
EXTERNAL VALIDITY CHECK: FAILED
Issue: All 6 acceptance tests import des.validator.TemplateValidator directly.
No test imports des.orchestrator.DESOrchestrator (the entry point).
Consequence: Tests pass, coverage is 100%, but TemplateValidator is never
called in production because DESOrchestrator doesn't use it.
Required: update acceptance test to invoke through entry point, wire component.
The single worst TDD violation: modifying a test to make it pass instead of fixing the implementation. This inverts the TDD feedback loop -- the test no longer protects behavior. Instant rejection, no exceptions, no conditional approval.
| Signal | How to Detect | Severity |
|--------|---------------|----------|
| Test + implementation changed in same commit | Git diff shows test file edits alongside production code edits during GREEN phase | BLOCKER |
| Assertion weakened | assertEquals(expected, actual) changed to assertNotNull(actual) or assertTrue(result) | BLOCKER |
| Expectations reduced | Test previously checked 5 fields, now checks 1-2 | BLOCKER |
| Test deleted or skipped | @skip, @pytest.mark.skip, @Disabled, xfail, entire test method removed | BLOCKER |
| Deferred fix comments | # TODO: fix later, # temporarily relaxed, # workaround, # adjusted for now in test files | BLOCKER |
| Assertion count decreased | Previous commit had N assertions, current has fewer for same test | BLOCKER |
ESCALATION_NEEDED marker in execution log)TEST MODIFICATION DETECTION: BLOCKER
File: tests/unit/test_order_service.py
Commit: abc123 (GREEN phase)
Before (RED phase):
assert result.total == Decimal("150.00")
assert result.tax == Decimal("15.00")
assert result.items == 3
assert result.status == OrderStatus.CONFIRMED
After (GREEN phase):
assert result is not None # <-- weakened from 4 specific assertions to existence check
Verdict: REJECTED. Implementation could not satisfy the original assertions.
The crafter modified the test instead of fixing the implementation.
Required: revert test to RED-phase version, fix implementation to satisfy original assertions.
Acceptance tests pass because test fixtures implement the expected behavior directly, rather than exercising production code through the driving port. The tests verify the correct outcome from the wrong source. This is a form of Testing Theater where the entire GREEN phase is fraudulent -- no production code was changed.
| Signal | How to Detect | Severity |
|--------|---------------|----------|
| No production files in git diff | git diff --name-only after GREEN shows only test files, none of the files_to_modify entries | BLOCKER |
| Given steps create end-state | Test Given/Arrange steps construct the expected output directly instead of setting up preconditions for production code | BLOCKER |
| Fixture implements behavior | Test helper/fixture contains domain logic that should live in production code | BLOCKER |
| RED-to-GREEN without production changes | Acceptance test flips from failing to passing but git diff --stat shows zero production file changes | BLOCKER |
git diff --name-only and compare against files_to_modify from the roadmap stepfiles_to_modify MUST appear in the diff (excluding test files). If any production file is missing: BLOCKERfiles_to_modify lists only test files)FIXTURE THEATER DETECTION: BLOCKER
Step: 02-01 (implement gitignore support in DES plugin)
Files to modify: [src/des/adapters/.../des_plugin.py]
git diff --name-only after GREEN:
tests/des/acceptance/test_plugin_gitignore.py (fixture modified)
tests/conftest.py (helper added)
Missing from diff: src/des/adapters/.../des_plugin.py
Verdict: REJECTED. Agent modified test fixtures to produce expected state
instead of implementing production code in des_plugin.py.
The acceptance test passes because the fixture creates the expected output,
not because the driving port (DESPlugin.install()) produces it.
Required: revert fixture changes, implement production code in des_plugin.py.
When a crafter gets stuck, the correct action is to escalate -- not to silently weaken tests. The reviewer verifies proper escalation protocol was followed.
escalation_needed: true with reason if the crafter hit a wallpo_approved: true or requirement_change: {ticket} in execution log)| Failure | Detection | Severity |
|---------|-----------|----------|
| Silent test modification | No escalation marker + test weakened | BLOCKER |
| Insufficient attempts | Fewer than 3 GREEN attempts before test change | BLOCKER |
| Missing PO approval | Test changed for "requirement change" without PO reference | BLOCKER |
| Proper escalation | ESCALATION_NEEDED marker present, 3+ attempts logged | PASS (reviewer verifies test change validity) |
All required phases present per schema version (3-phase canon: RED/GREEN/COMMIT; legacy 5-phase: PREPARE/RED_ACCEPTANCE/RED_UNIT/GREEN/COMMIT), all PASS, all gates satisfied (G1-G9), zero defects, budget met, no internal class tests, no test modifications, no testing theater.
Missing phases | any FAIL | any defect | budget exceeded | internal class tested | test modified to accommodate implementation (G9) | testing theater detected | silent test modification without escalation. Zero tolerance.
2 review iterations | persistent gate failures | unresolved architectural violations | crafter properly escalated (ESCALATION_NEEDED marker present with 3+ attempts). Escalate to tech lead.
testing
Acceptance test creation methodology for the DISTILL wave. Domain knowledge for the acceptance designer agent: port-to-port principle, prior wave reading, wave-decision reconciliation, graceful degradation, and document back-propagation.
testing
Methodology for minimizing test count while maximizing behavioral coverage - behavior definition, anti-pattern catalog, consolidation patterns, stopping criterion, coverage-preserving validation
testing
Methodology for minimizing test count while maximizing behavioral coverage - behavior definition, anti-pattern catalog, consolidation patterns, stopping criterion, coverage-preserving validation
development
Design mandates for acceptance tests - hexagonal boundary, business language abstraction, user journey completeness, pure function extraction, 3 Pillars (domain language / chained narrative / production composition), and the layered ATD discipline (Universe-bound assertion, layer-dependent PBT mode, two-tier acceptance, example-based sad paths)