skills/use-workflow-tdd-cycle/SKILL.md
TDD with RGRC cycle and Baby Steps.
Install this skill globally with one command: `npx skillsauth add thkt/dotclaude use-workflow-tdd-cycle`. Works with Claude Code, Cursor, and Windsurf.
Test behavior via public API. Mock only at system boundaries.
| Trigger | Variant | Reference |
| --------------------------------- | --------------- | ------------------------------------------------ |
| spec.md / new feature (/code) | Feature-driven | ${CLAUDE_SKILL_DIR}/references/feature-driven.md |
| Bug report / regression (/fix) | Bug-driven | ${CLAUDE_SKILL_DIR}/references/bug-driven.md |
| Coverage gap in existing codebase | Coverage-driven | Active tests, no skip. Reuse RGRC below |

| Priority   | What                                                 |
| ---------- | ---------------------------------------------------- |
| Must       | Business logic, services, critical paths, edge cases |
| Contextual | Complex utils, custom hooks, transformations         |
| Skip       | Simple accessors, UI layout, external lib behavior   |


| Context                  | Reason                            |
| ------------------------ | --------------------------------- |
| Prototypes (throwaway)   | Discard likely, cost > benefit    |
| External API integration | Mock the API, not the integration |
| Simple one-off scripts   | Shorter than the test would be    |
| UI experiments           | Visual first, extract logic later |


| Aspect     | Feature-Driven              | Bug-Driven            |
| ---------- | --------------------------- | --------------------- |
| Trigger    | Specification               | Bug report            |
| Test state | Skip state initially        | Active                |
| Test count | All tests generated upfront | 1 main + edge cases   |
| Activation | User-controlled             | Immediate             |
| Focus      | Feature completion          | Regression prevention |


| Principle                    | Rule                                                      |
| ---------------------------- | --------------------------------------------------------- |
| Behavior over implementation | Test public API output, not internal calls                |
| State verification           | Assert on result values, not "was X called"               |
| Real objects first           | Use real dependencies. Mock only external I/O             |
| Black-box perspective        | Treat the unit as a black box via its public interface    |
| Sociable tests               | Let collaborators participate. Isolate only at boundaries |

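
The principles above can be illustrated with a small state-verification sketch. `Cart`, `Item`, and the prices are hypothetical names invented for this example, and the inline `throw` stands in for a framework assertion such as `expect(total).toBe(700)`.

```typescript
// Hypothetical domain object used only for illustration.
type Item = { name: string; priceCents: number; qty: number };

class Cart {
  private items: Item[] = [];

  add(item: Item): void {
    this.items.push(item);
  }

  // Public API: the only thing the test observes.
  totalCents(): number {
    return this.items.reduce((sum, i) => sum + i.priceCents * i.qty, 0);
  }
}

// State verification: drive the public API, then assert on the resulting value.
// Nothing here checks which internal methods were called or how often.
const cart = new Cart();
cart.add({ name: "pen", priceCents: 150, qty: 2 });
cart.add({ name: "pad", priceCents: 400, qty: 1 });

const total = cart.totalCents();
if (total !== 700) {
  throw new Error(`expected 700, got ${total}`);
}
```

Because the check only touches `add` and `totalCents`, the internal storage could change from an array to a map without breaking the test.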

| Phase    | Goal         | Rule                                                                       | Common Mistake                 |
| -------- | ------------ | -------------------------------------------------------------------------- | ------------------------------ |
| Red      | Failing test | Verify failure matches the intended behavior gap, not syntax/import errors | Test passes immediately        |
| Green    | Pass test    | "You can sin" - dirty OK                                                   | Over-implementing              |
| Refactor | Refine       | Keep tests green. Shrink only while it reads easier (rules/PRINCIPLES.md)  | Changing behavior; compressing |
| Commit   | Save state   | All checks pass                                                            | Skipping checks                |

30s: Write failing test → 1min: Make pass → 10s: Run tests → 30s: Tiny refactor → 20s: Commit if green. Bugs are always in the last 2-minute change.
Stack RGRC cycles vertically per behavior. Never expand horizontally by writing all tests first and all implementations later.
Wrong (horizontal):
Red: test1, test2, test3, test4, test5
Green: impl1, impl2, impl3, impl4, impl5
Right (vertical):
Red → Green: test1 → impl1
Red → Green: test2 → impl2
...

| # | Hazard from horizontal slices                                              |
| - | -------------------------------------------------------------------------- |
| 1 | Bulk-written tests verify imagined behavior instead of real behavior       |
| 2 | Tests degrade into structural assertions (data shape, signature) only      |
| 3 | Sensitivity to behavior change drops (pass when broken, fail when correct) |
| 4 | Implementation knowledge follows test structure instead of guiding it      |

Reference: mattpocock/skills tdd SKILL.md.
When a test fails, decide whether to fix the test or the implementation.

| Judgment | Condition                 | Action                               |
| -------- | ------------------------- | ------------------------------------ |
| Impl bug | Test matches spec/FR-xxx  | Fix implementation. Don't touch test |
| Test bug | Test diverges from spec   | Fix test                             |
| Unclear  | Spec ambiguous or missing | Escalate to user                     |

For bug-driven flows (/fix), reproduction steps serve as the spec.

| Technique                | Use For               | Example                |
| ------------------------ | --------------------- | ---------------------- |
| Equivalence Partitioning | Group same behavior   | Age: <18, 18-120       |
| Boundary Value           | Test edges            | 17, 18, 120, 121       |
| Decision Table           | Multi-condition logic | isLoggedIn × isPremium |

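
A minimal sketch of how the three techniques in the table might turn into test cases. `classifyAge`, `accessLevel`, and their return labels are hypothetical. The table only gives the inputs (ages and the two flags), so the expected outputs here are assumptions made for illustration.

```typescript
// Hypothetical rule: ages 18 to 120 inclusive count as "adult".
type AgeClass = "minor" | "adult" | "invalid";

function classifyAge(age: number): AgeClass {
  if (!Number.isInteger(age) || age < 0 || age > 120) return "invalid";
  return age < 18 ? "minor" : "adult";
}

// Equivalence partitioning: one representative value per partition.
// Boundary value: the values on each side of every partition edge.
const ageCases: Array<[number, AgeClass]> = [
  [10, "minor"], // representative of <18
  [40, "adult"], // representative of 18-120
  [17, "minor"], // just below the lower edge
  [18, "adult"], // lower edge
  [120, "adult"], // upper edge
  [121, "invalid"], // just above the upper edge
];

for (const [age, expected] of ageCases) {
  const actual = classifyAge(age);
  if (actual !== expected) {
    throw new Error(`age ${age}: expected ${expected}, got ${actual}`);
  }
}

// Decision table: every combination of isLoggedIn × isPremium gets one row.
type Access = "guest" | "free" | "premium";

function accessLevel(isLoggedIn: boolean, isPremium: boolean): Access {
  if (!isLoggedIn) return "guest";
  return isPremium ? "premium" : "free";
}

const accessTable: Array<[boolean, boolean, Access]> = [
  [false, false, "guest"],
  [false, true, "guest"],
  [true, false, "free"],
  [true, true, "premium"],
];

for (const [isLoggedIn, isPremium, expected] of accessTable) {
  const actual = accessLevel(isLoggedIn, isPremium);
  if (actual !== expected) {
    throw new Error(`${isLoggedIn}/${isPremium}: expected ${expected}, got ${actual}`);
  }
}
```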
Every test must verify a specific outcome. Weak assertions alone are forbidden.

| Category           | Matchers                                                                | When acceptable                                   |
| ------------------ | ----------------------------------------------------------------------- | ------------------------------------------------- |
| Weak (existence)   | toBeTruthy, toBeDefined, toBeFalsy, toBeNull, toBeUndefined             | Only with a meaningful assertion in the same test |
| Meaningful (value) | toBe, toEqual, toStrictEqual, toMatch, toContain, toThrow, toHaveLength | Always preferred                                  |
| Meaningful (call)  | toHaveBeenCalledWith, toHaveBeenCalledTimes, toHaveReturnedWith         | When verifying side effects                       |

Bad: expect(result).toBeTruthy()
Good: expect(result).toEqual({ id: 1, name: "Alice" })
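
To see why the weak form is risky, here is a small sketch with a deliberately buggy function. `findUser` and `User` are hypothetical, and the two boolean expressions approximate what `toBeTruthy` and `toEqual` would check for this flat object.

```typescript
type User = { id: number; name: string };

// Deliberately buggy hypothetical function: it returns an empty name.
function findUser(id: number): User {
  return { id, name: "" };
}

const result = findUser(1);

// Weak check: any object is truthy, so the bug slips through.
const weakPasses = Boolean(result);

// Meaningful check: compares the actual field values, so the bug is caught.
const meaningfulPasses = result.id === 1 && result.name === "Alice";

// weakPasses is true, meaningfulPasses is false
```

The weak assertion would stay green for any non-null return value, which is why it is only acceptable alongside a value assertion in the same test.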
One test, one concept. If two tests assert the same function with the same argument pattern, merge or parameterize with test.each.
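
A sketch of what parameterizing looks like, written as a plain loop so it runs without a test framework. `clamp` and the case values are hypothetical. In Vitest or Jest the same case table would normally be passed to `test.each`.

```typescript
// Hypothetical function under test.
function clamp(value: number, min: number, max: number): number {
  return Math.min(Math.max(value, min), max);
}

// One concept (clamping to the range 0..10), many inputs.
// Each row would otherwise be a near-identical copy-pasted test.
const clampCases: Array<{ value: number; expected: number }> = [
  { value: -5, expected: 0 },
  { value: 0, expected: 0 },
  { value: 7, expected: 7 },
  { value: 10, expected: 10 },
  { value: 99, expected: 10 },
];

for (const { value, expected } of clampCases) {
  const actual = clamp(value, 0, 10);
  if (actual !== expected) {
    throw new Error(`clamp(${value}, 0, 10): expected ${expected}, got ${actual}`);
  }
}
```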
Mock at system boundaries: external APIs, databases, file system, network, non-deterministic dependencies (time, random), slow dependencies that block the 2-min cycle.

| Rule                | Threshold                        |
| ------------------- | -------------------------------- |
| Mock count per test | Must not exceed assertion count  |
| Mock scope          | External dependencies only       |
| Mock target         | Never mock the module under test |


| Anti-Pattern                | Problem                                     | Instead                                    |
| --------------------------- | ------------------------------------------- | ------------------------------------------ |
| Assert mock was called      | Tests mock behavior, not component behavior | Assert on observable output or side effect |
| Test-only production method | Pollutes production API for test access     | Extract to test utility or use public API  |
| Mock before understanding   | Hides real dependency behavior              | Understand dependency first, then mock     |
| Partial mock structure      | Missing fields cause false passes           | Mirror complete real API structure         |
| Mock overuse                | More mocks than assertions = testing wiring | Reduce mocks or add meaningful assertions  |

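
One way to follow these rules is to inject the boundary and replace only that. The sketch below assumes a hypothetical `Clock` interface and `greeting` function. Time is the non-deterministic dependency, so it is the single thing swapped for a test double.

```typescript
// Hypothetical boundary: the system clock, abstracted behind a small interface.
type Clock = { now(): Date };

// Module under test: real logic, never mocked.
function greeting(name: string, clock: Clock): string {
  const hour = clock.now().getHours();
  const part = hour < 12 ? "morning" : hour < 18 ? "afternoon" : "evening";
  return `Good ${part}, ${name}`;
}

// Test double: a fixed clock at 09:30 local time. One double, one assertion.
const fixedClock: Clock = { now: () => new Date(2024, 0, 15, 9, 30) };

const message = greeting("Alice", fixedClock);

// Assert on the observable output, not on whether clock.now() was called.
if (message !== "Good morning, Alice") {
  throw new Error(`unexpected greeting: ${message}`);
}
```

The double mirrors the complete `Clock` shape, and the test would still pass if `greeting` read the clock twice or cached it, because only the returned string is checked.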
Unit tests import only: target module + types + test infrastructure. Build test data from types or literals.
test("name", () => {
  // Arrange - Setup
  // Act - Execute
  // Assert - Verify
});
| Level | Pattern |
| ----- | ------------------------------------------------ |
| Suite | describe("[Target]", ...) |
| Group | describe("[Method]", ...) |
| Test | it("when [condition], should [expected]", ...) |
| Condition | Framework |
| ------------------ | --------- |
| vitest in deps | Vitest |
| jest in deps | Jest |
| bun as runtime | Bun test |
| No framework found | Vitest |
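
The selection table could be expressed as a small function. This is a sketch under stated assumptions: `detectFramework` and `PackageJson` are hypothetical names, rows are assumed to be checked top to bottom, "deps" is assumed to cover both `dependencies` and `devDependencies`, and the runtime is passed in rather than detected.

```typescript
type Framework = "Vitest" | "Jest" | "Bun test";

// Hypothetical shape: the subset of package.json this sketch reads.
type PackageJson = {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
};

function detectFramework(pkg: PackageJson, runtime: "node" | "bun"): Framework {
  // Assumption: a framework listed in either dependency map counts as "in deps".
  const deps = { ...pkg.dependencies, ...pkg.devDependencies };

  // Assumption: table rows are evaluated in order, first match wins.
  if ("vitest" in deps) return "Vitest";
  if ("jest" in deps) return "Jest";
  if (runtime === "bun") return "Bun test";

  // No framework found: fall back to Vitest.
  return "Vitest";
}

const chosen = detectFramework({ devDependencies: { jest: "^29.0.0" } }, "node");
// chosen is "Jest"
```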

| Topic          | File                                                      |
| -------------- | --------------------------------------------------------- |
| Feature-driven | ${CLAUDE_SKILL_DIR}/references/feature-driven.md          |
| Bug-driven     | ${CLAUDE_SKILL_DIR}/references/bug-driven.md              |
| Flaky tests    | ${CLAUDE_SKILL_DIR}/references/flaky-test-management.md   |
| Coverage       | ${CLAUDE_SKILL_DIR}/../../rules/development/THRESHOLDS.md |
