claude/test-writer/skills/test-writer/SKILL.md
Write tests that verify behavior (not implementation), use table-driven/parameterized patterns, and minimize mocking. Triggers when asked to write tests, add test coverage, or create test files. Also triggers when reviewing existing tests for quality.
Write tests that survive refactoring, catch real bugs, and don't waste maintenance effort.
Philosophy: Test what the code does, not how it does it. If you refactor internals and tests break — the tests are wrong, not the code.
Parse from $ARGUMENTS:
- `--review`: review existing tests for anti-patterns instead of writing new ones
- `--lang`: override language detection (go, python, ts, java, rust)

If no arguments: ask what to test.
- `--lang` flag

Behavior identification checklist:
Read the knowledge base before writing:
`cat "$(dirname "$0")/references/testing-principles.md" 2>/dev/null || cat references/testing-principles.md`
Use table-driven when:
Use individual tests when:
Mock only external boundaries:
- `time.Now()`

Do not mock:
Preference hierarchy (try in order):
If you need >2 mocks, stop and reconsider — the code may need restructuring, not more mocks.
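As a minimal sketch of the boundary rule above, the Python example below treats the clock as an external boundary and replaces it with a hand-written fake rather than a mocking library. `FakeClock`, `Session`, and the 30-minute timeout are hypothetical names invented for illustration; they do not come from this skill.

```python
from datetime import datetime, timedelta


class FakeClock:
    """Hand-written fake for the clock boundary (hypothetical helper)."""

    def __init__(self, now):
        self._now = now

    def now(self):
        return self._now

    def advance(self, delta):
        self._now += delta


class Session:
    """Hypothetical production code: expires 30 minutes after creation."""

    TIMEOUT = timedelta(minutes=30)

    def __init__(self, clock):
        self._clock = clock
        self._created = clock.now()

    def is_expired(self):
        return self._clock.now() - self._created >= self.TIMEOUT


def test_session_expires_after_thirty_minutes():
    # Arrange
    clock = FakeClock(datetime(2024, 1, 1, 12, 0, 0))
    session = Session(clock)

    # Act
    clock.advance(timedelta(minutes=30))

    # Assert
    assert session.is_expired() is True
```

The test needs no mocking framework and no `verify()` call: it controls the only external input (time) and asserts on the observable outcome.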
Every test follows AAA with blank line separation:
<example>
```
// Arrange: set up test data and preconditions

// Act: execute the single behavior under test

// Assert: verify the expected outcome
```
</example>
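One way the AAA layout might look in a concrete test, sketched in Python. `Account` and `withdraw` are hypothetical stand-ins for real production code.

```python
class Account:
    """Hypothetical class under test."""

    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        self.balance -= amount


def test_withdraw_reduces_balance_by_amount():
    # Arrange
    account = Account(balance=100)

    # Act
    account.withdraw(30)

    # Assert
    assert account.balance == 70
```

The blank lines make the three phases visible at a glance, and the Act phase contains exactly one call.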
### Naming Convention
Test names describe behavior, not method names:
<example>
- `TestTransferFunds_RejectsInsufficientBalance` (Go)
- `test_rejects_withdrawal_when_balance_insufficient` (Python)
- `it("rejects withdrawal when balance is insufficient")` (JS/TS)
</example>
**Format:** `[action]_[scenario]_[expected outcome]` or `should_[behavior]_when_[condition]`
### Table-Driven Patterns by Language
Read [references/language-patterns.md](references/language-patterns.md) for idiomatic table-driven test patterns in Go, Python, TypeScript, Java, and Rust.
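For orientation only, here is a minimal table-driven sketch in Python using the standard library's `unittest.subTest`. It is not taken from `references/language-patterns.md`, and `slugify` is a hypothetical function under test.

```python
import unittest


def slugify(title):
    """Hypothetical function under test."""
    return "-".join(title.lower().split())


class SlugifyTest(unittest.TestCase):
    CASES = [
        # (case name, input, expected literal)
        ("lowercases words", "Hello World", "hello-world"),
        ("collapses repeated spaces", "a   b", "a-b"),
        ("trims surrounding whitespace", "  padded  ", "padded"),
        ("returns empty for empty input", "", ""),
    ]

    def test_slugify_normalizes_titles(self):
        for name, title, want in self.CASES:
            # subTest reports each failing row by its descriptive name
            with self.subTest(name):
                self.assertEqual(slugify(title), want)
```

Each row has a descriptive name and a concrete expected value, and the loop body holds a single assertion with no branching.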
### Assertions
- **Assert outcomes** — return values, state changes, observable side-effects
- **Never assert interactions** — don't verify internal method call order
- **Use concrete literals** — `want: "Hello, Alice"` not `want: fmt.Sprintf("Hello, %s", name)`
- **Multiple assertions OK** if they verify facets of the same behavior
- **No logic in assertions** — no string concatenation, no computation, no conditionals
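The "concrete literals" rule can be illustrated with a short Python contrast. `greet` is a hypothetical function under test, and the second test is shown only as the pattern to avoid.

```python
def greet(name):
    """Hypothetical function under test."""
    return f"Hello, {name}"


def test_greet_addresses_user_by_name():
    # Preferred: the expected value is a literal a reader can check by eye.
    assert greet("Alice") == "Hello, Alice"


def test_greet_with_computed_expectation():
    # Anti-pattern, for contrast only: the expectation repeats the
    # production formatting, so a shared mistake would go unnoticed.
    name = "Alice"
    assert greet(name) == f"Hello, {name}"
```

Both tests pass today, but only the first states the requirement independently of how `greet` builds its result.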
### Edge Cases Checklist
Always consider:
- `null`/`nil`/`undefined` inputs
- Empty string, empty slice/array, empty map
- Boundary values (0, -1, max int, min int)
- Unicode, emoji, special characters in strings
- Duplicate entries where uniqueness expected
- Concurrent access if applicable
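Several items from this checklist can be covered as rows in one table. The sketch below is in Python; `unique_names` is a hypothetical function whose assumed contract is "drop empty entries and duplicates, keep first-seen order".

```python
def unique_names(names):
    """Hypothetical function: drop None/empty entries and duplicates, keep order."""
    seen = set()
    result = []
    for name in names or []:
        if name and name not in seen:
            seen.add(name)
            result.append(name)
    return result


EDGE_CASES = [
    # (case name, input, expected literal)
    ("none input", None, []),
    ("empty list", [], []),
    ("empty string entry", ["", "a"], ["a"]),
    ("none entry", [None, "a"], ["a"]),
    ("duplicate entries", ["a", "b", "a"], ["a", "b"]),
    ("unicode and emoji", ["Zoë", "🙂", "Zoë"], ["Zoë", "🙂"]),
]


def test_unique_names_handles_edge_cases():
    for name, given, want in EDGE_CASES:
        # The case name is the assertion message, so a failure names its row.
        assert unique_names(given) == want, name
```

Boundary numbers and concurrent access do not apply to this particular function, which is why they are absent from the table.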
## Phase 4: Quality Check
Before finishing, verify each test against this checklist:
### Behavior Tests (must pass ALL)
- [ ] Test name describes a behavior/requirement, not a method name
- [ ] Assertions check outcomes (state, return values), not interactions
- [ ] Test would survive internal refactoring without changes
- [ ] No `verify()` on internal method calls
- [ ] Can explain what this tests without reading production code
### Table Quality (if table-driven)
- [ ] Every case has a descriptive name (not "case 1")
- [ ] No conditional logic in the test loop
- [ ] Expected values are concrete literals, not computed
- [ ] One table = one behavior (not mixing validation + formatting + error handling)
- [ ] Table struct has <=8 fields (otherwise restructure)
### Mock Discipline
- [ ] Only external boundaries are mocked (DB, HTTP, clock, filesystem)
- [ ] No internal collaborators mocked
- [ ] No data structures/value objects mocked
- [ ] <=2 mocks per test (if more: reconsider design)
- [ ] Using real objects or fakes where possible
### General Quality
- [ ] AAA structure with blank line separation
- [ ] No logic in test code (no if/for/switch)
- [ ] Each test is independent — runs in any order
- [ ] No flakiness sources (time, randomness, network)
- [ ] Error paths tested, not just happy path
## Phase 5: Review Mode (--review)
When `--review` flag is set, analyze existing tests for anti-patterns:
### Anti-Pattern Detection
Use Grep to scan for these smells and report with file:line references:
1. **Change detectors** — tests that mirror implementation structure, verify internal call order
2. **Mock explosion** — tests with 3+ mocks, especially mocking internal collaborators
3. **Missing table opportunities** — 3+ tests with identical structure differing only in data
4. **Obscure tests** — hard to understand what's being tested (magic numbers, unclear names)
5. **Conditional test logic** — if/switch inside test methods
6. **General fixture** — shared setup with fields most tests don't use
7. **Fragile tests** — coupled to implementation (private field access, internal API calls)
8. **Missing edge cases** — no error path testing, no boundary values
9. **Computed expected values** — expected values derived from same logic as production code
10. **Interaction verification** — `verify()`/`assert_called_with()` on non-boundary dependencies
### Review Output Format
<example>
```markdown
## Test Review: [file]
### Critical (must fix)
- **[anti-pattern]** at line N: [explanation + fix suggestion]
### Improvement (should fix)
- **[anti-pattern]** at line N: [explanation + fix suggestion]
### Opportunities
- Lines N-M: could consolidate into table-driven test
- Missing coverage: [behavior not tested]
```
</example>
`verify(mock.someMethod())` on an internal dependency, stop