Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

smykla-skalski/test-writer

Name: test-writer
Author: smykla-skalski

claude/test-writer/skills/test-writer/SKILL.md

npx skillsauth add smykla-skalski/sai test-writer


# Test Writer

Write tests that survive refactoring, catch real bugs, and don't waste maintenance effort.

**Philosophy:** Test what the code does, not how it does it. If you refactor internals and tests break, the tests are wrong, not the code.

## Arguments

Parse from `$ARGUMENTS`:

- `target`: file path, function name, or module to test
- `--review`: review existing tests for anti-patterns instead of writing new ones
- `--lang`: override language detection (`go`, `python`, `ts`, `java`, `rust`)

If no arguments are given, ask what to test.
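The argument handling above could be sketched as follows. This is only an illustration: the function name `parse_skill_arguments` and the use of `argparse` are assumptions, not part of the skill, which leaves parsing to the agent.

```python
import argparse
import shlex

SUPPORTED_LANGS = ("go", "python", "ts", "java", "rust")


def parse_skill_arguments(raw: str) -> argparse.Namespace:
    """Split the raw $ARGUMENTS string and parse target, --review, and --lang."""
    parser = argparse.ArgumentParser(prog="test-writer", add_help=False)
    # target is optional so an empty string can fall through to "ask what to test"
    parser.add_argument("target", nargs="?", default=None)
    parser.add_argument("--review", action="store_true")
    parser.add_argument("--lang", choices=SUPPORTED_LANGS, default=None)
    return parser.parse_args(shlex.split(raw))


# Hypothetical invocation; the path is made up for illustration.
args = parse_skill_arguments("internal/billing/transfer.go --review --lang go")
print(args.target, args.review, args.lang)
```

With an empty string, `target` comes back as `None`, which is the case where the skill asks what to test.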

## Phase 1: Understand the Code

1. **Read the target**: understand what it does, its public API, and its inputs/outputs
2. **Identify behaviors**: list what the code guarantees to callers. Think in Given/When/Then:
   - "Given [state], when [action], then [outcome]"
   - Each behavior = one test (or one row in a table)
3. **Find boundaries**: identify external dependencies (DB, HTTP, filesystem, clock, randomness)
4. **Check existing tests**: use Glob to find test files, Read them, and understand what's covered
5. **Detect language**: from the file extension or the `--lang` flag

**Behavior identification checklist:**

- What does a caller/user care about? (outcomes, not internals)
- What are the success cases?
- What are the error/edge cases? (null, empty, overflow, boundary values)
- What side effects are observable? (not internal method calls)
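The language-detection step in this phase might look like the sketch below. The extension map and the name `detect_language` are hypothetical; the skill itself only names the five supported languages and the override flag.

```python
from pathlib import Path
from typing import Optional

# Hypothetical extension map; adjust to the project's actual file types.
EXTENSION_TO_LANG = {
    ".go": "go",
    ".py": "python",
    ".ts": "ts",
    ".tsx": "ts",
    ".java": "java",
    ".rs": "rust",
}


def detect_language(target: str, lang_override: Optional[str] = None) -> Optional[str]:
    """Return the --lang override if given, else infer from the file extension."""
    if lang_override is not None:
        return lang_override
    # Targets without a recognized extension (e.g. a function name) yield None.
    return EXTENSION_TO_LANG.get(Path(target).suffix.lower())
```

A `None` result signals that the language could not be inferred, for example when the target is a bare function name.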

## Phase 2: Design Test Structure

Read the knowledge base before writing:

cat "$(dirname "$0")/references/testing-principles.md" 2>/dev/null || cat references/testing-principles.md

### Decision: Table-Driven or Individual Tests?

**Use table-driven when:**

- Multiple cases share the same test logic (same arrange/act/assert shape)
- Testing input/output mapping, validation, parsing, transformation
- Cases differ only in data, not in assertion logic

**Use individual tests when:**

- Each case needs different setup/teardown
- Each case asserts different things (state vs error vs side-effect)
- Complex scenarios that would require conditionals in the test loop
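As a rough illustration of this decision, the sketch below uses the standard library's `unittest` with `subTest`. The function `parse_port` is a hypothetical stand-in for the code under test. The valid inputs share one assertion shape and go in a table, while the error case asserts something different and gets its own test.

```python
import unittest


def parse_port(text: str) -> int:
    """Hypothetical function under test: parse a TCP port number."""
    port = int(text)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port


class ParsePortTest(unittest.TestCase):
    def test_accepts_valid_port_strings(self):
        # One table = one behavior; cases differ only in data.
        cases = [
            ("lowest valid port", "1", 1),
            ("common http port", "8080", 8080),
            ("highest valid port", "65535", 65535),
        ]
        for name, text, want in cases:
            with self.subTest(name):
                self.assertEqual(parse_port(text), want)

    def test_rejects_port_above_range(self):
        # Different assertion shape (an error), so it is an individual test.
        with self.assertRaises(ValueError):
            parse_port("65536")
```

Saved as a test module, this can be run with `python -m unittest`.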

### Decision: What to Mock?

**Mock only external boundaries:**

- Database/storage calls
- HTTP/network requests
- Filesystem I/O
- System clock / `time.Now()`
- Random number generators
- Message queues, email gateways

**Do not mock:**

- Internal collaborators (classes/functions you own)
- Data structures, value objects, DTOs
- Things you can use for real (in-memory, fast, deterministic)

**Preference hierarchy** (try in order):

1. Real implementation (if fast and deterministic)
2. Fake (in-memory implementation of the same interface)
3. Stub (hardcoded return values)
4. Mock (behavior verification), the absolute last resort

If you need more than 2 mocks, stop and reconsider: the code may need restructuring, not more mocks.
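One way to read the preference hierarchy is sketched below: the system clock is an external boundary, so it is replaced, but with a small fake and not a mocking library. `FakeClock` and `Session` are hypothetical names invented for this example.

```python
from datetime import datetime, timedelta, timezone


class FakeClock:
    """In-memory stand-in for the system clock (an external boundary)."""

    def __init__(self, now: datetime):
        self._now = now

    def now(self) -> datetime:
        return self._now


class Session:
    """Hypothetical code under test; the clock is injected as a dependency."""

    def __init__(self, issued_at: datetime, ttl: timedelta, clock):
        self._expires_at = issued_at + ttl
        self._clock = clock

    def is_expired(self) -> bool:
        return self._clock.now() >= self._expires_at


def test_session_expires_after_ttl_elapses():
    issued_at = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
    clock = FakeClock(datetime(2026, 1, 1, 12, 31, tzinfo=timezone.utc))
    session = Session(issued_at, timedelta(minutes=30), clock)

    expired = session.is_expired()

    assert expired is True
```

The test asserts the outcome (`is_expired()` returns `True`) and never checks how or how often the clock was called, so the internals of `Session` can change freely.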

## Phase 3: Write Tests

### Structure: Arrange-Act-Assert

Every test follows AAA with blank line separation:

<example>
```
// Arrange: set up test data and preconditions

// Act: execute the single behavior under test

// Assert: verify the expected outcome
```
</example>
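A concrete instance of the Arrange-Act-Assert skeleton, as a minimal sketch. The `Account` class is hypothetical and exists only so the test has something to exercise.

```python
class Account:
    """Hypothetical code under test."""

    def __init__(self, balance: int):
        self.balance = balance

    def withdraw(self, amount: int) -> None:
        if amount > self.balance:
            raise ValueError("insufficient balance")
        self.balance -= amount


def test_withdrawal_reduces_balance_by_amount():
    # Arrange
    account = Account(balance=100)

    # Act
    account.withdraw(30)

    # Assert
    assert account.balance == 70
```

The blank lines separate the three steps so the single behavior under test is easy to spot.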

### Naming Convention

Test names describe behavior, not method names:

<example>
- `TestTransferFunds_RejectsInsufficientBalance` (Go)
- `test_rejects_withdrawal_when_balance_insufficient` (Python)
- `it("rejects withdrawal when balance is insufficient")` (JS/TS)
</example>

**Format:** `[action]_[scenario]_[expected outcome]` or `should_[behavior]_when_[condition]`

### Table-Driven Patterns by Language

Read [references/language-patterns.md](references/language-patterns.md) for idiomatic table-driven test patterns in Go, Python, TypeScript, Java, and Rust.

### Assertions

- **Assert outcomes** — return values, state changes, observable side-effects
- **Never assert interactions** — don't verify internal method call order
- **Use concrete literals** — `want: "Hello, Alice"` not `want: fmt.Sprintf("Hello, %s", name)`
- **Multiple assertions OK** if they verify facets of the same behavior
- **No logic in assertions** — no string concatenation, no computation, no conditionals
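The assertion rules above might be applied as in this sketch. `InMemoryOutbox` and `welcome_user` are hypothetical; the point is that the test checks a return value and an observable side effect against literals, and does not verify how the outbox was called.

```python
class InMemoryOutbox:
    """Hypothetical fake for an email gateway; records what was sent."""

    def __init__(self):
        self.sent = []

    def send(self, to: str, body: str) -> None:
        self.sent.append((to, body))


def welcome_user(name: str, email: str, outbox) -> str:
    """Hypothetical code under test."""
    greeting = f"Hello, {name}"
    outbox.send(email, greeting)
    return greeting


def test_welcome_sends_greeting_to_new_user():
    outbox = InMemoryOutbox()

    greeting = welcome_user("Alice", "alice@example.com", outbox)

    # Concrete literals; two assertions on facets of the same behavior.
    assert greeting == "Hello, Alice"
    assert outbox.sent == [("alice@example.com", "Hello, Alice")]
```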

### Edge Cases Checklist

Always consider:
- `null`/`nil`/`undefined` inputs
- Empty string, empty slice/array, empty map
- Boundary values (0, -1, max int, min int)
- Unicode, emoji, special characters in strings
- Duplicate entries where uniqueness expected
- Concurrent access if applicable
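Several items from this checklist can be covered in one table when they share an assertion shape. The sketch below assumes a hypothetical `unique_tags` function; concurrency and integer boundaries are not shown because they do not apply to it.

```python
from typing import List, Optional


def unique_tags(tags: Optional[List[str]]) -> List[str]:
    """Hypothetical function under test: drop duplicates, keep first-seen order."""
    if tags is None:
        return []
    return list(dict.fromkeys(tags))


# Each row: (case name, input, expected output as a concrete literal)
EDGE_CASES = [
    ("none input", None, []),
    ("empty list", [], []),
    ("empty string is a valid tag", [""], [""]),
    ("duplicates collapse to first occurrence", ["a", "b", "a"], ["a", "b"]),
    ("unicode and emoji are preserved", ["café", "🚀", "café"], ["café", "🚀"]),
]


def test_unique_tags_edge_cases():
    for name, tags, want in EDGE_CASES:
        assert unique_tags(tags) == want, name
```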

## Phase 4: Quality Check

Before finishing, verify each test against this checklist:

### Behavior Tests (must pass ALL)
- [ ] Test name describes a behavior/requirement, not a method name
- [ ] Assertions check outcomes (state, return values), not interactions
- [ ] Test would survive internal refactoring without changes
- [ ] No `verify()` on internal method calls
- [ ] Can explain what this tests without reading production code

### Table Quality (if table-driven)
- [ ] Every case has a descriptive name (not "case 1")
- [ ] No conditional logic in the test loop
- [ ] Expected values are concrete literals, not computed
- [ ] One table = one behavior (not mixing validation + formatting + error handling)
- [ ] Table struct has <=8 fields (otherwise restructure)

### Mock Discipline
- [ ] Only external boundaries are mocked (DB, HTTP, clock, filesystem)
- [ ] No internal collaborators mocked
- [ ] No data structures/value objects mocked
- [ ] <=2 mocks per test (if more: reconsider design)
- [ ] Using real objects or fakes where possible

### General Quality
- [ ] AAA structure with blank line separation
- [ ] No logic in test code (no if/for/switch)
- [ ] Each test is independent — runs in any order
- [ ] No flakiness sources (time, randomness, network)
- [ ] Error paths tested, not just happy path

## Phase 5: Review Mode (--review)

When `--review` flag is set, analyze existing tests for anti-patterns:

### Anti-Pattern Detection

Use Grep to scan for these smells and report with file:line references:

1. **Change detectors** — tests that mirror implementation structure, verify internal call order
2. **Mock explosion** — tests with 3+ mocks, especially mocking internal collaborators
3. **Missing table opportunities** — 3+ tests with identical structure differing only in data
4. **Obscure tests** — hard to understand what's being tested (magic numbers, unclear names)
5. **Conditional test logic** — if/switch inside test methods
6. **General fixture** — shared setup with fields most tests don't use
7. **Fragile tests** — coupled to implementation (private field access, internal API calls)
8. **Missing edge cases** — no error path testing, no boundary values
9. **Computed expected values** — expected values derived from same logic as production code
10. **Interaction verification** — `verify()`/`assert_called_with()` on non-boundary dependencies

### Review Output Format

<example>
```markdown
## Test Review: [file]

### Critical (must fix)
- **[anti-pattern]** at line N: [explanation + fix suggestion]

### Improvement (should fix)
- **[anti-pattern]** at line N: [explanation + fix suggestion]

### Opportunities
- Lines N-M: could consolidate into table-driven test
- Missing coverage: [behavior not tested]
```
</example>

## Hard Rules

- **Do not test implementation**: if you catch yourself writing `verify(mock.someMethod())` on an internal dependency, stop
- **Do not compute expected values**: hardcode them as literals
- **Do not add conditional logic to tests**: split into separate tests/cases
- **Do not mock internal collaborators**: use real objects
- **Do not name tests after methods**: name them after behaviors
- **Table-driven by default** when 3+ cases share the same assertion shape
- **Every test must be readable standalone**: no jumping to helpers to understand what's tested (DAMP > DRY)
- **One behavior per test**: one "when", one "then"


Write tests that verify behavior (not implementation), use table-driven/parameterized patterns, and minimize mocking. Triggers when asked to write tests, add test coverage, or create test files. Also triggers when reviewing existing tests for quality.

2 stars

testing

Updated Apr 14, 2026


npx skillsauth add smykla-skalski/sai test-writer

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Reported % |
|---|---|---|---|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Apr 14, 2026, 2:06 AM (30.9s, 3 files scanned)

SKILL.md

name: test-writer
description: Write tests that verify behavior (not implementation), use table-driven/parameterized patterns, and minimize mocking. Triggers when asked to write tests, add test coverage, or create test files. Also triggers when reviewing existing tests for quality.
argument-hint: [file-or-function-to-test] [--review] [--lang go|python|ts|java|rust]
user-invocable: true
context: fork
agent: general-purpose


Related Skills

smykla-skalski/council

development

Verified, Trusted, Community

Run the council workflow from a normal Copilot session only when the user explicitly asks for council review, multi-persona critique, debate, design review, code review, architecture feedback, UX review, or tradeoff analysis. Do not use it for commit, stage, merge, approval, or generic pre-commit requests. Accept the same mode syntax as the bundled council reviewers: `core|auto|core-eng|core-ux|core-mix|all|debate <problem|@file>`. During council slash-command use, the current session agent moderates reviewer agents directly. Runs broader than 6 reviewers require explicit AskUserQuestion approval before launch.

2 | SKILL.md | Updated Apr 30, 2026

smykla-skalski/council

tools

Verified, Trusted, Community

Use when the user invokes $council, $council:council, Council review, or Council debate. Use loaded SKILL body or one direct installed `skills/council/SKILL.md` read. Direct read path must contain `/.codex/plugins/cache/sai/council/` and end `/skills/council/SKILL.md`. `cd <cwd> && sed -n ... <path>` is valid. Do not use `pwd`, `ls`, `find`, `rg`, `cat`, multiple `&&`, or `;`. Never use repo-local paths. If unavailable, stop exactly `Council not run: skill unavailable.` At most one pre-tool message, exact `Council progress:` line only. Non-final lines start `Council progress:`.

2 | SKILL.md | Updated Apr 29, 2026

smykla-skalski/council

development

Verified, Trusted, Community

Run council reviews with sourced engineering, UX, reliability, performance, AI, and strategy persona lenses. Use when the user asks for council review, multi-persona critique, debate, design review, code review, architecture feedback, UX review, or tradeoff analysis.

2 | SKILL.md | Updated Apr 29, 2026

smykla-skalski/council

development

Verified, Trusted, Community

Use when the user asks for council review, multi-persona critique, debate, design review, code review, architecture feedback, UX review, or tradeoff analysis. Bare invocations use `core` profile auto-detect; explicit `auto` selects the best-fit 6 personas from the sourced 27-persona engineering and UX roster. Users can still pin `core`, `core-eng`, `core-ux`, `core-mix`, `all`, or `debate`.

2 | SKILL.md | Updated Apr 29, 2026

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Install manually

# Clone the repo
git clone https://github.com/smykla-skalski/sai.git

# Copy into Claude Code skills folder (global)
cp -r sai/claude/test-writer/skills/test-writer ~/.claude/skills/

Repository

smykla-skalski/sai

2 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT