dist/codex/plugins/dev-workflow/skills/improving-tests/SKILL.md
Improve test design and coverage, including TDD/red-green-refactor guidance. Use when improving tests, refactoring tests, adding coverage, using TDD, or removing test waste. NOT for fixing production bugs (use fixing-code) or reviewing non-test code quality (use reviewing-code).
npx skillsauth add alexei-led/claude-code-config improving-tests
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Improve tests by making them behavioral, lean, and useful.
Detect your capability from your tools, not from prose:
review mode is its natural fit.

Detect the language from the file extensions in scope and load the matching reference for language-specific test patterns and tooling:
Mixed languages: load each matching reference. Unknown language: use the generic principles below only.
- review → identify weak, duplicate, brittle, or missing tests
- refactor → combine to table-driven/parametrized/test.each, remove waste
- coverage → add tests for uncovered business behavior
- tdd → red-green-refactor loop for a feature or bug
- full → review + refactor + coverage

Use this for test-first or red-green-refactor work.
Do not write all tests first. Bulk RED creates imagined tests coupled to guessed implementation.
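As a rough illustration of working one test at a time, the sketch below shows a file after two small red-green cycles. The function `parse_price` and both test names are hypothetical and exist only for this example; the comments record the order in which each piece was written.

```python
# Hypothetical example: parse_price is an illustrative name, not from any real project.

# Cycle 1, RED: this test was written first and failed because
# parse_price did not exist yet.
def test_parses_dollars_and_cents():
    assert parse_price("$12.50") == 1250  # result is in cents


# Cycle 2, RED: written only after cycle 1 was green, so it is shaped
# by the real implementation rather than a guessed one.
def test_parses_whole_dollars():
    assert parse_price("$12") == 1200


# GREEN: the implementation as it stands after cycle 2. After cycle 1 it
# only split on "."; the partition-based form was added to pass cycle 2.
def parse_price(text: str) -> int:
    dollars, _, cents = text.lstrip("$").partition(".")
    cents = (cents or "0").ljust(2, "0")  # "" -> "00", "5" -> "50"
    return int(dollars) * 100 + int(cents)
```

Each test here describes one observable behavior, so neither needs to change if the parsing internals are rewritten later.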
Explore the existing test suite. Engineer runs these commands to gather coverage; reviewer (no Bash) works from the test files in scope plus any coverage output the caller supplies — ask for that context if missing, do not run the commands:
# Go
go test -coverprofile=/tmp/cc-cov.out ./... && go tool cover -func=/tmp/cc-cov.out
# Python
pytest --cov=. --cov-report=term-missing
# TypeScript
bun test --coverage
Look for:
test.each

For refactoring brittle private-helper tests, state the public behavior surface first. Example: create_user(payload) is the primary test surface; _normalize_user_payload() is not. Replace duplicate helper tests and internal call-count assertions with behavior checks through the public API. Mock only system boundaries. Delete shallow duplicates once the public behavior tests cover them.
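A minimal sketch of a behavior check through the public API, under stated assumptions: the bodies of `create_user` and `_normalize_user_payload`, the extra `store` parameter, and the `InMemoryUserStore` fake are all hypothetical stand-ins invented for this example. Only the system boundary (the store) is replaced, and the test asserts on outcomes rather than on helper calls.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryUserStore:
    """Hypothetical fake for the system boundary (a database in production)."""

    saved: list = field(default_factory=list)

    def save(self, user: dict) -> None:
        self.saved.append(user)


def _normalize_user_payload(payload: dict) -> dict:
    # Private helper: an implementation detail, not a test surface.
    return {
        "email": payload["email"].strip().lower(),
        "name": payload["name"].strip(),
    }


def create_user(payload: dict, store) -> dict:
    # Public behavior surface: normalize, persist, return the stored user.
    user = _normalize_user_payload(payload)
    store.save(user)
    return user


def test_create_user_stores_normalized_email():
    store = InMemoryUserStore()
    user = create_user({"email": "  Ada@Example.COM ", "name": " Ada "}, store)
    # Observable outcomes only: no assertions on helper calls or call counts.
    assert user == {"email": "ada@example.com", "name": "Ada"}
    assert store.saved == [user]
```

With a test like this in place, a separate test that calls `_normalize_user_payload` directly with the same input would be a shallow duplicate and a candidate for deletion.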
- Go: t.Run(tc.name, ...)
- Python: @pytest.mark.parametrize with pytest.param()
- TypeScript: it.each([{ input, expected, name }])

Extract helpers only after 3+ repetitions and only when the helper improves readability.
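For the Python form, a parametrized test might look like the sketch below. It assumes pytest is installed; `slugify` and the case ids are hypothetical names chosen for illustration. Each `pytest.param(...)` carries an `id`, which plays the same role as `tc.name` in a Go table test.

```python
import pytest


def slugify(title: str) -> str:
    # Hypothetical function under test: lowercase, whitespace runs -> "-".
    return "-".join(title.lower().split())


@pytest.mark.parametrize(
    ("title", "expected"),
    [
        pytest.param("Hello World", "hello-world", id="spaces-become-hyphens"),
        pytest.param("  padded  ", "padded", id="outer-whitespace-dropped"),
        pytest.param("", "", id="empty-stays-empty"),
    ],
)
def test_slugify(title, expected):
    # One test body replaces three near-identical test functions.
    assert slugify(title) == expected
```

Running `pytest -v` on a file like this reports each case under its id, so a failure points at the specific behavior that broke.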
Engineer runs and names the relevant verification command for the project after applying. Reviewer names the command in the Proposed Changes rationale and does not run it (no Bash). Examples:
go test ./...
pytest -v
bun test
For Python, mention pytest or the project-specific equivalent explicitly. For refactor plans in Python projects, include pytest -v or the repository's configured uv run pytest command by name instead of only saying "run tests." For other stacks, name the equivalent test command instead of saying only "tests passed."
Engineer (applied the changes):
TEST IMPROVEMENT COMPLETE
=========================
Mode: review | refactor | coverage | tdd | full
Tests changed: N
Waste removed: N
Coverage: before → after (if measured)
Key improvements:
- file:line — change
Verification:
- <command> — pass/fail
Reviewer (identified only — emit the changes as a proposal, apply nothing):
## Proposed Changes
### Change 1: <brief description>
File: `path/to/test_file`
Action: CREATE | MODIFY | DELETE
Code:
<complete test code, in the file's language>
Rationale: <weak/missing/brittle test this addresses>
If no tests or framework exist, report that and ask before creating a new testing stack.