skills/code-testing/SKILL.md
Plan, write, review, and maintain software tests. Use when building features or fixing bugs test-first with a red-green loop, writing or rewriting unit/integration/e2e tests, reviewing or auditing test quality, choosing what level of test to write, working with mocks and fixtures, or keeping an existing suite in mind while making code changes.
npx skillsauth add nweii/agent-stuff code-testingInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Three modes. Pick by task:
All three share the same philosophy.
Core principle: Tests should verify behavior through public interfaces, not implementation details. Code can change entirely; tests shouldn't.
Good tests are integration-style: they exercise real code paths through public APIs. They describe what the system does, not how it does it. A good test reads like a specification — "user can checkout with valid cart" tells you exactly what capability exists. These tests survive refactors because they don't care about internal structure.
Bad tests are coupled to implementation. They mock internal collaborators, test private methods, or verify through external means (like querying a database directly instead of using the interface). The warning sign: your test breaks when you refactor, but behavior hasn't changed. If you rename an internal function and tests fail, those tests were testing implementation, not behavior.
See tests.md for examples and mocking.md for mocking guidelines.
Before writing any code:
Ask: "What should the public interface look like? Which behaviors are most important to test?"
You can't test everything. Confirm with the user exactly which behaviors matter most. Focus testing effort on critical paths and complex logic, not every possible edge case.
Write ONE test that confirms ONE thing about the system:
RED: Write test for first behavior → test fails
GREEN: Write minimal code to pass → test passes
This is your tracer bullet — proves the path works end-to-end.
Verify RED before writing code. Run the test and confirm it fails for the expected reason — the feature is missing, not a typo or import error. A test that errors instead of failing, or fails for the wrong reason, proves nothing about what it protects. A test that passes immediately is testing existing behavior; fix the test.
For each remaining behavior:
RED: Write next test → fails
GREEN: Minimal code to pass → passes
Rules:
Verify GREEN properly. The new test passes, the rest of the suite still passes, and the output is pristine — no stray errors or warnings. If another test broke, fix it now.
After all tests pass, look for refactor candidates:
Never refactor while RED. Get to GREEN first.
DO NOT write all tests first, then all implementation. This is "horizontal slicing" — treating RED as "write all tests" and GREEN as "write all code."
This produces crap tests:
Correct approach: Vertical slices via tracer bullets. One test → one implementation → repeat. Each test responds to what you learned from the previous cycle.
WRONG (horizontal):
RED: test1, test2, test3, test4, test5
GREEN: impl1, impl2, impl3, impl4, impl5
RIGHT (vertical):
RED→GREEN: test1→impl1
RED→GREEN: test2→impl2
...
Reproduce the bug with the smallest failing behavioral test before implementing the fix. The test proves the fix and prevents regression.
Use Kent Beck's test properties as design lenses (not a rigid checklist), so that you can make explicit tradeoffs among valuable properties.
Apply the desiderata above, then check for the failure modes that survive checklist review. A test can satisfy every desideratum structurally and still be worthless:
assert response is not None passes but protects almost nothing. Assert on the behavior, not on existence.When a test intentionally sacrifices one desideratum for another, make that explicit in a short comment or final summary. Common acceptable tradeoffs:
| Problem | Solution | |---------|----------| | Don't know how to test | Write wished-for API. Write assertion first. Ask your human partner. | | Test too complicated | Design too complicated. Simplify interface. | | Must mock everything | Code too coupled. Use dependency injection. | | Test setup huge | Extract helpers. Still complex? Simplify design. |
testing
Command-invoked tutoring pass for understanding something deeply: a concept being learned, or work just done in the session. Locates where the learner is, teaches one step per turn, quizzes to verify, and continues until they can explain the material back and apply it. Can produce durable artifacts (a walkthrough of the work, a record of what was learned, a glossary) saved through whatever the environment supports. Best run after substantive work, or when the aim is to learn something.
testing
Search, read, filter, combine, adapt, and save recipes in the Brain vault collection. Use whenever cooking and the collection are relevant — 'what should I make', 'recipes with miso', 'save this one' all imply it.
testing
Socratic teaching pass over the work just done in a session: incremental comprehension stages, a running checklist doc, restate-understanding-first, and AskUserQuestion quizzes. The session doesn't end until the user has demonstrated understanding. Run after Claude has completed substantive work worth deeply understanding.
development
Writing-partner processes that draw out the user's own writing through questioning: guided drafting sessions, fragment mining, shaping raw material into a piece, and phrase tightening. Use for help discovering, developing, and structuring writing (notes, essays, messages, etc).