skills/superpowers/test-driven-development/SKILL.md
Use when implementing any feature or bugfix, before writing implementation code
npx skillsauth add jbro/pi-agent test-driven-developmentInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Write the test first. Watch it fail. Write minimal code to pass.
Core principle: If you didn't watch the test fail, you don't know if it tests the right thing.
Violating the letter of the rules is violating the spirit of the rules.
Always: new features, bug fixes, refactoring, behavior changes.
Exceptions (ask your human partner): throwaway prototypes, generated code, configuration files.
Thinking "skip TDD just this once"? Stop. That's rationalization.
NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST
Write code before the test? Delete it. Start over.
No exceptions:
digraph tdd_cycle {
rankdir=LR;
red [label="RED\nWrite failing test", shape=box, style=filled, fillcolor="#ffcccc"];
verify_red [label="Verify fails\ncorrectly", shape=diamond];
green [label="GREEN\nMinimal code", shape=box, style=filled, fillcolor="#ccffcc"];
verify_green [label="Verify passes\nAll green", shape=diamond];
refactor [label="REFACTOR\nClean up", shape=box, style=filled, fillcolor="#ccccff"];
next [label="Next", shape=ellipse];
red -> verify_red;
verify_red -> green [label="yes"];
verify_red -> red [label="wrong\nfailure"];
green -> verify_green;
verify_green -> refactor [label="yes"];
verify_green -> green [label="no"];
refactor -> verify_green [label="stay\ngreen"];
verify_green -> next;
next -> red;
}
Write one minimal test showing what should happen.
<Good> ```typescript test('retries failed operations 3 times', async () => { let attempts = 0; const operation = () => { attempts++; if (attempts < 3) throw new Error('fail'); return 'success'; };const result = await retryOperation(operation);
expect(result).toBe('success'); expect(attempts).toBe(3); });
Clear name, tests real behavior, one thing.
</Good>
**Requirements:** one behavior, clear name, real code (no mocks unless unavoidable).
### Verify RED — Watch It Fail
**MANDATORY. Never skip.**
```bash
npm test path/to/test.test.ts
Confirm: test fails (not errors); failure message is expected; fails because feature is missing.
Test passes? You're testing existing behavior — fix the test. Test errors? Fix the error, re-run until it fails correctly.
Write the simplest code to pass the test.
<Good> ```typescript async function retryOperation<T>(fn: () => Promise<T>): Promise<T> { for (let i = 0; i < 3; i++) { try { return await fn(); } catch (e) { if (i === 2) throw e; } } throw new Error('unreachable'); } ``` Just enough to pass. </Good>Don't add features, refactor other code, or "improve" beyond the test.
MANDATORY.
npm test path/to/test.test.ts
Confirm: test passes; other tests still pass; output is pristine (no errors, warnings).
Test fails? Fix code, not test. Other tests fail? Fix now.
After green only: remove duplication, improve names, extract helpers. Keep tests green. Don't add behavior.
Next failing test for next feature.
| Quality | Good | Bad |
|---------|------|-----|
| Minimal | One thing; "and" in name → split | test('validates email and domain and whitespace') |
| Clear | Name describes behavior | test('test1') |
| Shows intent | Demonstrates desired API | Obscures what code should do |
| Excuse | Reality | |--------|---------| | "Too simple to test" | Simple code breaks. Test takes 30 seconds. | | "I'll test after" | Tests passing immediately prove nothing. | | "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" | | "Already manually tested" | Ad-hoc ≠ systematic. No record, can't re-run. | | "Deleting X hours is wasteful" | Sunk cost fallacy. Keeping unverified code is technical debt. | | "Keep as reference, write tests first" | You'll adapt it. That's testing after. Delete means delete. | | "Need to explore first" | Fine. Throw away exploration, start with TDD. | | "Test hard = design unclear" | Listen to test. Hard to test = hard to use. | | "TDD will slow me down" | TDD faster than debugging. Pragmatic = test-first. | | "Manual test faster" | Manual doesn't prove edge cases. You'll re-test every change. | | "Existing code has no tests" | You're improving it. Add tests for existing code. |
All of these mean: Delete code. Start over with TDD.
Bug: Empty email accepted
RED
test('rejects empty email', async () => {
const result = await submitForm({ email: '' });
expect(result.error).toBe('Email required');
});
Verify RED
$ npm test
FAIL: expected 'Email required', got undefined
GREEN
function submitForm(data: FormData) {
if (!data.email?.trim()) {
return { error: 'Email required' };
}
// ...
}
Verify GREEN
$ npm test
PASS
REFACTOR — extract validation for multiple fields if needed.
Before marking work complete:
Can't check all boxes? You skipped TDD. Start over.
| Problem | Solution | |---------|----------| | Don't know how to test | Write wished-for API. Write assertion first. Ask your human partner. | | Test too complicated | Design too complicated. Simplify interface. | | Must mock everything | Code too coupled. Use dependency injection. | | Test setup huge | Extract helpers. Still complex? Simplify design. | | Bug found | Write failing test reproducing it. Follow TDD cycle. |
When adding mocks or test utilities, read testing-anti-patterns.md in this skill's directory
to avoid common pitfalls:
Production code → test exists and failed first
Otherwise → not TDD
No exceptions without your human partner's permission.
testing
Use when writing or editing human-facing prose that needs clearer, more concise wording.
development
Use when you have a spec or requirements for a multi-step task, before touching code
data-ai
Use when about to claim work is complete, fixed, or passing, before committing or creating PRs - requires running verification commands and confirming output before making any success claims; evidence before assertions always
tools
Use when starting any conversation or task, to determine which skills apply before responding or taking any action.