skills/test-driven-development/SKILL.md
Use when writing any new function, implementing a feature, fixing a reported bug, or adding behavior that should have test coverage. Especially when tempted to code first and test later, under time pressure, after sunk cost on existing untested code, or when thinking "this is too simple to test".
npx skillsauth add joshsymonds/gambit test-driven-developmentInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Write the test first. Watch it fail. Write minimal code to pass. Refactor. Commit.
Core principle: If you didn't watch the test fail, you don't know if it tests the right thing.
Announce at start: "I'm using gambit:test-driven-development to implement this with RED-GREEN-REFACTOR."
LOW FREEDOM — Follow these exact steps in order. Do not adapt, skip, or reorder.
Violating the letter of the rules is violating the spirit of the rules.
| Phase | Action | Expected Result | |-------|--------|-----------------| | RED | Write one failing test | Test FAILS with expected message | | Verify RED | Run test, read failure | Fails because feature missing (not typo) | | GREEN | Write minimal code | Test passes | | Verify GREEN | Run ALL tests | All green, no regressions | | REFACTOR | Clean up while green | Tests still pass | | COMMIT | Commit the increment | Behavior captured |
NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST
Wrote code before the test? Delete it. Start over.
Fresh implementation from tests. No exceptions.
Always when writing production code:
Exceptions (confirm with your human partner):
These skills are complementary, not competing:
| | TDD | Executing-Plans | |---|---|---| | Scope | Single function or behavior | Multi-task epic | | Controls | HOW code is written | WHICH task, WHEN to stop | | Cycle | RED → GREEN → REFACTOR → COMMIT | Load → Execute → Checkpoint → STOP | | Checkpoints | After each test passes | After each task completes | | Relationship | Called BY executing-plans | Calls TDD during execution |
TDD is a coding discipline used WITHIN task execution. Executing-plans decides WHAT to build; TDD decides HOW to build it.
Write a single test for one behavior. Not two. Not "and."
Requirements:
test_rejects_empty_email, not test_emailRun the test. Read the failure message.
Confirm ALL THREE:
If test passes immediately: Your test is wrong. It tests existing behavior, not new behavior. Fix the test before continuing.
If test has syntax errors: Fix syntax, re-run until you get a proper assertion failure.
Write the simplest code that makes the test pass. Nothing more.
Minimal means:
Run ALL tests (not just the new one). Confirm:
If new test fails: Fix code, not test. If other tests break: Fix regressions NOW, before continuing.
Only after ALL tests pass:
Run tests after each refactoring change. If tests break, undo the refactoring step.
Do NOT add behavior during refactoring. No new features, no new edge cases. Those are new RED cycles.
git add [test file] [implementation file]
git commit -m "feat(module): [behavior description]"
Commit message describes the behavior, not the test.
Next behavior → next failing test → next minimal implementation.
Each RED-GREEN-REFACTOR-COMMIT cycle should be small — one behavior at a time.
Bug fixes follow TDD too. The test reproduces the bug.
Task: Add email validation that rejects empty strings and missing @ symbols.
RED:
def test_rejects_empty_email():
result = validate_email("")
assert result is False
Verify RED:
$ pytest -k test_rejects_empty_email
FAILED: NameError: name 'validate_email' is not defined
Good — fails because function doesn't exist.
GREEN:
def validate_email(email):
if not email:
return False
return True
Minimal. Only handles the case the test checks.
Verify GREEN:
$ pytest
4 passed
Next RED (new behavior):
def test_rejects_missing_at_symbol():
result = validate_email("userexample.com")
assert result is False
Verify RED → GREEN → Verify GREEN → REFACTOR → COMMIT → REPEAT.
Most TDD friction is a design signal, not a testing problem. Use this table to translate the symptom into the actual issue:
| Symptom | What it usually means | Action | |---------|-----------------------|--------| | Don't know how to write the test | The API you want doesn't exist yet | Write the wished-for API as the test first — assertion before implementation. Design from the caller's side. | | Test feels too complicated | The design is too complicated | Simplify the interface before writing the test. Complex tests track complex code. | | Must mock everything | Code is too tightly coupled | Apply dependency injection. Mocking-everywhere means the unit isn't really a unit. | | Test setup is huge | Design is doing too much per object | Extract helpers. If extraction doesn't shrink setup, simplify the design itself. | | Test passes immediately | Test doesn't exercise the new behavior | Rewrite the test. It's checking something already true. | | Can't decide what to test first | Start with the happiest single path — one assertion. Edge cases are separate RED cycles. | — |
If none of these apply and you're still stuck, the problem is usually that the behavior isn't clearly defined yet — go back to the plan or epic and sharpen the requirement before writing code.
BAD: assert mock.was_called() # Tests plumbing, not behavior
GOOD: assert result == expected_value # Tests actual output
BAD: def reset(self): ... # Only used in tests, dangerous in prod
GOOD: Test utilities handle setup/teardown externally
Before mocking any dependency:
All of these mean: Delete code. Start over with RED.
| Excuse | Reality | |--------|---------| | "Too simple to test" | Simple code breaks. Test takes 30 seconds. | | "I'll test after" | Tests passing immediately prove nothing. | | "Already manually tested" | Ad-hoc ≠ systematic. Can't re-run. | | "Deleting X hours is wasteful" | Sunk cost. Unverified code is debt. | | "Need to explore first" | Fine. Throw away exploration, start with RED. | | "Test is hard to write" | Hard to test = hard to use. Simplify the design. | | "TDD slows me down" | TDD is faster than debugging in production. | | "This is different because..." | It's not. RED-GREEN-REFACTOR. |
See REFERENCE.md for test runner commands by language (Go, TypeScript, Rust, Python, Ruby, etc.).
Before marking work complete:
Can't check all boxes? You skipped TDD. Start over.
Called by:
gambit:executing-plans (when implementing task steps)gambit:debugging (write failing test reproducing bug)Calls:
gambit:verification (running tests to verify RED and GREEN)Workflow:
RED (write test) → Verify fail → GREEN (minimal code) → Verify pass → REFACTOR → COMMIT → REPEAT
testing
Use when creating a new skill, modifying an existing skill, writing or rewriting a SKILL.md file, auditing a skill's description for discoverability, or when user mentions "create a skill", "write a skill", "new skill", "modify skill", "improve skill", "edit the skill".
development
Use before any completion claim, success statement, or marking a task done. Triggers when about to say "Great!", "Perfect!", "Done", "All set", "Ready to commit", before creating a PR, before moving to the next task, or when code has changed since the last test run.
data-ai
Use when starting an isolated feature branch, when working on multiple features simultaneously, when experimenting without affecting the main workspace, or when a clean workspace is needed before beginning implementation. User phrases like "start a new branch", "set up a worktree", "isolated workspace", "work on feature X separately".
development
Use at the start of every session before any response or action. Also invoke whenever uncertain which gambit skill applies, when about to implement / debug / refactor / test / plan / brainstorm, or when a user request could match any gambit skill even at 1% probability.