kit/plugins/code-testing-agent/skills/tdd-loop/SKILL.md
Implements features via TDD with up to 20 red-green-refactor rounds. Writes failing tests first, then iterates until all pass. Use when the user asks to TDD a new feature or write tests first.
npx skillsauth add shawn-sandy/agentics tdd-loop
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Given a feature description, write a failing test suite that covers every
acceptance criterion, commit it, then loop autonomously — run tests, form a
hypothesis, edit production code, re-run — up to 20 iterations. On green,
run typecheck and lint (up to 5 gate-fix iterations), commit the implementation
with a separate feat: commit, and open a PR.
Freedom level: Strict — Follow these steps in order. Do not skip or combine steps. Stop at each hard-stop marker.
Does not fix existing bugs — use tdd-fix. Does not suggest tests without implementing — use code-testing-agent.
Call ExitPlanMode immediately and silently — always, unconditionally, before
any other action. Do not prompt the user. TDD loops involve git mutations
and cannot proceed inside plan mode.
ExitPlanMode is a deferred tool whose schema must be loaded before it can be
called. Use ToolSearch with select:ExitPlanMode first, then call
ExitPlanMode. Both steps happen silently with no user-visible output.
Error handling: If ExitPlanMode returns the exact error "You are not in plan mode", treat that as success — plan mode was already off. Do not abort or surface the error to the user; continue to the next step.
Run git status --porcelain. If the output is non-empty, output:
Working tree is dirty. Commit or stash changes before running tdd-loop.
STOP.
Run git branch --show-current. If empty (detached HEAD), output:
Cannot run: repository is in detached HEAD state. Checkout a branch first.
STOP.
If the current branch is main or master, output:
Create a feature branch first (e.g., invoke git-agent:branch-agent), then re-run.
STOP.
Use TodoWrite to create todos for Steps 1–8, all status: "pending".
Mark each status: "completed" as you finish it.
Extract from the invocation message:
| Field | Source |
|-------|--------|
| Feature name | Explicit, or derived from description |
| Acceptance criteria | Numbered list from message or a linked file |
| Target file | Explicit path, or infer from feature name + project layout |
| Test file | Explicit path, or infer from target file using naming conventions |
If acceptance criteria cannot be enumerated from the message, output:
Cannot proceed: no acceptance criteria found. Re-invoke with a numbered list
of behaviors the implementation must satisfy.
STOP.
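One way to enumerate acceptance criteria from a numbered list is sketched below. It is an assumption-laden illustration: `extract_criteria` is a hypothetical helper, it only recognizes `1.` and `1)` style items, and it does not cover criteria supplied through a linked file.

```python
import re
from typing import List

# Matches lines such as "1. Renders a tablist" or "2) Arrow keys move focus".
_NUMBERED_ITEM = re.compile(r"^\s*\d+[.)]\s+(.+?)\s*$")


def extract_criteria(message: str) -> List[str]:
    """Collect the text of every numbered list item in the message."""
    criteria = []
    for line in message.splitlines():
        match = _NUMBERED_ITEM.match(line)
        if match:
            criteria.append(match.group(1))
    return criteria
```

An empty result corresponds to the "no acceptance criteria found" stop above.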
Use the heuristics from the running-tests skill (inspect package.json,
vitest.config.*, pytest.ini, pyproject.toml, go.mod, Cargo.toml,
and nearest-ancestor Makefile) to identify the correct test runner.
If no framework is detected (language not yet covered — Rust, Ruby, Java,
Elixir, etc.), the skill is permitted exactly one AskUserQuestion call:
"No recognized test framework was detected. Choose one:"
- Use the tests/demo/ Bash fixture pattern as a zero-dependency fallback (Recommended)
- Abort: I will add framework support first, then re-run
This is the only AskUserQuestion call allowed in this skill. The Step 4 loop
is completely forbidden from calling it.
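Marker-file detection of this kind can be sketched as an ordered lookup. The mapping below is an assumption for illustration only: the running-tests skill defines the real heuristics, the runner commands are guesses at sensible defaults, and Cargo.toml and Makefile handling are left out.

```python
from fnmatch import fnmatch
from typing import Iterable, Optional

# Ordered (glob pattern, runner command) pairs; earlier entries win.
# Hypothetical mapping, not taken from the running-tests skill.
RUNNER_MARKERS = [
    ("vitest.config.*", "npx vitest run"),
    ("package.json", "npm test"),
    ("pytest.ini", "pytest"),
    ("pyproject.toml", "pytest"),
    ("go.mod", "go test ./..."),
]


def detect_runner(filenames: Iterable[str]) -> Optional[str]:
    """Return the command for the first marker present, or None if none match."""
    names = list(filenames)
    for pattern, command in RUNNER_MARKERS:
        if any(fnmatch(name, pattern) for name in names):
            return command
    return None
```

A `None` result is the case in which the single AskUserQuestion call is permitted.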
Create the test file at the path resolved in Step 1. Do not create the production file yet.
Write one test block per acceptance criterion (minimum one assertion per
block). Use describe / it grouping where idiomatic. Use the project's
existing assertion style (read a nearby test file with Read to infer).
Run the scoped test suite (Bash):
All tests fail → correct. Proceed to Step 3.
Any test passes before any production code exists → the test is wrong (likely too lenient, trivially true, or the feature already exists). Output:
Test passed without any production code — likely too lenient or already
implemented. Review the failing/passing breakdown before proceeding.
STOP.
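The red-phase check above reduces to a small decision over per-test results. A minimal sketch, assuming the runner output has already been parsed into a mapping of test name to passed flag (the parsing step and the name `red_phase_verdict` are hypothetical):

```python
from typing import Dict, List, Tuple


def red_phase_verdict(results: Dict[str, bool]) -> Tuple[str, List[str]]:
    """Return ("proceed", []) when every test fails, else ("stop", passing names).

    results maps test name -> True if the test passed.
    An empty mapping is not covered by the text above; this sketch returns "proceed".
    """
    passing = sorted(name for name, passed in results.items() if passed)
    if passing:
        return "stop", passing
    return "proceed", []
```

The list returned with `"stop"` gives the passing tests to review before re-running.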
Invoke the commit-agent skill. When it drafts the commit message, ensure:
- Type: test
- Example: test(src/Tabs): failing suite for accessible Tabs component

The commit-agent handles staging, pre-commit hooks, and conventional format.
Do not duplicate that logic here.
STOP here — do not write production code yet.
Initialize an iteration log (see references/tdd-log-format.md for schema).
Render it as a markdown table and update it live after every iteration.
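Rendering the log as a markdown table might look like the sketch below. The column names are hypothetical; the actual schema lives in references/tdd-log-format.md, which is not reproduced here. Only the existence of a Hypothesis cell is taken from this document.

```python
from typing import Dict, List

# Hypothetical columns; the real schema is in references/tdd-log-format.md.
COLUMNS = ["Iter", "Hypothesis", "Failing"]


def render_log(rows: List[Dict[str, object]]) -> str:
    """Render iteration rows as a markdown table, one table row per iteration."""
    header = "| " + " | ".join(COLUMNS) + " |"
    divider = "|" + "|".join("---" for _ in COLUMNS) + "|"
    body = [
        "| " + " | ".join(str(row.get(col, "")) for col in COLUMNS) + " |"
        for row in rows
    ]
    return "\n".join([header, divider, *body])
```

Calling `render_log` again with the extended row list after each iteration gives the live update described above.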
For each iteration i from 1 to 20:
4a — Read failures. Run the scoped test suite. Capture the failure output.
4b — Form a hypothesis. In one sentence, state why the test is failing and what production-code change is responsible.
4c — Edit production code only. Use Edit (never Write to replace the
full file). Change only what the hypothesis requires. Do not refactor unrelated
code.
4d — Re-run scoped tests. Record the result in the iteration log.
4e — Check for green. If failing-count = 0, exit the loop and proceed to Step 5.
Show the updated iteration log after every iteration.
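Steps 4a to 4e form a bounded loop, sketched here with injected callables. This is an illustration under stated assumptions: `run_tests`, `form_hypothesis`, and `apply_edit` are hypothetical stand-ins for the agent's test run, reasoning, and Edit call, and the log row keys are not the real schema.

```python
from typing import Callable, Dict, List, Tuple

MAX_ITERATIONS = 20  # cap stated in the text above


def tdd_loop(
    run_tests: Callable[[], Tuple[int, str]],   # returns (failing count, failure output)
    form_hypothesis: Callable[[str], str],      # failure output -> one-sentence hypothesis
    apply_edit: Callable[[str], None],          # edits production code only
) -> Tuple[str, List[Dict[str, object]]]:
    """Run up to MAX_ITERATIONS rounds; return ("green" | "exhausted", log)."""
    log: List[Dict[str, object]] = []
    for i in range(1, MAX_ITERATIONS + 1):
        _, failure_output = run_tests()               # 4a: read failures
        hypothesis = form_hypothesis(failure_output)  # 4b: form a hypothesis
        apply_edit(hypothesis)                        # 4c: edit production code
        failing, _ = run_tests()                      # 4d: re-run scoped tests
        log.append({"iteration": i, "hypothesis": hypothesis, "failing": failing})
        if failing == 0:                              # 4e: check for green
            return "green", log
    return "exhausted", log
```

An `"exhausted"` result corresponds to the case where iteration 20 ends with failing tests: no implementation commit, no PR, and the branch left as-is.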
If the failing count was N > 0 at the end of Step 2 and drops to 0 after a single edit in iteration 1, output:
EARLY_GREEN: tests went green in one iteration.
This may mean the tests were too lenient or the feature was already partially
implemented. Diff of iteration-1 changes:
[paste the diff]
Review the test suite for adequacy before proceeding.
STOP. Do not commit. Do not open a PR.
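The early-green condition is a three-part predicate. A minimal sketch with a hypothetical function name and parameter names:

```python
def is_early_green(failing_after_step2: int, iteration: int, failing_now: int) -> bool:
    """True when a suite that was red after Step 2 goes fully green in iteration 1."""
    return failing_after_step2 > 0 and iteration == 1 and failing_now == 0
```

When it returns True, the run stops before any commit or PR so the test suite can be reviewed for adequacy.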
Forbidden inside the loop:
- .skip, xfail, @ts-ignore, as any, mocking the unit under test, and modifying test assertions (except via the escape hatch below).
- AskUserQuestion.

If, mid-loop, a test has a genuine defect (typo, wrong assertion discovered during implementation):
- Add (test edited: <reason>) to the Hypothesis cell of the iteration row.

If iteration 20 ends with failing tests:
Print the full iteration log.
Output:
tdd-loop stopped after 20 iterations. Tests are still failing.
No implementation commit or PR will be created.
The test: commit is on the branch; partial code changes are on disk (uncommitted).
Next steps: inspect the iteration log above, continue manually, or reset the branch.
STOP. Do not auto-revert. Leave the branch as-is for manual inspection.
Run all three gates in sequence:
- Typecheck: npx tsc --noEmit if tsconfig.json exists; else mypy, go vet, cargo check, or equivalent.
- Lint: npm run lint if the script exists; else eslint, ruff, golangci-lint, as appropriate.

If all pass → proceed to Step 6.
If any fail → enter a separate gate-fix loop (max 5 iterations). For each gate-fix iteration:
- Log the iteration as a gate row, numbered g1, g2, etc. (see references/tdd-log-format.md).

If gate-fix cap (5) is exhausted → output:
Quality gates failed after 5 fix iterations.
No implementation commit or PR will be created.
Gate failure details: [last output of each failing gate]
STOP.
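The gate-fix loop can be sketched as a second bounded loop, separate from the implementation loop. Assumptions in this sketch: each gate is a hypothetical callable returning True on pass, all gates are re-run after every fix, and `fix` stands in for whatever edit the agent makes.

```python
from typing import Callable, Dict, List, Tuple

MAX_GATE_FIXES = 5  # cap stated in the text above


def run_gates(gates: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run every gate in insertion order and return the names that failed."""
    return [name for name, check in gates.items() if not check()]


def gate_fix_loop(
    gates: Dict[str, Callable[[], bool]],
    fix: Callable[[List[str]], None],
) -> Tuple[str, int]:
    """Return ("passed" | "failed", number of gate-fix iterations used)."""
    failing = run_gates(gates)
    fixes = 0
    while failing:
        if fixes == MAX_GATE_FIXES:
            return "failed", fixes  # cap exhausted: no commit, no PR
        fixes += 1
        fix(failing)
        failing = run_gates(gates)
    return "passed", fixes
```

The returned count is the M in the "M/5 gate iterations" summary.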
Invoke the commit-agent skill. When it drafts the commit message, ensure:
- Type: feat
- Example: feat(src/Tabs): accessible Tabs component (WAI-ARIA tablist pattern)

Result: the branch now has exactly two feature commits: test: … then feat: ….
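The two-commit outcome can be checked from the commit subjects on the branch. A minimal sketch, assuming the subjects are supplied oldest first (for example from `git log --reverse --format=%s <base>..HEAD`, where the base branch is left to the caller); `is_test_then_feat` is a hypothetical helper.

```python
import re
from typing import List, Optional

# Conventional-commit subject: type, optional (scope), then ": " and a summary.
_SUBJECT = re.compile(r"^(?P<type>[a-z]+)(\([^)]*\))?: .+")


def commit_type(subject: str) -> Optional[str]:
    """Return the conventional-commit type of a subject line, or None."""
    match = _SUBJECT.match(subject)
    return match.group("type") if match else None


def is_test_then_feat(subjects: List[str]) -> bool:
    """True when the branch holds exactly a test commit followed by a feat commit."""
    return [commit_type(s) for s in subjects] == ["test", "feat"]
```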
Invoke the pr-agent skill. When it drafts the PR body, include:
- A ## TDD iterations section with the full iteration log from Step 4.
- A ## Gate fixes section with the gate-fix log from Step 5 (omit if empty).
- A ## tdd-loop summary line: "N/20 impl iterations, M/5 gate iterations."

Note: Do not chain this skill with ship-autonomous in the same session.
Both access gh pr checks and may interleave state unpredictably.
Print a summary line:
tdd-loop complete. N/20 impl iterations, M/5 gate iterations.
PR: <url>
STOP here. Do not poll CI, re-run tests, suggest refactors, or take any
further action. CI watching is ship-autonomous's responsibility.
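The summary line is a fixed template over three values. A small sketch; the function and parameter names are hypothetical:

```python
def summary(impl_iterations: int, gate_iterations: int, pr_url: str) -> str:
    """Format the final two-line summary from the iteration counts and PR URL."""
    return (
        f"tdd-loop complete. {impl_iterations}/20 impl iterations, "
        f"{gate_iterations}/5 gate iterations.\n"
        f"PR: {pr_url}"
    )
```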