skills/code-tdd-guide/SKILL.md
Test-driven development methodology. Red-green-refactor cycle, when to stub, how to validate failure reasons, commit cadence. Use when implementing with TDD, writing tests first or driving the tdd_loop tool.
TDD is a design discipline, not a testing chore. You write the test first because the test is where you decide what the code looks like from the outside, before any implementation can talk you out of a clean shape. The tests you keep are a description of intent and a description of the exported interface, and nothing more than that.
Hold these in mind through every loop. They are why the cycle is shaped the way it is.
Write a test that describes the behaviour you want. Run it. It must fail, and the failure needs to be because the functionality doesn't exist yet, not because of a syntax error, missing import or broken test infrastructure.
Validate the failure reason. If the test fails for the wrong reason (e.g., `TypeError: X is not a function`), stub out just enough skeleton code to get a real failure.
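As a minimal sketch of that stubbing step: the function `slugify` and the hand-rolled `runTest` check below are hypothetical stand-ins, not part of any real project or test runner. The stub exists only so the test reaches its assertion and fails there.

```typescript
// Hypothetical example: `slugify` and `runTest` are illustrative names only.

// The stub: it exists and has the right signature, but does nothing useful.
// Without it, the test would die with "TypeError: slugify is not a function",
// which says nothing about the behaviour under test.
function slugify(_title: string): string {
  return ""; // deliberately wrong: just enough skeleton to reach the assertion
}

// Run the test against the stub and report how it fails.
function runTest(): string {
  const actual = slugify("Hello World");
  const expected = "hello-world";
  return actual === expected
    ? "pass"
    : `assertion failed: expected "${expected}", got "${actual}"`;
}

const result = runTest();
console.log(result);
```

The failure now names the missing behaviour rather than a missing symbol, which is the red you want before writing any real implementation.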
Write the minimum code to make the test pass. No more.
Run the tests. They must pass.
With green tests as your safety net, improve the design. This is the moment to step back and reconsider both the internal shape and the exported interface, now that a real consumer need exists and a passing test proves the behaviour. Restructure for clarity; keep the behaviour, and keep the tests green. Do the refactoring with your normal editing tools.
When you're satisfied, close the loop with a one-line reflection on what you reconsidered.
Order tests so each one forces exactly one new increment of functionality into existence. The first test should need the least code to pass. Each subsequent test should require one new capability that the existing code doesn't have yet.
This usually means the simplest, most degenerate cases come first (nil input, empty collection, missing resource) because they force the minimum viable skeleton: a constructor, a method signature, a return type. The happy path, the primary success case, often comes later because it requires the most machinery to work.
Do not follow the plan's scenario grouping as the implementation order. Plans group scenarios by category (happy path, edge cases, errors) for coverage clarity. That grouping is not a sequence. Before writing the first test, sort the scenarios by how much code each one forces into existence and start with the one that forces the least.
The test: for each scenario, ask "how much code do I need to make this pass?" If the answer is "almost everything", it comes later. If the answer is "a constructor and a zero-value return", it comes first.
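The sorting step can be sketched roughly as follows. The scenarios describe an imaginary `total(prices)` function, and the `codeForced` scores are subjective estimates invented for illustration; in practice you would make this judgement in your head rather than in code.

```typescript
// Hypothetical scenarios for an imaginary `total(prices)` function.
// `codeForced` is a rough, subjective estimate of how much new code each test demands.
interface Scenario {
  name: string;
  codeForced: number; // 1 = signature and zero-value return, higher = more machinery
}

// The order a plan might list them in: grouped by category, happy path first.
const planOrder: Scenario[] = [
  { name: "sums several prices (happy path)", codeForced: 3 },
  { name: "returns 0 for an empty list", codeForced: 1 },
  { name: "rejects a negative price", codeForced: 4 },
  { name: "returns the price for a single item", codeForced: 2 },
];

// Sort a copy so the plan's grouping stays untouched.
const implementationOrder = [...planOrder].sort(
  (a, b) => a.codeForced - b.codeForced,
);

for (const s of implementationOrder) {
  console.log(`${s.codeForced}: ${s.name}`);
}
```

The empty-list case comes out first because it forces only a signature and a zero-value return, while the happy path waits until the skeleton exists.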
Each red-green-refactor loop produces one atomic commit. The test and implementation go together because they're one unit of work.
Figure out how to run tests from the project's context:
Infer what counts as a test file from the project:
- `*test*`, `*spec*`, `*_test.*`
- `__tests__/`, `tests/`, `test/`, `spec/`

If you can't determine the convention, ask.
The tdd_loop tool tracks one red-green-refactor loop at a
time. It is a tracker and a reminder, not a gate. It never
interrupts the user and it never blocks your file writes. You
drive it, and the human-facing surface stays passive: the status
line shows a constant TDD label beside a glyph, and a widget
alongside it shows the phase, the iteration and the behaviour
under test.
Run one loop per increment. Open a loop when you're ready for the next single behaviour, take it through to green and refactor, then close it before you open the next. Keep each loop to one behaviour rather than batching several together.
Each transition carries a short justification: a phrase, an assertion line, a one-line reflection. The machine advances only when that justification is present and the move makes sense from where the loop is. Otherwise it hands back a line of guidance and changes nothing. There's no prompt to answer; read the guidance, do what it names and try the transition again.
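To make the "advance or hand back guidance" behaviour concrete, here is a toy model. It is not the tool's implementation or API: the `advance` function, the `Loop` and `Result` types, the guidance strings and the strict ordering are all assumptions made for illustration. Only the transition names are taken from this guide.

```typescript
// Toy model only. The ordering is one reading of the guide, not the tool's real rules.
const ORDER = ["plan", "write", "red", "green", "refactor", "done"] as const;
type Transition = (typeof ORDER)[number];

interface Loop {
  completed: Transition[]; // transitions taken so far in this loop
}

type Result =
  | { ok: true; loop: Loop }
  | { ok: false; guidance: string };

function advance(loop: Loop, move: Transition, justification: string): Result {
  const expected: Transition | undefined = ORDER[loop.completed.length];
  if (expected === undefined) {
    return { ok: false, guidance: "Loop is closed; open a new one with plan." };
  }
  if (move !== expected) {
    return { ok: false, guidance: `Expected ${expected} next, not ${move}.` };
  }
  if (justification.trim() === "") {
    return { ok: false, guidance: `Say why before moving to ${move}.` };
  }
  // Only a sensible, justified move changes anything.
  return { ok: true, loop: { completed: [...loop.completed, move] } };
}

const fresh: Loop = { completed: [] };
const skipped = advance(fresh, "green", "tests pass"); // wrong move from here
const silent = advance(fresh, "plan", "  "); // right move, no justification
const opened = advance(fresh, "plan", "parse an empty config");
console.log(skipped, silent, opened);
```

In this model a rejected move returns guidance and leaves `fresh` unchanged, so the caller can fix the problem and try the same transition again.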
The classic three steps expand into the tool's transitions. Red
becomes plan, write and red; the loop then closes with
done after refactor. Every loop passes through refactor to
reach done, even as a no-op when there's nothing to change.
- `plan` (`behaviour`): what the code should do, in one phrase, named after the symbol you wish existed. Opens the loop into the plan phase.
- `write` (`interface`): the exact exported surface the test imports and calls. Bind the test to the public interface, never the internals.
- `red` (`failure`, `failureKind`): the failure you saw when you ran the test. A compile or missing-symbol error is `failureKind: "other"` and is not a real red, so stub a minimal skeleton, re-run, and call red again with the assertion failure (`failureKind: "assertion"`). Only a verified assertion red clears the way to green.
- `green` (`pass`): the passing result you saw. Write the minimum code to pass. Do not touch the test to make it green.
- `done` (`reflection`): a one-line note on what you reconsidered about the internal and external design. Closes the loop and returns to rest.
- (`reason`): leave the loop early. Use this when the user steers you elsewhere, or when a loop can't be satisfied; don't leave one dangling silently.

Attest red honestly. The whole value of red is proving the test fails for the reason you think. A `TypeError` or a missing symbol is not that proof; it is `failureKind: "other"`. When you hit one, stub just enough skeleton to get a real assertion failure, then attest the real red with `failureKind: "assertion"`. The tool records the kind you report; it never reads your test output, so report it straight.
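The distinction between the two kinds of red can be sketched like this. The `failureKind` field and its values `"assertion"` and `"other"` come from this guide; the `RedAttestation` type and the `clearsWayToGreen` function are hypothetical names for illustration, not the tool's actual interface.

```typescript
// Toy model: only `failure`, `failureKind` and its two values come from the guide.
type FailureKind = "assertion" | "other";

interface RedAttestation {
  failure: string; // the failure line you saw when running the test
  failureKind: FailureKind; // self-reported; nothing here inspects test output
}

// Only an attested assertion failure counts as a real red.
function clearsWayToGreen(red: RedAttestation): boolean {
  return red.failureKind === "assertion" && red.failure.trim() !== "";
}

const missingSymbol: RedAttestation = {
  failure: "TypeError: slugify is not a function",
  failureKind: "other",
};

const realRed: RedAttestation = {
  failure: 'expected "hello-world", got ""',
  failureKind: "assertion",
};

console.log(clearsWayToGreen(missingSymbol)); // false
console.log(clearsWayToGreen(realRed)); // true
```

Because the kind is self-reported, the check is only as honest as the attestation: labelling a `TypeError` as `"assertion"` would pass this model too, which is why the guide asks you to report it straight.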