kramme-cc-workflow/skills/kramme:test:tdd/SKILL.md
(experimental) Drive implementation with tests. Write a failing test that characterizes the requirement or reproduces the bug, implement the minimum to pass, then refactor with tests green. Use when implementing new logic, fixing a bug (Prove-It pattern), or changing behavior. Complementary to kramme:test:generate, which writes tests for existing untested code.
npx skillsauth add abildtoft/kramme-cc-workflow kramme:test:tddInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Drive implementation with tests. Write the test first, watch it fail for the right reason, implement the minimum that makes it pass, then refactor with the tests green. The failing test is a specification; the passing test is evidence; the refactor is cleanup you can trust because the tests are watching.
If you find yourself saying "it's just a spike, I'll add tests later" more than once per session, that is a red flag — you're not spiking, you're skipping.
Write a test that fails because the behavior doesn't exist yet, not because of a syntax error, missing import, or misconfigured fixture. The failure message should point at the missing behavior.
MISSING REQUIREMENT if the choice needs a decision — before writing the first test. There's nothing to run a RED test against otherwise.formatDate("2026-04-20") → "Apr 20, 2026"). Avoid parameterized or property-based tests until the behavior exists.If the test passes on first run, either the behavior already exists (delete the test or keep it as a regression guard) or the test is wrong (it asserts nothing meaningful). Stop and investigate before continuing.
Write the simplest code that makes the failing test pass. Not the best code. Not the generalized code. The simplest. If return "Apr 20, 2026" passes the test, that is a legal GREEN step — the next test forces real logic.
Once the test is green, refactor freely:
Run tests after each non-trivial refactor step. If a refactor turns a test red, undo it. (Or undo the test, if the behavior genuinely needed to change — but then it's not a refactor, it's a behavior change, and deserves its own RED step.)
Stop refactoring when: no duplication remains, names read cleanly, and the next thing you'd change is speculative. Don't carve out abstractions for tests you haven't written.
When a bug report arrives, resist the urge to fix-and-ship. Prove it first.
The Prove-It test becomes a permanent regression guard. Do not delete it during future refactors, even if its coverage looks redundant once other tests accumulate.
See
references/prove-it-pattern.mdfor an annotated end-to-end example.
Test size, locality, and double strategy change what a test can prove and what it costs.
See
references/test-doubles-ladder.mdfor the Test Sizes Resource Model (Small / Medium / Large) and the Real → Fake → Stub → Mock preference ladder.
The ladder prefers real implementations first, dropping to lighter fakes/stubs only when a real dependency would make the test slow, flaky, or non-deterministic. Mocks are the last resort — over-mocking produces tests that pass while production breaks.
Writing all tests first, then all implementation. Sequence looks like:
WRONG (horizontal): RIGHT (vertical):
test1 → test2 → test3 test1 → impl1
↓ ↓
impl1 → impl2 → impl3 test2 → impl2
↓
test3 → impl3
Why horizontal slicing produces bad tests:
The existing Red Flag — "You wrote the implementation first and are now 'backfilling' tests to match" — is one specific failure mode of horizontal slicing (the implementation-first variant). Horizontal slicing also includes the test-first-then-batch-implementation variant, which feels disciplined but produces the same drift between tests and behavior.
The fix: vertical slicing via tracer bullets — one test → make it pass → next test → make it pass. Each test is informed by the previous green; each implementation is shaped by the failing test in front of it.
Six additional failure modes, with remedies, are tabulated in references/anti-patterns.md (testing implementation details, flaky tests, testing framework code, snapshot abuse, no test isolation, mocking everything). Read that reference for the table and the recommended order of operations when more than one is present.
If you recognize any anti-pattern in the current session — horizontal slices or any of the six in the reference — stop and address it before writing the next test.
Use these markers so the user can skim status at a glance. Sibling kramme skills use the same vocabulary. Use them verbatim (uppercase, no decoration), one marker per line.
STACK DETECTED: TypeScript + Vitest, tests co-located as *.test.ts.UNVERIFIED: assumed the API returns ISO-8601 dates; I'll check before implementing.NOTICED BUT NOT TOUCHING: the date utility silently ignores time zones.CONFUSION: the failing test expects Array<string> but the function signature is Set<string>.MISSING REQUIREMENT: no example of a valid error payload — should I invent one or wait?.PLAN: write failing test for parse(''), then implement the empty-string branch./kramme:debug:investigate identifies a root cause, apply the Prove-It Pattern here to lock in the fix with a regression test./kramme:test:generate: for new code, prefer TDD with this skill. /kramme:test:generate is for retrofitting tests onto already-existing untested code — the opposite direction./kramme:verify:run is the right follow-up to exercise the full build/test pipeline before committing.Watch for these excuses — they signal the discipline is about to break.
| Excuse | Reality | | --- | --- | | "I'll add the test after — it's faster this way." | Retrofitted tests shape to the code, not the requirement. The test becomes a photograph, not a specification. | | "This is too simple to need a test." | If it's simple, the test is trivial. Write it anyway — the next edit may not be simple. | | "I can't test this without mocking everything." | That's a signal the code is coupled to too many collaborators. The test is telling you about the design. | | "The test is flaky, I'll skip it and come back." | Skipped tests rot. Either fix the flakiness now or delete the test. | | "Green tests mean we're done." | Green tests mean the assertions held. Ask whether the assertions cover what can go wrong, not only what did go right. | | "We don't need the Prove-It test — the fix is obvious." | Obvious fixes regress. The test is cheap insurance; without it, the same bug ships again in six months. |
If any of these are true, pause and resolve before continuing:
skip/xit/pending and has been for more than one session.Before declaring a TDD cycle complete, confirm:
skip/xit/pending./kramme:verify:run or the project's equivalent before committing.development
Runs kramme:pr:code-review as a closeout review loop for local or PR branch changes before commit, ship, or final response. Use when the user asks for autoreview, second-model review, or a final code-review pass after non-trivial edits. Not for UX, visual, accessibility, or product review.
development
Guides topic-level understanding verification for a PR, branch, feature, document, spec, design decision, bug fix, or other concrete subject. Use when the user asks to confirm, quiz, drill, teach-and-check, or verify that they understand a topic. Maintains a topic-specific checklist artifact and requires demonstrated understanding before marking the topic complete. Not for ordinary explanations without verification, end-of-session summaries, or code/test correctness checks.
testing
Design a CI/CD pipeline with quality gates, a <10-minute budget, feature-flag lifecycle, and an exit checklist. Use when adding a new CI pipeline, changing gate configuration, or planning a rollout for a new service. Complementary to kramme:pr:fix-ci (which fixes failures in an existing pipeline). Covers gate ordering, secrets storage, branch protection, rollback mechanism, and staged-rollout guardrails — not a rollout-execution runbook.
tools
--- name: kramme:visual:demo-reel description: Capture local demo evidence for observable product behavior: screenshots, before/after image sets, browser reels, terminal recordings, and short GIF/video proof. Use when shipping UI changes, CLI features, or any change where PR reviewers would benefit from visual or behavioral evidence. argument-hint: "[what to capture] [--url <url>|auto] [--tier static|before-after|browser-reel|terminal-recording]" disable-model-invocation: true user-invocable: tr