skills/tdd-agent/SKILL.md
Fully autonomous TDD with strict guardrails. Use when you want the AI to drive the entire RED-GREEN-REFACTOR cycle independently while maintaining TDD discipline.
"Make it work, make it right, make it fast — in that order." — Kent Beck
The TDD Agent operates autonomously through the complete TDD cycle. Unlike pair programming, the AI drives all phases. Stricter guardrails apply because no human is there to catch mistakes in real time.
Non-Negotiable Constraints:
Use search_knowledge (grounded-code-mcp) to ground decisions in authoritative references.
| Query | When to Call |
|-------|--------------|
| search_knowledge("TDD autonomous red green refactor cycle strict discipline") | At session start — load authoritative TDD cycle constraints before any code generation |
| search_knowledge("test-first development failing test implementation minimum") | Before each RED phase — confirms the test-first sequence |
| search_knowledge("refactoring code smells catalog extract method") | During REFACTOR phase — load smell catalog and refactoring mechanics |
| search_knowledge("Python test pytest fixtures best practices") | For Python projects — authoritative pytest patterns |
| search_knowledge("C# xUnit test patterns FluentAssertions NSubstitute") | For .NET projects — authoritative xUnit/FluentAssertions patterns |
| search_knowledge("unit test naming conventions behavior specification") | When naming tests — confirms behavioral naming standards |
Protocol: Search before each phase transition (RED→GREEN→REFACTOR). This is an autonomous agent — grounding every phase in authoritative references prevents drift. Cite the source path in phase logs.
The agent must ensure tests have these properties:
| Property | Agent Responsibility |
|----------|---------------------|
| Isolated | Tests don't share state; verify no side effects |
| Deterministic | Same results every run; no flaky tests |
| Specific | Failures point to exact cause |
| Automated | No manual intervention required |
| Predictive | Passing tests = working code |
| Fast | Maintain quick feedback loop |
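A minimal sketch of two of these properties, assuming a hypothetical `greeting` function whose result depends on the time of day. Injecting the clock, instead of reading the system time inside the function, makes the tests Deterministic; having each test build its own inputs keeps them Isolated.

```python
from datetime import datetime
from typing import Callable


def greeting(now: Callable[[], datetime] = datetime.now) -> str:
    # Hypothetical function under test. The clock is a parameter so tests
    # can supply a fixed time instead of the real system time.
    return "Good morning" if now().hour < 12 else "Good afternoon"


def test_greeting_before_noon_is_morning():
    # Deterministic: a fixed clock gives the same answer on every run.
    assert greeting(lambda: datetime(2024, 1, 1, 9, 0)) == "Good morning"


def test_greeting_from_noon_is_afternoon():
    # Isolated: this test builds its own clock and shares no state
    # with the test above.
    assert greeting(lambda: datetime(2024, 1, 1, 12, 0)) == "Good afternoon"
```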
RED Protocol:
1. Identify smallest testable behavior
2. Write test for that behavior
3. RUN the test suite
4. VERIFY the new test fails
5. VERIFY failure is for the expected reason
6. Only then, proceed to GREEN
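Steps 3 to 5 can be checked mechanically rather than by eye. The helper below is a hypothetical sketch, not part of this skill: it assumes the agent has captured pytest's exit code and combined output, and relies on pytest exiting with code 1 when tests ran and some failed, and on its final summary line (for example `1 failed in 0.01s`).

```python
import re
from dataclasses import dataclass

# pytest exits with 1 when tests ran and some failed; any other non-zero
# code means the run itself went wrong (e.g. collection or usage errors).
EXIT_TESTS_FAILED = 1


@dataclass
class GateResult:
    ok: bool
    reason: str


def verify_red(exit_code: int, output: str, test_name: str, expected_error: str) -> GateResult:
    """Hypothetical RED gate: exactly one test fails, it is the new test,
    and the output contains the expected error."""
    if exit_code != EXIT_TESTS_FAILED:
        return GateResult(False, f"expected exit code 1, got {exit_code}")
    failed = re.search(r"(\d+) failed", output)
    if failed is None or int(failed.group(1)) != 1:
        return GateResult(False, "expected exactly one failing test")
    lines = output.splitlines()
    if not any(test_name in line and "FAILED" in line for line in lines):
        return GateResult(False, f"{test_name} is not reported as FAILED")
    if expected_error not in output:
        return GateResult(False, f"failure does not mention {expected_error!r}")
    return GateResult(True, "RED verified; proceed to GREEN")
```

For the Calculator example in this skill, `verify_red(exit_code, output, "test_add_two_numbers", "NameError")` succeeds only if that test alone failed and the output mentions a `NameError`.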
Mandatory Logging:
````markdown
### RED Phase — Iteration N

**Behavior to test**: [description]

**Test written**: `test_name` in `file`

**Verification**:
```
[actual test output showing failure]
```

**Failure reason**: [e.g., "NameError: Calculator not defined"]

**Expected**: Yes, test fails because Calculator class doesn't exist yet

Proceeding to GREEN phase.
````
GREEN Protocol:
1. Review the failing test
2. Identify minimal code to pass
3. Implement ONLY what's needed
4. RUN the test suite
5. VERIFY all tests pass
6. Only then, proceed to REFACTOR
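Steps 4 and 5 can be checked the same way. A minimal sketch, assuming the agent recorded how many tests passed before this iteration (`previous_total`, a hypothetical name) and captured pytest's exit code and output: GREEN holds only if pytest exited with 0 and its summary reports exactly one more passing test than before.

```python
import re


def passed_count(output: str) -> int:
    """Number of passed tests in pytest's summary line, e.g.
    '==== 15 passed in 0.12s ===='; 0 if no such count appears."""
    match = re.search(r"(\d+) passed", output)
    return int(match.group(1)) if match else 0


def verify_green(exit_code: int, output: str, previous_total: int) -> bool:
    """Hypothetical GREEN gate: everything passes (exit code 0) and the new
    test is among the passes, so the count grew by exactly one."""
    return exit_code == 0 and passed_count(output) == previous_total + 1
```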
Mandatory Logging:
````markdown
### GREEN Phase — Iteration N

**Test to satisfy**: `test_name`

**Implementation strategy**: [Fake It | Obvious | Triangulation]

**Code written**:
```[language]
[implementation code]
```

**Verification**:
```
[actual test output showing all pass]
```

All tests passing. Proceeding to REFACTOR phase.
````
### Phase 3: REFACTOR — Improve Structure
REFACTOR Protocol:
**Mandatory Logging:**
````markdown
### REFACTOR Phase — Iteration N

**Starting state**: All tests passing (N tests)

**Improvement identified**: [e.g., "Extract duplicate calculation"]

**Change made**:
[brief description]

**Verification**:
```
[actual test output]
```

Refactoring complete. [Continue refactoring | Ready for next feature]
````
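A hypothetical illustration of the improvement named in the template, "Extract duplicate calculation" (an Extract Function / Extract Method refactoring). The invoice code below is invented for the example: one small step moves a repeated tax expression into a helper, and a direct before/after comparison stands in for "all tests still pass".

```python
# Before: the same tax calculation appears in two places (hypothetical code).
def invoice_total_before(subtotal: float) -> float:
    return subtotal + subtotal * 0.2


def receipt_line_before(subtotal: float) -> str:
    return f"Total: {subtotal + subtotal * 0.2:.2f}"


# After: one small step extracts the duplicated expression into a helper.
TAX_RATE = 0.2


def with_tax(subtotal: float) -> float:
    return subtotal + subtotal * TAX_RATE


def invoice_total(subtotal: float) -> float:
    return with_tax(subtotal)


def receipt_line(subtotal: float) -> str:
    return f"Total: {with_tax(subtotal):.2f}"


# Behavior must not change: results are identical before and after the step.
for amount in (0.0, 10.0, 99.99):
    assert invoice_total(amount) == invoice_total_before(amount)
    assert receipt_line(amount) == receipt_line_before(amount)
```

The expression is moved verbatim, so behavior is preserved by construction; renaming the constant or changing the formula would be a separate step with its own test run.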
At each phase, the agent must run verification:
```
┌─────────────────────────────────────────────┐
│ RED Self-Check                              │
├─────────────────────────────────────────────┤
│ □ Test file exists                          │
│ □ Test is syntactically valid               │
│ □ Test suite runs without error             │
│ □ New test fails                            │
│ □ Failure is for expected reason            │
│ □ Only ONE new failing test                 │
│ □ Existing tests still pass                 │
└─────────────────────────────────────────────┘
```
If any check fails, stop and correct before proceeding.
```
┌─────────────────────────────────────────────┐
│ GREEN Self-Check                            │
├─────────────────────────────────────────────┤
│ □ Implementation is minimal                 │
│ □ No features beyond test requirements      │
│ □ Test suite runs without error             │
│ □ All tests pass                            │
│ □ New test passes                           │
│ □ No other tests broke                      │
└─────────────────────────────────────────────┘
```
```
┌─────────────────────────────────────────────┐
│ REFACTOR Self-Check                         │
├─────────────────────────────────────────────┤
│ □ Started with all tests passing            │
│ □ Made ONE small change                     │
│ □ Test suite runs without error             │
│ □ All tests still pass                      │
│ □ No behavior was changed                   │
│ □ Code is cleaner than before               │
└─────────────────────────────────────────────┘
```
Before writing ANY implementation code:
STOP! Verify:
1. A test exists for this behavior
2. The test was just run
3. The test output shows failure
4. The failure is logged in conversation
If any are missing, DO NOT PROCEED.
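The four checks can be kept as an explicit record instead of a mental note. A minimal sketch; `RedEvidence` and `missing_preconditions` are hypothetical names, and how the agent fills in each flag (from its own run and log) is left to the surrounding protocol.

```python
from dataclasses import dataclass


@dataclass
class RedEvidence:
    # Hypothetical record of what the agent has observed this iteration.
    test_exists: bool
    test_was_run: bool
    output_shows_failure: bool
    failure_logged: bool


def missing_preconditions(evidence: RedEvidence) -> list[str]:
    """Return the unmet checks; implementation may start only if empty."""
    checks = [
        ("a test exists for this behavior", evidence.test_exists),
        ("the test was just run", evidence.test_was_run),
        ("the test output shows failure", evidence.output_shows_failure),
        ("the failure is logged in conversation", evidence.failure_logged),
    ]
    return [name for name, met in checks if not met]
```

An empty list is the only state in which GREEN may begin; any returned item names the check to redo.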
Never claim tests pass/fail without evidence:
WRONG: "The test should now pass."
WRONG: "I believe all tests are passing."
RIGHT: "Running tests... [actual output] All 15 tests pass."
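One way to enforce this rule is to phrase status claims only from the runner's own summary. A hypothetical sketch, assuming pytest output and its exit code are available: the claim is built from the reported counts, and anything unparseable yields "no verdict" rather than a guess.

```python
import re


def evidence_claim(exit_code: int, output: str) -> str:
    """Hypothetical helper: state test status only from pytest's summary."""
    passed = re.search(r"(\d+) passed", output)
    failed = re.search(r"(\d+) failed", output)
    if exit_code == 0 and passed:
        return f"All {passed.group(1)} tests pass."
    if failed:
        return f"{failed.group(1)} test(s) fail; see output above."
    return f"No verdict: pytest exited with code {exit_code}; inspect the output."
```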
During GREEN, ask:
If yes to any, simplify.
If tests fail during REFACTOR:
IMMEDIATE ACTIONS:
1. Stop refactoring
2. Revert the change (e.g., `git restore <file>` or `git checkout -- <file>`, or undo in the editor)
3. Verify tests pass again
4. Analyze what went wrong
5. Try smaller step
Do NOT try to "fix" the refactoring. Revert first.
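The revert-first rule can be sketched as a single guarded step. This is a hypothetical helper, not part of the skill: it assumes the last green state is committed and the refactoring edits are unstaged, because `git restore .` (Git 2.23+) restores tracked files in the working tree from the index and leaves staged changes and untracked files alone. The command runner is injectable so the logic can be exercised without a real repository.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def run_command(cmd: Sequence[str]) -> int:
    # Run a command and return its exit status.
    return subprocess.run(list(cmd)).returncode


def refactor_step(apply_change: Callable[[], None], runner: Runner = run_command) -> str:
    """Hypothetical REFACTOR step: apply one change, run the suite, and on
    failure revert first instead of trying to fix the refactoring."""
    apply_change()
    if runner(["pytest", "-q"]) == 0:
        return "kept"
    # Discard unstaged edits to tracked files.
    if runner(["git", "restore", "."]) != 0:
        return "STOP: revert failed; restore the last green state manually"
    if runner(["pytest", "-q"]) != 0:
        return "STOP: suite still failing after revert"
    return "reverted; retry with a smaller step"
```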
Always maintain explicit state:
```
<tdd-state>
phase: [RED | GREEN | REFACTOR]
iteration: N
feature: [description]
current_test: [test name or none]
tests_passing: [true | false]
test_count: N
failing_count: N
last_verified: [timestamp or "just now"]
</tdd-state>
```
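The state block can also be backed by a small state machine so that illegal transitions are rejected rather than logged. A minimal sketch under stated assumptions: the class and transition table are hypothetical, GREEN is entered only with exactly one failing test, and REFACTOR (or the next RED) only with all tests passing, mirroring the self-checks in this skill.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Phase(Enum):
    RED = "RED"
    GREEN = "GREEN"
    REFACTOR = "REFACTOR"


# Allowed moves: REFACTOR may repeat (one small change at a time) or hand
# off to the next feature's RED phase.
TRANSITIONS = {
    Phase.RED: {Phase.GREEN},
    Phase.GREEN: {Phase.REFACTOR},
    Phase.REFACTOR: {Phase.REFACTOR, Phase.RED},
}


@dataclass
class TddState:
    phase: Phase
    iteration: int
    feature: str
    current_test: Optional[str]
    tests_passing: bool
    test_count: int
    failing_count: int
    last_verified: str = "just now"

    def advance(self, to: Phase) -> None:
        """Move to the next phase, refusing moves the protocol forbids."""
        if to not in TRANSITIONS[self.phase]:
            raise ValueError(f"illegal transition {self.phase.value} -> {to.value}")
        if to is Phase.GREEN and self.failing_count != 1:
            raise ValueError("GREEN requires exactly one failing test")
        if to is not Phase.GREEN and not self.tests_passing:
            raise ValueError(f"{to.value} requires all tests passing")
        if to is Phase.RED:
            self.iteration += 1
            self.current_test = None
        self.phase = to

    def render(self) -> str:
        """Emit the <tdd-state> block this skill asks the agent to maintain."""
        return "\n".join([
            "<tdd-state>",
            f"phase: {self.phase.value}",
            f"iteration: {self.iteration}",
            f"feature: {self.feature}",
            f"current_test: {self.current_test or 'none'}",
            f"tests_passing: {str(self.tests_passing).lower()}",
            f"test_count: {self.test_count}",
            f"failing_count: {self.failing_count}",
            f"last_verified: {self.last_verified}",
            "</tdd-state>",
        ])
```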
The agent must explain decisions at each step:
**Decision Point**: How to implement Calculator.add(2, 3)?
**Options**:
1. Return hardcoded 5 (Fake It)
2. Return a + b (Obvious Implementation)
**Reasoning**: This is the only test for addition. Obvious implementation
is safe because the algorithm is trivial (single expression, no edge cases).
**Choice**: Obvious Implementation
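To make the three strategies concrete for the same `add(2, 3)` test, here is a side-by-side sketch; `FakeItCalculator` is a hypothetical name used only for comparison. Fake It returns the constant; Triangulation adds a second, different example that the constant cannot satisfy, which forces the general expression; Obvious Implementation writes that expression directly.

```python
class FakeItCalculator:
    # Fake It: the simplest code that passes the single test add(2, 3) == 5.
    def add(self, a, b):
        return 5


class Calculator:
    # Obvious Implementation, and also where Triangulation ends up once a
    # second example such as add(1, 1) == 2 rules out the constant.
    def add(self, a, b):
        return a + b


def test_add_two_numbers():
    assert Calculator().add(2, 3) == 5


def test_add_triangulates():
    # A second, different example: FakeItCalculator().add(1, 1) returns 5,
    # so the fake would fail here.
    assert Calculator().add(1, 1) == 2
```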
````markdown
## TDD Session: Calculator
Mode: Autonomous (tdd-agent)
Stack: Python

---

### RED Phase — Iteration 1

**Behavior**: Calculator can add two numbers

**Test**:
```python
# tests/test_calculator.py
def test_add_two_numbers():
    calc = Calculator()
    assert calc.add(2, 3) == 5
```

**Verification**:
```
$ pytest tests/test_calculator.py -v
E       NameError: name 'Calculator' is not defined
```

**Analysis**: Test fails as expected — Calculator class doesn't exist.

<tdd-state>
phase: GREEN
iteration: 1
feature: Addition
current_test: test_add_two_numbers
tests_passing: false
</tdd-state>

### GREEN Phase — Iteration 1

**Strategy**: Obvious Implementation (addition is trivial)

**Implementation**:
```python
# calculator.py
class Calculator:
    def add(self, a, b):
        return a + b
```

**Verification**:
```
$ pytest tests/test_calculator.py -v
test_calculator.py::test_add_two_numbers PASSED
```

All tests pass.

<tdd-state>
phase: REFACTOR
iteration: 1
tests_passing: true
</tdd-state>

### REFACTOR Phase — Iteration 1

**Check**: Code is simple, no smells detected.

**Decision**: No refactoring needed. Ready for next feature.

<tdd-state>
phase: RED
iteration: 2
</tdd-state>
````

See Guardrails Reference for detailed templates. See Autonomous Protocol for extended workflow examples.
If something unexpected happens:
When in doubt:
- `tdd-cycle`: Orchestrates the phase state machine; invoke at session start to configure mode (autonomous vs. pair) and manage transitions
- `tdd-implementer`: Called during GREEN phase for implementation strategy selection (Fake It / Obvious / Triangulation)
- `tdd-refactor`: Called during REFACTOR phase for smell identification and safe step-by-step improvement
- `tdd-verify`: Run after the session to audit TDD compliance, score commit history, and identify anti-patterns
- `tdd-pair`: Alternative to this skill; use when a human partner drives and the AI navigates rather than the AI driving autonomously

Problem: Test suite errors out
Actions:
1. Check syntax of test file
2. Check imports and dependencies
3. Fix infrastructure issues
4. Do NOT write implementation until tests run
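Telling an infrastructure problem apart from an ordinary test failure is easier from pytest's exit code than from reading the output. A minimal sketch, assuming pytest is the runner as in this skill's Python examples: the table mirrors pytest's documented exit codes, and a collection error (for example a syntax error or failed import in the test file) ends the run as interrupted, with code 2. The helper names are hypothetical.

```python
# pytest's documented exit codes.
PYTEST_EXIT_CODES = {
    0: "all tests collected and passed",
    1: "tests collected and run, but some failed",
    2: "test execution was interrupted",
    3: "internal error while executing tests",
    4: "pytest command line usage error",
    5: "no tests were collected",
}


def suite_is_runnable(exit_code: int) -> bool:
    """Hypothetical check: only codes 0 and 1 mean the suite actually ran.
    Anything else is an infrastructure problem to fix before writing code."""
    return exit_code in (0, 1)


def describe(exit_code: int) -> str:
    return PYTEST_EXIT_CODES.get(exit_code, f"unknown exit code {exit_code}")
```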
Problem: Test fails but not for expected reason
Actions:
1. Examine the actual error
2. Fix the test if it has bugs
3. Ensure test setup is correct
4. Only proceed when failure is expected
Problem: Implementation seems correct but test fails
Actions:
1. Re-read the test carefully
2. Check for typos in test expectations
3. Verify test setup and assertions
4. Ask for help if stuck
Problem: Unsure what phase we're in
Actions:
1. Run full test suite
2. If all pass: REFACTOR or new RED
3. If one fails: GREEN
4. Reconstruct state block from evidence
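The same reconstruction written as a hypothetical helper. The only input is the failure count from a fresh full-suite run; treating more than one failure as a stop condition is an assumption of this sketch, based on the RED rule that only one new test may be failing.

```python
def infer_phase(failing_count: int) -> str:
    """Reconstruct the phase from a fresh full-suite run (hypothetical helper).

    0 failures: between steps, so REFACTOR or start a new RED.
    1 failure: a test is waiting for code, so GREEN.
    More than one: an invalid state under this protocol; investigate.
    """
    if failing_count < 0:
        raise ValueError("failing_count cannot be negative")
    if failing_count == 0:
        return "REFACTOR or new RED"
    if failing_count == 1:
        return "GREEN"
    return "STOP: more than one failing test; investigate before continuing"
```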