skills/tdd-pair/SKILL.md
AI as TDD pair programmer with role-based collaboration. Use when you want to collaborate on TDD with the AI taking a specific role (driver, navigator, or alternating). Do NOT use when the goal is autonomous solo work; Do NOT use when no human partner is actively present in the session.
npx skillsauth add michaelalber/ai-toolkit tdd-pairInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
"Pair programming is a dialog between two people trying to simultaneously program (and analyze and design and test) and understand together how to program better." — Kent Beck
Pair programming with AI brings human judgment and AI capabilities together. The human provides context, requirements, and final decisions. The AI provides pattern recognition, implementation speed, and TDD discipline.
The Partnership Principle: Two minds working together produce better code than either alone — but roles must be clear.
Use search_knowledge (grounded-code-mcp) to ground decisions in authoritative references.
| Query | When to Call |
|-------|--------------|
| search_knowledge("pair programming ping pong TDD navigator driver") | At session start — confirms pair roles and ping-pong protocol |
| search_knowledge("TDD test naming behavior specification") | When writing tests in the pair session — authoritative naming conventions |
| search_knowledge("XP extreme programming practices pair collaboration") | When establishing collaboration norms — grounding in XP source material |
| search_knowledge("test quality behavioral structure-insensitive readable") | During review turns — confirms test quality criteria for both partners |
Protocol: Search at session initialization and when disagreements arise about test quality or implementation approach. Cite the source path in your response.
In pair programming, both partners share responsibility for test quality:
| Property | Human Role | AI Role | |----------|-----------|---------| | Behavioral | Defines desired behavior | Helps express it in tests | | Specific | Knows failure implications | Writes precise assertions | | Readable | Reviews for clarity | Suggests clear naming | | Inspiring | Decides acceptable risk | Maintains test coverage |
At the start of each pairing session, negotiate roles:
## TDD Pair Session
Welcome! Let's establish our collaboration style.
**Pairing Modes Available**:
1. **Ping-Pong TDD** (Recommended for learning)
- We alternate: one writes test, other implements
- Builds shared understanding
- Both stay engaged
2. **Navigator Mode**
- AI suggests, human executes (or reverse)
- One drives, one guides
- Good for complex domains
3. **Teaching Mode**
- AI explains TDD concepts as we go
- Socratic questions to deepen understanding
- Best for TDD newcomers
**Which mode would you prefer?**
Round 1:
Human: Writes a failing test
AI: Implements to make it pass
Together: Refactor if needed
Round 2:
AI: Writes next failing test
Human: Implements to make it pass
Together: Refactor if needed
... alternate ...
### Your Turn: Write a Test
**Current state**: [passing/failing, count]
**Feature area**: [what we're building]
Please write a test for the next behavior you want to add.
**Guidance**:
- Focus on one small behavior
- Make assertions specific
- Use clear test names
I'll implement once you share the test.
### My Turn: Implement
**Your test**: `test_name`
**Current failure**: [error message]
**My implementation**:
```[language]
[minimal code]
Verification:
[test output]
Questions before proceeding:
Your turn to write the next test!
#### AI's Turn (Test Writer)
```markdown
### My Turn: Write a Test
**Current state**: All tests passing ([count])
**Next behavior I suggest testing**: [description]
**Proposed test**:
```[language]
[test code]
Why this test: [brief explanation]
Alternative behaviors we could test instead:
Do you want me to add this test, or would you prefer a different behavior?
#### Human's Turn (Implementer)
```markdown
### Your Turn: Implement
**Test I wrote**: `test_name`
**Current failure**:
[error output]
**Hints** (if you'd like):
<details>
<summary>Hint 1: Approach</summary>
[general direction]
</details>
<details>
<summary>Hint 2: Implementation pattern</summary>
[more specific guidance]
</details>
Take your time! Let me know when you've implemented it, and I'll run the tests.
The navigator provides strategic guidance while the driver executes.
Navigator Responsibilities:
Driver Responsibilities:
### Navigator Guidance
**Current Phase**: [RED | GREEN | REFACTOR]
**What we're doing**: [description]
**My Observations**:
- [observation 1]
- [observation 2]
**Suggested Next Step**:
[specific action to take]
**Questions to Consider**:
- [strategic question 1]
- [strategic question 2]
I'm watching for TDD discipline. You have the keyboard!
### Driver Executing
**Your navigation**: [what you asked for]
**My interpretation**: [how I understood it]
**Action I'll take**:
[specific code or command]
**Before I proceed**:
- Does this match your intent?
- Any adjustments needed?
[execute action]
**Result**: [outcome]
What's our next move, navigator?
Help humans internalize TDD thinking through guided questions rather than direct answers.
**Observation**: It seems you're ready to implement before we have a failing test.
**Question**: What behavior are we trying to add? Can you describe it in terms of inputs and expected outputs?
**Follow-up**: How would we know if the implementation is correct without a test to verify it?
**Observation**: This implementation handles several cases. Let me check our tests...
**Question**: Which specific test requires [the extra feature]?
**Reflection**: In TDD, we often say "YAGNI" — You Aren't Gonna Need It. What's the simplest implementation that would pass our current tests?
**Observation**: All tests pass. Before we add more features...
**Question**: Looking at the code we just wrote, is there anything that feels awkward or duplicated?
**Prompt**: What would make this code clearer to someone reading it for the first time?
**Observation**: This test verifies several behaviors at once.
**Question**: If this test fails, how would we know which part went wrong?
**Exercise**: Can we split this into smaller, more focused tests? What's the smallest behavior we can test?
### TDD Insight: [Topic]
**What we just experienced**: [situation]
**The TDD principle at work**: [explanation]
**Kent Beck's perspective**: "[relevant quote]"
**How this helps us**: [practical benefit]
**Question for reflection**: [thought-provoking question]
1. Role Negotiation
└─ Agree on pairing mode
└─ Clarify expectations
└─ Set up environment
2. Iteration Loop
├─ RED: Write failing test (one partner)
├─ GREEN: Implement (other partner)
├─ REFACTOR: Together
└─ Switch roles (if ping-pong)
3. Periodic Check-ins
└─ "How's this working?"
└─ "Should we switch modes?"
└─ "Any confusion?"
4. Session Wrap-up
└─ Review what was built
└─ Discuss learnings
└─ Plan next session
<tdd-pair-state>
mode: ping-pong | navigator | teaching
phase: RED | GREEN | REFACTOR
iteration: N
current_role: human_tests_ai_implements | ai_tests_human_implements | ai_navigates | human_navigates
feature: [description]
tests_passing: true | false
turn: human | ai
</tdd-pair-state>
Great! You've chosen **[mode]** mode.
**How this works**:
[brief explanation of chosen mode]
**Getting started**:
[first action for the mode]
<tdd-pair-state>
mode: [chosen mode]
phase: RED
iteration: 1
current_role: [initial role]
turn: [who starts]
</tdd-pair-state>
### Turn Complete
**What happened**: [brief summary]
**Result**: [test pass/fail status]
**Handing off to you**:
[what the human should do next]
<tdd-pair-state>
...
turn: [next turn]
</tdd-pair-state>
### Pair Session Summary
**Mode Used**: [mode]
**Iterations Completed**: N
**Tests Written**: N
**Behaviors Implemented**:
1. [behavior 1]
2. [behavior 2]
...
**Key Moments**:
- [interesting decision or learning]
- [challenge and how we solved it]
**For Next Session**:
- [suggested next steps]
Thanks for pairing! How did this session feel?
The AI must:
The AI must:
In pair mode, enforcement is softer:
The AI must:
Problem: Test suite errors on setup or syntax
Actions:
1. Switch roles temporarily — AI diagnoses infrastructure, human observes
2. Fix environment before resuming TDD work
3. Do NOT write new tests until suite runs cleanly
4. Resume at the phase that was interrupted
Problem: Previously passing tests now fail unexpectedly
Actions:
1. Stop — identify which change caused the regression
2. AI navigator: read the failure message, trace to the cause
3. If in REFACTOR: revert immediately, do not try to fix inline
4. If in GREEN: the implementation broke something — narrow scope
5. Restore green before adding any new behavior
Problem: Unclear whose turn it is or what phase we're in
Actions:
1. Run the full test suite — output tells you the phase
- All pass → REFACTOR or new RED
- One fails → GREEN
2. Check the most recent <tdd-pair-state> block
3. Re-announce the phase and whose turn it is before continuing
Problem: Human and AI disagree on implementation or test design
Actions:
1. AI defers to human on requirements and business logic — always
2. AI may advocate on TDD discipline (test-first, minimal impl)
3. If human wants to skip a phase, AI explains the tradeoff once, then follows the human's lead
4. Never block progress over TDD purity in pair mode
Problem: Session interrupted, state block missing or stale
Actions:
1. Ask the human to describe what was last completed
2. Run the test suite to establish current phase
3. Read the most recently touched test and implementation files
4. Reconstruct <tdd-pair-state> from evidence before continuing
tdd-cycle — Provides the phase state machine and transition rules used in every pairing modetdd-implementer — Invoked when it is the AI's turn to write implementation code in ping-pong modetdd-refactor — Invoked during the shared REFACTOR step at the end of each ping-pong roundtdd-verify — Run after the session to audit TDD compliance and generate a scorecardtdd-agent — Alternative to this skill; use when the AI should drive all phases autonomously rather than collaboratingSee reference files for pairing-specific patterns:
development
Federal / government security overlay applied ON TOP OF a base language security review (dotnet/python/php/rust/react). Language-agnostic: adds NIST SP 800-53 control mapping, FIPS 140-2/3 cryptographic compliance (with a per-language crypto table), CUI handling, EO 14028 supply-chain requirements, and DOE Order 205.1B, and emits POA&M-ready findings with FIPS 199 impact levels. Use for federal/DOE/DOD/national-laboratory systems. Triggers on "federal security review", "NIST compliance", "NIST 800-53", "FISMA", "CUI", "FIPS audit", "DOE security", "POA&M", "ATO review". Do NOT use alone — run the matching <lang>-security-review FIRST; this overlay maps and extends it.
tools
OWASP-based security review of React / TypeScript front-end applications. Detects the framework (Vite/CRA/Next), entry points, and data flows, scans against the OWASP Top 10 (2025) mapped to React client-side patterns (XSS via raw HTML, URL/protocol injection, secrets in the bundle, insecure token storage, dependency CVEs, missing CSP, open redirects), and produces a manager-friendly executive summary plus a graded technical findings table. Use to audit React code for vulnerabilities. Triggers on "react security review", "frontend security audit", "audit react for vulnerabilities", "owasp react", "react xss", "react security posture", "npm audit review". For federal / gov / DOE / NIST / FIPS / CUI context, run security-review-federal after this base review. Do NOT use to grade architecture/structure — use react-architecture-checklist.
tools
Analyzes legacy React codebases and produces actionable modernization plans. Primary migration paths include class components to function components + hooks, Create React App to Vite, React 16/17 to 18 to 19, JavaScript to TypeScript, Enzyme to React Testing Library, legacy Redux to Redux Toolkit / Zustand / Context, and deprecated lifecycle/API removal. Does NOT perform the migration — assesses, quantifies risk, and plans. Triggers on phrases like "modernize react", "class to hooks", "upgrade react", "migrate CRA to vite", "react legacy migration", "react 17 to 18", "react js to typescript", "react technical debt", "enzyme to RTL".
development
Scaffolds feature-based React / TypeScript architecture using feature folders, presentational + container components, custom hooks, a typed data layer, and structural CQRS (query hooks vs mutation hooks). React analog of dotnet-vertical-slice and python-feature-slice — no DI framework; uses props/context for dependency injection and a query cache for server state. Use when creating feature-based React projects, adding React features, organizing components by feature rather than by technical type, or scaffolding a feature's data layer. Triggers on phrases like "scaffold react feature", "create react slice", "react feature folder", "react vertical slice", "add react feature", "react feature architecture", "organize react by feature".