
michaelalber/tdd-verify

Name: tdd-verify
Author: michaelalber

skills/team/tdd-verify/SKILL.md

npx skillsauth add michaelalber/ai-toolkit tdd-verify


TDD Verify (Gatekeeper)

"Any fool can write code that a computer can understand. Good programmers write code that humans can understand." — Martin Fowler

Core Philosophy

TDD verification ensures that the discipline was followed, not just that tests exist. Tests written after implementation feel different, test different things, and provide different value than tests written first.

The Gatekeeper's Role: Detect when TDD wasn't followed. Identify coverage theater (tests that don't test). Score TDD compliance. Guide improvement.

Knowledge Base Lookups

| Query | When to Call |
|-------|--------------|
| `search_knowledge("TDD anti-patterns test after implementation coverage theater")` | During verification: authoritative anti-pattern catalog to check against |
| `search_knowledge("test quality desiderata behavioral isolated deterministic")` | When scoring test quality: confirms the 12 properties and their verification questions |
| `search_knowledge("code coverage mutation testing quality metrics")` | When assessing coverage quality vs. coverage theater |
| `search_knowledge("TDD discipline red green refactor commit order")` | When auditing commit history: confirms expected TDD commit sequence |

Search at verification start to load authoritative compliance criteria. Cite the source path in the scorecard.

Kent Beck's 12 Test Desiderata (Verification Focus)

| Property | Verification Question |
|----------|----------------------|
| Isolated | Can each test run independently? |
| Composable | Can tests be run in any subset? |
| Deterministic | Do tests always give the same result? |
| Specific | Do failures point to the exact cause? |
| Behavioral | Do tests verify behavior, not implementation? |
| Structure-insensitive | Do tests keep passing when code is refactored without changing behavior? |
| Fast | Is the feedback loop quick enough? |
| Writable | Are tests easy to create? |
| Readable | Can you understand intent quickly? |
| Automated | Do tests run without intervention? |
| Predictive | Does passing mean the code works? |
| Inspiring | Do tests give confidence to change the code? |

Verification Modes

Commit History Analysis — verify test-first development by examining git history:

```bash
# Check if tests were committed before implementation
git log --oneline --name-only | less
# Expected: test file commit precedes implementation file commit
```
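The history check above can also be scripted. The following is a minimal sketch, not part of the skill itself: it assumes the log was produced oldest-first with `git log --reverse --format=commit:%h --name-only`, where the `commit:` prefix is a marker chosen here only to make commit headers easy to recognize. The function name `parse_history` and the sample paths are hypothetical.

```python
from typing import List, Tuple

# Hypothetical marker chosen for this sketch. It is passed to git like so:
#   git log --reverse --format=commit:%h --name-only
MARKER = "commit:"


def parse_history(log_text: str) -> List[Tuple[str, List[str]]]:
    """Return (short_hash, changed_files) pairs in the order git printed them."""
    commits: List[Tuple[str, List[str]]] = []
    for raw in log_text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines may appear between entries; ignore them
        if line.startswith(MARKER):
            commits.append((line[len(MARKER):], []))
        elif commits:
            commits[-1][1].append(line)  # a file touched by the latest commit
    return commits


# Sample text standing in for real git output (paths are made up).
sample = """commit:abc123

tests/test_cart.py
commit:def456

src/cart.py
"""

history = parse_history(sample)
print(history)
```

With the log in this shape, checking that a test file commit precedes the matching implementation commit becomes a simple pass over `history`.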

Coverage Quality Analysis — look beyond percentage to quality. Coverage theater signs: 100% coverage with no assertions, tests that only call methods, happy path only, implementation details tested.

Test Quality Audit — checklist per test: test name describes behavior, Arrange-Act-Assert structure clear, single concept per test, assertions are specific, no implementation details exposed, failure message would be helpful.
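As an illustration of what a test that passes this checklist might look like, here is a small sketch using `unittest`. The `ShoppingCart` class is hypothetical and exists only so the example runs on its own.

```python
import unittest


class ShoppingCart:
    """Hypothetical class under test, included so the example is self-contained."""

    def __init__(self):
        self._items = []

    def add(self, name, price, quantity=1):
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        self._items.append((name, price, quantity))

    def total(self):
        return sum(price * quantity for _, price, quantity in self._items)


class ShoppingCartTest(unittest.TestCase):
    def test_total_sums_price_times_quantity_for_each_item(self):
        # Arrange
        cart = ShoppingCart()
        cart.add("pen", 2, quantity=3)
        cart.add("notebook", 5)
        # Act
        total = cart.total()
        # Assert: one specific expectation with a helpful failure message
        self.assertEqual(total, 11, "total should be 2*3 + 5*1")

    def test_add_rejects_non_positive_quantity(self):
        cart = ShoppingCart()
        with self.assertRaises(ValueError):
            cart.add("pen", 2, quantity=0)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ShoppingCartTest)
result = unittest.TestResult()
suite.run(result)
print(result.testsRun, result.wasSuccessful())
```

Each test name states a behavior, each body covers a single concept, and neither test reaches into `_items`.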

Compliance Scorecard: five categories, each scored 0–5 (Test-First Development, Behavioral Testing, Minimal Implementation, Refactoring Discipline, Coverage Quality), for a total out of 25.

Workflow

Step 1: Gather Evidence

Collect: git commit history (chronological), test file contents, implementation file contents, coverage report (if available), test execution results.

Step 2: Analyze Commit Order

Check whether tests preceded implementation. Flag any commit that contains both test and implementation files (they should be separate), and any implementation commit with no preceding test commit.

| Commit | Type | TDD Compliant? |
|--------|------|----------------|
| abc123 | Test | N/A (first) |
| def456 | Impl | Yes (test first) |
| ghi789 | Both | No (should be separate) |
| jkl012 | Impl | No (no preceding test) |
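The classification in the table can be sketched in a few lines. This is an illustrative simplification under stated assumptions: test files are recognized by pytest-style names (an assumption about the project), every other file counts as implementation, and an implementation commit is treated as compliant only when the commit immediately before it was test-only. The helper names `is_test_file`, `classify`, and `audit` are hypothetical.

```python
from typing import Iterable, List, Tuple


def is_test_file(path: str) -> bool:
    # Assumption: pytest-style naming. Adjust for the project's conventions.
    name = path.rsplit("/", 1)[-1]
    return (
        name.startswith("test_")
        or name.endswith("_test.py")
        or "/tests/" in f"/{path}"
    )


def classify(files: Iterable[str]) -> str:
    """Label a commit as Test, Impl, Both, or Empty based on its files."""
    kinds = {"Test" if is_test_file(f) else "Impl" for f in files}
    if kinds == {"Test", "Impl"}:
        return "Both"
    return kinds.pop() if kinds else "Empty"


def audit(commits: List[Tuple[str, List[str]]]) -> List[Tuple[str, str, str]]:
    """Walk commits oldest-first and return (hash, type, verdict) rows."""
    rows = []
    previous = None
    for sha, files in commits:
        kind = classify(files)
        if kind == "Impl":
            verdict = "Yes" if previous == "Test" else "No (no preceding test)"
        elif kind == "Both":
            verdict = "No (should be separate)"
        else:
            verdict = "N/A"
        rows.append((sha, kind, verdict))
        previous = kind
    return rows


rows = audit([
    ("abc123", ["tests/test_cart.py"]),
    ("def456", ["src/cart.py"]),
    ("ghi789", ["tests/test_cart.py", "src/cart.py"]),
    ("jkl012", ["src/cart.py"]),
])
for row in rows:
    print(row)
```

A real audit would also want to treat documentation and configuration files separately rather than counting them as implementation.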

Step 3: Analyze Test Quality

For each test, evaluate against the 12 Desiderata. Note: Behavioral (tests outcome, not internal call), Specific (precise assertion), Isolated (no shared state), Structure-insensitive (not verifying private methods or internal structure).
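To make the Behavioral and Structure-insensitive checks concrete, the sketch below contrasts two tests of the same code. `PriceCalculator` is a hypothetical class invented for the illustration; both tests pass today, but only the second would survive a refactoring that inlines the private helper.

```python
import unittest
from unittest.mock import patch


class PriceCalculator:
    """Hypothetical class under test."""

    def total(self, prices, tax_rate):
        subtotal = self._subtotal(prices)
        return subtotal + subtotal * tax_rate

    def _subtotal(self, prices):
        return sum(prices)


class CoupledTest(unittest.TestCase):
    def test_total_calls_subtotal(self):
        # Implementation-coupled: asserts on an internal call, not an outcome.
        calc = PriceCalculator()
        with patch.object(PriceCalculator, "_subtotal", return_value=100) as helper:
            calc.total([60, 40], 0)
        helper.assert_called_once_with([60, 40])


class BehavioralTest(unittest.TestCase):
    def test_total_adds_tax_to_sum_of_prices(self):
        # Behavioral: asserts only on the observable result.
        self.assertEqual(PriceCalculator().total([60, 40], 0.5), 150)


loader = unittest.defaultTestLoader
suite = unittest.TestSuite([
    loader.loadTestsFromTestCase(CoupledTest),
    loader.loadTestsFromTestCase(BehavioralTest),
])
result = unittest.TestResult()
suite.run(result)
print(result.testsRun, result.wasSuccessful())
```

If `_subtotal` were inlined into `total`, `patch.object` would raise `AttributeError` and the coupled test would break even though behavior is unchanged. That is the signal this step looks for.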

Step 4: Check Coverage Quality

High-quality indicators: tests fail when behavior breaks, edge cases covered, error paths tested, assertions verify outcomes. Theater indicators: tests pass even with broken behavior, no assertions, only exercises code paths, happy path only.
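One way to probe the difference is to break the behavior on purpose and see whether a test notices, which is the core idea behind mutation testing. The sketch below does this by hand for a single hypothetical function, `apply_discount`; real mutation-testing tools automate the process, and none is assumed here.

```python
def apply_discount(price, percent):
    """Hypothetical function under test."""
    return price - price * percent / 100


def broken_apply_discount(price, percent):
    # Hand-written mutant: the subtraction was flipped to an addition.
    return price + price * percent / 100


def theater_test(fn):
    fn(200, 10)  # exercises the code path but asserts nothing


def quality_test(fn):
    assert fn(200, 10) == 180   # typical case
    assert fn(200, 0) == 200    # edge: no discount
    assert fn(200, 100) == 0    # edge: full discount


def detects_breakage(test, original, mutant) -> bool:
    """True if the test passes on the original and fails on the mutant."""
    test(original)  # must pass on correct code; raises otherwise
    try:
        test(mutant)
    except AssertionError:
        return True
    return False


print(detects_breakage(theater_test, apply_discount, broken_apply_discount))
print(detects_breakage(quality_test, apply_discount, broken_apply_discount))
```

Both tests reach every line of `apply_discount`, so a coverage report rates them equally. Only `quality_test` fails when the behavior is broken.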

Step 5: Generate Scorecard

Compile findings into a report with: Overall assessment, Strengths, Improvement Areas, Recommendations (immediate, short-term, ongoing), and per-category scores.

AI Anti-Patterns to Detect

| Anti-Pattern | Signs | Detection Signal |
|---|---|---|
| Test-After Implementation | Tests mirror impl structure; same variable names as impl; test "documents" rather than "specifies" | Both test and impl in same commit; no failing-test commit before impl commit |
| Over-Mocking | More mocks than real objects; tests verify method calls; mocks returning mocks | `assert_called_with(...)` on implementation-internal methods |
| Happy Path Only | No error, edge, or boundary tests | Test inventory missing: zero cases, overflow cases, invalid input cases |
| Assert-Free Tests | Tests only call methods; tests print output; tests "verify" nothing | Zero assertion statements in test body |
| Implementation Coupling | Tests break on refactoring; tests verify private methods; tests depend on specific structure | `_private_method` or `_internal_state` references in test assertions |
| Copy-Paste Tests | Tests differ only in values; no parameterization; duplicated setup code | Test names following pattern `test_X_1`, `test_X_2`, `test_X_3` |

Output Templates

```markdown
## TDD Compliance Scorecard: [Repo/Branch]
**Period**: [date range] | **Commits Analyzed**: N

| Category | Score | Status |
|----------|-------|--------|
| Test-First Development | X/5 | GREEN/YELLOW/RED |
| Behavioral Testing | X/5 | GREEN/YELLOW/RED |
| Minimal Implementation | X/5 | GREEN/YELLOW/RED |
| Refactoring Discipline | X/5 | GREEN/YELLOW/RED |
| Coverage Quality | X/5 | GREEN/YELLOW/RED |

**Overall**: X/25 ([percentage]%)

Anti-patterns: [list or "none"]
Recommendations: Immediate: [...] | Short-term: [...] | Ongoing: [...]
```

Full templates (Detailed Verification Report with per-category analysis and Appendix): references/compliance-scoring.md
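The arithmetic behind the scorecard table is simple enough to sketch. The example below fills in the table rows and the overall line from five category scores. The GREEN/YELLOW/RED thresholds are an assumption made for this illustration (4 to 5 is GREEN, 2 to 3 is YELLOW, 0 to 1 is RED); the actual methodology lives in references/compliance-scoring.md and may differ. The function names are hypothetical.

```python
from typing import Dict

CATEGORIES = (
    "Test-First Development",
    "Behavioral Testing",
    "Minimal Implementation",
    "Refactoring Discipline",
    "Coverage Quality",
)
MAX_PER_CATEGORY = 5


def status_for(score: int) -> str:
    # Assumed thresholds, not taken from the skill: 4-5 GREEN, 2-3 YELLOW, 0-1 RED.
    if score >= 4:
        return "GREEN"
    if score >= 2:
        return "YELLOW"
    return "RED"


def render_scorecard(scores: Dict[str, int]) -> str:
    """Render the category table and overall line as markdown text."""
    for name in CATEGORIES:
        if not 0 <= scores[name] <= MAX_PER_CATEGORY:
            raise ValueError(f"{name}: score must be between 0 and {MAX_PER_CATEGORY}")
    total = sum(scores[name] for name in CATEGORIES)
    maximum = MAX_PER_CATEGORY * len(CATEGORIES)
    lines = ["| Category | Score | Status |", "|----------|-------|--------|"]
    for name in CATEGORIES:
        score = scores[name]
        lines.append(f"| {name} | {score}/{MAX_PER_CATEGORY} | {status_for(score)} |")
    lines.append("")
    lines.append(f"**Overall**: {total}/{maximum} ({round(100 * total / maximum)}%)")
    return "\n".join(lines)


example_scores = {
    "Test-First Development": 4,
    "Behavioral Testing": 3,
    "Minimal Implementation": 5,
    "Refactoring Discipline": 2,
    "Coverage Quality": 1,
}
card = render_scorecard(example_scores)
print(card)
```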

State Block

<tdd-verify-state>
scope: [repo path | branch | commit range | "pending"]
commits_analyzed: [N | "none yet"]
current_category: [test-first | behavioral | minimal-impl | refactor | coverage | "complete"]
score_so_far: [e.g., "12/15, 3 categories complete"]
anti_patterns_found: [comma-separated list or "none"]
findings_pending_review: [N items]
last_action: [what was just done]
next_action: [what should happen next]
</tdd-verify-state>
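For tooling that needs to emit this block consistently, the fields map naturally onto a small data structure. The following is a sketch only: the `VerifyState` class and its defaults are hypothetical, and the skill itself does not prescribe any particular implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VerifyState:
    """Hypothetical container mirroring the fields of the state block."""
    scope: str = "pending"
    commits_analyzed: str = "none yet"
    current_category: str = "test-first"
    score_so_far: str = "not started"
    anti_patterns_found: List[str] = field(default_factory=list)
    findings_pending_review: int = 0
    last_action: str = "none"
    next_action: str = "gather evidence"

    def render(self) -> str:
        patterns = ", ".join(self.anti_patterns_found) or "none"
        lines = [
            "<tdd-verify-state>",
            f"scope: {self.scope}",
            f"commits_analyzed: {self.commits_analyzed}",
            f"current_category: {self.current_category}",
            f"score_so_far: {self.score_so_far}",
            f"anti_patterns_found: {patterns}",
            f"findings_pending_review: {self.findings_pending_review} items",
            f"last_action: {self.last_action}",
            f"next_action: {self.next_action}",
            "</tdd-verify-state>",
        ]
        return "\n".join(lines)


state = VerifyState(
    scope="main",
    commits_analyzed="14",
    current_category="minimal-impl",
    score_so_far="7/10 (2 categories complete)",
    anti_patterns_found=["over-mocking"],
    findings_pending_review=2,
    last_action="scored behavioral tests",
    next_action="score minimal implementation",
)
block = state.render()
print(block)
```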

AI Discipline Rules

Evidence-based verification only. Never claim TDD compliance without evidence. Commit history must show test-first ordering. Coverage must show meaningful assertions. Tests must exercise behavior, not internal structure.

Be constructive — verification is for improvement, not punishment. Frame findings as opportunities. Provide specific, actionable recommendations. Acknowledge what was done well.

Context matters. Legacy code may not follow TDD. Time pressure affects discipline. Learning curves are real. Consider the situation before scoring.

Distinguish intent from mistakes. Some code may intentionally skip tests. Some tests may be exploratory. Ask before assuming violations when the situation is ambiguous.

Integration with Other Skills

tdd — Audit a session run under the canonical tdd inner loop; full-cycle commit history (one test → one implementation → refactor → commit) provides the richest evidence
tdd-agent — Run tdd-verify after an autonomous tdd-agent session to confirm discipline was followed
tdd-pair — Run tdd-verify at the end of a pair session to score compliance and surface improvement areas
tdd-refactor — If tdd-verify finds implementation-coupled tests, invoke tdd-refactor to decouple them safely
tdd-implementer — If tdd-verify finds over-engineering or over-mocking, trace findings back to the GREEN phase for root-cause

Reference files: references/compliance-scoring.md (detailed scoring methodology) | references/ai-antipatterns.md (patterns specific to AI-generated code)


Verify AI-generated code follows TDD discipline. Use when auditing commits for TDD discipline, checking test coverage quality, detecting TDD anti-patterns, or generating compliance scorecards. Do NOT use when reviewing legacy code written before TDD was applied without first establishing a baseline; do NOT use when you have not reviewed project history.

development

Updated May 21, 2026


Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Percentage |
|---------|-------------|--------|------------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: May 21, 2026, 7:03 AM | 210.3s | 3 files scanned

SKILL.md

name: tdd-verify
audience: team
description: >




Install manually


```bash
# Clone the repo
git clone https://github.com/michaelalber/ai-toolkit.git

# Make sure the global Claude Code skills folder exists
mkdir -p ~/.claude/skills

# Copy into Claude Code skills folder (global)
cp -r ai-toolkit/skills/team/tdd-verify ~/.claude/skills/
```


Repository

michaelalber/ai-toolkit

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT