tdd/SKILL.md
Use when implementing any feature or bugfix, before writing implementation code
npx skillsauth add lidge-jun/cli-jaw-skills tddInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Write the test first. Watch it fail. Write minimal code to pass.
If you didn't watch the test fail, you don't know if it tests the right thing.
Exceptions (confirm with user): throwaway prototypes, generated code, configuration files.
Write one minimal test for one behavior.
test('retries failed operations 3 times', async () => {
let attempts = 0;
const operation = () => {
attempts++;
if (attempts < 3) throw new Error('fail');
return 'success';
};
const result = await retryOperation(operation);
expect(result).toBe('success');
expect(attempts).toBe(3);
});
Requirements:
Run the test. Confirm it fails (not errors) for the expected reason — the feature is missing, not a typo.
Test passes immediately? You're testing existing behavior. Fix the test. Test errors? Fix the error, re-run until it fails correctly.
Write the simplest code that passes.
async function retryOperation<T>(fn: () => Promise<T>): Promise<T> {
for (let i = 0; i < 3; i++) {
try {
return await fn();
} catch (e) {
if (i === 2) throw e;
}
}
throw new Error('unreachable');
}
No extra features, no "improvements" beyond the test.
Run the test. Confirm it passes with clean output (no errors or warnings). Confirm other tests still pass.
Test fails? Fix code, not test. Other tests break? Fix now.
After green only: remove duplication, improve names, extract helpers.
Keep tests green. Add no new behavior. Then write the next failing test.
| Quality | Good | Bad |
|---------|------|-----|
| Minimal | One thing. "and" in name → split it. | test('validates email and domain and whitespace') |
| Clear | Name describes behavior | test('test1') |
| Shows intent | Demonstrates desired API | Obscures what code should do |
Bug: Empty email accepted
RED
test('rejects empty email', async () => {
const result = await submitForm({ email: '' });
expect(result.error).toBe('Email required');
});
→ Run: FAIL: expected 'Email required', got undefined ✓
GREEN
function submitForm(data: FormData) {
if (!data.email?.trim()) return { error: 'Email required' };
// ...
}
→ Run: PASS ✓
REFACTOR — Extract validation for multiple fields if needed.
Write production code only after a failing test exists for it.
If code was written before its test: delete it and restart with TDD. Keeping pre-written code as "reference" leads to testing-after — you test what you built rather than what's required.
| Excuse | Reality | |--------|---------| | "Too simple to test" | Simple code breaks. The test takes 30 seconds. | | "I'll test after" | Tests passing immediately prove nothing. | | "Already manually tested" | Ad-hoc ≠ systematic. No record, can't re-run. | | "Deleting X hours is wasteful" | Sunk cost. Unverified code is technical debt. | | "Need to explore first" | Explore freely, then discard and start with TDD. | | "Hard to test" | Hard to test = hard to use. Simplify the design. |
| Problem | Solution | |---------|----------| | Don't know how to test | Write the desired API first. Write the assertion first. Ask user. | | Test too complicated | Design too complicated. Simplify the interface. | | Must mock everything | Code too coupled. Use dependency injection. | | Test setup huge | Extract helpers. Still complex → simplify design. |
Bug found → write a failing test reproducing it → follow TDD cycle. The test proves the fix and prevents regression.
When adding mocks or test utilities, read @testing-anti-patterns.md to avoid:
When AI generates implementation code, the test suite doubles as an evaluation harness.
Measures probability that at least one of k generated samples passes all tests.
| Metric | Meaning | |--------|---------| | pass@1 | First attempt passes all tests | | pass@5 | At least one of 5 attempts passes | | pass@10 | At least one of 10 attempts passes |
Workflow:
# Run eval suite against a candidate
npm test -- --testPathPattern="eval/" --bail
# Score: count passing candidates out of k
{
"coverageThreshold": {
"global": { "branches": 80, "functions": 80, "lines": 80, "statements": 80 }
}
}
npx vitest run --coverage # Vitest
npm test -- --coverage # Jest
pytest --cov --cov-report=term # pytest
Review coverage by risk priority: auth, money, mutations, uploads, error paths.
Before marking work complete:
development
Native Web UI structured renderer schemas for compose-block drafts, search-results cards, dataframe tables, chart-json charts, and diff output
tools
Unified search hub. Route any web/real-time/X lookup through a 4-tier escalation: built-in web search → cli-jaw browser CDP → progrok Grok OAuth → web-ai (Grok Expert / GPT Pro). Use for: search, 검색, web search, latest news, real-time info, X/Twitter, fact lookup, deep research.
development
UI/UX intent discovery, design vocabulary, product personalities, UX state patterns, typography line break judgment, favicon/product logo design, and logo trust section design. Use when user design direction is vague, when building onboarding/empty/error states, when setting up favicons or product logos, or when referencing a product aesthetic.
development
Canonical owner of module boundary rules, circular dependency detection/prevention, implicit coupling taxonomy, barrel/re-export discipline, and boundary-only defensive programming. Referenced by dev, dev-code-reviewer, dev-backend, dev-frontend stubs.