lidge-jun/tdd

Name: tdd
Author: lidge-jun

tdd/SKILL.md

npx skillsauth add lidge-jun/cli-jaw-skills tdd

Test-Driven Development (TDD)

Write the test first. Watch it fail. Write minimal code to pass.

If you didn't watch the test fail, you don't know if it tests the right thing.

When to Use

New features, bug fixes, refactoring, behavior changes

Exceptions (confirm with user): throwaway prototypes, generated code, configuration files.

Red-Green-Refactor Cycle

RED — Write Failing Test

Write one minimal test for one behavior.

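test('retries failed operations 3 times', async () => {
  let attempts = 0;
  // Fails on the first two calls, succeeds on the third.
  // Declared async so it matches the `() => Promise<T>` parameter of retryOperation.
  const operation = async () => {
    attempts++;
    if (attempts < 3) throw new Error('fail');
    return 'success';
  };

  const result = await retryOperation(operation);

  expect(result).toBe('success');
  expect(attempts).toBe(3);
});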

Requirements:

One behavior per test
Name describes the behavior
Real code over mocks

Verify RED

Run the test. Confirm it fails (not errors) for the expected reason — the feature is missing, not a typo.

Test passes immediately? You're testing existing behavior. Fix the test. Test errors? Fix the error, re-run until it fails correctly.
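A minimal sketch of what a correct RED run can look like, assuming a hypothetical stub (`retryOperationStub`) that exists but makes a single attempt, and a hypothetical `runRedCheck` helper that replays the test body without a test runner. Neither name comes from the skill. Because the stub is callable, the run ends with the operation's own `fail` error after one attempt, which points at the missing retry logic rather than at a typo or a missing import.

```typescript
// Hypothetical stub: it exists, so the test can call it,
// but it makes a single attempt and never retries.
async function retryOperationStub<T>(fn: () => Promise<T>): Promise<T> {
  return fn();
}

// Replays the RED test body and reports how the run ended.
async function runRedCheck(): Promise<{ attempts: number; outcome: string }> {
  let attempts = 0;
  const operation = async () => {
    attempts++;
    if (attempts < 3) throw new Error('fail');
    return 'success';
  };
  try {
    await retryOperationStub(operation);
    return { attempts, outcome: 'passed' };
  } catch (e) {
    return { attempts, outcome: `failed: ${(e as Error).message}` };
  }
}
```

With the stub in place, `runRedCheck()` resolves to one attempt and a `failed: fail` outcome, which is the expected reason for the failure.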

GREEN — Minimal Code

Write the simplest code that passes.

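async function retryOperation<T>(fn: () => Promise<T>): Promise<T> {
  for (let i = 0; i < 3; i++) {
    try {
      return await fn();
    } catch (e) {
      // Swallow the first two failures; rethrow on the third attempt.
      if (i === 2) throw e;
    }
  }
  // The loop always returns or throws; this line only satisfies the compiler.
  throw new Error('unreachable');
}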

No extra features, no "improvements" beyond the test.

Verify GREEN

Run the test. Confirm it passes with clean output (no errors or warnings). Confirm other tests still pass.

Test fails? Fix code, not test. Other tests break? Fix now.

REFACTOR

After green only: remove duplication, improve names, extract helpers.

Keep tests green. Add no new behavior. Then write the next failing test.
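As an illustration, one possible refactor of the retry function from the GREEN step is sketched below. The `MAX_ATTEMPTS` constant and the `lastError` variable are names introduced here, not taken from the skill. The observable behavior (three attempts, last error rethrown) is meant to stay identical so the existing test keeps passing.

```typescript
// Illustrative refactor: the magic number becomes a named constant and the
// "unreachable" throw disappears. Behavior is intended to be unchanged.
const MAX_ATTEMPTS = 3;

async function retryOperation<T>(fn: () => Promise<T>): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    try {
      return await fn();
    } catch (error) {
      // Remember the failure and try again until attempts run out.
      lastError = error;
    }
  }
  // Every attempt failed: surface the most recent error.
  throw lastError;
}
```

Making the attempt count configurable would be new behavior, so under the rule above it would wait for its own failing test.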

Good Tests

| Quality | Good | Bad |
|---------|------|-----|
| Minimal | One thing. "and" in name → split it. | test('validates email and domain and whitespace') |
| Clear | Name describes behavior | test('test1') |
| Shows intent | Demonstrates desired API | Obscures what code should do |

Example: Bug Fix

Bug: Empty email accepted

RED

test('rejects empty email', async () => {
  const result = await submitForm({ email: '' });
  expect(result.error).toBe('Email required');
});

→ Run: FAIL: expected 'Email required', got undefined ✓

GREEN

// Inline type rather than the DOM's built-in FormData, which has no `email` property.
function submitForm(data: { email?: string }) {
  if (!data.email?.trim()) return { error: 'Email required' };
  // ...
}

→ Run: PASS ✓

REFACTOR — Extract validation for multiple fields if needed.
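A rough sketch of what that extraction might look like, under stated assumptions: `requireField`, `FormInput`, and `FormResult` are hypothetical names, and the `name` field stands in for a second field that would first get its own failing test. The elided rest of `submitForm` is replaced by an empty result here.

```typescript
// Hypothetical shapes; the skill does not define them.
type FormInput = { email?: string; name?: string };
type FormResult = { error?: string };

// Returns the message when the value is missing or only whitespace.
function requireField(value: string | undefined, message: string): string | undefined {
  return value?.trim() ? undefined : message;
}

function submitForm(data: FormInput): FormResult {
  // The first failing check wins, so the email rule from the bug fix is preserved.
  const error =
    requireField(data.email, 'Email required') ??
    requireField(data.name, 'Name required');
  if (error) return { error };
  return {}; // placeholder for the real submission logic
}
```

The original `rejects empty email` test should still pass against this version, which is the check that the refactor changed structure and not behavior.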

The Test-First Rule

Write production code only after a failing test exists for it.

If code was written before its test: delete it and restart with TDD. Keeping pre-written code as "reference" leads to testing-after — you test what you built rather than what's required.

Common Rationalizations

| Excuse | Reality |
|--------|---------|
| "Too simple to test" | Simple code breaks. The test takes 30 seconds. |
| "I'll test after" | Tests passing immediately prove nothing. |
| "Already manually tested" | Ad-hoc ≠ systematic. No record, can't re-run. |
| "Deleting X hours is wasteful" | Sunk cost. Unverified code is technical debt. |
| "Need to explore first" | Explore freely, then discard and start with TDD. |
| "Hard to test" | Hard to test = hard to use. Simplify the design. |

When Stuck

| Problem | Solution |
|---------|----------|
| Don't know how to test | Write the desired API first. Write the assertion first. Ask user. |
| Test too complicated | Design too complicated. Simplify the interface. |
| Must mock everything | Code too coupled. Use dependency injection. |
| Test setup huge | Extract helpers. Still complex → simplify design. |
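To make the dependency-injection suggestion concrete, here is a small sketch with hypothetical names (`Clock`, `isExpired`). The function takes its time source as a parameter, so a test can pass a fixed clock instead of mocking the global `Date`.

```typescript
// Hypothetical example: the clock is a parameter instead of a hard-wired `new Date()`.
type Clock = () => Date;

function isExpired(expiresAt: Date, now: Clock = () => new Date()): boolean {
  return now().getTime() >= expiresAt.getTime();
}

// In a test, pass a fixed clock; no mocking library is involved.
const fixedClock: Clock = () => new Date('2026-01-01T00:00:00Z');
const pastIsExpired = isExpired(new Date('2025-12-31T23:59:59Z'), fixedClock);   // true
const futureIsExpired = isExpired(new Date('2026-01-01T00:00:01Z'), fixedClock); // false
```

Production callers can omit the second argument and get the real clock, so the injection point costs nothing at the call site.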

Debugging Integration

Bug found → write a failing test reproducing it → follow TDD cycle. The test proves the fix and prevents regression.

Testing Anti-Patterns

When adding mocks or test utilities, read @testing-anti-patterns.md to avoid:

Testing mock behavior instead of real behavior
Adding test-only methods to production classes
Mocking without understanding dependencies

Eval-Driven TDD for AI Code

When AI generates implementation code, the test suite doubles as an evaluation harness.

pass@k Metric

Measures probability that at least one of k generated samples passes all tests.

| Metric | Meaning |
|--------|---------|
| pass@1 | First attempt passes all tests |
| pass@5 | At least one of 5 attempts passes |
| pass@10 | At least one of 10 attempts passes |
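The skill does not say how to compute the metric. One common approach, sketched here as an assumption, is the unbiased estimator 1 - C(n-c, k) / C(n, k), where n candidates were generated and c of them passed. The function name `passAtK` is hypothetical, and the product form avoids computing large binomial coefficients.

```typescript
// Estimates pass@k from n generated candidates, c of which passed all tests.
// Uses 1 - C(n - c, k) / C(n, k), evaluated as a running product.
function passAtK(n: number, c: number, k: number): number {
  if (k < 1 || k > n || c < 0 || c > n) {
    throw new RangeError('require 1 <= k <= n and 0 <= c <= n');
  }
  // Fewer than k failures exist, so any k samples include at least one pass.
  if (n - c < k) return 1;
  let allFail = 1;
  for (let i = n - c + 1; i <= n; i++) {
    allFail *= 1 - k / i;
  }
  return 1 - allFail;
}
```

For k = 1 this reduces to c / n, the plain fraction of passing candidates.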

Workflow:

Write the test suite (RED phase) — this is your eval spec.
Generate k candidate implementations (varying temperature / prompt).
Run each candidate against the full suite. Record pass/fail per candidate.
Select the passing candidate, then REFACTOR as usual.
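The workflow above can be sketched in memory as follows. Everything here is illustrative: `Candidate`, `Check`, `scoreCandidates`, and the two `clamp` candidates are made-up names, and a real harness would load generated files and run the actual test runner instead of inline checks.

```typescript
// All names in this sketch are hypothetical.
type Candidate<F> = { id: string; impl: F };
type Check<F> = (impl: F) => boolean;

// Runs the full suite against each candidate; a thrown error counts as a failure.
function scoreCandidates<F>(candidates: Candidate<F>[], suite: Check<F>[]) {
  return candidates.map(({ id, impl }) => {
    let passed: boolean;
    try {
      passed = suite.every((check) => check(impl));
    } catch {
      passed = false;
    }
    return { id, passed };
  });
}

type Clamp = (x: number, lo: number, hi: number) => number;

// Step 1: the suite is the eval spec.
const suite: Check<Clamp>[] = [
  (clamp) => clamp(5, 0, 10) === 5,
  (clamp) => clamp(-1, 0, 10) === 0,
  (clamp) => clamp(11, 0, 10) === 10,
];

// Step 2: k = 2 candidate implementations.
const candidates: Candidate<Clamp>[] = [
  { id: 'candidate-1', impl: (x, _lo, hi) => Math.min(x, hi) },              // ignores the lower bound
  { id: 'candidate-2', impl: (x, lo, hi) => Math.max(lo, Math.min(x, hi)) }, // handles both bounds
];

// Steps 3 and 4: record pass/fail per candidate, then pick the first that passed.
const results = scoreCandidates(candidates, suite);
const selected = results.find((r) => r.passed)?.id;
```

Here `candidate-1` fails the lower-bound check and `candidate-2` is selected, after which the usual REFACTOR step applies.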

Eval Harness Design

Tests should be deterministic and fast so they can evaluate many candidates.
Cover functional correctness, edge cases, and performance bounds.
Separate behavioral tests (part of eval) from integration tests (not part of eval).
Keep eval-critical tests tagged or in a dedicated suite for automated scoring.

# Run eval suite against a candidate (Jest CLI; Jest 30 renamed this flag to --testPathPatterns)
npm test -- --testPathPattern="eval/" --bail
# Score: count passing candidates out of k
When to Use Eval-Driven TDD

Generating utility functions, algorithms, data transformers
Comparing prompt strategies for the same spec
Validating AI-assisted refactors against the existing suite

Coverage Integration

Threshold Configuration

{
  "coverageThreshold": {
    "global": { "branches": 80, "functions": 80, "lines": 80, "statements": 80 }
  }
}
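The JSON above uses Jest's `coverageThreshold` key. For projects on Vitest, a roughly equivalent configuration might look like the sketch below. It assumes a recent Vitest version in which thresholds live under `test.coverage.thresholds`, and a `vitest.config.ts` file, which is the conventional name, not something the skill specifies.

```typescript
// vitest.config.ts (assumed file name)
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    coverage: {
      // Same 80% floor as the Jest example, applied to each metric.
      thresholds: {
        branches: 80,
        functions: 80,
        lines: 80,
        statements: 80,
      },
    },
  },
});
```

With thresholds configured, a coverage run that falls below any of the four values is expected to exit with a failure.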

Quick Coverage Commands

npx vitest run --coverage          # Vitest
npm test -- --coverage             # Jest
pytest --cov --cov-report=term     # pytest

Review coverage by risk priority: auth, money, mutations, uploads, error paths.

Verification Checklist

Before marking work complete:

[ ] Every new function/method has a test
[ ] Watched each test fail before implementing
[ ] Each failure was for the expected reason
[ ] Wrote minimal code to pass each test
[ ] All tests pass with clean output
[ ] Mocks used only when unavoidable
[ ] Edge cases and errors covered
[ ] Coverage meets project thresholds

Use when implementing any feature or bugfix, before writing implementation code

4 stars

development

Updated Jun 5, 2026

npx skillsauth add lidge-jun/cli-jaw-skills tdd

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Reported % |
|---------|-------------|--------|------------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Jun 5, 2026, 6:51 AM · 132.6s · 2 files scanned

Install manually

# Clone the repo
git clone https://github.com/lidge-jun/cli-jaw-skills.git

# Copy into Claude Code skills folder (global)
cp -r cli-jaw-skills/tdd ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

lidge-jun/cli-jaw-skills

4 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT