Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

santosomar/test-driven-generation

Name: test-driven-generation
Author: santosomar

skills/testing/test-driven-generation/SKILL.md

npx skillsauth add santosomar/general-secure-coding-agent-skills test-driven-generation

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

Test-Driven Generation

TDD isn't "write tests too." It's test before code, minimal code to pass, repeat. Each cycle is tiny. The tests design the API.

The cycle — do not skip steps

┌─────────────────────────────────────────────────┐
│  RED: Write ONE failing test.                   │
│       Run it. Confirm it fails for the RIGHT    │
│       reason (assertion, not ImportError).      │
└──────────────────┬──────────────────────────────┘
                   ▼
┌─────────────────────────────────────────────────┐
│  GREEN: Write the MINIMUM code to pass.         │
│         Hardcoding is fine. Duplication is fine.│
│         Just make it green.                     │
└──────────────────┬──────────────────────────────┘
                   ▼
┌─────────────────────────────────────────────────┐
│  REFACTOR: Clean up. Tests stay green.          │
│            NOW remove duplication.              │
│            NOW generalize the hardcoded value.  │
└──────────────────┬──────────────────────────────┘
                   │
                   └──► back to RED with next test

One test at a time. One small step of code. The discipline is the point.

Red — the test drives the API

Write the test as if the code exists. The test is the first user of the API:

def test_cart_empty_total_is_zero():
    cart = Cart()
    assert cart.total() == 0

Cart doesn't exist. total() doesn't exist. That's fine — the test just decided that Cart() takes no args and total() returns a number. Design through usage.

Run it. Fails: NameError: Cart. Wrong failure — we want AssertionError, not NameError. Create the shell:

class Cart:
    def total(self):
        pass

Run again: assert None == 0 — fails. Right failure. Now we're red.

Green — embarrassingly minimal

class Cart:
    def total(self):
        return 0

return 0. Hardcoded. Passes. Resist the urge to write return sum(item.price for item in self.items). You have one test. It wants 0. Give it 0.

Why? Because the next test forces the generalization, and you end up with exactly as much code as the tests demand — no more.

Triangulation — the next test forces generality

def test_cart_with_one_item_total_is_item_price():
    cart = Cart()
    cart.add(Item(price=10))
    assert cart.total() == 10

Now return 0 fails. Now we need items. Now we generalize:

class Cart:
    def __init__(self):
        self._items = []
    def add(self, item):
        self._items.append(item)
    def total(self):
        return sum(i.price for i in self._items)

Both tests green. The generalization was forced by the second test, not guessed from the first.

Refactor — with a safety net

Green means refactor is safe — tests will catch breakage. Extract methods, rename, restructure. Run tests after each micro-change.

# Maybe later: total grows complex with discounts, tax.
# Refactor when green:
def total(self):
    subtotal = self._subtotal()
    return subtotal - self._discount(subtotal) + self._tax(subtotal)

Tests still green → refactor is safe. Test went red → revert, try smaller step.

Worked cycle transcript

Requirement: Rate limiter — max N calls per window, returns True if allowed.

RED   test_first_call_allowed
      → assert limiter.allow("user1") is True
      → fails: RateLimiter doesn't exist

GREEN class RateLimiter: def allow(self, k): return True
      → passes (always True is enough for one test)

RED   test_call_over_limit_denied
      → limiter = RateLimiter(max_calls=2, window_s=60)
      → limiter.allow("u"); limiter.allow("u"); assert limiter.allow("u") is False
      → fails: returns True (hardcoded)

GREEN Add self._counts dict. Track. return count < max.
      → passes. Two tests green.

RED   test_different_keys_independent
      → limiter.allow("a"); limiter.allow("a"); assert limiter.allow("b") is True
      → already passes! (dict is per-key). Skip — not a red test.

RED   test_window_resets
      → fill limit at t=0. At t=61, allow again.
      → fails: no time logic.

GREEN Inject clock. Store timestamps. Prune old.
      → passes.

REFACTOR  _counts is getting messy. Extract _prune_expired().
          Tests green throughout.

Four tests. Each added exactly the code needed. No window_resets logic existed until a test demanded it.

Anti-patterns

| Anti-pattern | Why bad | Instead | | --------------------------------------------- | ------------------------------------------------- | ------------------------------------------ | | Write 10 tests, then all the code | No red-green feedback. Tests might all be wrong. | One at a time. | | Write code, then tests that pass | That's test-after. Tests fit the code, not spec. | Test first. Always. | | Skip running the red test | You don't know if it can fail. | Run it. See it fail. See why it fails. | | Over-implement on green | Untested code. Speculative generality. | Minimum to pass. Next test generalizes. | | Refactor on red | No safety net. Can't tell refactor bugs from feature bugs. | Only refactor on green. | | Write a test you can't make fail | Test is tautological or already covered. | Skip it or delete it. |

Do not

Do not write the test and the code in the same step. The red confirmation is non-negotiable — it proves the test can fail.
Do not generalize before triangulation. return 0 passing one test is correct TDD. return sum(...) passing one test is speculation.
Do not let the test tell you how to implement. assert cart._items == [item] tests structure. assert cart.total() == 10 tests behavior. Test behavior.
Do not skip refactor because "it works." Green without refactor accumulates cruft. Refactor is one-third of the cycle.
Do not TDD through a big design problem without sketching first. TDD discovers design through small steps; it doesn't replace thinking about the large shape.

Output format

## Requirement
<feature being built>

## Test list (planned — gets revised as you go)
- [ ] <test 1 — simplest case>
- [ ] <test 2 — next simplest>
- [ ] <test 3 — first interesting constraint>
- ...

## Cycle log
### Cycle 1
RED:   <test code>  — fails with: <failure>
GREEN: <minimal impl>  — passes
REFACTOR: <what changed, or "none">

### Cycle 2
...

## Final
Tests: <N>  All green.
Coverage: <%>  (should be ~100% — every line was demanded by a test)

## Deferred
<tests on the list that turned out unnecessary or out of scope>

santosomar/test-driven-generation

skills/testing/test-driven-generation/SKILL.md

Generates code test-first — writes a failing test from a requirement, then generates the minimal code to pass it, then refactors, in strict red-green-refactor cycles. Use when building new features where the spec is clear, when the design is uncertain and you want tests to drive it, or when you need high confidence in coverage from the start.

development

Updated Apr 13, 2026

$ install --global

skillsauth

npx skillsauth add santosomar/general-secure-coding-agent-skills test-driven-generation

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Apr 13, 2026, 4:38 AM70.1s1 file scanned

SKILL.md

name:: test-driven-generation
description:: Generates code test-first — writes a failing test from a requirement, then generates the minimal code to pass it, then refactors, in strict red-green-refactor cycles. Use when building new features where the spec is clear, when the design is uncertain and you want tests to drive it, or when you need high confidence in coverage from the start.
license:: Apache-2.0
category:: testing
suite:: general-secure-coding-agent-skills
version:: 0.3.0
related:: unit-test-generator, scenario-generator

Test-Driven Generation

TDD isn't "write tests too." It's test before code, minimal code to pass, repeat. Each cycle is tiny. The tests design the API.

The cycle — do not skip steps

┌─────────────────────────────────────────────────┐
│  RED: Write ONE failing test.                   │
│       Run it. Confirm it fails for the RIGHT    │
│       reason (assertion, not ImportError).      │
└──────────────────┬──────────────────────────────┘
                   ▼
┌─────────────────────────────────────────────────┐
│  GREEN: Write the MINIMUM code to pass.         │
│         Hardcoding is fine. Duplication is fine.│
│         Just make it green.                     │
└──────────────────┬──────────────────────────────┘
                   ▼
┌─────────────────────────────────────────────────┐
│  REFACTOR: Clean up. Tests stay green.          │
│            NOW remove duplication.              │
│            NOW generalize the hardcoded value.  │
└──────────────────┬──────────────────────────────┘
                   │
                   └──► back to RED with next test

One test at a time. One small step of code. The discipline is the point.

Red — the test drives the API

Write the test as if the code exists. The test is the first user of the API:

def test_cart_empty_total_is_zero():
    cart = Cart()
    assert cart.total() == 0

Cart doesn't exist. total() doesn't exist. That's fine — the test just decided that Cart() takes no args and total() returns a number. Design through usage.

Run it. Fails: NameError: Cart. Wrong failure — we want AssertionError, not NameError. Create the shell:

class Cart:
    def total(self):
        pass

Run again: assert None == 0 — fails. Right failure. Now we're red.

Green — embarrassingly minimal

class Cart:
    def total(self):
        return 0

return 0. Hardcoded. Passes. Resist the urge to write return sum(item.price for item in self.items). You have one test. It wants 0. Give it 0.

Why? Because the next test forces the generalization, and you end up with exactly as much code as the tests demand — no more.

Triangulation — the next test forces generality

def test_cart_with_one_item_total_is_item_price():
    cart = Cart()
    cart.add(Item(price=10))
    assert cart.total() == 10

Now return 0 fails. Now we need items. Now we generalize:

class Cart:
    def __init__(self):
        self._items = []
    def add(self, item):
        self._items.append(item)
    def total(self):
        return sum(i.price for i in self._items)

Both tests green. The generalization was forced by the second test, not guessed from the first.

Refactor — with a safety net

Green means refactor is safe — tests will catch breakage. Extract methods, rename, restructure. Run tests after each micro-change.

# Maybe later: total grows complex with discounts, tax.
# Refactor when green:
def total(self):
    subtotal = self._subtotal()
    return subtotal - self._discount(subtotal) + self._tax(subtotal)

Tests still green → refactor is safe. Test went red → revert, try smaller step.

Worked cycle transcript

Requirement: Rate limiter — max N calls per window, returns True if allowed.

RED   test_first_call_allowed
      → assert limiter.allow("user1") is True
      → fails: RateLimiter doesn't exist

GREEN class RateLimiter: def allow(self, k): return True
      → passes (always True is enough for one test)

RED   test_call_over_limit_denied
      → limiter = RateLimiter(max_calls=2, window_s=60)
      → limiter.allow("u"); limiter.allow("u"); assert limiter.allow("u") is False
      → fails: returns True (hardcoded)

GREEN Add self._counts dict. Track. return count < max.
      → passes. Two tests green.

RED   test_different_keys_independent
      → limiter.allow("a"); limiter.allow("a"); assert limiter.allow("b") is True
      → already passes! (dict is per-key). Skip — not a red test.

RED   test_window_resets
      → fill limit at t=0. At t=61, allow again.
      → fails: no time logic.

GREEN Inject clock. Store timestamps. Prune old.
      → passes.

REFACTOR  _counts is getting messy. Extract _prune_expired().
          Tests green throughout.

Four tests. Each added exactly the code needed. No window_resets logic existed until a test demanded it.

Anti-patterns

Do not

Do not write the test and the code in the same step. The red confirmation is non-negotiable — it proves the test can fail.
Do not generalize before triangulation. return 0 passing one test is correct TDD. return sum(...) passing one test is speculation.
Do not let the test tell you how to implement. assert cart._items == [item] tests structure. assert cart.total() == 10 tests behavior. Test behavior.
Do not skip refactor because "it works." Green without refactor accumulates cruft. Refactor is one-third of the cycle.
Do not TDD through a big design problem without sketching first. TDD discovers design through small steps; it doesn't replace thinking about the large shape.

Output format

## Requirement
<feature being built>

## Test list (planned — gets revised as you go)
- [ ] <test 1 — simplest case>
- [ ] <test 2 — next simplest>
- [ ] <test 3 — first interesting constraint>
- ...

## Cycle log
### Cycle 1
RED:   <test code>  — fails with: <failure>
GREEN: <minimal impl>  — passes
REFACTOR: <what changed, or "none">

### Cycle 2
...

## Final
Tests: <N>  All green.
Coverage: <%>  (should be ~100% — every line was demanded by a test)

## Deferred
<tests on the list that turned out unnecessary or out of scope>

Related Skills

santosomar/verified-pseudocode-extractor

development

VerifiedTrustedCommunity

Extracts human-readable pseudocode from a verified formal artifact (Dafny, Lean, TLA+) while preserving the verified properties as annotations, so the proof-carrying logic can be reimplemented in a production language. Use when porting verified code to an unverified target, when documenting what a formal spec actually does, or when handing a verified algorithm to an implementer.

SKILL.mdUpdated Apr 13, 2026

santosomar/verified-pseudocode-extractor

santosomar/tlaplus-spec-generator

development

VerifiedTrustedCommunity

Translates natural-language or pseudocode descriptions of concurrent and distributed systems into TLA+ specifications ready for the TLC model checker. Identifies state variables, actions, type invariants, safety properties, and liveness properties from the description. Use when formalizing a protocol, when the user describes a distributed algorithm to verify, when designing a consensus or locking scheme, or when starting formal verification of a concurrent system.

SKILL.mdUpdated Apr 13, 2026

santosomar/tlaplus-spec-generator

santosomar/tlaplus-model-reduction

testing

VerifiedTrustedCommunity

Reduces a TLA+ model so TLC can actually check it — shrinks constants, adds state constraints, abstracts data, or applies symmetry — when the state space is too large to enumerate. Use when TLC runs out of memory, when checking takes hours, or when a spec works at N=2 and you need confidence at larger scale.

SKILL.mdUpdated Apr 13, 2026

santosomar/tlaplus-model-reduction

santosomar/tlaplus-guided-code-repair

development

VerifiedTrustedCommunity

TLA+-specific instance of model-guided repair — reads a TLC error trace, identifies the enabling condition that should have been false, strengthens the corresponding action, and maps the fix to source code. Use when TLC reports an invariant violation or deadlock and you have the code-to-TLA+ mapping from extraction.

SKILL.mdUpdated Apr 13, 2026

santosomar/tlaplus-guided-code-repair

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/santosomar/general-secure-coding-agent-skills.git

# Copy into Claude Code skills folder (global)
cp -r general-secure-coding-agent-skills/skills/testing/test-driven-generation ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

santosomar/general-secure-coding-agent-skills

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT