skills/debugging/bug-reproduction-test-generator/SKILL.md
Creates minimal, reproducible test cases from bug reports to confirm the defect before and after a fix. Use when a bug is reported without a failing test, when the user needs a regression test for a fix, or when the user asks to reproduce a bug as a test.
Install this skill globally with one command: `npx skillsauth add santosomar/general-secure-coding-agent-skills bug-reproduction-test-generator`. Works with Claude Code, Cursor, and Windsurf.
A bug you can't reliably reproduce is a bug you can't reliably fix. This skill turns prose ("it crashes when I upload a big file") into an executable test that is red on the buggy code and will be green once it's fixed.
| Input available | First move |
| --- | --- |
| Stack trace | Top project-code frame → target function; exception → assertion |
| "Steps to reproduce" prose | Translate each step to a setup line; last step → action |
| Log excerpt | Find the first anomalous line; grep for the producer |
| Failing production request (curl, HAR) | Replay against a test harness; wrap in an assertion |
| Screenshot / "it looks wrong" | Not machine-checkable; ask for the data, not the UI |
| "It's intermittent" | Do not write a test yet; first stabilize (Step 4) |
Every reproduction test is (setup, action, assertion). Extract each:
Write a test that reproduces. It will be too big. Minimize mechanically:
| Dimension | Minimization |
| --- | --- |
| Setup state | Delete one setup line → still red? Keep deleting. Add back when it goes green. |
| Input size | Binary-bisect the input (half the file, half the list) until it goes green |
| Dependencies | Inline mocks; if removing a mock turns it green, that mock was load-bearing |
The target is: a test so small that when it fails, the fault is obvious from reading the test alone.
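The input-size row can be mechanized. The sketch below assumes a hypothetical `still_fails` callback that reruns the reproduction on a candidate input and returns `True` while it is still red; the helper name `bisect_failing_input` is also an illustration, not part of this skill.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def bisect_failing_input(
    items: Sequence[T], still_fails: Callable[[List[T]], bool]
) -> List[T]:
    """Repeatedly keep whichever half of the input still reproduces the bug."""
    current = list(items)
    while len(current) > 1:
        mid = len(current) // 2
        first, second = current[:mid], current[mid:]
        if still_fails(first):
            current = first
        elif still_fails(second):
            current = second
        else:
            # Both halves are green: the failure needs elements from each half,
            # so halving cannot shrink the input any further.
            break
    return current


# Stand-in for "run the test": pretend the bug triggers on any negative number.
def has_negative(xs: List[int]) -> bool:
    return any(x < 0 for x in xs)


minimal = bisect_failing_input([3, 8, -1, 4, 7, 2], has_negative)
```

With this stand-in predicate, `minimal` ends up as `[-1]`. When both halves go green the loop stops early, and the remaining input is then a candidate for line-by-line deletion as in the setup-state row.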
If the bug is non-deterministic, the test can't simply assert correct behavior — it'll pass by luck half the time.
| Source of flakiness | Stabilization |
| --- | --- |
| Timing / sleep-based | Replace wall-clock with an injected clock you advance manually |
| Thread interleaving | Force the bad interleaving with a latch/barrier; assert on the state you now deterministically reach |
| Network | Mock the transport; inject the specific response/error that triggers the bug |
| Test order | Run the test in isolation; if it passes alone, the bug is test pollution, not product code |
| Randomness | Seed the RNG with the seed that reproduces (if logged); if not, loop 1000× and assert all pass |
If you cannot stabilize, do not ship a test that retries. A retry loop in a reproduction test is an admission you don't understand the bug.
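Instead of retrying, a racy bug can often be pinned down by forcing the bad schedule. The sketch below illustrates the thread-interleaving row with `threading.Event` as the latch. The `Counter` class and its `after_read` seam are hypothetical, and the approach assumes the code under test offers (or can be given) such a hook.

```python
import threading


class Counter:
    """Hypothetical code under test with an unsynchronized read-modify-write."""

    def __init__(self) -> None:
        self.value = 0

    def increment(self, after_read=lambda: None) -> None:
        current = self.value      # read
        after_read()              # test seam: pause point between read and write
        self.value = current + 1  # write; may overwrite another thread's update


def run_forced_interleaving() -> int:
    """Force: A reads, B reads and writes, A writes. Returns the final count."""
    counter = Counter()
    a_has_read = threading.Event()
    b_has_written = threading.Event()

    def pause_a() -> None:
        a_has_read.set()               # tell B that A holds a stale read
        b_has_written.wait(timeout=5)  # hold A until B has finished

    def thread_a() -> None:
        counter.increment(after_read=pause_a)

    def thread_b() -> None:
        a_has_read.wait(timeout=5)     # do not start until A has read
        counter.increment()
        b_has_written.set()

    threads = [threading.Thread(target=thread_a), threading.Thread(target=thread_b)]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=10)
    return counter.value
```

On this unsynchronized `Counter`, `run_forced_interleaving()` returns 1 on every run because A overwrites B's update. A reproduction test would assert that the result equals 2, which is red until the read-modify-write is made atomic.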
Report: "Exporting a CSV with a quote in a column value produces a file Excel can't open."
Triple extraction:
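- Setup: a row whose column value contains a double quote
- Action: `export_csv(rows)`
- Assertion: the quote is escaped (`"` → `""` and field is quoted)

Minimal test: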
```python
def test_csv_export_escapes_quotes():
    rows = [{"name": 'say "hi"'}]
    out = export_csv(rows)
    assert out == 'name\r\n"say ""hi"""\r\n'
```
Three lines of meaning. When this fails, the reader immediately sees: quotes aren't escaped.
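To make the red and green states concrete, here are two sketches of what `export_csv` might look like. Both bodies are assumptions, since the report does not show the real implementation: `export_csv_naive` joins fields by hand, while `export_csv` delegates quoting to the standard `csv` module.

```python
import csv
import io
from typing import Dict, List


def export_csv_naive(rows: List[Dict[str, str]]) -> str:
    """Assumed buggy shape: joins raw values, so embedded quotes are not escaped."""
    header = list(rows[0].keys())
    lines = [",".join(header)]
    for row in rows:
        lines.append(",".join(str(row[key]) for key in header))
    return "\r\n".join(lines) + "\r\n"


def export_csv(rows: List[Dict[str, str]]) -> str:
    """Assumed fixed shape: csv.DictWriter quotes the field and doubles the quotes."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [{"name": 'say "hi"'}]
naive_output = export_csv_naive(rows)  # 'name\r\nsay "hi"\r\n'
fixed_output = export_csv(rows)        # 'name\r\n"say ""hi"""\r\n'
```

The minimal test fails against the naive version and passes against the `csv`-based one, which is the before-and-after signal the test exists to give.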
`pytest.fail()` after the call, with a timeout decorator. The test passes if it reaches `fail()` within the timeout.

`close()`/`__exit__` was called on the resource mock.

`requirement-enhancer`.

Once red: → `bug-localization` (test is the signal) → `bug-to-patch-generator` (test is the oracle).