skills/testing/test-case-reducer/SKILL.md
Shrinks a failing test input to its minimal form while preserving the failure — delta debugging and structured shrinking to find the smallest input that still triggers the bug. Use when a fuzzer or property test finds a failure with a huge input, when a bug report has an unwieldy reproduction, or when you need a minimal test case for a regression suite.
npx skillsauth add santosomar/general-secure-coding-agent-skills test-case-reducerInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Fuzzer found a crash with a 50KB JSON input. The bug is probably triggered by 3 bytes of it. Reduction finds those 3 bytes.
Everything hinges on a boolean oracle: does this (smaller) input still trigger the bug?
def interesting(input_bytes):
proc = subprocess.run(["./parser", "-"], input=input_bytes,
capture_output=True, timeout=5)
return proc.returncode == -11 # SIGSEGV
Make it specific. "Crashes" is too loose — you might reduce to a different crash. Check for the same stack trace, same error message, same assertion.
Works on any sequence (bytes, lines, list elements):
ddmin(input, n=2):
Split input into n chunks.
For each chunk: is input-minus-this-chunk still interesting?
YES → recurse on the reduced input, reset n=2
For each chunk: is this-chunk-alone interesting?
YES → recurse on just this chunk, reset n=2
No reduction at granularity n?
n < len(input) → try n*=2 (finer chunks)
else → done, input is 1-minimal
"1-minimal" means: removing any single element breaks the interestingness. Not globally minimal (that's NP-hard), but small enough.
Bytes-level ddmin on JSON wastes time producing syntactically invalid JSON. Shrink at the grammar level:
| Format | Shrink operations |
| -------- | ----------------------------------------------------------- |
| JSON | Remove object keys; remove array elements; replace strings with ""; replace numbers with 0; replace objects with null |
| Source code | Remove statements; inline variables; simplify expressions (a + b → a); remove function params |
| Lists/trees | Remove children; collapse single-child nodes; replace subtrees with leaves |
| Integers | Try 0, 1, -1, value/2 — smaller magnitude |
| Strings | Try "", first char, each half |
Hypothesis/QuickCheck do this automatically for inputs they generated. For external inputs, do it manually.
Crash input (from fuzzer):
{"users": [
{"id": 1, "name": "alice", "tags": ["a", "b", "c"], "meta": {"x": 1, "y": 2}},
{"id": 2, "name": "bob", "tags": [], "meta": {"x": null}},
{"id": 3, "name": "", "tags": ["d"], "meta": {}}
]}
Interestingness: parser.parse(j) raises KeyError: 'y'.
Structured shrink:
Now: {"users": [{"id": 1, "name": "alice", "tags": ["a","b","c"], "meta": {"x": 1, "y": 2}}]}
Now: {"users": [{"meta": {"x": 1}}]}
1 with 0 → Yes. {"users": [{"meta": {"x": 0}}]}Minimal: {"users": [{"meta": {"x": 0}}]}
The bug: parser assumes meta has "y" when "x" is present. 50KB → 30 bytes. The bug is now obvious.
| Target | Tool |
| ------------------------ | ----------------------------------------------------- |
| C/C++ source (compiler bugs) | creduce, cvise |
| Arbitrary text | halfempty, delta |
| Hypothesis-generated | Built-in — hypothesis.find or just @given shrinking |
| AFL crashes | afl-tmin |
| Custom formats | Write ddmin yourself — it's ~30 lines |
If the bug is non-deterministic (race condition, uninitialized memory), interesting() flips between True and False on the same input. Reduction goes haywire.
Fix: require N-out-of-M successes:
def interesting(inp):
crashes = sum(1 for _ in range(10) if _run_once(inp) == CRASH)
return crashes >= 3 # crashes at least 30% of the time
Slower (10× the runs), but stable.
## Original input
Size: <bytes/lines/elements>
<truncated sample>
## Interestingness test
<exact predicate — error type, stack frame, message pattern>
Stability: <deterministic | N-of-M with N, M>
## Reduction trace
| Step | Operation | Size after | Still interesting |
| ---- | --------- | ---------- | ----------------- |
## Minimal input
Size: <bytes/lines/elements> (reduction: <%> from original)
<full input>
## Verified
<ran minimal input N times, triggered the bug M times>
## Bug hypothesis
<what the minimal input reveals about the bug>
development
Extracts human-readable pseudocode from a verified formal artifact (Dafny, Lean, TLA+) while preserving the verified properties as annotations, so the proof-carrying logic can be reimplemented in a production language. Use when porting verified code to an unverified target, when documenting what a formal spec actually does, or when handing a verified algorithm to an implementer.
development
Translates natural-language or pseudocode descriptions of concurrent and distributed systems into TLA+ specifications ready for the TLC model checker. Identifies state variables, actions, type invariants, safety properties, and liveness properties from the description. Use when formalizing a protocol, when the user describes a distributed algorithm to verify, when designing a consensus or locking scheme, or when starting formal verification of a concurrent system.
testing
Reduces a TLA+ model so TLC can actually check it — shrinks constants, adds state constraints, abstracts data, or applies symmetry — when the state space is too large to enumerate. Use when TLC runs out of memory, when checking takes hours, or when a spec works at N=2 and you need confidence at larger scale.
development
TLA+-specific instance of model-guided repair — reads a TLC error trace, identifies the enabling condition that should have been false, strengthens the corresponding action, and maps the fix to source code. Use when TLC reports an invariant violation or deadlock and you have the code-to-TLA+ mapping from extraction.