skills/verification/counterexample-to-test-generator/SKILL.md
Converts a model checker counterexample trace into an executable test case in the source language, so the bug found in the model is reproducible (and regression-guarded) in the real code. Use when TLC/NuSMV/Spin finds a violation and you want a failing test before writing the fix.
npx skillsauth add santosomar/general-secure-coding-agent-skills counterexample-to-test-generatorInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
A model checker trace is a proof that the bug exists — in the model. A test is a proof it exists in the code. The translation is: map each model action back to a code operation, sequence them, and assert the invariant.
This is the model-checking analogue of → bug-reproduction-test-generator.
| Model trace element | Test element |
| ------------------------------ | ----------------------------------------------------------- |
| Initial state | Test setup / fixture |
| Action(p) firing | Call the code function that Action models, as process p |
| Interleaving of actions | Thread scheduling — controlled or forced |
| Final (violating) state | Assertion on the corresponding code state |
The hard part is interleaving. The model checker found a specific schedule that breaks things. Your test has to reproduce that schedule.
| Determinism level | Technique | Reliability | | ------------------------- | ------------------------------------------------------------ | ---------------------- | | Full control | Single-threaded simulation: call each step explicitly in order | 100% — preferred | | Partial control | Barriers/latches between steps to force ordering | High if used carefully | | No control | Run threads concurrently many times, hope for the interleaving | Flaky — last resort |
Always try single-threaded first. If the model actions are Read(p1), Read(p2), Write(p1), Write(p2), you don't need real threads — just call p1.read(); p2.read(); p1.write(); p2.write(); in sequence. The interleaving that matters is the order of operations on shared state, not actual OS threads.
program-to-tlaplus-spec-generator (or equivalent extraction).\E p : Foo(p), use the p TLC chose.Inv == counter = N becomes assert counter == N.TLC trace (from the lost-update counter):
State 1: counter=0, tmp=[p1|->0, p2|->0], pc=[p1|->"start", p2|->"start"]
State 2: <Read(p1)> counter=0, tmp=[p1|->0, p2|->0], pc=[p1|->"write", p2|->"start"]
State 3: <Read(p2)> counter=0, tmp=[p1|->0, p2|->0], pc=[p1|->"write", p2|->"write"]
State 4: <Write(p1)> counter=1, ...
State 5: <Write(p2)> counter=1 ← Inv violated: 2 increments, counter should be 2
Action sequence: Read(p1), Read(p2), Write(p1), Write(p2).
Code mapping:
Read(p) ↔ first half of increment(): tmp = counterWrite(p) ↔ second half of increment(): lock(); counter = tmp + 1; unlock()Problem: increment() is one function in code, two actions in the model. To reproduce the interleaving, split it:
Test (Go):
func TestLostUpdate_FromTLC(t *testing.T) {
counter := 0
var mu sync.Mutex
// Simulate the TLC trace: Read(p1), Read(p2), Write(p1), Write(p2)
// Split increment() into its two atomic halves.
tmp1 := counter // Read(p1) -- reads 0
tmp2 := counter // Read(p2) -- reads 0
mu.Lock(); counter = tmp1 + 1; mu.Unlock() // Write(p1) -- counter=1
mu.Lock(); counter = tmp2 + 1; mu.Unlock() // Write(p2) -- counter=1
// Invariant: two increments → counter == 2
if counter != 2 {
t.Fatalf("lost update: 2 increments, counter=%d (TLC trace reproduced)", counter)
}
}
Test fails with counter=1. Bug reproduced, deterministically, no real threads.
The trace violated the model invariant but the test passes. Three reasons:
| Cause | How to tell | Fix | | -------------------------------------- | ------------------------------------------------------- | ------------------------------------------- | | Model is more abstract than code | Code has a check the model doesn't | Fix the model — it was over-approximating | | Wrong code mapping | You called the wrong function for an action | Re-check the extraction mapping | | Atomicity mismatch | Model treats X as one step; code has finer interleaving | Refine the model's actions |
A test that doesn't reproduce is good news — it means the model found a bug in itself, not the code. Fix the model.
## Action sequence (from trace)
1. <Action(params)>
2. ...
## Code mapping
<Action> ↔ <function/code region>
## Schedule strategy
<single-threaded simulation | barriers | stress loop> — <why>
## Test
<code block — executable>
## Expected
FAIL before fix: <what assertion trips>
PASS after fix: <same assertion, now holds>
development
Extracts human-readable pseudocode from a verified formal artifact (Dafny, Lean, TLA+) while preserving the verified properties as annotations, so the proof-carrying logic can be reimplemented in a production language. Use when porting verified code to an unverified target, when documenting what a formal spec actually does, or when handing a verified algorithm to an implementer.
development
Translates natural-language or pseudocode descriptions of concurrent and distributed systems into TLA+ specifications ready for the TLC model checker. Identifies state variables, actions, type invariants, safety properties, and liveness properties from the description. Use when formalizing a protocol, when the user describes a distributed algorithm to verify, when designing a consensus or locking scheme, or when starting formal verification of a concurrent system.
testing
Reduces a TLA+ model so TLC can actually check it — shrinks constants, adds state constraints, abstracts data, or applies symmetry — when the state space is too large to enumerate. Use when TLC runs out of memory, when checking takes hours, or when a spec works at N=2 and you need confidence at larger scale.
development
TLA+-specific instance of model-guided repair — reads a TLC error trace, identifies the enabling condition that should have been false, strengthens the corresponding action, and maps the fix to source code. Use when TLC reports an invariant violation or deadlock and you have the code-to-TLA+ mapping from extraction.