skills/code-quality/semantic-equivalence-verifier/SKILL.md
Proves two program fragments semantically equivalent using symbolic reasoning — stronger than testing, applicable when differential testing is insufficient or impossible. Use when behavior preservation must be proven rather than sampled, when the input space is too large to enumerate, or when a transformation needs a correctness argument.
npx skillsauth add santosomar/general-secure-coding-agent-skills semantic-equivalence-verifierInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
→ behavior-preservation-checker tests on a sample; this skill proves over the full input space. Use when "we ran it on 500 inputs" isn't enough.
| Situation | Why testing fails |
| ------------------------------------------------------ | --------------------------------------------------- |
| Input space is infinite and adversarial | Attacker picks the input you didn't test (crypto, parsers, sanitizers) |
| Rare edge case matters (overflow, boundary) | 1-in-2³² inputs — fuzzing takes forever |
| Regulatory / safety requirement | "We tested it" isn't certifiable evidence |
| The transformation is algebraic | x*2 ↔ x<<1 — easy to prove, tedious to test |
| Shape | Strategy |
| -------------------------------------- | --------------------------------------------------------------------- |
| Straight-line, no loops | Symbolic execution → discharge old_out == new_out with an SMT solver |
| Loop, both versions have same structure| Show loop bodies equivalent + same termination → induction |
| Loop, restructured (unrolled, fused) | Find a simulation relation between iteration states |
| Recursive | Structural induction on the recursion argument |
| Different algorithms, same spec | Can't prove A≡B directly. Prove A⊨spec ∧ B⊨spec separately. |
assume(pre(x)).assert(old_out != new_out).For loops: replace the loop with its invariant. You need the invariant to be the same for both versions (or for one to imply the other).
Old:
int abs_old(int x) {
if (x < 0) return -x;
return x;
}
New (branchless):
int abs_new(int x) {
int mask = x >> 31;
return (x + mask) ^ mask;
}
Encode (SMT-LIB, bitvector theory):
(declare-const x (_ BitVec 32))
(define-fun old () (_ BitVec 32)
(ite (bvslt x #x00000000) (bvneg x) x))
(define-fun new () (_ BitVec 32)
(let ((mask (bvashr x #x0000001f)))
(bvxor (bvadd x mask) mask)))
(assert (not (= old new)))
(check-sat)
Solve: unsat. Equivalent for all 32-bit inputs. ∎
But wait — what about INT_MIN? -INT_MIN overflows in C (UB). The SMT encoding uses modular arithmetic, so bvneg(INT_MIN) = INT_MIN, and both versions return INT_MIN. Equivalent, including at the UB point — they both return the wrong thing. The proof shows equivalence, not correctness. If you need correctness, add (assume (not (= x INT_MIN))) or prove against the spec result ≥ 0.
| Failure | What it means | Next move | | ------------------------------------------ | -------------------------------------------------- | ------------------------------------------ | | SAT — solver found a counterexample | They're NOT equivalent | Look at the model. Is it a real input, or outside the precondition you forgot to state? | | Unknown / timeout | Solver couldn't decide | Add lemmas; break into smaller pieces; try a different solver | | Can't encode the construct | Heap, I/O, unbounded recursion | Abstract it — model the heap as an uninterpreted function; axiomatize I/O |
behavior-preservation-checker with heavy fuzzingpython-to-dafny-translator, CBMC's C frontend) when possible.## Claim
old: <fragment>
new: <fragment>
Equivalent under precondition: <P(x)>
## Proof strategy
<SMT | induction | simulation>
## Result
<PROVEN EQUIVALENT | COUNTEREXAMPLE: x=<val>, old=<o>, new=<n> | UNKNOWN — <why>>
## Trust base
<what the proof relies on: solver correctness, translation fidelity, axioms assumed>
development
Extracts human-readable pseudocode from a verified formal artifact (Dafny, Lean, TLA+) while preserving the verified properties as annotations, so the proof-carrying logic can be reimplemented in a production language. Use when porting verified code to an unverified target, when documenting what a formal spec actually does, or when handing a verified algorithm to an implementer.
development
Translates natural-language or pseudocode descriptions of concurrent and distributed systems into TLA+ specifications ready for the TLC model checker. Identifies state variables, actions, type invariants, safety properties, and liveness properties from the description. Use when formalizing a protocol, when the user describes a distributed algorithm to verify, when designing a consensus or locking scheme, or when starting formal verification of a concurrent system.
testing
Reduces a TLA+ model so TLC can actually check it — shrinks constants, adds state constraints, abstracts data, or applies symmetry — when the state space is too large to enumerate. Use when TLC runs out of memory, when checking takes hours, or when a spec works at N=2 and you need confidence at larger scale.
development
TLA+-specific instance of model-guided repair — reads a TLC error trace, identifies the enabling condition that should have been false, strengthens the corresponding action, and maps the fix to source code. Use when TLC reports an invariant violation or deadlock and you have the code-to-TLA+ mapping from extraction.