skills/testing/test-oracle-generator/SKILL.md
Generates test oracles — the "expected output" part of a test — by choosing among reference implementations, invariants, inverse functions, or differential comparison when the correct answer isn't obvious. Use when the hard part of testing is knowing what the right answer is, not generating inputs.
`npx skillsauth add santosomar/general-secure-coding-agent-skills test-oracle-generator`
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
An oracle answers: "is this output correct?" For `add(2, 3)`, the oracle is trivial: `== 5`. For `optimize_route(cities)`, what's the right answer? The oracle is the hard part.
| Oracle type | How it works | When available |
| ------------------------ | ----------------------------------------------------- | ------------------------------------------ |
| Known value | Hardcoded expected output | Small inputs you can compute by hand |
| Reference implementation | Compare against a trusted other implementation | A slow/simple version exists |
| Inverse function | decode(encode(x)) == x | There's a round-trip |
| Invariant / property | Output satisfies a predicate | You know properties, not values |
| Metamorphic relation | Multiple runs relate in a known way | → metamorphic-test-generator |
| Differential | N implementations should agree | Multiple implementations exist |
| Regression (golden) | Output matches a saved previous output | You trust the current behavior |
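The differential row has no worked example in this skill, so here is a minimal sketch of one. It assumes three independently written bit-counting functions (`popcount_str`, `popcount_kernighan`, `popcount_shift`) and a `disagreements` helper; all of these names are hypothetical and chosen only for illustration.

```python
import random

def popcount_str(n):
    # Count set bits by counting "1" characters in the binary string.
    return bin(n).count("1")

def popcount_kernighan(n):
    # Clear the lowest set bit until nothing is left.
    count = 0
    while n:
        n &= n - 1
        count += 1
    return count

def popcount_shift(n):
    # Inspect one bit at a time, shifting right.
    count = 0
    while n:
        count += n & 1
        n >>= 1
    return count

def disagreements(impls, inputs):
    """Return (input, outputs) for every input on which the implementations differ."""
    found = []
    for x in inputs:
        results = [f(x) for f in impls]
        if len(set(results)) > 1:
            found.append((x, results))
    return found

# Seeded random inputs plus a few edge cases, so a failure is reproducible.
rng = random.Random(0)
inputs = [rng.randrange(2**64) for _ in range(1000)] + [0, 1, 2**64 - 1]
print(disagreements([popcount_str, popcount_kernighan, popcount_shift], inputs))
```

A differential oracle only reports that the implementations disagree. It does not say which one is wrong, so each reported input still needs a hand check or a comparison against the spec.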
Can you compute the answer by hand for small inputs?
│
├─ YES → Known-value oracle for those inputs.
│ Still need something for random/large inputs — continue ↓
│
Is there a simpler/slower implementation that's obviously correct?
│
├─ YES → Reference implementation oracle.
│ `assert fast(x) == slow_obvious(x)` for random x.
│
Is there an inverse? (encode/decode, compress/decompress, serialize/parse)
│
├─ YES → Round-trip oracle. `assert decode(encode(x)) == x`
│
Do you know properties the output must have, even if not the exact value?
│
├─ YES → Invariant oracle.
│ `result = sort(x); assert is_sorted(result) and is_permutation(result, x)`
│
None of the above?
│
└─ Regression oracle (golden files) as a last resort.
Captures current behavior — not correctness.
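The first branch of the flow (known values for small inputs) can be sketched as a table of hand-computed cases. This is an illustration only: `KNOWN_CASES` and `check_known_values` are hypothetical names, and the median function is passed in so the same table can be reused against any implementation.

```python
import statistics

# Each expected value was worked out by hand, not copied from program output.
KNOWN_CASES = [
    ([5], 5),
    ([3, 1, 2], 2),
    ([4, 1, 3, 2], 2.5),
    ([7, 7, 7, 7], 7),
]

def check_known_values(median_fn):
    """Return (input, expected, actual) for every case that disagrees."""
    failures = []
    for nums, expected in KNOWN_CASES:
        actual = median_fn(list(nums))
        if actual != expected:
            failures.append((nums, expected, actual))
    return failures

# The standard-library median agrees with every hand-computed case.
print(check_known_values(statistics.median))
```

A table like this pins down the edge cases you care about (single element, even length, duplicates), but it says nothing about large or random inputs, which is why the flow continues past this branch.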
Under test: fast_median(nums) — O(n) quickselect-based.
Oracle: The obvious O(n log n) version:
```python
def slow_median(nums):
    s = sorted(nums)
    n = len(s)
    return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2
```

```python
from hypothesis import given, strategies as st

@given(st.lists(st.integers(), min_size=1))
def test_fast_matches_slow(nums):
    assert fast_median(nums) == slow_median(nums)
```
slow_median is three lines and obviously correct. fast_median is 40 lines of partitioning. Any disagreement is a bug in fast_median.
Under test: schedule(tasks, workers) — assigns tasks to workers, minimizing makespan. NP-hard; you can't compute the optimal answer.
What you can check:
```python
from hypothesis import given, strategies as st

# task_lists(), schedule(), makespan(), and round_robin() are assumed to be
# defined elsewhere in the project; assignment maps worker index -> tasks.
@given(tasks=task_lists(), workers=st.integers(1, 10))
def test_schedule_is_valid(tasks, workers):
    assignment = schedule(tasks, workers)

    # Every task assigned exactly once
    assigned = [t for w in assignment.values() for t in w]
    assert sorted(assigned) == sorted(tasks)

    # No worker index out of range
    assert all(0 <= w < workers for w in assignment)

    # Makespan is no worse than the trivial round-robin
    # (not optimal, but if we're worse than round-robin, something's very wrong)
    trivial = makespan(round_robin(tasks, workers))
    assert makespan(assignment) <= trivial
```
Three invariants. None of them is "the answer is X." All of them catch real bugs.
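The helpers used by that test are not shown in this skill, so the sketch below fills them in under stated assumptions: tasks are plain integer durations, an assignment is a dict from worker index to a list of durations, and `schedule` is a hypothetical greedy scheduler (longest task first) that falls back to round-robin when round-robin happens to be better. `violated_invariants` is likewise a made-up name that bundles the three checks.

```python
def makespan(assignment):
    # Finish time of the busiest worker.
    return max((sum(ts) for ts in assignment.values()), default=0)

def round_robin(tasks, workers):
    assignment = {w: [] for w in range(workers)}
    for i, t in enumerate(tasks):
        assignment[i % workers].append(t)
    return assignment

def lpt(tasks, workers):
    # Greedy: give the longest remaining task to the least-loaded worker.
    assignment = {w: [] for w in range(workers)}
    loads = [0] * workers
    for t in sorted(tasks, reverse=True):
        w = loads.index(min(loads))
        assignment[w].append(t)
        loads[w] += t
    return assignment

def schedule(tasks, workers):
    # Keep whichever candidate has the smaller makespan.
    return min([lpt(tasks, workers), round_robin(tasks, workers)], key=makespan)

def violated_invariants(tasks, workers, assignment):
    """Return a description of each invariant the assignment breaks."""
    problems = []
    assigned = [t for ts in assignment.values() for t in ts]
    if sorted(assigned) != sorted(tasks):
        problems.append("tasks not assigned exactly once")
    if not all(0 <= w < workers for w in assignment):
        problems.append("worker index out of range")
    if makespan(assignment) > makespan(round_robin(tasks, workers)):
        problems.append("worse than round-robin")
    return problems

tasks = [2, 3, 2, 3, 2]
print(violated_invariants(tasks, 2, schedule(tasks, 2)))
print(violated_invariants(tasks, 2, lpt(tasks, 2)))
```

On `[2, 3, 2, 3, 2]` with two workers, the plain greedy `lpt` produces a makespan of 7 while round-robin reaches 6, so the third invariant flags it. That is the kind of surprise an invariant oracle can surface even though nobody computed the optimal schedule.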
Under test: serialize(obj) -> bytes and deserialize(bytes) -> obj.
```python
@given(arbitrary_objects())
def test_roundtrip(obj):
    assert deserialize(serialize(obj)) == obj
```
One line. Tests both functions against each other. Catches: field dropped in serialize, wrong type on deserialize, encoding mismatches.
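As a concrete illustration of the "wrong type on deserialize" case, here is a small sketch that assumes `serialize` and `deserialize` are thin wrappers around the standard-library `json` module. That choice is an assumption made for this example; the skill does not say what format the real functions use. `roundtrip_failures` is a hypothetical helper.

```python
import json

def serialize(obj) -> bytes:
    return json.dumps(obj).encode("utf-8")

def deserialize(data: bytes):
    return json.loads(data.decode("utf-8"))

def roundtrip_failures(samples):
    """Return the samples that do not survive serialize followed by deserialize."""
    return [obj for obj in samples if deserialize(serialize(obj)) != obj]

samples = [
    {"name": "a", "tags": ["x", "y"]},
    [1, 2.5, None, True],
    {"point": (1, 2)},  # JSON has no tuple type: comes back as a list
    {1: "one"},         # JSON object keys are strings: comes back as "1"
]
print(roundtrip_failures(samples))
```

Neither failure raises an exception, and nothing here states the expected bytes; the round-trip comparison alone exposes both. A round trip can still miss a bug that both directions share, so it is worth pairing with a few known-value checks on the serialized form.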
When nothing else works: run once, save the output, assert future runs match.
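```python
import os
from pathlib import Path

# Assumption: golden updates are opted into via an environment variable.
UPDATE_GOLDEN = os.environ.get("UPDATE_GOLDEN") == "1"

def test_render_matches_golden():
    output = render(template, data)
    golden_path = Path("test/golden/render.txt")
    if UPDATE_GOLDEN:
        golden_path.write_text(output)
    assert output == golden_path.read_text()
```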
This tests stability, not correctness. The first golden might be wrong. Use only when:

Oracle pitfalls to watch for:

- `assert f(x) == f(x)` tests nothing. Even subtler: reference implementations that share a buggy helper with the fast version.
- `assert len(result) >= 0` is always true, so it is useless.
- `slow_median` above: does it handle the even-length case right? (It does. But check.)

## Under test
<function — and why the oracle is non-trivial>
## Oracle type
<known-value | reference | inverse | invariant | metamorphic | differential | golden>
## Why this oracle
<decision-flow reasoning — what cheaper oracles were unavailable>
## Oracle implementation
<code — the reference impl, the invariant predicates, the round-trip, etc>
## Test
<code — uses the oracle>
## Oracle validity
<why you trust the oracle — it's simpler, it's from a different codebase, the invariant is from the spec>