Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

santosomar/test-oracle-generator

Name: test-oracle-generator
Author: santosomar

skills/testing/test-oracle-generator/SKILL.md

npx skillsauth add santosomar/general-secure-coding-agent-skills test-oracle-generator

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

Test Oracle Generator

An oracle answers: "is this output correct?" For add(2, 3), the oracle is trivial: == 5. For optimize_route(cities), what's the right answer? The oracle is the hard part.

Oracle types — pick the cheapest that works

| Oracle type | How it works | When available | | ------------------------ | ----------------------------------------------------- | ------------------------------------------ | | Known value | Hardcoded expected output | Small inputs you can compute by hand | | Reference implementation | Compare against a trusted other implementation | A slow/simple version exists | | Inverse function | decode(encode(x)) == x | There's a round-trip | | Invariant / property | Output satisfies a predicate | You know properties, not values | | Metamorphic relation | Multiple runs relate in a known way | → metamorphic-test-generator | | Differential | N implementations should agree | Multiple implementations exist | | Regression (golden) | Output matches a saved previous output | You trust the current behavior |

Decision flow

Can you compute the answer by hand for small inputs?
│
├─ YES → Known-value oracle for those inputs.
│        Still need something for random/large inputs — continue ↓
│
Is there a simpler/slower implementation that's obviously correct?
│
├─ YES → Reference implementation oracle.
│        `assert fast(x) == slow_obvious(x)` for random x.
│
Is there an inverse?  (encode/decode, compress/decompress, serialize/parse)
│
├─ YES → Round-trip oracle.  `assert decode(encode(x)) == x`
│
Do you know properties the output must have, even if not the exact value?
│
├─ YES → Invariant oracle.
│        `result = sort(x); assert is_sorted(result) and is_permutation(result, x)`
│
None of the above?
│
└─ Regression oracle (golden files) as a last resort.
   Captures current behavior — not correctness.

Worked example — reference implementation

Under test: fast_median(nums) — O(n) quickselect-based.

Oracle: The obvious O(n log n) version:

def slow_median(nums):
    s = sorted(nums)
    n = len(s)
    return s[n // 2] if n % 2 else (s[n//2 - 1] + s[n//2]) / 2

from hypothesis import given, strategies as st

@given(st.lists(st.integers(), min_size=1))
def test_fast_matches_slow(nums):
    assert fast_median(nums) == slow_median(nums)

slow_median is three lines and obviously correct. fast_median is 40 lines of partitioning. Any disagreement is a bug in fast_median.

Worked example — invariant oracle

Under test: schedule(tasks, workers) — assigns tasks to workers, minimizing makespan. NP-hard; you can't compute the optimal answer.

What you can check:

@given(tasks=task_lists(), workers=st.integers(1, 10))
def test_schedule_is_valid(tasks, workers):
    assignment = schedule(tasks, workers)

    # Every task assigned exactly once
    assigned = [t for w in assignment.values() for t in w]
    assert sorted(assigned) == sorted(tasks)

    # No worker index out of range
    assert all(0 <= w < workers for w in assignment)

    # Makespan is no worse than the trivial round-robin
    # (not optimal — but if we're worse than round-robin, something's very wrong)
    trivial = makespan(round_robin(tasks, workers))
    assert makespan(assignment) <= trivial

Three invariants. None of them is "the answer is X." All of them catch real bugs.

Worked example — inverse

Under test: serialize(obj) -> bytes and deserialize(bytes) -> obj.

@given(arbitrary_objects())
def test_roundtrip(obj):
    assert deserialize(serialize(obj)) == obj

One line. Tests both functions against each other. Catches: field dropped in serialize, wrong type on deserialize, encoding mismatches.

Regression (golden) — the weakest oracle

When nothing else works: run once, save the output, assert future runs match.

def test_render_matches_golden():
    output = render(template, data)
    golden_path = Path("test/golden/render.txt")
    if UPDATE_GOLDEN:
        golden_path.write_text(output)
    assert output == golden_path.read_text()

This tests stability, not correctness. The first golden might be wrong. Use only when:

You've manually verified the golden is correct, once.
The function is mostly-frozen (change = deliberate).
Better oracles are genuinely unavailable.

Do not

Do not use the code-under-test as its own oracle. assert f(x) == f(x) tests nothing. Even subtler: reference implementations that share a buggy helper with the fast version.
Do not default to golden files. They test "same as yesterday," not "correct." A bug that was always there stays there.
Do not write invariants so weak they never fail. assert len(result) >= 0 — always true, useless.
Do not forget that reference implementations can be wrong too. slow_median above — does it handle the even-length case right? (It does. But check.)

Output format

## Under test
<function — and why the oracle is non-trivial>

## Oracle type
<known-value | reference | inverse | invariant | metamorphic | differential | golden>

## Why this oracle
<decision-flow reasoning — what cheaper oracles were unavailable>

## Oracle implementation
<code — the reference impl, the invariant predicates, the round-trip, etc>

## Test
<code — uses the oracle>

## Oracle validity
<why you trust the oracle — it's simpler, it's from a different codebase, the invariant is from the spec>

santosomar/test-oracle-generator

skills/testing/test-oracle-generator/SKILL.md

Generates test oracles — the "expected output" part of a test — by choosing among reference implementations, invariants, inverse functions, or differential comparison when the correct answer isn't obvious. Use when the hard part of testing is knowing what the right answer is, not generating inputs.

testing

Updated Apr 13, 2026

$ install --global

skillsauth

npx skillsauth add santosomar/general-secure-coding-agent-skills test-oracle-generator

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Apr 13, 2026, 4:39 AM77.1s1 file scanned

SKILL.md

name:: test-oracle-generator
description:: Generates test oracles — the "expected output" part of a test — by choosing among reference implementations, invariants, inverse functions, or differential comparison when the correct answer isn't obvious. Use when the hard part of testing is knowing what the right answer is, not generating inputs.
license:: Apache-2.0
category:: testing
suite:: general-secure-coding-agent-skills
version:: 0.3.0
related:: unit-test-generator, metamorphic-test-generator

Test Oracle Generator

An oracle answers: "is this output correct?" For add(2, 3), the oracle is trivial: == 5. For optimize_route(cities), what's the right answer? The oracle is the hard part.

Oracle types — pick the cheapest that works

Decision flow

Can you compute the answer by hand for small inputs?
│
├─ YES → Known-value oracle for those inputs.
│        Still need something for random/large inputs — continue ↓
│
Is there a simpler/slower implementation that's obviously correct?
│
├─ YES → Reference implementation oracle.
│        `assert fast(x) == slow_obvious(x)` for random x.
│
Is there an inverse?  (encode/decode, compress/decompress, serialize/parse)
│
├─ YES → Round-trip oracle.  `assert decode(encode(x)) == x`
│
Do you know properties the output must have, even if not the exact value?
│
├─ YES → Invariant oracle.
│        `result = sort(x); assert is_sorted(result) and is_permutation(result, x)`
│
None of the above?
│
└─ Regression oracle (golden files) as a last resort.
   Captures current behavior — not correctness.

Worked example — reference implementation

Under test: fast_median(nums) — O(n) quickselect-based.

Oracle: The obvious O(n log n) version:

def slow_median(nums):
    s = sorted(nums)
    n = len(s)
    return s[n // 2] if n % 2 else (s[n//2 - 1] + s[n//2]) / 2

from hypothesis import given, strategies as st

@given(st.lists(st.integers(), min_size=1))
def test_fast_matches_slow(nums):
    assert fast_median(nums) == slow_median(nums)

slow_median is three lines and obviously correct. fast_median is 40 lines of partitioning. Any disagreement is a bug in fast_median.

Worked example — invariant oracle

Under test: schedule(tasks, workers) — assigns tasks to workers, minimizing makespan. NP-hard; you can't compute the optimal answer.

What you can check:

@given(tasks=task_lists(), workers=st.integers(1, 10))
def test_schedule_is_valid(tasks, workers):
    assignment = schedule(tasks, workers)

    # Every task assigned exactly once
    assigned = [t for w in assignment.values() for t in w]
    assert sorted(assigned) == sorted(tasks)

    # No worker index out of range
    assert all(0 <= w < workers for w in assignment)

    # Makespan is no worse than the trivial round-robin
    # (not optimal — but if we're worse than round-robin, something's very wrong)
    trivial = makespan(round_robin(tasks, workers))
    assert makespan(assignment) <= trivial

Three invariants. None of them is "the answer is X." All of them catch real bugs.

Worked example — inverse

Under test: serialize(obj) -> bytes and deserialize(bytes) -> obj.

@given(arbitrary_objects())
def test_roundtrip(obj):
    assert deserialize(serialize(obj)) == obj

One line. Tests both functions against each other. Catches: field dropped in serialize, wrong type on deserialize, encoding mismatches.

Regression (golden) — the weakest oracle

When nothing else works: run once, save the output, assert future runs match.

def test_render_matches_golden():
    output = render(template, data)
    golden_path = Path("test/golden/render.txt")
    if UPDATE_GOLDEN:
        golden_path.write_text(output)
    assert output == golden_path.read_text()

This tests stability, not correctness. The first golden might be wrong. Use only when:

You've manually verified the golden is correct, once.
The function is mostly-frozen (change = deliberate).
Better oracles are genuinely unavailable.

Do not

Do not use the code-under-test as its own oracle. assert f(x) == f(x) tests nothing. Even subtler: reference implementations that share a buggy helper with the fast version.
Do not default to golden files. They test "same as yesterday," not "correct." A bug that was always there stays there.
Do not write invariants so weak they never fail. assert len(result) >= 0 — always true, useless.
Do not forget that reference implementations can be wrong too. slow_median above — does it handle the even-length case right? (It does. But check.)

Output format

## Under test
<function — and why the oracle is non-trivial>

## Oracle type
<known-value | reference | inverse | invariant | metamorphic | differential | golden>

## Why this oracle
<decision-flow reasoning — what cheaper oracles were unavailable>

## Oracle implementation
<code — the reference impl, the invariant predicates, the round-trip, etc>

## Test
<code — uses the oracle>

## Oracle validity
<why you trust the oracle — it's simpler, it's from a different codebase, the invariant is from the spec>

Related Skills

santosomar/verified-pseudocode-extractor

development

VerifiedTrustedCommunity

Extracts human-readable pseudocode from a verified formal artifact (Dafny, Lean, TLA+) while preserving the verified properties as annotations, so the proof-carrying logic can be reimplemented in a production language. Use when porting verified code to an unverified target, when documenting what a formal spec actually does, or when handing a verified algorithm to an implementer.

SKILL.mdUpdated Apr 13, 2026

santosomar/verified-pseudocode-extractor

santosomar/tlaplus-spec-generator

development

VerifiedTrustedCommunity

Translates natural-language or pseudocode descriptions of concurrent and distributed systems into TLA+ specifications ready for the TLC model checker. Identifies state variables, actions, type invariants, safety properties, and liveness properties from the description. Use when formalizing a protocol, when the user describes a distributed algorithm to verify, when designing a consensus or locking scheme, or when starting formal verification of a concurrent system.

SKILL.mdUpdated Apr 13, 2026

santosomar/tlaplus-spec-generator

santosomar/tlaplus-model-reduction

testing

VerifiedTrustedCommunity

Reduces a TLA+ model so TLC can actually check it — shrinks constants, adds state constraints, abstracts data, or applies symmetry — when the state space is too large to enumerate. Use when TLC runs out of memory, when checking takes hours, or when a spec works at N=2 and you need confidence at larger scale.

SKILL.mdUpdated Apr 13, 2026

santosomar/tlaplus-model-reduction

santosomar/tlaplus-guided-code-repair

development

VerifiedTrustedCommunity

TLA+-specific instance of model-guided repair — reads a TLC error trace, identifies the enabling condition that should have been false, strengthens the corresponding action, and maps the fix to source code. Use when TLC reports an invariant violation or deadlock and you have the code-to-TLA+ mapping from extraction.

SKILL.mdUpdated Apr 13, 2026

santosomar/tlaplus-guided-code-repair

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/santosomar/general-secure-coding-agent-skills.git

# Copy into Claude Code skills folder (global)
cp -r general-secure-coding-agent-skills/skills/testing/test-oracle-generator ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

santosomar/general-secure-coding-agent-skills

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT