santosomar/metamorphic-test-generator

Name: metamorphic-test-generator
Author: santosomar

skills/testing/metamorphic-test-generator/SKILL.md

npx skillsauth add santosomar/general-secure-coding-agent-skills metamorphic-test-generator

Metamorphic Test Generator

Normal test: "for input X, output should be Y." But what if you don't know Y? Metamorphic test: "for inputs X and X', the outputs should relate in a known way." You don't need an oracle — you need a metamorphic relation.

The core idea

         f
  X ─────────► Y           You don't know Y.
  │            │
  │ transform  │ relation  But you know: if X' = transform(X),
  ▼            ▼           then f(X') should relate to f(X).
  X' ────────► Y'
         f

If the relation fails, either f is wrong or the relation is. Provided the relation really follows from the spec, you found a bug without knowing the right answer.
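The diagram can be sketched as a small helper. This is a minimal illustration, not a prescribed harness: the names `check_relation`, `reflect`, and `same_value` are made up for the example, and the relation used is the trigonometric identity sin(π − x) = sin(x).

```python
import math
import random

def check_relation(f, transform, relation, x):
    """Run f on x and on transform(x), then check that the outputs relate as expected."""
    y = f(x)
    y_prime = f(transform(x))
    return relation(y, y_prime)

def reflect(x):
    # Transform: x -> pi - x
    return math.pi - x

def same_value(y, y_prime):
    # Relation: outputs should be equal, up to floating-point tolerance
    return math.isclose(y, y_prime, abs_tol=1e-9)

rng = random.Random(0)
inputs = [rng.uniform(-10.0, 10.0) for _ in range(1000)]

# No expected value of sin(x) is ever written down; only the relation is checked.
all_hold = all(check_relation(math.sin, reflect, same_value, x) for x in inputs)
print(all_hold)
```

Swapping `math.sin` for `math.cos` makes the same check fail, since cos(π − x) = −cos(x), which is how a wrong implementation would show up.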

Metamorphic relation catalog

| Domain | Relation | Transform → expected output change |
| ------ | -------- | ---------------------------------- |
| Sorting | Permutation invariance | shuffle(X) → same output |
| Search | Adding irrelevant docs doesn't change top-k for a fixed query | X ∪ {irrelevant} → same top-k |
| Search | Subset monotonicity | results(A∪B) ⊇ results(A) for query q |
| Numerical | Scaling | f(k·X) = k·f(X) (if f is linear) |
| Numerical | Additivity | f(X+Y) = f(X) + f(Y) (if linear) |
| Pathfinding | Adding an edge can't make shortest path longer | G + edge → dist ≤ old dist |
| Compression | Round-trip | decompress(compress(X)) = X |
| Parsing | Pretty-print invariance | parse(print(parse(X))) = parse(X) |
| ML classifier | Semantic-preserving input → same class | image + tiny noise → same prediction |
| Aggregation (sum) | Partition invariance | sum(A∪B) = sum(A) + sum(B) (disjoint) |
| Caching | Idempotence | f(X); f(X) → same result, second is fast |
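Three of the catalog rows can be checked with nothing but the standard library. The sketch below is illustrative only: the function names are hypothetical, the systems under test are Python's built-in `sorted`, `zlib`, and `sum`, and random inputs stand in for a property-testing framework.

```python
import random
import zlib

rng = random.Random(42)

def permutation_invariance(xs):
    # Sorting: shuffle(X) -> same output
    shuffled = xs[:]
    rng.shuffle(shuffled)
    return sorted(shuffled) == sorted(xs)

def round_trip(data):
    # Compression: decompress(compress(X)) = X
    return zlib.decompress(zlib.compress(data)) == data

def partition_invariance(a, b):
    # Aggregation: sum over the whole equals the sum of the sums of the parts.
    # Integers keep the equality exact; floats would need a tolerance.
    return sum(a + b) == sum(a) + sum(b)

for _ in range(200):
    xs = [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 50))]
    split = rng.randint(0, len(xs))
    assert permutation_invariance(xs)
    assert round_trip(bytes(x % 256 for x in xs))
    assert partition_invariance(xs[:split], xs[split:])
```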

Finding relations for your function

Ask: what transformation of the input has a predictable effect on the output?

Does adding something irrelevant change nothing? (search, filter, max)
Does reordering change nothing? (set operations, commutative aggregates)
Does scaling input scale output? (linear functions)
Is there a round-trip? (encode/decode, serialize/deserialize)
Is there a known inverse? (f(g(x)) = x)
Is there monotonicity? (more input → more-or-equal output)
Is the operation idempotent? (f(f(x)) = f(x))

Each "yes" is a metamorphic relation.
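One way to work through this checklist is to probe candidate relations on random inputs before committing to them. The sketch below does this for Python's built-in `max`; `holds_on_samples` and the candidate names are hypothetical. Passing on samples only suggests that a relation holds, so it still has to be justified from the spec.

```python
import random

rng = random.Random(7)

def holds_on_samples(relation, trials=500):
    """Return True if the candidate relation held on every random sample."""
    for _ in range(trials):
        xs = [rng.randint(-100, 100) for _ in range(rng.randint(1, 20))]
        if not relation(xs):
            return False
    return True

candidates = {
    # Adding something irrelevant (an element no larger than the max) changes nothing
    "irrelevant addition": lambda xs: max(xs + [min(xs)]) == max(xs),
    # Reordering changes nothing
    "reordering": lambda xs: max(list(reversed(xs))) == max(xs),
    # Monotonicity: more input -> more-or-equal output
    "monotonicity": lambda xs: max(xs + [rng.randint(-100, 100)]) >= max(xs),
    # Negating the input does not negate max (it yields -min), so this one should be rejected
    "negation scales": lambda xs: max([-x for x in xs]) == -max(xs),
}

results = {name: holds_on_samples(rel) for name, rel in candidates.items()}
print(results)
```

The rejected candidate is as useful as the accepted ones: it shows a plausible-sounding relation that would have produced false alarms if encoded as a test.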

Worked example — testing a search engine without knowing right answers

System: search(query, corpus) -> list[doc] ranked by relevance. You don't know what's "correct" — relevance is subjective.

Relations:

from hypothesis import assume, given, strategies as st

# Assumed to exist in your test suite (none of these come from Hypothesis):
#   search(query, corpus)             the system under test
#   queries(), corpora(), terms()     Hypothesis strategies for your domain
#   shuffled(corpus)                  returns a permuted copy of the corpus
#   make_doc_with_no_overlap(query)   builds a doc sharing no terms with the query

# MR1: Permuting the corpus doesn't change results (order-independence of index).
# Requires tie-breaking between equally scored docs to be independent of corpus order.
@given(query=queries(), corpus=corpora())
def test_mr_corpus_order_invariant(query, corpus):
    r1 = search(query, corpus)
    r2 = search(query, shuffled(corpus))
    assert r1 == r2

# MR2: Adding a document irrelevant to the query doesn't change top-k.
# Assumes non-matching docs are either not returned or rank below every match.
@given(query=queries(), corpus=corpora(), k=st.integers(1, 10))
def test_mr_irrelevant_addition(query, corpus, k):
    r1 = search(query, corpus)[:k]
    irrelevant = make_doc_with_no_overlap(query)   # no shared terms
    r2 = search(query, corpus + [irrelevant])[:k]
    assert r1 == r2

# MR3: Duplicating a top result keeps it in top results
@given(query=queries(), corpus=corpora())
def test_mr_duplicate_stays_top(query, corpus):
    r1 = search(query, corpus)
    assume(r1)   # discard cases with no results instead of passing vacuously
    top = r1[0]
    r2 = search(query, corpus + [top])   # add another copy
    assert top in r2[:2]   # original or copy should be in top-2

# MR4: Query with more terms → results are a subset (conjunctive search)
@given(base_query=queries(), extra_term=terms(), corpus=corpora())
def test_mr_conjunction_narrows(base_query, extra_term, corpus):
    r_broad = set(search(base_query, corpus))
    r_narrow = set(search(base_query + " " + extra_term, corpus))
    assert r_narrow <= r_broad

MR4 assumes conjunctive (AND) semantics. If search is disjunctive (OR), the relation flips. The relation encodes a spec claim — if it fails, either the relation is wrong (you misunderstood the spec) or the code is wrong.
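To see why MR4 depends on the query semantics, here is a toy sketch with two hypothetical search functions, `search_and` and `search_or`, over a corpus of plain strings. Neither models a real engine; they only illustrate that narrowing holds under AND while the opposite inclusion holds under OR.

```python
import random

def tokens(text):
    return set(text.lower().split())

def search_and(query, corpus):
    # Conjunctive: a document matches only if it contains every query term
    q = tokens(query)
    return [doc for doc in corpus if q <= tokens(doc)]

def search_or(query, corpus):
    # Disjunctive: a document matches if it contains any query term
    q = tokens(query)
    return [doc for doc in corpus if q & tokens(doc)]

VOCAB = ["red", "green", "blue", "cat", "dog", "bird"]
rng = random.Random(1)

def random_doc():
    return " ".join(rng.sample(VOCAB, rng.randint(1, 4)))

narrows_under_and = True
widens_under_or = True
narrows_under_or = True
for _ in range(300):
    corpus = [random_doc() for _ in range(10)]
    base, extra = rng.sample(VOCAB, 2)
    longer = base + " " + extra
    narrows_under_and &= set(search_and(longer, corpus)) <= set(search_and(base, corpus))
    widens_under_or &= set(search_or(longer, corpus)) >= set(search_or(base, corpus))
    narrows_under_or &= set(search_or(longer, corpus)) <= set(search_or(base, corpus))

print(narrows_under_and, widens_under_or, narrows_under_or)
```

Applying the AND relation to the OR engine fails even though the engine is behaving as designed, which is the "relation is wrong" outcome described above.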

Worked example — numerical code

Function: std_dev(samples: list[float]) -> float. You could compute the expected value by hand for each test input. Or:

import math
from hypothesis import given, strategies as st

# Bounded floats: unbounded ones can overflow in s + c and lose all precision,
# which fails the test even for a correct std_dev.
bounded = st.floats(min_value=-1e6, max_value=1e6)

# MR: std dev is shift-invariant: adding a constant doesn't change spread
@given(st.lists(bounded, min_size=2), bounded)
def test_mr_shift_invariant(samples, c):
    shifted = [s + c for s in samples]
    assert math.isclose(std_dev(samples), std_dev(shifted), rel_tol=1e-9, abs_tol=1e-6)

# MR: std dev scales linearly with the data (k > 0)
@given(st.lists(st.floats(1, 100), min_size=2), st.floats(0.1, 10))
def test_mr_scale(samples, k):
    scaled = [s * k for s in samples]
    assert math.isclose(std_dev(scaled), k * std_dev(samples), rel_tol=1e-9, abs_tol=1e-9)

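These test the algebra of std dev, and each relation catches a different class of bug. Forgetting the square root (returning the variance) breaks the scale relation, because variance scales with k², not k. Forgetting to subtract the mean breaks shift invariance. Dividing by N instead of N-1 breaks neither: the population formula is also shift-invariant and scales linearly, so that bug still needs an oracle or a reference implementation.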
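One way to gauge what these two relations can and cannot detect is to run them against deliberately broken implementations. The sketch below is an illustration under stated assumptions: it uses `statistics.stdev` as a stand-in for a correct `std_dev`, two hypothetical mutants, and fixed random samples instead of Hypothesis so that it has no dependencies.

```python
import math
import random
import statistics

def no_sqrt(samples):
    # Mutant: forgot the square root, so this returns the variance
    return statistics.variance(samples)

def divide_by_n(samples):
    # Mutant: divides by N instead of N-1 (population standard deviation)
    return statistics.pstdev(samples)

def shift_holds(std_dev, samples, c):
    shifted = [s + c for s in samples]
    return math.isclose(std_dev(samples), std_dev(shifted), rel_tol=1e-9, abs_tol=1e-9)

def scale_holds(std_dev, samples, k):
    scaled = [s * k for s in samples]
    return math.isclose(std_dev(scaled), k * std_dev(samples), rel_tol=1e-9, abs_tol=1e-9)

rng = random.Random(3)
# Each case is (samples, constant); the constant doubles as shift amount and scale factor
cases = [([rng.uniform(1, 100) for _ in range(5)], rng.uniform(2, 10)) for _ in range(200)]

def survives(std_dev):
    """True if the implementation passes both relations on every case."""
    return all(shift_holds(std_dev, xs, k) and scale_holds(std_dev, xs, k) for xs, k in cases)

print(survives(statistics.stdev), survives(no_sqrt), survives(divide_by_n))
```

The missing-square-root mutant fails the scale relation, while the divide-by-N mutant passes both, since the population formula has the same shift and scale behavior as the sample formula.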

Do not

Do not use a relation you're not sure holds. test_mr_conjunction_narrows above is wrong for OR-search. Verify the relation against the spec before encoding it.
Do not make the relation so weak it never fails. assert len(r1) >= 0 is always true. The relation has to be tight enough to catch bugs.
Do not forget floating-point tolerance in numerical MRs. == on floats → flaky test. Use abs(a - b) < ε for values near a known scale, or math.isclose with a relative tolerance when magnitudes vary.
Do not skip metamorphic testing because you can write oracles. MRs find different bugs — they test algebraic properties, which unit tests with fixed I/O pairs don't.
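The floating-point rule can be seen with a few lines of standard-library Python. This is a small demonstration rather than a recommended tolerance policy; the specific tolerances are arbitrary choices for the example.

```python
import math

a = 0.1 + 0.2
b = 0.3

exact_equal = (a == b)                        # exact comparison on floats
close_abs = abs(a - b) < 1e-9                 # absolute tolerance
close_rel = math.isclose(a, b, rel_tol=1e-9)  # relative tolerance

# The same sum, associated two ways, at a magnitude where adjacent floats are 2 apart
big_a = (1e16 + 1.0) + 1.0
big_b = 1e16 + (1.0 + 1.0)
big_close_abs = abs(big_a - big_b) < 1e-9     # absolute tolerance is too strict here
big_close_rel = math.isclose(big_a, big_b, rel_tol=1e-9)

print(exact_equal, close_abs, close_rel, big_close_abs, big_close_rel)
```

An absolute tolerance works when the expected magnitude is known; a relative tolerance is the safer default when a property test generates values across many orders of magnitude.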

Output format

## System under test
<function — and why an oracle is hard>

## Metamorphic relations
| # | Relation | Transform | Expected output relation | Spec basis |
| - | -------- | --------- | ------------------------ | ---------- |

## Tests
### MR-<N>: <name>
<code — property-test style, Hypothesis/QuickCheck>
Bug classes this catches: <what would violate this relation>

## Relation validity
<for each MR: why you believe this relation holds — cite the spec or the math>

Generates metamorphic tests — tests that check relationships between multiple runs instead of checking exact outputs, useful when the correct output is unknown or expensive to compute. Use when there's no oracle, when testing ML/numerical/search code, or when the spec describes properties rather than values.

development

Updated Apr 13, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add santosomar/general-secure-coding-agent-skills metamorphic-test-generator

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Percentage shown |
| ------- | ----------- | ------ | ---------------- |
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Apr 13, 2026, 4:36 AM · 60.4s · 1 file scanned

SKILL.md

name: metamorphic-test-generator
description: Generates metamorphic tests, which check relationships between multiple runs instead of checking exact outputs; useful when the correct output is unknown or expensive to compute. Use when there's no oracle, when testing ML/numerical/search code, or when the spec describes properties rather than values.
license: Apache-2.0
category: testing
suite: general-secure-coding-agent-skills
version: 0.3.0
related: test-oracle-generator

Related Skills

santosomar/verified-pseudocode-extractor
Category: development · Updated Apr 13, 2026
Extracts human-readable pseudocode from a verified formal artifact (Dafny, Lean, TLA+) while preserving the verified properties as annotations, so the proof-carrying logic can be reimplemented in a production language. Use when porting verified code to an unverified target, when documenting what a formal spec actually does, or when handing a verified algorithm to an implementer.

santosomar/tlaplus-spec-generator
Category: development · Updated Apr 13, 2026
Translates natural-language or pseudocode descriptions of concurrent and distributed systems into TLA+ specifications ready for the TLC model checker. Identifies state variables, actions, type invariants, safety properties, and liveness properties from the description. Use when formalizing a protocol, when the user describes a distributed algorithm to verify, when designing a consensus or locking scheme, or when starting formal verification of a concurrent system.

santosomar/tlaplus-model-reduction
Category: testing · Updated Apr 13, 2026
Reduces a TLA+ model so TLC can actually check it (shrinks constants, adds state constraints, abstracts data, or applies symmetry) when the state space is too large to enumerate. Use when TLC runs out of memory, when checking takes hours, or when a spec works at N=2 and you need confidence at larger scale.

santosomar/tlaplus-guided-code-repair
Category: development · Updated Apr 13, 2026
TLA+-specific instance of model-guided repair: reads a TLC error trace, identifies the enabling condition that should have been false, strengthens the corresponding action, and maps the fix to source code. Use when TLC reports an invariant violation or deadlock and you have the code-to-TLA+ mapping from extraction.

Install manually

# Clone the repo
git clone https://github.com/santosomar/general-secure-coding-agent-skills.git

# Copy into Claude Code skills folder (global)
cp -r general-secure-coding-agent-skills/skills/testing/metamorphic-test-generator ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

santosomar/general-secure-coding-agent-skills

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT