santosomar/test-deduplicator

Name: test-deduplicator
Author: santosomar

skills/testing/test-deduplicator/SKILL.md

npx skillsauth add santosomar/general-secure-coding-agent-skills test-deduplicator

Test Deduplicator

Ten tests cover the same branch. Nine of them can go. But which nine? The one you keep is the one that covers something the others don't.

What makes two tests "the same"

| Sameness level | Evidence | Deletion confidence |
| ----------------------------- | ------------------------------------------------------------ | ------------------- |
| Textual clone | Near-identical code, one value different | Low: might test different boundaries |
| Same coverage | Per-test coverage sets are equal | Medium |
| Same coverage, subsumes | Test A's coverage ⊇ Test B's | High: B is redundant |
| Same mutants killed | Mutation testing: A and B kill identical mutant sets | High |
| Same assertions, same inputs | Literally the same test, different name | Very high: copy-paste accident |

Coverage subsumption is the key signal. If test A covers everything test B covers and more, B adds nothing to coverage, though it may still assert something A does not (see Step 4).

Step 1 — Build per-test coverage

# Python: record which test executed each line (one dynamic context per test, not separate runs)
pytest --cov=src --cov-context=test --cov-report=
# --show-contexts is needed for the JSON report to include per-context data
coverage json --show-contexts -o coverage.json
# coverage.json now has per-context (per-test) line data

# Java: JaCoCo per-test is harder; use test-impact tools or:
# Run suite once, instrument to log (test_name, covered_line) pairs

Output: {test_name: set(covered_lines)}
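One way to get from the JSON report to that mapping is sketched below. It assumes each file entry in the report carries a `contexts` object that maps line numbers to lists of context names, and that pytest-cov names each context `<test id>|<phase>`; check both against your coverage.py and pytest-cov versions. `per_test_coverage` and the sample data are hypothetical.

```python
import json
from collections import defaultdict


def per_test_coverage(report):
    """Invert a coverage.py JSON report into {test_name: {(file, line), ...}}."""
    coverage = defaultdict(set)
    for path, file_data in report.get("files", {}).items():
        # Assumed shape: {"contexts": {"10": ["tests/test_x.py::test_a|run"], ...}}
        for line, contexts in file_data.get("contexts", {}).items():
            for context in contexts:
                if not context:
                    continue  # empty context: line ran outside any test
                # Drop the assumed "|setup", "|run", "|teardown" phase suffix
                test_name = context.split("|", 1)[0]
                coverage[test_name].add((path, int(line)))
    return dict(coverage)


# Hand-written sample in the assumed report shape
SAMPLE_JSON = """
{
  "files": {
    "src/parse.py": {
      "contexts": {
        "10": ["", "tests/test_parse.py::test_parse_int|run"],
        "11": ["tests/test_parse.py::test_parse_int|run",
               "tests/test_parse.py::test_parse_neg|run"],
        "12": ["tests/test_parse.py::test_parse_neg|run"]
      }
    }
  }
}
"""

coverage = per_test_coverage(json.loads(SAMPLE_JSON))
```

For a real run, replace `json.loads(SAMPLE_JSON)` with `json.load(open("coverage.json"))`. Keying lines by `(file, line)` keeps line 10 of one file distinct from line 10 of another.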

Step 2 — Find subsumption

# coverage: {test_name: set(covered_lines)} from Step 1
# A subsumes B if coverage[A] >= coverage[B]  (superset)
subsumed = []
# Largest first, so each test is only checked against tests no bigger than itself
tests = sorted(coverage, key=lambda t: len(coverage[t]), reverse=True)
for i, a in enumerate(tests):
    for b in tests[i+1:]:
        if coverage[a] >= coverage[b]:   # a's coverage is a superset
            subsumed.append((b, a))      # b is subsumed by a

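This is O(n²) set comparisons, which is fine for a few thousand tests. For tens of thousands, prune the candidate pairs first: a test can only be subsumed by a test that covers every one of its lines, so compare each test only against the tests that cover one of its lines.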
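For larger suites, one possible pruning strategy is an inverted index from covered line to tests. Any test that subsumes B must cover B's rarest line, so only the tests listed under that line need a full set comparison. This is a sketch under that assumption, not a tuned implementation; `find_subsumed` and the sample data are hypothetical.

```python
from collections import defaultdict


def find_subsumed(coverage):
    """Return sorted (subsumed, subsumer) pairs using a line -> tests index."""
    line_to_tests = defaultdict(set)
    for test, lines in coverage.items():
        for line in lines:
            line_to_tests[line].add(test)

    pairs = []
    for b, b_lines in coverage.items():
        if not b_lines:
            continue  # a test with no recorded coverage is not compared
        # Every superset of b_lines must contain b's least-shared line
        rarest = min(b_lines, key=lambda ln: len(line_to_tests[ln]))
        for a in line_to_tests[rarest]:
            if a == b:
                continue
            a_lines = coverage[a]
            if a_lines == b_lines and a > b:
                continue  # equal sets: report once, smaller name as subsumer
            if a_lines >= b_lines:
                pairs.append((b, a))
    return sorted(pairs)


example = {
    "t_big": {1, 2, 3, 4},
    "t_small": {1, 2},
    "t_dup": {1, 2},
    "t_other": {5},
}
pairs = find_subsumed(example)
```

The worst case is still quadratic when every test covers the same shared setup line, but picking the rarest line keeps the candidate list short for most suites.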

Step 3 — Greedy minimum cover

Subsumption alone misses partial overlap. For a near-minimum set of tests with the same total coverage, use greedy set cover:

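# coverage: {test_name: set(covered_lines)} from Step 1
# Greedy set cover: repeatedly keep the test that adds the most uncovered lines
total = set().union(*coverage.values())
covered = set()
keep = []
remaining = set(coverage)

while covered != total:
    # Largest marginal gain wins; sorting first breaks ties by name
    best = max(sorted(remaining), key=lambda t: len(coverage[t] - covered))
    keep.append(best)
    covered |= coverage[best]
    remaining.remove(best)

delete_candidates = set(coverage) - set(keep)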

Greedy isn't optimal (set cover is NP-hard), but it keeps at most roughly ln(n) times as many tests as the optimal solution, where n is the number of covered lines, and it runs in polynomial time.

Step 4 — Don't just delete — verify

Coverage equivalence ≠ behavioral equivalence. Two tests can cover the same lines but assert different things:

def test_parse_int():
    assert parse("42") == 42         # covers lines 10-15

def test_parse_int_leading_zero():
    assert parse("042") == 42        # covers lines 10-15 — same coverage!

Same coverage. Different inputs. The second might catch a bug the first doesn't (octal interpretation, anyone?).

Before deleting, check mutation-kill equivalence: run mutation testing (→ mutation-test-suite-optimizer) on the delete candidates. If a candidate kills a mutant no keeper kills, it's not redundant.
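The check described above can be sketched as a set difference. The sketch assumes the mutation tool can report which mutants each test kills as `{test_name: set(mutant_ids)}`; that input format, the mutant ids, and the name `unique_kills` are hypothetical.

```python
def unique_kills(kills, keep, candidates):
    """Map each delete candidate to the mutants it kills that no kept test kills."""
    killed_by_keepers = set()
    for test in keep:
        killed_by_keepers |= kills.get(test, set())
    return {c: kills.get(c, set()) - killed_by_keepers for c in candidates}


# Hypothetical kill matrix: mutant ids are placeholders
kills = {
    "test_parse_int": {"m1", "m2"},
    "test_parse_int_leading_zero": {"m1", "m3"},
    "test_parse_int_again": {"m2"},
}
keep = ["test_parse_int"]
candidates = ["test_parse_int_leading_zero", "test_parse_int_again"]

result = unique_kills(kills, keep, candidates)
# Only candidates with no unique kills remain safe to delete
safe_to_delete = sorted(c for c, extra in result.items() if not extra)
```

Here `test_parse_int_leading_zero` kills `m3`, which no keeper kills, so it moves back to the keep list; `test_parse_int_again` adds nothing and stays a delete candidate. Comparing against keepers only is deliberately conservative: two candidates that kill the same otherwise-unkilled mutant are both flagged, and one of them has to be kept.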

Parametrize instead of delete

Often the "duplicates" are value variations:

def test_discount_gold():    assert discount(100, "gold") == 80
def test_discount_silver():  assert discount(100, "silver") == 90
def test_discount_bronze():  assert discount(100, "bronze") == 95
def test_discount_none():    assert discount(100, "none") == 100

Four tests, near-identical structure. Don't delete three — parametrize:

@pytest.mark.parametrize("tier,expected", [
    ("gold", 80), ("silver", 90), ("bronze", 95), ("none", 100),
])
def test_discount(tier, expected):
    assert discount(100, tier) == expected

Same coverage. Same assertions. One test function instead of four. Still four test cases.

Tests that look redundant but aren't

| Pattern | Why not redundant |
| -------------------------------------------------- | ----------------------------------------- |
| Same function, one mocks DB, one hits real DB | Unit vs integration: different failure modes |
| Same logic, different input sizes (1 vs 10000) | The big one catches O(n²) performance bugs |
| Same coverage, one asserts exception message | Error-path granularity |
| Flaky test + reliable test covering same thing | Delete the flaky one, keep the reliable one, but this is a flake fix, not dedup |

Do not

Do not trust coverage equivalence alone. Same lines ≠ same assertions. Mutation-kill equivalence is the real test.
Do not delete tests in a single commit without a revert plan. Delete in batches, run the full suite between batches, and keep each batch easy to roll back.
Do not delete slow tests just because a fast test has the same coverage. If the slow test is an integration test, it catches integration bugs the fast one can't.
Do not delete tests that document behavior even if they're covered elsewhere. test_empty_list_returns_empty is worth keeping for the name, even if test_various_inputs also covers it.
Do not skip the before/after coverage diff. If coverage dropped, you deleted something that wasn't redundant.
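The before/after coverage diff from the last rule can be approximated from the per-test data before anything is deleted. This is a minimal sketch assuming the `{test_name: set(covered_lines)}` mapping from Step 1; `coverage_lost` is a hypothetical helper, and a real check should still rerun the suite with coverage after each batch.

```python
def coverage_lost(coverage, delete):
    """Return lines the full suite covers that the surviving tests do not."""
    to_delete = set(delete)
    before = set().union(*coverage.values())
    survivors = [lines for test, lines in coverage.items() if test not in to_delete]
    after = set().union(*survivors)
    return before - after


example = {
    "test_a": {1, 2, 3},
    "test_b": {2, 3},
    "test_c": {3, 4},
}

lost_safe = coverage_lost(example, ["test_b"])             # test_b is subsumed
lost_risky = coverage_lost(example, ["test_b", "test_c"])  # test_c owns line 4
```

An empty result means the batch is coverage-neutral on paper. A non-empty result names the exact lines that would go uncovered, which points at the test that was not redundant.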

Output format

## Suite before
Tests: <N>  Runtime: <s>  Coverage: <%> line, <%> branch

## Subsumption
| Subsumed test | Subsumed by | Coverage delta |
| ------------- | ----------- | -------------- |

## Greedy minimum cover
Keep: <N> tests → same total coverage
Delete candidates: <M> tests

## Mutation verification
| Candidate | Mutants only this test kills | Keep? |
| --------- | ---------------------------- | ----- |

## Parametrization opportunities
| Tests | Merged into | Cases |
| ----- | ----------- | ----- |

## Final
| Action | Count |
| ------ | ----- |
| Delete (fully subsumed, no unique mutants) | <N> |
| Parametrize | <N> groups → <M> tests |
| Keep (looked redundant, isn't) | <N> |

## After
Tests: <N>  Runtime: <s>  Coverage: <%> (unchanged)  Mutation score: <%> (unchanged)

Finds and removes redundant tests — tests that cover the same code, kill the same mutants, or assert the same behavior — to shrink suite runtime without losing coverage. Use when the test suite is slow, when tests have accumulated over years of copy-paste, or when CI costs are too high.

development

Updated Apr 13, 2026

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Percentage shown |
| ------- | ----------- | ------ | ---------------- |
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Apr 13, 2026, 4:38 AM (54.3s, 1 file scanned)

SKILL.md

name: test-deduplicator
description: Finds and removes redundant tests (tests that cover the same code, kill the same mutants, or assert the same behavior) to shrink suite runtime without losing coverage. Use when the test suite is slow, when tests have accumulated over years of copy-paste, or when CI costs are too high.
license: Apache-2.0
category: testing
suite: general-secure-coding-agent-skills
version: 0.3.0
related: test-suite-prioritizer, mutation-test-suite-optimizer

Install manually

# Clone the repo
git clone https://github.com/santosomar/general-secure-coding-agent-skills.git

# Copy into Claude Code skills folder (global)
cp -r general-secure-coding-agent-skills/skills/testing/test-deduplicator ~/.claude/skills/

Repository

santosomar/general-secure-coding-agent-skills

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT