Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

mikeparcewski/wicked-garden-engineering-unit-test-quality

Name: wicked-garden-engineering-unit-test-quality
Author: mikeparcewski

skills/engineering/unit-test-quality/SKILL.md

npx skillsauth add mikeparcewski/wicked-garden wicked-garden-engineering-unit-test-quality

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

Unit Test Quality

A test you cannot break by changing the system under test (SUT) is not a test — it is dead weight. This skill exists because suites grow faster than they shrink, and "100% coverage" is no defense against a suite that asserts the wrong things.

Two complementary filters

No single mechanical filter catches every useless test — each anti-pattern below has its own diagnostic. But two thought experiments catch the most common cases and are language-neutral. Apply them in order; if either says "cut," you are looking at a test that does not earn its keep.

Filter A — execution check

If I replace the body of the SUT with a force-fail (throw / raise / panic / equivalent), does the test still pass?

Still passes → the test never exercised the SUT at all. Cut it (it is pure coverage theater, see #2 assertion-free, #4 framework retest).
Now fails → the test at least invokes the SUT. Move to Filter B.

Filter B — behavior check

If I replace the body of the SUT with a no-op stub returning sensible defaults (zero, null/None, empty collection, default-constructed value), does the test still pass?

Still passes → the test asserts nothing the SUT actually computes; the assertion would hold against an empty stub. Cut it (typically #5 constant verification or #4 framework retest).
Now fails → the test depends on SUT-derived behavior. Likely worth keeping, but still apply the per-pattern rules below.

The filters are first-cut diagnostics — both can pass on #1 tautological (assertion mirrors mock input), #3 implementation mirror (assertion locks structure, not behavior), #6 sleep-coupled (timing flake is orthogonal), and #7 exception-swallowing (assertion-count guards are required regardless). Use the per-pattern decision rules below for those.

The seven useless-test anti-patterns

1. Tautological — asserts what the mock was told to return

when(repo.find(1)).thenReturn(user);
result = service.fetch(1);
assert(result == user);   // Tests that the mock was wired correctly. The
                          // production fetch() could be `return repo.find(id)`
                          // OR `return repo.find(id).asJson().broken()` — this
                          // test cannot distinguish.

Decision rule: if the assertion is about a value you handed to a mock, the test asserts mock plumbing, not behavior. Replace with a contract assertion (state changed, side-effect emitted, derived value computed).

2. Assertion-free — "no exception thrown" is the only signal

test("processes order", () => {
  service.process(order);            // No assertion. Coverage counter ticks.
});

Decision rule: every test must contain a positive assertion against an observable outcome. If the only thing the test proves is "this line executed", delete it — coverage tools already prove that.

Variants to catch: expect(x).toBeDefined(), expect(x).not.toBeNull(), assert x is not None, assertNotNull(x). These are coverage assertions, not behavior assertions.

3. Implementation mirror — asserts the structure of the code, not its outcome

expect(component.props).toEqual({a: 1, b: 2, c: 3, d: 4, e: 5});

Decision rule: snapshot tests and full-object equals tests rot — they catch every change, including refactors that preserve behavior. Assert the minimum set of fields that prove the contract. If the test must update every time the SUT is touched, the test is locking in implementation, not behavior.

4. Framework retest — asserts that the framework still works

test("ORM saves user", () => {
  user = User.create({name: "x"});
  assert(User.find(user.id).name == "x");   // Tests Rails / Django / Hibernate.
});

Decision rule: the framework's test suite covers this. If your test would also pass against an empty stub of your code, it is a framework test. Move it to integration tests if the wiring is non-trivial; otherwise delete it.

5. Constant verification — tests with no logic

test("getter returns name", () => {
  user = new User("alice");
  expect(user.name).toBe("alice");
});

Decision rule: if the SUT contains no branches, no transformations, no side-effects, and no derivations, there is nothing to test. Generated getters, data-class field access, and one-line returns of constants do not need unit tests.

6. Sleep-coupled — uses `sleep`, `setTimeout`, or wall-clock waits to "stabilize"

service.startBackground();
sleep(500);                  // Hopes the background work finished.
expect(state).toBe("done");

Decision rule: any test using a hardcoded delay is non-deterministic. Either the test sometimes finishes before the SUT (false fail), sometimes finishes exactly on the boundary (flake), or runs unnecessarily slowly (waste). Replace with: a deterministic clock, an explicit completion signal, or a poll-with-timeout over a real condition (not a sleep).

7. Exception-swallowing — try/catch hides the failure

test("handles bad input", () => {
  try { service.process(null); } catch (e) { /* expected */ }
});

Decision rule: assertions that depend on a catch block being entered must also assert the catch was entered (expect.assertions(N) in Jest, try…else in Python with a fail-the-test in else). A bare catch that swallows everything makes the test pass even when the SUT fails to throw at all.

Pre-write self-check (use before adding a new test)

Run through these in order. Stop at the first "no" — that is the work to do before writing the test.

What contract am I asserting? State it in one sentence: input → expected observable outcome. If you cannot, you are not yet ready to write the test.
Could this test pass against a broken SUT? Mentally apply Filter A (replace SUT body with a force-fail) and Filter B (replace SUT body with a no-op default-returning stub). If the test passes either filter, stop and reconsider what you are asserting.
Is the assertion about behavior or about plumbing? Behavior: state changed, value derived, event emitted, error raised with specific shape. Plumbing: mock was called, function exists, framework worked.
Is there exactly one logical assertion? Multiple assertions about the same outcome are fine; multiple unrelated outcomes mean multiple tests.
Is the test deterministic without sleeps? If it depends on timing, reach for a fake clock or a completion signal, not a delay.
Will I have to update this test on every refactor? If yes, it is asserting implementation, not behavior. Narrow the assertion.
Does the test name describe the behavior? "test_user_creation" is a coverage label. "rejects user creation when email is missing" is a contract.

Reviewer rubric (use during PR review of new tests)

For each new test in the diff:

Read the test name. Does it state a contract or a code path? Contract → continue. Code path → flag.
Read the assertion(s). Are they against observable outcomes the SUT controls? Yes → continue. No → flag.
Apply Filter A mentally. Would the test fail if the SUT body became a force-fail (throw/raise/panic)? Yes → continue. No → flag (#2 assertion-free or #4 framework retest).
Apply Filter B mentally. Would the test fail if the SUT body became a no-op stub returning defaults? Yes → likely keep. No → flag (#5 constant verification or #4 framework retest).
If the test uses a mock: does the assertion verify interaction the SUT chose to make, not value the test handed in? Verifies SUT decision → keep. Echoes test input → flag (#1 tautological — neither filter catches this; structural inspection only).
If the test contains sleep, setTimeout(..., N), Thread.sleep, time.sleep, await delay, or any wall-clock-based wait: flag (#6 sleep-coupled). Replace with fake timers / virtual clock, an explicit completion signal/promise, or a poll-with-timeout over a real condition — never with setTimeout(..., 0) (which is a macrotask, not deterministic).

What this skill does not cover

Integration / E2E tests — different rules (real dependencies, slower feedback, broader contracts). Use the integration skill.
Property / fuzz testing — requires its own taxonomy (shrinking, generators, invariants). Out of scope here.
Mutation testing — verifies the suite catches bugs by mutating the SUT. Use the mutation-test-engineer agent for that audit; this skill prevents the bad tests in the first place.
Test runner / CI ergonomics — flake quarantine, retry policy, parallel isolation. See the flaky-test-hunter agent.

Why this is one skill across all languages

The seven anti-patterns above show up in pytest, Jest, Vitest, JUnit, RSpec, xUnit, Go's testing, Rust's #[test], and every framework that has ever shipped. The syntax differs; the failure mode is identical. Keeping the rules language-neutral is what makes this skill durable — any new framework inherits the same filters the day it is adopted.

mikeparcewski/wicked-garden-engineering-unit-test-quality

skills/engineering/unit-test-quality/SKILL.md

Language- and framework-neutral guard against useless unit tests — the kind that pass forever, never catch regressions, and inflate coverage without protecting behavior. Provides two complementary thought-experiment filters (execution + behavior), a pre-write self-check, a taxonomy of seven recurring anti-patterns (tautological, assertion-free, implementation mirror, framework retest, constant verification, sleep-coupled, exception-swallowing), and a one-line decision rule per pattern. Use when: writing or reviewing a unit test, auditing a suite with high coverage but low confidence, or pairing a regression test with a bug fix.

8 stars

development

Updated Jul 12, 2026

$ install --global

skillsauth

npx skillsauth add mikeparcewski/wicked-garden wicked-garden-engineering-unit-test-quality

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Jul 12, 2026, 7:06 AM27.3s1 file scanned

SKILL.md

name:: wicked-garden-engineering-unit-test-quality
description:: |
Use when:: writing or reviewing a unit test, auditing a suite with high
status:: stable
portability:: portable
phase_relevance:: ["design", "build"]
archetype_relevance:: ["*"]

Unit Test Quality

Two complementary filters

Filter A — execution check

If I replace the body of the SUT with a force-fail (throw / raise / panic / equivalent), does the test still pass?

Still passes → the test never exercised the SUT at all. Cut it (it is pure coverage theater, see #2 assertion-free, #4 framework retest).
Now fails → the test at least invokes the SUT. Move to Filter B.

Filter B — behavior check

If I replace the body of the SUT with a no-op stub returning sensible defaults (zero, null/None, empty collection, default-constructed value), does the test still pass?

Still passes → the test asserts nothing the SUT actually computes; the assertion would hold against an empty stub. Cut it (typically #5 constant verification or #4 framework retest).
Now fails → the test depends on SUT-derived behavior. Likely worth keeping, but still apply the per-pattern rules below.

The seven useless-test anti-patterns

1. Tautological — asserts what the mock was told to return

when(repo.find(1)).thenReturn(user);
result = service.fetch(1);
assert(result == user);   // Tests that the mock was wired correctly. The
                          // production fetch() could be `return repo.find(id)`
                          // OR `return repo.find(id).asJson().broken()` — this
                          // test cannot distinguish.

2. Assertion-free — "no exception thrown" is the only signal

test("processes order", () => {
  service.process(order);            // No assertion. Coverage counter ticks.
});

Variants to catch: expect(x).toBeDefined(), expect(x).not.toBeNull(), assert x is not None, assertNotNull(x). These are coverage assertions, not behavior assertions.

3. Implementation mirror — asserts the structure of the code, not its outcome

expect(component.props).toEqual({a: 1, b: 2, c: 3, d: 4, e: 5});

4. Framework retest — asserts that the framework still works

test("ORM saves user", () => {
  user = User.create({name: "x"});
  assert(User.find(user.id).name == "x");   // Tests Rails / Django / Hibernate.
});

5. Constant verification — tests with no logic

test("getter returns name", () => {
  user = new User("alice");
  expect(user.name).toBe("alice");
});

6. Sleep-coupled — uses `sleep`, `setTimeout`, or wall-clock waits to "stabilize"

service.startBackground();
sleep(500);                  // Hopes the background work finished.
expect(state).toBe("done");

7. Exception-swallowing — try/catch hides the failure

test("handles bad input", () => {
  try { service.process(null); } catch (e) { /* expected */ }
});

Pre-write self-check (use before adding a new test)

Run through these in order. Stop at the first "no" — that is the work to do before writing the test.

What contract am I asserting? State it in one sentence: input → expected observable outcome. If you cannot, you are not yet ready to write the test.
Could this test pass against a broken SUT? Mentally apply Filter A (replace SUT body with a force-fail) and Filter B (replace SUT body with a no-op default-returning stub). If the test passes either filter, stop and reconsider what you are asserting.
Is the assertion about behavior or about plumbing? Behavior: state changed, value derived, event emitted, error raised with specific shape. Plumbing: mock was called, function exists, framework worked.
Is there exactly one logical assertion? Multiple assertions about the same outcome are fine; multiple unrelated outcomes mean multiple tests.
Is the test deterministic without sleeps? If it depends on timing, reach for a fake clock or a completion signal, not a delay.
Will I have to update this test on every refactor? If yes, it is asserting implementation, not behavior. Narrow the assertion.
Does the test name describe the behavior? "test_user_creation" is a coverage label. "rejects user creation when email is missing" is a contract.

Reviewer rubric (use during PR review of new tests)

For each new test in the diff:

Read the test name. Does it state a contract or a code path? Contract → continue. Code path → flag.
Read the assertion(s). Are they against observable outcomes the SUT controls? Yes → continue. No → flag.
Apply Filter A mentally. Would the test fail if the SUT body became a force-fail (throw/raise/panic)? Yes → continue. No → flag (#2 assertion-free or #4 framework retest).
Apply Filter B mentally. Would the test fail if the SUT body became a no-op stub returning defaults? Yes → likely keep. No → flag (#5 constant verification or #4 framework retest).
If the test uses a mock: does the assertion verify interaction the SUT chose to make, not value the test handed in? Verifies SUT decision → keep. Echoes test input → flag (#1 tautological — neither filter catches this; structural inspection only).
If the test contains sleep, setTimeout(..., N), Thread.sleep, time.sleep, await delay, or any wall-clock-based wait: flag (#6 sleep-coupled). Replace with fake timers / virtual clock, an explicit completion signal/promise, or a poll-with-timeout over a real condition — never with setTimeout(..., 0) (which is a macrotask, not deterministic).

What this skill does not cover

Integration / E2E tests — different rules (real dependencies, slower feedback, broader contracts). Use the integration skill.
Property / fuzz testing — requires its own taxonomy (shrinking, generators, invariants). Out of scope here.
Mutation testing — verifies the suite catches bugs by mutating the SUT. Use the mutation-test-engineer agent for that audit; this skill prevents the bad tests in the first place.
Test runner / CI ergonomics — flake quarantine, retry policy, parallel isolation. See the flaky-test-hunter agent.

Why this is one skill across all languages

Related Skills

mikeparcewski/wicked-garden-engineering-conformance-reviewer

development

VerifiedTrustedCommunity

Pattern-conformance agent-half: evaluates a produced artifact or diff against a set of architectural/design pattern rules from the conformance-rule store (wicked_governance schema). Returns structured findings with rule ID, severity, and rationale — the deterministic half (mechanical rule recall) is done by the guard pipeline; this is the semantic evaluation step. Triggered by: the guard_pipeline `outgov_pattern` check (session-close), or explicitly by an engineering review when WICKED_OUTGOV_RULES_DIR is populated. NOT a replacement for the full `engineering` review skill — focuses only on conformance to stored Pattern rules; architecture and code-quality checks live in the `engineering` skill. Semantic evaluation reuses `wicked-garden-qe-semantic-reviewer` as the designated agent-half evaluator (per garden#983 spec). This skill is the orchestrating wrapper that loads applicable Pattern rules and delegates the per-rule semantic judgment to qe-semantic-reviewer.

8SKILL.mdUpdated Jul 22, 2026

mikeparcewski/wicked-garden-engineering-conformance-reviewer

mikeparcewski/wicked-garden-domain

tools

VerifiedTrustedCommunity

The FOUNDATIONAL domain-model capability: extract a codebase's domain — testable business rules (with confidence + provenance), entities, requirements — as a schema-conformant model on the estate graph. The workers annotate the store; wicked-core reads it and builds the requirements graph, coverage-gating fail-closed. Steers three fork workers. A shared substrate, not a modernization tool. The `modernize` archetype DERIVES from it; build / migrate / review / specify / explore consume the SAME domain model — none OWN it. Understanding a codebase's domain is upstream of almost everything else garden does. Use when: "extract the business rules / domain model from this codebase", "build a requirements graph from the code", "what does this system actually require", "reverse-engineer the domain before we build/port/migrate". Works on ANY codebase (modern or legacy) — the value is the domain model, not the porting. NOT the code transform itself (that is the archetype consuming this model). This skill produces the DOMAIN MODEL, not new code.

8SKILL.mdUpdated Jul 15, 2026

mikeparcewski/wicked-garden-domain

mikeparcewski/wicked-garden-domain-modeler

development

VerifiedTrustedCommunity

Domain-graph fork worker for the modernize archetype. Groups the estate's Louvain communities into business domains, attaches each requirement to its cluster (advisory cluster_id provenance), and invokes wicked-core's domain-graph build (which reads the annotated estate store, recomputes coverage fail-closed, and builds the requirements graph) — then validates core's output against the vendored schema. Use when: dispatched by wicked-garden-domain after rule extraction to turn a flat rule set into cluster-keyed domains; "group these into domains", "build the requirements graph", "translate clusters into a domain model". NOT for mining the rules themselves (that is domain-extractor) or threat-modeling (that is domain-coverage).

8SKILL.mdUpdated Jul 15, 2026

mikeparcewski/wicked-garden-domain-modeler

mikeparcewski/wicked-garden-domain-extractor

tools

VerifiedTrustedCommunity

Rule-extraction fork worker for the FOUNDATIONAL domain-model capability. Mines testable business rules from a codebase — each with a numeric confidence and a provenance{source, ref, source_kinds} — and annotates them into the estate store so wicked-core can build the domain-model requirements graph (coverage-gated). This is a substrate, not a modernization tool: the `modernize` archetype DERIVES from it, and build / migrate / review / specify / explore can consume the same domain model — none OWN it. Use when: dispatched by wicked-garden-domain to mine the business_rules of a codebase (or a module); "extract the domain rules", "what does this system require", building the requirements half of a domain model. NOT for grouping into domains (that is domain-modeler) or judging coverage (that is domain-coverage — a seat-distinct evaluator).

8SKILL.mdUpdated Jul 15, 2026

mikeparcewski/wicked-garden-domain-extractor

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/mikeparcewski/wicked-garden.git

# Copy into Claude Code skills folder (global)
cp -r wicked-garden/skills/engineering/unit-test-quality ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

mikeparcewski/wicked-garden

8 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT

Adoption

mikeparcewski/wicked-garden-engineering-unit-test-quality

$ install --global

Security Scan Results

SKILL.md

Unit Test Quality

Two complementary filters

Filter A — execution check

Filter B — behavior check

The seven useless-test anti-patterns

1. Tautological — asserts what the mock was told to return

2. Assertion-free — "no exception thrown" is the only signal

3. Implementation mirror — asserts the structure of the code, not its outcome

4. Framework retest — asserts that the framework still works

5. Constant verification — tests with no logic

6. Sleep-coupled — uses sleep, setTimeout, or wall-clock waits to "stabilize"

7. Exception-swallowing — try/catch hides the failure

Pre-write self-check (use before adding a new test)

Reviewer rubric (use during PR review of new tests)

What this skill does not cover

Why this is one skill across all languages

Related Skills

mikeparcewski/wicked-garden-engineering-conformance-reviewer

mikeparcewski/wicked-garden-domain

mikeparcewski/wicked-garden-domain-modeler

mikeparcewski/wicked-garden-domain-extractor

mikeparcewski/wicked-garden-engineering-unit-test-quality

$ install --global

Security Scan Results

SKILL.md

Unit Test Quality

Two complementary filters

Filter A — execution check

Filter B — behavior check

The seven useless-test anti-patterns

1. Tautological — asserts what the mock was told to return

2. Assertion-free — "no exception thrown" is the only signal

3. Implementation mirror — asserts the structure of the code, not its outcome

4. Framework retest — asserts that the framework still works

5. Constant verification — tests with no logic

6. Sleep-coupled — uses sleep, setTimeout, or wall-clock waits to "stabilize"

7. Exception-swallowing — try/catch hides the failure

Pre-write self-check (use before adding a new test)

Reviewer rubric (use during PR review of new tests)

What this skill does not cover

Why this is one skill across all languages

Related Skills

mikeparcewski/wicked-garden-engineering-conformance-reviewer

mikeparcewski/wicked-garden-domain

mikeparcewski/wicked-garden-domain-modeler

mikeparcewski/wicked-garden-domain-extractor

6. Sleep-coupled — uses `sleep`, `setTimeout`, or wall-clock waits to "stabilize"

6. Sleep-coupled — uses `sleep`, `setTimeout`, or wall-clock waits to "stabilize"