skills/engineering/unit-test-quality/SKILL.md
Language- and framework-neutral guard against useless unit tests — the kind that pass forever, never catch regressions, and inflate coverage without protecting behavior. Provides two complementary thought-experiment filters (execution + behavior), a pre-write self-check, a taxonomy of seven recurring anti-patterns (tautological, assertion-free, implementation mirror, framework retest, constant verification, sleep-coupled, exception-swallowing), and a one-line decision rule per pattern. Use when: writing or reviewing a unit test, auditing a suite with high coverage but low confidence, or pairing a regression test with a bug fix.
npx skillsauth add mikeparcewski/wicked-garden unit-test-qualityInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
A test you cannot break by changing the system under test (SUT) is not a test — it is dead weight. This skill exists because suites grow faster than they shrink, and "100% coverage" is no defense against a suite that asserts the wrong things.
No single mechanical filter catches every useless test — each anti-pattern below has its own diagnostic. But two thought experiments catch the most common cases and are language-neutral. Apply them in order; if either says "cut," you are looking at a test that does not earn its keep.
If I replace the body of the SUT with a force-fail (
throw/raise/panic/ equivalent), does the test still pass?
If I replace the body of the SUT with a no-op stub returning sensible defaults (zero, null/None, empty collection, default-constructed value), does the test still pass?
The filters are first-cut diagnostics — both can pass on #1 tautological (assertion mirrors mock input), #3 implementation mirror (assertion locks structure, not behavior), #6 sleep-coupled (timing flake is orthogonal), and #7 exception-swallowing (assertion-count guards are required regardless). Use the per-pattern decision rules below for those.
when(repo.find(1)).thenReturn(user);
result = service.fetch(1);
assert(result == user); // Tests that the mock was wired correctly. The
// production fetch() could be `return repo.find(id)`
// OR `return repo.find(id).asJson().broken()` — this
// test cannot distinguish.
Decision rule: if the assertion is about a value you handed to a mock, the test asserts mock plumbing, not behavior. Replace with a contract assertion (state changed, side-effect emitted, derived value computed).
test("processes order", () => {
service.process(order); // No assertion. Coverage counter ticks.
});
Decision rule: every test must contain a positive assertion against an observable outcome. If the only thing the test proves is "this line executed", delete it — coverage tools already prove that.
Variants to catch: expect(x).toBeDefined(), expect(x).not.toBeNull(),
assert x is not None, assertNotNull(x). These are coverage assertions, not
behavior assertions.
expect(component.props).toEqual({a: 1, b: 2, c: 3, d: 4, e: 5});
Decision rule: snapshot tests and full-object equals tests rot — they catch every change, including refactors that preserve behavior. Assert the minimum set of fields that prove the contract. If the test must update every time the SUT is touched, the test is locking in implementation, not behavior.
test("ORM saves user", () => {
user = User.create({name: "x"});
assert(User.find(user.id).name == "x"); // Tests Rails / Django / Hibernate.
});
Decision rule: the framework's test suite covers this. If your test would also pass against an empty stub of your code, it is a framework test. Move it to integration tests if the wiring is non-trivial; otherwise delete it.
test("getter returns name", () => {
user = new User("alice");
expect(user.name).toBe("alice");
});
Decision rule: if the SUT contains no branches, no transformations, no side-effects, and no derivations, there is nothing to test. Generated getters, data-class field access, and one-line returns of constants do not need unit tests.
sleep, setTimeout, or wall-clock waits to "stabilize"service.startBackground();
sleep(500); // Hopes the background work finished.
expect(state).toBe("done");
Decision rule: any test using a hardcoded delay is non-deterministic. Either the test sometimes finishes before the SUT (false fail), sometimes finishes exactly on the boundary (flake), or runs unnecessarily slowly (waste). Replace with: a deterministic clock, an explicit completion signal, or a poll-with-timeout over a real condition (not a sleep).
test("handles bad input", () => {
try { service.process(null); } catch (e) { /* expected */ }
});
Decision rule: assertions that depend on a catch block being entered must
also assert the catch was entered (expect.assertions(N) in Jest, try…else in
Python with a fail-the-test in else). A bare catch that swallows everything
makes the test pass even when the SUT fails to throw at all.
Run through these in order. Stop at the first "no" — that is the work to do before writing the test.
For each new test in the diff:
sleep, setTimeout(..., N), Thread.sleep, time.sleep, await delay, or any wall-clock-based wait: flag (#6 sleep-coupled). Replace with fake timers / virtual clock, an explicit completion signal/promise, or a poll-with-timeout over a real condition — never with setTimeout(..., 0) (which is a macrotask, not deterministic).integration skill.mutation-test-engineer agent for that audit; this skill prevents the bad tests in the first place.flaky-test-hunter agent.The seven anti-patterns above show up in pytest, Jest, Vitest, JUnit, RSpec,
xUnit, Go's testing, Rust's #[test], and every framework that has ever
shipped. The syntax differs; the failure mode is identical. Keeping the rules
language-neutral is what makes this skill durable — any new framework inherits
the same filters the day it is adopted.
development
--- name: large-scale-migration description: How to execute a LARGE MECHANICAL change across any codebase with LEVERAGE instead of an agent-grind or hand-edits — a cross-cutting migration, refactor, rename, dialect/framework/DB port, library adoption, or bulk transform. The map→transform→gate pattern: a deterministic transform driven by a source-of-truth map, proven by a differential-equivalence gate. Use when the work is "migrate all X to Y", "rename Z everywhere", "port to a new DB/dialect/fra
testing
v11 LLM-based work-shape classifier. Replaces the regex archetype detector with the model's own reasoning. Reads the user's prompt, picks the right archetype(s) from the catalog, identifies signals (blast_radius, novelty, reversibility, etc.), and persists to SessionState so subsequent turns steer correctly. Use when: the prompt_submit hook emitted a `<wg classify-due />` directive, OR explicitly invoked at session start, OR when re-classifying after the user changes scope mid-session.
tools
v11 work-shape archetype runner. When a prompt has been routed to one of the 9 archetypes (triage, explore, specify, decide, ship, review, incident, build, migrate), this skill is the entry point. It picks the right per-archetype playbook from refs/ and executes the phase shape declared in `.claude-plugin/archetypes.json`. Use when: a `<wg archetype="X">` or `<wg archetypes>` system-reminder tag appears, an explicit "let's run the X archetype" request, or when one of the per-archetype slash commands resolves to this skill.
development
Show or set the session intent variable. Intent gates how loud the framework is — simple-edit (silent), feature/research (synthesis directive), rigor (full crew context). Auto-detected on turn 1; this skill overrides explicitly. Sticky for the session. Use when: "set intent", "intent override", "/wicked-garden:intent", "make the framework quiet", "force rigor", "what's my intent".