skills/tests-purge-unneeded/SKILL.md
Delete tests that don't catch real bugs — the inverse of TDD. Use when reviewing legacy test suites, slow CI investigations, refactor-driven test sweeps, or evaluating whether a test the type system already covers should stay. Thesis — a test exists only if removing it would let a real bug reach production.
npx skillsauth add outlinedriven/odin-codex-plugin tests-purge-unneededInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Tests are not assets. Tests are liabilities that pay rent by catching real bugs. Volume is not a quality signal — coverage percentage is not a quality signal — only the counterfactual matters: if I delete this test, can a real bug now reach prod? If the answer is no, the test is dead weight, and dead weight slows CI, breeds noise, and trains reviewers to ignore failures.
Modern insight (2025): TDD pairs with purge discipline. The same rigor that earns RED before GREEN earns deletion before keep — a test that cannot describe the bug it would catch should not exist. Mutation testing exposes which tests are actually load-bearing; the rest are cargo cult.
See python for pytest examples (dynamic-language carve-out). See typescript for jest/vitest examples (static-language redundancy). See rust for cargo test examples (compile-time-guaranteed redundancy). See keep-vs-delete-table for the language-agnostic decision rubric.
The principles below are load-bearing. Internalize them; do not paraphrase.
This is the discriminator. Before keeping any test, ask: what bug, specifically, does this fail on? If the answer is "none I can name" or "a bug the compiler would catch", the test does not earn its keep. The bar is a real bug — not a hypothetical, not a "what if the implementation changes" — a concrete failure mode that a real change could plausibly introduce and that this test would catch.
Tests earn their keep at boundaries. Protocol compliance (HTTP status codes, message formats, retry semantics), error semantics (what happens on malformed input, partial failure, timeout), security invariants (authz/authn enforcement, input validation, rate limits), and real-I/O integration (DB transactions, file I/O, network calls) are exactly where bugs hide and where the type system cannot help. These tests stay.
In statically-typed languages, the compiler already proves that a constructor returns the type it claims, that a struct has the fields it declares, and that a config object has the shape its type asserts. A test like assert User(name="x").name == "x" does not catch any bug a compiler does not — the only way it can fail is if User's type signature changes, in which case the test itself fails to compile and the assertion is moot. Delete it.
Identity-passthrough tests (assert echo(x) == x, expect(passthrough(value)).toEqual(value)) prove nothing — they describe the function's signature, not its behavior. If echo is supposed to validate or transform x, test the validation/transformation. If echo is genuinely identity, the function is dead code.
~/.claude/claude/system-prompt-baseline.md <directives>)The carve-out for mandate 3 is language-dependent and must mirror the user's system-prompt-baseline.md testing charter exactly:
The split is not aesthetic — it is about what guarantee the language provides. See references/python.md vs references/typescript.md for the contrasting examples.
references/keep-vs-delete-table.md)| Pattern | Static lang | Dynamic lang | |---------|-------------|--------------| | Constructor returns expected fields | Delete — type system covers | Keep — boundary shape test | | Function passes input through verbatim | Delete | Delete (test the dead code, then delete it) | | HTTP handler returns 401 on missing auth | Keep — security invariant | Keep | | Parser rejects malformed input | Keep — boundary | Keep | | Two structs equal after struct-assembly | Delete | Keep if shape changes are plausible | | Mock returns fixture, test asserts the fixture | Delete — testing the mock | | Real DB transaction commits + rolls back | Keep — real I/O integration |
~/.claude/claude/system-prompt-baseline.md, system-prompt-baseline.md wins — this skill mirrors the user's testing charter; if drift is detected, system-prompt-baseline.md is the source of truth.| Gate | Pass Criteria | Blocking | |------|---------------|----------| | Bug articulation | Each candidate has a one-sentence failure-mode description | Yes | | Static-guarantee check | Confirmed compiler/type-checker does or does not cover the test | Yes | | Bug-injection check | Test verified to NOT catch the bug, before deletion | Yes | | Atomic commit | Deletion is its own commit with rationale | Yes | | Suite still passes | Remaining tests green after deletion | Yes |
| Code | Meaning | |------|---------| | 0 | Deletion safe — bug-injection confirmed test catches nothing, atomic commit landed | | 11 | Test framework not detected — cannot run bug-injection check | | 12 | Bug articulation failed — candidate kept pending review | | 13 | Static guarantee unclear — language carve-out not resolvable; kept pending | | 14 | Test caught the injected bug — load-bearing; kept | | 15 | Suite regressed after deletion — rollback required |
cleanup-codebase — sibling deletion discipline for non-test code (dead fields, redundant wrappers, stale config)tests-adversarial — the complement: writing tests that do catch bugs, especially in failure paths and silent-failure regionstest-driven — the design-side discipline (RED → GREEN → REFACTOR); purge runs in the REFACTOR phase~/.claude/claude/system-prompt-baseline.md <directives> testing charter — the source-of-truth principle this skill mirrorstesting
ODIN's compress-operations dispatcher under the Compressor/Extender role. Invoke on "tidy", "clean up", "tidy this file/memory/workspace/git/docs", or when active context (current file, diff, stack, memory directory) has structural rot to resolve before touching behavior. Detects target domain from context and routes to the sibling skill. Requires explicit target or clear active-context signal — do not invoke speculatively.
development
Cross-domain taste skill — apply distinctive judgment to any artifact (prose, code, design, decisions) instead of converging to AI defaults. Two modes — `audit` (judge work against the two-sided charter and portable anchors) and `anchor` (load register before producing). Auto-detects by phrasing; override via `/taste audit | anchor`. Trigger on "is this slop?", "overkill?", "elegant?", "taste-test this".
tools
One-shot bootstrap of strict-mode tooling per ecosystem plus per-task GOALS.md scaffolding so an agentic loop can self-verify. Writes typechecker/linter/schema-validator config for TS (strict + noUncheckedIndexedAccess + exactOptionalPropertyTypes), Python (Pyright strict, Ruff strict), Rust (Clippy deny-correctness), Go (golangci-lint with staticcheck), OCaml (dune --release); establishes `.agent-tasks/<id>/GOALS.md` per-task convention distinct from project-stable AGENTS.md. C++/Java/Kotlin and framework specifics (Spring Boot, Nest, React-strict) are out of scope. Trigger on new project bootstrap, agentic-task setup, "make this self-verifying", "set the loop's goal", "scaffold goals for this issue". Pairs with `llm-self-loop` runtime.
tools
Install git pre-commit hooks via the project's hook tool — Husky+lint-staged (JS), pre-commit (Python/OCaml), lefthook (Go), cargo-husky (Rust). Use when the user wants commit-time formatting, linting, type-checking, or test gates. Detects ecosystem first.