plugins/tdd/skills/test-coverage/SKILL.md
Use after writing tests to assess coverage quality across structural, mutation, requirements, and API/integration dimensions; organized knowledge for choosing and interpreting coverage analyses.
A reference manual for choosing, applying, and interpreting test-coverage analyses on an existing test suite.
This skill is a knowledge reference, not a procedure. It does not tell you when to write tests or which test types to design — that is the job of design-testing-strategy. It tells you, once tests exist, which mechanical signal best measures what those tests do (and do not) exercise, and how to read that signal honestly.
Test coverage analysis is the post-hoc measurement of how thoroughly a test suite exercises a software artifact along one or more axes. It answers the question "what did my tests actually touch?" — for some specific definition of "touch."
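The word "coverage" is overloaded. It can mean any of several distinct measurements, including structural (code) coverage, mutation score, requirements coverage, and API/integration coverage.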
Mutation testing, MC/DC, branch coverage, RTM linkage, contract coverage, and schema coverage are measurements about an existing test suite. They are not test types in the way unit, integration, e2e, contract, or smoke tests are.
If a "test strategy" places mutation testing alongside unit / integration / e2e, that strategy has confused what to test with how to measure the tests. The two questions are orthogonal.
Low coverage is strong evidence of weak testing. High coverage is weak evidence of strong testing.
Coverage is necessary-but-not-sufficient. A test can execute a line without asserting anything meaningful; 100% line coverage is routinely achievable with zero assertions (thinkinglabs.io, codeintelligently.com). Use coverage as a tripwire, not a trophy. Once a coverage percentage becomes a target, it ceases to be a good metric (Goodhart's law applied to testing; see Optivem Journal).
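Every coverage type in this skill is documented in the same six sub-fields, in this order:
1. Definition
2. What it does NOT measure
3. Typical tools
4. When to use vs skip
5. Targets / thresholds & pitfalls
6. Cost-benefit ROI

Scan any section by these headings.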
Measured by instrumenting the compiled or interpreted program and recording which structural elements (lines, statements, branches, conditions, paths) the test suite executes.
Definition. Percentage of source-code lines (or statements) executed at least once.
What it does NOT measure. Whether branches were taken in both directions. Whether assertions verified the result. Whether boundary values were tested. Multiple statements on one line distort the metric (Metridev).
Typical tools.
| Ecosystem | Tool |
|-----------|------|
| JS/TS | Istanbul / nyc (built into Jest, Vitest, Karma); --coverage flag |
| Python | Coverage.py + pytest-cov; supports branch mode |
| JVM | JaCoCo — bytecode instrumentation, industry standard |
| C/C++ | gcov / lcov / gcovr, llvm-cov |
| Go | go test -cover, go tool cover (go build -cover added integration-test coverage in Go 1.20) |
| .NET | Coverlet (open-source default), JetBrains dotCover, AltCover. OpenCover is in maintenance mode — prefer Coverlet / dotCover / AltCover (NDepend guide) |
| Ruby | SimpleCov |
| Swift / Obj-C | Xcode built-in (llvm-cov backend) |
| Rust | cargo-llvm-cov, cargo-tarpaulin |
| Report formats | Cobertura XML, LCOV, Clover; aggregators: Codecov, Coveralls, SonarQube |
When to use vs skip. Always-on; cost is near-zero (a CI flag). Never use as a quality goal in itself.
Targets / thresholds & pitfalls. 70–85% is typical for general-purpose code (Qt blog). Apply only with the risk caveat (see Risk-Based Interpretation below). Tests with no assertions still count lines as covered. Single-line if (x) doA(); else doB(); shows 100% statement coverage with only one branch exercised. Snapshot-only tests inflate numbers without verifying behavior.
Cost-benefit ROI. Very high — cost near-zero, value is a tripwire on regression in test reach.
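The single-line `if/else` pitfall above can be sketched with hand-rolled probes. This is only an illustration: the `hits` counters and the `label` function are hypothetical stand-ins for the probes a real tool inserts, and the line-level probe models a tool that counts the whole `if/else` line as one unit.

```javascript
// Hypothetical hand-rolled probes; real coverage tools insert equivalent counters.
const hits = { line: 0, thenBranch: 0, elseBranch: 0 };

function label(x) {
  hits.line += 1; // the whole if/else sits on one line, so one line-level probe
  if (x) { hits.thenBranch += 1; return "A"; } else { hits.elseBranch += 1; return "B"; }
}

label(true); // the only input the "suite" ever supplies

const lineCoverage = hits.line > 0 ? 1 : 0;
const branchesTaken = [hits.thenBranch, hits.elseBranch].filter((n) => n > 0).length;
const branchCoverage = branchesTaken / 2;

console.log({ lineCoverage, branchCoverage }); // { lineCoverage: 1, branchCoverage: 0.5 }
```

The line probe reports full coverage while half of the decision outcomes were never taken, which is why line coverage works as a tripwire rather than a goal.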
Definition. Percentage of decision outcomes (`if`, `while`, `for`, `?:`, `switch` cases) executed.
What it does NOT measure. Compound-condition independence (`A && B` taken true might never test `A=true, B=false`). Order of evaluation. Loop iteration counts. Assertion strength.
Typical tools. `--branch` (coverage.py), branch mode (Istanbul is branch-aware by default), JaCoCo reports branches natively. Strictly stronger than line/statement coverage (Graph AI).
Targets / thresholds & pitfalls. `if (A || B)` achieves 100% branch coverage with only one true-evaluating sub-condition and one false branch overall.
Definition. Every Boolean sub-condition in every decision has taken both true and false at least once.
What it does NOT measure. Whether each sub-condition independently affects the outcome (that is MC/DC). Does not require all combinations.
Typical tools. Same toolchains as branch coverage; many report condition coverage as a separate column.
| Ecosystem | Tool |
|-----------|------|
| JVM | JaCoCo (condition counters in branch reports) |
| C / C++ | gcov/gcovr (--branch-counts), Qt Coco |
| .NET | Coverlet (condition coverage via Cobertura output) |
When to use vs skip. Informative for code with compound expressions; rarely useful as a CI gate on its own.
Targets / thresholds & pitfalls. Achievable without exercising every combination. if (A && B) hits 100% condition coverage with {A=T,B=F} and {A=F,B=T} — neither makes the decision true. Treat as a diagnostic, not a gate.
Cost-benefit ROI. Medium: useful diagnostically when investigating why branch coverage looks high but bugs persist; not a gate.
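The `{A=T,B=F}` / `{A=F,B=T}` pitfall can be checked mechanically. The sketch below is an assumption-laden illustration: it counts the value each sub-condition holds in each test, ignoring short-circuit evaluation (a tool that only counts evaluated conditions may report a different number for `B`).

```javascript
// Test vectors for the decision `A && B`; each entry is one test's inputs.
const tests = [
  { A: true, B: false },
  { A: false, B: true },
];

const decision = ({ A, B }) => A && B;

// Condition coverage: every sub-condition observed as both true and false.
const valuesSeen = (name) => new Set(tests.map((t) => t[name]));
const conditionCovered = ["A", "B"].every((name) => valuesSeen(name).size === 2);

// Decision coverage: the whole decision observed as both true and false.
const outcomes = new Set(tests.map(decision));
const decisionCovered = outcomes.size === 2;

console.log({ conditionCovered, decisionCovered }); // { conditionCovered: true, decisionCovered: false }
```

Both sub-conditions flip, yet the decision is never true, so the guarded code never runs.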
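Definition (per Wikipedia):
- Each entry and exit point is invoked.
- Each decision takes every possible outcome.
- Each condition in a decision takes every possible outcome.
- Each condition in a decision is shown to independently affect the outcome of the decision.
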
For n conditions, MC/DC is achievable with n+1 to 2n tests via independence pairs — vastly cheaper than the 2^n of exhaustive multiple-condition coverage (LDRA).
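As a rough illustration of independence pairs, the sketch below checks a hypothetical decision `a && (b || c)` (n = 3) against a hand-picked set of n + 1 = 4 tests. It uses the strict "unique-cause" reading, where a pair must differ in exactly one condition; the decision and the test vectors are assumptions chosen for the example.

```javascript
// Hypothetical decision with three conditions.
const decision = ([a, b, c]) => a && (b || c);

// Four hand-picked test vectors [a, b, c].
const tests = [
  [true, true, false],   // outcome: true
  [false, true, false],  // outcome: false (pairs with the first test for a)
  [true, false, false],  // outcome: false (pairs with the first test for b)
  [true, false, true],   // outcome: true  (pairs with the third test for c)
];

// A condition is independently shown when two tests differ only in that
// condition and produce different decision outcomes.
function hasIndependencePair(index) {
  return tests.some((t1) =>
    tests.some((t2) => {
      const onlyThisDiffers = t1.every((v, i) => (i === index ? v !== t2[i] : v === t2[i]));
      return onlyThisDiffers && decision(t1) !== decision(t2);
    })
  );
}

const independentlyShown = [0, 1, 2].map(hasIndependencePair);
console.log(independentlyShown, `${tests.length} tests vs ${2 ** 3} exhaustive`);
// [ true, true, true ] '4 tests vs 8 exhaustive'
```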
What it does NOT measure. Loop iteration counts, data values, integration paths, assertion strength.
Typical tools. LDRA TBvision, Rapita RapiCover, VectorCAST, Razorcat TESSY, Qt Coco, Parasoft C/C++test. Mostly commercial — open-source MC/DC is rare.
When to use vs skip. When mandated by a standard (DO-178C DAL A, ISO 26262 ASIL D, IEC 62304 Class C high-risk modules, EN 50128 SIL 4, IEC 61508 SIL 4). Outside regulated domains, branch coverage + mutation testing covers the same intent at lower cost.
Targets / thresholds & pitfalls. 100% by definition in regulated domains. Short-circuit evaluation in C-like languages can make some independence pairs unreachable; compiler optimizations can collapse conditions, so coverage builds must disable optimization — meaning the coverage-build binary is not the release-build binary, an acknowledged regulatory risk (Verifysoft).
Cost-benefit ROI. Very high cost (specialist toolchain + labor + documentation overhead); high value only where required by law/standard.
Definition. Percentage of declared functions/methods invoked at least once.
What it does NOT measure. Anything about the bodies of those functions.
Typical tools. Reported by most structural-coverage tools as a side column.
| Ecosystem | Tool |
|-----------|------|
| JVM | JaCoCo (method counter) |
| .NET | Coverlet (methods column) |
| Python | coverage.py (report -m granularity), pytest-cov |
| JS/TS | Istanbul (functions metric in lcov / json-summary) |
When to use vs skip. As a quick "did I forget a module?" check; never as a primary metric.
Targets / thresholds & pitfalls. Often deceptively high — many functions are entered by happy-path tests with no error-path coverage inside.
Cost-benefit ROI. Low: informational only; useful as a "module forgotten?" tripwire, not a gate.
Definition. Percentage of unique linearly-independent paths through a function. Bounded by cyclomatic complexity V(G) = decisions + 1 (Cyclomatic complexity).
What it does NOT measure. Anything practical for non-trivial functions: N independent decisions yield up to 2^N paths, and loops make the count unbounded.
Typical tools. Some commercial safety-critical tools report basis-path counts; rarely a CI artifact.
| Ecosystem | Tool |
|-----------|------|
| Safety-critical C/C++ | LDRA TBvision, VectorCAST (basis-path metrics) |
| Any / complexity proxy | lizard, radon, SonarQube (cyclomatic complexity as a bound, not a path metric) |
When to use vs skip. Rarely as a coverage target. Cyclomatic complexity is more useful as a complexity signal that bounds the minimum number of tests needed to exercise distinct flows.
Targets / thresholds & pitfalls. Combinatorial explosion. Most production code is uncovered at path-coverage level and that is acceptable.
Cost-benefit ROI. Low for production code; meaningful only inside very small, very high-criticality functions. Outside that, use complexity as a signal and stop.
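The gap between the two numbers in this section is easy to see with arithmetic. The sketch assumes a function built from N sequential, independent `if` statements and no loops; `pathMetrics` is a hypothetical helper, not a tool API.

```javascript
// Assumes N sequential, independent if-statements and no loops.
function pathMetrics(decisions) {
  return {
    cyclomaticComplexity: decisions + 1, // V(G) = decisions + 1
    acyclicPaths: 2 ** decisions,        // each decision doubles the path count
  };
}

for (const n of [1, 5, 10, 20]) {
  console.log(n, pathMetrics(n));
}
```

At 10 decisions, V(G) suggests 11 basis tests while full path coverage would need 1024, which is why complexity is the more usable signal.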
Reminder: Mutation testing is a coverage analysis of an existing test suite. It is not a test type. It produces a score and a list of survived mutants; it does not produce new tests. You apply it to your unit / integration suite, not instead of it.
Mutation testing introduces small, syntactic modifications ("mutants") to the source and re-runs the existing test suite against each mutant. If at least one test fails for a given mutant, the mutant is killed (the suite detected the fault). If all tests pass, the mutant survived (the suite is blind to that change). It measures test-suite fault-detection power, not source-code reach (Stryker docs).
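A minimal sketch of the kill/survive mechanic, with everything hand-written: `original`, `mutant`, and the two tests are hypothetical, and a real tool would generate the mutant and drive the actual test runner instead.

```javascript
// Hypothetical free-shipping rule and one hand-written mutant (>= changed to >).
const original = (total) => total >= 100;
const mutant = (total) => total > 100;

// A "test" here is a function that throws when its assertion fails.
const weakTest = (impl) => {
  if (impl(150) !== true) throw new Error("150 should qualify");
};
const boundaryTest = (impl) => {
  if (impl(100) !== true) throw new Error("100 should qualify");
};

// Run a suite against an implementation and report the mutant's fate.
function status(impl, suite) {
  try {
    suite.forEach((test) => test(impl));
    return "survived";
  } catch {
    return "killed";
  }
}

console.log(status(mutant, [weakTest]));               // survived
console.log(status(mutant, [weakTest, boundaryTest])); // killed
```

Both suites execute the same single line, so structural coverage is identical; only the second one notices the change.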
Typical mutation operators:
- `+` → `-`, `*` → `/`, `++` → `--`.
- `<` → `<=`, `==` → `!=`, `&&` → `||`.
- `true` → `false`, remove `!`.
- `return x` → `return null` / `return ""`.
- `>` → `>=`.

| State | Meaning |
|-------|---------|
| Killed | At least one test failed on the mutant. Suite detected the fault. |
| Survived | All tests passed on the mutant. Suite is blind. |
| No coverage | No test executed the mutated code (orthogonal gap: the code itself is untested). |
| Timeout | Tests hung; usually counted as a kill (the suite did observe abnormal behavior). |
| Compile error / runtime error | Mutant is syntactically/semantically invalid; usually filtered. |
| Ignored | Filtered by config (generated code, glue, etc.). |

Score: mutation_score = killed_mutants / (total_mutants - equivalent_mutants - errors). Some tools also report a "killed%" relative to covered mutants only.
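The score formula above can be worked through with made-up counts. In this sketch `mutationScore` is a hypothetical helper, timeouts are counted as kills (as the state table says is usual), and `equivalent` is assumed to be a manually identified subset of the survivors.

```javascript
// Hypothetical helper; counts are illustrative, not from a real run.
function mutationScore(counts) {
  const { killed, timeout, survived, noCoverage, errors, equivalent } = counts;
  const total = killed + timeout + survived + noCoverage + errors;
  const detected = killed + timeout;         // assumption: timeouts count as kills
  const valid = total - equivalent - errors; // denominator from the formula
  return {
    score: detected / valid,
    coveredOnlyScore: detected / (valid - noCoverage), // relative to covered mutants only
  };
}

const result = mutationScore({
  killed: 70, timeout: 2, survived: 18, noCoverage: 10, errors: 5, equivalent: 3,
});
console.log(result.score.toFixed(3), result.coveredOnlyScore.toFixed(3));
```

With these counts the overall score is 72/97 and the covered-only score is 72/87; the gap between the two is the untested code, not weak assertions.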
- `no coverage`, identical to "line not covered."

| Ecosystem | Tool |
|-----------|------|
| JS / TS | Stryker (StrykerJS); TypeScript checker plugin filters compile-error mutants |
| .NET (C#) | Stryker .NET; documented in Microsoft Learn |
| Java / JVM | PIT (Pitest), the reference standard for the JVM; Major Mutator for research |
| Python | mutmut, Cosmic Ray, MutPy |
| Go | go-mutesting, ooze |
| PHP | Infection |
| Ruby | mutant |
| Rust | cargo-mutants |
| C / C++ | Mull (LLVM-based) |
| Scala | Stryker4s |

Apply when:
- `no coverage` and you learn nothing new beyond what structural coverage already shows.

Skip when:
Stryker defaults (config): high: 80, low: 60, break: null. Set break to fail the build below a floor. Apply with the risk caveat: 60–80% on a mature unit suite over pure-logic core is a reasonable starting point; never on glue code. Common pitfalls: chasing equivalents (asymptote), running on UI/config (noise), running on shallow suites (re-reports what coverage already shows).
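A sketch of how those thresholds might look in a StrykerJS config object. The `mutate` glob, the choice of the Jest runner, and the `break: 60` floor are assumptions for illustration; in a real project this object would be the default export of `stryker.config.mjs`.

```javascript
/** @type {import('@stryker-mutator/api/core').PartialStrykerOptions} */
const config = {
  testRunner: "jest",               // assumes the Jest runner plugin is installed
  mutate: ["src/core/**/*.js"],     // hypothetical pure-logic directory; glue code stays out
  thresholds: {
    high: 80,                       // default: scores at or above this are reported as good
    low: 60,                        // default: scores below this are reported as poor
    break: 60,                      // assumption: fail the build below this floor (default is null)
  },
};

console.log(config.thresholds);
```

Restricting `mutate` to the pure-logic core keeps the floor meaningful; the same number applied to UI or config code mostly measures noise.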
Mutation testing subsumes and supplements structural coverage:
- `no coverage`: identical signal to "line not covered."

Mapping between specification artifacts (requirements, user stories, acceptance criteria, regulatory clauses) and verifying tests:
requirements_coverage = requirements_with_>=1_passing_test / total_requirements
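The ratio can be computed from any structure that links requirement IDs to test results. The `rtm` object below is a hypothetical in-memory matrix with made-up acceptance criteria; real data would come from an ALM export or from tags in test names.

```javascript
// Hypothetical RTM: acceptance-criterion ID -> linked tests with their latest result.
const rtm = {
  "AC-1": [{ test: "logs in with valid credentials", passed: true }],
  "AC-2": [{ test: "locks account after 5 failures", passed: false }],
  "AC-3": [
    { test: "rejects mismatched passwords", passed: true },
    { test: "rejects empty confirmation", passed: true },
  ],
  "AC-4": [], // no linked test at all
};

const ids = Object.keys(rtm);
// A requirement counts as covered when at least one linked test passes.
const covered = ids.filter((id) => rtm[id].some((t) => t.passed));
const uncovered = ids.filter((id) => !covered.includes(id));
const requirementsCoverage = covered.length / ids.length;

console.log(requirementsCoverage, uncovered); // 0.5 [ 'AC-2', 'AC-4' ]
```

The list of uncovered IDs is usually more actionable than the percentage: `AC-2` has a failing test and `AC-4` has none, which are different problems.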
The foundational artifact is the Requirements Traceability Matrix (RTM) — a two-dimensional table correlating requirements to test cases (ISTQB Glossary). Enables:
| Tool | Type | Notes |
|------|------|-------|
| Jira + Xray / Zephyr / Test Manager | ALM | Stories ↔ tests linked in tickets |
| Polarion, IBM DOORS / DOORS Next, Codebeamer | Regulated-domain ALM | Tier 1 for DO-178C / ISO 26262 |
| Spreadsheet + tags in test names | Lightweight | Works for small teams; deteriorates at scale |
| BDD scenario reports | BDD-aligned | Cucumber + Pickles report generator |
| @Tag("AC-123") style annotations | Code-level | JUnit / pytest tag-based linking to AC IDs |
In BDD ecosystems, Gherkin Scenario: blocks are the unit of acceptance coverage; each AC ideally maps to one or more scenarios (the Cardinal Rule of BDD: one scenario, one behavior — Automation Panda). Tools: Cucumber (multi-language), SpecFlow (note: the active community fork is Reqnroll), behave (Python), pytest-bdd, Robot Framework, Behat (PHP).
Foundational standards: ISTQB Foundation Level treats RTM as a foundational artifact for systematic test design. ISO/IEC/IEEE 29119 parts 1–5 require traceability at the test-plan, test-design, and test-execution levels (ISTQB-to-29119 mapping in rcolomo.com). Regulatory traceability is mandatory in DO-178C, ISO 26262 Part 6 work products, IEC 62304 verification record, and FDA 21 CFR Part 820.
- `it("AC-3: rejects mismatched passwords")`).

100% requirements coverage is a reasonable goal: every documented AC should have at least one test. The pitfalls are qualitative, not numerical:
API coverage measures the integration surface — endpoints, methods, status codes, payload fields, and inter-service contracts — exercised by the test suite, independent of code coverage.
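One way to compute endpoint coverage is hit (method, path template) pairs over documented pairs. The sketch below is a simplified collector: the `documented` list and `observed` requests are hypothetical, and the template matcher only handles `:param` segments without escaping other regex characters.

```javascript
// Hypothetical documented surface, as (method, path template) pairs from an API spec.
const documented = ["GET /users/:id", "POST /users", "DELETE /users/:id", "GET /orders/:id"];

// Hypothetical requests observed while the test suite ran.
const observed = [
  { method: "GET", path: "/users/123" },
  { method: "GET", path: "/users/456" },
  { method: "POST", path: "/users" },
];

// Turn a template such as "/users/:id" into a regex that matches concrete paths.
const toRegex = (template) => new RegExp("^" + template.replace(/:[^/]+/g, "[^/]+") + "$");

const hit = new Set();
for (const { method, path } of observed) {
  for (const entry of documented) {
    const [docMethod, template] = entry.split(" ");
    if (docMethod === method && toRegex(template).test(path)) hit.add(entry);
  }
}

const endpointCoverage = hit.size / documented.length;
console.log(endpointCoverage, [...hit]); // 0.5 [ 'GET /users/:id', 'POST /users' ]
```

Matching concrete paths back to templates keeps `/users/123` and `/users/456` from being counted as two separate endpoints.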
- `(unique_endpoint_method_pairs_hit) / (total_documented_endpoint_method_pairs)`. Often broken down further by `(endpoint, method, status_code)` triples.
- `(method, path_template, status)` triples observed during test runs vs the OpenAPI spec.
- `/users/123` vs `/users/:id`) is a common defect in custom collectors. Undocumented endpoints showing 0% when they are the riskiest. 5xx error paths are rarely tested.
- `oneOf` / `anyOf` variants exercised across the test suite.
- `oneOf` variants are common blind spots; deprecated fields linger uncovered.
- `test_suite_id` to attribute spans to test runs; service-mesh telemetry (Istio, Linkerd).
- `test_suite_id` tags missing on out-of-band background work.
- `orderTotal` with partitions `{<0, 0-99, 100-499, ≥500}` has 4 EP slots plus `(B-1, B, B+1)` boundary triples at 0, 100, 500 (9 boundary slots).
- `(B-1, B, B+1)` triple. Pitfall: counting one BVA test as "covering" a partition.
- `t=3`); ordering effects in workflows.
- `t` up to 6), CATS, Jenny, Hexawise.
- `catch` blocks, `Result.Err` branches, `if (err != nil)` branches, and explicit error return paths exercised.
- `catch` blocks.
- `catch` blocks in coverage reports; they are usually the suite's largest blind spot.

Different coverage axes detect different bug classes. Combining them is multiplicative, not additive.
Worked example — input validator for a discount code. Consider if (code.length > 0 && code.startsWith("PROMO")). A single happy-path test with "PROMO10" produces:
| Axis added | What it newly catches | What still hides |
|------------|----------------------|------------------|
| Line only | Everything executes once: 100% line. | Both empty-string and non-PROMO rejection paths are untested; a regression that drops the startsWith check ships green. |
| + Branch | Forces a false-branch test, e.g. "". Now the empty-string case is exercised. | A weak assertion (expect(result).toBeDefined()) still kills no bugs; mutating startsWith → endsWith leaves the suite green because no test asserts a specific outcome. |
| + Mutation | Killing mutants such as startsWith → endsWith, or one that drops the startsWith check, forces tests with specific assertions on the acceptance and rejection outcomes. (The > → >= mutant is equivalent here, because any string that starts with "PROMO" is already non-empty, so no test can kill it.) | A boundary value like code.length === MAX is still untested; that requires Boundary / data coverage. |
| + Data (BVA) | Adds the MAX, MAX+1, and null cases. | A second consumer that calls this validator with code = undefined is still uncaught — that requires Contract coverage. |
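The first three layers of the table can be replayed in miniature. Everything here is a hand-written assumption: the three mutants stand in for what a tool would generate, and each suite is a list of `[input, check]` pairs where the weak check mirrors `toBeDefined()`.

```javascript
const original = (code) => code.length > 0 && code.startsWith("PROMO");

// Hand-written mutants (hypothetical; a mutation tool would generate these).
const mutants = {
  "startsWith -> endsWith": (code) => code.length > 0 && code.endsWith("PROMO"),
  "drop startsWith check": (code) => code.length > 0,
  "> -> >=": (code) => code.length >= 0 && code.startsWith("PROMO"),
};

const isDefined = (r) => r !== undefined; // weak assertion, like toBeDefined()
const suites = {
  "line only": [["PROMO10", isDefined]],
  "+ branch": [["PROMO10", isDefined], ["", isDefined]],
  "+ mutation": [
    ["PROMO10", (r) => r === true],
    ["", (r) => r === false],
    ["SAVE10", (r) => r === false],
  ],
};

// A mutant survives a suite when every check still passes against it.
const survivors = (suite) =>
  Object.keys(mutants).filter((name) =>
    suite.every(([input, check]) => check(mutants[name](input)))
  );

for (const [layer, suite] of Object.entries(suites)) {
  console.log(layer, survivors(suite));
}
```

The weak-assertion suites leave all three mutants alive. The strict suite kills two; the `>=` variant still survives because it behaves identically to the original for every string, which makes it an equivalent mutant rather than a test gap.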
Each new axis closes a class of fault the previous axes structurally cannot see — branch alone can never detect weak assertions, mutation alone can never detect untested boundary values, and contract coverage alone can never detect logic bugs. That is the multiplicative effect. The table below lists each axis and the blind spots it has — read it as "to fill this blind spot, layer another axis."
| Axis | Blind to |
|------|----------|
| Statement / line | Untaken branches, missing assertions, wrong return values |
| Branch / decision | Compound-condition independence, loop iteration counts, assertion strength |
| MC/DC | Data-value boundaries (still need BVA), assertion strength |
| Path | Combinatorial explosion at runtime; impractical beyond toy functions |
| Mutation | Dead-code regions (no coverage); semantic correctness of assertions (a wrong-but-strict assertion still kills mutants) |
| RTM / requirements | Whether the requirement is complete; whether the test is correct |
| Endpoint / API | Field-level shape correctness; behavioral correctness |
| Contract | Negative space: provider features no consumer uses |
| Schema | Semantic correctness; field meaning |
| Pairwise | 3-way+ interactions; ordering effects |
| State-transition | Data carried across transitions; non-modeled behavior |
| Error-path | Whether the error was handled correctly |

A typical layered stack for non-regulated product code:
This stack costs marginal CI minutes and surfaces the majority of test-suite blind spots that matter in practice.
The single most common reason high coverage fails to reflect quality:
```js
// 100% statement coverage. 0% useful.
it("computes discount", () => {
  discount(150); // line executed
  // no assertion
});
```
Or, more subtly:
```jsx
// 100% statement coverage. Verifies nothing semantic.
it("renders form", () => {
  expect(render(<Form />)).toMatchSnapshot();
});
```
Both tests execute every line they touch and produce green coverage reports. Both are detected by mutation testing: every operator mutation in discount survives because no assertion can fail; every change in <Form /> survives because the snapshot can be updated with --update-snapshot. Coverage alone cannot distinguish these from a strong suite. Mutation testing can.
Counter-measure: when a coverage report shows ≥80% with persistent bugs in production, run mutation analysis on the suspect modules before raising the coverage target.
Coverage targets should be proportional to risk (ISO/IEC/IEEE 29119). The same artifact at different criticalities should not have the same coverage gate. Use the table below as a starting point — adjust per artifact, not per team policy.
| Criticality | Reasonable structural target | Mutation? | RTM? | MC/DC? |
|-------------|------------------------------|-----------|------|--------|
| NONE (docs, throwaway) | n/a | No | No | No |
| LOW (internal tooling) | 60% line | No | Lightweight | No |
| MEDIUM (CRUD / standard product) | 70–80% branch | On core logic only | Per AC | No |
| MEDIUM-HIGH (user-facing critical path) | 80% branch | On core logic + key validators | Mandatory | No |
| HIGH (money, auth, security) | 90%+ branch | Required on pure-logic core | Mandatory + audit-grade | Where standard mandates |
| Regulated | Per standard | Recommended on pure-logic core | Mandatory + audit-grade | Required where standard mandates |

Delta thresholds ("must not drop more than X%") are generally safer than absolute floors — they protect against regression in testing discipline without rewarding gaming (Optivem). The exception is greenfield code, where there is no baseline to delta against.
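A delta gate of this kind might look like the sketch below. `coverageGate`, its parameter names, and the default values (a 1-point allowed drop, a 70% greenfield floor) are all hypothetical; the fallback to an absolute floor when no baseline exists is one reading of the greenfield exception.

```javascript
// Hypothetical CI gate. Percentages are in points (e.g. 82 means 82%).
function coverageGate({ current, baseline = null, maxDrop = 1, greenfieldFloor = 70 }) {
  if (baseline === null) {
    // Greenfield: nothing to delta against, so fall back to an absolute floor.
    return { pass: current >= greenfieldFloor, reason: `no baseline; floor is ${greenfieldFloor}%` };
  }
  const drop = baseline - current;
  return { pass: drop <= maxDrop, reason: `drop of ${drop.toFixed(1)} points (max ${maxDrop})` };
}

console.log(coverageGate({ current: 81.5, baseline: 82 })); // passes: small drop
console.log(coverageGate({ current: 78, baseline: 82 }));   // fails: 4-point drop
console.log(coverageGate({ current: 65 }));                 // fails: below greenfield floor
```

A rise in coverage yields a negative drop and always passes, so the gate never rewards padding the number beyond the existing baseline.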
- `expect(component).toMatchSnapshot()` covers every line but verifies nothing semantic; one careless `--update-snapshot` invalidates the suite. Counter: cap snapshot contribution, or require an accompanying behavior assertion.
- `a && b && doX()` instead of `if (a && b) doX();`), tooling-dependent gaming.
- `// coverage:ignore` comment with a rationale that survives code review.
- `*.generated.*` deterministically.
- `coverage_build_behavior ≠ release_build_behavior`. Especially relevant for C/C++/Rust and MC/DC: the coverage-build binary is not the release-build binary.
- `-covermode=atomic`, JaCoCo offline instrumentation).

| Standard | Domain | Coverage requirement |
|----------|--------|----------------------|
| DO-178C §6.4.4.2, Table A-7 | Avionics | MC/DC required for DAL A; Decision coverage for DAL B; Statement coverage for DAL C; none for DAL D/E (LDRA structural coverage) |
| ISO 26262 Part 6 Table 9 | Automotive | MC/DC highly recommended for ASIL D; branch for ASIL B–C; statement minimum for ASIL A (Verifysoft) |
| IEC 62304 | Medical | Coverage scales with safety class A → B → C; auditors expect MC/DC for Class C high-risk modules; MISRA C/C++ coding standards customary alongside |
| MISRA C / MISRA C++ | Embedded (cross-industry) | Coding-rule conformance is the primary signal; coverage is complementary; MC/DC expected for safety-critical components |
| EN 50128 / EN 50657 | Rail | MC/DC required for SIL 4 |
| IEC 61508 | Functional-safety umbrella | MC/DC for SIL 4 |

In regulated domains, the coverage target is set by the standard, not negotiated against ROI. The choice is which standard applies, not what threshold to pick.
These are lookup tables, not steps. Read the row that matches your artifact and criticality; ignore the rest.
| Criticality | Recommended coverage signals |
|-------------|------------------------------|
| NONE | None (explicit skip) |
| LOW | Line coverage as tripwire; AC linkage if the work has ACs |
| MEDIUM | Branch coverage + AC linkage; endpoint coverage if API artifact |
| MEDIUM-HIGH | Branch coverage + AC linkage + endpoint / contract coverage; mutation on validators |
| HIGH | Branch coverage + mutation on pure logic + AC linkage + state-transition coverage for workflows + pairwise for config; MC/DC where standard mandates |
| Regulated (DO-178C / ISO 26262 / IEC 62304) | Whatever the standard prescribes, usually MC/DC + RTM + auditable evidence |

| Artifact | Most informative coverage axes |
|----------|--------------------------------|
| Pure utility function (parser, calculator, formatter) | Branch + BVA/EP + mutation |
| HTTP endpoint with DB / queue | Branch (unit) + endpoint + contract + status-code |
| UI component | Storybook story coverage + interaction coverage + visual regression baseline; logic helpers via branch + mutation |
| Workflow engine / state machine | State-transition (0-switch minimum, 1-switch on critical paths) + branch |
| Authorization / policy module | Branch + MC/DC if regulated, otherwise branch + mutation; decision-table coverage at design time |
| Multi-parameter config / feature flag | Pairwise (PICT / ACTS) + branch |
| Public API consumed by N clients | Contract (Pact) + endpoint + schema |
| Library / SDK | Branch + property-based; mutation on pure logic; published-API surface coverage |
| Generated code | Excluded from structural coverage; verify via integration only |

| Ecosystem | Structural | Mutation | API | Combinatorial | BDD |
|-----------|-----------|----------|-----|---------------|-----|
| JS / TS | Istanbul/nyc, Vitest, Jest | Stryker | Schemathesis, Pact JS, Dredd | PICT | Cucumber-JS, Vitest+Gherkin |
| Java / JVM | JaCoCo | PIT | Pact JVM, Spring Cloud Contract | jcunit / NIST ACTS | Cucumber-JVM |
| Python | Coverage.py + pytest-cov | mutmut, Cosmic Ray | Schemathesis, pact-python | NIST ACTS, allpairspy | behave, pytest-bdd |
| Go | go cover + gocover-cobertura | go-mutesting | Schemathesis, pact-go | (manual / NIST ACTS) | godog |
| .NET | Coverlet, dotCover, AltCover (OpenCover deprecated) | Stryker .NET | Pact .NET, Specmatic | PICT, NIST ACTS | Reqnroll (active fork of SpecFlow) |
| Ruby | SimpleCov | mutant | pact-ruby | (manual) | Cucumber, RSpec |
| C / C++ | gcov/lcov, llvm-cov | Mull | (vendor-specific) | NIST ACTS | (vendor-specific) |
| Rust | cargo-llvm-cov, tarpaulin | cargo-mutants | (limited) | proptest combinators | (limited) |
| PHP | Xdebug / PCOV | Infection | pact-php | (manual) | Behat |

For UI-heavy projects, these specialized coverage signals supplement (do not replace) structural and mutation analysis:
- `play` function) vs only a rendered snapshot.
- `toHaveScreenshot`). Coverage element = component-state visual baseline.

These do not measure logic correctness (use branch + mutation for that), cross-component flows (use e2e), accessibility (use axe-core / Pa11y), or real-device variations (use BrowserStack / Sauce / device farms).