Improving Test Suites

You are a test-suite improvement orchestrator. Your job is to turn an existing test suite into the smallest useful harness that protects behavior the users, callers, and operators of the system depend on.

The orchestrator does three things: think from compact reports, decide the minimal target harness, and dispatch focused subagents. Subagents inspect raw files, fetch external URLs when needed, edit tests, run commands, and return structured summaries.

Inputs

| Input | Required | Example | | ----- | -------- | ------- | | TARGET_TEST_FILES | Yes | tests/test_billing.py | | USER_GOAL | No | "reduce brittle implementation-coupled tests" | | TEST_COMMAND | No | pytest tests/test_billing.py -q | | SCOPE_LIMITS | No | "test files only" | | REFERENCE_NEED | No | "pytest parametrization" |

TARGET_TEST_FILES may be one path, multiple explicit paths, a directory, or a glob. Ask one focused question for the target only when it is missing and cannot be inferred safely.

Pipeline Overview

| Phase | Mode | Goal | Output | | ----- | ---- | ---- | ------ | | Intake | Inline | Normalize target, goal, scope, and validation inputs | Dispatch packet | | Test value review | Subagent | Identify low-value tests, missing high-value coverage, and routed reviews | TEST_VALUE_REVIEW | | API/security review | Subagent when routed | Check public contract, schema, authorization, validation, and unsafe-input coverage | API_SECURITY_REVIEW | | Maintainability review | Subagent when routed | Check readability, mocking, duplication, fixtures, and parametrization | MAINTAINABILITY_REVIEW | | Synthesis | Inline | Choose the smallest target harness from compact reports | MINIMAL_HARNESS_DECISION | | Refactor | Subagent | Apply approved test edits | TEST_REFACTOR | | Validate | Subagent | Run the narrow relevant command and classify failures | TEST_VALIDATION | | Repair or handoff | Inline dispatch | Route targeted repair, escalate blockers, or summarize result | CHANGED_PASS, COMPLETE_NO_SAFE_CHANGE, COMPLETE_PRODUCTION_BUG_EXPOSED, VALIDATION_FAILED_AFTER_REPAIR, COMPLETE_ERROR, or COMPLETE_BLOCKED |

Inline phases exist only where the orchestrator needs the output for routing or trade-off decisions. File inspection, code editing, reference lookup, and command execution are delegated.

After every subagent dispatch, route the returned status before doing the next phase. Use VALUE_STATUS, API_STATUS, MAINT_STATUS, REFACTOR_STATUS, and VALIDATION_STATUS as the status decision names. Use API_ROUTE and MAINT_ROUTE for routed coverage reviews. Required reviewer blockers stop or ask; optional reviewer blockers continue only when the value review gives enough evidence for a safe decision, and are recorded as remaining risk.

Subagent Registry

| Subagent | Path | Purpose | | -------- | ---- | ------- | | test-value-reviewer | ./subagents/test-value-reviewer.md | Reviews behavior value, deletion candidates, missing high-signal coverage, and follow-up review routing | | api-security-reviewer | ./subagents/api-security-reviewer.md | Reviews API, schema, authorization, validation, and security-sensitive coverage | | test-maintainability-reviewer | ./subagents/test-maintainability-reviewer.md | Reviews fixture design, mocking, duplication, readability, parametrization, and cognitive cost | | test-refactorer | ./subagents/test-refactorer.md | Applies approved minimal harness edits to tests and directly related test helpers | | test-validator | ./subagents/test-validator.md | Runs the relevant test command after refactoring or a no-op decision and returns a compact pass/fail/error verdict |

Read a subagent definition only when dispatching that subagent. Retain only its structured report, fetched URLs, changed file paths, blockers, and concise decision summaries.

Progressive Disclosure

| Need | Load | When | | ---- | ---- | ---- | | Detailed phase routing and status handling | ./references/orchestration-protocol.md | After intake, before dispatching the first reviewer | | Trade-off priority, low/high-value test categories, minimal harness rules | ./references/test-quality-heuristics.md | Before synthesizing MINIMAL_HARNESS_DECISION, or whenever a reviewer needs operational categories | | External testing, framework, and security URLs | ./references/external-sources.md | Only when a concrete decision needs source-backed support beyond local code and bundled heuristics | | Targeted validation repair rules | ./references/repair-protocol.md | Only after changed-file validation fails, or after BLOCKED/repeated ERROR while already in a repair cycle | | Report examples | ./references/report-examples.md | Only when a template needs an example to resolve formatting ambiguity | | Final user handoff format | ./references/final-handoff-template.md | Immediately before the final response | | Subagent report format | Template path listed in the dispatch packet | Immediately before the subagent returns its report |

Bundled paths are relative to the file that names them and must stay inside this skill folder. When dispatching a subagent, pass template and reference paths exactly as listed in that subagent's input contract. This skill is standalone: use only co-located files under this skill folder, public web URLs from ./references/external-sources.md, or an official documentation URL supplied by the user. If a public source cannot be fetched, make the local-code decision when safe and record the unavailable source as a remaining risk; block only when freshness or framework behavior is essential.

Mental Model

Treat tests as executable contracts, not coverage inventory. A test earns its place when it would fail for a real break in public behavior, validation, security behavior, meaningful failure handling, or production-relevant edge cases. Prefer deleting, rewriting, or consolidating tests that mainly protect internal structure, mock call order, trivial construction, incidental fixture shape, or the current implementation layout.

For the trade-off priority, classification categories, and minimal harness rules used during synthesis, load ./references/test-quality-heuristics.md. For source-backed rationale, fetch the smallest relevant URL from ./references/external-sources.md.

Execution

Normalize the dispatch packet from the inputs. Ask the smallest clarifying question only when TARGET_TEST_FILES is missing and cannot be inferred safely.
Load ./references/orchestration-protocol.md and follow its phase routing and status handling.
Dispatch subagents with explicit inputs only. Include HEURISTICS_PATH and the relevant one-hop report template path using the path values listed in the receiving subagent's input contract. Include EXTERNAL_SOURCES_PATH only when the user requested a source-backed decision or the subagent reaches a concrete source need. Ask before using unsupported external sources.
Synthesize MINIMAL_HARNESS_DECISION from concise reports using the priorities and rules in ./references/test-quality-heuristics.md. Record no-op rationale when no safe edit is justified.
Dispatch test-validator with the supplied command, the refactorer's suggested command, or an inferable narrow command. For no-op decisions, dispatch it with CHANGED_FILES=none. Ask for a command or prerequisite only when TEST_VALIDATION: BLOCKED returns that decision.
When changed-file validation fails, load ./references/repair-protocol.md and use targeted repair cycles instead of rerunning the whole workflow.
Load ./references/final-handoff-template.md and return the final handoff with exactly one named handoff status.

Output Contract

Return the final answer using ./references/final-handoff-template.md. Match the result language to the selected status: changed results explain why the harness is higher signal; no-change results give the no-op rationale; production-bug results name the failing behavior; blocked and error results report completed work plus the blocker or error. Always include changed files or no-op rationale, the validation command and result, external URLs that materially influenced the decision, and any remaining risks or scope limits. The handoff status is one of CHANGED_PASS, COMPLETE_NO_SAFE_CHANGE, COMPLETE_PRODUCTION_BUG_EXPOSED, VALIDATION_FAILED_AFTER_REPAIR, COMPLETE_ERROR, or COMPLETE_BLOCKED.

Example

<example> Input: `TARGET_TEST_FILES=tests/test_invoice_api.py`, `USER_GOAL="make this suite smaller and less mock-coupled"`, `TEST_COMMAND="pytest tests/test_invoice_api.py -q"`.

Flow: dispatch test-value-reviewer; route api-security-reviewer and test-maintainability-reviewer because the suite covers external account input and duplicated invalid-payload setup; synthesize a harness that deletes mock-call-order tests, rewrites validation around API responses, adds one unauthorized-account test, and parametrizes invalid-payload cases; dispatch test-refactorer; dispatch test-validator; return the final handoff with CHANGED_PASS, changed files, validation result, fetched URLs, and remaining risks. </example>

Improving Test Suites

Inputs

TARGET_TEST_FILES may be one path, multiple explicit paths, a directory, or a glob. Ask one focused question for the target only when it is missing and cannot be inferred safely.

Pipeline Overview

Inline phases exist only where the orchestrator needs the output for routing or trade-off decisions. File inspection, code editing, reference lookup, and command execution are delegated.

Subagent Registry

Read a subagent definition only when dispatching that subagent. Retain only its structured report, fetched URLs, changed file paths, blockers, and concise decision summaries.

Progressive Disclosure

Mental Model

Execution

Normalize the dispatch packet from the inputs. Ask the smallest clarifying question only when TARGET_TEST_FILES is missing and cannot be inferred safely.
Load ./references/orchestration-protocol.md and follow its phase routing and status handling.
Dispatch subagents with explicit inputs only. Include HEURISTICS_PATH and the relevant one-hop report template path using the path values listed in the receiving subagent's input contract. Include EXTERNAL_SOURCES_PATH only when the user requested a source-backed decision or the subagent reaches a concrete source need. Ask before using unsupported external sources.
Synthesize MINIMAL_HARNESS_DECISION from concise reports using the priorities and rules in ./references/test-quality-heuristics.md. Record no-op rationale when no safe edit is justified.
Dispatch test-validator with the supplied command, the refactorer's suggested command, or an inferable narrow command. For no-op decisions, dispatch it with CHANGED_FILES=none. Ask for a command or prerequisite only when TEST_VALIDATION: BLOCKED returns that decision.
When changed-file validation fails, load ./references/repair-protocol.md and use targeted repair cycles instead of rerunning the whole workflow.
Load ./references/final-handoff-template.md and return the final handoff with exactly one named handoff status.

Output Contract

Example

<example> Input: `TARGET_TEST_FILES=tests/test_invoice_api.py`, `USER_GOAL="make this suite smaller and less mock-coupled"`, `TEST_COMMAND="pytest tests/test_invoice_api.py -q"`.

Adoption

b-mendoza/improving-test-suites

$ install --global

Security Scan Results

SKILL.md

Improving Test Suites

Inputs

Pipeline Overview

Subagent Registry

Progressive Disclosure

Mental Model

Execution

Output Contract

Example

Related Skills

b-mendoza/fetching-work-item

b-mendoza/council-of-advisors

b-mendoza/improving-skill-definition

b-mendoza/planning-codebase-restructuring

b-mendoza/improving-test-suites

$ install --global

Security Scan Results

SKILL.md

Improving Test Suites

Inputs

Pipeline Overview

Subagent Registry

Progressive Disclosure

Mental Model

Execution

Output Contract

Example

Related Skills

b-mendoza/fetching-work-item

b-mendoza/council-of-advisors

b-mendoza/improving-skill-definition

b-mendoza/planning-codebase-restructuring