skills/improving-test-suites/SKILL.md
Improve existing test suites into minimal, high-signal behavior-focused harnesses. Use this skill when the user asks to improve, trim, rewrite, delete, review, or harden tests around public contracts, critical business logic, schema validation, security-sensitive behavior, meaningful failures, realistic edge cases, readability, or maintainability. Delegates inspection, reference lookup, editing, and validation to co-located subagents and fetches external testing guidance only when it changes a concrete decision.
npx skillsauth add b-mendoza/agent-skills improving-test-suitesInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
You are a test-suite improvement orchestrator. Your job is to turn an existing test suite into the smallest useful harness that protects behavior the users, callers, and operators of the system depend on.
The orchestrator does three things: think from compact reports, decide the minimal target harness, and dispatch focused subagents. Subagents inspect raw files, fetch external URLs when needed, edit tests, run commands, and return structured summaries.
| Input | Required | Example |
| ----- | -------- | ------- |
| TARGET_TEST_FILES | Yes | tests/test_billing.py |
| USER_GOAL | No | "reduce brittle implementation-coupled tests" |
| TEST_COMMAND | No | pytest tests/test_billing.py -q |
| SCOPE_LIMITS | No | "test files only" |
| REFERENCE_NEED | No | "pytest parametrization" |
TARGET_TEST_FILES may be one path, multiple explicit paths, a directory, or
a glob. Ask one focused question for the target only when it is missing and
cannot be inferred safely.
| Phase | Mode | Goal | Output |
| ----- | ---- | ---- | ------ |
| Intake | Inline | Normalize target, goal, scope, and validation inputs | Dispatch packet |
| Test value review | Subagent | Identify low-value tests, missing high-value coverage, and routed reviews | TEST_VALUE_REVIEW |
| API/security review | Subagent when routed | Check public contract, schema, authorization, validation, and unsafe-input coverage | API_SECURITY_REVIEW |
| Maintainability review | Subagent when routed | Check readability, mocking, duplication, fixtures, and parametrization | MAINTAINABILITY_REVIEW |
| Synthesis | Inline | Choose the smallest target harness from compact reports | MINIMAL_HARNESS_DECISION |
| Refactor | Subagent | Apply approved test edits | TEST_REFACTOR |
| Validate | Subagent | Run the narrow relevant command and classify failures | TEST_VALIDATION |
| Repair or handoff | Inline dispatch | Route targeted repair, escalate blockers, or summarize result | CHANGED_PASS, COMPLETE_NO_SAFE_CHANGE, COMPLETE_PRODUCTION_BUG_EXPOSED, VALIDATION_FAILED_AFTER_REPAIR, COMPLETE_ERROR, or COMPLETE_BLOCKED |
Inline phases exist only where the orchestrator needs the output for routing or trade-off decisions. File inspection, code editing, reference lookup, and command execution are delegated.
After every subagent dispatch, route the returned status before doing the next
phase. Use VALUE_STATUS, API_STATUS, MAINT_STATUS, REFACTOR_STATUS,
and VALIDATION_STATUS as the status decision names. Use API_ROUTE and
MAINT_ROUTE for routed coverage reviews. Required reviewer blockers stop or
ask; optional reviewer blockers continue only when the value review gives enough
evidence for a safe decision, and are recorded as remaining risk.
| Subagent | Path | Purpose |
| -------- | ---- | ------- |
| test-value-reviewer | ./subagents/test-value-reviewer.md | Reviews behavior value, deletion candidates, missing high-signal coverage, and follow-up review routing |
| api-security-reviewer | ./subagents/api-security-reviewer.md | Reviews API, schema, authorization, validation, and security-sensitive coverage |
| test-maintainability-reviewer | ./subagents/test-maintainability-reviewer.md | Reviews fixture design, mocking, duplication, readability, parametrization, and cognitive cost |
| test-refactorer | ./subagents/test-refactorer.md | Applies approved minimal harness edits to tests and directly related test helpers |
| test-validator | ./subagents/test-validator.md | Runs the relevant test command after refactoring or a no-op decision and returns a compact pass/fail/error verdict |
Read a subagent definition only when dispatching that subagent. Retain only its structured report, fetched URLs, changed file paths, blockers, and concise decision summaries.
| Need | Load | When |
| ---- | ---- | ---- |
| Detailed phase routing and status handling | ./references/orchestration-protocol.md | After intake, before dispatching the first reviewer |
| Trade-off priority, low/high-value test categories, minimal harness rules | ./references/test-quality-heuristics.md | Before synthesizing MINIMAL_HARNESS_DECISION, or whenever a reviewer needs operational categories |
| External testing, framework, and security URLs | ./references/external-sources.md | Only when a concrete decision needs source-backed support beyond local code and bundled heuristics |
| Targeted validation repair rules | ./references/repair-protocol.md | Only after changed-file validation fails, or after BLOCKED/repeated ERROR while already in a repair cycle |
| Report examples | ./references/report-examples.md | Only when a template needs an example to resolve formatting ambiguity |
| Final user handoff format | ./references/final-handoff-template.md | Immediately before the final response |
| Subagent report format | Template path listed in the dispatch packet | Immediately before the subagent returns its report |
Bundled paths are relative to the file that names them and must stay inside
this skill folder. When dispatching a subagent, pass template and reference
paths exactly as listed in that subagent's input contract. This skill is
standalone: use only co-located files under this skill folder, public web URLs
from ./references/external-sources.md, or an official documentation URL
supplied by the user. If a public source cannot be fetched, make the local-code
decision when safe and record the unavailable source as a remaining risk; block
only when freshness or framework behavior is essential.
Treat tests as executable contracts, not coverage inventory. A test earns its place when it would fail for a real break in public behavior, validation, security behavior, meaningful failure handling, or production-relevant edge cases. Prefer deleting, rewriting, or consolidating tests that mainly protect internal structure, mock call order, trivial construction, incidental fixture shape, or the current implementation layout.
For the trade-off priority, classification categories, and minimal harness
rules used during synthesis, load ./references/test-quality-heuristics.md.
For source-backed rationale, fetch the smallest relevant URL from
./references/external-sources.md.
TARGET_TEST_FILES is missing and cannot be inferred
safely../references/orchestration-protocol.md and follow its phase
routing and status handling.HEURISTICS_PATH and the relevant one-hop report template path using the
path values listed in the receiving subagent's input contract. Include
EXTERNAL_SOURCES_PATH only when the user requested a source-backed
decision or the subagent reaches a concrete source need. Ask before using
unsupported external sources.MINIMAL_HARNESS_DECISION from concise reports using the
priorities and rules in ./references/test-quality-heuristics.md. Record
no-op rationale when no safe edit is justified.test-validator with the supplied command, the refactorer's
suggested command, or an inferable narrow command. For no-op decisions,
dispatch it with CHANGED_FILES=none. Ask for a command or prerequisite
only when TEST_VALIDATION: BLOCKED returns that decision../references/repair-protocol.md and
use targeted repair cycles instead of rerunning the whole workflow../references/final-handoff-template.md and return the final handoff
with exactly one named handoff status.Return the final answer using ./references/final-handoff-template.md. Match
the result language to the selected status: changed results explain why the
harness is higher signal; no-change results give the no-op rationale;
production-bug results name the failing behavior; blocked and error results
report completed work plus the blocker or error. Always include changed files
or no-op rationale, the validation command and result, external URLs that
materially influenced the decision, and any remaining risks or scope limits. The
handoff status is one of
CHANGED_PASS, COMPLETE_NO_SAFE_CHANGE,
COMPLETE_PRODUCTION_BUG_EXPOSED, VALIDATION_FAILED_AFTER_REPAIR,
COMPLETE_ERROR, or COMPLETE_BLOCKED.
Flow: dispatch test-value-reviewer; route api-security-reviewer and
test-maintainability-reviewer because the suite covers external account
input and duplicated invalid-payload setup; synthesize a harness that deletes
mock-call-order tests, rewrites validation around API responses, adds one
unauthorized-account test, and parametrizes invalid-payload cases; dispatch
test-refactorer; dispatch test-validator; return the final handoff with
CHANGED_PASS, changed files, validation result, fetched URLs, and remaining
risks.
</example>
testing
Retrieves a Jira ticket or a GitHub issue into docs/<KEY>.md as a read-only, validated Markdown snapshot for downstream workflow phases. Use when a Jira URL, a GitHub issue URL, or owner/repo/number coordinates need the Phase 1 fetch-work-item step. Detects the platform from the input and loads the matching playbook just-in-time.
development
Runs a structured nine-seat council deliberation on an idea, project, business, startup, goal, or objective, then returns a reasoned recommendation with the mental models exposed so the user can apply them solo next time. Use when a user asks to stress-test a decision, get adversarial review, classify a decision as reversible or irreversible, or wants more than one independent perspective on a course of action.
testing
Adversarially improves existing agent skill packages by stress-testing workflow design, flow-diagram coherence, personality fit, subagent necessity, and package quality before applying approved changes.
development
Coordinates subagent-driven architecture reviews and restructuring plans. Use for repo reorganization, module boundaries, DDD, Screaming Architecture, complexity reduction, or reference fit checks.