kit/plugins/code-testing-agent/skills/reviewing-tests/SKILL.md
Audits tests for quality, coverage, and code alignment. Identifies gaps, redundant tests, and misaligned assertions. Use when the user asks to review, audit, or improve a test suite.
npx skillsauth add shawn-sandy/agentics reviewing-testsInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Review existing tests against the source code they cover. Identify quality issues, coverage gaps, and misalignment with developer intent. Every finding is tied to a specific test and explains why it matters.
Freedom level: Flexible — Follow these steps in order. Adapt depth to the test suite's size and the user's request.
Does not run tests — use running-tests. Does not suggest broad new test suites — use code-testing-agent. It may suggest targeted new P1 tests to cover critical gaps as described in Step 7.
Before doing any other work, use TodoWrite to create todos for each step. This gives the user visibility into progress.
Create the following todos (all starting with status: "pending"):
Mark each todo status: "completed" as you finish that step.
Determine which test files to review using this priority order:
git diff --name-only HEAD~1 to find recently changed files. For each changed source file, glob for matching test files:
[name].test.{ts,tsx,js,jsx}, [name].spec.{ts,tsx,js,jsx}test_[name].py, [name]_test.py, [name]_test.go__tests__/, tests/, spec/ directories
Present the list and ask the user to confirm which test files to review.Once resolved, tell the user which test file(s) will be reviewed before proceeding.
Read each target test file in full.
For each test file, find the implementation code it covers:
auth.test.ts, look for auth.ts in the same directory or a parent src/ directory.__tests__/ or tests/, the source is usually in a parallel src/ or project root directory.[test-file] test?"Read each source file in full. Tell the user: "Reviewing tests in [test-file] against source code in [source-file]."
Understanding the developer's intent is critical. Tests should verify the code works as designed. Search for an implementation plan or design context:
Search locations (stop at first match):
docs/plans/ directory — glob for *.md files; if multiple exist, match by filename similarity to the source code (e.g., if reviewing tests for auth-service.ts, look for plans containing "auth" in the name)~/.claude/plans/ directory — same matching logicgit log --oneline -5 for commit messages that reference a plan or describe intent// TODO, // PLAN:, // PURPOSE:, docstrings, or JSDoc @description tags in the source code itselfIf a plan is found:
Read it and extract:
Report to the user: "Found plan: [path]. Using it to evaluate test alignment with intended behavior."
If no plan is found:
Report: "No implementation plan found. I will evaluate tests against behavior inferred from the source code." This is not an error — proceed to Step 4.
Read the source code and identify the following.
Write a 2-4 sentence summary of the code's purpose and primary behavior. This anchors the review — every coverage gap in Step 6 traces back to something identified here.
Identify the paths through the code that matter most:
if/else, switch, early returns) that produce different outcomes.Where does this code connect to other systems?
What does this code promise to its callers that is not enforced by types alone?
Identify fragile areas:
Search for test configuration:
package.json — look for jest, vitest, mocha, ava, playwright, cypress in devDependencies or scriptspytest.ini, pyproject.toml, setup.cfg — Python test configurationCargo.toml — Rust [dev-dependencies] sectiongo.mod — Go uses built-in testing package.github/workflows/ — CI config may reference test commandsMakefile or justfile — may define test targetsSearch for coverage thresholds in project configuration:
jest.config.* or package.json — look for coverageThresholdpyproject.toml — look for [tool.coverage.report] with fail_under.nycrc or .nycrc.json — look for check-coverage and threshold valuescodecov.yml — look for coverage.status.project.default.target.coveragerc — look for [report] with fail_under.github/workflows/ or CI config — look for coverage enforcement flagsReport the detected target: "Coverage target: [X]% (from [config file])."
If no target is found: "No coverage target configured. Will evaluate coverage completeness against all identified code paths."
This is the core output. Load references/test-quality-checklist.md for detailed heuristics on each review dimension. If the file is unavailable, apply the nine review dimensions using the criteria defined in the Review Dimensions list below.
Read every test in the target file(s) and evaluate against the source code analysis from Step 4. For each finding, reference the specific test by name and line number.
Evaluate each test across these 9 dimensions:
When reviewing multiple test files, group findings by test file:
## Test Review for `[test-filename]`
**Source code:** `[source-file]`
**Plan context:** [Brief note or "No plan found — evaluating against inferred behavior"]
**Test framework:** [Detected framework]
### Summary
[2-3 sentence overview: how many tests, what they cover well, where the biggest gaps are]
### Critical Issues
Issues that make tests unreliable, misleading, or actively harmful.
#### Issue: [Descriptive name]
**Test:** `[test name or describe block]` (line [N])
**Problem:** [What's wrong]
**Impact:** [Why this matters — false confidence, missed bugs, CI noise]
**Fix:**
```[language]
// Before → After, or concrete replacement code
```
[Repeat for each critical issue]
### Improvements
Non-critical issues that would make tests more valuable or maintainable.
[Same format as Critical Issues]
### Coverage Gaps
Behaviors identified in the source code analysis (Step 4) that have no corresponding test.
| Untested Behavior | Source Reference | Priority | Why It Matters |
|-------------------|-----------------|----------|----------------|
| [behavior] | [file:line] | P1/P2/P3 | [what breaks undetected] |
**Coverage target:** [X]% | No target configured
**Estimated current gap:** [qualitative: "3 of 8 exported functions have no test"]
### What's Working Well
[1-3 things the tests do right — reinforce good practices]
Follow these rules when evaluating tests:
Behavior over implementation. Flag tests that assert on internal method calls rather than outcomes. "When given an expired token, returns 401" is good. "Calls jwt.verify() once" is bad — it breaks on refactor.
Plan intent drives review, coverage validates completeness. Check tests against plan requirements. Use coverage analysis to verify nothing important is missed. If the project defines a coverage target, evaluate how existing tests contribute to or fall short of it.
One reason to fail per test. Flag tests with multiple unrelated assertions. A test that checks login success, email format, and database write is three tests.
Name tests as behavior sentences. Flag vague test names. "should reject negative quantities in order line items" is good. "test order processing" is bad.
Prioritize by blast radius. Focus review attention on tests that guard the most critical code paths first. A missing test for an auth bypass matters more than a poorly named test for a tooltip.
Acknowledge what's covered. Credit existing tests that work well. Not every test needs improvement.
Evaluate mocking strategy. Flag over-mocking (mocking the thing you're testing), under-mocking (real HTTP calls in unit tests), and stale mocks (mock returns data in a format the real API no longer uses).
Coverage gaps over trivial style issues. Missing tests for critical paths matter more than test naming conventions. Prioritize findings that would actually catch bugs over aesthetic preferences.
After presenting the review, ask:
"Would you like me to apply the fixes? I can update the test file(s) to address the critical issues and improvements above."
If the user says yes:
Apply fixes to the test file(s) using the project's conventions. Include:
After applying, suggest: "Run [test command] to verify the updated tests pass."
If the user says no or wants to fix manually:
Respond: "The review above should guide your improvements. Let me know if you want to discuss any finding in more detail."
If the user asks to fix only specific issues:
Apply only the requested fixes. Do not add unrequested changes.
development
Turns a React component into a social card with preview, code, and props table. Builds a static preview and screenshots react-card.html via Playwright. Use when asked to share a React component.
data-ai
Refine-prompt: interviews users and assembles a structured AI prompt using Anthropic best-practice techniques. Use when the user runs /plan-agent:refine-prompt or asks to refine a prompt.
development
Plan review Agent Team. Reviews HTML implementation plans in parallel, synthesizes findings, and applies improvements in place. Use when the user asks to review or improve an implementation plan.
data-ai
Craft-prompt: interviews users and assembles a structured AI prompt using Anthropic best-practice techniques. Use when the user runs /plan-agent:craft-prompt or asks to craft a prompt.