plugins/pensive/skills/test-review/SKILL.md
Evaluates test suites for coverage gaps, TDD/BDD compliance, and anti-patterns. Use when auditing test quality or before a major release.
npx skillsauth add athola/claude-night-market test-reviewInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
test-review:languages-detected)test-review:coverage-inventoried)test-review:scenario-quality)test-review:gap-remediation)test-review:evidence-logged)Evaluate and improve test suites with TDD/BDD rigor.
/test-review
Verification: Run pytest -v to verify tests pass.
test-review:languages-detectedtest-review:coverage-inventoriedtest-review:scenario-qualitytest-review:invariant-preservationtest-review:gap-remediationtest-review:evidence-loggedLoad modules as needed based on review depth:
modules/framework-detection.mdmodules/coverage-analysis.mdmodules/scenario-quality.mdmodules/remediation-planning.mdtest-review:languages-detected)Identify testing frameworks and version constraints.
→ See: modules/framework-detection.md
Quick check:
find . -maxdepth 2 -name "Cargo.toml" -o -name "pyproject.toml" -o -name "package.json" -o -name "go.mod"
Verification: Run the command with --help flag to verify availability.
test-review:coverage-inventoried)Run coverage tools and identify gaps.
→ See: modules/coverage-analysis.md
Quick check:
git diff --name-only | rg 'tests|spec|feature'
Verification: Run pytest -v to verify tests pass.
test-review:scenario-quality)Evaluate test quality using BDD patterns and assertion checks.
→ See: modules/scenario-quality.md
Focus on:
test-review:gap-remediation)Create concrete improvement plan with owners and dates.
→ See: modules/remediation-planning.md
test-review:evidence-logged)Record executed commands, outputs, and recommendations.
→ See: imbue:proof-of-work
Tests do not just verify behavior — they encode design invariants. A test that asserts "module A never imports from module B" encodes a layer boundary. A test that asserts "this function is pure" encodes a concurrency model. These tests are load-bearing in ways that coverage metrics cannot capture.
During review, check:
Were invariant-encoding tests removed or weakened? A test that enforced an architectural boundary, data structure constraint, or API contract should not be deleted without naming the invariant being abandoned and escalating to human judgment.
Were test expectations changed to match a broken implementation? If an assertion value changed, ask: did the requirement change, or did the agent change the test to make its code pass? The latter is the single most dangerous form of test tampering.
Are new invariants encoded as tests? When a design decision is made (choice of data structure, module boundary, error strategy), there should be at least one test whose failure would signal that the invariant was violated.
Red flag patterns:
| Pattern | Risk |
|---------|------|
| @pytest.mark.skip added to a passing test | Invariant being silently dropped |
| Assertion changed from specific to broad | Constraint being relaxed |
| Test renamed to describe new behavior | Old invariant erased from history |
| Test deleted "because it tested old code" | Invariant removed without replacement |
When invariant erosion is detected:
Do NOT approve. Flag as a BLOCKING quality issue and present the three options to the human:
This is a judgment call that models get wrong far too often. Default to option 1 (preserve) when no human is available.
## Summary
[Brief assessment]
## Framework Detection
- Languages: [list] | Frameworks: [list] | Versions: [constraints]
## Coverage Analysis
- Overall: X% | Critical: X% | Gaps: [list]
## Quality Issues
[Q1] [Issue] - Location - Fix
## Remediation Plan
1. [Action] - Owner - Date
## Recommendation
Approve / Approve with actions / Block
Verification: Run the command with --help flag to verify availability.
imbue:proof-of-work for reproducible evidence captureimbue:diff-analysis for risk assessmentimbue:structured-output patternsTests not discovered
Ensure test files match pattern test_*.py or *_test.py. Run pytest --collect-only to verify.
Import errors
Check that the module being tested is in PYTHONPATH or install with pip install -e .
Async tests failing
Install pytest-asyncio and decorate test functions with @pytest.mark.asyncio
tools
Detect friction signals; graduate patterns into rules. Use for session retrospectives.
testing
Use when you need a diff-derived test plan for an MR — reads the diff, groups changes by area, runs targeted verifications, and proves revert-tests are genuine guards, not dead assertions.
development
Curate the web-capture index. Use when the capture backlog grows, captures sit unprocessed at seedling/pending, or to surface stored research during work.
testing
Probe memory/summary clarity via dual anchor questions: task progress, info gaps. Use when verifying session state or summary before handoff or compression.