Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

athola/test-review

Name: test-review
Author: athola

plugins/pensive/skills/test-review/SKILL.md

npx skillsauth add athola/claude-night-market test-review

Quick Start
When to Use
Required TodoWrite Items
Progressive Loading
Workflow
Step 1: Detect Languages (test-review:languages-detected)
Step 2: Inventory Coverage (test-review:coverage-inventoried)
Step 3: Assess Scenario Quality (test-review:scenario-quality)
Step 4: Plan Remediation (test-review:gap-remediation)
Step 5: Log Evidence (test-review:evidence-logged)
Test Quality Checklist (Condensed)
Output Format
Summary
Framework Detection
Coverage Analysis
Quality Issues
Remediation Plan
Recommendation
Integration Notes
Exit Criteria

Test Review Workflow

Evaluate and improve test suites with TDD/BDD rigor.

Quick Start

/test-review

Verification: Run pytest -v to verify tests pass.

When To Use

Reviewing test suite quality
Analyzing coverage gaps
Before major releases
After test failures
Planning test improvements

When NOT To Use

Writing new tests - use parseltongue:python-testing
Updating existing tests - use sanctum:test-updates

Required TodoWrite Items

test-review:languages-detected
test-review:coverage-inventoried
test-review:scenario-quality
test-review:invariant-preservation
test-review:gap-remediation
test-review:evidence-logged
test-review:findings-verified

Progressive Loading

Load modules as needed based on review depth:

Basic review: Core workflow (this file)
Framework detection: Load modules/framework-detection.md
Coverage analysis: Load modules/coverage-analysis.md
Quality assessment: Load modules/scenario-quality.md
Remediation planning: Load modules/remediation-planning.md

Workflow

Step 1: Detect Languages (`test-review:languages-detected`)

Identify testing frameworks and version constraints. → See: modules/framework-detection.md

Quick check:

find . -maxdepth 2 -name "Cargo.toml" -o -name "pyproject.toml" -o -name "package.json" -o -name "go.mod"

Verification: Run the command with --help flag to verify availability.
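
The quick check above can also be expressed in Python when the result needs to feed later steps. This is a minimal sketch, not part of the skill: the `MANIFESTS` mapping and the `detect_languages` helper are hypothetical names introduced for illustration, and the depth limit mirrors `-maxdepth 2`.

```python
from pathlib import Path

# Assumed mapping from manifest file name to ecosystem; extend as needed.
MANIFESTS = {
    "Cargo.toml": "Rust",
    "pyproject.toml": "Python",
    "package.json": "JavaScript/TypeScript",
    "go.mod": "Go",
}


def detect_languages(root: str, max_depth: int = 2) -> dict[str, list[str]]:
    """Map each detected language to the manifest paths found under root."""
    base = Path(root)
    found: dict[str, list[str]] = {}
    for path in base.rglob("*"):
        rel = path.relative_to(base)
        # Mirror `-maxdepth 2`: direct children of root are depth 1.
        if len(rel.parts) > max_depth or not path.is_file():
            continue
        language = MANIFESTS.get(path.name)
        if language:
            found.setdefault(language, []).append(rel.as_posix())
    return {lang: sorted(paths) for lang, paths in found.items()}
```

Calling `detect_languages(".")` from the repository root returns a dictionary keyed by language, which can be pasted into the Framework Detection section of the report.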

Step 2: Inventory Coverage (`test-review:coverage-inventoried`)

Run coverage tools and identify gaps. → See: modules/coverage-analysis.md

Quick check:

git diff --name-only | rg 'tests|spec|feature'

Verification: Run pytest -v to verify tests pass.
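
For cases where `rg` is not installed, the same filter can be sketched in Python. The helper names below are hypothetical; the regular expression is copied from the quick check, and `git diff --name-only` is invoked exactly as above.

```python
import re
import subprocess

# Same pattern as the rg filter in the quick check.
TEST_PATH = re.compile(r"tests|spec|feature")


def filter_test_paths(paths: list[str]) -> list[str]:
    """Keep only the paths that contain one of the test-related substrings."""
    return [p for p in paths if TEST_PATH.search(p)]


def changed_test_files() -> list[str]:
    """Return changed, unstaged paths that look test-related (requires git)."""
    out = subprocess.run(
        ["git", "diff", "--name-only"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return filter_test_paths(out.splitlines())
```

Because the pattern is a plain substring match, it also accepts paths such as `src/inspector.py` (which contains `spec`), so the output is best treated as a starting list to review rather than a precise inventory.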

Step 3: Assess Scenario Quality (`test-review:scenario-quality`)

Evaluate test quality using BDD patterns and assertion checks. → See: modules/scenario-quality.md

Focus on:

Given/When/Then clarity
Assertion specificity
Anti-patterns (dead waits, mocking internals, repeated boilerplate)
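
As an illustration of the first two focus points, the sketch below shows a test with explicit Given/When/Then steps and an assertion on an exact value. The `Cart` class is hypothetical and exists only to give the tests something to exercise; the tests use plain `assert` so they run with or without pytest.

```python
from dataclasses import dataclass, field


@dataclass
class Cart:
    """Hypothetical class under test, defined here only for illustration."""

    items: dict[str, int] = field(default_factory=dict)

    def add(self, sku: str, qty: int) -> None:
        if qty <= 0:
            raise ValueError(f"qty must be positive, got {qty}")
        self.items[sku] = self.items.get(sku, 0) + qty


def test_adding_same_sku_twice_accumulates_quantity():
    # Given a cart that already holds two units of a SKU
    cart = Cart()
    cart.add("sku-1", 2)
    # When three more units of the same SKU are added
    cart.add("sku-1", 3)
    # Then the quantities accumulate (exact value, not just truthiness)
    assert cart.items == {"sku-1": 5}, f"unexpected cart contents: {cart.items}"


def test_rejects_non_positive_quantity():
    # Given an empty cart
    cart = Cart()
    # When a zero quantity is added, Then a ValueError names the constraint
    try:
        cart.add("sku-1", 0)
    except ValueError as exc:
        assert "positive" in str(exc)
    else:
        raise AssertionError("expected ValueError for qty=0")
```

A weaker version of the first test would assert only `assert cart.items`, which passes for any non-empty cart and is the kind of broad assertion this step is meant to flag.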

Step 4: Plan Remediation (`test-review:gap-remediation`)

Create concrete improvement plan with owners and dates. → See: modules/remediation-planning.md

Step 5: Log Evidence (`test-review:evidence-logged`)

Record executed commands, outputs, and recommendations. → See: imbue:proof-of-work

Test Quality Checklist (Condensed)

[ ] Clear test structure (Arrange-Act-Assert)
[ ] Critical paths covered (auth, validation, errors)
[ ] Specific assertions with context
[ ] No flaky tests (dead waits, order dependencies)
[ ] Reusable fixtures/factories
[ ] Invariant-encoding tests intact (see below)

Invariant-Encoding Tests

Tests encode design invariants as well as verifying behavior. A test that asserts "module A never imports from module B" encodes a layer boundary. A test that asserts "this function is pure" encodes a concurrency model. These tests are load-bearing in ways that coverage metrics cannot capture.
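
A layer-boundary invariant of the "module A never imports from module B" kind could be encoded roughly as follows. This is a sketch under stated assumptions: the `src/domain` path and the `infrastructure` package name are hypothetical, and only absolute imports are inspected.

```python
import ast
from pathlib import Path


def imported_modules(source: str) -> set[str]:
    """Return the absolute module names imported by a piece of Python source."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module)
    return names


def violations(package_dir: str, forbidden: str) -> list[str]:
    """List files under package_dir that import `forbidden` or a submodule of it."""
    bad = []
    for path in sorted(Path(package_dir).rglob("*.py")):
        mods = imported_modules(path.read_text(encoding="utf-8"))
        if any(m == forbidden or m.startswith(forbidden + ".") for m in mods):
            bad.append(str(path))
    return bad


def test_domain_never_imports_infrastructure():
    # Hypothetical layout: src/domain must stay independent of infrastructure.
    assert violations("src/domain", "infrastructure") == []
```

A test like this contributes almost nothing to line coverage, yet deleting it removes the only automated guard on the boundary, which is why it deserves separate attention during review.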

During review, check:

Were invariant-encoding tests removed or weakened? A test that enforced an architectural boundary, data structure constraint, or API contract should not be deleted without naming the invariant being abandoned and escalating to human judgment.
Were test expectations changed to match a broken implementation? If an assertion value changed, ask: did the requirement change, or did the agent change the test to make its code pass? The latter is the single most dangerous form of test tampering.
Are new invariants encoded as tests? When a design decision is made (choice of data structure, module boundary, error strategy), there should be at least one test whose failure would signal that the invariant was violated.

Red flag patterns:

| Pattern | Risk |
|---------|------|
| @pytest.mark.skip added to a passing test | Invariant being silently dropped |
| Assertion changed from specific to broad | Constraint being relaxed |
| Test renamed to describe new behavior | Old invariant erased from history |
| Test deleted "because it tested old code" | Invariant removed without replacement |

When invariant erosion is detected:

Do NOT approve. Flag as a BLOCKING quality issue and present the three options to the human:

1. Preserve: Revert the test change, fix the implementation to satisfy the invariant
2. Layer: Keep the invariant test, add the new behavior alongside it (accepting inelegance)
3. Revise: The invariant is genuinely wrong; remove the old test AND write a new test encoding the replacement invariant

This is a judgment call that models get wrong far too often. Default to option 1 (preserve) when no human is available.

Output Format

## Summary
[Brief assessment]

## Framework Detection
- Languages: [list] | Frameworks: [list] | Versions: [constraints]

## Coverage Analysis
- Overall: X% | Critical: X% | Gaps: [list]

## Quality Issues
[Q1] [Issue] - Location - Anchor: `verbatim source text at file:line` - Fix

## Remediation Plan
1. [Action] - Owner - Date

## Recommendation
Approve / Approve with actions / Block

Verification: Run the command with --help flag to verify availability.

Integration Notes

Use imbue:proof-of-work for reproducible evidence capture
Reference imbue:diff-analysis for risk assessment
Format output using imbue:structured-output patterns

Verify Findings Are Grounded (`test-review:findings-verified`)

Every finding must cite a real location and a verbatim anchor. Write findings to .review/findings.json and confirm each citation resolves:

python plugins/imbue/scripts/citation_verifier.py \
  --findings .review/findings.json --repo-root .

Drop or label UNVERIFIED any finding the verifier fails (exit 1); only verified findings enter the report. See Skill(imbue:review-core) Step 5 and Skill(imbue:structured-output) for the schema.

Exit Criteria

Frameworks detected and documented
Coverage analyzed and gaps identified
Scenario quality assessed
Remediation plan created with owners and dates
Evidence logged with citations
Every reported finding carries a Location + verbatim Anchor confirmed by citation_verifier.py (exit 0), or unverified findings were dropped or labeled UNVERIFIED

Troubleshooting

Common Issues

Tests not discovered: Ensure test files match the pattern test_*.py or *_test.py. Run pytest --collect-only to verify.

Import errors: Check that the module being tested is on PYTHONPATH, or install it with pip install -e .

Async tests failing: Install pytest-asyncio and decorate test functions with @pytest.mark.asyncio

Evaluates test suites for coverage gaps, TDD/BDD compliance, and anti-patterns. Use when auditing test quality or before a major release.

317 stars

testing

Updated Jun 28, 2026

npx skillsauth add athola/claude-night-market test-review

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy: Container and dependency vulnerability scanner. Clean (95%)
Semgrep: Static code analysis for vulnerabilities. Clean (95%)
mcp-scan (Snyk): Model Context Protocol security validation. Clean (95%)
Snyk (dep): Open source security scanning. Skipped (50%)
Socket.dev: Supply chain security analysis. Skipped (50%)
VirusTotal: Multi-engine malware detection. Skipped (50%)
CrowdStrike: Advanced threat intelligence. Skipped (50%)
OSV-Scanner: Open Source Vulnerability database check. Skipped (50%)
OWASP Dep-Check: Skipped (50%)

Last scanned: Jun 28, 2026, 4:08 AM (109.7s, 6 files scanned)

SKILL.md

name: test-review
description: Evaluates test suites for coverage gaps, TDD/BDD compliance, and anti-patterns. Use when auditing test quality or before a major release.
alwaysApply: false
category: testing
tools: []
complexity: intermediate
model_hint: standard
estimated_tokens: 200
progressive_loading: true
- imbue:structured-output

Related Skills

athola/architecture-paradigm-domain-driven (data-ai)
Models a business in its own language. Use when the domain has real business rules to capture.
323 | SKILL.md | Updated Jul 15, 2026

athola/ideate (research)
Generate diverse solution candidates with category-spanning ideation methods and rotation. Use when stuck on a design or fighting repetitive LLM output.
323 | SKILL.md | Updated Jun 8, 2026

athola/validate-pr (development)
Generates and self-executes a diff-derived test plan for a PR. Use when validating PR changes before merge. Do not use for code review; use sanctum:pr-review.
323 | SKILL.md | Updated Jun 8, 2026

athola/graduated-implementation (development)
Ramps implementation ambition a notch only after the prior increment is understood. Use when building a feature you must understand, not just ship.
323 | SKILL.md | Updated Jun 8, 2026

Install manually

# Clone the repo
git clone https://github.com/athola/claude-night-market.git

# Copy into Claude Code skills folder (global)
cp -r claude-night-market/plugins/pensive/skills/test-review ~/.claude/skills/

Repository

athola/claude-night-market

317 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT
