skills/test-rx/SKILL.md
Evaluates testing strategy and completeness across 8 dimensions (32 sub-metrics): test pyramid balance, test effectiveness, contract/API testing, UI/visual testing, performance/load testing, test data management, CI integration, and test organization. Produces a scored diagnostic with actionable improvement plans.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add acardozzo/rx-suite test-rx
Recommended: lighthouse, pa11y
Check all dependencies: bash scripts/rx-deps.sh or bash scripts/rx-deps.sh --install
Evaluate whether a codebase tests the right things at the right level. This is not about coverage percentages — it is about testing architecture, strategy completeness, and long-term maintainability.
D1: Test Pyramid Balance (15%)
Source: Test Pyramid (Martin Fowler), Practical Test Pyramid (Ham Vocke)

| ID | Sub-Metric | What It Measures |
|----|-----------|-----------------|
| M1.1 | Unit test ratio | % of tests that are true unit tests (no I/O, no network, no DB) |
| M1.2 | Integration test coverage | API/DB integration tests present and covering critical paths |
| M1.3 | E2E test coverage | Critical user journeys covered by end-to-end tests |
| M1.4 | Pyramid shape | unit > integration > E2E count (not ice cream cone anti-pattern) |
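The M1.1 ratio and the M1.4 shape check can be illustrated with a small sketch. The `SuiteCounts` container, the function names, and the "irregular" label are hypothetical and not part of the skill. How tests are classified into levels is assumed to happen elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuiteCounts:
    """Hypothetical container: number of tests found at each level."""
    unit: int
    integration: int
    e2e: int


def unit_ratio(counts: SuiteCounts) -> float:
    """Share of all tests that are unit tests (M1.1); 0.0 for an empty suite."""
    total = counts.unit + counts.integration + counts.e2e
    return counts.unit / total if total else 0.0


def pyramid_shape(counts: SuiteCounts) -> str:
    """Classify the suite shape for M1.4 using strict count comparisons."""
    if counts.unit > counts.integration > counts.e2e:
        return "pyramid"
    if counts.e2e > counts.integration > counts.unit:
        return "ice-cream-cone"
    # Assumption: anything else is reported as "irregular" for manual review.
    return "irregular"
```

For example, `pyramid_shape(SuiteCounts(unit=120, integration=30, e2e=8))` returns `"pyramid"`, while a suite with more E2E than integration tests and more integration than unit tests is flagged as an ice cream cone.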
D2: Test Effectiveness (15%)
Source: Mutation Testing (Pitest, Stryker), Google Testing Blog

| ID | Sub-Metric | What It Measures |
|----|-----------|-----------------|
| M2.1 | Mutation score | % of mutants killed (via Stryker, Pitest, or similar) |
| M2.2 | Assertion density | Assertions per test, distinguishing strong from weak assertions |
| M2.3 | Test-to-code coupling | Tests break for the right reasons, not tied to implementation details |
| M2.4 | False positive rate | Flaky test tracking, quarantine process, retry policy |
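As a rough illustration of M2.1 and M2.2, the sketch below computes the plain "% of mutants killed" and an average assertion count per test. Both function names are hypothetical. Real mutation tools apply their own rules about which mutants count toward the score, so treat this as the simplest possible reading of the table rather than any tool's exact formula.

```python
def mutation_score(killed: int, total_mutants: int) -> float:
    """Percentage of mutants killed (simplified reading of M2.1)."""
    if total_mutants <= 0:
        raise ValueError("total_mutants must be positive")
    if not 0 <= killed <= total_mutants:
        raise ValueError("killed must be between 0 and total_mutants")
    return 100.0 * killed / total_mutants


def assertion_density(assertions: int, tests: int) -> float:
    """Average number of assertions per test (M2.2); 0.0 when there are no tests."""
    return assertions / tests if tests else 0.0
```

A density figure alone does not separate strong assertions from weak ones, so it is best read together with the mutation score.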
D3: Contract & API Testing (10%)
Source: Consumer-Driven Contracts (Pact), Schemathesis

| ID | Sub-Metric | What It Measures |
|----|-----------|-----------------|
| M3.1 | Contract test coverage | Pact or consumer-driven contract tests for API boundaries |
| M3.2 | Schema validation tests | OpenAPI/Zod/JSON Schema compliance tests |
| M3.3 | API integration tests | Real HTTP calls testing (not mocked handlers) |
| M3.4 | Backward compatibility tests | Breaking change detection in APIs |
D4: UI & Visual Testing (10%)
Source: Chromatic, Percy, Playwright Visual Comparisons

| ID | Sub-Metric | What It Measures |
|----|-----------|-----------------|
| M4.1 | Component tests | Storybook/Testing Library for isolated UI component tests |
| M4.2 | Visual regression | Screenshot comparison integrated in CI |
| M4.3 | Accessibility testing in tests | axe-core or similar a11y checks in test suite |
| M4.4 | Cross-browser testing | Playwright/Cypress multi-browser configuration |
D5: Performance & Load (10%)
Source: k6, Artillery, Lighthouse CI

| ID | Sub-Metric | What It Measures |
|----|-----------|-----------------|
| M5.1 | Load test existence | k6/Artillery/Locust/Gatling scripts present |
| M5.2 | Performance budgets | Lighthouse CI thresholds, bundle size limits enforced |
| M5.3 | Benchmark tests | Response time baselines with regression detection |
| M5.4 | Stress & soak tests | Breaking point documented, memory leak detection |
D6: Test Data Management (15%)
Source: Test Data Management patterns, Factory pattern (fishery, factory_bot)

| ID | Sub-Metric | What It Measures |
|----|-----------|-----------------|
| M6.1 | Test factories | Factory functions used (not inline object literals everywhere) |
| M6.2 | Database isolation | Per-test cleanup via transactions, truncation, or containers |
| M6.3 | Seed data management | Reproducible, versioned, environment-specific seeds |
| M6.4 | Mock & stub quality | Mock factories, no over-mocking, contract-based mocks |
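To make the M6.1 distinction concrete, here is a minimal factory-function sketch. The `User` model and `make_user` helper are hypothetical and written in plain Python rather than with fishery or factory_bot. The idea is that each test overrides only the fields it cares about instead of repeating a full object literal.

```python
import itertools
from dataclasses import dataclass


@dataclass
class User:
    """Hypothetical domain object used only for this illustration."""
    id: int
    email: str
    is_admin: bool = False


_ids = itertools.count(1)  # simple sequence so each default user is unique


def make_user(**overrides) -> User:
    """Build a valid User with unique defaults; keyword arguments override them."""
    n = next(_ids)
    fields = {"id": n, "email": f"user{n}@example.test", "is_admin": False}
    fields.update(overrides)
    return User(**fields)
```

A test that needs an administrator would call `make_user(is_admin=True)` and leave the remaining fields to the factory.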
D7: CI Integration (15%)
Source: Continuous Delivery (Humble & Farley), DORA Metrics

| ID | Sub-Metric | What It Measures |
|----|-----------|-----------------|
| M7.1 | Test parallelization | Sharded/split test runs in CI |
| M7.2 | Fail-fast strategy | Unit tests run first, E2E last in pipeline |
| M7.3 | Test caching | Only re-run affected/changed tests |
| M7.4 | Test reporting | JUnit XML output, coverage reports, trend tracking |
D8: Test Organization (10%)
Source: xUnit Test Patterns (Gerard Meszaros)

| ID | Sub-Metric | What It Measures |
|----|-----------|-----------------|
| M8.1 | Test file structure | Co-located with source or consistent mirror structure |
| M8.2 | Test naming conventions | Descriptive, behavior-focused test names |
| M8.3 | Shared test utilities | Custom helpers, matchers, fixtures, test builders |
| M8.4 | Test documentation | Test plan, coverage requirements, testing guide present |
1. DISCOVER ─ Run discover.sh to scan the codebase
2. ANALYZE ─ 4 parallel agents score dimensions
3. SCORE ─ Aggregate into weighted scorecard
4. PRESCRIBE ─ Generate improvement plan with priorities
Run the discovery script to collect raw signals from the codebase:
bash ~/.claude/skills/test-rx/scripts/discover.sh "$PROJECT_ROOT"
This produces test-rx-discovery.json with counts, file lists, and pattern matches for all 32 sub-metrics.
Launch 4 agents in parallel, each covering 2 dimensions:
| Agent | Dimensions | Weight |
|-------|-----------|--------|
| Agent A | D1 (Pyramid Balance) + D2 (Effectiveness) | 30% |
| Agent B | D3 (Contract/API) + D4 (UI/Visual) | 20% |
| Agent C | D5 (Performance) + D6 (Data Management) | 25% |
| Agent D | D7 (CI Integration) + D8 (Organization) | 25% |
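The agent weights above should equal the sum of the per-dimension weights listed in the scorecard template. A small consistency check, sketched here with hypothetical names (`DIMENSION_WEIGHTS`, `AGENTS`, `check_agent_weights`), shows one way to verify that before scoring.

```python
# Per-dimension weights in percent, copied from the scorecard template.
DIMENSION_WEIGHTS = {
    "D1": 15, "D2": 15, "D3": 10, "D4": 10,
    "D5": 10, "D6": 15, "D7": 15, "D8": 10,
}

# Agent assignments and declared weights, copied from the agent table.
AGENTS = {
    "A": (("D1", "D2"), 30),
    "B": (("D3", "D4"), 20),
    "C": (("D5", "D6"), 25),
    "D": (("D7", "D8"), 25),
}


def check_agent_weights(agents: dict, weights: dict) -> list:
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    for name, (dims, declared) in agents.items():
        actual = sum(weights[d] for d in dims)
        if actual != declared:
            problems.append(f"Agent {name}: declared {declared}%, dimensions sum to {actual}%")
    if sum(weights.values()) != 100:
        problems.append("dimension weights do not sum to 100%")
    return problems
```

With the values from this document the check returns an empty list.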
Each agent writes its dimension results to test-rx-d{N}.json.

Combine all dimension scores into the final scorecard:
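FINAL SCORE = SUM(dimension_score * dimension_weight)
Because each dimension is scored out of 10 and the weights sum to 100%, the weighted sum is multiplied by 10 to report the total out of 100.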
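The aggregation can be sketched as follows. This is one reading of the formula, assuming dimension scores on the 0-10 scale used in the scorecard and weights given in percent, so the weighted sum is divided by 10 to land on a 0-100 total. The names `WEIGHTS_PCT` and `final_score` are hypothetical.

```python
# Weights in percent, taken from the scorecard template (they sum to 100).
WEIGHTS_PCT = {
    "D1": 15, "D2": 15, "D3": 10, "D4": 10,
    "D5": 10, "D6": 15, "D7": 15, "D8": 10,
}


def final_score(scores: dict) -> float:
    """Weighted total out of 100 from per-dimension scores out of 10."""
    missing = set(WEIGHTS_PCT) - set(scores)
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    for dim in WEIGHTS_PCT:
        if not 0 <= scores[dim] <= 10:
            raise ValueError(f"{dim} score must be between 0 and 10")
    # score (0-10) * weight (percent) sums to at most 1000; divide by 10 for 0-100.
    return sum(scores[dim] * WEIGHTS_PCT[dim] for dim in WEIGHTS_PCT) / 10
```

For instance, scores of 8, 6, 4, 2, 3, 8, 7 and 6 for D1 through D8 give a weighted total of 58.5.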
Generate the scorecard in this format:
============================================================
TEST-RX DIAGNOSTIC SCORECARD
Project: {project_name}
Date: {date}
Final Score: {score}/100 — Grade: {grade}
============================================================
D1 Test Pyramid Balance ████████░░ {score}/10 (15%)
D2 Test Effectiveness ██████░░░░ {score}/10 (15%)
D3 Contract & API Testing ████░░░░░░ {score}/10 (10%)
D4 UI & Visual Testing ██░░░░░░░░ {score}/10 (10%)
D5 Performance & Load ███░░░░░░░ {score}/10 (10%)
D6 Test Data Management ████████░░ {score}/10 (15%)
D7 CI Integration ███████░░░ {score}/10 (15%)
D8 Test Organization ██████░░░░ {score}/10 (10%)
WEIGHTED TOTAL: {total}/100
Grade Scale:
90-100 A Exemplary testing strategy
80-89 B Strong with minor gaps
70-79 C Adequate, clear improvement areas
60-69 D Significant strategy gaps
<60 F Testing strategy needs overhaul
============================================================
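The grade bands and the ten-segment bars in the template can be produced by two small helpers. This is a sketch with hypothetical function names. It treats each band as a lower bound (so 89.5 is still a B), which the scale does not state explicitly.

```python
def grade(total: float) -> str:
    """Map a 0-100 total to a letter using the grade scale's lower bounds."""
    if total >= 90:
        return "A"
    if total >= 80:
        return "B"
    if total >= 70:
        return "C"
    if total >= 60:
        return "D"
    return "F"


def bar(score: float, width: int = 10) -> str:
    """Render a 0-10 dimension score as filled and empty block characters."""
    filled = round(score * width / 10)
    filled = max(0, min(width, filled))  # clamp in case of out-of-range input
    return "█" * filled + "░" * (width - filled)
```

`bar(8)` yields `████████░░`, matching the D1 example row in the template.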
After generating the scorecard and saving the report to docs/audits/:

1. Copy the report to docs/rx-plans/{this-skill-name}/{date}-report.md
2. Invoke the rx-plan skill to create or update the improvement plan at docs/rx-plans/{this-skill-name}/{dimension}/v{N}-{date}-plan.md
3. Update docs/rx-plans/{this-skill-name}/summary.md with current scores
4. Update docs/rx-plans/dashboard.md with overall progress

This happens automatically; the user does not need to run /rx-plan separately.
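The path templates above can be filled in mechanically. The helper below is only a sketch: `plan_path` is a hypothetical name, and both the ISO date format and the dimension slug in the example are assumptions, since the document does not specify how {date} or {dimension} are written.

```python
from datetime import date
from pathlib import PurePosixPath


def plan_path(skill: str, dimension: str, version: int, on: date) -> PurePosixPath:
    """Build docs/rx-plans/{skill}/{dimension}/v{N}-{date}-plan.md."""
    if version < 1:
        raise ValueError("version must be 1 or greater")
    # Assumption: {date} is rendered as YYYY-MM-DD.
    filename = f"v{version}-{on.isoformat()}-plan.md"
    return PurePosixPath("docs/rx-plans") / skill / dimension / filename
```

Calling `plan_path("test-rx", "d1-pyramid-balance", 2, date(2025, 1, 15))` gives `docs/rx-plans/test-rx/d1-pyramid-balance/v2-2025-01-15-plan.md`.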