acardozzo/test-rx

Name: test-rx
Author: acardozzo

skills/test-rx/SKILL.md

npx skillsauth add acardozzo/rx-suite test-rx

Prerequisites

Recommended: lighthouse, pa11y

Check all dependencies: bash scripts/rx-deps.sh or bash scripts/rx-deps.sh --install

test-rx: Testing Strategy Diagnostic

Purpose

Evaluate whether a codebase tests the right things at the right level. This is not about coverage percentages — it is about testing architecture, strategy completeness, and long-term maintainability.

Dimensions (8) and Sub-Metrics (32)

D1: Test Pyramid Balance (15%)

Source: Test Pyramid (Martin Fowler), Practical Test Pyramid (Ham Vocke)

| ID | Sub-Metric | What It Measures |
|----|-----------|-----------------|
| M1.1 | Unit test ratio | % of tests that are true unit tests (no I/O, no network, no DB) |
| M1.2 | Integration test coverage | API/DB integration tests present and covering critical paths |
| M1.3 | E2E test coverage | Critical user journeys covered by end-to-end tests |
| M1.4 | Pyramid shape | unit > integration > E2E count (not ice cream cone anti-pattern) |
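As a rough illustration of M1.1 and M1.4, the sketch below computes the unit test ratio and classifies the suite shape from three raw test counts. The function names and the three-way classification are assumptions made for this example; the skill's actual scoring logic is not shown in this document.

```python
def unit_test_ratio(unit: int, integration: int, e2e: int) -> float:
    """Fraction (0-1) of all tests that are unit tests (M1.1)."""
    total = unit + integration + e2e
    return unit / total if total else 0.0


def pyramid_shape(unit: int, integration: int, e2e: int) -> str:
    """Classify the suite shape (M1.4) from raw test counts."""
    if unit > integration > e2e:
        return "pyramid"
    if e2e > integration > unit:
        return "ice cream cone"
    # Anything else (ties, hourglass, and so on) is left for manual review.
    return "irregular"
```

A suite with 80 unit, 15 integration, and 5 E2E tests has a unit ratio of 0.8 and classifies as a pyramid; reversing the counts yields the ice cream cone anti-pattern.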

D2: Test Effectiveness (15%)

Source: Mutation Testing (Pitest, Stryker), Google Testing Blog

| ID | Sub-Metric | What It Measures |
|----|-----------|-----------------|
| M2.1 | Mutation score | % of mutants killed (via Stryker, Pitest, or similar) |
| M2.2 | Assertion density | Assertions per test — strong vs weak assertions |
| M2.3 | Test-to-code coupling | Tests break for the right reasons, not tied to implementation details |
| M2.4 | False positive rate | Flaky test tracking, quarantine process, retry policy |
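For M2.1, the mutation score is the share of mutants that the test suite detects. The following is a minimal sketch that only distinguishes killed from surviving mutants. Real tools such as Stryker and Pitest track additional mutant states (for example timeouts), which this example deliberately ignores, and the function name is hypothetical.

```python
def mutation_score(killed: int, survived: int) -> float:
    """Percentage of mutants killed, given only killed and surviving counts."""
    total = killed + survived
    if total == 0:
        # No mutants generated: report 0 rather than dividing by zero.
        return 0.0
    return 100.0 * killed / total
```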

D3: Contract & API Testing (10%)

Source: Consumer-Driven Contracts (Pact), Schemathesis

| ID | Sub-Metric | What It Measures |
|----|-----------|-----------------|
| M3.1 | Contract test coverage | Pact or consumer-driven contract tests for API boundaries |
| M3.2 | Schema validation tests | OpenAPI/Zod/JSON Schema compliance tests |
| M3.3 | API integration tests | Real HTTP calls testing (not mocked handlers) |
| M3.4 | Backward compatibility tests | Breaking change detection in APIs |

D4: UI & Visual Testing (10%)

Source: Chromatic, Percy, Playwright Visual Comparisons

| ID | Sub-Metric | What It Measures |
|----|-----------|-----------------|
| M4.1 | Component tests | Storybook/Testing Library for isolated UI component tests |
| M4.2 | Visual regression | Screenshot comparison integrated in CI |
| M4.3 | Accessibility testing in tests | axe-core or similar a11y checks in test suite |
| M4.4 | Cross-browser testing | Playwright/Cypress multi-browser configuration |

D5: Performance & Load Testing (10%)

Source: k6, Artillery, Lighthouse CI

| ID | Sub-Metric | What It Measures |
|----|-----------|-----------------|
| M5.1 | Load test existence | k6/Artillery/Locust/Gatling scripts present |
| M5.2 | Performance budgets | Lighthouse CI thresholds, bundle size limits enforced |
| M5.3 | Benchmark tests | Response time baselines with regression detection |
| M5.4 | Stress & soak tests | Breaking point documented, memory leak detection |

D6: Test Data Management (15%)

Source: Test Data Management patterns, Factory pattern (fishery, factory_bot)

| ID | Sub-Metric | What It Measures |
|----|-----------|-----------------|
| M6.1 | Test factories | Factory functions used (not inline object literals everywhere) |
| M6.2 | Database isolation | Per-test cleanup via transactions, truncation, or containers |
| M6.3 | Seed data management | Reproducible, versioned, environment-specific seeds |
| M6.4 | Mock & stub quality | Mock factories, no over-mocking, contract-based mocks |
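To make M6.1 concrete, here is a small sketch of the factory-function pattern that the metric rewards, as opposed to repeating full object literals in every test. The `User` type and `make_user` helper are hypothetical and exist only for this illustration.

```python
import itertools
from dataclasses import dataclass


@dataclass
class User:
    id: int
    name: str
    email: str
    is_admin: bool = False


_next_id = itertools.count(1)


def make_user(**overrides) -> User:
    """Build a valid User with defaults; a test overrides only what it cares about."""
    n = next(_next_id)
    fields = {
        "id": n,
        "name": f"user{n}",
        "email": f"user{n}@example.com",
        "is_admin": False,
    }
    fields.update(overrides)
    return User(**fields)
```

A test about admin permissions can then call `make_user(is_admin=True)` and leave every other field to the factory, which keeps the test focused on the behavior under test.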

D7: CI Integration (15%)

Source: Continuous Delivery (Humble & Farley), DORA Metrics

| ID | Sub-Metric | What It Measures |
|----|-----------|-----------------|
| M7.1 | Test parallelization | Sharded/split test runs in CI |
| M7.2 | Fail-fast strategy | Unit tests run first, E2E last in pipeline |
| M7.3 | Test caching | Only re-run affected/changed tests |
| M7.4 | Test reporting | JUnit XML output, coverage reports, trend tracking |

D8: Test Organization & Maintainability (10%)

Source: xUnit Test Patterns (Gerard Meszaros)

| ID | Sub-Metric | What It Measures |
|----|-----------|-----------------|
| M8.1 | Test file structure | Co-located with source or consistent mirror structure |
| M8.2 | Test naming conventions | Descriptive, behavior-focused test names |
| M8.3 | Shared test utilities | Custom helpers, matchers, fixtures, test builders |
| M8.4 | Test documentation | Test plan, coverage requirements, testing guide present |

Process Overview

1. DISCOVER  ─  Run discover.sh to scan the codebase
2. ANALYZE   ─  4 parallel agents score dimensions
3. SCORE     ─  Aggregate into weighted scorecard
4. PRESCRIBE ─  Generate improvement plan with priorities

Step 1: Discovery

Run the discovery script to collect raw signals from the codebase:

bash ~/.claude/skills/test-rx/scripts/discover.sh "$PROJECT_ROOT"

This produces test-rx-discovery.json with counts, file lists, and pattern matches for all 32 sub-metrics.

Step 2: Parallel Analysis (4 Agents)

Launch 4 agents in parallel, each covering 2 dimensions:

| Agent | Dimensions | Weight |
|-------|-----------|--------|
| Agent A | D1 (Pyramid Balance) + D2 (Effectiveness) | 30% |
| Agent B | D3 (Contract/API) + D4 (UI/Visual) | 20% |
| Agent C | D5 (Performance) + D6 (Data Management) | 25% |
| Agent D | D7 (CI Integration) + D8 (Organization) | 25% |
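The agent weights in the table follow directly from the dimension weights listed earlier. The sketch below encodes both and derives each agent's share, which is a quick way to confirm that the split still adds up to 100% if weights change. The dictionary and function names are assumptions for this example.

```python
# Dimension weights in percent, as stated in the dimension headings.
DIMENSION_WEIGHTS = {
    "D1": 15, "D2": 15, "D3": 10, "D4": 10,
    "D5": 10, "D6": 15, "D7": 15, "D8": 10,
}

# Which dimensions each agent covers, as listed in the table above.
AGENT_DIMENSIONS = {
    "Agent A": ("D1", "D2"),
    "Agent B": ("D3", "D4"),
    "Agent C": ("D5", "D6"),
    "Agent D": ("D7", "D8"),
}


def agent_weight(agent: str) -> int:
    """Total weight (in percent) of the dimensions assigned to one agent."""
    return sum(DIMENSION_WEIGHTS[d] for d in AGENT_DIMENSIONS[agent])
```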

Each agent:

1. Reads the discovery JSON
2. Reads relevant source and test files
3. Scores each sub-metric 0-10 using the grading framework
4. Writes findings to test-rx-d{N}.json
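The last two agent steps could look roughly like the following sketch. The document does not specify the layout of test-rx-d{N}.json or how sub-metric scores roll up into a dimension score, so both the JSON keys and the plain average used here are assumptions, as is the function name.

```python
import json
from pathlib import Path


def write_dimension_findings(out_dir, dimension: int, sub_scores: dict) -> Path:
    """Write test-rx-d{N}.json for one dimension.

    sub_scores maps sub-metric IDs (e.g. "M1.1") to 0-10 scores.
    Assumption: the dimension score is the plain mean of its sub-metrics.
    """
    if not sub_scores:
        raise ValueError("at least one sub-metric score is required")
    for metric, score in sub_scores.items():
        if not 0 <= score <= 10:
            raise ValueError(f"{metric} score {score} is outside 0-10")

    payload = {
        "dimension": f"D{dimension}",
        "sub_metrics": sub_scores,
        "score": round(sum(sub_scores.values()) / len(sub_scores), 1),
    }
    path = Path(out_dir) / f"test-rx-d{dimension}.json"
    path.write_text(json.dumps(payload, indent=2))
    return path
```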

Step 3: Scorecard Aggregation

Combine all dimension scores into the final scorecard:

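FINAL SCORE = SUM(dimension_score * dimension_weight) * 10

Each dimension_score is on the 0-10 scale and the weights are fractions that sum to 1 (0.15 for D1, D2, D6, and D7; 0.10 for D3, D4, D5, and D8), so the factor of 10 puts the result on the 0-100 scale that the scorecard reports.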
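The aggregation can be sketched as below. Because dimensions are scored out of 10 while the scorecard total is out of 100, this example treats the weights as integer percentages and divides the weighted sum by 10 to reconcile the two scales. The names `WEIGHTS_PCT` and `final_score` are assumptions for this illustration.

```python
# Dimension weights in percent, taken from the dimension headings.
WEIGHTS_PCT = {
    "D1": 15, "D2": 15, "D3": 10, "D4": 10,
    "D5": 10, "D6": 15, "D7": 15, "D8": 10,
}


def final_score(dimension_scores: dict) -> float:
    """Weighted total on a 0-100 scale from per-dimension scores on a 0-10 scale."""
    if set(dimension_scores) != set(WEIGHTS_PCT):
        raise ValueError("scores must cover exactly D1 through D8")
    weighted = sum(dimension_scores[d] * w for d, w in WEIGHTS_PCT.items())
    # score (0-10) * weight (percent) sums to at most 1000, so divide by 10.
    return round(weighted / 10, 1)
```

With dimension scores of 8, 6, 4, 2, 3, 8, 7, and 6 for D1 through D8, the weighted total is 58.5 out of 100.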

Step 4: Output

Generate the scorecard in this format:

============================================================
  TEST-RX DIAGNOSTIC SCORECARD
  Project: {project_name}
  Date: {date}
  Final Score: {score}/100 — Grade: {grade}
============================================================

  D1  Test Pyramid Balance     ████████░░  {score}/10  (15%)
  D2  Test Effectiveness       ██████░░░░  {score}/10  (15%)
  D3  Contract & API Testing   ████░░░░░░  {score}/10  (10%)
  D4  UI & Visual Testing      ██░░░░░░░░  {score}/10  (10%)
  D5  Performance & Load       ███░░░░░░░  {score}/10  (10%)
  D6  Test Data Management     ████████░░  {score}/10  (15%)
  D7  CI Integration           ███████░░░  {score}/10  (15%)
  D8  Test Organization        ██████░░░░  {score}/10  (10%)

  WEIGHTED TOTAL: {total}/100

  Grade Scale:
    90-100  A  Exemplary testing strategy
    80-89   B  Strong with minor gaps
    70-79   C  Adequate, clear improvement areas
    60-69   D  Significant strategy gaps
    <60     F  Testing strategy needs overhaul
============================================================
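A minimal sketch of how one scorecard row and the letter grade could be produced is shown below. The helper names are hypothetical. The grade scale lists whole-number bands, so the treatment of fractional totals such as 89.5 (graded B here) is an assumption, as is rounding the score to the nearest whole block for the bar.

```python
def grade(total: float) -> str:
    """Map a 0-100 total to the letter grades in the grade scale."""
    if total >= 90:
        return "A"
    if total >= 80:
        return "B"
    if total >= 70:
        return "C"
    if total >= 60:
        return "D"
    return "F"


def bar(score: float, width: int = 10) -> str:
    """Render a 0-10 score as filled and empty blocks."""
    filled = max(0, min(width, round(score)))
    return "█" * filled + "░" * (width - filled)


def scorecard_row(dim_id: str, label: str, score, weight_pct: int) -> str:
    """Format one dimension line using the column widths of the template above."""
    return f"  {dim_id:<4}{label:<25}{bar(score)}  {score}/10  ({weight_pct}%)"
```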

Rules

1. Score only what exists. Do not give credit for intent or plans.
2. Weigh strategy over quantity. 50 good unit tests beat 500 trivial ones.
3. Penalize anti-patterns. Ice cream cone, over-mocking, snapshot-only testing, implementation-coupled tests.
4. Credit test infrastructure. Factories, custom matchers, CI integration are force multipliers.
5. Context matters. A CLI tool does not need visual regression tests. Adjust D4 expectations by project type.
6. Flag risks. No integration tests on a microservice is a critical risk regardless of unit coverage.
7. Be specific. Every finding must reference a concrete file, pattern, or configuration.
8. Improvement plans are mandatory. Every sub-metric scoring below 7 gets a concrete action item.
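The last rule lends itself to a mechanical check. The sketch below lists every sub-metric that scored below 7, lowest score first, so that each can be paired with an action item. The function name and the lowest-first ordering are assumptions for this example.

```python
ACTION_THRESHOLD = 7  # sub-metrics scoring below this need an action item


def needs_action(sub_scores: dict) -> list:
    """Return sub-metric IDs scoring below the threshold, lowest score first."""
    failing = [m for m, s in sub_scores.items() if s < ACTION_THRESHOLD]
    return sorted(failing, key=lambda m: sub_scores[m])
```

Note that a score of exactly 7 does not trigger an action item, since the rule says "below 7".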

Auto-Plan Integration

After generating the scorecard and saving the report to docs/audits/:

1. Save a copy of the report to docs/rx-plans/{this-skill-name}/{date}-report.md
2. For each dimension scoring below 97 (the A+ threshold that rx-plan uses, expressed on a 0-100 scale, which corresponds to 9.7 on the 0-10 dimension scale), invoke the rx-plan skill to create or update the improvement plan at docs/rx-plans/{this-skill-name}/{dimension}/v{N}-{date}-plan.md
3. Update docs/rx-plans/{this-skill-name}/summary.md with current scores
4. Update docs/rx-plans/dashboard.md with overall progress

This happens automatically — the user does not need to run /rx-plan separately.
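The path convention above can be sketched as follows. Several details are not specified in this document and are assumptions here: that {date} is an ISO date, that {dimension} is the short ID such as D4, and that dimension scores are compared against 97 on a 0-100 scale (the 0-10 score multiplied by 10). The function names are hypothetical.

```python
from datetime import date
from pathlib import PurePosixPath

PLAN_THRESHOLD = 97  # dimensions below this get an improvement plan


def plan_path(skill: str, dimension: str, version: int, on: date) -> PurePosixPath:
    """Build docs/rx-plans/{skill}/{dimension}/v{N}-{date}-plan.md."""
    filename = f"v{version}-{on.isoformat()}-plan.md"
    return PurePosixPath("docs/rx-plans") / skill / dimension / filename


def plans_to_create(skill: str, scores_100: dict, version: int, on: date) -> list:
    """Plan paths for every dimension whose 0-100 score is below the threshold."""
    return [
        plan_path(skill, dim, version, on)
        for dim, score in scores_100.items()
        if score < PLAN_THRESHOLD
    ]
```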

Evaluates testing strategy and completeness across 8 dimensions (32 sub-metrics): test pyramid balance, test effectiveness, contract/API testing, UI/visual testing, performance/load testing, test data management, CI integration, and test organization. Produces a scored diagnostic with actionable improvement plans.

Category: development

Updated Mar 26, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add acardozzo/rx-suite test-rx

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | What It Checks | Status | Percentage |
|---------|---------------|--------|-----------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 70% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Apr 7, 2026, 1:56 PM (213.0s, 1 file scanned)

Related Skills

acardozzo/ux-rx

development

Verified · Trusted · Community

Prescriptive UX/UI evaluation producing scored opportunity maps for Next.js + shadcn/ui projects. Evaluates user experience against Nielsen Heuristics, WCAG 2.2, Core Web Vitals, Laws of UX, and Atomic Design. Use when: auditing UX quality, evaluating accessibility, reviewing component usage, identifying missing shadcn components, improving form UX, or when the user says "ux audit", "run ux-rx", "evaluate UX", "accessibility check", "improve user experience", "shadcn review", "how to reach A+ UX", or "UX opportunities". Measures 11 dimensions (44 sub-metrics). Fixed stack: Next.js App Router + shadcn/ui + Tailwind CSS. Leverages shadcn registry to recommend ready-to-use components. Outputs per-page scorecards with before/after Mermaid diagrams.

SKILL.md · Updated Mar 26, 2026

acardozzo/sec-rx

development

Verified · Trusted · Community

Code-level security posture evaluation. Scans for OWASP Top 10 vulnerabilities, authentication flaws, injection vectors, authorization gaps, and data protection issues. Complements arch-rx D9 (architectural security) by inspecting actual source code patterns, dependencies, and security configurations. Produces a scored report across 8 dimensions with 32 sub-metrics mapped to OWASP ASVS and CWE references.

SKILL.md · Updated Mar 26, 2026

acardozzo/rx-plan

testing

Verified · Trusted · Community

Generates versioned improvement plans from rx report results. Creates one plan per dimension that scores below A+ (97). Plans are saved to docs/rx-plans/{domain}/{dimension}/v{N}-{date}-plan.md. Use after running any rx skill, or when the user says "create plan from report", "rx plan", "plan improvements", "generate improvement plan", "what should I fix first", "create roadmap", "improvement plan", "plan from audit", or "next steps from rx".

SKILL.md · Updated Mar 26, 2026

acardozzo/rx-execute

testing

Verified · Trusted · Community

Executes rx improvement plans step by step with verification. Reads versioned plans from docs/rx-plans/{domain}/{dimension}/, implements each step, verifies acceptance criteria, then re-runs the rx skill to confirm score improvement. Auto-generates next version plan if target not reached. Use when the user says "execute rx plan", "implement improvements", "rx execute", "fix dimension", "improve score", or references a specific plan file.

SKILL.md · Updated Mar 26, 2026

Install manually

# Clone the repo
git clone https://github.com/acardozzo/rx-suite.git

# Copy into Claude Code skills folder (global)
cp -r rx-suite/skills/test-rx ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

acardozzo/rx-suite

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT