skills/knowledge/testing/SKILL.md
Stack-agnostic testing principles, strategies, and patterns for building reliable test suites. Use when the user asks to design a test strategy, choose between unit/integration/e2e tests, apply TDD, fix flaky tests, improve test quality, use test doubles (mocks, stubs, fakes, spies), or review tests for antipatterns. Covers the testing pyramid, test design heuristics, arrange-act-assert structure, test isolation, property-based testing, and common testing pitfalls.
Install this skill globally with one command: `npx skillsauth add krzysztofsurdy/code-virtuoso testing`. Works with Claude Code, Cursor, and Windsurf.
A disciplined approach to verifying that software behaves correctly, remains stable under change, and communicates intent to future developers. Good tests act as living documentation, a safety net for refactoring, and a design feedback mechanism.
This skill covers universal testing concepts that apply regardless of language, framework, or tooling.
The testing pyramid describes the ideal distribution of tests across three levels, with more tests at the base and fewer at the top.
         /   E2E    \        Few, slow, expensive
        /------------\
       / Integration  \      Moderate number, moderate speed
      /----------------\
     /    Unit Tests    \    Many, fast, cheap
    /____________________\
The inverted pyramid: many e2e tests, few unit tests. Symptoms:
Fix: Identify what each e2e test is actually verifying. Push that verification down to the lowest possible level. Most business logic can be tested at the unit level.
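Every test should follow three distinct phases: Arrange (set up the inputs and the object under test), Act (execute the behavior being tested), and Assert (verify the outcome).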
Keep each phase clearly separated. If Arrange dominates the test, extract a builder or factory. If Act requires multiple steps, you may be testing too much at once.
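As a minimal sketch of the three phases, the following Python test uses a hypothetical `Cart` class invented for this illustration; neither the class nor its methods come from any particular library.

```python
from dataclasses import dataclass, field


@dataclass
class Cart:
    """Hypothetical shopping cart, defined only for this illustration."""

    prices: list[int] = field(default_factory=list)

    def add(self, price: int) -> None:
        self.prices.append(price)

    def total(self) -> int:
        return sum(self.prices)


def test_total_sums_all_item_prices() -> None:
    # Arrange: build the object under test and its inputs.
    cart = Cart()
    cart.add(40)
    cart.add(60)

    # Act: perform the single operation being verified.
    total = cart.total()

    # Assert: check the observable outcome.
    assert total == 100
```

If the Arrange block grew beyond a few lines, that would be the cue to move it into a builder or factory.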
A test should verify one logical concept. This does not mean literally one assert call — asserting multiple properties of a single result is fine. What matters is that the test fails for exactly one reason.
    // Good: one concept, "completed order has correct totals"
    assert order.subtotal == 100
    assert order.tax == 21
    assert order.total == 121

    // Bad: two unrelated concepts in one test
    assert order.total == 121
    assert emailService.wasCalled()
Test names should describe the behavior, not the implementation. A good test name answers: "What scenario is being tested, and what is the expected outcome?"
Patterns that work across languages:
- `should_return_zero_when_cart_is_empty`
- `rejects_negative_quantities`
- `applies_discount_for_premium_customers`

Avoid names like `testCalculate`, `test1`, or `testGetterSetter`.
Each test must be completely independent of every other test:
A test must produce the same result every time it runs, regardless of:
Non-deterministic tests (flaky tests) destroy trust in the test suite and are worse than no tests at all.
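One common source of non-determinism is the system clock. The sketch below, in Python, passes the clock in as a parameter so a test can pin it to a fixed instant. The `is_expired` helper and its `now` parameter are assumptions made for this example, not part of any library.

```python
from datetime import datetime, timezone
from typing import Callable


def is_expired(expires_at: datetime, now: Callable[[], datetime]) -> bool:
    """Hypothetical helper: the clock is injected rather than read globally."""
    return now() >= expires_at


def test_token_is_expired_once_deadline_has_passed() -> None:
    # A fixed clock gives the same result on every run and every machine.
    fixed_now = lambda: datetime(2024, 1, 2, tzinfo=timezone.utc)
    deadline = datetime(2024, 1, 1, tzinfo=timezone.utc)

    assert is_expired(deadline, now=fixed_now)


def test_token_is_valid_before_deadline() -> None:
    fixed_now = lambda: datetime(2024, 1, 1, tzinfo=timezone.utc)
    deadline = datetime(2024, 1, 2, tzinfo=timezone.utc)

    assert not is_expired(deadline, now=fixed_now)
```

The same idea applies to other hidden inputs such as random seeds or environment variables: make them explicit parameters so the test controls them.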
| Principle | Meaning |
|---|---|
| Fast | Tests should run in seconds, not minutes. Slow tests don't get run. |
| Independent | No test relies on the output of another test. |
| Repeatable | Same result in any environment: local, CI, staging. |
| Self-validating | Pass or fail with no human interpretation required. |
| Timely | Written at the right time, ideally before or alongside the production code. |
TDD is a design discipline where tests are written before production code, following a tight feedback loop.
Rules:
| Aspect | Chicago (Classical) | London (Mockist) |
|---|---|---|
| Verification | State-based | Interaction-based |
| Direction | Inside-out | Outside-in |
| Collaborators | Real objects | Mocks/stubs |
| Strength | Refactoring-resilient tests | Drives interface design |
| Risk | Complex setup for deep graphs | Tests coupled to implementation |
See TDD Schools reference for detailed comparison and guidance.
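To make the verification row concrete, here is a small Python sketch that tests the same behavior in both styles. `place_order` and `InMemoryOrders` are hypothetical names invented for this comparison; `Mock` comes from the standard library's `unittest.mock`.

```python
from unittest.mock import Mock


class InMemoryOrders:
    """Hypothetical real collaborator: a tiny in-memory store."""

    def __init__(self) -> None:
        self.saved: list[dict] = []

    def save(self, order: dict) -> None:
        self.saved.append(order)


def place_order(orders, item: str, quantity: int) -> None:
    """Hypothetical unit under test."""
    orders.save({"item": item, "quantity": quantity})


def test_classical_checks_resulting_state() -> None:
    orders = InMemoryOrders()
    place_order(orders, "book", 2)
    # State-based: inspect what the real collaborator now holds.
    assert orders.saved == [{"item": "book", "quantity": 2}]


def test_mockist_checks_interaction() -> None:
    orders = Mock()
    place_order(orders, "book", 2)
    # Interaction-based: verify the call made to the collaborator.
    orders.save.assert_called_once_with({"item": "book", "quantity": 2})
```

The first test would survive a refactor that stores the order through a different method, as long as the resulting state is the same. The second would fail in that case, which illustrates the coupling risk listed in the table.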
Test doubles replace real dependencies during testing. Each type serves a different purpose.
| Double | Purpose | Verifies? |
|---|---|---|
| Dummy | Fill parameter lists. Never actually used. | No |
| Stub | Provide canned responses to method calls. | No |
| Spy | Record interactions for later assertion. | Yes (after the fact) |
| Mock | Pre-programmed with expectations. Fails if not called correctly. | Yes (inline) |
| Fake | Simplified working implementation (e.g., in-memory repository). | No |
See Test Doubles reference for detailed guidance on when to use each type.
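The following Python sketch shows hand-written versions of three of these doubles (a stub, a spy, and a fake) collaborating in one scenario. Every class and function name here is hypothetical and exists only for illustration.

```python
class StubExchangeRates:
    """Stub: returns a canned response and records nothing."""

    def rate_for(self, currency: str) -> int:
        return 2


class SpyNotifier:
    """Spy: records interactions so the test can assert on them afterwards."""

    def __init__(self) -> None:
        self.messages: list[str] = []

    def notify(self, message: str) -> None:
        self.messages.append(message)


class FakeAccountRepository:
    """Fake: a simplified but working implementation backed by a dict."""

    def __init__(self) -> None:
        self._balances: dict[str, int] = {}

    def save(self, account_id: str, balance: int) -> None:
        self._balances[account_id] = balance

    def load(self, account_id: str) -> int:
        return self._balances.get(account_id, 0)


def deposit(account_id, amount, currency, rates, accounts, notifier) -> int:
    """Hypothetical unit under test: converts, stores, and announces a deposit."""
    converted = amount * rates.rate_for(currency)
    new_balance = accounts.load(account_id) + converted
    accounts.save(account_id, new_balance)
    notifier.notify(f"{account_id} balance is now {new_balance}")
    return new_balance


def test_deposit_adds_converted_amount_to_balance() -> None:
    accounts = FakeAccountRepository()
    accounts.save("acc-1", 10)

    deposit("acc-1", 5, "EUR", StubExchangeRates(), accounts, SpyNotifier())

    assert accounts.load("acc-1") == 20


def test_deposit_announces_new_balance() -> None:
    notifier = SpyNotifier()

    deposit("acc-1", 5, "EUR", StubExchangeRates(), FakeAccountRepository(), notifier)

    assert notifier.messages == ["acc-1 balance is now 10"]
```

Each double stands in for something outside the unit: a rate provider, a storage port, and a notification channel.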
Use test doubles at architectural boundaries (ports, external services), not between internal collaborators. Mocking internal classes couples your tests to implementation details and makes refactoring painful.
Test behavior, not implementation. A good test describes what the system does, not how it does it internally.
Signs you are testing implementation:
Signs you are testing behavior:
Different architectural layers call for different testing approaches. See Testing Strategies reference for detailed guidance.
| Layer | Primary Test Type | Key Technique |
|---|---|---|
| Domain/Business Logic | Unit tests | State-based verification, no I/O |
| Application Services | Unit + Integration | Test doubles for infrastructure ports |
| Data Access | Integration | Real database (test containers, in-memory) |
| API Endpoints | Integration + Contract | Request/response validation |
| UI Components | Component tests | Interaction simulation |
| Full System | E2E (selective) | Critical paths only |
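For the data access row, a minimal sketch of an integration test against an in-memory database might look like the following. It uses Python's standard `sqlite3` module. The `UserRepository` class and `users` table are hypothetical, and an in-memory SQLite database may behave differently from the engine used in production.

```python
import sqlite3


class UserRepository:
    """Hypothetical data-access class under test."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def add(self, name: str) -> int:
        cursor = self._conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
        self._conn.commit()
        return cursor.lastrowid

    def find_name(self, user_id: int):
        row = self._conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None


def make_connection() -> sqlite3.Connection:
    # A fresh in-memory database per test keeps tests isolated from each other.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    return conn


def test_saved_user_can_be_read_back() -> None:
    repo = UserRepository(make_connection())

    user_id = repo.add("Ada")

    assert repo.find_name(user_id) == "Ada"


def test_unknown_id_returns_none() -> None:
    repo = UserRepository(make_connection())

    assert repo.find_name(999) is None
```

Where production-specific SQL features matter, a containerized instance of the real database is the closer substitute.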
| Antipattern | Symptoms | Fix |
|---|---|---|
| Brittle tests | Tests break on every refactor even when behavior is unchanged | Test behavior through public API, not internal structure |
| Testing implementation | Asserting on method call order, private state, internal wiring | Assert on outputs and observable side effects |
| Slow test suite | Test suite takes 10+ minutes; developers skip running tests | Push tests down the pyramid; use test doubles for I/O |
| Flaky tests | Tests pass/fail randomly without code changes | Remove time dependencies, shared state, and ordering assumptions |
| Excessive mocking | More mock setup than actual test logic; tests are unreadable | Use real collaborators where possible; mock only at boundaries |
| Test data coupling | Tests share fixtures and break when shared data changes | Each test creates its own data; use builders/factories |
| Missing error paths | Only happy path tested; failures discovered in production | Explicitly test error cases, edge cases, and boundary conditions |
| Commented-out tests | Failing tests are disabled rather than fixed or deleted | Fix the test, or delete it if the behavior changed intentionally |
| Giant test methods | Tests are 50+ lines with multiple acts and asserts | Split into focused tests; extract setup into helpers |
| No assertion | Test executes code but never asserts anything | Every test must have at least one meaningful assertion |
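Two of the fixes above, giving each test its own data through a builder and testing error paths explicitly, can be sketched in Python as follows. The `Order` type, the `an_order` builder, and the 10% premium discount are all assumptions made up for this example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Order:
    """Hypothetical domain object with safe defaults for tests."""

    customer: str = "any-customer"
    quantity: int = 1
    unit_price: int = 10
    premium: bool = False


def an_order(**overrides) -> Order:
    """Builder: each test overrides only the fields it cares about."""
    return replace(Order(), **overrides)


def order_total(order: Order) -> int:
    """Hypothetical unit under test; the 10% discount is an assumed rule."""
    if order.quantity < 0:
        raise ValueError("quantity must not be negative")
    subtotal = order.quantity * order.unit_price
    return subtotal * 90 // 100 if order.premium else subtotal


def test_applies_discount_for_premium_customers() -> None:
    order = an_order(quantity=2, unit_price=50, premium=True)

    assert order_total(order) == 90


def test_rejects_negative_quantities() -> None:
    # Error path: the failure case is asserted as deliberately as the happy path.
    try:
        order_total(an_order(quantity=-1))
    except ValueError:
        return
    raise AssertionError("expected ValueError for a negative quantity")
```

Because every test builds its own `Order`, changing one test's data cannot break another test.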
Use this checklist when writing or reviewing tests: