skills/software-testing/SKILL.md
Use this skill when designing test strategies, writing tests beyond basic unit tests, verifying software for production readiness, or improving test coverage and reliability. Triggers when the user asks about testing strategy, integration tests, end-to-end tests, contract tests, property-based tests, load tests, chaos testing, test architecture, flaky tests, test confidence, 'how do I test this,' 'how do I know this is safe to deploy,' 'my tests are flaky,' 'what should I test,' 'test coverage,' CI/CD test pipelines, or any question about software verification and validation. Also triggers when the user is shipping a change and wants confidence it won't break production. Primarily targets TypeScript and Go but principles apply universally. Do NOT use for writing basic unit tests for simple functions — this skill is for the harder testing questions.
Tests exist for one reason: to give you justified confidence that your software does what you intend and won't break what already works when you change it.
The key word is "justified." Confidence without evidence is delusion. A green test suite that doesn't exercise real failure modes is theater. 100% code coverage where every test asserts `expect(true).toBe(true)` is worse than no tests: it creates false confidence.
The question is never "do we have enough tests?" The question is: "if this change introduced a bug, which test would catch it?" If you can't point to the specific test, you don't have coverage for that behavior — regardless of what the coverage percentage says.
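One way to answer that question mechanically is a hand-rolled mutation check: inject a deliberate bug and see whether any test fails. The sketch below is illustrative only. `applyDiscount`, its mutant, and the predicate-style `Test` type are hypothetical stand-ins for real code and a real test runner. It contrasts a vacuous assertion in the spirit of `expect(true).toBe(true)` with one that pins down behavior.

```typescript
// Hypothetical function under test: price after a percentage discount, in cents.
type Discounter = (priceCents: number, percent: number) => number;

const applyDiscount: Discounter = (priceCents, percent) =>
  Math.round(priceCents * (1 - percent / 100));

// A deliberately injected bug (a "mutant"): the discount is applied twice.
const mutantDiscount: Discounter = (priceCents, percent) =>
  Math.round(priceCents * (1 - (2 * percent) / 100));

// A test is modeled as a predicate over an implementation: true means "passes".
type Test = (impl: Discounter) => boolean;

// Theater: passes no matter what the implementation does.
const vacuousTest: Test = (_impl) => true;

// Behavioral: pins concrete inputs to expected outputs, including edge cases.
const behavioralTest: Test = (impl) =>
  impl(10_000, 25) === 7_500 && impl(999, 0) === 999 && impl(500, 100) === 0;

// A test covers this bug only if it passes on the real code AND fails on the mutant.
function catchesMutant(test: Test): boolean {
  return test(applyDiscount) && !test(mutantDiscount);
}

console.log("vacuous catches mutant:", catchesMutant(vacuousTest)); // false
console.log("behavioral catches mutant:", catchesMutant(behavioralTest)); // true
```

Mutation-testing tools (StrykerJS is one option for TypeScript) automate this loop across a whole codebase; the sketch only shows that "which test would catch it?" can have a concrete, checkable answer.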
Use this skill when:
| # | Category | Prefix | Impact | Description |
|---|----------|--------|--------|-------------|
| 1 | Testing Philosophy | philosophy | CRITICAL | Core principles: confidence over coverage, test what matters |
| 2 | Test Strategy | strategy | CRITICAL | Risk-driven testing, the right test at the right level |
| 3 | Unit Tests | unit | HIGH | Effective unit test design: structure, naming, table-driven patterns |
| 4 | Integration Tests | integration | CRITICAL | Testing real component interactions with real dependencies |
| 5 | Contract Tests | contract | HIGH | Verifying service boundary agreements |
| 6 | Advanced Test Types | advanced | HIGH | Property-based, load, chaos, and snapshot testing |
| 7 | Test Architecture | arch | HIGH | Test doubles, data management, flaky test discipline |
| 8 | CI Pipeline | pipeline | MEDIUM-HIGH | Pipeline design, coverage ratchets, deploy gates |
| 9 | Production Verification | prod | MEDIUM-HIGH | Canary deploys, feature flags, observability as testing |
Detailed patterns and examples are in `references/`. Each file is named `{prefix}-{topic}.md`, where `{prefix}` is the category prefix from the table above. Consult them when you need specific implementation patterns for a testing category.
The classic pyramid — many unit tests, fewer integration tests, even fewer E2E tests — was good advice when integration tests were slow and expensive. Modern tooling has changed the cost equation:
```
            /\
           /  \             E2E / Smoke tests (few, critical paths only)
          /    \
         /------\
        /        \          Integration tests (many, real interactions)
       /          \
      /------------\
     /              \       Focused unit tests (targeted, complex logic)
    /                \
   /------------------\
  /                    \    Static analysis + type system (zero runtime cost)
 /                      \
/------------------------\
```
The base is the type system and static analysis — not unit tests. A well-typed codebase eliminates entire categories of bugs with zero runtime cost.
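As one illustration of a bug category the compiler removes, here is a minimal sketch of a discriminated union with an exhaustiveness check. The `PaymentState` type and `describePayment` function are hypothetical. If someone later adds a variant to the union without handling it, the `assertNever` call stops compiling, so no runtime test is needed to catch the omission.

```typescript
// Hypothetical payment lifecycle modeled as a discriminated union.
type PaymentState =
  | { kind: "pending" }
  | { kind: "authorized"; authCode: string }
  | { kind: "captured"; amountCents: number }
  | { kind: "failed"; reason: string };

// Only reachable if the switch below misses a variant. The `never` parameter
// type makes the compiler reject any call where `x` could still be a real variant.
function assertNever(x: never): never {
  throw new Error(`Unhandled variant: ${JSON.stringify(x)}`);
}

function describePayment(state: PaymentState): string {
  switch (state.kind) {
    case "pending":
      return "waiting for authorization";
    case "authorized":
      // `authCode` is accessible here only because `kind` narrows the union.
      return `authorized (${state.authCode})`;
    case "captured":
      return `captured ${state.amountCents} cents`;
    case "failed":
      return `failed: ${state.reason}`;
    default:
      // Adding e.g. { kind: "refunded" } to PaymentState without a case above
      // turns this line into a compile-time error.
      return assertNever(state);
  }
}

console.log(describePayment({ kind: "captured", amountCents: 1250 })); // "captured 1250 cents"
```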
The middle is integration tests — not unit tests. The bugs that reach production are usually "these two components don't agree on the contract," not "this function computes the wrong value."
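A minimal sketch of that failure mode, with hypothetical names throughout: a producer and a consumer each pass their own unit tests, because each test's fixture is written from that component's own assumptions, yet they disagree on a field name. Only a test that feeds the real producer's output into the real consumer sees the disagreement.

```typescript
// Hypothetical producer: an orders service serializes orders for a queue.
interface Order {
  id: string;
  totalCents: number;
}

function encodeOrder(order: Order): string {
  return JSON.stringify({ id: order.id, total_cents: order.totalCents });
}

// Hypothetical consumer: a billing service parses queue messages.
// Its author assumed the field is named "totalCents" (camelCase).
function decodeOrder(message: string): Order {
  const raw = JSON.parse(message) as Record<string, unknown>;
  const id = raw["id"];
  const totalCents = raw["totalCents"];
  if (typeof id !== "string" || typeof totalCents !== "number") {
    throw new Error(`malformed order message: ${message}`);
  }
  return { id, totalCents };
}

// Consumer unit test: a hand-written fixture that matches the consumer's own
// assumption, so it passes.
function consumerUnitTest(): boolean {
  const fixture = '{"id":"o-1","totalCents":500}';
  return decodeOrder(fixture).totalCents === 500;
}

// Integration test: real producer output fed into the real consumer.
function producerConsumerIntegrationTest(): boolean {
  const sent: Order = { id: "o-1", totalCents: 500 };
  try {
    const received = decodeOrder(encodeOrder(sent));
    return received.id === sent.id && received.totalCents === sent.totalCents;
  } catch {
    return false; // the contract mismatch surfaces here
  }
}

console.log("consumer unit test passes:", consumerUnitTest()); // true
console.log("integration test passes:", producerConsumerIntegrationTest()); // false
```

Contract tests, listed as their own category in the table above, formalize the same check at the service boundary without requiring both services to run together.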
Unit tests are for complex, branchy logic — algorithms, parsers, state machines, business rules with many code paths.
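For branchy business rules, a table-driven test (one of the unit-test patterns listed in the category table) keeps every path and boundary visible in one place. A minimal sketch: `shippingCostCents` and its rules are hypothetical, and the plain loop stands in for a runner's parameterized tests (for example `test.each` in Jest or Vitest).

```typescript
// Hypothetical business rule with several branches:
// - subtotals of 5,000 cents or more ship free (even with express)
// - otherwise a 700-cent base rate, plus 300 cents above 10 kg
// - express shipping doubles the non-free cost
function shippingCostCents(subtotalCents: number, weightKg: number, express: boolean): number {
  if (subtotalCents < 0 || weightKg < 0) {
    throw new RangeError("subtotal and weight must be non-negative");
  }
  if (subtotalCents >= 5_000) return 0;
  const base = 700;
  const surcharge = weightKg > 10 ? 300 : 0;
  const cost = base + surcharge;
  return express ? cost * 2 : cost;
}

interface Case {
  name: string; // documents which branch or boundary this row protects
  subtotalCents: number;
  weightKg: number;
  express: boolean;
  want: number;
}

const cases: Case[] = [
  { name: "standard, light", subtotalCents: 1_000, weightKg: 2, express: false, want: 700 },
  { name: "standard, heavy", subtotalCents: 1_000, weightKg: 12, express: false, want: 1_000 },
  { name: "exactly 10 kg has no surcharge", subtotalCents: 1_000, weightKg: 10, express: false, want: 700 },
  { name: "express doubles heavy cost", subtotalCents: 1_000, weightKg: 12, express: true, want: 2_000 },
  { name: "free-shipping threshold is inclusive", subtotalCents: 5_000, weightKg: 12, express: true, want: 0 },
  { name: "just below threshold", subtotalCents: 4_999, weightKg: 1, express: false, want: 700 },
];

// Runs every row and collects readable failure messages.
function runCases(): string[] {
  const failures: string[] = [];
  for (const c of cases) {
    const got = shippingCostCents(c.subtotalCents, c.weightKg, c.express);
    if (got !== c.want) failures.push(`${c.name}: got ${got}, want ${c.want}`);
  }
  return failures;
}

console.log(`${runCases().length} failing case(s)`); // "0 failing case(s)"
```

Covering a new rule means adding a row, and each row's name records the boundary it exists to protect.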
E2E tests are for critical path smoke tests — the 3-5 journeys that, if broken, mean the product is fundamentally non-functional.
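A minimal sketch of what "a handful of critical journeys" can look like as code. Everything here is hypothetical: the `ShopApi` interface stands in for whatever client drives the deployed system (HTTP calls, browser automation), and the in-memory fake exists only so the sketch runs. The point is the shape: a short, named list of journeys, each exercising one end-to-end path, where any failure blocks the deploy.

```typescript
// Hypothetical client surface for the deployed product.
interface ShopApi {
  signUp(email: string): string; // returns a user id
  addToCart(userId: string, sku: string): void;
  checkout(userId: string): { orderId: string };
}

type Journey = { name: string; run: (api: ShopApi) => void };

// The few journeys that, if broken, mean the product does not work.
const criticalJourneys: Journey[] = [
  {
    name: "new user can sign up",
    run: (api) => {
      if (!api.signUp("smoke@example.com")) throw new Error("no user id returned");
    },
  },
  {
    name: "user can buy an item",
    run: (api) => {
      const userId = api.signUp("buyer@example.com");
      api.addToCart(userId, "SKU-123");
      if (!api.checkout(userId).orderId) throw new Error("no order id returned");
    },
  },
];

// Runs every journey and returns a message for each one that failed.
function runSmokeSuite(api: ShopApi, journeys: Journey[]): string[] {
  const failed: string[] = [];
  for (const j of journeys) {
    try {
      j.run(api);
    } catch (err) {
      failed.push(`${j.name}: ${(err as Error).message}`);
    }
  }
  return failed;
}

// In-memory fake so the sketch is runnable; a real suite targets the deployment.
function makeFakeShop(): ShopApi {
  const carts = new Map<string, string[]>();
  let nextId = 1;
  return {
    signUp: (_email) => {
      const id = `u${nextId++}`;
      carts.set(id, []);
      return id;
    },
    addToCart: (userId, sku) => {
      const cart = carts.get(userId);
      if (!cart) throw new Error(`unknown user ${userId}`);
      cart.push(sku);
    },
    checkout: (userId) => {
      const cart = carts.get(userId);
      if (!cart || cart.length === 0) throw new Error("cart is empty");
      carts.set(userId, []);
      return { orderId: `o${nextId++}` };
    },
  };
}

const smokeFailures = runSmokeSuite(makeFakeShop(), criticalJourneys);
console.log(smokeFailures.length === 0 ? "smoke suite passed" : smokeFailures);
```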
Before shipping any change: