Testing Principles

Language-Specific References

For language-specific testing patterns, also read:

TypeScript (Vitest/RTL/Playwright): references/typescript.md

Core Testing Philosophy

Tests are First-Class Code: Maintain test quality equal to production code
Fast Feedback: Tests should run quickly and provide immediate feedback
Reliability: Tests should be deterministic and reproducible
Independence: Each test should run in isolation

TDD Process [MANDATORY for all code changes]

Execute this process for every code change:

RED Phase

STEP 1: Write test that defines expected behavior
STEP 2: Run test
STEP 3: Confirm test FAILS (if it passes, either the behavior already exists or the test does not exercise it)

GREEN Phase

STEP 1: Write MINIMAL code to make test pass
STEP 2: Run test
STEP 3: Confirm test PASSES

REFACTOR Phase

STEP 1: Improve code quality (eliminate duplication, improve naming)
STEP 2: Run test
STEP 3: Confirm test STILL PASSES

VERIFY Phase [MANDATORY - 0 ERRORS REQUIRED]

STEP 1: Execute ALL quality check commands for your language/project
STEP 2: Fix any errors until ALL commands pass with 0 errors
STEP 3: Confirm no regressions

ENFORCEMENT: Cannot proceed to next phase with ANY quality check failures
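
As a rough illustration of one RED/GREEN pass, the sketch below uses a hypothetical `slugify` function. The function name and its behavior are assumptions made for this example only; the process itself is the one described above.

```python
# RED: this test is written first. Run before slugify exists (or while it
# returns the wrong value), it fails, which shows it exercises the behavior.
def test_should_return_hyphenated_lowercase_slug_when_title_has_spaces():
    assert slugify("Hello World") == "hello-world"


# GREEN: the minimal implementation that makes the test above pass.
# REFACTOR would then rename or deduplicate, re-running the test after
# each change to confirm it still passes.
def slugify(title: str) -> str:
    return title.lower().replace(" ", "-")
```

Keeping the GREEN step this small is deliberate: further behavior (trimming, punctuation handling) would each start with its own failing test.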

TDD Exceptions (no TDD required)

Pure configuration files
Documentation only
Emergency fixes (but add tests immediately after)
Exploratory spikes (discard or rewrite with tests before merging)
Build/deployment scripts (unless they contain business logic)

Quality Requirements [MANDATORY]

Coverage Standards

Minimum 80% code coverage for production code
Prioritize critical paths and business logic
Use coverage as a guide, not a goal

Test Characteristics

All tests MUST be:

Independent: No dependencies between tests
Reproducible: Same input always produces same output
Fast: Complete the full test suite within the project's accepted feedback window and flag suites that materially slow local iteration or CI
Self-checking: Clear pass/fail without manual verification
Timely: Written close to the code they test

ENFORCEMENT: Tests failing ANY characteristic MUST be fixed immediately

Test Types

Unit Tests

Purpose: Test individual components in isolation

Characteristics:

Test single function, method, or class
Fast execution (milliseconds)
No external dependencies
Mock external services
Majority of your test suite

Integration Tests

Purpose: Test interactions between components

Characteristics:

Test multiple components together
May include database, file system, or APIs
Slower than unit tests
Verify contracts between modules
Smaller portion of test suite

End-to-End (E2E) Tests

Purpose: Test complete workflows from user perspective

Characteristics:

Test entire application stack
Simulate real user interactions
Slowest test type
Fewest in number
Highest confidence level

Test Pyramid

Follow the test pyramid structure:

    /\    <- Few E2E Tests (High confidence, slow)
   /  \
  /    \  <- Some Integration Tests (Medium confidence, medium speed)
 /      \
/________\ <- Many Unit Tests (Fast, foundational)

Test Design Principles

AAA Pattern (Arrange-Act-Assert)

Structure every test in three clear phases:

// Arrange: Setup test data and conditions
user = createTestUser()
validator = createValidator()

// Act: Execute the code under test
result = validator.validate(user)

// Assert: Verify expected outcome
assert(result.isValid == true)
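
The same three phases in runnable form, as a sketch in Python. `User`, `ValidationResult`, and `Validator` are hypothetical stand-ins for `createTestUser()` and `createValidator()` above, and the validation rule is invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    email: str


@dataclass
class ValidationResult:
    is_valid: bool


class Validator:
    def validate(self, user: User) -> ValidationResult:
        # Hypothetical rule: a non-empty name and an "@" in the email.
        return ValidationResult(bool(user.name) and "@" in user.email)


def test_should_accept_user_when_name_and_email_are_present():
    # Arrange: set up test data and conditions
    user = User(name="Ada", email="ada@example.com")
    validator = Validator()

    # Act: execute the code under test
    result = validator.validate(user)

    # Assert: verify the expected outcome
    assert result.is_valid is True
```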

One Assertion Per Concept

Test one behavior per test case
Multiple assertions OK if testing single concept
Split unrelated assertions into separate tests — one test MUST verify one behavior
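
A small sketch of the distinction, assuming a hypothetical `parse_price` helper: two assertions that describe one concept share a test, while an unrelated behavior gets its own.

```python
def parse_price(text: str) -> tuple[int, str]:
    """Hypothetical helper: '12.50 USD' becomes (1250, 'USD'), amount in cents."""
    amount, currency = text.split()
    units, _, cents = amount.partition(".")
    return int(units) * 100 + int(cents.ljust(2, "0")[:2]), currency


def test_should_return_cents_and_currency_when_price_is_well_formed():
    # Two assertions, one concept: the parsed price.
    amount_cents, currency = parse_price("12.50 USD")
    assert amount_cents == 1250
    assert currency == "USD"


def test_should_raise_value_error_when_currency_is_missing():
    # Error handling is a separate behavior, so it is a separate test.
    try:
        parse_price("12.50")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
```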

Descriptive Test Names

Test names should clearly describe:

What is being tested
Under what conditions
What the expected outcome is

Recommended format: "should [expected behavior] when [condition]"

Test Independence

Isolation Requirements

Each test creates its own test data
No dependencies on execution order
Clean up own state
Pass when run in isolation

Setup and Teardown

Use setup hooks to prepare test environment
Use teardown hooks to clean up resources
Keep setup scoped to the data, dependencies, and fixtures required for the behavior under test
Ensure teardown runs even if test fails
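
One way to get these properties with Python's standard `unittest` module is sketched below. The test class and file name are hypothetical; `setUp` and `addCleanup` are the standard hooks.

```python
import os
import shutil
import tempfile
import unittest


class ReportWriterTest(unittest.TestCase):
    def setUp(self):
        # Each test gets its own directory, so no state is shared.
        self.workdir = tempfile.mkdtemp()
        # Cleanups registered here run even when the test body fails.
        self.addCleanup(shutil.rmtree, self.workdir, ignore_errors=True)

    def test_should_create_file_when_report_is_written(self):
        path = os.path.join(self.workdir, "report.txt")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("total=3")
        with open(path, encoding="utf-8") as handle:
            self.assertEqual(handle.read(), "total=3")
```

Registering the cleanup immediately after creating the resource keeps setup and teardown next to each other, which makes it harder to forget one of them.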

Mocking and Test Doubles

When to Use Mocks

Mock external dependencies: APIs, databases, file systems
Mock slow operations: Network calls, heavy computations
Mock unpredictable behavior: Random values, current time
Mock unavailable services: Third-party services

Mocking Principles [MANDATORY]

Mock at boundaries, not internally — use real implementations for internal utilities
Keep each mock limited to the behavior the test needs to control or observe
Verify mock expectations when relevant
Use adapters for external libraries/frameworks you do not control
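
The sketch below illustrates mocking at the boundary only. `PriceService`, `to_cents`, and the `get_rate` method are hypothetical names; `Mock` and `assert_called_once_with` come from the standard `unittest.mock` module.

```python
from unittest.mock import Mock


def to_cents(amount: float) -> int:
    """Internal utility: stays real in tests."""
    return round(amount * 100)


class PriceService:
    def __init__(self, rates_client):
        # rates_client is an adapter around an external API: the boundary.
        self._rates = rates_client

    def price_in_cents(self, usd: float, currency: str) -> int:
        rate = self._rates.get_rate("USD", currency)
        return to_cents(usd * rate)


def test_should_convert_to_target_currency_cents_when_rate_is_available():
    rates_client = Mock()
    rates_client.get_rate.return_value = 0.5  # only what the test needs
    service = PriceService(rates_client)

    result = service.price_in_cents(10.0, "EUR")

    assert result == 500  # literal value, not derived from the mock
    rates_client.get_rate.assert_called_once_with("USD", "EUR")
```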

Types of Test Doubles

Stub: Returns predetermined values
Mock: Verifies it was called correctly
Spy: Records information about calls
Fake: Simplified working implementation
Dummy: Passed but never used
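
To make the five kinds concrete, here is a minimal sketch in which every class and function name is hypothetical. The stub, spy, and fake are hand-written; the mock uses the standard `unittest.mock.Mock`.

```python
from unittest.mock import Mock


class StubClock:
    """Stub: returns a predetermined value."""
    def now(self) -> int:
        return 1_700_000_000


class SpyMailer:
    """Spy: records calls so the test can inspect them afterwards."""
    def __init__(self):
        self.sent = []

    def send(self, to: str, body: str) -> None:
        self.sent.append((to, body))


class FakeUserStore:
    """Fake: a simplified but working in-memory implementation."""
    def __init__(self):
        self._users = {}

    def save(self, user_id: str, email: str) -> None:
        self._users[user_id] = email

    def email_of(self, user_id: str) -> str:
        return self._users[user_id]


def send_welcome(user_id, store, mailer, clock, logger=None):
    # logger is accepted but unused on this path; tests pass a dummy for it.
    mailer.send(store.email_of(user_id), f"Welcome! ts={clock.now()}")


def test_should_send_welcome_mail_when_user_exists():
    store, mailer = FakeUserStore(), SpyMailer()
    store.save("u1", "ada@example.com")
    send_welcome("u1", store, mailer, StubClock(), logger=object())  # dummy
    assert mailer.sent == [("ada@example.com", "Welcome! ts=1700000000")]


def test_should_call_mailer_once_when_user_exists():
    store, mailer = FakeUserStore(), Mock()  # mock: verifies the call itself
    store.save("u1", "ada@example.com")
    send_welcome("u1", store, mailer, StubClock())
    mailer.send.assert_called_once_with(
        "ada@example.com", "Welcome! ts=1700000000"
    )
```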

Data Layer Testing

Mock Limitations for Data Access

Mocks validate call patterns but do not validate schema correctness, query correctness, or storage constraints. Examples of issues that mocks can miss:

schema drift
column or field mismatches
incorrect joins, filters, or aggregations
migration incompatibility

When Real Data Layer Verification Adds Value

Use real or production-like data access verification when testing:

repository or DAO implementations
ORM mappings
query builders or raw SQL
persistence behavior that depends on constraints or schema shape

Environment Options

Choose the most practical option for the project environment:

containerized database
dedicated test database
in-memory database with documented limitations
adapter-backed local test harness
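
As an example of the in-memory option, the sketch below runs a hypothetical `UserRepository` against SQLite's `:memory:` database from the standard `sqlite3` module. The schema and repository are assumptions for illustration. A mocked connection would accept the second insert; the real engine enforces the constraint. The documented limitation here is that SQLite's dialect and constraint behavior can differ from the production engine.

```python
import sqlite3

SCHEMA = (
    "CREATE TABLE users ("
    "id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)"
)


class UserRepository:
    def __init__(self, conn: sqlite3.Connection):
        self._conn = conn

    def add(self, email: str) -> int:
        cursor = self._conn.execute(
            "INSERT INTO users (email) VALUES (?)", (email,)
        )
        return cursor.lastrowid

    def count(self) -> int:
        return self._conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]


def make_repository() -> UserRepository:
    conn = sqlite3.connect(":memory:")  # fresh database for every test
    conn.execute(SCHEMA)
    return UserRepository(conn)


def test_should_reject_duplicate_email_when_unique_constraint_exists():
    repo = make_repository()
    repo.add("ada@example.com")
    try:
        repo.add("ada@example.com")
    except sqlite3.IntegrityError:
        pass
    else:
        raise AssertionError("expected IntegrityError")
    assert repo.count() == 1
```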

Design Alignment

When a Design Doc includes Test Boundaries, follow it as the baseline for deciding which dependencies stay real and which boundaries are isolated.

Test Quality Practices [MANDATORY]

Keep Tests Active

Fix or delete failing tests: Resolve failures immediately
Remove commented-out tests: Fix them or delete entirely
Keep tests running: Broken tests lose value quickly
Maintain test suite: Refactor tests as needed

Test Code Quality

Apply same standards as production code
Use descriptive variable names
Extract test helpers to reduce duplication
Keep tests readable and maintainable

Test Helpers and Utilities

Create reusable test data builders
Extract common setup into helper functions
Build test utilities for complex scenarios
Share helpers across test files appropriately
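
A test data builder can be as small as a function with defaults. In the sketch below, `Order`, `an_order`, and the pricing rule are hypothetical; `dataclasses.replace` is standard library.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Order:
    customer: str
    quantity: int
    unit_price_cents: int
    express: bool


def an_order(**overrides) -> Order:
    """Builder with sensible defaults; tests override only what matters."""
    defaults = Order(
        customer="ada", quantity=1, unit_price_cents=1000, express=False
    )
    return replace(defaults, **overrides)


def total_cents(order: Order) -> int:
    # Hypothetical rule: express shipping adds a flat 500 cents.
    fee = 500 if order.express else 0
    return order.quantity * order.unit_price_cents + fee


def test_should_add_flat_fee_when_order_is_express():
    order = an_order(quantity=2, express=True)
    assert total_cents(order) == 2500
```

The test names only the fields relevant to the behavior, so a reader can see at a glance which inputs drive the expected value.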

What to Test

Focus on Behavior

Test observable behavior, not implementation:

Good: Test that function returns expected output
Good: Test that correct API endpoint is called
Bad: Test that internal variable was set
Bad: Test order of private method calls
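
The contrast can be sketched with a hypothetical `Cart` class: the test asserts on the public result, and the commented-out line shows the kind of assertion to avoid.

```python
class Cart:
    def __init__(self):
        self._items = {}  # internal detail; tests should not reach into it

    def add(self, sku: str, quantity: int = 1) -> None:
        self._items[sku] = self._items.get(sku, 0) + quantity

    def total_quantity(self) -> int:
        return sum(self._items.values())


def test_should_sum_quantities_when_same_sku_is_added_twice():
    cart = Cart()
    cart.add("apple", 2)
    cart.add("apple", 3)

    # Good: observable behavior through the public API.
    assert cart.total_quantity() == 5

    # Avoid: assert cart._items == {"apple": 5}
    # That assertion breaks if the internal storage changes shape,
    # even though the behavior stays the same.
```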

Test Edge Cases

Always test:

Boundary conditions: Min/max values, empty collections
Error cases: Invalid input, null values, missing data
Edge cases: Special characters, extreme values
Happy path: Normal, expected usage
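
For a hypothetical `average` function, the four categories above might translate into tests like these. The function and its error contract are assumptions for the example.

```python
def average(values: list[float]) -> float:
    """Hypothetical function under test."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(values) / len(values)


def test_should_return_mean_when_given_several_values():  # happy path
    assert average([2.0, 4.0, 6.0]) == 4.0


def test_should_return_the_value_when_given_single_element():  # boundary
    assert average([5.0]) == 5.0


def test_should_return_zero_when_values_cancel_out():  # edge case
    assert average([-1.0, 1.0]) == 0.0


def test_should_raise_value_error_when_list_is_empty():  # error case
    try:
        average([])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
```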

Test Quality Criteria [MANDATORY]

Literal expectations: Use hardcoded literal values in assertions; do not reuse the mock's return value as the expected value, because the implementation is expected to process that data
Result verification: Assert return values and state, not call order
Meaningful assertions: Every test MUST have at least one assertion — a test without assertions provides zero value
Mock external I/O only: Mock DB/API/filesystem, use real internal utilities
Boundary coverage: Include empty/zero/max/error cases with happy paths

ENFORCEMENT: Tests violating ANY criterion MUST be rewritten

Verification Requirements [MANDATORY for VERIFY phase]

Before Commit Checklist

☐ All tests pass
☐ No tests skipped or commented
☐ No debug code left in tests
☐ Test coverage meets standards (≥ 80%)
☐ Tests run in reasonable time

Zero Tolerance Policy

Zero failing tests: Fix immediately
Zero skipped tests: Delete or fix
Zero flaky tests: Make deterministic
Zero slow tests: Optimize or split

ENFORCEMENT: Cannot proceed with task completion if ANY quality check fails

Test Organization

File Structure

Mirror production structure: Tests follow code organization
Clear naming conventions: Follow project's test file patterns
Logical grouping: Group related tests together
Separate test types: Unit, integration, e2e in separate directories

Test Suite Organization

tests/
├── unit/           # Fast, isolated unit tests
├── integration/    # Integration tests
├── e2e/            # End-to-end tests
├── fixtures/       # Test data and fixtures
└── helpers/        # Shared test utilities

Performance Considerations

Test Speed

Unit tests: < 100ms each
Integration tests: < 1s each
Full suite: < 10 minutes, so that it can be run frequently

Common Anti-Patterns

Detect and eliminate these patterns immediately:

Tests that test nothing (always pass)
Tests that depend on execution order
Tests that depend on external state
Tests with complex logic (tests that need their own tests)
Testing implementation details instead of observable behavior
Excessive mocking (mock boundaries only, use real internals)
Test code duplication

Flaky Tests

Eliminate tests that fail intermittently:

Remove timing dependencies
Use deterministic data instead of random values
Ensure proper cleanup
Fix race conditions
Make all tests deterministic
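
Two common sources of flakiness, the wall clock and unseeded randomness, can be removed by passing them in as parameters. The sketch below assumes hypothetical `is_expired` and `pick_winner` functions; `random.Random` and `datetime` are standard library.

```python
import random
from datetime import datetime, timezone


def is_expired(expires_at: datetime, now: datetime) -> bool:
    # The current time is a parameter, so tests never read the real clock.
    return now >= expires_at


def pick_winner(entrants: list[str], rng: random.Random) -> str:
    # The random source is injected, so tests can seed it.
    return rng.choice(entrants)


def test_should_report_expired_when_now_equals_expiry():
    moment = datetime(2024, 1, 1, tzinfo=timezone.utc)
    assert is_expired(expires_at=moment, now=moment) is True


def test_should_pick_same_winner_when_seed_is_fixed():
    entrants = ["ada", "grace", "linus"]
    first = pick_winner(entrants, random.Random(42))
    second = pick_winner(entrants, random.Random(42))
    assert first == second
    assert first in entrants
```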

Regression Testing

Add test for every bug fix
Maintain comprehensive test suite
Run full suite regularly
Delete a test only when the covered behavior no longer exists or the same behavior is covered by a stronger test at the correct level

Legacy Code

Add characterization tests before refactoring
Test existing behavior first
Gradually improve coverage
Refactor with confidence
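
A characterization test pins down what the code does today, whether or not that is what it should do. The `legacy_format_name` function below is a hypothetical stand-in for undocumented legacy code.

```python
def legacy_format_name(first, last):
    # Hypothetical legacy function whose behavior is not documented.
    return (last.upper() + ", " + first).strip()


def test_characterize_format_name_with_both_parts():
    assert legacy_format_name("Ada", "Lovelace") == "LOVELACE, Ada"


def test_characterize_format_name_with_empty_first_name():
    # Pins the current, possibly surprising, output: the comma is kept
    # and only the trailing space is stripped.
    assert legacy_format_name("", "Lovelace") == "LOVELACE,"
```

With these in place, a refactoring that changes either output fails a test, and the team can then decide whether the change is a regression or an intended fix.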


