Writing Tests

Core Philosophy: Test user-observable behavior with real dependencies. Tests should survive refactoring when behavior is unchanged.

Iron Laws:

<IMPORTANT>
1. Test real behavior, not mock behavior
2. Never add test-only methods to production code
3. Never mock without understanding dependencies
</IMPORTANT>

Testing Trophy Model

Write tests in this priority order:

Integration Tests (PRIMARY) - Multiple units with real dependencies
E2E Tests (SECONDARY) - Complete workflows across the stack
Unit Tests (RARE) - Pure functions only (no dependencies)

Default to integration tests. Only drop to unit tests for pure utility functions.
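
As an illustration of the rare case where a unit test fits, here is a minimal sketch of a pure utility function and its tests. The `slugify` helper is hypothetical and not part of any project; the point is only that its output depends on nothing but its input, so there is nothing to set up or mock.

```python
import re


def slugify(title: str) -> str:
    """Hypothetical pure function: output depends only on the input string."""
    lowered = title.strip().lower()
    # Collapse every run of non-alphanumeric characters into one hyphen,
    # then drop any hyphen left at either end.
    return re.sub(r"[^a-z0-9]+", "-", lowered).strip("-")


def test_slugify_replaces_punctuation_and_spaces_with_single_hyphens():
    assert slugify("  Hello, World!  ") == "hello-world"


def test_slugify_returns_empty_string_for_symbol_only_titles():
    assert slugify("!!!") == ""
```

Anything that reads from a database, a clock, or another module would fall outside this category and default back to an integration test.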

Pre-Test Workflow

BEFORE writing any tests, copy this checklist and track your progress:

Test Writing Progress:
- [ ] Step 1: Review project standards (check existing tests)
- [ ] Step 2: Understand behavior (what should it do? what can fail?)
- [ ] Step 3: Choose test type (Integration/E2E/Unit)
- [ ] Step 4: Identify dependencies (real vs mocked)
- [ ] Step 5: Write failing test first (TDD)
- [ ] Step 6: Implement minimal code to pass
- [ ] Step 7: Verify coverage (happy path, errors, edge cases)

Before writing any tests:

Review project standards - Check existing test files, testing docs, or project conventions
Understand behavior - What should this do? What can go wrong?
Choose test type - Integration (default), E2E (critical workflows), or Unit (pure functions)
Identify dependencies - What needs to be real vs mocked?

Test Type Decision

Is this a complete user workflow?
  → YES: E2E test

Is this a pure function (no side effects/dependencies)?
  → YES: Unit test

Everything else:
  → Integration test (with real dependencies)

Mocking Guidelines

Default: Don't mock. Use real dependencies.

Only Mock These

External HTTP/API calls
Time-dependent operations (timers, dates)
Randomness (random numbers, UUIDs)
File system I/O
Third-party services (payments, analytics, email)
Network boundaries
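
One way to handle an item from the list above, randomness, is sketched below with the standard library's `unittest.mock.patch`. The `new_order_reference` function and the `ORD-` prefix are assumptions made up for the example; only the UUID source is replaced, and the formatting logic under test stays real.

```python
import uuid
from unittest.mock import patch


def new_order_reference() -> str:
    """Hypothetical production code: builds a reference from a random UUID."""
    return f"ORD-{uuid.uuid4().hex[:8].upper()}"


def test_order_reference_uses_first_eight_hex_digits_of_uuid():
    fixed = uuid.UUID("12345678-1234-5678-1234-567812345678")
    # Patch only the source of randomness; the formatting code runs for real.
    with patch("uuid.uuid4", return_value=fixed):
        assert new_order_reference() == "ORD-12345678"
```

The patch is scoped to the `with` block, so code outside the test still receives real random UUIDs.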

Never Mock These

Internal modules/packages
Database queries (use test database)
Business logic
Data transformations
Your own code calling your own code

Why: Mocking internal dependencies creates brittle tests that break during refactoring.

Before Mocking, Ask:

"What side effects does this method have?"
"Does my test depend on those side effects?"
If yes → Mock at a lower level (the slow or external operation, not the method the test needs)
Unsure? → Run with real implementation first, observe what's needed, THEN add minimal mocking

Mock Red Flags

"I'll mock this to be safe"
"This might be slow, better mock it"
Can't explain why mock is needed
Mock setup longer than test logic
Test fails when removing mock

Integration Test Pattern

describe("Feature Name", () => {
  setup(initialState)

  test("should produce expected output when action is performed", () => {
    // Arrange: Set up preconditions
    // Act: Perform the action being tested
    // Assert: Verify observable output
  })
})

Key principles:

Use real state/data, not mocks
Assert on outputs users/callers can observe
Test the behavior, not the implementation
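
The pattern above might look like the following in Python. This is a sketch under stated assumptions: the `users` table and the `register_user`, `activate_user`, and `list_active_emails` functions are hypothetical, and an in-memory SQLite database stands in for the project's test database.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE users ("
        "id INTEGER PRIMARY KEY, "
        "email TEXT UNIQUE NOT NULL, "
        "active INTEGER NOT NULL DEFAULT 0)"
    )


def register_user(conn: sqlite3.Connection, email: str) -> None:
    conn.execute("INSERT INTO users (email) VALUES (?)", (email.strip().lower(),))
    conn.commit()


def activate_user(conn: sqlite3.Connection, email: str) -> bool:
    cursor = conn.execute(
        "UPDATE users SET active = 1 WHERE email = ?", (email.strip().lower(),)
    )
    conn.commit()
    return cursor.rowcount == 1


def list_active_emails(conn: sqlite3.Connection) -> list:
    rows = conn.execute(
        "SELECT email FROM users WHERE active = 1 ORDER BY email"
    ).fetchall()
    return [row[0] for row in rows]


def test_activated_user_appears_in_active_list():
    # Arrange: a real database with two registered users
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    register_user(conn, "Ada@Example.com")
    register_user(conn, "bob@example.com")

    # Act: perform the action being tested
    activated = activate_user(conn, "ada@example.com")

    # Assert: check only what a caller can observe through the public functions
    assert activated is True
    assert list_active_emails(conn) == ["ada@example.com"]
    conn.close()
```

The assertions go through `list_active_emails` rather than querying the table directly, so the test keeps passing if the storage layout changes while the behavior stays the same.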

For language-specific patterns, see the Language-Specific Patterns section.

Async Waiting Patterns

When tests involve async operations, avoid arbitrary timeouts:

# BAD: Guessing at timing
time.sleep(0.5)
assert result == expected

# GOOD: Wait for the actual condition
wait_for(lambda: result == expected)

When to use condition-based waiting:

Tests use sleep, setTimeout, or arbitrary delays
Tests are flaky (pass locally, fail in CI)
Tests timeout when run in parallel
Waiting for async operations to complete

Delegate to skill: When you encounter these patterns, invoke Skill(ce:condition-based-waiting) for detailed guidance on implementing proper condition polling and fixing flaky tests.
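
For orientation, a condition-polling helper can be as small as the sketch below. The name `wait_for` matches the snippet earlier in this section, but the signature, the default timeout, and the polling interval are assumptions, not the API of any particular library or of the delegated skill.

```python
import threading
import time
from typing import Callable


def wait_for(
    condition: Callable[[], bool], timeout: float = 2.0, interval: float = 0.01
) -> None:
    """Poll `condition` until it returns True, or raise after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


def test_background_job_result_becomes_visible():
    results = []
    # Simulated async work: appends a result after a short delay
    worker = threading.Timer(0.05, lambda: results.append("done"))
    worker.start()

    # Returns as soon as the result shows up instead of sleeping a guessed duration
    wait_for(lambda: results == ["done"])

    assert results == ["done"]
    worker.join()
```

The timeout acts as an upper bound for the failure case only; a passing test takes as long as the operation itself, which is what removes the timing guesswork.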

Assertion Strategy

The Golden Rule of Assertions: A test must fail if, and only if, the intention behind the system is not met.

This rule is bidirectional. A test is broken when it:

Doesn't fail when the intention is actually broken
Does fail when the intention is perfectly fine (implementation detail changed, external service unreachable, etc.)

Before merging any test, ask: "When will this test fail?" If the answer includes anything other than "when the behavior this test describes is broken," the test needs work.

Assert on observable outputs, not internal state

| Context | Assert On                                             | Avoid                                 |
| ------- | ----------------------------------------------------- | ------------------------------------- |
| UI      | Visible text, accessibility roles, user-visible state | CSS classes, internal state, test IDs |
| API     | Response body, status code, headers                   | Internal DB state directly            |
| CLI     | stdout/stderr, exit code                              | Internal variables                    |
| Library | Return values, documented side effects                | Private methods, internal state       |

Why: Tests that assert on implementation details break when you refactor, even if behavior is unchanged.

Respect test boundaries

A test that makes a real network request doesn't just test your function -- it tests DNS resolution, network connectivity, server uptime, and response timing. When any of those fail, your test fails even though your code is fine. That violates the Golden Rule.

Fix: Mock at the boundary of what you don't own or control. The function under test isn't responsible for the server's validity -- it's responsible for making the right request and handling the response correctly. Use API mocking (e.g., MSW, respx, httpmock) to make external interactions fixed, predictable givens.

This is not a contradiction with "default: don't mock." Internal modules stay real. External boundaries (network, third-party services) get mocked so your test only fails when your code's intention is broken.
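
A minimal sketch of mocking at the network boundary follows. It uses the standard library's `unittest.mock.patch` as a stand-in for the dedicated API-mocking tools named above, and the rates URL, `fetch_exchange_rate`, and `convert` are hypothetical. Only `urllib.request.urlopen` is replaced; the call from `convert` into `fetch_exchange_rate` stays real.

```python
import io
import json
import urllib.request
from unittest.mock import patch


def fetch_exchange_rate(base: str, quote: str) -> float:
    """Hypothetical client for an external rates service."""
    url = f"https://rates.example.invalid/latest?base={base}&quote={quote}"
    with urllib.request.urlopen(url, timeout=5) as response:
        payload = json.load(response)
    return float(payload["rate"])


def convert(amount: float, base: str, quote: str) -> float:
    """Internal business logic: stays real in the test."""
    return round(amount * fetch_exchange_rate(base, quote), 2)


def test_convert_applies_rate_returned_by_rates_service():
    # The external response becomes a fixed, predictable given.
    # BytesIO supports both `with` and `.read()`, like a real HTTP response.
    canned_response = io.BytesIO(b'{"rate": "0.5"}')
    with patch("urllib.request.urlopen", return_value=canned_response):
        assert convert(10.0, "USD", "EUR") == 5.0
```

With the response pinned, this test can only fail when the parsing or conversion logic is wrong, not when DNS, connectivity, or the remote server misbehaves.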

Test Data Management

Use source constants and fixtures, not hard-coded values:

# Good - References actual constant or fixture
expected_message = APP_MESSAGES.SUCCESS
assert response.message == expected_message

# Bad - Hard-coded, breaks when copy changes
assert response.message == "Action completed successfully!"

Why: When product copy changes, you want one place to update, not every test file.

Anti-Patterns to Avoid

Testing Mock Behavior

# BAD: Testing that the mock was called, not real behavior
mock_service.assert_called_once()

# GOOD: Test the actual outcome
assert user.is_active
assert len(sent_emails) == 1

Gate: Before asserting on mock calls, ask "Am I testing real behavior or mock interactions?" If testing mocks → Stop, test the actual outcome instead.
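
A runnable version of the GOOD case could look like this. The `User`, `InMemoryOutbox`, and `activate` names are hypothetical; the outbox is a small fake standing in for a third-party email provider, and the test checks what ended up in it rather than how it was called.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    email: str
    is_active: bool = False


@dataclass
class InMemoryOutbox:
    """Fake for an external email provider: records messages instead of sending."""

    sent_emails: list = field(default_factory=list)

    def send(self, to: str, subject: str) -> None:
        self.sent_emails.append({"to": to, "subject": subject})


def activate(user: User, outbox: InMemoryOutbox) -> None:
    """Hypothetical production logic under test."""
    user.is_active = True
    outbox.send(to=user.email, subject="Your account is active")


def test_activation_marks_user_active_and_sends_one_email():
    user = User(email="ada@example.com")
    outbox = InMemoryOutbox()

    activate(user, outbox)

    # Outcomes a caller can observe, not call counts on a mock
    assert user.is_active
    assert len(outbox.sent_emails) == 1
    assert outbox.sent_emails[0]["to"] == "ada@example.com"
```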

Test-Only Methods in Production

# BAD: destroy() only used in tests - pollutes production code
class Session:
    def destroy(self):  # Only exists for test cleanup
        ...

# GOOD: Test utilities handle cleanup
# In test_utils.py
def cleanup_session(session):
    # Access internals here, not in production code
    ...

Gate: Before adding methods to production code, ask "Is this only for tests?" Yes → Put in test utilities.

Mocking Without Understanding

# BAD: Mock prevents a side effect the test actually needs
mock(database.save)  # Now duplicate detection won't work!

add_item(item)
add_item(item)  # Should fail as duplicate, but won't

# GOOD: Mock at the correct level
mock(external_api.validate)  # Mock slow external call only

add_item(item)  # DB save works, duplicate detected
add_item(item)  # Fails correctly
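
The GOOD case can be made concrete with a real database and a patched external call. Everything named here (`ExternalValidator`, `DuplicateItemError`, the `items` table) is an assumption for illustration; an in-memory SQLite database plays the role of the test database.

```python
import sqlite3
from unittest.mock import patch


class DuplicateItemError(Exception):
    pass


class ExternalValidator:
    """Hypothetical client for a slow third-party validation service."""

    def validate(self, name: str) -> bool:
        raise RuntimeError("would call a slow external service")


external_api = ExternalValidator()


def add_item(conn: sqlite3.Connection, name: str) -> None:
    external_api.validate(name)
    try:
        # The real insert is the side effect that makes duplicate detection work
        conn.execute("INSERT INTO items (name) VALUES (?)", (name,))
    except sqlite3.IntegrityError as exc:
        raise DuplicateItemError(name) from exc


def test_adding_same_item_twice_is_rejected_as_duplicate():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (name TEXT PRIMARY KEY)")

    # Mock only the slow external call; the database save stays real
    with patch.object(external_api, "validate", return_value=True):
        add_item(conn, "widget")
        try:
            add_item(conn, "widget")
        except DuplicateItemError:
            duplicate_rejected = True
        else:
            duplicate_rejected = False

    assert duplicate_rejected
    conn.close()
```

Had the save been mocked instead, the second call would have succeeded silently and the test would pass without ever exercising duplicate detection.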

Incomplete Mocks

// BAD: Partial mock - missing fields downstream code needs
mock_response = {
    status: "success",
    data: {...}
    // Missing: metadata.request_id that downstream code uses
}

// GOOD: Mirror real API completely
mock_response = {
    status: "success",
    data: {...},
    metadata: {request_id: "...", timestamp: ...}
}

Gate: Before creating mocks, check "What does the real thing return?" Include ALL fields.
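
One way to keep mocks complete is to build them from a single full payload and override only what a given test cares about. The sketch below assumes a recorded response shape; the field values, `make_response`, and `summarize` are all hypothetical.

```python
import copy

# Assumed to mirror one complete response from the real API (values are made up)
RECORDED_RESPONSE = {
    "status": "success",
    "data": {"id": 1},
    "metadata": {"request_id": "req-123", "timestamp": "2024-01-01T00:00:00Z"},
}


def make_response(**overrides):
    """Return a full copy of the recorded response with top-level overrides."""
    response = copy.deepcopy(RECORDED_RESPONSE)
    response.update(overrides)
    return response


def summarize(response: dict) -> str:
    """Hypothetical downstream code that relies on metadata.request_id."""
    return f"{response['status']} ({response['metadata']['request_id']})"


def test_summary_includes_request_id_from_metadata():
    response = make_response(status="error")
    # metadata is still present because the mock starts from the full payload
    assert summarize(response) == "error (req-123)"
```

Because every test starts from the same complete payload, a field that downstream code needs cannot go missing just because one test author forgot it.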

TDD Prevents Anti-Patterns

Write test first → Think about what you're testing (not mocks)
Watch it fail → Confirms test tests real behavior
Minimal implementation → No test-only methods creep in
Real dependencies first → See what test needs before mocking

If you find yourself testing mock behavior, you skipped a TDD step: you added mocks without first watching the test fail against real code.

Language-Specific Patterns

For detailed framework and language-specific patterns:

JavaScript/React: See references/javascript-react.md for React Testing Library queries, Jest/Vitest setup, Playwright E2E, and component testing patterns
Python: See references/python.md for pytest fixtures, polyfactory, respx mocking, testcontainers, and FastAPI testing
Go: See references/go.md for table-driven tests, testify/go-cmp assertions, testcontainers-go, and interface fakes

Quality Checklist

Before completing tests, verify:

- [ ] Happy path covered
- [ ] Error conditions handled
- [ ] Edge cases considered
- [ ] Real dependencies used (minimal mocking)
- [ ] Async waiting uses conditions, not arbitrary timeouts
- [ ] Golden Rule: test fails if and only if the intention is broken
- [ ] Tests survive refactoring (no implementation details)
- [ ] No test-only methods added to production code
- [ ] No assertions on mock existence or call counts
- [ ] Test names describe behavior, not implementation

What NOT to Test

Internal state
Private methods
Function call counts
Implementation details
Mock existence
Framework internals

Test behavior users/callers observe, not code structure.

Quick Reference

| Test Type   | When                    | Dependencies                 |
| ----------- | ----------------------- | ---------------------------- |
| Integration | Default choice          | Real (test DB, real modules) |
| E2E         | Critical user workflows | Real (full stack)            |
| Unit        | Pure functions only     | None                         |

| Anti-Pattern                    | Fix                                     |
| ------------------------------- | --------------------------------------- |
| Testing mock existence          | Test actual outcome instead             |
| Test-only methods in production | Move to test utilities                  |
| Mocking without understanding   | Understand dependencies, mock minimally |
| Incomplete mocks                | Mirror real API completely              |
| Tests as afterthought           | TDD - write tests first                 |
| Arbitrary timeouts/sleeps       | Use condition-based waiting             |

<IMPORTANT> **Remember:** Behavior over implementation. Real over mocked. Outputs over internals. </IMPORTANT>

