opencode/skills/writing-tests/SKILL.md
Write behavior-focused tests following the Testing Trophy model with real dependencies, avoiding common anti-patterns such as testing mocks and polluting production code. Use when writing new tests, reviewing test quality, or improving test coverage.
Core Philosophy: Test user-observable behavior with real dependencies. Tests should survive refactoring when behavior is unchanged.
Iron Laws:
<IMPORTANT>
1. Test real behavior, not mock behavior
2. Never add test-only methods to production code
3. Never mock without understanding dependencies
</IMPORTANT>

Write tests in this priority order:
Default to integration tests. Only drop to unit tests for pure utility functions.
BEFORE writing any tests, copy this checklist and track your progress:
Test Writing Progress:
- [ ] Step 1: Review project standards (check existing tests)
- [ ] Step 2: Understand behavior (what should it do? what can fail?)
- [ ] Step 3: Choose test type (Integration/E2E/Unit)
- [ ] Step 4: Identify dependencies (real vs mocked)
- [ ] Step 5: Write failing test first (TDD)
- [ ] Step 6: Implement minimal code to pass
- [ ] Step 7: Verify coverage (happy path, errors, edge cases)
Before writing any tests:
Is this a complete user workflow?
→ YES: E2E test
Is this a pure function (no side effects/dependencies)?
→ YES: Unit test
Everything else:
→ Integration test (with real dependencies)
Default: Don't mock. Use real dependencies.
Why: Mocking internal dependencies creates brittle tests that break during refactoring.
describe("Feature Name", () => {
setup(initialState)
test("should produce expected output when action is performed", () => {
// Arrange: Set up preconditions
// Act: Perform the action being tested
// Assert: Verify observable output
})
})
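As a rough illustration of the Arrange/Act/Assert skeleton above, here is a minimal Python sketch that uses a real in-memory SQLite database rather than a mocked one. The `register_user` and `find_user_id` functions, the `users` table, and the `make_test_db` helper are hypothetical names invented for this example; they are not part of this skill or of any particular project.

```python
import sqlite3
from typing import Optional


def register_user(conn: sqlite3.Connection, email: str) -> int:
    """Hypothetical feature under test: store a user and return its id."""
    cursor = conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
    conn.commit()
    return cursor.lastrowid


def find_user_id(conn: sqlite3.Connection, email: str) -> Optional[int]:
    """Hypothetical public lookup that callers would use."""
    row = conn.execute("SELECT id FROM users WHERE email = ?", (email,)).fetchone()
    return row[0] if row else None


def make_test_db() -> sqlite3.Connection:
    """Setup: a real (in-memory) database instead of a mock."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")
    return conn


def test_registered_user_can_be_found_by_email():
    # Arrange: set up preconditions
    conn = make_test_db()

    # Act: perform the action being tested
    user_id = register_user(conn, "ada@example.com")

    # Assert: verify observable output through the public lookup
    assert find_user_id(conn, "ada@example.com") == user_id
```

The assertion goes through `find_user_id` rather than a raw query, so the test keeps passing if the table layout changes but the behavior does not.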
Key principles:
For language-specific patterns, see the Language-Specific Patterns section.
When tests involve async operations, avoid arbitrary timeouts:
// BAD: Guessing at timing
sleep(500)
assert result == expected
// GOOD: Wait for the actual condition
wait_for(lambda: result == expected)
When to use condition-based waiting:
sleep, setTimeout, or arbitrary delays

Delegate to skill: When you encounter these patterns, invoke Skill(ce:condition-based-waiting) for detailed guidance on implementing proper condition polling and fixing flaky tests.
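The `wait_for` call in the pseudocode above is not defined in this skill. One possible shape for it, as a minimal Python sketch, is a polling loop with a deadline. The `timeout` and `interval` parameters and their defaults are assumptions made for this example, and the referenced condition-based-waiting skill may recommend something different.

```python
import threading
import time
from typing import Callable


def wait_for(
    condition: Callable[[], bool],
    timeout: float = 2.0,    # assumed default, tune per project
    interval: float = 0.01,  # assumed polling interval
) -> None:
    """Poll `condition` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while not condition():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


def test_background_job_sets_result():
    # Arrange: a background task that finishes at some later point
    result = {}
    threading.Timer(0.05, lambda: result.update(value=42)).start()

    # Act: wait for the actual condition instead of guessing a delay
    wait_for(lambda: result.get("value") == 42)

    # Assert
    assert result["value"] == 42
```

The test returns as soon as the condition holds, and a genuine failure surfaces as a `TimeoutError` instead of a misleading assertion on stale state.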
The Golden Rule of Assertions: A test must fail if, and only if, the intention behind the system is not met.
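This rule is bidirectional. A test is broken when it:
- Fails even though the intended behavior still works
- Passes even though the intended behavior is broken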
Before merging any test, ask: "When will this test fail?" If the answer includes anything other than "when the behavior this test describes is broken," the test needs work.
| Context | Assert On | Avoid |
| ------- | --------- | ----- |
| UI | Visible text, accessibility roles, user-visible state | CSS classes, internal state, test IDs |
| API | Response body, status code, headers | Internal DB state directly |
| CLI | stdout/stderr, exit code | Internal variables |
| Library | Return values, documented side effects | Private methods, internal state |
Why: Tests that assert on implementation details break when you refactor, even if behavior is unchanged.
A test that makes a real network request doesn't just test your function -- it tests DNS resolution, network connectivity, server uptime, and response timing. When any of those fail, your test fails even though your code is fine. That violates the Golden Rule.
Fix: Mock at the boundary of what you don't own or control. The function under test isn't responsible for the server's validity -- it's responsible for making the right request and handling the response correctly. Use API mocking (e.g., MSW, respx, httpmock) to make external interactions fixed, predictable givens.
This is not a contradiction with "default: don't mock." Internal modules stay real. External boundaries (network, third-party services) get mocked so your test only fails when your code's intention is broken.
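To make the boundary idea concrete, here is a minimal Python sketch in which only the network call is replaced and all parsing and formatting logic runs for real. It uses the standard library's `unittest.mock.patch` as a stand-in for the API-mocking tools named above, not as a replacement for them. The `fetch_display_name` function and the `api.example.com` URL are hypothetical.

```python
import io
import json
import urllib.request
from unittest.mock import patch


def fetch_display_name(user_id: int) -> str:
    """Hypothetical code under test: calls an external API we do not control."""
    url = f"https://api.example.com/users/{user_id}"  # placeholder URL
    with urllib.request.urlopen(url) as response:
        payload = json.load(response)
    return payload["name"].strip().title()


def test_display_name_is_normalized():
    # Arrange: the external response becomes a fixed, predictable given
    fake_response = io.BytesIO(b'{"name": "  ada lovelace "}')

    # Act: only the network boundary is replaced
    with patch("urllib.request.urlopen", return_value=fake_response):
        name = fetch_display_name(7)

    # Assert: the outcome our own code is responsible for
    assert name == "Ada Lovelace"
```

This test can only fail if the parsing or formatting logic changes, never because of DNS, connectivity, or server uptime.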
Use source constants and fixtures, not hard-coded values:
// Good - References actual constant or fixture
expected_message = APP_MESSAGES.SUCCESS
assert response.message == expected_message
// Bad - Hard-coded, breaks when copy changes
assert response.message == "Action completed successfully!"
Why: When product copy changes, you want one place to update, not every test file.
// BAD: Testing that the mock was called, not real behavior
mock_service.assert_called_once()
// GOOD: Test the actual outcome
assert user.is_active == True
assert len(sent_emails) == 1
Gate: Before asserting on mock calls, ask "Am I testing real behavior or mock interactions?" If testing mocks → Stop, test the actual outcome instead.
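A minimal Python sketch of the "test the actual outcome" side of this gate follows. The `User`, `InMemoryMailer`, and `activate_user` names are hypothetical and exist only to show the shape of such a test.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    email: str
    is_active: bool = False


@dataclass
class InMemoryMailer:
    """Hypothetical test double that records mail instead of sending it."""
    sent_emails: list = field(default_factory=list)

    def send(self, to: str, subject: str) -> None:
        self.sent_emails.append({"to": to, "subject": subject})


def activate_user(user: User, mailer: InMemoryMailer) -> None:
    """Hypothetical code under test."""
    user.is_active = True
    mailer.send(to=user.email, subject="Your account is active")


def test_activation_enables_account_and_sends_one_email():
    # Arrange
    user = User(email="ada@example.com")
    mailer = InMemoryMailer()

    # Act
    activate_user(user, mailer)

    # Assert: observable outcomes, not "was a method called"
    assert user.is_active
    assert len(mailer.sent_emails) == 1
    assert mailer.sent_emails[0]["to"] == "ada@example.com"
```

The assertions describe what a user would notice (an active account, one email addressed to them), so the test survives a refactor that changes how `activate_user` talks to the mailer internally.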
// BAD: destroy() only used in tests - pollutes production code
class Session:
def destroy(self): # Only exists for test cleanup
...
// GOOD: Test utilities handle cleanup
# In test_utils.py
def cleanup_session(session):
# Access internals here, not in production code
...
Gate: Before adding methods to production code, ask "Is this only for tests?" Yes → Put in test utilities.
// BAD: Mock prevents side effect test actually needs
mock(database.save) # Now duplicate detection won't work!
add_item(item)
add_item(item) # Should fail as duplicate, but won't
// GOOD: Mock at correct level
mock(external_api.validate) # Mock slow external call only
add_item(item) # DB save works, duplicate detected
add_item(item) # Fails correctly
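The same idea as a runnable Python sketch: the database stays real (in-memory SQLite) so duplicate detection still works, and only the slow external validation is replaced. The `add_item` signature, the `items` table, and `DuplicateItemError` are assumptions made for this example.

```python
import sqlite3
from typing import Callable


class DuplicateItemError(Exception):
    """Hypothetical domain error raised when an item already exists."""


def add_item(conn: sqlite3.Connection, name: str, validate: Callable[[str], bool]) -> None:
    """Hypothetical code under test: validates externally, then saves."""
    if not validate(name):
        raise ValueError(f"invalid item: {name}")
    try:
        conn.execute("INSERT INTO items (name) VALUES (?)", (name,))
        conn.commit()
    except sqlite3.IntegrityError as exc:
        raise DuplicateItemError(name) from exc


def test_adding_the_same_item_twice_is_rejected():
    # Arrange: real database, stubbed external validator only
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (name TEXT NOT NULL UNIQUE)")

    def always_valid(name: str) -> bool:  # stands in for the slow external call
        return True

    # Act
    add_item(conn, "widget", always_valid)

    # Assert: the second insert fails because the real save ran the first time
    try:
        add_item(conn, "widget", always_valid)
    except DuplicateItemError:
        return
    raise AssertionError("expected DuplicateItemError")
```

Had the database write been mocked instead, the unique constraint would never fire and the test would pass or fail for reasons unrelated to the behavior it describes.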
// BAD: Partial mock - missing fields downstream code needs
mock_response = {
status: "success",
data: {...}
// Missing: metadata.request_id that downstream code uses
}
// GOOD: Mirror real API completely
mock_response = {
status: "success",
data: {...},
metadata: {request_id: "...", timestamp: ...}
}
Gate: Before creating mocks, check "What does the real thing return?" Include ALL fields.
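One way to keep mocked responses complete is a small fixture factory that always returns every field, with per-test overrides. The Python sketch below assumes a response shape based on the pseudocode above; the `summarize` and `make_api_response` functions and all field values are hypothetical.

```python
def summarize(response: dict) -> str:
    """Hypothetical downstream code that reads metadata.request_id."""
    return f"{response['status']} (request {response['metadata']['request_id']})"


def make_api_response(**overrides) -> dict:
    """Hypothetical fixture factory mirroring the full response shape."""
    response = {
        "status": "success",
        "data": {"items": []},
        "metadata": {"request_id": "req-123", "timestamp": "2024-01-01T00:00:00Z"},
    }
    response.update(overrides)
    return response


def test_summary_includes_request_id():
    assert summarize(make_api_response()) == "success (request req-123)"


def test_partial_mock_breaks_downstream_code():
    # A response missing `metadata` fails in code the test never meant to exercise
    partial = {"status": "success", "data": {}}
    try:
        summarize(partial)
    except KeyError as exc:
        assert exc.args == ("metadata",)
    else:
        raise AssertionError("expected KeyError for missing metadata")
```

Tests that need a variation can call something like `make_api_response(status="error")` and still get every other field.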
If you are testing mock behavior, you violated TDD: you added mocks without first watching the test fail against real code.
For detailed framework and language-specific patterns:
- references/javascript-react.md for React Testing Library queries, Jest/Vitest setup, Playwright E2E, and component testing patterns
- references/python.md for pytest fixtures, polyfactory, respx mocking, testcontainers, and FastAPI testing
- references/go.md for table-driven tests, testify/go-cmp assertions, testcontainers-go, and interface fakes

Before completing tests, verify:
Test behavior users/callers observe, not code structure.
| Test Type | When | Dependencies |
| --------- | ---- | ------------ |
| Integration | Default choice | Real (test DB, real modules) |
| E2E | Critical user workflows | Real (full stack) |
| Unit | Pure functions only | None |
| Anti-Pattern | Fix |
| ------------ | --- |
| Testing mock existence | Test actual outcome instead |
| Test-only methods in production | Move to test utilities |
| Mocking without understanding | Understand dependencies, mock minimally |
| Incomplete mocks | Mirror real API completely |
| Tests as afterthought | TDD - write tests first |
| Arbitrary timeouts/sleeps | Use condition-based waiting |
<IMPORTANT> **Remember:** Behavior over implementation. Real over mocked. Outputs over internals. </IMPORTANT>