generated/claude/skills/testing/SKILL.md
Behavioral testing strategy — deciding what to test and how. Use when writing tests, reviewing test quality, or fixing tests that test mocks instead of behavior. Triggers on: 'use testing mode', 'write tests', 'test strategy', 'tests are brittle', 'tests test mocks', 'improve test quality', 'what should I test'. Full access mode - can write and run tests.
npx skillsauth add mcouthon/agents testingInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Decide what to test and how to test it. Write tests that catch real regressions.
"Tests should be coupled to the behavior of code and decoupled from the structure of code." — Kent Beck
Ask these 5 questions about EVERY test you write. If any answer is "no", rewrite the test.
| # | Question | What It Checks | | --- | --------------------------------------------------------- | ---------------------- | | 1 | Could I rewrite the internals and this test still passes? | Structure-insensitive | | 2 | Am I testing what the code SHOULD DO, not what it DOES? | Behavioral | | 3 | If this test passes, do I trust the code works? | Predictive / Inspiring | | 4 | Am I testing through the public API? | Public contract | | 5 | Am I checking state/output, not verifying call sequences? | State > Interaction |
The Refactoring Litmus Test: After writing a test, imagine completely rewriting the internals while keeping the same public behavior. Would the test still pass? If not, it's coupled to structure and will become a maintenance burden.
Choose the simplest test double that gives confidence. Work top to bottom — stop at the first "yes".
Can I use the REAL implementation?
├─ Yes → Use it (always the first choice)
└─ No → Is it slow, non-deterministic, or expensive?
├─ Yes → Is a FAKE available? (in-memory DB, fake server)
│ ├─ Yes → Use the fake
│ └─ No → STUB specific return values (keep count low)
└─ No → Do I need to verify external SIDE EFFECTS?
(email sent, record saved, event published)
├─ Yes → Interaction test with verify (LAST RESORT)
└─ No → Re-examine — you probably CAN use the real thing
Mock AT boundaries (external edges of your system):
Never mock these:
"Don't Mock What You Don't Own": If you must mock a third-party API, wrap it in your own adapter and mock the adapter.
Think in behaviors, not methods. Each test covers one behavior: "Given X, when Y, then Z."
Don't write one test per method. Write one test per behavior:
# BAD — one method, one test (grows unwieldy)
def test_process_transaction():
# tests display, validation, AND balance check in one test
...
# GOOD — one behavior, one test
def test_process_transaction_displays_item_name(): ...
def test_process_transaction_rejects_negative_amount(): ...
def test_process_transaction_warns_on_low_balance(): ...
Third-party library internals, simple getters/setters, framework boilerplate, and implementation details that may change.
Always write a failing test FIRST that reproduces the bug. Then fix it. Never fix a bug without a regression test.
# BEFORE — testing mocks, not code
@patch("myapp.cache.get")
@patch("myapp.db.query")
@patch("myapp.validator.check")
def test_process(mock_check, mock_cache, mock_db):
mock_db.return_value = {"id": 1}
mock_cache.return_value = None
mock_check.return_value = True
result = process(1) # What are we even testing?
assert result == {"id": 1}
# AFTER — test with real collaborators
def test_process():
db = InMemoryDatabase({"users": [{"id": 1, "name": "Alice"}]})
service = ProcessingService(db=db, cache=MemoryCache())
result = service.process(1)
assert result.name == "Alice"
# BEFORE — mirrors implementation step by step
def test_register_user():
service.register("[email protected]", "pass123")
mock_validator.validate.assert_called_once_with("[email protected]")
mock_db.insert.assert_called_once()
mock_email.send.assert_called_once_with(
to="[email protected]", template="welcome"
)
# AFTER — asserts observable outcomes
def test_register_user():
service.register("[email protected]", "pass123")
assert service.get_user("[email protected]") is not None # user exists
assert len(email_server.sent) == 1 # email sent
assert email_server.sent[0].to == "[email protected]"
Describe what scenario is tested and what outcome is expected:
# Good — scenario + expected outcome
def test_expired_token_returns_401(): ...
def test_checkout_with_empty_cart_raises_error(): ...
def test_transfer_insufficient_balance_raises_error(): ...
def test_login_wrong_password_locks_after_3_attempts(): ...
def test_search_returns_empty_list_when_no_matches(): ...
# Bad — vague, no outcome
def test_token_expiry_check(): ...
def test_transfer(): ...
def test_login_error(): ...
def test_search_works(): ...
Structure each test as Arrange → Act → Assert (or Given → When → Then). Keep sections visually distinct — whitespace or comments between the three phases help readability.
Tests should be Descriptive And Meaningful Phrases. Duplicate for clarity:
make_user(**overrides) to build test objects with sensible defaults — override only fields relevant to the testTests should be trivially correct on inspection. Straight-line code only:
Before committing tests, verify each one:
- [ ] Passes the 5-question Self-Test (above)
- [ ] Name describes scenario + expected outcome
- [ ] No logic (loops, conditionals) in test body
- [ ] Each test is self-contained (DAMP: readable without shared context)
- [ ] Failure message tells you what's wrong without reading the test
- [ ] Uses real implementations where possible (mocks only at boundaries)
- [ ] Tests behavior through public API, not internal methods
| Excuse | Reality | Required Action | | --------------------------------- | ----------------------------------------------- | --------------------------------------------- | | "The change is too small to test" | Small changes cause regressions | Write at least one test for the behavior | | "Tests are passing" | You haven't actually run them | Run the test suite and show output | | "Existing tests cover this" | You haven't checked | Find and cite the specific test | | "I'll add tests later" | Later never comes | Write tests before marking done | | "Mocking is too complex here" | Complex mocking means bad design | Refactor to test real behavior instead | | "This is just a prototype" | Prototypes without tests become production code | Write at least a smoke test for core behavior |
development
Systematic debugging with hypothesis-driven investigation. Use when something is broken, tests are failing, unexpected behavior occurs, or errors need investigation. Triggers on: 'this is broken', 'debug', 'why is this failing', 'unexpected error', 'not working', 'bug', 'fix this issue', 'investigate', 'tests failing', 'trace the error', 'use debug mode'. Full access mode - can run commands, add logging, and fix issues.
development
Systematic debugging with hypothesis-driven investigation. Use when something is broken, tests are failing, unexpected behavior occurs, or errors need investigation. Triggers on: 'this is broken', 'debug', 'why is this failing', 'unexpected error', 'not working', 'bug', 'fix this issue', 'investigate', 'tests failing', 'trace the error', 'use debug mode'. Full access mode - can run commands, add logging, and fix issues.
testing
Behavioral testing strategy — deciding what to test and how. Use when writing tests, reviewing test quality, or fixing tests that test mocks instead of behavior. Triggers on: 'use testing mode', 'write tests', 'test strategy', 'tests are brittle', 'tests test mocks', 'improve test quality', 'what should I test'. Full access mode - can write and run tests.
development
Use when finding code smells, auditing TODOs, removing dead code, cleaning up unused imports, or assessing code quality. Triggers on: 'use tech-debt mode', 'tech debt', 'code smells', 'clean up', 'remove dead code', 'delete unused', 'simplify'. Full access mode - can modify files and run tests.