skills/testing/mocking-test-generator/SKILL.md
Generates tests that mock external dependencies — HTTP, databases, filesystems, clocks — isolating the unit under test while still exercising realistic interactions. Use when the code has side effects you can't run in a test, when external services are slow or unavailable, or when testing error paths that are hard to trigger for real.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf): `npx skillsauth add santosomar/general-secure-coding-agent-skills mocking-test-generator`
Mocks replace real dependencies with controllable fakes. Done right, they isolate the unit and let you test error paths. Done wrong, they test the mock instead of the code.
| Dependency | Mock? | Why |
| ---------------------------- | ----------------------------------------------- | ------------------------------------------- |
| External HTTP API | Yes | Slow, flaky, costs money, rate limits |
| Database | Usually yes (unit), no (integration) | Real DB in unit tests = slow + shared state |
| Filesystem | Yes, or use a temp dir | Test pollution, permission issues |
| Clock / time | Yes | sleep(3600) in a test is not a test |
| Random number generator | Yes (seed it) | Reproducibility |
| Your own pure functions | No | That's what you're testing |
| Your own classes (same module) | Rarely | Over-mocking tests wiring, not behavior |
Rule of thumb: mock at the process boundary (network, disk, clock). Don't mock your own code.
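Two rows in the table offer an alternative to a mock: a temp dir for the filesystem and a seed for randomness. A minimal sketch of both, using only the standard library; `write_report` and `pick_winner` are hypothetical functions invented for illustration, not part of any real codebase.

```python
import random
import tempfile
from pathlib import Path


def write_report(directory: Path, text: str) -> Path:
    """Hypothetical unit under test: writes a report file and returns its path."""
    path = directory / "report.txt"
    path.write_text(text)
    return path


def pick_winner(names, rng):
    """Hypothetical unit under test: picks one name using the supplied RNG."""
    return rng.choice(names)


def test_write_report_uses_temp_dir():
    # A throwaway directory is deleted when the block exits, so nothing leaks
    # into other tests.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_report(Path(tmp), "ok")
        assert path.read_text() == "ok"


def test_pick_winner_is_reproducible():
    names = ["ana", "bo", "cy"]
    # Two generators built from the same seed make the same choice.
    first = pick_winner(names, random.Random(1234))
    second = pick_winner(names, random.Random(1234))
    assert first == second
```

Passing the generator in as an argument, rather than calling the module-level `random.choice`, is what lets the test control the seed without patching anything.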
| Type | What it does | Use for |
| ------------- | --------------------------------------------- | ------------------------------------------ |
| Stub | Returns canned values | "When I call X, give me Y" (most tests) |
| Mock | Stub + records calls for assertion | "Verify X was called with args A, B" |
| Fake | Working implementation, simplified (in-memory DB) | Integration-ish tests, stateful interactions |
| Spy | Real object + call recording | "Let it really happen, but verify it did" |
Default to stubs. Mocks (with call-count assertions) couple the test to how the code does something, not what it does. Brittle.
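The four types are easier to tell apart side by side. A minimal sketch using the standard library's `unittest.mock` (which the `mocker` fixture from pytest-mock wraps); `welcome` and `FakeMailer` are hypothetical names made up for this illustration.

```python
from unittest.mock import Mock


def welcome(user, mailer):
    """Hypothetical unit under test: sends a greeting, returns the mailer's status."""
    return mailer.send(user, f"Welcome, {user}!")


class FakeMailer:
    """Fake: a small working implementation that keeps messages in memory."""

    def __init__(self):
        self.outbox = []

    def send(self, to, body):
        self.outbox.append((to, body))
        return "queued"


def test_with_stub():
    stub = Mock()
    stub.send.return_value = "queued"  # canned value; no assertion on the call
    assert welcome("ana", stub) == "queued"


def test_with_mock():
    mock = Mock()
    welcome("ana", mock)
    # Asserting on the recorded call is what makes this a mock, not a stub.
    mock.send.assert_called_once_with("ana", "Welcome, ana!")


def test_with_fake():
    fake = FakeMailer()
    welcome("ana", fake)
    assert fake.outbox == [("ana", "Welcome, ana!")]


def test_with_spy():
    real = FakeMailer()
    spy = Mock(wraps=real)  # calls pass through to `real` and are recorded
    assert welcome("ana", spy) == "queued"
    spy.send.assert_called_once()
    assert real.outbox == [("ana", "Welcome, ana!")]
```

Only `test_with_mock` would break if `welcome` were refactored to send the same message through a different call shape; the stub and fake tests check the outcome instead.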
Code under test:
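```python
import time

import requests


class FetchError(Exception):
    """Assumed definition; the source does not show it."""


def fetch_with_retry(url, max_attempts=3):
    for attempt in range(max_attempts):
        resp = requests.get(url, timeout=5)
        if resp.status_code == 200:
            return resp.json()
        time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s
    raise FetchError(f"{max_attempts} attempts failed")
```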
What to mock: requests.get (network) and time.sleep (clock).
Test — happy path:
```python
def test_fetch_succeeds_first_try(mocker):  # `mocker` is the pytest-mock fixture
    mock_get = mocker.patch("requests.get")
    mock_get.return_value.status_code = 200
    mock_get.return_value.json.return_value = {"data": 42}

    result = fetch_with_retry("http://example.com")

    assert result == {"data": 42}
    assert mock_get.call_count == 1  # didn't retry
```
Test — retry then succeed:
```python
def test_fetch_retries_on_500(mocker):
    mocker.patch("time.sleep")  # don't actually sleep
    mock_get = mocker.patch("requests.get")
    # Two 500s, then a 200:
    r500 = mocker.Mock(status_code=500)
    r200 = mocker.Mock(status_code=200)
    r200.json.return_value = {"data": 42}
    mock_get.side_effect = [r500, r500, r200]

    result = fetch_with_retry("http://example.com")

    assert result == {"data": 42}
    assert mock_get.call_count == 3
```
Test — exhausted retries:
```python
import pytest


def test_fetch_raises_after_max_attempts(mocker):
    mocker.patch("time.sleep")
    mock_get = mocker.patch("requests.get")
    mock_get.return_value.status_code = 500  # every attempt fails

    with pytest.raises(FetchError, match="3 attempts"):
        fetch_with_retry("http://example.com", max_attempts=3)

    # Outside the `with` block, so it runs after the exception is caught.
    assert mock_get.call_count == 3
```
The error path is the hardest to trigger for real and the easiest to test with mocks. That's where mocking earns its keep.
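Status codes are one kind of error path; exceptions are another, and `side_effect` covers those too. A minimal sketch, assuming a hypothetical `save_report` function that should survive a full disk, a condition that is awkward to produce for real. It uses the standard library's `unittest.mock.patch` in place of the `mocker` fixture.

```python
import errno
from unittest.mock import patch


def save_report(path, text):
    """Hypothetical unit under test: returns False instead of crashing on a full disk."""
    try:
        with open(path, "w") as f:
            f.write(text)
    except OSError as exc:
        if exc.errno == errno.ENOSPC:
            return False
        raise  # any other OS error is unexpected and should propagate
    return True


def test_save_report_handles_full_disk():
    full_disk = OSError(errno.ENOSPC, "No space left on device")
    # Setting side_effect to an exception makes the patched call raise it.
    with patch("builtins.open", side_effect=full_disk):
        assert save_report("report.txt", "data") is False


def test_save_report_propagates_other_errors():
    denied = OSError(errno.EACCES, "Permission denied")
    with patch("builtins.open", side_effect=denied):
        try:
            save_report("report.txt", "data")
        except OSError as exc:
            assert exc.errno == errno.EACCES
        else:
            raise AssertionError("expected OSError to propagate")
```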
Anti-pattern: mocking everything, including your own code.

```python
def test_process_order(mocker):
    mocker.patch.object(Order, "validate")
    mocker.patch.object(Order, "calculate_total")
    mocker.patch.object(Order, "save")
    mocker.patch.object(Inventory, "reserve")
    mocker.patch.object(Notifier, "send")

    process_order(order)  # `order` is assumed to come from setup not shown here

    Order.validate.assert_called_once()
    Order.calculate_total.assert_called_once()
    # ...
```
This tests that process_order calls five things in order. It doesn't test that the order is processed correctly. Change the implementation (merge two methods, reorder calls) and the test breaks even though nothing is wrong.
Fix: mock only Inventory.reserve and Notifier.send (external side effects). Let Order's methods actually run — they're your code.
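The fix can be sketched as follows. The `Order`, `Inventory`, and `Notifier` classes here are hypothetical stand-ins written for this example (the source does not show their real definitions), and the sketch uses `unittest.mock.patch.object` from the standard library.

```python
from unittest.mock import patch


class Inventory:
    @staticmethod
    def reserve(sku, qty):
        raise RuntimeError("would call the warehouse service")


class Notifier:
    @staticmethod
    def send(message):
        raise RuntimeError("would send an email")


class Order:
    def __init__(self, sku, qty, unit_price):
        self.sku, self.qty, self.unit_price = sku, qty, unit_price
        self.total = None

    def validate(self):
        if self.qty <= 0:
            raise ValueError("quantity must be positive")

    def calculate_total(self):
        self.total = self.qty * self.unit_price


def process_order(order):
    order.validate()
    order.calculate_total()
    Inventory.reserve(order.sku, order.qty)
    Notifier.send(f"Order total: {order.total}")
    return order.total


def test_process_order_computes_total():
    order = Order("widget", qty=3, unit_price=5)
    # Patch only the two external side effects; Order's own logic runs for real.
    with patch.object(Inventory, "reserve") as reserve, patch.object(Notifier, "send"):
        assert process_order(order) == 15
    reserve.assert_called_once_with("widget", 3)


def test_invalid_order_reserves_nothing():
    order = Order("widget", qty=0, unit_price=5)
    with patch.object(Inventory, "reserve") as reserve, patch.object(Notifier, "send"):
        try:
            process_order(order)
        except ValueError:
            pass
        else:
            raise AssertionError("expected ValueError")
    reserve.assert_not_called()
```

Merging `validate` and `calculate_total` into one method would leave both tests green, because they assert on the total and on the external reservation, not on which internal methods ran.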
Time is the sneakiest dependency. Two approaches:
| Approach | How |
| ----------------------------------- | -------------------------------------------------------- |
| Inject clock | def __init__(self, clock=time.time) → pass lambda: fake_now in tests |
| Patch globally | freezegun.freeze_time("2025-01-01") — monkey-patches datetime/time everywhere |
Injection is cleaner (explicit dependency). Freezegun is easier for existing code that calls datetime.now() inline.
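A minimal sketch of the injection approach, using only the standard library. `TokenCache` is a hypothetical class invented for this example; the point is the `clock` parameter, which defaults to `time.time` in production and is replaced by a controllable callable in the test.

```python
import time


class TokenCache:
    """Hypothetical class that expires a token after `ttl` seconds."""

    def __init__(self, ttl, clock=time.time):
        self._ttl = ttl
        self._clock = clock  # injected dependency; real clock by default
        self._token = None
        self._stored_at = None

    def put(self, token):
        self._token = token
        self._stored_at = self._clock()

    def get(self):
        if self._token is None:
            return None
        if self._clock() - self._stored_at >= self._ttl:
            return None  # expired
        return self._token


def test_token_expires_after_ttl():
    now = [1_000.0]  # mutable cell so the test can advance time
    cache = TokenCache(ttl=60, clock=lambda: now[0])
    cache.put("abc")

    now[0] += 59
    assert cache.get() == "abc"  # one second before expiry

    now[0] += 1
    assert cache.get() is None  # expired at exactly ttl, with no sleeping
```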
- When testing `parse()`, don't mock `parse()`'s helpers; mock its I/O.
- Asserting that the code "called `_helper` twice" is asserting implementation; don't.
- Don't let mocks drift from reality. The real API returns `{"status": "ok"}`; your mock returns `{"ok": true}`. Tests pass, prod breaks. Verify mocks against a real response periodically (contract tests).
- Mock `time.sleep` in retry tests. A test that sleeps 7 seconds is a test nobody runs.

## Code under test
<function/class>
## Dependencies
| Dependency | Type | Mock? | Why |
| ---------- | ---- | ----- | --- |
## Tests
### <test name>
Mocks: <what's patched>
Scenario: <what path — happy, retry, error>
<code>
## Over-mocking check
<any mocks that patch your own code — justify or remove>
## Contract risk
<mocks that mirror external APIs — how you verify they match reality>