skills/python-testing/SKILL.md
Use when writing or reviewing tests for Python behavior, contracts, async lifecycles, or reliability paths. Also use when tests are flaky, coupled to implementation details, missing regression coverage, slow to run, or when unclear what tests a change needs. Use for multi-Python version testing (nox) and free-threaded Python thread-safety validation.
npx skillsauth add ahgraber/skills python-testingInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Test observable behavior and contracts, not internal implementation. Keep unit tests fast, deterministic, and patched at module boundaries.
These are preferred defaults for common cases. When a default conflicts with project constraints, suggest a better-fit alternative, call out tradeoffs, and note compensating controls.
For language-agnostic testing principles (AAA structure, naming, test doubles taxonomy, portfolio strategy), see writing-tests.
This skill extends those foundations with Python-specific practices.
python-testing.When NOT to use:
python-types-contracts).python-runtime-operations).python-concurrency-performance).pytest-run-parallel, set PYTHON_GIL=0, always set CI timeouts.See writing-tests → references/test-doubles.md for the full Meszaros taxonomy (Dummy, Fake, Stub, Spy, Mock), the classical vs. mockist distinction, and general guidance.
Python-specific notes:
unittest.mock naming is misleading.
Mock and MagicMock are named "Mock" but behave as Stubs by default — configured via return_value or side_effect, they return canned values and do not self-verify.
They become Mocks (behavior verification) only when you call .assert_called_with() / .assert_called_once_with().
Prefer asserting on observable output instead.
Use create_autospec() or autospec=True.
A plain Mock() accepts any arguments silently.
create_autospec(SomeClass) enforces the real method signatures — a call-site mismatch fails the test immediately rather than hiding a TypeError until production.
Patch at the import location used by the module under test, not at the original definition site:
# Wrong: patches the definition; callers in other modules are unaffected
patch("requests.post", ...)
# Right: patches where your module imported it
patch("mymodule.client.requests.post", ...)
Consider pytest-mock's mocker fixture if the project already uses it.
It integrates with pytest's fixture lifecycle, handles teardown automatically, and avoids decorator stacking on test functions.
unittest.mock.patch is sufficient when pytest-mock is not already a dependency — adding it solely for ergonomics is not warranted.
Fakes over deep mock chains.
Prefer in-memory implementations (FakeRepo, a dict-backed store) over multi-level Mock() chains.
Chains that mirror the production call graph are brittle and a sign that I/O has not been decoupled from logic.
uv run pytest scripts/test_pypi_security_audit.py -vpyleak diagnostics.nox.PYTHON_GIL=0 uv run --python 3.Xt pytest --parallel-threads=auto --timeout=300 on a free-threaded build (3.13t+).references/testing-strategy.md § Multi-Path and Derived-Field Patterns.field_a == f(field_b) is a required relationship and each field is written by a different producer (resolver, fetcher, merge step), stage-local tests don't prove the invariant holds after composition.
Write a test that asserts the relationship on the assembled output.
See references/testing-strategy.md § Multi-Path and Derived-Field Patterns.t-suffixed) matrix entry; the GIL hides race conditions that will surface when free-threaded Python becomes the default.Py_mod_gil silently re-enables the GIL; check sys._is_gil_enabled() after imports.3.10 unquoted in CI matrix YAML; it parses as 3.1 and installs the wrong Python.references/testing-strategy.mdreferences/pytest-practices.mdreferences/async-and-concurrency-testing.mdreferences/reliability-lifecycle-testing.mdreferences/multi-python-testing.mdreferences/free-threaded-testing.mdwriting-tests skill — language-agnostic foundations (AAA, naming, test doubles, portfolio strategy)development
Use when the user wants rigorous, non-sycophantic editorial feedback on a draft, essay, blog post, or argument through back-and-forth dialogue — pressure-testing thesis, structure, argument, clarity, tone, and evidence. Triggers: "be my sparring partner", "pressure-test this draft", "poke holes in my argument", "is this ready to publish", "sharpen this post", "where is this weak". Not for one-shot copyediting, proofreading, or ghostwriting.
testing
Use when distilling the through-line gist of one or more sources — the spine, argument, tension, or recurring frame running through a set of documents, notes, research, or transcripts, OR across the ideas within a single rich piece — into a few concise paragraphs. Triggers: "synthesize", "what's the through-line/gist", "extract the insight", "pull these together". Not for faithful summary or condensation that covers what a source says, nor for comparisons or catalogs where enumeration is the deliverable.
development
Use when writing or reviewing tests in any language, or diagnosing a suite that is slow, brittle, or hard to read. Triggers: "write tests", "how should I test this", "what kind of test", "test is flaky/fragile", "should I mock this", "test is hard to read". For Python-specific guidance see `python-testing`.
development
Use when writing, debugging, or explaining Strudel live-coding music patterns — mini-notation syntax, pattern functions (fast/slow/every/off/stack), synth/sample selection, audio effects, scale/chord/voicing API, or EDM production recipes. Triggers: "write a Strudel pattern", "how do I make a bassline in Strudel", "what does .every() do", "strudel drum beat", "strudel chord voicing", any Strudel code question.