python-package/skills/testing-strategy/SKILL.md
This skill should be used when the user is configuring pytest, writing tests, setting up test fixtures, using parametrize, measuring code coverage, writing async tests with pytest-asyncio, using Hypothesis for property-based testing, choosing between nox and tox, building CI test matrices, setting up snapshot testing with syrupy, mocking with pytest-mock, or reviewing test organization. Covers pytest configuration, fixtures, coverage thresholds, async testing, Hypothesis profiles, CI matrices, and mocking best practices.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add oborchers/fractional-cto testing-strategy
Every serious Python package -- attrs, httpx, Pydantic, FastAPI, Rich -- shares the same pytest configuration philosophy: strict by default, warnings as errors, no silent regressions. Without strict settings, typos in markers go unnoticed, deprecated upstream APIs break you without warning, and xfail tests silently pass for months hiding fixed bugs that never get their markers removed.
Testing strategy failures are quiet. Coverage regresses 1% at a time. A missing --strict-markers lets @pytest.mark.solw pass silently. filterwarnings without "error" lets upstream deprecation warnings accumulate until a dependency update breaks everything at once. The configuration below prevents all of this.
These settings are non-negotiable. They appear in every major package's pyproject.toml:
[tool.pytest.ini_options]
testpaths = ["tests"]
xfail_strict = true
filterwarnings = ["error"]
addopts = ["--strict-markers", "--strict-config", "-ra"]
markers = [
"slow: marks tests as slow (deselect with '-m \"not slow\"')",
"network: marks tests that require network access",
"integration: marks integration tests requiring external services",
]
| Setting | What It Prevents |
|---------|-----------------|
| testpaths = ["tests"] | Scanning src/, docs/, node_modules/ -- faster collection |
| xfail_strict = true | An unexpectedly passing xfail test (XPASS) being reported as a success instead of a failure |
| filterwarnings = ["error"] | Missing upstream DeprecationWarning until it breaks you |
| --strict-markers | Typos like @pytest.mark.solw passing without error |
| --strict-config | Typos like filterwarning (missing 's') being silently ignored |
| -ra | Forgetting to check which tests were skipped or xfailed |
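With filterwarnings = ["error"] in place, a test that deliberately exercises a deprecated code path has to assert the warning instead of letting it escape. A minimal sketch of that pattern, assuming a hypothetical old_api() helper that is not part of the configuration above:

```python
import warnings

import pytest


def old_api():
    """Hypothetical deprecated helper, used only for illustration."""
    warnings.warn("old_api() is deprecated", DeprecationWarning, stacklevel=2)
    return 42


def test_old_api_warns():
    # pytest.warns captures the warning, so the "error" filter does not fail the test.
    with pytest.warns(DeprecationWarning, match="deprecated"):
        assert old_api() == 42


def test_unasserted_warning_becomes_error():
    # Simulates filterwarnings = ["error"] locally: the warning is raised as an exception.
    with warnings.catch_warnings():
        warnings.simplefilter("error")
        with pytest.raises(DeprecationWarning, match="deprecated"):
            old_api()
```

The second test only imitates the global filter inside one test body; in a real suite the pyproject.toml setting applies it everywhere.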
Add targeted warning ignores only for known upstream issues you cannot control:
filterwarnings = [
"error",
"ignore::DeprecationWarning:some_dependency.*",
]
Mirror the source directory -- the tests/ directory must mirror src/my_package/ exactly, with the same subdirectories and a test_-prefixed file for every source module. This makes it obvious where tests live and immediately reveals untested modules. See the project-structure skill for the full directory mapping.
Start flat within that mirror. Refactor to additional directories (e.g., tests/unit/, tests/integration/) only when test count exceeds 500 or different test layers need different infrastructure.
| Structure | When | Run Subsets |
|-----------|------|-------------|
| Flat mirror (tests/test_*.py matching src/) | < 500 tests, same fixtures | pytest -m "not slow" |
| Directories (tests/unit/, tests/integration/) | > 500 tests, different infrastructure per layer | pytest tests/unit/ |
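The mirror rule can be checked mechanically. The sketch below is one possible helper, not part of pytest; the function name find_untested_modules and the example paths are assumptions chosen for illustration:

```python
from pathlib import Path


def find_untested_modules(src_root: Path, tests_root: Path) -> list[str]:
    """Return source modules (as relative paths) that have no mirrored test file."""
    missing = []
    for module in sorted(src_root.rglob("*.py")):
        if module.name == "__init__.py":
            continue  # package markers do not need their own test file
        relative = module.relative_to(src_root)
        # src/my_package/sub/io.py is expected at tests/sub/test_io.py
        expected = tests_root / relative.parent / f"test_{module.name}"
        if not expected.exists():
            missing.append(relative.as_posix())
    return missing
```

Calling find_untested_modules(Path("src/my_package"), Path("tests")) from a small script or a CI step would list every module that lacks a matching test file.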
# Root conftest.py: load shared fixture modules as plugins
pytest_plugins = ["tests.fixtures.database"]

# DO: Factory with sensible defaults
import pytest

@pytest.fixture
def make_user():
    def _make_user(name="test_user", email="test@example.com", role="user"):
        return User(name=name, email=email, role=role)
    return _make_user

def test_admin_permissions(make_user):
    admin = make_user(role="admin")
    assert admin.can_delete(make_user())
| Good | Bad |
|------|-----|
| One factory fixture with parameters | Separate fixture per variant (admin_user, inactive_user) |
| Compose fixtures: client(app(config)) | Monolithic fixture that sets up everything |
| Use built-ins: tmp_path, capsys, monkeypatch | Reinvent temporary directories or stdout capture |
| autouse=True only for leak prevention | autouse=True for convenience |
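The composition and built-in rows of the table can be sketched together. Everything named here (load_settings, the APP_CACHE_DIR and APP_DEBUG variables, the settings and cache_file fixtures) is hypothetical and stands in for your own code:

```python
import os

import pytest


def load_settings():
    """Hypothetical code under test: reads configuration from the environment."""
    return {
        "cache_dir": os.environ["APP_CACHE_DIR"],
        "debug": os.environ.get("APP_DEBUG") == "1",
    }


@pytest.fixture
def settings(tmp_path, monkeypatch):
    # Built-ins do the work: tmp_path gives an isolated directory,
    # monkeypatch restores the environment after the test.
    monkeypatch.setenv("APP_CACHE_DIR", str(tmp_path))
    monkeypatch.setenv("APP_DEBUG", "1")
    return load_settings()


@pytest.fixture
def cache_file(settings):
    # Composed fixture: builds on `settings` instead of repeating its setup.
    path = os.path.join(settings["cache_dir"], "cache.json")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("{}")
    return path


def test_cache_file_lives_in_configured_dir(settings, cache_file):
    assert os.path.dirname(cache_file) == settings["cache_dir"]
    assert settings["debug"] is True
```

Each fixture has one job, so a test that only needs configuration requests settings and never pays for the file setup.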
A fixture can only depend on fixtures with equal or broader scope. Expensive resources (DB engines, HTTP servers) use scope="session"; cheap per-test resources (DB transactions, test clients) use the default function scope and roll back in teardown.
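A minimal sketch of that scope split, assuming an in-memory SQLite database as a stand-in for the expensive resource; the fixture and helper names are illustrative, not prescribed:

```python
import sqlite3

import pytest


def create_database():
    """Stand-in for an expensive resource (an in-memory SQLite database is an assumption)."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE users (name TEXT)")
    connection.commit()
    return connection


@pytest.fixture(scope="session")
def db_connection():
    # Created once per test session and closed at the end.
    connection = create_database()
    yield connection
    connection.close()


@pytest.fixture
def db(db_connection):
    # Function scope may depend on session scope, never the reverse.
    yield db_connection
    # Teardown: discard whatever the test wrote.
    db_connection.rollback()


def count_users(connection):
    return connection.execute("SELECT COUNT(*) FROM users").fetchone()[0]


def test_insert_is_visible_inside_the_test(db):
    db.execute("INSERT INTO users VALUES ('alice')")
    assert count_users(db) == 1


def test_previous_insert_was_rolled_back(db):
    assert count_users(db) == 0
```

Because every test ends with a rollback, the two tests pass in either order while sharing one connection.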
Always use ids for readable test output. Include expected values in parameters -- never use conditionals inside parametrized tests.
# DO: Expected value in parameters
@pytest.mark.parametrize(("fmt", "expected"), [
    pytest.param("json", '"name"', id="json_format"),
    pytest.param("xml", "<name>", id="xml_format"),
])
def test_export(fmt, expected):
    assert expected in export(data, fmt)

# DON'T: Conditionals inside parametrized test
@pytest.mark.parametrize("fmt", ["json", "xml"])
def test_export(fmt):
    result = export(data, fmt)
    if fmt == "json": assert '"name"' in result  # Three tests pretending to be one
    elif fmt == "xml": assert "<name>" in result
Stack decorators for cartesian products:
@pytest.mark.parametrize("method", ["GET", "POST", "PUT"])
@pytest.mark.parametrize("auth", ["token", "api_key"])
def test_endpoint(method, auth):  # 3 x 2 = 6 tests
    ...
[tool.coverage.run]
source_pkgs = ["my_library"]
branch = true
parallel = true
[tool.coverage.report]
show_missing = true
fail_under = 85
exclude_also = [
"if TYPE_CHECKING:",
"@overload",
"raise NotImplementedError",
"assert_never",
"\\.\\.\\.",
]
| Decision | Recommendation |
|----------|---------------|
| Branch coverage | Always enable (branch = true). Line coverage misses untested else paths. |
| fail_under | Start at 80, raise as coverage improves. Never lower it. Prevents silent regression. |
| Target | 80-85% for libraries, 85-90% for production APIs, never chase 100% |
| Exclusions | TYPE_CHECKING blocks, @overload, abstract methods, sentinel ... |
Run locally: pytest --cov=my_library --cov-report=term-missing
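The claim that line coverage misses untested else paths is easiest to see on an if without an explicit else. A small sketch, using a hypothetical apply_discount function:

```python
def apply_discount(price: float, is_member: bool) -> float:
    """Hypothetical function with an `if` that has no explicit `else`."""
    if is_member:
        price = price * 0.9
    return round(price, 2)


def test_member_discount():
    # This single test executes every line, so line coverage alone reports 100%.
    assert apply_discount(100.0, is_member=True) == 90.0


def test_non_member_pays_full_price():
    # Only this test exercises the path that skips the `if` body.
    # With branch = true, coverage reports that path as missing when this test is absent.
    assert apply_discount(100.0, is_member=False) == 100.0
```

Deleting the second test leaves line coverage unchanged but produces a partial-branch miss, which is the regression that branch = true is meant to surface.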
Enable pytest-asyncio auto mode to avoid decorating every async test:
[tool.pytest.ini_options]
asyncio_mode = "auto"
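Any async def test_* function is collected and run automatically. Auto mode is intended for projects that use asyncio only; for trio or multi-backend anyio tests, use the anyio pytest plugin (@pytest.mark.anyio with an anyio_backend fixture) and keep pytest-asyncio in strict mode.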
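With auto mode enabled, an async test is just a coroutine function with a test_ name. A minimal sketch, assuming a hypothetical fetch_greeting coroutine as the code under test:

```python
import asyncio


async def fetch_greeting(name: str) -> str:
    """Hypothetical coroutine under test."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return f"hello, {name}"


# No @pytest.mark.asyncio decorator: auto mode collects plain `async def` tests.
async def test_fetch_greeting():
    assert await fetch_greeting("world") == "hello, world"


async def test_greetings_run_concurrently():
    results = await asyncio.gather(fetch_greeting("a"), fetch_greeting("b"))
    assert results == ["hello, a", "hello, b"]
```

Under strict mode the same tests would each need an explicit @pytest.mark.asyncio marker.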
For FastAPI, use httpx.AsyncClient with ASGITransport:
import pytest
from httpx import ASGITransport, AsyncClient

# `app` is your FastAPI application instance, imported from your own package.

@pytest.fixture
async def client():
    transport = ASGITransport(app=app)
    async with AsyncClient(transport=transport, base_url="http://test") as ac:
        yield ac

async def test_create_item(client):
    response = await client.post("/items/", json={"name": "Foo"})
    assert response.status_code == 201
Use Hypothesis for serialization round-trips, parsers, data transformations, and mathematical properties. It is used by Pydantic, attrs, CPython, and NumPy. It is not worth the overhead for simple CRUD or UI tests.
Register separate profiles for CI and local development in conftest.py:
import os

from hypothesis import HealthCheck, settings

settings.register_profile("ci", max_examples=1000, deadline=None,
                          suppress_health_check=[HealthCheck.too_slow])
settings.register_profile("dev", max_examples=50, deadline=400)
# Falls back to Hypothesis's built-in "default" profile when the variable is unset.
settings.load_profile(os.getenv("HYPOTHESIS_PROFILE", "default"))
Pin regression cases with @example() so they run on every invocation, not just when Hypothesis rediscovers them.
Add .hypothesis/ to .gitignore.
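A round-trip property with a pinned regression case might look like the sketch below. JSON stands in for whatever serializer your package provides, and the pinned input is an illustrative assumption, not a known bug:

```python
import json

from hypothesis import example, given, strategies as st


@given(st.dictionaries(st.text(), st.integers()))
@example({"": 0})  # pinned case: runs on every invocation, before generated inputs
def test_json_round_trip(payload):
    # Property: decoding an encoded payload returns the original value.
    assert json.loads(json.dumps(payload)) == payload
```

The number of generated inputs comes from whichever profile settings.load_profile selected, so the same test is quick locally and thorough in CI.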
| Mock | Do Not Mock |
|------|-------------|
| External HTTP APIs, databases in unit tests | Your own pure functions |
| Time/dates (time-machine), third-party services | Data structures, simple transformations |
| Environment variables (monkeypatch) | The thing you are testing |
Patch where the name is used, not where it is defined: mocker.patch("myapp.email.SMTP") (correct) vs mocker.patch("smtplib.SMTP") (wrong). Prefer dependency injection over mocking -- pass InMemoryDatabase() instead of patching PostgresDatabase.
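Dependency injection in practice can be as small as a constructor argument. In this sketch UserService and the save/load interface are hypothetical; only the InMemoryDatabase name comes from the text above:

```python
import pytest


class InMemoryDatabase:
    """Test double that implements the same small interface as the real database."""

    def __init__(self):
        self._rows = {}

    def save(self, key, value):
        self._rows[key] = value

    def load(self, key):
        return self._rows.get(key)


class UserService:
    """Hypothetical service: it receives its database instead of constructing one."""

    def __init__(self, database):
        self._database = database

    def register(self, username):
        if self._database.load(username) is not None:
            raise ValueError(f"user {username!r} already exists")
        self._database.save(username, {"name": username})


def test_register_rejects_duplicates():
    service = UserService(InMemoryDatabase())  # no patching needed
    service.register("alice")
    with pytest.raises(ValueError, match="already exists"):
        service.register("alice")
```

Because the test never patches a module path, it keeps working when UserService moves to another module or changes its imports.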
Run the full Python version matrix on Linux. Add macOS and Windows only if your package has platform-specific behavior; when they are needed, test only the oldest and newest Python versions there. See the ci-cd skill for the full GitHub Actions workflow and reusable workflow patterns.
strategy:
  fail-fast: false
  matrix:
    python-version: ["3.10", "3.11", "3.12", "3.13"]
    os: [ubuntu-latest]
    include:
      # Add these only if your package has platform-specific behavior
      - { python-version: "3.10", os: macos-latest }
      - { python-version: "3.13", os: macos-latest }
      - { python-version: "3.10", os: windows-latest }
      - { python-version: "3.13", os: windows-latest }
Test with uv sync --resolution lowest-direct to verify minimum dependency bounds are correct.
Combine the pytest and coverage sections from above into pyproject.toml:
[tool.pytest.ini_options]
testpaths = ["tests"]
xfail_strict = true
filterwarnings = ["error"]
addopts = ["--strict-markers", "--strict-config", "-ra"]
asyncio_mode = "auto"
[tool.coverage.run]
source_pkgs = ["my_library"]
branch = true
parallel = true
[tool.coverage.report]
show_missing = true
fail_under = 85
exclude_also = ["if TYPE_CHECKING:", "@overload", "raise NotImplementedError", "assert_never", "\\.\\.\\.",]
When reviewing tests and test configuration:
- xfail_strict = true is set -- unexpectedly passing xfail tests fail the build
- filterwarnings = ["error"] is set with only targeted, module-specific ignores
- --strict-markers and --strict-config are in addopts
- Custom markers are registered in the markers list
- pytest.raises always includes match="..." -- no bare exception catching
- Parametrized tests use ids for readable output and include expected values in parameters
- Coverage uses branch = true and fail_under is set (minimum 80)
- TYPE_CHECKING blocks and @overload are excluded from coverage
- Async tests use asyncio_mode = "auto" -- no manual decorators
- fail_under is treated as a ratchet -- raise it as coverage improves, never lower it
- Hypothesis profiles are registered for CI (max_examples=1000) and dev (max_examples=50)
- Regression cases are pinned with @example() decorators