hunter-party-py/test-hunter-py/SKILL.md
Audit Python test code for quality gaps — missing coverage on critical paths, brittle tests coupled to implementation, over-mocking, assertion-free tests, missing edge cases, and duplicated test setup. Focuses on test effectiveness, not production code structure. Use when: reviewing Python test suites for reliability, reducing false-positive test failures, improving coverage of critical business logic, or cleaning up test debt.
npx skillsauth add skyosev/agent-skills test-hunter-py

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Audit test code for quality and coverage gaps — tests that verify implementation details instead of behavior, mocking so aggressively that nothing real is tested, critical paths with no coverage, and duplicated setup that makes the suite fragile. The goal: tests catch real bugs, survive refactors, and cover the paths that matter most.
Test behavior, not implementation. A test should verify what the code does, not how it does it. If renaming an internal function or reordering private methods breaks a test, the test is coupled to implementation — it will fail on safe refactors and pass on real bugs.
Coverage is not confidence. 100% line coverage with shallow assertions is worse than 70% coverage with meaningful assertions on critical paths. Prioritize coverage of business logic, error handling, and edge cases over hitting every line.
Mocks are costly. Every mock is a place where the test diverges from reality. Mock external boundaries (network, filesystem, clock), not internal modules. Internal seam mocking can be legitimate (e.g., isolating a slow subsystem, testing error paths that are hard to trigger), but each mock is a maintenance cost and a potential false-confidence point. If a unit test requires 5+ mocks to run, the unit under test has too many dependencies (solid-hunter territory) or the test is too isolated.
Tests are code. They deserve the same quality standards as production code — no duplication, clear naming, and readable assertions. A test that requires reading the source to understand what it verifies is a bad test.
Edge cases are where bugs hide. Empty input, None, zero, negative numbers, maximum values, Unicode, concurrent
access, timeout — these are the inputs developers forget and users provide.
Business logic, error handling, and security-sensitive code with no test coverage.
Signals:
- `coverage.py` reports when available; fall back to searching for test file imports as a heuristic, but note this is unreliable with pytest's discovery model and black-box integration tests)
- `except` blocks, error returns) with no test exercising them

Action: Flag the critical path and its risk level. Recommend specific test cases for the highest-risk gaps.
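As an illustration of closing an error-path gap, the sketch below exercises an `except` branch by making a boundary call fail. `charge_customer`, `PaymentError`, and the gateway object are hypothetical names invented for this example, not part of any real project or library.

```python
from unittest.mock import Mock


class PaymentError(Exception):
    """Hypothetical domain error raised by a (hypothetical) payment gateway."""


def charge_customer(gateway, customer_id, amount_cents):
    """Hypothetical unit under test: reports failures in the result dict."""
    if amount_cents <= 0:
        return {"ok": False, "error": "invalid amount"}
    try:
        receipt = gateway.charge(customer_id, amount_cents)
    except PaymentError as exc:
        # The kind of branch that often has no test exercising it.
        return {"ok": False, "error": str(exc)}
    return {"ok": True, "receipt": receipt}


def test_charge_reports_gateway_failure():
    gateway = Mock()
    gateway.charge.side_effect = PaymentError("card declined")

    result = charge_customer(gateway, "cust-1", 500)

    assert result == {"ok": False, "error": "card declined"}


def test_charge_rejects_non_positive_amount_without_calling_gateway():
    gateway = Mock()

    result = charge_customer(gateway, "cust-1", 0)

    assert result == {"ok": False, "error": "invalid amount"}
    gateway.charge.assert_not_called()
```

Only the gateway, an external boundary, is replaced; the error handling itself runs for real.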
Tests that break on safe refactors because they depend on internal details.
Signals:
- `mock.assert_called_once()`) where the call count is an implementation detail, not a business requirement. (Call-count assertions on boundary interactions, e.g., verifying an email was sent exactly once or a payment API was charged once, may be legitimate when the count is business-significant.)
- `_` prefixed)

Action: Rewrite to assert on observable outputs: return values, side effects at boundaries (HTTP responses, DB state, emitted events), or user-visible behavior.
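The contrast can be sketched as follows. `OrderService` and `InMemoryRepo` are hypothetical; the in-memory repository stands in for whatever persistence boundary the real code uses.

```python
from unittest.mock import Mock


class InMemoryRepo:
    """Hypothetical fake for a persistence boundary."""

    def __init__(self):
        self.saved = {}

    def save(self, order):
        self.saved[order["id"]] = order


class OrderService:
    """Hypothetical unit under test."""

    def __init__(self, repo):
        self._repo = repo

    def process(self, order):
        processed = {**order, "total": self._total(order["items"]), "status": "processed"}
        self._repo.save(processed)
        return processed

    def _total(self, items):
        return sum(item["price"] * item["qty"] for item in items)


def test_process_order_brittle():
    # Coupled to implementation: pins a private helper and the number of
    # save() calls, so a safe refactor (e.g. inlining _total) breaks it.
    repo = Mock()
    service = OrderService(repo)
    order = {"id": 1, "items": [{"price": 5, "qty": 2}]}
    assert service._total(order["items"]) == 10
    service.process(order)
    assert repo.save.call_count == 1


def test_process_order_behavioral():
    # Asserts on the returned value and on the state left at the boundary.
    repo = InMemoryRepo()
    service = OrderService(repo)

    result = service.process({"id": 1, "items": [{"price": 5, "qty": 2}]})

    assert result["status"] == "processed"
    assert result["total"] == 10
    assert repo.saved[1] == result
```

Both tests pass today; only the second keeps passing if `_total` is renamed or the service batches its writes differently.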
Tests where so much is mocked that the test verifies the mock wiring, not the actual logic.
Signals:
- `patch()`/`Mock()` setup lines than assertion lines
- `@patch` decorators stacked 3+ deep on a single test function

Action: Remove mocks on internal modules. Mock only external boundaries (network, filesystem, clock, randomness). If the unit requires many mocks, consider testing at a higher integration level.
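A minimal sketch of the difference, assuming a hypothetical `Greeter` class whose only external boundary is an injected `http_get` callable:

```python
from unittest.mock import Mock, patch


class Greeter:
    """Hypothetical unit: fetches a user over HTTP and formats a greeting."""

    def __init__(self, http_get):
        self._http_get = http_get  # external boundary, injected

    def _normalize(self, name):
        return name.strip().title()

    def greet(self, user_id):
        payload = self._http_get(f"/users/{user_id}")
        return f"Hello, {self._normalize(payload['name'])}!"


def test_greet_over_mocked():
    # The internal helper is mocked too, so this only verifies mock wiring:
    # it still passes if _normalize is completely broken.
    http_get = Mock(return_value={"name": "  ada lovelace "})
    greeter = Greeter(http_get)
    with patch.object(Greeter, "_normalize", return_value="X") as normalize:
        assert greeter.greet(7) == "Hello, X!"
    normalize.assert_called_once()


def test_greet_mocks_only_the_boundary():
    # Only the network call is replaced; the real formatting logic runs.
    http_get = Mock(return_value={"name": "  ada lovelace "})

    assert Greeter(http_get).greet(7) == "Hello, Ada Lovelace!"
```

The second test has fewer setup lines than the first and is the only one that would catch a bug in the name formatting.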
Tests that execute code but never verify meaningful outcomes.
Signals:
- `assert` statement or `pytest.raises()` call
- `assert result is not None` or `assert not raises`
- `assert result`) when a specific value is expected
- `try`/`except` in test that swallows assertion errors

Action: Add specific assertions on return values, state changes, or side effects. Every test should answer "what would break if this code had a bug?"
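For illustration, here is a weak assertion next to specific ones. `create_user` is a hypothetical function under test; the last test is written in plain Python so the sketch runs without pytest, and in a pytest suite it would normally use `pytest.raises(ValueError)`.

```python
def create_user(name, email):
    """Hypothetical function under test."""
    if "@" not in email:
        raise ValueError(f"invalid email: {email!r}")
    return {"name": name, "email": email.lower(), "active": True}


def test_create_user_weak():
    # Passes for almost any implementation, including ones that drop the
    # name or forget to normalize the email.
    result = create_user("Ada", "Ada@Example.com")
    assert result is not None


def test_create_user_specific():
    result = create_user("Ada", "Ada@Example.com")
    assert result == {"name": "Ada", "email": "ada@example.com", "active": True}


def test_create_user_rejects_invalid_email():
    try:
        create_user("Ada", "not-an-email")
    except ValueError as exc:
        assert "not-an-email" in str(exc)
    else:
        # Without this branch the test would pass when nothing is raised.
        raise AssertionError("expected ValueError")
```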
Tests that cover the happy path but miss boundary conditions, error cases, and unusual inputs.
Signals:
- `None`, zero, negative numbers, very large values

Action: Flag the specific edge cases missing for each function/module. Prioritize by risk: error paths and boundary conditions in core logic first.
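One way to cover such inputs compactly is a parametrized boundary table. The sketch below assumes pytest is installed; `clamp_percentage` is a hypothetical function under test.

```python
import pytest


def clamp_percentage(value):
    """Hypothetical function under test: clamp a number into [0, 100]."""
    if value is None:
        raise TypeError("value must be a number, not None")
    return max(0, min(100, value))


@pytest.mark.parametrize(
    ("value", "expected"),
    [
        (0, 0),         # lower boundary
        (100, 100),     # upper boundary
        (-1, 0),        # just below the range
        (101, 100),     # just above the range
        (10**12, 100),  # very large value
        (0.5, 0.5),     # non-integer input
    ],
)
def test_clamp_percentage_boundaries(value, expected):
    assert clamp_percentage(value) == expected


def test_clamp_percentage_rejects_none():
    with pytest.raises(TypeError):
        clamp_percentage(None)
```

Each row is reported as a separate test case, so a failure names the exact boundary that broke.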
Repeated setup, shared fixtures, and copied test blocks that make the suite fragile and hard to maintain.
Signals:
- `@pytest.fixture` definitions across multiple test files
- `conftest.py` files over 200 lines
- `@patch` decorators across many test functions in same file

Action: Extract shared setup into focused fixtures or factory functions in `conftest.py`. Parameterize repeated test cases with `@pytest.mark.parametrize`. Eliminate shared mutable state between tests.
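A minimal sketch of the factory-plus-fixture pattern, assuming pytest is installed. `make_user` and `can_log_in` are hypothetical; in a real suite the factory and fixture would typically live in `conftest.py` rather than next to the tests.

```python
import pytest


def make_user(**overrides):
    """Hypothetical factory: sensible defaults, overridden per test."""
    user = {"name": "Ada", "email": "ada@example.com", "active": True}
    user.update(overrides)
    return user


def can_log_in(user):
    """Hypothetical function under test."""
    return user["active"] and "@" in user["email"]


@pytest.fixture
def user():
    # Built fresh for every test, so no mutable state is shared between tests.
    return make_user()


def test_default_user_can_log_in(user):
    assert can_log_in(user) is True


@pytest.mark.parametrize(
    "overrides",
    [{"active": False}, {"email": "no-at-sign"}],
)
def test_invalid_users_cannot_log_in(overrides):
    assert can_log_in(make_user(**overrides)) is False
```

Each test states only the attribute it cares about, so a change to the default user shape is made in one place.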
Patterns that cause tests to pass or fail non-deterministically.
Signals:
- `time.sleep()` / `asyncio.sleep()` in tests (timing-dependent)
- `datetime.now()` or random values without seeding/freezing

Action: Replace timing waits with polling/event-based assertions. Freeze time with `freezegun` or `unittest.mock.patch`. Isolate tests from shared state and external dependencies. Use `tmp_path` fixture for file operations.
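As a sketch of two of these fixes, the example below pins the clock with `unittest.mock.patch` instead of sleeping, and writes into the directory pytest supplies through its built-in `tmp_path` fixture. `is_expired` and `write_report` are hypothetical functions under test.

```python
import time
from unittest.mock import patch


def is_expired(issued_at, ttl_seconds):
    """Hypothetical function under test; reads the wall clock via time.time()."""
    return time.time() - issued_at >= ttl_seconds


def write_report(path, lines):
    """Hypothetical function under test; `path` is a pathlib.Path."""
    path.write_text("\n".join(lines), encoding="utf-8")


def test_token_expires_after_ttl():
    # Pin the clock instead of sleeping: instant and deterministic.
    with patch("time.time", return_value=1_000.0):
        assert is_expired(issued_at=900.0, ttl_seconds=100) is True
        assert is_expired(issued_at=900.5, ttl_seconds=100) is False


def test_write_report(tmp_path):
    # tmp_path is a fresh per-test directory, so runs cannot collide.
    target = tmp_path / "report.txt"
    write_report(target, ["a", "b"])
    assert target.read_text(encoding="utf-8") == "a\nb"
```

Patching `time.time` works here because `is_expired` looks the function up on the `time` module at call time; code that does `from time import time` would need the patch applied where that name is used.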
Patterns specific to async Python testing that cause silent skips, resource leaks, or false passes.
Signals:
- `async def test_*` without `@pytest.mark.asyncio` decorator (the coroutine is never awaited, so the body does not run; depending on the pytest version the test is reported as passed, skipped with a warning, or failed). Only applies when `asyncio_mode` is not set to `"auto"` in the pytest configuration (e.g. `pyproject.toml`)
- `async def` in `@pytest.fixture`) without `@pytest_asyncio.fixture` (fixture returns a coroutine object instead of the resolved value)
- `asyncio.run()` called inside a test function that already runs in an event loop (nested event loop error, or masked by `nest_asyncio`)
- `session = AsyncSession()` yielded but not closed/rolled-back in the `finally`/`yield` teardown
- `pytest-asyncio` mode configuration causing inconsistent behavior between `strict` and `auto` modes across the test suite
- `aiohttp.ClientSession()` or `httpx.AsyncClient()` created in test body without `async with` (resource leak)
- `MagicMock`) used where `AsyncMock` is needed; the mock returns a `MagicMock` object instead of an awaitable

Action: Use `@pytest.mark.asyncio` or configure `asyncio_mode = "auto"`. Use `@pytest_asyncio.fixture` for async fixtures. Wrap async resources in `async with` in test bodies and fixture teardown. Use `AsyncMock` for async interfaces. Verify tests actually execute by adding a deliberate failure and confirming it's caught.
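The `MagicMock` versus `AsyncMock` difference can be demonstrated with the standard library alone. In this sketch the tests are ordinary synchronous functions that drive the coroutine with `asyncio.run()`, so it runs without pytest-asyncio; in a pytest-asyncio suite they would be `async def` tests instead. `load_profile` and `client.get_json` are hypothetical.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


async def load_profile(client, user_id):
    """Hypothetical coroutine under test; client.get_json is an assumed async API."""
    data = await client.get_json(f"/users/{user_id}")
    return data["name"].upper()


def test_magicmock_is_not_awaitable():
    # Calling a plain MagicMock returns another MagicMock, which cannot be awaited.
    client = MagicMock()
    try:
        asyncio.run(load_profile(client, 1))
    except TypeError:
        pass
    else:
        raise AssertionError("expected TypeError from awaiting a MagicMock")


def test_asyncmock_supports_await():
    client = MagicMock()
    client.get_json = AsyncMock(return_value={"name": "ada"})

    result = asyncio.run(load_profile(client, 1))

    assert result == "ADA"
    client.get_json.assert_awaited_once_with("/users/1")
```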
main/master)

```bash
# Resolve the default branch; fall back to "main" when origin/HEAD is not set
BASE=$(git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's@^refs/remotes/origin/@@')
BASE=${BASE:-main}
SCOPE=$(git diff --name-only "$(git merge-base HEAD "$BASE")"...HEAD)
```

Constrain all subsequent scans to the resolved surface.

`test_*.py`, `*_test.py`, `tests/` directories).

```bash
# Test files
rg -l '(def test_|class Test)' --type py --glob '**/*test*'
# Assertion-free tests (lists every test signature; check each body for assert/raises manually)
rg -U 'def test_\w+\s*\([^)]*\)\s*:' --type py --glob '**/*test*'
# Mock density (count patch/Mock calls per file)
rg -c '(@patch|Mock\(|MagicMock\(|patch\()' --type py --glob '**/*test*' --sort path
# Stack of patches (-U enables multiline matching so \n can match)
rg -U --pcre2 '(@patch.*\n){3,}' --type py --glob '**/*test*'
# Timing-dependent patterns
rg 'time\.sleep|asyncio\.sleep' --type py --glob '**/*test*'
# Shared mutable state in tests
rg '(global\s|module-level.*=)' --type py --glob '**/*test*'
# Coverage gaps: source files without corresponding test files
# (compare src file list to test file list; project-specific)
# Weak assertions
rg 'assert\s+\w+\s+is not None$|assert\s+result$|assert\s+True$' --type py --glob '**/*test*'
# Async test antipatterns
rg 'async def test_' --type py --glob '**/*test*' # async test functions
rg '@pytest\.mark\.asyncio' --type py --glob '**/*test*' # marked tests
rg 'asyncio\.run\(' --type py --glob '**/*test*' # nested event loop risk
rg 'MagicMock\(' --type py --glob '**/*test*' # may need AsyncMock
rg 'asyncio_mode' --glob '**/pyproject.toml' --glob '**/pytest.ini' --glob '**/conftest.py' # mode configuration
```
For each test file:
Save as YYYY-MM-DD-test-hunter-audit-{$LLM-name}.md in the project's docs folder (or project root if no docs folder
exists).
# Test Hunter Audit — {date}
## Scope
- Surface: {diff / path / codebase}
- Files: {count or list}
- Test framework: {pytest / unittest / nose2 / etc.}
- Exclusions: {list}
## Coverage Gaps
| # | Module | Location | Risk | Test File | Action |
| - | ------ | -------- | ---- | --------- | ------ |
| 1 | PaymentService | file:line | High | None | Add tests for charge, refund, and failure paths |
## Findings
### Brittle Tests
| # | Test | Location | Coupling | Action |
| - | ---- | -------- | -------- | ------ |
| 1 | `test_process_order` | file:line | Asserts `save` called 3 times | Assert on returned order state |
### Over-Mocking
| # | Test File | Location | Mocks | Assertions | Action |
| - | --------- | -------- | ----- | ---------- | ------ |
| 1 | test_user.py | file:line | 8 @patch | 2 assertions | Remove internal mocks, test at integration level |
### Assertion-Free / Weak Assertions
| # | Test | Location | Issue | Action |
| - | ---- | -------- | ----- | ------ |
| 1 | `test_create_user` | file:line | Only checks `is not None` | Assert on specific user fields |
### Missing Edge Cases
| # | Module | Location | Missing Cases | Action |
| - | ------ | -------- | ------------- | ------ |
| 1 | `validate_email()` | file:line | Empty string, Unicode, max length | Add boundary tests |
### Test Duplication
| # | Pattern | Locations | Action |
| - | ------- | --------- | ------ |
| 1 | Identical user factory in 5 files | file:line, file:line, ... | Extract shared fixture to conftest.py |
### Flaky Test Indicators
| # | Test | Location | Pattern | Action |
| - | ---- | -------- | ------- | ------ |
| 1 | `test_timeout` | file:line | `time.sleep(1)` | Use freezegun or event-based assertion |
### Async Test Antipatterns
| # | Test | Location | Pattern | Action |
| - | ---- | -------- | ------- | ------ |
| 1 | `test_fetch_data` | file:line | Missing `@pytest.mark.asyncio` | Add decorator or configure `asyncio_mode = "auto"` |
## Recommendations (Priority Order)
1. **Must-fix**: {missing coverage on critical paths, assertion-free tests, flaky indicators}
2. **Should-fix**: {brittle tests, over-mocking, missing edge cases}
3. **Consider**: {test duplication, weak assertions on non-critical code, async test cleanup}
`file/path.py:line` with the exact test code.