skills/test-harness/SKILL.md
Generates pytest test suites with happy path, edge cases, error conditions, fixture scaffolding, mocks, async patterns. Triggers on: "generate tests", "write tests for", "test this function", "create test suite", "pytest for", "unit tests for", "mock strategy for".
npx skillsauth add mathews-tom/armory test-harness
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Systematic test suite generation that transforms source code into comprehensive, runnable pytest files. Analyzes function signatures, dependency graphs, and complexity hotspots to produce tests covering happy paths, boundary conditions, error states, and async flows — with properly scoped fixtures and focused mocks.
| File | Contents | Load When |
| -------------------------------- | ---------------------------------------------------------------------- | -------------------------------- |
| references/pytest-patterns.md | Fixture scopes, parametrize, marks, conftest layout, built-in fixtures | Always |
| references/mock-strategies.md | Mock decision tree, patch boundaries, assertions, anti-patterns | Target has external dependencies |
| references/async-testing.md | pytest-asyncio modes, event loop fixtures, async mocking | Target contains async code |
| references/fixture-design.md | Factory fixtures, yield teardown, scope selection, composition | Test requires non-trivial setup |
| references/coverage-targets.md | Threshold table, branch vs line, pytest-cov config, exclusion patterns | Coverage assessment requested |
mocker fixture as alternative to unittest.mock
Before writing a single test, build a model of the target code:
git diff --name-only HEAD~5
For each function under test, enumerate cases across four categories:
| Category | What to Test | Example |
| ---------- | ---------------------------------------------- | ------------------------------------------- |
| Happy path | Expected inputs produce expected outputs | add(2, 3) returns 5 |
| Boundary | Edge values at limits of valid input | Empty string, zero, max int, single element |
| Error | Invalid inputs trigger proper exceptions | None where str expected, negative index |
| State | State transitions produce correct side effects | Object moves from pending to active |
For each case, note:
Parametrize cases that share the same test logic but differ only in input/output values.
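
As a rough illustration of the categories above, the sketch below enumerates happy path, boundary, and error cases for a small function and parametrizes the ones that share test logic. `parse_port` is a hypothetical function invented for this example, not part of any real target code, and the state category is omitted because the function is stateless.

```python
import pytest


def parse_port(raw: str) -> int:
    """Hypothetical function under test: parse a TCP port from text."""
    if not isinstance(raw, str):
        raise TypeError("expected str")
    port = int(raw.strip())  # int() raises ValueError for non-numeric text
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port


# Happy path and boundary cases share one assertion, so they are parametrized.
@pytest.mark.parametrize(
    "raw, expected",
    [
        ("8080", 8080),    # happy path
        ("1", 1),          # boundary: lowest valid port
        ("65535", 65535),  # boundary: highest valid port
        (" 443 ", 443),    # boundary: surrounding whitespace
    ],
    ids=["typical", "min", "max", "padded"],
)
def test_parse_port_valid(raw, expected):
    assert parse_port(raw) == expected


# Error cases differ in exception type, so the type is a parameter as well.
@pytest.mark.parametrize(
    "raw, exc_type",
    [
        (None, TypeError),
        ("0", ValueError),
        ("65536", ValueError),
        ("http", ValueError),
    ],
    ids=["none", "below_range", "above_range", "non_numeric"],
)
def test_parse_port_invalid(raw, exc_type):
    with pytest.raises(exc_type):
        parse_port(raw)
```

The `ids` list gives each case a readable name in the pytest report, so a failure points at a specific case such as `above_range` rather than a positional index.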
Identify shared setup — If 3+ tests need the same object, extract a fixture.
Select scope — Use the narrowest scope that avoids redundant setup:
| Scope | Use When | Example |
| ---------- | ------------------------------------------ | ---------------------------- |
| function | Default. Each test gets fresh state | Most unit tests |
| class | Tests within a class share expensive setup | DB connection per test class |
| module | All tests in a file share setup | Loaded config file |
| session | Entire test run shares setup | Docker container startup |
Design teardown — Use yield fixtures when cleanup is needed. Never leave
side effects (temp files, DB rows, monkey-patches) after a test.
Identify conftest candidates — Fixtures used across multiple test files belong
in conftest.py. Fixtures used in one file stay in that file.
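
The fixture guidance above can be sketched roughly as follows. `load_settings` and both fixture names are hypothetical, chosen only for illustration; `tmp_path` is pytest's built-in temporary directory fixture. pytest already cleans up `tmp_path` directories, so the explicit teardown is there only to show the `yield` pattern.

```python
import json
from pathlib import Path

import pytest


def load_settings(path: Path) -> dict:
    """Hypothetical function under test: read a JSON settings file."""
    return json.loads(path.read_text())


@pytest.fixture
def settings_file(tmp_path):
    """Function scope (the default): every test gets a fresh file."""
    path = tmp_path / "settings.json"
    path.write_text(json.dumps({"debug": True, "retries": 3}))
    yield path  # the test body runs at this point
    # Teardown: runs after the test finishes, whether it passed or failed.
    path.unlink(missing_ok=True)


@pytest.fixture
def make_settings_file(tmp_path):
    """Factory fixture: tests call the returned function with custom values."""
    created = []

    def _make(**values) -> Path:
        path = tmp_path / f"settings_{len(created)}.json"
        path.write_text(json.dumps(values))
        created.append(path)
        return path

    yield _make
    # Teardown: remove every file the factory produced.
    for path in created:
        path.unlink(missing_ok=True)


def test_load_settings_reads_values(settings_file):
    assert load_settings(settings_file) == {"debug": True, "retries": 3}


def test_load_settings_empty_object(make_settings_file):
    assert load_settings(make_settings_file()) == {}
```

If several test files needed `settings_file`, moving it unchanged into `conftest.py` would make it available to all of them without an import.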
Decide what to mock — Mock external dependencies only:
datetime.now, time.sleep)
Decide what NOT to mock — Never mock:
Choose mock level — Patch at the import boundary of the module under test,
not at the definition site. @patch('mymodule.requests.get'), not
@patch('requests.get').
Add mock assertions — Every mock should assert it was called with expected arguments and the expected number of times. Mocks without assertions are coverage holes.
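
A minimal sketch of mock assertions on an external dependency follows, assuming a hypothetical `retry` helper that calls `time.sleep`. To stay in a single file it uses `patch.object(time, "sleep")`; in a package where a module named `mymodule` does `import time`, the string form `patch("mymodule.time.sleep")` would be the counterpart described above.

```python
import time
from unittest.mock import Mock, call, patch

import pytest


def retry(operation, attempts=3, delay=0.5):
    """Hypothetical function under test: retry on ConnectionError."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts:
                raise
            time.sleep(delay)


def test_retry_sleeps_between_failed_attempts():
    # The operation fails twice, then succeeds on the third call.
    operation = Mock(side_effect=[ConnectionError(), ConnectionError(), "ok"])

    # Single-file stand-in for patch("mymodule.time.sleep").
    with patch.object(time, "sleep") as mock_sleep:
        result = retry(operation, attempts=3, delay=0.5)

    assert result == "ok"
    assert operation.call_count == 3
    # Assert both the arguments and the number of calls.
    assert mock_sleep.call_args_list == [call(0.5), call(0.5)]


def test_retry_reraises_after_last_attempt():
    operation = Mock(side_effect=ConnectionError("down"))

    with patch.object(time, "sleep") as mock_sleep:
        with pytest.raises(ConnectionError, match="down"):
            retry(operation, attempts=2, delay=1)

    # One failed attempt is followed by one sleep; the second failure re-raises.
    mock_sleep.assert_called_once_with(1)
```

Without the assertions on `mock_sleep`, both tests would still pass if `retry` stopped sleeping altogether, which is the coverage hole the guidance warns about.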
Generate the test file following this structure:
```python
# tests/test_{module}.py
import pytest
from unittest.mock import Mock, patch, MagicMock

from {module} import {target_function, TargetClass}

# ============================================================
# Fixtures
# ============================================================


@pytest.fixture
def valid_input():
    """Standard valid input for happy path tests."""
    return {concrete values}


@pytest.fixture
def mock_database():
    """Mock database connection."""
    with patch("{module}.db_connection") as mock_db:
        mock_db.query.return_value = [{expected data}]
        yield mock_db


# ============================================================
# {target_function} Tests
# ============================================================


class TestTargetFunction:
    """Tests for {target_function}."""

    def test_happy_path(self, valid_input):
        """Returns expected result for valid input."""
        result = target_function(valid_input)
        assert result == {expected}

    @pytest.mark.parametrize(
        "input_val, expected",
        [
            ({boundary_1}, {expected_1}),
            ({boundary_2}, {expected_2}),
            ({boundary_3}, {expected_3}),
        ],
        ids=["empty", "single", "maximum"],
    )
    def test_boundary_conditions(self, input_val, expected):
        """Handles boundary inputs correctly."""
        assert target_function(input_val) == expected

    def test_invalid_input_raises(self):
        """Raises TypeError for invalid input."""
        with pytest.raises(TypeError, match="expected str"):
            target_function(None)

    def test_external_call(self, mock_database):
        """Calls database with correct query."""
        target_function("lookup_key")
        mock_database.query.assert_called_once_with(
            "SELECT * FROM t WHERE key = %s", ("lookup_key",)
        )
```
| Mode | Scope | Depth | When to Use |
| --------------- | ----------------- | -------------------------------------------- | ------------------------------------ |
| quick | Single function | Happy path + 1 error case | Rapid iteration, TDD red-green cycle |
| standard | File or class | Happy + boundary + error + mocks | Default for most requests |
| comprehensive | Module or package | All categories + async + parametrized matrix | Pre-release, critical path code |
Use concrete test values: a realistic email address, not "test_email"; 42, not "some_number". Concrete values catch type mismatches that abstract placeholders mask.
Parametrize with @pytest.mark.parametrize. Use ids for readable test names.
If the existing tests use conftest.py fixtures, class-based tests, or specific markers, follow those patterns. Do not introduce a conflicting test style.
| Problem | Resolution |
| --------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------- |
| Target function has no type hints | Infer types from usage patterns, default values, and docstrings. Note uncertainty in test docstring. |
| Target has deeply nested dependencies | Mock at the nearest boundary to the function under test. Do not mock transitive dependencies individually. |
| No existing test infrastructure (no conftest, no pytest config) | Generate a minimal conftest.py alongside the test file. Note the addition in output. |
| Target code is untestable (global state, hidden dependencies) | Flag the design issue in the output. Generate tests for what is testable. Suggest refactoring to improve testability. |
| Async code detected but pytest-asyncio not installed | Note the dependency requirement. Generate async test stubs with @pytest.mark.asyncio and instruct user to install. |
| Target module cannot be imported | Report the import error. Do not generate tests for unimportable code. |
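
For the async row above, a generated stub might look roughly like the sketch below. `fetch_display_name` and the `client.get_user` coroutine are hypothetical names used only for illustration. The `@pytest.mark.asyncio` marker takes effect only once pytest-asyncio is installed; `AsyncMock` comes from the standard library (Python 3.8+).

```python
import asyncio
from unittest.mock import AsyncMock

import pytest


async def fetch_display_name(client, user_id: int) -> str:
    """Hypothetical coroutine under test; client.get_user is an assumed async API."""
    user = await client.get_user(user_id)
    return user["name"].title()


@pytest.mark.asyncio  # requires the pytest-asyncio plugin when run under pytest
async def test_fetch_display_name_awaits_client():
    # Attributes of an AsyncMock are themselves awaitable mocks.
    client = AsyncMock()
    client.get_user.return_value = {"name": "ada lovelace"}

    result = await fetch_display_name(client, 42)

    assert result == "Ada Lovelace"
    # Verifies the coroutine was awaited exactly once with the expected argument.
    client.get_user.assert_awaited_once_with(42)
```

Until the plugin is installed, the coroutine can still be exercised directly with `asyncio.run(test_fetch_display_name_awaits_client())`.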
Push back if:
| Rationalization | Reality |
|---|---|
| "Manual testing is sufficient" | Manual testing doesn't run in CI, doesn't catch regressions, and doesn't scale with the codebase |
| "This code is too simple to test" | Simple code becomes complex code; tests document expected behavior and catch regressions from future changes |
| "I'll add tests later" | Tests are specifications; without them, code behavior is undefined and later never comes |
| "Mocking everything makes the test fast" | Over-mocked tests pass when the real system fails; mock at boundaries, not deep in the call chain |
| "100% coverage means the code is correct" | Coverage measures execution, not correctness; a test that runs code without meaningful assertions adds no value |
| "The happy path test is enough" | Edge cases and error paths cause most production incidents; happy-path-only testing is false confidence |
test_<unit>_<scenario>_<expected_outcome>
pytest / npm test exits 0 with output captured