areas/software/qa/skills/test-pyramid/SKILL.md
Decide what type of test to write, structure the suite, measure health, and apply test doubles correctly.
Expertise: Test type selection, suite health, test doubles, coverage strategy, CI integration.
Is this a user-visible multi-step workflow (login → action → confirmation)?
→ E2E test (Playwright/Cypress/Detox)
Does the code call external systems (DB, API, queue, file system)?
→ Integration test (real or containerized dependency)
Is this pure business logic, calculation, data transformation, conditional?
→ Unit test (fast, isolated, no I/O)
Is this a contract between two services?
→ Contract test (Pact or schema validation)
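The decision tree above can be sketched as a small function. This is only an illustration: the `Change` dataclass and its flag names are hypothetical, and it assumes the questions are asked top to bottom with the first match winning, which the tree does not state explicitly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Change:
    """Hypothetical description of the code you are about to test."""
    is_user_workflow: bool = False       # user-visible multi-step workflow
    calls_external_system: bool = False  # DB, API, queue, file system
    is_pure_logic: bool = False          # calculation, transformation, conditional
    is_service_contract: bool = False    # contract between two services


def choose_test_type(change: Change) -> Optional[str]:
    # Assumption: questions are evaluated in the order the tree lists them.
    if change.is_user_workflow:
        return "e2e"
    if change.calls_external_system:
        return "integration"
    if change.is_pure_logic:
        return "unit"
    if change.is_service_contract:
        return "contract"
    return None  # no rule matched; decide manually
```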
| Layer | Target % | When it runs | Max duration |
|---|---|---|---|
| Unit | 70% | Every commit | < 2 min |
| Integration | 20% | Every PR | < 5 min |
| E2E | 10% | Pre-release | < 20 min |
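One way to check a suite against these targets is to compare each layer's share of the test count with the table. The sketch below assumes "Target %" means share of test count (the table does not say), and the `drift` helper and its 10-point tolerance are hypothetical.

```python
# Targets taken from the table above.
TARGET_SHARE = {"unit": 0.70, "integration": 0.20, "e2e": 0.10}


def layer_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each layer's fraction of the total number of tests."""
    total = sum(counts.get(layer, 0) for layer in TARGET_SHARE)
    if total == 0:
        raise ValueError("suite has no tests")
    return {layer: counts.get(layer, 0) / total for layer in TARGET_SHARE}


def drift(counts: dict[str, int], tolerance: float = 0.10) -> dict[str, float]:
    """Return layers whose share is off target by more than `tolerance`.

    The tolerance is an assumption for illustration, not a value from the table.
    """
    shares = layer_shares(counts)
    return {
        layer: round(shares[layer] - target, 2)
        for layer, target in TARGET_SHARE.items()
        if abs(shares[layer] - target) > tolerance
    }
```

A suite with 40 unit, 20 integration, and 40 E2E tests would be flagged as light on unit tests and heavy on E2E tests.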
Suite health signals to act on:
Situation → Double
──────────────────────────────────────────────────────────
Verify a function WAS called → Mock
Control what a dependency returns → Stub
Need working but simplified implementation → Fake (in-memory DB)
Observe calls without replacing behavior → Spy
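The four doubles in the table can be illustrated with the standard library's `unittest.mock`. This is a minimal sketch: `Mailer`, `InMemoryMailer`, and `notify` are hypothetical names, and `Mock` serves as both mock and stub here because the difference lies in how the test uses it.

```python
from unittest.mock import Mock


class Mailer:
    """Hypothetical collaborator that we own."""

    def send(self, to: str, body: str) -> bool:
        raise RuntimeError("would send a real email")


def notify(mailer, user: str) -> bool:
    """Hypothetical code under test."""
    return mailer.send(user, "Your order shipped")


# Mock: verify that a function WAS called.
mock_mailer = Mock(spec=Mailer)
notify(mock_mailer, "a@example.com")
mock_mailer.send.assert_called_once_with("a@example.com", "Your order shipped")

# Stub: control what the dependency returns.
stub_mailer = Mock(spec=Mailer)
stub_mailer.send.return_value = False
stub_result = notify(stub_mailer, "a@example.com")
assert stub_result is False


# Fake: a working but simplified implementation.
class InMemoryMailer:
    def __init__(self) -> None:
        self.outbox = []

    def send(self, to: str, body: str) -> bool:
        self.outbox.append((to, body))
        return True


fake = InMemoryMailer()
notify(fake, "a@example.com")
assert fake.outbox == [("a@example.com", "Your order shipped")]

# Spy: observe calls while the wrapped object still does the work.
real = InMemoryMailer()
spy = Mock(wraps=real)
spy_result = notify(spy, "b@example.com")
spy.send.assert_called_once()
assert spy_result is True and len(real.outbox) == 1
```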
Golden rule: Never mock what you don't own. Wrap third-party libraries in your own adapter → mock the adapter.
# ❌ Mocking requests directly
from unittest.mock import patch

with patch("requests.get") as mock:
    mock.return_value.json.return_value = {"status": "ok"}

# ✅ Mock your own wrapper
class HttpClient:
    async def get(self, url: str) -> dict: ...

class FakeHttpClient:
    async def get(self, url: str) -> dict:
        return {"status": "ok"}

service = MyService(http_client=FakeHttpClient())
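To show how such a fake might be exercised end to end, here is a self-contained sketch. The source only shows the `MyService` constructor call, so the `is_healthy` method, the `HEALTH_URL` constant, and the canned-response fake are assumptions made for illustration.

```python
import asyncio
from typing import Protocol

HEALTH_URL = "https://example.invalid/health"  # hypothetical endpoint


class HttpClient(Protocol):
    """The adapter interface we own; production code wraps the real library."""

    async def get(self, url: str) -> dict: ...


class FakeHttpClient:
    """Returns canned responses and records which URLs were requested."""

    def __init__(self, responses: dict[str, dict]) -> None:
        self._responses = responses
        self.requested: list[str] = []

    async def get(self, url: str) -> dict:
        self.requested.append(url)
        return self._responses[url]


class MyService:
    """Hypothetical service that depends only on the adapter interface."""

    def __init__(self, http_client: HttpClient) -> None:
        self._http = http_client

    async def is_healthy(self) -> bool:
        payload = await self._http.get(HEALTH_URL)
        return payload.get("status") == "ok"


def test_is_healthy_when_upstream_reports_ok():
    client = FakeHttpClient({HEALTH_URL: {"status": "ok"}})
    service = MyService(http_client=client)
    assert asyncio.run(service.is_healthy()) is True
    assert client.requested == [HEALTH_URL]
```

Because the test never touches the third-party library, swapping that library later changes only the production adapter, not the tests.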
Coverage is a floor, not a ceiling. Priority:
import pytest

# ❌ Coverage inflation: tests nothing meaningful
def test_order_fields_exist():
    order = Order(id=1, status="pending")
    assert order.id == 1  # tests Python, not your logic

# ✅ Tests behavior and business rules
def test_order_cannot_be_cancelled_if_already_shipped():
    order = Order(id=1, status="shipped")
    with pytest.raises(OrderStateError, match="Cannot cancel shipped order"):
        order.cancel()
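For context, a minimal `Order` that would satisfy the behavior test above might look like the following. The class body is an assumption, since the source shows only the tests; the error message is copied from the `match=` argument.

```python
class OrderStateError(Exception):
    """Raised when an order cannot move to the requested state."""


class Order:
    """Hypothetical model; only the fields used by the tests are included."""

    def __init__(self, id: int, status: str) -> None:
        self.id = id
        self.status = status

    def cancel(self) -> None:
        # Business rule under test: shipped orders cannot be cancelled.
        if self.status == "shipped":
            raise OrderStateError("Cannot cancel shipped order")
        self.status = "cancelled"


def test_pending_order_can_be_cancelled():
    # Covers the allowed transition, complementing the rejection test.
    order = Order(id=1, status="pending")
    order.cancel()
    assert order.status == "cancelled"
```

Deleting the `if` branch would make the business-rule test fail, while the field-existence test would keep passing, which is why it adds coverage but no protection.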
from decimal import Decimal

import pytest

# Naming: test_<when>_<expected_outcome>
def test_create_order_with_invalid_product_id_raises_not_found(): ...
def test_apply_discount_when_code_expired_returns_zero(): ...

# Structure: Arrange / Act / Assert
def test_order_total_includes_tax():
    # Arrange
    order = Order(items=[OrderItem(price=Decimal("100.00"), quantity=2)])
    # Act
    total = order.calculate_total(tax_rate=Decimal("0.20"))
    # Assert
    assert total == Decimal("240.00")  # (100.00 * 2) * 1.20

# Parametrize for multiple inputs
@pytest.mark.parametrize("quantity,expected_error", [
    (0, "must be greater than 0"),
    (-1, "must be greater than 0"),
    (1001, "exceeds maximum"),
])
def test_order_item_quantity_validation(quantity, expected_error):
    with pytest.raises(ValidationError, match=expected_error):
        OrderItem(product_id="prod_1", quantity=quantity)
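The tests above reference `Order`, `OrderItem`, and `ValidationError` without defining them. A minimal sketch of models that would make them pass is shown below; the field defaults, the exact error messages, and the limit of 1000 (inferred from the `1001` case) are all assumptions.

```python
from dataclasses import dataclass
from decimal import Decimal

MAX_QUANTITY = 1000  # assumption: inferred from the 1001 test case


class ValidationError(ValueError):
    """Raised when an order item is constructed with invalid data."""


@dataclass(frozen=True)
class OrderItem:
    # Defaults are hypothetical so both constructor calls in the tests work.
    product_id: str = ""
    quantity: int = 1
    price: Decimal = Decimal("0.00")

    def __post_init__(self) -> None:
        if self.quantity <= 0:
            raise ValidationError("quantity must be greater than 0")
        if self.quantity > MAX_QUANTITY:
            raise ValidationError(f"quantity exceeds maximum of {MAX_QUANTITY}")


@dataclass
class Order:
    items: list[OrderItem]

    def calculate_total(self, tax_rate: Decimal) -> Decimal:
        # Decimal avoids binary floating-point rounding in money math.
        subtotal = sum(
            (item.price * item.quantity for item in self.items), Decimal("0")
        )
        return subtotal * (Decimal("1") + tax_rate)
```

Validating in `__post_init__` means an invalid `OrderItem` can never exist, so the parametrized test only needs to attempt construction.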
make test (unit + integration) completes in < 5 min
Avoid time.sleep(); use explicit waits or mocks for time
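One common way to avoid sleeping in tests is to inject the clock, so the test advances time itself. The sketch below assumes a hypothetical `DiscountCode` class and `FakeClock` helper; neither appears in the source.

```python
import time
from typing import Callable


class FakeClock:
    """Callable clock whose current time is controlled by the test."""

    def __init__(self, start: float = 0.0) -> None:
        self.now = start

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


class DiscountCode:
    """Hypothetical domain object that depends on the current time."""

    def __init__(
        self, expires_at: float, clock: Callable[[], float] = time.time
    ) -> None:
        self._expires_at = expires_at
        self._clock = clock  # production code uses the real clock by default

    def is_expired(self) -> bool:
        return self._clock() >= self._expires_at


def test_discount_code_expires_after_deadline():
    clock = FakeClock(start=1000.0)
    code = DiscountCode(expires_at=1060.0, clock=clock)
    assert not code.is_expired()
    clock.advance(61)  # no real waiting; the test runs instantly
    assert code.is_expired()
```

The test covers a 61-second window without waiting, which keeps it inside the duration budget and removes timing flakiness.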