dev-testing/SKILL.md
Testing guide for orchestrated sub-agents. Covers strategy selection, backend/API harnesses, contract verification, Playwright E2E, CI pipelines, TDD enforcement, security scanning, and coverage gates. Inject when testing or regression verification is required.
For root-cause investigation of a mysterious or multi-layered failure, use dev-debugging.

| Model | Best For | Emphasis |
|-------|----------|----------|
| Test Pyramid | monoliths, libraries | speed, isolation |
| Testing Trophy | modern web apps, REST backends | confidence-to-cost |
| Test Honeycomb | microservices, async systems | boundary verification |

| Layer | Default Share | Typical Tools |
|-------|---------------|---------------|
| Static analysis | base layer | tsc, ESLint, mypy, Ruff |
| Unit | ~25% | Vitest, Jest, pytest |
| Integration | ~50% | Supertest, httpx, Testcontainers |
| Contract | ~10% | Pact, OpenAPI validators, Schemathesis |
| E2E | ~10% | Playwright |
| Manual / exploratory | ~5% | human review |

| Problem | Primary Harness | Avoid |
|---------|-----------------|-------|
| pure business rule | unit / service test | browser test |
| route + middleware + serialization | API integration test | mocking the route itself |
| DB query / migration / transaction | real DB integration test | fake repository for SQL correctness |
| frontend consuming backend JSON | contract test | manual-only verification |
| rendered critical flow | Playwright smoke | asserting internal React state |

When a failure is mysterious or multi-layered, use dev-debugging first, then return here for the regression harness.

Deep reference: `references/backend-testing.md`

| Layer | Verify | TypeScript Default | Python Default |
|-------|--------|--------------------|----------------|
| Service layer | validation, orchestration, domain errors | Vitest | pytest |
| API layer | status, envelope, middleware, auth | Supertest | httpx / ASGITransport |
| Repository layer | SQL / ORM correctness | Testcontainers + real DB | Testcontainers + real DB |
| Background jobs | idempotency, retry, dead-letter | Vitest + fake clock | pytest + monkeypatch |
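The background-jobs row is easiest to test when time is injected rather than read from the real clock. A minimal sketch, assuming a hypothetical `process_with_retry` helper that receives its `sleep` function as a parameter, so a test can pass a fake clock instead of waiting (the same idea as fake timers in Vitest or `monkeypatch` in pytest). The attempt count, backoff schedule, and dead-letter list are illustrative choices.

```python
from typing import Callable, List, Optional


class FakeClock:
    """Records requested sleeps instead of blocking, so tests run instantly."""

    def __init__(self) -> None:
        self.now = 0.0
        self.sleeps: List[float] = []

    def sleep(self, seconds: float) -> None:
        self.sleeps.append(seconds)
        self.now += seconds


# Hypothetical helper: not from any library.
def process_with_retry(
    job: Callable[[], str],
    dead_letter: List[str],
    sleep: Callable[[float], None],
    max_attempts: int = 3,
    base_delay: float = 1.0,
) -> Optional[str]:
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except Exception as exc:
            if attempt == max_attempts:
                # Retries exhausted: park the failure instead of losing it.
                dead_letter.append(f"{exc.__class__.__name__}: {exc}")
                return None
            # Exponential backoff between attempts: 1s, 2s, 4s, ...
            sleep(base_delay * 2 ** (attempt - 1))
    return None


def test_retries_then_dead_letters_without_real_waiting() -> None:
    clock = FakeClock()
    dead_letter: List[str] = []

    def always_fails() -> str:
        raise TimeoutError("upstream timed out")

    result = process_with_retry(always_fails, dead_letter, sleep=clock.sleep)

    assert result is None
    assert clock.sleeps == [1.0, 2.0]  # two backoffs between three attempts
    assert dead_letter == ["TimeoutError: upstream timed out"]


def test_succeeds_after_transient_failure() -> None:
    clock = FakeClock()
    attempts = {"n": 0}

    def fails_once() -> str:
        attempts["n"] += 1
        if attempts["n"] == 1:
            raise ConnectionError("reset")
        return "ok"

    assert process_with_retry(fails_once, [], sleep=clock.sleep) == "ok"
    assert clock.sleeps == [1.0]
```

Because the job never calls `time.sleep` directly, the suite exercises the full retry path in microseconds and stays deterministic in CI.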
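
Prefer test dependencies in this order, moving down only when the option above it is impractical:

1. real deterministic dependency
2. Testcontainers / ephemeral infra
3. recorded responses / thin fake
4. manual stub / fake
5. framework mock as last resort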
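A thin hand-written fake often beats a framework mock because it behaves like the real dependency. A minimal sketch for a service-layer test, assuming a hypothetical `UserRepository` protocol and `register_user` service; neither comes from a real library.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Protocol


class DuplicateEmail(Exception):
    """Domain error raised by the (hypothetical) service layer."""


class UserRepository(Protocol):
    def find_by_email(self, email: str) -> Optional[dict]: ...
    def insert(self, email: str, name: str) -> dict: ...


@dataclass
class InMemoryUserRepository:
    """Thin fake: real lookup/insert/id behavior, no database."""

    rows: Dict[str, dict] = field(default_factory=dict)

    def find_by_email(self, email: str) -> Optional[dict]:
        return self.rows.get(email)

    def insert(self, email: str, name: str) -> dict:
        row = {"id": len(self.rows) + 1, "email": email, "name": name}
        self.rows[email] = row
        return row


def register_user(repo: UserRepository, email: str, name: str) -> dict:
    """Service layer: validation and domain errors only, no SQL."""
    email = email.strip().lower()
    if "@" not in email:
        raise ValueError("invalid email")
    if repo.find_by_email(email) is not None:
        raise DuplicateEmail(email)
    return repo.insert(email, name)


def test_register_rejects_duplicate_email() -> None:
    repo = InMemoryUserRepository()
    register_user(repo, "Ada@Example.com", "Ada")
    try:
        register_user(repo, "ada@example.com ", "Ada again")
    except DuplicateEmail:
        pass
    else:
        raise AssertionError("expected DuplicateEmail")
    assert len(repo.rows) == 1
```

A fake like this is appropriate for service rules; SQL correctness (constraints, query translation) still needs a real database.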
Mock dependencies at service boundaries. Use Supertest/httpx for route-level integration tests. Match response envelope shape from backend contracts.
Use a real database when verifying migrations, transactions, unique constraints, foreign keys, query translation, and performance-sensitive SQL. Testcontainers provides that real-DB truth for correctness-sensitive persistence tests: start the container in `beforeAll` (TypeScript) or a fixture (pytest), capture its connection URI, and point the code under test at it.
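The value of a real database is that the engine, not the test, enforces constraints and transactions. A minimal sketch of that assertion style, using the standard-library `sqlite3` module purely as a stand-in; in a real suite the connection would come from a Testcontainers-managed instance of your production engine (for example PostgreSQL), since SQLite's dialect and constraint behavior differ. The `users` schema is hypothetical.

```python
import sqlite3

# Hypothetical schema for illustration.
SCHEMA = """
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE
);
"""


def make_connection() -> sqlite3.Connection:
    # Stand-in for connecting to the URI a Testcontainers fixture would return.
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def test_unique_email_is_enforced_by_the_engine() -> None:
    conn = make_connection()
    try:
        conn.execute("INSERT INTO users (email) VALUES (?)", ("ada@example.com",))
        try:
            conn.execute("INSERT INTO users (email) VALUES (?)", ("ada@example.com",))
        except sqlite3.IntegrityError:
            pass  # the database rejected the duplicate, as it should
        else:
            raise AssertionError("duplicate email was accepted")
    finally:
        conn.close()


def test_failed_transaction_rolls_back() -> None:
    conn = make_connection()
    try:
        with conn:  # commits on success, rolls back if the block raises
            conn.execute("INSERT INTO users (email) VALUES (?)", ("a@x.io",))
            conn.execute("INSERT INTO users (email) VALUES (?)", ("a@x.io",))
    except sqlite3.IntegrityError:
        pass
    count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
    assert count == 0  # the first insert was rolled back with the failed one
    conn.close()
```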
Keep shared contract fixtures in `fixtures/contracts/` or equivalent.

Contract tests protect the frontend↔backend boundary. They sit between API tests and browser tests. Rule: Playwright proves the experience; contract tests prove the shared shape.

Contract tests should pin down:
- envelope keys: `success`, `data`, `error`, `meta`
- `error.code`
- `requestId` propagation

| Style | Best For | Tooling |
|-------|----------|---------|
| consumer-driven contract | rapidly changing frontend/backend teams | Pact |
| schema-first contract | OpenAPI-led backends | OpenAPI validators, Schemathesis |
| type-level contract | TS monorepos | shared types / codegen |
| full-stack smoke | final user confidence | Playwright |
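A minimal, dependency-free sketch of what a contract check asserts about the shared envelope. The precise rules (for example, `error` is null on success, and `requestId` is echoed under `meta`) are assumptions for illustration; real projects usually express them as an OpenAPI schema or Pact matchers rather than hand-written checks.

```python
from typing import Any, Dict, List


def envelope_violations(body: Dict[str, Any], request_id: str) -> List[str]:
    """Return contract violations; an empty list means the body is compliant."""
    problems: List[str] = []
    for key in ("success", "data", "error", "meta"):
        if key not in body:
            problems.append(f"missing top-level key: {key}")
    if problems:
        return problems

    if not isinstance(body["success"], bool):
        problems.append("success must be a boolean")
    if body["success"] and body["error"] is not None:
        problems.append("successful response must have error = null")
    if not body["success"]:
        error = body["error"]
        if not isinstance(error, dict) or not isinstance(error.get("code"), str):
            problems.append("failed response must carry a string error.code")
        if body["data"] is not None:
            problems.append("failed response must have data = null")

    # requestId propagation (assumed location: meta.requestId).
    meta = body["meta"] if isinstance(body["meta"], dict) else {}
    if meta.get("requestId") != request_id:
        problems.append("meta.requestId does not echo the request's ID")
    return problems


def test_error_fixture_matches_contract() -> None:
    # In a real suite this fixture would be loaded from fixtures/contracts/.
    fixture = {
        "success": False,
        "data": None,
        "error": {"code": "USER_NOT_FOUND", "message": "No such user"},
        "meta": {"requestId": "req-123"},
    }
    assert envelope_violations(fixture, "req-123") == []
```

Returning a list of violations rather than failing on the first one makes a drifted fixture report everything that changed in a single run.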
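PactV3 workflow:

1. The consumer test defines the expected interactions (use `MatchersV3` for flexible matching).
2. Running it writes the pact file to `pacts/`.
3. Publish the pact to a broker.
4. The provider verifies against it.

See `references/backend-testing.md` for a full PactV3 example.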
Use schema-based API testing (Schemathesis, Dredd) to verify OpenAPI contract compliance.
Use Playwright after API and contract tests are already trustworthy. Browser tests should validate rendered flows, accessibility-critical interactions, and real integration seams that lower layers cannot prove alone.

Helper scripts available:
- `scripts/with_server.py`: manages server lifecycle (supports multiple servers)

Run scripts with `--help` first and treat them as black boxes to avoid polluting the context window.

Decision tree for a user task:
- Static HTML: read the file → find selectors → write the Playwright script.
- Dynamic app:
  - Server not running: `python scripts/with_server.py --help`
  - Server running: recon-then-action (navigate → screenshot → selectors → act)
```bash
# Single server:
python scripts/with_server.py --server "npm run dev" --port 5173 -- python your_automation.py

# Multiple servers:
python scripts/with_server.py \
  --server "cd backend && python server.py" --port 3000 \
  --server "cd frontend && npm run dev" --port 5173 \
  -- python your_automation.py
```
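`with_server.py` is meant to be used as a black box, so its internals are not documented here. For intuition about what any server-lifecycle helper must do before running automation, here is a hypothetical stdlib sketch of the core step: poll a TCP port until the server accepts connections, or give up after a timeout. `wait_for_port` is an illustrative name, not part of the bundled script.

```python
import socket
import time


# Hypothetical helper for illustration; not the bundled script's API.
def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.1) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: the server is not listening yet, so retry.
            time.sleep(interval)
    return False
```

A lifecycle helper would start each server process, call a check like this for every port, run the automation command, and stop the servers in a `finally` block so a failing script never leaves them running.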
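Best practices:
- Treat bundled scripts as black boxes: run `--help` first, then invoke them directly.
- Use `sync_playwright()` for synchronous scripts; always close the browser.
- Use stable selectors: `text=`, `role=`, CSS, or IDs.
- Wait with `page.wait_for_selector()`; reserve `page.wait_for_timeout()` for debugging, since fixed sleeps are a common source of flakiness.

Example scripts:
- `element_discovery.py`: discovering buttons, links, and inputs on a page
- `static_html_automation.py`: using `file://` URLs for local HTML
- `console_logging.py`: capturing console logs during automation

Full workflow templates: `references/ci-pipeline.md`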

CI stages, in order:

1. quality (lint / typecheck)
2. unit + integration tests
3. contract tests
4. Playwright E2E
5. security scan
6. coverage aggregation + artifacts

Structure CI jobs as a dependency chain: quality → backend-tests → contract-tests → e2e.
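Key configuration:
- `concurrency.cancel-in-progress: true`: cancel superseded runs to avoid wasted CI time
- `strategy.fail-fast: false`: for matrix builds, so one failing leg does not cancel the others
- `--shard=${{ matrix.shard }}/N`: split the suite across matrix jobs
- `npx playwright install --with-deps chromium`: install only Chromium plus its system dependencies

See `references/ci-pipeline.md` for full GitHub Actions and GitLab CI templates.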

| Dimension | When to Use |
|-----------|-------------|
| Node / Python version matrix | packages, SDKs, shared libraries |
| OS matrix | native modules, CLI behavior |
| shard matrix | large suites exceeding CI budget |
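```bash
npx vitest run --shard=1/4
npx playwright test --shard=1/4 --workers=4
# Requires pytest-xdist; parallelizes across local CPU workers rather than CI machines.
pytest -n auto --dist=loadgroup
```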
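Sharding only works if every CI job independently computes the same partition of the suite. A minimal sketch of one way to do that, assuming a stable hash of each test ID; it illustrates the `--shard=index/total` idea and is not how Vitest, Playwright, or pytest-xdist actually assign work.

```python
import hashlib
from typing import List


def shard_for(test_id: str, total_shards: int) -> int:
    """Map a test ID to a 1-based shard index, stable across machines and runs."""
    # Python's built-in hash() is salted per process for str, so use a digest.
    digest = hashlib.sha256(test_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % total_shards + 1


def select_shard(test_ids: List[str], index: int, total: int) -> List[str]:
    """Tests that shard `index` of `total` should run (as in --shard=index/total)."""
    if not 1 <= index <= total:
        raise ValueError("shard index must be in 1..total")
    return [t for t in test_ids if shard_for(t, total) == index]
```

Hash-based assignment guarantees shards are disjoint and together cover the suite, but not that they finish in similar time; real runners may balance by file or by recorded durations instead.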

| Symptom | First Fix |
|---------|-----------|
| passes locally, fails in CI | deterministic seeds, containerized deps, explicit waits |
| order-dependent failure | reset shared state in fixtures |
| green on retry only | remove wall-clock / random assumptions |
| screenshot noise | stable CI image, mask dynamic regions |

Protocol: detect → quarantine if blocking → assign owner → reinstate after repeated green runs.
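The "green on retry only" symptom and the reinstate step of the protocol can both be made mechanical. A minimal sketch, assuming a hypothetical per-test history where each CI run records its attempt outcomes in order, and an illustrative threshold of 5 consecutive first-attempt greens before reinstating.

```python
from typing import Sequence

# Hypothetical data shape: one entry per CI run, e.g. ["fail", "pass"].
RunHistory = Sequence[Sequence[str]]


def is_flaky(history: RunHistory) -> bool:
    """Flaky = at least one run that failed first, then passed on retry."""
    return any(
        len(attempts) > 1 and attempts[0] == "fail" and attempts[-1] == "pass"
        for attempts in history
    )


def can_reinstate(history: RunHistory, required_green_runs: int = 5) -> bool:
    """Reinstate only after the last N runs all passed on the first attempt."""
    recent = list(history)[-required_green_runs:]
    return len(recent) == required_green_runs and all(
        len(attempts) > 0 and attempts[0] == "pass" for attempts in recent
    )
```

A run that fails on every attempt is not classified as flaky here: it is a real failure and should go to an owner, not to quarantine.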
When ENFORCE_TDD=true is set in project instructions or explicitly requested, this section becomes mandatory.

| Check | Pass Criteria |
|-------|---------------|
| Test written before implementation? | test file added / updated before or with code |
| Failure observed before fix? | red state was actually executed |
| Behavior-focused assertions? | checks outputs, side effects, contracts |
| Regression locked in? | failing case is now protected by a persistent test |

| Style | Best For |
|-------|----------|
| London / mockist | orchestration-heavy boundaries |
| Chicago / classicist | domain logic and transforms |
| Hybrid | most production code |

Default to Hybrid: mock external systems, keep internal collaboration real unless it becomes too slow or unstable.
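A minimal sketch of the Hybrid default using only the standard library: the external payment gateway is replaced with an autospecced mock (London style), while the internal discount logic runs for real (Chicago style). `PaymentGateway`, `checkout`, and `apply_discount` are hypothetical names.

```python
from unittest.mock import create_autospec


class PaymentGateway:
    """External system boundary (hypothetical); the real one would call a payment API."""

    def charge(self, amount_cents: int, currency: str) -> str:
        raise NotImplementedError


def apply_discount(total_cents: int, percent: int) -> int:
    """Internal domain logic: kept real in tests."""
    return total_cents - (total_cents * percent) // 100


def checkout(gateway: PaymentGateway, total_cents: int, discount_percent: int) -> dict:
    """Orchestration: compute the amount, then call the external gateway."""
    amount = apply_discount(total_cents, discount_percent)
    charge_id = gateway.charge(amount_cents=amount, currency="USD")
    return {"chargeId": charge_id, "amountCents": amount}


def test_checkout_charges_discounted_amount() -> None:
    # Autospec makes the mock reject calls that don't match the real signature.
    gateway = create_autospec(PaymentGateway, instance=True)
    gateway.charge.return_value = "ch_1"

    result = checkout(gateway, total_cents=10_000, discount_percent=15)

    gateway.charge.assert_called_once_with(amount_cents=8_500, currency="USD")
    assert result == {"chargeId": "ch_1", "amountCents": 8_500}
```

Using `create_autospec` rather than a bare `Mock` keeps the mock from silently drifting away from the real gateway's method signature.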
- dev-testing owns the regression harness and enforcement loop.
- dev-debugging owns root-cause methodology once a failure is mysterious or multi-layered.
- After dev-debugging isolates the cause, come back here to lock it in with tests.

When an AI writes and reviews its own code, it carries the same assumptions into both steps. Automated tests break that loop because they check behavior independently of the reasoning that produced the code.

| Pattern | Description | Test Strategy |
|---------|-------------|---------------|
| Sandbox/production mismatch | Fix applied to one code path, not both | Assert same response shape in both modes |
| SELECT clause omission | New field in response but missing from DB query | Assert all required fields are present and defined |
| Error state leakage | Error set but stale data not cleared | Assert state cleanup on error transitions |
| Missing rollback | Optimistic UI update without recovery on failure | Assert state restoration after simulated API error |
Name regression tests with the `BUG-R{N}` convention, and assert every required field in a loop rather than spot-checking one.
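A minimal pytest-style sketch of the naming convention and the field loop, covering the SELECT-omission and sandbox/production-mismatch patterns. `fetch_order`, its `mode` parameter, `REQUIRED_FIELDS`, and the bug number are hypothetical stand-ins for your project's code and tracker.

```python
from typing import Any, Dict

# Hypothetical contract: fields every order response must include.
REQUIRED_FIELDS = ("id", "status", "totalCents", "createdAt")


def fetch_order(order_id: int, mode: str) -> Dict[str, Any]:
    """Hypothetical data access with a sandbox path and a DB-backed path."""
    row = {"id": order_id, "status": "paid", "totalCents": 4200,
           "createdAt": "2024-01-01T00:00:00Z"}
    if mode == "sandbox":
        return dict(row)
    # Mimics the production SELECT column list, where the bug originally was.
    columns = ("id", "status", "totalCents", "createdAt")
    return {col: row[col] for col in columns}


def test_bug_r1_order_has_all_required_fields_in_both_modes() -> None:
    """BUG-R1 (hypothetical): createdAt was missing from the production SELECT."""
    shapes = {}
    for mode in ("sandbox", "production"):
        order = fetch_order(7, mode)
        for name in REQUIRED_FIELDS:
            assert name in order, f"{mode}: missing {name}"
            assert order[name] is not None, f"{mode}: {name} is None"
        shapes[mode] = sorted(order)
    # Sandbox/production mismatch guard: both paths return the same keys.
    assert shapes["sandbox"] == shapes["production"]
```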
When the project supports a sandbox/mock mode, use it for fast DB-free regression testing:

`process.env.SANDBOX_MODE = 'true'`

→ Delegated: threat modeling and secure design policy belong to dev-security. This section covers the automated test hooks and CI gates that enforce those rules.
Security checks, in order:

1. fast local checks
2. Semgrep / CodeQL gate
3. dependency audit
4. auth / validation regression tests
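```bash
npm audit --audit-level=high
pip-audit --strict --desc
```

```yaml
# returntocorp/semgrep-action is marked deprecated upstream; Semgrep's docs now
# recommend running `semgrep ci` in the semgrep/semgrep container instead.
- uses: returntocorp/semgrep-action@v1
  with:
    config: >-
      p/default
      p/javascript
      p/typescript
      p/python
```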
Test missing auth (expect 401) and verify error.code matches contract for every auth-protected endpoint.
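A minimal sketch of the missing-auth sweep, assuming a hypothetical WSGI app, endpoint list, and `UNAUTHORIZED` error code. It calls the app in-process via the standard library's `wsgiref.util.setup_testing_defaults`, the same no-network idea as Supertest or httpx's ASGI transport.

```python
import json
from typing import Callable, Dict, List, Optional, Tuple
from wsgiref.util import setup_testing_defaults

# Hypothetical list of auth-protected routes.
PROTECTED_ENDPOINTS: List[Tuple[str, str]] = [("GET", "/api/me"), ("POST", "/api/orders")]


def app(environ: dict, start_response: Callable) -> List[bytes]:
    """Hypothetical app: every route requires a bearer token."""
    if environ.get("HTTP_AUTHORIZATION", "").startswith("Bearer "):
        status, body = "200 OK", {"success": True, "data": {}, "error": None}
    else:
        status = "401 Unauthorized"
        body = {"success": False, "data": None,
                "error": {"code": "UNAUTHORIZED", "message": "Missing token"}}
    start_response(status, [("Content-Type", "application/json")])
    return [json.dumps(body).encode("utf-8")]


def call(method: str, path: str, headers: Optional[Dict[str, str]] = None) -> Tuple[int, dict]:
    """Invoke the WSGI app in-process and return (status code, parsed JSON)."""
    environ: dict = {"REQUEST_METHOD": method, "PATH_INFO": path}
    for name, value in (headers or {}).items():
        environ["HTTP_" + name.upper().replace("-", "_")] = value
    setup_testing_defaults(environ)  # fills in the remaining required WSGI keys
    captured: Dict[str, int] = {}

    def start_response(status: str, response_headers: list) -> None:
        captured["status"] = int(status.split()[0])

    body = b"".join(app(environ, start_response))
    return captured["status"], json.loads(body)


def test_protected_endpoints_reject_missing_auth() -> None:
    for method, path in PROTECTED_ENDPOINTS:
        status, body = call(method, path)
        assert status == 401, f"{method} {path} returned {status}"
        assert body["error"]["code"] == "UNAUTHORIZED", f"{method} {path}: wrong error.code"
```

Driving the sweep from a single endpoint list means a newly added protected route is covered as soon as it is registered there.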
These are project/risk-based, not universal minimums. Adjust for your context.

| Metric | Suggested Floor | Ideal |
|--------|-----------------|-------|
| Line coverage | 70% | 85%+ |
| Branch coverage | 60% | 80%+ |
| Function coverage | 80% | 90%+ |
| Diff coverage | 80% | 90%+ |
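Diff coverage is the share of changed executable lines that the tests actually ran. A minimal sketch of a gate using the suggested 80% floor, assuming you already have changed lines (from `git diff`) and executable and covered lines (from a coverage report) as per-file sets; the parsing step is omitted, and off-the-shelf tools such as diff-cover do this directly from a coverage XML report.

```python
from typing import Dict, Set

Lines = Dict[str, Set[int]]  # file path -> line numbers


def diff_coverage(changed: Lines, executable: Lines, covered: Lines) -> float:
    """Covered changed executable lines / changed executable lines (1.0 if none)."""
    total = hit = 0
    for path, lines in changed.items():
        relevant = lines & executable.get(path, set())  # drop blanks and comments
        total += len(relevant)
        hit += len(relevant & covered.get(path, set()))
    return 1.0 if total == 0 else hit / total


def passes_gate(changed: Lines, executable: Lines, covered: Lines, floor: float = 0.80) -> bool:
    return diff_coverage(changed, executable, covered) >= floor
```

For example, if 5 lines changed in a file, 4 of them are executable, and 3 of those ran, diff coverage is 3/4 = 75% and the gate fails.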

| Metric | Target |
|--------|--------|
| Defect detection rate | > 80% |
| Mean time to detect | < 1 CI run |
| Test signal-to-noise | > 95% |
| Contract drift rate | near 0 |
```bash
npm test -- --coverage            # forwards --coverage to the project's configured test runner
npx vitest run --coverage
pytest --cov --cov-report=xml     # requires pytest-cov
```
- ENFORCE_TDD requirements were followed, if enabled
- `requestId`, pagination, and nullability are verified where relevant

If you can only point to a manual click-through or one green Playwright run, the testing story is incomplete.
Verification flow:

1. unit / service
2. API integration
3. contract verification
4. Playwright smoke
5. CI gate + coverage + security scan