skills/testing/SKILL.md
Characterization testing and safety-net backfill for existing code. Use when legacy, under-tested, or risky code needs tests before a refactor, bug fix, or behavior change. Captures current behavior through public interfaces, identifies coverage gaps, and adds focused unit, integration, or E2E tests without replacing TDD implementation workflows.
<tool_restrictions>
- AskUserQuestion: Preserve the one-question-at-a-time interaction pattern. In Claude Code, use the tool. In Codex, ask one concise plain-text question at a time unless a structured question tool is actually available in the current mode. Do not narrate missing tools or fallbacks to the user.
- EnterPlanMode: BANNED. Do NOT call this tool. This skill has its own structured testing workflow. Execute it directly.
- ExitPlanMode: BANNED. You are never in plan mode.
</tool_restrictions>

<arc_runtime>
This workflow requires the full Arc bundle, not a prompts-only install.

Paths in this skill use these conventions:

- agents/..., references/..., disciplines/..., templates/..., scripts/..., rules/..., skills/<name>/... are Arc-owned files at the plugin root. Resolve the plugin root from this skill's filesystem location: it is the directory containing agents/ and skills/.
- ./... is local to this skill's directory.
- .ruler/..., docs/..., src/..., or any project-relative path refers to the user's project repository.
</arc_runtime>

Backfill focused tests around existing code before a risky change. The goal is not "more tests" in the abstract; it is a trustworthy safety net around behavior that must survive a refactor, migration, or bug fix.
Use this skill when:
Do not use this skill as the normal new-feature workflow. For new work, use /arc:implement or a dedicated TDD skill so RED/GREEN/REFACTOR remains the governing loop.
<required_reading>
Read before testing:

- references/testing-patterns.md: Test philosophy, vitest/playwright patterns
- references/testing-anti-patterns.md: What weak or misleading tests look like
- rules/testing.md: Arc testing conventions
- disciplines/change-impact-testing.md: Blast radius analysis for code changes
- references/llm-api-testing.md: If testing LLM integrations
- references/maintainability-review.md: If tests are being added before decomposing a god file or tangled module
- references/complexity-optimization.md: If tests are being added before optimizing algorithmic complexity, rendering churn, or N+1 behavior
</required_reading>

Use specialist agents only when the slice is large enough to justify delegation:
| Agent | Model | Purpose | Framework |
|-------|-------|---------|-----------|
| unit-test-writer | sonnet | Characterize pure functions, hooks, or isolated components | vitest |
| integration-test-writer | sonnet | Characterize API, auth, state, and component integration behavior | vitest + MSW |
| e2e-test-writer | opus | Characterize critical browser journeys | Playwright |
| test-runner | haiku | Run unit/integration suites and analyze failures | vitest |
| e2e-runner | opus | Run Playwright, inspect screenshots/traces, iterate on failures | Playwright |
<rules_context> Check for project testing rules:
Use Glob tool: .ruler/testing.md
If it exists, read it for MUST/SHOULD/NEVER constraints.
Detect test framework:
| File | Framework |
|------|-----------|
| vitest.config.* | vitest |
| jest.config.* | jest |
| playwright.config.* | Playwright |
| package.json scripts | Project-specific test commands |
</rules_context>
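
The detection table above can be read as a simple prefix match on config file names. The following is a minimal sketch of that idea, not part of Arc: `detectFrameworks`, `Framework`, and `CONFIG_PREFIXES` are hypothetical names, and the `package.json` scripts row is left out because it needs project-specific inspection.

```typescript
// Hypothetical helper: names and shapes are illustrative, not part of Arc.
type Framework = "vitest" | "jest" | "Playwright";

// Config-file prefixes from the detection table, mapped to the framework they indicate.
const CONFIG_PREFIXES: Array<[string, Framework]> = [
  ["vitest.config.", "vitest"],
  ["jest.config.", "jest"],
  ["playwright.config.", "Playwright"],
];

// Given the file names found at the project root, return the frameworks detected,
// in the same order as the table.
function detectFrameworks(rootFiles: string[]): Framework[] {
  const found: Framework[] = [];
  for (const [prefix, framework] of CONFIG_PREFIXES) {
    const hit = rootFiles.some((file) => file.startsWith(prefix));
    if (hit) found.push(framework);
  }
  return found;
}

// Example: a project with both vitest and Playwright configured.
const detected = detectFrameworks([
  "package.json",
  "vitest.config.ts",
  "playwright.config.ts",
]);
// detected is ["vitest", "Playwright"]
```

A project can match more than one row, which is why the sketch returns a list rather than a single framework.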
Ask one question only if the target is unclear:
AskUserQuestion:
question: "What existing code or behavior needs a safety net before we change it?"
header: "Test Target"
Then identify:
Gather evidence before writing tests:
Do not silently fix production behavior during baseline work. If you discover an obvious bug, capture it as either:
List behavior in terms of callers or users, not internal implementation details:
## Safety Net: [Target]
### Planned Change
- [Refactor / bug fix / migration / cleanup]
### Public Interfaces
- [Function/component/API route/page/CLI command]
### Current Observable Behavior
| Behavior | Evidence | Risk |
|----------|----------|------|
| [behavior] | [code path, existing test, manual observation] | [high/medium/low] |
### Test Slices
| Slice | Level | Why this level |
|-------|-------|----------------|
| [one behavior] | [unit/integration/e2e] | [fastest useful proof] |
For each slice:
If existing code is hard to test:
Mocks are acceptable for true boundaries: network, time, filesystem, database, auth providers, payment providers, and external LLM APIs. Prefer real code inside the project boundary.
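
As a rough illustration of faking only a true boundary, the sketch below pins time while letting project code run for real. Everything in it is hypothetical: `trialStatus`, `daysBetween`, the `Clock` type, and the 14-day rule stand in for whatever legacy code is being characterized.

```typescript
// Hypothetical legacy code under test; names and the 14-day rule are illustrative only.
type Clock = { now: () => Date };

// Internal helper: real project code, exercised for real (not mocked).
function daysBetween(from: Date, to: Date): number {
  const msPerDay = 24 * 60 * 60 * 1000;
  return Math.floor((to.getTime() - from.getTime()) / msPerDay);
}

// Public interface being characterized. Time is the only true boundary here.
function trialStatus(startedAt: Date, clock: Clock): string {
  const elapsed = daysBetween(startedAt, clock.now());
  return elapsed >= 14 ? "expired" : `${14 - elapsed} days left`;
}

// Fake only the boundary: a fixed clock makes the current behavior repeatable.
const fixedClock: Clock = { now: () => new Date("2024-01-15T00:00:00Z") };

// Record what the code does today through its public interface.
const status = trialStatus(new Date("2024-01-10T00:00:00Z"), fixedClock);
// status is "9 days left"
```

When the code under test reads the system clock directly instead of taking a clock argument, vitest's fake timers (`vi.useFakeTimers` with `vi.setSystemTime`) can play the same role without changing production code.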
Run checks in widening order:
When E2E output is verbose or flaky, dispatch e2e-runner with the exact test file and failure evidence.
End with a concise report:
## Safety Net Result
**Target:** [code/feature]
**Reason:** [refactor/bug fix/legacy coverage/launch risk]
**Tests added:** [files]
**Behavior characterized:**
- [behavior]
**Verification:**
- [command] — [pass/fail]
**Remaining risk:**
- [untested behavior or reason it was deferred]
**Ready for next change:** [yes/no]
| Level | Use when | Avoid when |
|-------|----------|------------|
| Unit | Pure functions, deterministic formatting, isolated hooks, small state transitions | Behavior depends on routing, browser, API, auth, or multiple components |
| Integration | Component + state, API routes, auth states, form submissions, data adapters | A single pure function is enough or only a real browser proves it |
| E2E | Critical user journeys, auth flows, checkout/signup, routing/browser behavior | The behavior can be proven faster below the browser |

| Feature Type | First Useful Backfill | Notes |
|--------------|----------------------|-------|
| Utility functions | Unit | Cover edge cases and invariants through exported functions |
| UI components | Integration | Prefer user-visible behavior over snapshots |
| Forms | Integration | Add E2E only for critical end-to-end flows |
| API routes | Integration | Exercise request/response behavior and error paths |
| Auth flows | Integration + selective E2E | Mock provider states below browser; use real/browser flow sparingly |
| Checkout/payment | Integration + E2E | Mock external provider below browser; keep one critical browser path |
| LLM integrations | Unit/integration with fixtures | Avoid live calls unless explicitly required |
Use this only when auth behavior is part of the safety net.

For Clerk:

Integration tests:
- useAuth and useUser hooks.
- getToken for API calls.

E2E tests:
- tests/auth.setup.ts for login flow.
- playwright/.auth/user.json.
- storageState in playwright.config.ts.

Common issues:
- ClerkProvider instead of hooks.
- isLoaded: false state.
- getToken mock.

For WorkOS AuthKit:

Integration tests:
- getUser from @workos-inc/authkit-nextjs.
- organizationId, role, and permissions.

E2E tests:
- /api/auth/test-login for faster auth in test environments only.

Common issues:
- organizationId in org-level features.

For faster E2E tests, create a test-only auth endpoint:
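// app/api/auth/test-login/route.ts
// ONLY available in test/development
export async function POST(request: Request) {
  // Refuse to exist in production so the shortcut cannot bypass the real SSO flow.
  if (process.env.NODE_ENV === "production") {
    return new Response("Not found", { status: 404 });
  }
  // Create session directly without SSO flow (provider-specific; not shown here)
  // Placeholder so the handler always returns a Response until that step is written.
  return new Response("Not implemented", { status: 501 });
}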
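
The E2E auth notes above mention tests/auth.setup.ts, playwright/.auth/user.json, and storageState in playwright.config.ts. One common way these fit together in Playwright is a setup project that logs in once and saves the session, with browser projects reusing it. The fragment below is a sketch under that assumption; the project names `setup` and `chromium` are illustrative, and the file paths are the ones listed above.

```typescript
import { defineConfig, devices } from "@playwright/test";

export default defineConfig({
  projects: [
    // Runs tests/auth.setup.ts first; that file is expected to log in
    // and write the session to playwright/.auth/user.json.
    { name: "setup", testMatch: /auth\.setup\.ts/ },
    {
      name: "chromium", // assumed project name
      use: {
        ...devices["Desktop Chrome"],
        // Every test in this project starts from the saved, signed-in state.
        storageState: "playwright/.auth/user.json",
      },
      dependencies: ["setup"],
    },
  ],
});
```

In a real project these `projects` entries would be merged into the existing playwright.config.ts, and the saved auth file is usually kept out of version control.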
Tests must fail fast. Never:
Playwright config:
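import { defineConfig } from "@playwright/test";

export default defineConfig({
  timeout: 30_000, // limit for each test
  expect: {
    timeout: 5_000, // limit for each expect() assertion
  },
  use: {
    actionTimeout: 10_000, // limit for each action such as click() or fill()
  },
});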
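
The same fail-fast idea can be applied to unit and integration suites. The fragment below is a sketch for a vitest project, assuming a vitest.config.ts at the project root; the timeout values are illustrative assumptions, not Arc conventions.

```typescript
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    testTimeout: 5_000, // limit for each test (assumed value)
    hookTimeout: 10_000, // limit for setup/teardown hooks (assumed value)
  },
});
```

Tight limits surface hung promises and missing mocks quickly instead of letting a suite stall.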
<success_criteria> The safety-net pass is complete when: