skills/qa-test/SKILL.md
Browser-based QA verification after any implementation. Use when someone says "QA this", "test this in browser", "verify the feature", "qa test", "browser test", or after completing an /implement-change to verify acceptance criteria in a real browser. Opens Chrome via MCP, exercises each acceptance criterion, verifies via DOM snapshots, and reports pass/fail. The "closer" for every implementation — proof it works, not just that tests pass.
npx skillsauth add teambrilliant/dev-skills qa-testInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Verify implemented features in a real browser. Exercise each acceptance criterion, verify via snapshots, report results.
Context-efficient design: browser testing runs in a sub-agent so snapshot/interaction data stays out of the main thread. Main thread only sees compact pass/fail summaries.
Sub-agent rule: Always use sub-agents for browser interaction and pre-flight exploration — these are broad, multi-step tasks that generate significant context. Fan out pre-flight and browser testing as separate sub-agents when they're independent. Handle directly only when verifying a single, specific element or reading one file for criteria.
Launch an Explore sub-agent before any browser interaction to gather all context in parallel.
Sub-agent prompt:
Gather QA pre-flight context for testing. Return a structured JSON block with:
1. **acceptance_criteria**: List of testable criteria. Check these sources in order,
stop at the first that has criteria:
- **The shape doc referenced by the implementation plan** (canonical source):
look for `## Acceptance Criteria` section in `thoughts/plans/*.md` — it should
link to a `thoughts/research/*.md` shape doc. Read that shape doc's
`### Acceptance Criteria` section and use those criteria verbatim.
- The user's prompt (if criteria were given explicitly)
- Current plan file (if it lists criteria directly — unshaped work path)
- Current PR description: run `gh pr view --json body` via Bash
- Current branch diff: run `git diff main...HEAD --stat` then read changed files
to infer what user-visible behavior changed (LAST RESORT — unshaped work only)
- Linked issue: check PR body for issue references, fetch with `gh issue view`
Return `criteria_source` alongside the criteria so the main thread knows which
tier was used (shape doc / prompt / plan / PR / diff / issue).
2. **test_url**: Where to test. Check in order:
- `.tap/tap-audit.md` Environments section
- `package.json` scripts for `dev`, `start`, or similar
- Common defaults: localhost:3000, :5173, :4321, :6886
3. **app_running**: Try to fetch the test_url via `curl -s -o /dev/null -w '%{http_code}'`.
Return the status code. If not running, return the dev command that would start it.
4. **test_pages**: List of specific page URLs/routes to visit based on the changed files
(e.g., if `modules/campaigns/` changed, the test page is likely `/campaigns/...`)
5. **db_available**: Check if postgres MCP tools are available (search for
`mcp__postgres__execute_sql` or similar). Return true/false.
6. **has_async_flows**: Based on the changed files, flag whether the feature involves
background jobs (Temporal workflows, queues, webhooks) that need async verification.
7. **needs_login**: Whether the app requires authentication. Check for login pages,
auth middleware, or session requirements in the codebase.
Return results as a structured summary, not raw tool output.
Using the pre-flight results:
app_running is not 200, start the dev server (background) and wait for itacceptance_criteria as the test plantest_pages to know where to navigate firstdb_available, include database verification stepshas_async_flows, use the async testing patternneeds_login, prompt user for interactive setup before launching browser sub-agentFallback (no sub-agent): If sub-agents are unavailable, gather criteria and resolve URL sequentially.
<details> <summary>Manual fallback: Gather criteria and resolve URL</summary>Gather acceptance criteria from (in priority order):
thoughts/research/*.md via
thoughts/plans/*.md) — canonical source when it existsIf no criteria found, ask in human mode. In agent mode, infer from the diff.
Resolve test URL (in priority order):
.tap/tap-audit.md → Environments sectionpackage.json scripts → dev, start, or similarhttp://localhost:3000, http://localhost:5173, http://localhost:4321Verify the app is running before proceeding.
</details>Before launching the browser testing sub-agent, handle anything that's hard to automate in the main thread. The browser state persists since the sub-agent connects to the same Chrome instance.
When to prompt for interactive setup:
needs_login is true → ask user: "App requires login. Want me to navigate to login page so you can sign in, or should I attempt automated login?"What to do:
If no interactive setup is needed, skip directly to step 3.
Launch a general-purpose sub-agent for all browser interaction. This keeps snapshot/interaction data out of the main thread context.
Sub-agent prompt template:
You are running browser-based QA tests. The browser is already open and may already
be logged in / set up.
Test URL: {test_url}
Acceptance criteria to verify:
{numbered list of criteria}
Additional context:
- DB tools available: {db_available}
- Has async flows: {has_async_flows}
- Test pages: {test_pages}
## Evaluator mindset
You are an independent evaluator. Your job is to be skeptical, not helpful. The
generator already believes its work is correct — your value is catching what it
missed. Rules:
- **Binary outcomes**: a criterion fully passes or it fails. If you catch yourself
writing "PASS with caveats" or "close enough", mark it **FAIL** and state the gap.
- **Evidence required**: for each PASS, cite what you observed — element text, URL
after action, console/network status, DB row. PASS with no cited evidence → FAIL.
- **Do not rationalize**: if something looked off but "probably works", mark it FAIL
or PARTIAL and describe what looked off. Let the human/agent decide if it's
acceptable — that's not your call.
- **Specific bugs, not vague assessments**: when a criterion fails, file a concrete
actionable finding ("Delete button calls `/api/items/:id` which returns 500;
expected 204") — not "delete doesn't work."
## How to test
Use Chrome MCP tools (`mcp__chrome-devtools__*`).
**Snapshot-first workflow** — use `take_snapshot` for BOTH finding elements AND
verifying results. Do NOT use `take_screenshot` unless a criterion fails and you
need visual debugging evidence.
**For each criterion:**
1. Navigate to the relevant page
2. `take_snapshot` → get element UIDs and current state
3. Interact via UIDs (`click`, `fill`, `hover`)
4. `take_snapshot` → verify state changed as expected
5. Check `list_console_messages` for errors
6. Check `list_network_requests` for failed requests (4xx, 5xx)
**Important**: UIDs are ephemeral — always take a fresh snapshot before interacting.
**On failure only**: `take_screenshot` and save to `./qa-evidence/` for debugging.
**React/SPA hover interactions:**
Chrome DevTools `hover` only triggers CSS `:hover`, NOT JS `mouseenter`/`mouseover`.
If a UI element only appears via React's `onMouseEnter`:
1. Try `click` directly in the area
2. If that fails, `evaluate_script` to dispatch mouseenter event
3. `take_snapshot` to confirm
**Testing patterns:**
- Form submission: fill → submit → snapshot to verify success + check no errors
- Navigation: click → snapshot to verify new state + check URL
- State changes: trigger action → snapshot to verify → reload → snapshot to verify persistence
- Async: trigger → snapshot for intermediate state → poll snapshots → verify final state
- Error states: trigger invalid input → snapshot to verify error messaging
**Always check:**
- Console errors (JS exceptions)
- Failed network requests (4xx, 5xx)
## Report format
Return ONLY a compact summary in this exact format:
RESULT: [PASS / FAIL / PARTIAL]
CRITERIA:
1. [criterion] — PASS/FAIL — [one-line observation]
2. [criterion] — PASS/FAIL — [one-line observation]
...
ERRORS: [any console errors or failed network requests, or "none"]
FAILURES: [for any failed criterion: what happened, what was expected,
screenshot path if captured]
NEEDS_MANUAL: [any criteria that couldn't be tested due to automation
limitations — e.g., drag-and-drop, complex gestures]
The sub-agent returns a compact summary. Present it to the user.
Human mode: Show the summary. If any failures, ask: "Want me to fix this and re-test, or is this expected?"
Agent mode: If all pass, proceed (e.g., open PR). If any fail, attempt fix-and-retest.
For UI-visible changes: after functional QA passes, offer design-fidelity QA as the second half of the closer — "Want to run /dev-skills:design-language Review against the changed components? Functional pass doesn't mean it matches the product's design language." This only applies when the diff touches user-visible UI (components, layouts, copy); skip for pure backend or infra changes.
Automation failures (NEEDS_MANUAL):
Code failures (FAIL):
Agent mode:
After 2 failed cycles, escalate — do NOT keep iterating. Branch by failure shape:
/dev-skills:implementation-planning with the failing
criterion and the observed behavior; do not attempt a third code fix.expected: … vs
observed: … plus the specific bug finding from the evaluator. Stop.Human mode:
Include in the sub-agent prompt when db_available is true. For features that create or modify data:
tools
Harvest course-corrections from the current conversation and convert them into durable fixes so the agent doesn't need the same steer next time. Use when someone says "tighten the loop", "tighten loop", "debrief this session", "session debrief", "what should I update so next time you don't…", "how can I make your life easier", "what tripped you up", "what slowed you down", or at the end of a session when the user reflects on agent friction. The scope marker is THIS SESSION / THIS CONVERSATION — that's the discriminator from sibling skills. Reads the transcript only, classifies each steer, routes to the right fix tool. NOT for: assessing repo readiness before work (use loop-check), retros on past PRs/incidents/releases (use tap-skills:retrospective), coding, or applying edits inline.
development
Capture and enforce a product's visual design language — principles and patterns that make it feel like itself. Use when the user wants to: distill design inspiration into a durable reference ("capture this design from figma", "distill this screenshot", "extract our design language", "capture design direction"), or check an implementation against the product's design language ("review design fidelity", "check against design direction", "does this match our design?", review a PR for design drift). Core signal: user points at a design source (Figma URL, screenshot, live URL) OR built UI and asks how it relates to the product's visual language. Operates on a consumer-repo `docs/design.md` — proposes diffs, never writes directly. Pairs with shaping-work, implementation-planning, implement-change, and qa-test. NOT for: generating new UI (→ frontend-design), translating a Figma design into code (→ figma-implement-design), accessibility audits (→ a11y-debugging), or token extraction.
development
Use for cross-domain strategic reasoning, approach selection, and systems-level analysis. Trigger when the user wants to: think through how to approach a problem, evaluate tradeoffs between architectural or technical approaches, sanity-check a plan or direction, understand second-order effects of a decision, get a holistic view across code/org/time dimensions, or pressure-test assumptions. The core signal is the user asking "what's the right approach?", "think about this", "what am I not seeing?", "sanity check", "tradeoffs", "how should we tackle this?", or any request for multi-level reasoning that spans product, architecture, and organization. Also use when the user needs help deciding which workflow skill to invoke next. NOT for: product/user/business decisions (→ product-thinker), work definition (→ shaping-work), file-level technical planning (→ implementation-planning), writing code, test authoring, or PR review.
development
Shape rough ideas into clear, actionable work definitions. Use this skill whenever someone has an unstructured idea that needs to become a concrete work definition — feature requests, bug reports, PRDs, customer feedback, Slack threads, stakeholder asks, or vague "we should do X" statements. Trigger phrases include "shape this", "scope this", "write a PRD", "define this work", "turn this into a ticket", "flesh this out", "spec this out", "what should we build for X", "I have an idea for...", or any rough input that needs structure before implementation can begin.