skills/qa/SKILL.md
Verify the running thing works. Browser walks for web, request replay for APIs, shell smoke for CLIs, consumer builds for libraries, tool-call replay for MCP. "Tests pass" is not QA. Use when: "run QA", "verify the feature", "test this", "check the app", "exploratory test", "QA this PR", "smoke test", "manual testing", "capture evidence". For generated repo QA skills, use /create-repo-skill qa. Trigger: /qa.
npx skillsauth add phrazzld/agent-skills qaInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Every app has a QA path. The question this skill answers first is not "how do I drive a browser?" — it's "what shape is this app, and what does verifying it actually look like here?" A CLI is QA'd with shell invocations and exit-code audits. An API is QA'd by replaying requests against a preview deploy. A Next.js app is QA'd by walking golden paths in a browser. A library is QA'd by installing it into a sandbox consumer. An MCP server is QA'd by replaying tool calls through the harness that registered it. None of these are optional; all of them are QA.
The skill's job is to route to the right shape, then either defer to a project-local QA skill that encodes this repo's actual paths, or run a quick protocol on the fly.
Route to /hardening acceptance when QA relies on examples whose values should
matter: Gherkin scenarios, API fixtures, CLI transcripts, golden files,
seeded workflows, screenshots with asserted data, or contract examples. QA can
still pass without it when the check directly drives the live behavior, but the
report must name whether acceptance mutation was run, waived as not useful, or
left as residual risk.
Every QA pass/fail report includes:
See harnesses/shared/AGENTS.md (Completion Evidence) for the shared evidence
core; this phase keeps QA-specific local fields.
## Completion Gate
- Exact end-user behavior verified: user or operator behavior exercised through the running surface.
- App shape and live path chosen: browser, API, CLI, library, MCP, hybrid, or other concrete surface.
- Value proposition exercised: specific promised outcome the QA walk covered.
- Persona outcome observed: persona-specific success or failure observed in the run.
- Live repo evidence read: package/config/docs/routes/commands/tests used to choose the QA path.
- Acceptance source: oracle, ticket, spec, fixture, route, command, or explicit absence.
- Evidence that proves it: screenshot, trace, transcript, request replay, or artifact path.
- Artifact/evidence location: committed evidence path, temp artifact path, screenshot, transcript, or log.
- Exact command/path/route exercised: command, URL, route, file path, or tool call actually run.
- Repo-fit check: local QA convention or repo contract followed.
- Adjacent-tests justification: why adjacent tests are sufficient, or why they are not runtime proof.
- Acceptance mutation / hardening: mutation/hardening run, recommendation, or waiver reason.
- Observability / instrumentation debt: named post-ship signal exists, was added, or is logged as debt.
- Residual risk: unverified path, uncovered persona, or none with reason.
For internal libraries or harness changes, replace end-user behavior with the developer/operator behavior under test. Adjacent unit tests are supporting evidence only; QA must name the running surface it actually exercised. Adjacent tests are enough only when they invoke the exact changed public surface; otherwise QA must say what live path remains unverified. If the changed behavior has no named post-ship signal, log, receipt, benchmark, or evidence artifact, report it as instrumentation debt rather than hiding it inside generic residual risk.
When .harness-kit/work/ledger.jsonl is available, /qa calls
scripts/work-ledger.py append with phase_started at QA start,
next_action_changed when evidence is captured, blocker_added for failed or
unverified critical paths, and phase_completed when QA passes or is waived.
Each event links evidence refs and any trace refs already known.
You are the executive orchestrator.
Delegation floor applies: probe the roster first; dispatch two or more
providers for substantive work; direct solo only for mechanical, emergency,
user-forbidden, or fewer-than-two-providers cases. See
harnesses/shared/AGENTS.md (Roster).
Local lane guidance: Use one lane to drive the running surface and another to attack the evidence, edge cases, and product judgment behind the pass claim.
Before anything else, answer: what kind of app is this, and what does "verify it works" look like for that kind? The shape determines the path; the path determines the tools.
Signals to read (any of these; stop when you have enough):
package.json — "bin" field → CLI. "main" + no bin + no
framework deps → library. next / remix / astro / vite +
framework routes → browser web app. express / fastify /
hono + no SSR pages → API service.playwright.config.*, cypress.config.* → browser harness already
wired; use it when the path is browser.mcp/, servers/mcp/, @modelcontextprotocol/* in deps → MCP server.Dockerfile, fly.*.toml, vercel.json, .github/workflows/*deploy*
→ deploy target hints; helps pick "which preview do I hit?"Cargo.toml ([[bin]] vs [lib]), pyproject.toml (scripts vs
package), go.mod (cmd/ tree), etc., for non-JS stacks.Map the shape to the path:
| App shape | QA path |
|---|---|
| Browser web app (Next.js, Remix, SvelteKit, SPA) | Start dev server; walk golden paths; scan console errors; scan network panel for 4xx/5xx; optionally Playwright if playwright.config.* exists |
| API / serverless / backend service | Replay representative requests (curl, HTTPie, .http file, Postman export) against a preview deploy or local server; spot-check JSON contract; enumerate error statuses; verify auth paths |
| CLI | Shell-driven smoke: --help, key invocations with representative flags, malformed-input paths; audit exit codes; audit error messages for clarity |
| Library / SDK | Install into a sandbox consumer project (npm pack && npm i <tarball>, cargo add --path, pip install -e .); exercise the public API; check type surface; verify no runtime import of dev-only deps |
| MCP server / agent tool | Register with the harness (claude mcp add / Codex /plugins); replay tool calls; inspect responses; confirm error paths return structured failures, not crashes |
| Hybrid (e.g. Next.js app + MCP server + CLI) | Pick the path per surface touched by the change; do not pretend one path covers all |
If the shape is ambiguous, name both candidates and ask — do not silently pick one.
| Intent | Action |
|---|---|
| "scaffold" first arg, or "scaffold qa" / "generate qa skill" | Prefer /create-repo-skill qa; references/scaffold.md is the legacy template |
| Commit-triggered, PR-triggered, or outer-loop-triggered user-like scenario evidence | Run Step 0, then use references/per-commit-lane.md with the shape-specific driver |
| Project-local QA skill exists (.agent/skills/qa/ or .claude/skills/qa/ bridged to shared root) | Defer — it already encodes this repo's shape and paths |
| No project-local skill; need to verify something right now | Run the quick protocol below, routed by Step 0's shape |
Shaped by Step 0. Pick the matching sub-protocol.
Set evidence root before capture:
source scripts/lib/evidence.sh 2>/dev/null || true
EVIDENCE_DIR="$(evidence_dir_create 2>/dev/null || printf '.evidence/manual/%s/\n' "$(date -u +%Y-%m-%d)")"
mkdir -p "$EVIDENCE_DIR"
Use EVIDENCE_DIR for screenshots, transcripts, request captures, and notes.
Only fall back to a temporary directory when the target is not in a git repo
and has no evidence helper.
$EVIDENCE_DIR — screenshots on anomaly,
accessibility snapshot on ambiguity.For browser tool selection (Playwright MCP, Chrome MCP, agent-browser),
read references/browser-tools.md. For evidence capture conventions
across tools, read references/evidence-capture.md.
.http files, Postman
collection, README examples, or the endpoint list you mapped).$EVIDENCE_DIR: request/response pairs as
.json or .http transcripts, plus a short findings note.<bin> --help / <bin> <subcommand> --help — confirm help
text matches the code's actual flags.$EVIDENCE_DIR: terminal transcripts
(script -q or tee'd stdout/stderr).npm pack, cargo build --release,
python -m build, etc.).$EVIDENCE_DIR/consumer).tsc --noEmit in the consumer;
Rust: consumer cargo check; Python: mypy against a stub).claude mcp add <name> <cmd>
for Claude Code; equivalent /plugins flow for Codex).$EVIDENCE_DIR: tool-call transcripts
(input JSON, output JSON, server logs).When a repo vendors or specializes this skill, the local guidance should name the single QA path that applies to THIS codebase, with the actual commands, URLs, route list, endpoint list, or tool-call list embedded.
Do not read "no playwright.config.* in the repo" as "skip QA."
That reading is the canonical Norman-bug in this skill's history. The
correct reading is: "Playwright isn't the QA path here; name the one
that is." Every repo has a QA path. Find it. Write it. If you cannot
name the path, that is a signal to ask the user, not to ship a generic
rewrite.
/qa scaffold./deliver expects a scaffolded skill. If /deliver invokes
/qa and lands on this generic fallback, scaffold before
continuing — the delivery gate wants repo-specific coverage.references/scaffold.md — project-local QA skill generator.references/per-commit-lane.md — commit/PR/outer-loop scenario evidence
contract; consulted after Step 0 resolves the app shape.references/browser-tools.md — long-form browser-automation guide;
consulted only when Step 0 resolves to "browser web app".references/evidence-capture.md — cross-tool evidence conventions
(screenshots, transcripts, GIFs, network logs).development
Lightweight evidence-backed retro and catch-up reports for a current repo, branch, PR, backlog slice, or recent agent session. Use when the user asks for a debrief, catch me up, what changed, why it matters, product implications, end-user implications, developer experience implications, current app state, backlog state, workspace state, alternatives considered, or context rebuild after losing the thread. Trigger: /debrief.
testing
Capture agent-session work records as local JSONL audit evidence. Links a backlog/spec, branch, commits, review verdicts, QA/demo evidence, transcript refs, and shipped ref without storing raw private transcripts. Use when: "trace this work", "write work record", "agent session trace", "journal this delivery", "link transcript evidence". Trigger: /trace, /journal.
data-ai
Turn proven agent-session patterns into first-party Harness Kit skills. Use when: "skillify this conversation", "make this into a skill", "generate a skill from current transcript", "extract reusable workflow". Trigger: /skillify.
testing
Run one targeted, read-only architecture or quality critique through a named lens from the shared rubric. Use when: "critique this module", "run an Ousterhout pass", "lens critique", "architecture critique". Trigger: /critique.