skills/qa/SKILL.md
Verify the running thing works. Browser walks for web, request replay for APIs, shell smoke for CLIs, consumer builds for libraries, tool-call replay for MCP. "Tests pass" is not QA. Use when: "run QA", "verify the feature", "test this", "check the app", "smoke test", "exploratory test", "capture evidence". Trigger: /qa.
npx skillsauth add phrazzld/spellbook qaInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Every app has a QA path. The first question is not "how do I drive a
browser?" — it's "what shape is this app, and what does verifying it look
like here?" If the repo has its own QA/verification skill, defer to it: it
encodes the actual routes, commands, and golden paths. If it doesn't, that
absence is a harness gap — run the protocol below AND flag the gap to
/groom so the repo grows one.
Read the signals (package.json bin/framework deps, playwright.config.*,
Cargo.toml bin vs lib, cmd/ trees, MCP deps, deploy configs) and pick:
| Shape | QA path |
|---|---|
| Browser app | Start dev server or hit preview; walk the golden paths the change touched; watch console + network panel for errors |
| API / service | Replay representative requests against local/preview; check status, contract shape, and error paths (bad auth, malformed body) |
| CLI | --help accuracy, happy-path invocations from the docs, malformed-input paths; audit exit codes and error-message clarity |
| Library / SDK | Build the distributable, install into a throwaway consumer, exercise the changed public API, check the type surface |
| MCP / agent tool | Register with a harness, replay each affected tool call, confirm errors come back structured rather than crashing the server |
| Hybrid | One path per surface the change touched — one path does not cover all |
Ambiguous shape: name both candidates and ask; don't silently pick.
The canonical misread: "no playwright config" does not mean "skip QA." It means Playwright isn't the path — name the one that is. If you can't name a path, ask; never ship a generic shrug.
Drive the changed surface specifically — happy path first, then the edges the change plausibly broke. Capture evidence as you go (screenshot on anomaly, terminal transcript, request/response pairs) under the repo's evidence convention or a dated scratch dir; link the specific artifact in the report, not just a directory name.
When the verification leans on examples whose values matter (golden files, fixtures, seeded data, asserted screenshots), spot-check that a wrong value would actually fail — mutate one and watch it catch. Weak oracles that pass on anything are the most expensive kind of green.
Classify findings: P0 blocks ship, P1 fix before merge, P2 log and move on.
A pass report names: the exact surface exercised (command/URL/route/tool call), what was observed, the evidence artifact, what was NOT covered, and whether a post-ship signal exists for this behavior (if nothing would page or log when it breaks, say so — that's instrumentation debt, not a footnote). For AI-feature surfaces, a post-ship signal means behavior-level classifiers — hallucination, tool failure, refusal, user frustration — not just exception logging; stack traces don't fire when an agent confidently does the wrong thing. When the same agent drove the app and judges the result, have a fresh subagent attack the pass claim before signing off: what path would embarrass us in production?
references/browser-tools.md,
references/evidence-capture.md.tools
Enumerates the peer AI agent CLIs installed on this machine (codex, claude, pi, opencode, cursor-agent, grok, agy, hermes, thinktank) and how to invoke each headlessly. A capability map, not a quota: useful for fresh-context adversarial review on a different model family, second opinions, competing attempts, and wide benches. Use when: "ask codex", "ask another model", "second opinion", "cross-model review", "what AI tools do I have", "other agents", "different model family", "adversarial critique from another provider". Trigger: /roster.
development
Run lane cards on Fly Sprites: remote, isolated, scale-to-zero sandboxes for heavy or parallel agent work. Golden-checkpoint provisioning so lanes start on a ready sprite with zero setup tokens. Use when: "run this on a sprite", "remote lane", "offload to a sandbox", "dispatch to sprites", "bake a sprite", "sprite fleet", heavy/long-running/parallel sub-agent work that should not run on this machine. Trigger: /sprites, /sprite-lane.
testing
Compose and launch roster-backed specialist lanes with prompt-native lane cards and receipts. Use when: "dispatch agents", "use subagents", "compose a team", "run provider lanes", "make lane cards". Trigger: /dispatch, /subagents, /lanes.
tools
Fast session-start repository orientation from live local evidence. Use when: "orient yourself", "start of session", "new session", "where are we", "catch me up before acting", "what should I do next", after compaction, after switching worktrees, or before choosing a Harness Kit workflow. Trigger: /orient, /ground, /session-start.