plugins/flow-next/codex/skills/flow-next-qa/SKILL.md
Live-app real-user QA pass derived from the spec. Drives the running app via flow-next-drive, derives scenarios from the spec's AC / R-IDs / boundaries, files structured P0/P1/P2 findings with evidence, and ends with a YES/NO ship verdict receipt. Triggers on /flow-next:qa with a spec id. FORBIDDEN from marking PASS by reading source — the verdict rests on captured evidence from the live app, never on agent narration.
npx skillsauth add gmickel/gmickel-claude-marketplace flow-next-qa
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
flow-next's review surface today is all static: impl-review, spec-completion-review, quality-auditor, code-review. Nothing drives the running app like an unforgiving real user. /flow-next:qa fills that gap — it drives the deployed app (via fn-51 flow-next-drive), files structured P0/P1/P2 findings with evidence, and ends with a YES/NO ship verdict emitted as a proof-of-work receipt.
The differentiator versus spec-less QA tools is that the spec is the source of intent: flow-next derives test scenarios directly from the spec (acceptance criteria → scenarios, R-IDs → coverage, boundaries → what NOT to test, decision context → expected behavior). The host already encodes intent, so QA reads it instead of reconstructing it. The QA discipline (P0/P1/P2 taxonomy, evidence rules, session hygiene) is a lean borrow from Ray Fernando's running-bug-review-board skill (Apache-2.0, credited in CHANGELOG); flow-next stays lean (no 18-reference port, ≤500-line skill cap).
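The "R-IDs → coverage" mapping can be pictured as a simple set difference. The sketch below is illustrative only: it assumes R-IDs appear as whole words of the form `R<number>` (as in the R11 reference elsewhere in this skill), and the helper names `list_rids` and `uncovered_rids`, along with the two file arguments, are hypothetical rather than part of flowctl or workflow.md.

```shell
# Hypothetical helpers, not part of flowctl: list the R-IDs a spec declares
# and report the ones no derived scenario mentions yet.
# Assumption: R-IDs are whole words shaped like R1, R2, R11.
list_rids() {
  grep -owE 'R[0-9]+' "$1" | sort -u
}

uncovered_rids() {
  local spec="$1" scenarios="$2" rid
  for rid in $(list_rids "$spec"); do
    # -w keeps R1 from matching inside R11
    grep -qw "$rid" "$scenarios" || echo "$rid"
  done
}
```

An empty result would mean every R-ID named in the spec is mentioned by at least one scenario; it says nothing about whether those scenarios passed.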
Read workflow.md for the full phase-by-phase execution (discover → derive → prepare → execute → file → verdict).
QA must NEVER mark PASS (SHIP) by reading source code. A live-app QA pass exists to fill the gap that static review leaves; every other flow-next review already covers the code statically. The verdict rests on captured evidence from the running app (screenshots, console dumps, observed state), never on agent narration, never on "the code looks correct", never on inferring behavior from the diff. If no live app is reachable (no deploy or no driver), the outcome is BLOCKED (could not verify), not PASS. This rule is load-bearing: it is what makes the skill a real-user QA pass rather than a second static review.
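One way to make this rule mechanical is a gate that runs before any verdict is written. The following is a minimal sketch under stated assumptions: `ship_gate`, its three arguments, and the `ELIGIBLE` / `NO_EVIDENCE` labels are hypothetical names introduced here; only BLOCKED and the "evidence, not narration" requirement come from the rule above.

```shell
# Hypothetical gate: decides whether a SHIP verdict is even allowed.
#   BLOCKED      no target URL or no driver, so nothing could be verified
#   NO_EVIDENCE  app was reachable but no non-empty artifact was captured
#   ELIGIBLE     at least one captured artifact exists; findings still decide
ship_gate() {
  local target_url="$1" driver="$2" evidence_dir="$3"
  if [ -z "$target_url" ] || [ -z "$driver" ]; then
    echo BLOCKED
    return 0
  fi
  # Require at least one non-empty file (screenshot, console dump, ...).
  if [ -d "$evidence_dir" ] &&
     [ -n "$(find "$evidence_dir" -type f -size +0 2>/dev/null | head -n 1)" ]; then
    echo ELIGIBLE
  else
    echo NO_EVIDENCE
  fi
}
```

The gate deliberately never prints PASS: it only rules out verdicts that have no captured evidence behind them.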
CRITICAL: flowctl is BUNDLED — NOT installed globally. which flowctl will fail (expected). Define once; subsequent blocks (here and in workflow.md) use $FLOWCTL. Subagents that run in fresh context fall back to the repo-local copy:
FLOWCTL="$HOME/.codex/scripts/flowctl"
[ -x "$FLOWCTL" ] || FLOWCTL=".flow/bin/flowctl"
Ask the user via plain text. Render the options below as a numbered list 1. … N., followed by a final option N+1. Other — type your own answer. Print the question, then the numbered list, then stop and wait for the user's next message before continuing. Parse the reply as: a bare number 1–N+1 → that option; the literal text of an option label → that option; free text after Other → custom answer.
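The reply-parsing rules above can be sketched as a small function. This is an illustration, not the skill's actual implementation: `parse_reply` and its `option:` / `other` / `custom:` output prefixes are hypothetical, and treating an unrecognized reply as an error (so the host can re-ask) is an assumption the text does not state.

```shell
# Hypothetical helper: resolve a plain-text reply against the numbered options.
# Usage: parse_reply "<reply>" "<label 1>" ... "<label N>"
# Prints option:<label>, other (bare N+1 or bare "Other"), or custom:<text>.
parse_reply() {
  local reply="$1" n i=0 label custom
  shift
  n=$#
  for label in "$@"; do
    i=$((i + 1))
    # A bare number 1..N or the literal option label selects that option.
    if [ "$reply" = "$i" ] || [ "$reply" = "$label" ]; then
      printf 'option:%s\n' "$label"
      return 0
    fi
  done
  if [ "$reply" = "$((n + 1))" ]; then
    printf 'other\n'
    return 0
  fi
  case "$reply" in
    Other)
      printf 'other\n'; return 0 ;;
    Other*)
      # Free text after "Other": drop the word and any leading separators.
      custom=${reply#Other}
      custom=${custom#"${custom%%[! :,-]*}"}
      printf 'custom:%s\n' "$custom"; return 0 ;;
  esac
  return 1  # unrecognized reply (assumption: the host re-asks)
}
```

For example, with the options `staging` and `production`, the reply `2` resolves to `option:production` and `Other: http://localhost:3000` resolves to `custom:http://localhost:3000`.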
Inline skill (no context: fork) — runs on the host agent, not a forked subagent, because the prepare phase must ask the user for undocumented facts (target URL / test account — info-only, never a confirm gate) and a forked subagent cannot ask the user back (Claude Code issues #12890, #34592). The host asks via plain-text numbered prompt.
Parse $ARGUMENTS. The first non-flag token is the spec id (required). Three value-taking caller overrides are honored by downstream phases: --target <url> (Phase 3.1), --receipt <path> (Phase 6.3), and --base <ref> (§1.2 base-branch override). Each must consume its operand here, in both --flag value and --flag=value forms (mirroring make-pr's --base); otherwise the operand falls through to the *) arm and is mis-assigned as SPEC_ID, and Phase 1 then rejects the URL/path as "Not a spec". They populate QA_TARGET_URL / QA_RECEIPT_OVERRIDE / QA_BASE_REF, the exact variables Phases 3.1 / 6.3 / §1.2 read. Other flags (viewport, autonomy) are reserved for later tasks; the skeleton shifts them harmlessly.
RAW_ARGS="$ARGUMENTS"
SPEC_ID=""
set -- $RAW_ARGS   # unquoted on purpose: split the raw string into tokens
while [[ $# -gt 0 ]]; do
  case "$1" in
    # Value-taking flags shift by 2 only when an operand is present: `shift 2`
    # with a single argument left fails without shifting and would loop forever.
    --target)    QA_TARGET_URL="${2:-}"; shift "$(( $# > 1 ? 2 : 1 ))" ;;        # Phase 3.1 caller override
    --target=*)  QA_TARGET_URL="${1#--target=}"; shift ;;
    --receipt)   QA_RECEIPT_OVERRIDE="${2:-}"; shift "$(( $# > 1 ? 2 : 1 ))" ;;  # Phase 6.3 receipt path
    --receipt=*) QA_RECEIPT_OVERRIDE="${1#--receipt=}"; shift ;;
    --base)      QA_BASE_REF="${2:-}"; shift "$(( $# > 1 ? 2 : 1 ))" ;;          # §1.2 base-branch override
    --base=*)    QA_BASE_REF="${1#--base=}"; shift ;;
    --)          shift; break ;;
    -*)          echo "Unknown flag: $1 (reserved for a later task)" >&2; shift ;;
    *)           [[ -z "$SPEC_ID" ]] && SPEC_ID="$1"; shift ;;                    # first non-flag token wins
  esac
done
export QA_TARGET_URL QA_RECEIPT_OVERRIDE QA_BASE_REF # carry the resolved overrides into workflow.md Phases 3.1 / 6.3 / §1.2
When SPEC_ID is empty, the discover phase resolves it (branch-match, or by asking the user via plain-text numbered prompt as an info prompt) — never silently default.
Ralph mode (FLOW_RALPH=1 or REVIEW_RECEIPT_PATH set) is detected in workflow.md §AUTONOMY — the skill is aware but not Ralph-blocked (R11). The deep autonomy routing (autonomous when target URL + accounts are configured; receipt path resolution) is owned by a downstream task; the skeleton only lays the section anchor.
A skill is not a function. QA does NOT "call" flow-next-drive. The host agent reads fn-51's workflow + references and executes the universal driving flow itself — observe → snapshot fresh refs → act → verify → capture. fn-51 owns the driver ladder and all actuation prose; QA owns scenario authoring, evidence capture, and the verdict. Never duplicate CDP / agent-browser / Computer-Use prose here — point at fn-51's references:
plugins/flow-next/skills/flow-next-drive/SKILL.md
plugins/flow-next/skills/flow-next-drive/references/ (agent-browser.md, chrome-devtools-mcp.md, playwright.md, computer-use.md, …)
Per scenario, record an evidence tuple: {driver_rung, target_url, viewport, screenshot_path, console_path}. fn-51's SKILL.md (:83) explicitly defers the QA workflow (scenario authoring, bug filing, verdict) downstream to this skill; the seam is designed: QA orchestrates and fn-51 actuates.
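As an illustration of the evidence tuple, the sketch below writes one JSON line per scenario with exactly the five fields named above. The serialization format, the helper names `json_escape` and `record_evidence`, and the sample values in the usage note are assumptions; the source does not specify how the tuple is stored.

```shell
# Hypothetical recorder: one JSON line per scenario with the five tuple fields.
# Escapes backslashes and double quotes only; control characters are not
# handled, so this assumes plain single-line values.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Usage: record_evidence <driver_rung> <target_url> <viewport> <screenshot_path> <console_path>
record_evidence() {
  printf '{"driver_rung":"%s","target_url":"%s","viewport":"%s","screenshot_path":"%s","console_path":"%s"}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")" \
    "$(json_escape "$4")" "$(json_escape "$5")"
}
```

A call such as `record_evidence agent-browser http://localhost:3000 1280x800 shots/s1.png logs/s1.txt >> evidence.jsonl` (all values hypothetical) would append one tuple per executed scenario, which keeps the verdict traceable to captured files rather than to narration.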
FLOW_RALPH/REVIEW_RECEIPT_PATH exit-2 guard at the top of the skill.
Execute the phases in workflow.md in order:
qa_verdict receipt. (Owned by a downstream task.)
This task (fn-53.1) stands up the skeleton (all six phase anchors) plus the working discover and derive phases, and proves the thesis end-to-end (derive ≥1 scenario → dispatch through the fn-51 contract → record an evidence tuple, or a BLOCKED proof receipt when no live target exists). Phases 3-6 are filled by serial downstream tasks editing their own disjoint section anchors.