core/capabilities/review/auto-qa/SKILL.md
Automatic sub-agent code verification after quality gates pass. Independent context window. Checks spec-implementation alignment, test coverage, error handling, boundary conditions, and integration consistency. 60 seconds, code-only, advisory.
Quick independent verification of implementation against spec via sub-agent delegation. Runs as Gate 8 in the quality gates sequence — positioned after Gate 5 (verification), alongside Gate 6 (browser) and Gate 7 (design). 60 seconds max. Code-only. Advisory — never blocks.
Auto-QA runs as Gate 8 in the quality-gates workflow. The gate sequence triggers it — the agent does not decide whether to run it.
Activation conditions (checked by quality-gates workflow):
1. Task tool is available. If not (e.g., Antigravity), skip silently. No self-review fallback.
2. Scope is Standard or Comprehensive. Lightweight tasks skip.
3. Config allows it. Check .sage/config.yaml for auto_qa.
   - If auto_qa: false, skip with a one-line note: "Auto-QA disabled. Run the QA command for manual testing."
   - If auto_qa: true or absent (default), proceed.
If ANY condition is false, skip silently. The one exception is auto_qa: false, which shows the one-line note above.
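The activation check above can be sketched as a small function. This is a minimal illustration, not the workflow's actual code: the function name, the `config` dictionary (standing in for the parsed .sage/config.yaml), and the order in which the conditions are tested are all assumptions.

```python
from typing import Optional, Tuple

# Scopes that run Auto-QA; Lightweight is deliberately absent.
RUNNABLE_SCOPES = {"standard", "comprehensive"}
DISABLED_NOTE = "Auto-QA disabled. Run the QA command for manual testing."


def should_run_auto_qa(
    task_tool_available: bool, scope: str, config: dict
) -> Tuple[bool, Optional[str]]:
    """Return (run, note). A note of None means the skip is silent.

    Hypothetical helper: `config` is assumed to be the already-parsed
    contents of .sage/config.yaml.
    """
    if not task_tool_available:
        return False, None  # no Task tool: skip silently, no fallback
    if scope.lower() not in RUNNABLE_SCOPES:
        return False, None  # Lightweight tasks skip silently
    # Only an explicit false disables the gate; an absent key means enabled.
    if config.get("auto_qa", True) is False:
        return False, DISABLED_NOTE
    return True, None
```

For example, `should_run_auto_qa(True, "Standard", {})` runs the gate because an absent auto_qa key counts as enabled, while `{"auto_qa": False}` returns the one-line note instead.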
Gate 8 runs AFTER Gates 1-5 pass. Those gates catch structural issues; Gate 8 catches semantic issues (spec drift, missing handlers, boundary gaps, integration mismatches).
60 seconds max. If the sub-agent doesn't respond within 60 seconds, skip with: "Auto-QA timed out. Run the QA command manually for full testing."
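One way to enforce the 60-second limit is to run the review in a worker thread and stop waiting at the deadline. The sketch below assumes the sub-agent call can be wrapped in a plain Python callable named `review`; that callable and the function name are hypothetical, and the real dispatch goes through the Task tool.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable, Optional, Tuple

TIMEOUT_NOTE = "Auto-QA timed out. Run the QA command manually for full testing."


def run_with_deadline(
    review: Callable[[], str], seconds: float = 60.0
) -> Tuple[Optional[str], Optional[str]]:
    """Return (report, None) on success or (None, TIMEOUT_NOTE) on timeout."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(review)
    try:
        return future.result(timeout=seconds), None
    except FutureTimeout:
        # The gate is advisory, so a timeout becomes a skip, not an error.
        return None, TIMEOUT_NOTE
    finally:
        # Stop waiting; an already-running worker thread is not killed.
        pool.shutdown(wait=False)
```

A thread cannot be interrupted from outside, so this only bounds how long the caller waits, which is all an advisory gate needs.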
Before spawning the sub-agent, gather:
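- Spec path: .sage/work/*/spec.md
- Plan path: .sage/work/*/plan.md
- Changed files: the list of files changed by the implementation
- Test files: files matching *test*, *spec*, *.test.*, *.spec.* in the changed file list or in test directories

Pass all four to the sub-agent prompt.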
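The test-file patterns can be applied to the changed file list with the standard library's `fnmatch` module. This is a rough sketch with assumptions: the function name is hypothetical, matching is done on the lowercased file name only, and the scan of test directories is left out.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Iterable, List

# Patterns from the gather step. "*test*" and "*spec*" already cover the
# dotted forms; all four are kept to mirror the documented list.
TEST_PATTERNS = ("*test*", "*spec*", "*.test.*", "*.spec.*")


def find_test_files(changed_files: Iterable[str]) -> List[str]:
    """Return the changed files whose file name matches a test pattern."""
    matches = []
    for path in changed_files:
        name = PurePosixPath(path).name.lower()
        if any(fnmatchcase(name, pattern) for pattern in TEST_PATTERNS):
            matches.append(path)
    return matches
```

Note that the broad `*spec*` pattern also matches a file named spec.md, so a caller would likely want to exclude the .sage/work directory from the input.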
You are a QA reviewer. You were NOT involved in writing this code.
Review the implementation with fresh eyes. Be specific. Be brief.
CRITICAL: You are READ-ONLY. Do NOT modify any files. Do NOT use
Edit or Write tools. Do NOT modify specs, plans, or code. Your job
is to REPORT findings, not fix them. The user decides what to do.
INPUTS:
- Spec: {SPEC_PATH}
- Plan: {PLAN_PATH}
- Changed files: {FILE_LIST}
- Test files: {TEST_FILE_LIST}
CHECK THESE 6 THINGS:
1. SPEC-IMPLEMENTATION ALIGNMENT: Read each acceptance criterion in
the spec. Does corresponding code exist for each? Is anything
implemented that the spec doesn't mention?
2. TEST-CRITERIA COVERAGE: Do tests verify the spec's acceptance
criteria? Or do they only test implementation details? Flag any
acceptance criterion with no test.
3. MISSING ERROR HANDLING: Scan changed files for: API calls without
catch, form submissions without validation, state mutations
without guards, async operations without loading/error states.
Be specific — name the file, the line, the risk.
4. BOUNDARY CONDITIONS: Read the spec's boundaries and limits.
Is each boundary enforced in code? Flag stated limits with no
code enforcement.
5. INTEGRATION CONSISTENCY: If multiple modules were changed, do
their interfaces match? Response shapes, event contracts, shared
types. Flag mismatches.
6. CODING PRINCIPLES: Do changed files follow universal quality
principles? Check for: magic numbers, swallowed errors, unclear
names, functions doing multiple things, unnecessary global state,
missing input validation. Flag clear violations, not style
preferences.
CLASSIFY each finding:
- CRITICAL: Will break in production. Must fix.
- MAJOR: Will cause problems. Should fix before shipping.
- MINOR-substantive: Improvement opportunity. Affects readability,
maintainability, or future behavior. Can fix later.
- MINOR-cosmetic: Style/naming/formatting with equally valid
alternatives. No behavior change.
FORMAT (strict):
VERDICT: PASS | NEEDS FIXES | FAIL
CRITICAL: [list or "None"]
MAJOR: [list or "None"]
MINOR-substantive: [list or "None"]
MINOR-cosmetic: [list or "None"]
Be concise. Every finding must name a specific file and what's wrong.
No generic observations. No praise. Just findings.
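Because the prompt demands a strict format, the orchestrating agent could parse the report mechanically. The sketch below is illustrative only: the prompt does not say how lists are laid out, so it assumes each severity line is followed either by an inline value or by one finding per line, optionally prefixed with "- ". The function name is hypothetical.

```python
from typing import Dict, List, Tuple

SEVERITIES = ("CRITICAL", "MAJOR", "MINOR-substantive", "MINOR-cosmetic")
VERDICTS = ("PASS", "NEEDS FIXES", "FAIL")


def parse_report(text: str) -> Tuple[str, Dict[str, List[str]]]:
    """Split a sub-agent report into (verdict, findings by severity)."""
    verdict = None
    findings: Dict[str, List[str]] = {name: [] for name in SEVERITIES}
    current = None  # severity section currently being read
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("VERDICT:"):
            value = line[len("VERDICT:"):].strip()
            if value not in VERDICTS:
                raise ValueError(f"unknown verdict: {value!r}")
            verdict, current = value, None
            continue
        head, sep, rest = line.partition(":")
        if sep and head in SEVERITIES:
            current = head
            rest = rest.strip().strip('"')
            if rest and rest != "None":
                findings[current].append(rest)
            continue
        if current is not None:
            # Continuation line: one finding, with any "- " bullet removed.
            findings[current].append(line.lstrip("- ").strip())
    if verdict is None:
        raise ValueError("report has no VERDICT line")
    return verdict, findings
```

Findings are kept verbatim apart from the bullet prefix, so they can be shown to the user exactly as the sub-agent wrote them.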
⚡ Running implementation QA (sub-agent)...
✓ Auto-QA: PASS — no issues found.
Proceeding to completion.
⚡ Running implementation QA (sub-agent)...
⚠ Auto-QA found {N} issues:
MAJOR: {file:line — specific finding}
MINOR: {file:line — specific finding}
[R] Fix issues — address findings before completing
[P] Proceed — ship as-is, I'll track these
[D] Discuss — let's talk about these findings
Pick R/P/D, or tell me what to change.
⚡ Running implementation QA (sub-agent)...
🔴 Auto-QA found a CRITICAL issue:
CRITICAL: {file:line — specific finding with impact}
[R] Fix — this will break in production
[D] Discuss — let's look at this
[P] Proceed anyway — I understand the risk
[P] Proceed is ALWAYS available. The user decides, not the gate.
When the user picks [R] Fix:
Max 2 re-checks. This keeps the loop bounded.
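The bound on re-checks can be pictured as a loop with a hard iteration cap. This is a hedged sketch: `apply_fixes` and `run_qa` are hypothetical callables standing in for the fix step and the sub-agent re-run, and handing the last verdict back to the caller once the cap is reached is an assumption, since the text only states the limit.

```python
from typing import Callable, Optional, Tuple

MAX_RECHECKS = 2  # documented cap that keeps the loop bounded


def fix_loop(
    apply_fixes: Callable[[], None],
    run_qa: Callable[[], str],
    max_rechecks: int = MAX_RECHECKS,
) -> Tuple[Optional[str], int]:
    """Fix, then re-check, at most max_rechecks times.

    Returns (last verdict, number of re-checks used).
    """
    verdict = None
    used = 0
    while used < max_rechecks:
        apply_fixes()
        verdict = run_qa()
        used += 1
        if verdict == "PASS":
            break  # nothing left to fix
    return verdict, used
```

If the verdict is still not PASS after the second re-check, the loop stops anyway and the decision returns to the user.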
After every auto-QA (any verdict), prepend to .sage/decisions.md:
### YYYY-MM-DD — Auto-QA: implementation
Verdict: {PASS|NEEDS FIXES|FAIL}. {findings summary if any}.
User chose: {R|P|D}. (auto-qa sub-agent)
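Prepending, as opposed to appending, means reading the existing log and writing the new entry in front of it. A minimal sketch, assuming the entry template above; the function name and its parameters are hypothetical.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def prepend_decision(
    log_path: str,
    verdict: str,
    summary: str,
    choice: str,
    today: Optional[date] = None,
) -> str:
    """Write a new Auto-QA entry at the top of the decisions log."""
    day = (today or date.today()).isoformat()  # YYYY-MM-DD
    verdict_line = f"Verdict: {verdict}." + (f" {summary}" if summary else "")
    entry = (
        # \u2014 is the dash used in the template heading.
        f"### {day} \u2014 Auto-QA: implementation\n"
        f"{verdict_line}\n"
        f"User chose: {choice}. (auto-qa sub-agent)\n"
    )
    path = Path(log_path)
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Newest entry first, separated from older entries by a blank line.
    path.write_text(entry + ("\n" + existing if existing else ""), encoding="utf-8")
    return entry
```

Calling it twice leaves the most recent entry at the top of .sage/decisions.md, so the latest decision is the first thing a reader sees.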
Present ALL findings to the user exactly as the sub-agent returned them. Do NOT remove, downgrade, or dismiss findings.
Blocked rationalizations: