plugins/spec-plugin/skills/validate-execution/SKILL.md
Validate a version's implementation against its Definition of Done. For code projects: runs automated tests against the live application. For non-code projects: reviews deliverables against acceptance criteria. Runs incrementally on re-runs. Ends with human validation guidance.
npx skillsauth add jaisonerick/spec-plugin validate-executionInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Your task: as the single live QA for the whole execution, write the validation specs and run them continuously as engineers hand work over — not in one batch at the end.
How this runs:
code_branch, run the relevant cases, reply PASS/findings to the engineer, CC the team lead on failures). Coverage accumulates in specs/<version>/qa/ across the version.specs/<version>/qa/, but do not commit — the team lead commits the spec workspace.specs/ in CWD.specs/<version>.md — focus on Definition of Donespecs/<version>/architecture.mdspecs/<version>/stories.mdspecs/<version>/qa/code_repo at code_branch, set up per the setup-playbook (specs/<version>/setup-playbook.md). Pull code_branch as engineers merge their stories so you always validate the latest integrated state.specs/architecture.mdFor code projects: Each Definition of Done item that can be verified programmatically should have test cases. Keep it lean — ~10-15 test cases per version.
Write specs at specs/<version>/qa/NNN-spec-name.md:
# QA Spec NNN: Spec Title
**Area**: API | Integration | UI | Health | User Flow
**Prerequisites**: What must be running
## Setup
Steps to prepare the environment.
## Test Cases
### TC-001: Test case title
**Definition of Done item**: (which DoD item this covers)
**Steps**:
1. Concrete action (e.g., "POST /api/resource with body: {...}")
2. Verify result
**Expected**: What should happen
**Severity**: critical | major | minor
## Human Review Checklist
- [ ] Visual/UX items to verify manually
For non-code projects: Each Definition of Done item becomes a review criterion. Validation is done by reading and evaluating deliverables.
Write specs at specs/<version>/qa/NNN-spec-name.md:
# QA Spec NNN: Spec Title
**Area**: Completeness | Quality | Accuracy | Format
**Deliverables to review**: (list of files/documents)
## Review Criteria
### RC-001: Criterion title
**Definition of Done item**: (which DoD item this covers)
**What to check**:
1. Read [document/section]
2. Verify [specific quality or content requirement]
**Expected**: What a passing deliverable looks like
**Severity**: critical | major | minor
## Human Review Checklist
- [ ] Items requiring subjective judgment
Write index at specs/<version>/qa/specs.md.
Check for prior run results. If specs have ## Run Results, this is a re-run after fixes.
Incremental mode: Only re-run failed/skipped items. Add a ### Re-run: <date> subsection.
Verify ALL required services are up and reachable. If ANY service is down, STOP. Report via SendMessage.
Verify startup commands are documented. If missing, STOP and mark BLOCKED.
For each test case:
curl or httpie via BashDon't trust green unit suites — exercise the real thing. Past runs shipped contract bugs (a missing _live_ env infix, expires_in vs expires_at) that 100+ green mocked tests hid. Use the work-modes probe-contract primitive to run the real classes in a REPL and observe the actual request/response shape; use verify-symbol to prove a method/field/endpoint truly exists. A live staging env is nice but not required — a REPL against the real code is enough.
For each review criterion:
Append or update ## Run Results in each spec file:
## Run Results
### Run: <date>
| ID | Title | Result | Notes |
|----|-------|--------|-------|
| TC-001 | Test title | PASS | |
| TC-002 | Test title | FAIL | Expected X, got Y |
**Summary**: X passed, Y failed, Z skipped
**Failures requiring fixes**:
- TC-002: <clear description of what's wrong>
Report via SendMessage:
You don't write the human-validation guide — the PO does, in the final review. When your accumulated validation passes, report to the team lead that QA is green and hand over your evidence so the PO can frame the human handoff:
specs/<version>/qa/The PO assembles these into the human-validation guide; the version ships only when the human confirms.
Append to the spec file:
## Validation Findings
### Issues Found
- What didn't meet criteria
### Missing or Incomplete
- Gaps in deliverables
### Patterns Observed
- Recurring issues
development
Review UI code for Web Interface Guidelines compliance. Use when asked to "review my UI", "check accessibility", "audit design", "review UX", or "check my site against best practices".
tools
Guide for creating high-quality MCP (Model Context Protocol) servers that enable LLMs to interact with external services through well-designed tools. Use when building MCP servers to integrate external APIs or services, whether in Python (FastMCP) or Node/TypeScript (MCP SDK).
development
Convert documents and files to Markdown using markitdown. Use when converting PDF, Word (.docx), PowerPoint (.pptx), Excel (.xlsx, .xls), HTML, CSV, JSON, XML, images (with EXIF/OCR), audio (with transcription), ZIP archives, YouTube URLs, or EPubs to Markdown format for LLM processing or text analysis.
testing
Post-version retrospective that captures lessons learned, fixes documentation drift, and proposes skill improvements. Analyzes PROGRESS.md, story logs, QA results, and commit/change history to identify struggle patterns and knowledge worth preserving. Run after a version is shipped.