
jaisonerick/validate-execution

Name: validate-execution
Author: jaisonerick

plugins/spec-plugin/skills/validate-execution/SKILL.md

npx skillsauth add jaisonerick/spec-plugin validate-execution


Your task: as the single live QA for the whole execution, write the validation specs and run them continuously as engineers hand work over — not in one batch at the end.

How this runs:

- You write validation specs from the Definition of Done (Phase 1), then verify each story when its engineer hands it over: pull code_branch, run the relevant cases, reply PASS or findings to the engineer, and CC the team lead on failures. Coverage accumulates in specs/<version>/qa/ across the version.
- You do not produce the human-validation guide; the PO does, from your accumulated findings, in the final review. You feed evidence; the PO frames it for the human.
- Spec-workspace git: write your specs and findings under specs/<version>/qa/, but do not commit. The team lead commits the spec workspace.

Phase 1 — Load Context

1. Locate specs. If the orchestrator specified a specs repo path in your prompt, read specs from there. Otherwise, look for specs/ in the current working directory.
2. Read the version spec, specs/<version>.md, focusing on the Definition of Done.
3. Read the version architecture: specs/<version>/architecture.md
4. Read the stories index: specs/<version>/stories.md
5. Read the project spec and check its Project Context for the project type and code repository path.
6. Check whether validation specs exist at specs/<version>/qa/.
7. Code checkout: maintain your own checkout or worktree of code_repo at code_branch, set up per the setup-playbook (specs/<version>/setup-playbook.md). Pull code_branch as engineers merge their stories so you always validate the latest integrated state.
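A minimal sketch of keeping that checkout current, assuming code_repo is reachable as a git remote and code_branch already exists there. The function name and checkout directory are hypothetical. It uses a separate clone rather than `git worktree`, because a worktree cannot check out a branch that is already checked out in another worktree; follow the setup-playbook for everything after the checkout.

```sh
# Hypothetical helper (not part of the skill): keep a private QA clone of
# code_repo on code_branch and fast-forward it whenever a story is merged.
qa_sync_checkout() {
  repo=$1     # path or URL of code_repo
  branch=$2   # code_branch
  dir=$3      # private QA checkout location (illustrative)
  if [ ! -d "$dir/.git" ]; then
    git clone -q --branch "$branch" "$repo" "$dir" || return 1
  fi
  # Fast-forward only: a diverged QA checkout should be investigated, not merged.
  git -C "$dir" pull -q --ff-only origin "$branch"
}
```

Run it again after each engineer hand-over so the cases execute against the latest integrated state.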

If no validation specs exist → Write them first (Phase 1B)

- Read each story file to understand what was built or delivered and its acceptance criteria.
- Read the overall architecture: specs/architecture.md
- Design validation specs based on the project type (see below).

For code projects: Each Definition of Done item that can be verified programmatically should have test cases. Keep it lean — ~10-15 test cases per version.

Write specs at specs/<version>/qa/NNN-spec-name.md:

```markdown
# QA Spec NNN: Spec Title

**Area**: API | Integration | UI | Health | User Flow
**Prerequisites**: What must be running

## Setup
Steps to prepare the environment.

## Test Cases

### TC-001: Test case title
**Definition of Done item**: (which DoD item this covers)
**Steps**:
1. Concrete action (e.g., "POST /api/resource with body: {...}")
2. Verify result

**Expected**: What should happen
**Severity**: critical | major | minor

## Human Review Checklist
- [ ] Visual/UX items to verify manually
```
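One way to pick the next NNN, sketched under the assumption that NNN is a zero-padded three-digit sequence (as the TC-001 style suggests); the helper name and slug argument are hypothetical. It numbers the new spec one past the highest existing prefix, so the specs.md index is ignored.

```sh
# Hypothetical helper: print the path for the next QA spec file,
# numbered one past the highest existing NNN- prefix in the directory.
next_spec_path() {
  qa_dir=$1   # e.g. specs/<version>/qa
  slug=$2     # e.g. auth-api
  last=0
  for f in "$qa_dir"/[0-9][0-9][0-9]-*.md; do
    [ -e "$f" ] || continue            # the glob matched nothing
    n=$(basename "$f" | cut -c1-3)
    n=$(expr "$n" + 0)                 # drop leading zeros
    [ "$n" -gt "$last" ] && last=$n
  done
  printf '%s/%03d-%s.md\n' "$qa_dir" $((last + 1)) "$slug"
}
```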

For non-code projects: Each Definition of Done item becomes a review criterion. Validation is done by reading and evaluating deliverables.

Write specs at specs/<version>/qa/NNN-spec-name.md:

```markdown
# QA Spec NNN: Spec Title

**Area**: Completeness | Quality | Accuracy | Format
**Deliverables to review**: (list of files/documents)

## Review Criteria

### RC-001: Criterion title
**Definition of Done item**: (which DoD item this covers)
**What to check**:
1. Read [document/section]
2. Verify [specific quality or content requirement]

**Expected**: What a passing deliverable looks like
**Severity**: critical | major | minor

## Human Review Checklist
- [ ] Items requiring subjective judgment
```

Write an index of the specs at specs/<version>/qa/specs.md.
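The skill does not prescribe the index format. A minimal sketch, assuming a bullet list that pairs each spec file with the title from its "# QA Spec NNN: Title" heading, could regenerate it like this (the helper name and layout are hypothetical):

```sh
# Hypothetical helper: rebuild specs.md from every NNN-*.md spec,
# taking each title from the spec's "# QA Spec NNN: Title" first line.
write_spec_index() {
  qa_dir=$1
  {
    echo "# QA Specs"
    echo
    for f in "$qa_dir"/[0-9][0-9][0-9]-*.md; do
      [ -e "$f" ] || continue
      title=$(sed -n '1s/^# QA Spec [0-9]*: //p' "$f")
      echo "- $(basename "$f"): ${title:-untitled}"
    done
  } > "$qa_dir/specs.md"
}
```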

If validation specs exist → Determine mode

Check for prior run results. If the specs already contain a ## Run Results section, this is a re-run after fixes.

Incremental mode: re-run only the failed and skipped items, and record them in a new ### Re-run: <date> subsection.
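The branching above can be checked mechanically. This sketch assumes specs follow the NNN-*.md naming; "full" is a hypothetical label for specs that exist but have never been run, a case the skill implies without naming.

```sh
# Hypothetical helper: report which path applies for a version's qa/ dir.
#   write       -> no NNN-*.md specs yet (write them first)
#   full        -> specs exist, but none has a "## Run Results" section
#   incremental -> prior results exist; re-run only failed/skipped items
qa_mode() {
  qa_dir=$1
  found=no
  for f in "$qa_dir"/[0-9][0-9][0-9]-*.md; do
    [ -e "$f" ] || continue
    found=yes
    if grep -q '^## Run Results' "$f"; then
      echo incremental
      return 0
    fi
  done
  if [ "$found" = yes ]; then echo full; else echo write; fi
}
```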

Phase 2 — Environment Setup (Code Projects)

Mandatory Pre-Flight Check

Verify that ALL required services are up and reachable. If ANY service is down, STOP and report it via SendMessage.
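A minimal pre-flight sketch, assuming each required service exposes an HTTP URL; the helper name is hypothetical, and services without HTTP (a database, for instance) need their own client-based probe instead. It stops at the first unreachable service and names it so the SendMessage report is specific.

```sh
# Hypothetical pre-flight helper: every argument is a URL that must answer
# with a non-error HTTP status. Stops at the first service that is down.
preflight() {
  for url in "$@"; do
    if ! curl -fsS -o /dev/null --max-time 5 "$url"; then
      echo "PRE-FLIGHT FAILED: $url is unreachable" >&2
      return 1
    fi
  done
  echo "PRE-FLIGHT OK: $# service(s) reachable"
}
```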

Documentation Check

Verify that the startup commands are documented. If they are missing, STOP and mark the run BLOCKED.

Start & Verify

- Start services following the documented commands.
- Verify that health endpoints respond.
- Seed test data if needed.
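Services rarely answer the instant they start, so a bounded retry loop avoids false failures. This is a sketch with hypothetical defaults (30 attempts, 2 seconds apart); the health URL comes from the setup-playbook, not from this helper.

```sh
# Hypothetical helper: poll a health endpoint after starting services.
# Usage: wait_healthy URL [attempts] [delay_seconds]
wait_healthy() {
  url=$1
  attempts=${2:-30}
  delay=${3:-2}
  i=1
  while :; do
    if curl -fsS -o /dev/null --max-time 5 "$url" 2>/dev/null; then
      echo "healthy after $i attempt(s): $url"
      return 0
    fi
    if [ "$i" -ge "$attempts" ]; then
      break
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "NOT HEALTHY after $attempts attempt(s): $url" >&2
  return 1
}
```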

Phase 2 — Review Setup (Non-Code Projects)

- Locate all deliverables referenced by the stories.
- Verify that all deliverables exist (report missing ones as failures).
- Note the language setting from the project spec.
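The existence check can be scripted once you have collected the deliverable paths from the stories. A sketch with a hypothetical helper name; it reports every missing file as a failure rather than stopping at the first.

```sh
# Hypothetical helper: check that every referenced deliverable exists.
# Missing files are reported as failures, never silently skipped.
check_deliverables() {
  missing=0
  for path in "$@"; do
    if [ -e "$path" ]; then
      echo "PRESENT $path"
    else
      echo "FAIL    missing deliverable: $path"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
```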

Phase 3 — Execute Validation

For Code Projects

For each test case:

1. Execute the steps exactly as written:
   - API calls: curl or httpie via Bash
   - Browser flows: Chrome DevTools MCP tools
   - CLI commands: run via Bash
   - Database checks: query directly
2. Compare actual against expected results and record the outcome clearly.
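For an API case whose expectation is a status code, the run-and-compare step might look like this sketch. The helper name and output wording are hypothetical; the curl arguments are passed through unchanged so the request stays exactly as the spec writes it.

```sh
# Hypothetical helper for one API test case: run the request as written
# and compare the HTTP status with the expected one.
# Usage: expect_status TC-ID EXPECTED_CODE curl-args...
expect_status() {
  id=$1
  expected=$2
  shift 2
  # %{http_code} is 000 when no HTTP response came back at all.
  actual=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$@")
  if [ "$actual" = "$expected" ]; then
    echo "$id PASS"
  else
    echo "$id FAIL: expected HTTP $expected, got ${actual:-000}"
    return 1
  fi
}
```

For example, `expect_status TC-001 201 -X POST -H 'Content-Type: application/json' -d '{"name":"demo"}' http://localhost:3000/api/resource`, where the URL and payload are illustrative only. Checks on the response body still need their own comparison.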

Don't trust green unit suites — exercise the real thing. Past runs shipped contract bugs (a missing _live_ env infix, expires_in vs expires_at) that 100+ green mocked tests hid. Use the work-modes probe-contract primitive to run the real classes in a REPL and observe the actual request/response shape; use verify-symbol to prove a method/field/endpoint truly exists. A live staging env is nice but not required — a REPL against the real code is enough.
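The probe-contract and verify-symbol primitives are not described here, so this is not their interface. It is only a plain-shell sketch of the kind of assertion that would have caught the expires_in vs expires_at bug: check the key name in a body captured from the real code, not from a mock. The helper name is hypothetical, and a grep only proves a key is present; use a JSON parser when values or nesting matter.

```sh
# Hypothetical helper: succeed only if a captured JSON body contains the
# given key, e.g. to catch a renamed field such as expires_in vs expires_at.
has_json_key() {
  body=$1
  key=$2
  printf '%s' "$body" | grep -q "\"$key\"[[:space:]]*:"
}
```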

For Non-Code Projects

For each review criterion:

1. Read the deliverable referenced by the criterion.
2. Evaluate it against the criterion: does it meet the standard?
3. Record the finding: pass, fail (with specific issues), or needs-improvement.

Phase 4 — Report Results

Append or update ## Run Results in each spec file:

```markdown
## Run Results

### Run: <date>

| ID | Title | Result | Notes |
|----|-------|--------|-------|
| TC-001 | Test title | PASS | |
| TC-002 | Test title | FAIL | Expected X, got Y |

**Summary**: X passed, Y failed, Z skipped
**Failures requiring fixes**:
- TC-002: <clear description of what's wrong>
```
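The Summary line can be derived from the table rather than counted by hand. A sketch, assuming data rows start with a TC- or RC- ID and that skipped items are marked SKIP or SKIPPED (the template shows only PASS and FAIL); feed it a single run's table, since it counts every matching row it reads.

```sh
# Hypothetical helper: count PASS/FAIL/SKIP rows in a Run Results table
# (| ID | Title | Result | Notes |) from a file or stdin.
summarize_run() {
  awk -F'|' '
    # Data rows begin with "| TC-" or "| RC-"; Result is the 4th field.
    /^[|] *(TC|RC)-/ {
      r = $4
      gsub(/ /, "", r)
      if (r == "PASS") p++
      else if (r == "FAIL") f++
      else if (r == "SKIP" || r == "SKIPPED") s++
    }
    END { printf "**Summary**: %d passed, %d failed, %d skipped\n", p, f, s }
  ' "$@"
}
```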

Phase 5 — Determine Next Step

If failures exist → Report to orchestrator

Report via SendMessage:

- Which items failed and why
- Suggested fix areas
- Whether failures are CRITICAL (blocking) or MINOR (deferrable)
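To draft the "which items failed and why" part of the message, the FAIL rows can be pulled straight from the Run Results table. This sketch (hypothetical helper name and bullet format) does not decide CRITICAL vs MINOR; that still comes from each test case's Severity line.

```sh
# Hypothetical helper: turn FAIL rows of a Run Results table into report
# bullets of the form "- ID (Title): Notes".
list_failures() {
  awk -F'|' '
    /^[|] *(TC|RC)-/ {
      id = $2; title = $3; res = $4; notes = $5
      gsub(/^ +| +$/, "", id); gsub(/^ +| +$/, "", title)
      gsub(/^ +| +$/, "", res); gsub(/^ +| +$/, "", notes)
      if (res == "FAIL") printf "- %s (%s): %s\n", id, title, notes
    }
  ' "$@"
}
```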

If all pass → Hand evidence to the PO

You don't write the human-validation guide — the PO does, in the final review. When your accumulated validation passes, report to the team lead that QA is green and hand over your evidence so the PO can frame the human handoff:

- A summary of what passed (by DoD item), with pointers to the run records in specs/<version>/qa/
- Anything that needs human judgment (visual craft, UX, subjective quality) that you deliberately did not assert
- Known limitations you observed (e.g., from "Simplified in this version")

The PO assembles these into the human-validation guide; the version ships only when the human confirms.

Phase 6 — Document Findings

Append to the spec file:

```markdown
## Validation Findings

### Issues Found
- What didn't meet criteria

### Missing or Incomplete
- Gaps in deliverables

### Patterns Observed
- Recurring issues
```
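A sketch of appending that section without hand-editing, assuming each finding is passed as "category|text" with category issue, missing, or pattern; the helper name and argument convention are hypothetical. Empty categories get "- None" so the three headings are always present.

```sh
# Hypothetical helper: append a Validation Findings section to a spec.
# Each argument is "category|text" (category: issue, missing, pattern).
append_findings() {
  spec=$1
  shift
  {
    echo
    echo "## Validation Findings"
    for heading in "issue:Issues Found" "missing:Missing or Incomplete" "pattern:Patterns Observed"; do
      key=${heading%%:*}
      echo
      echo "### ${heading#*:}"
      n=0
      for item in "$@"; do
        case $item in
          "$key|"*) echo "- ${item#*|}"; n=$((n + 1)) ;;
        esac
      done
      if [ "$n" -eq 0 ]; then echo "- None"; fi
    done
  } >> "$spec"
}
```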


Validate a version's implementation against its Definition of Done. For code projects: runs automated tests against the live application. For non-code projects: reviews deliverables against acceptance criteria. Runs incrementally on re-runs. Ends with human validation guidance.

Category: development

Updated: Jun 5, 2026


Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status; review each row below.

| Scanner | Description | Status | % |
|---------|-------------|--------|---|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Jun 5, 2026, 7:01 AM (19.7 s, 1 file scanned)

SKILL.md frontmatter:

name: validate-execution
description: Validate a version's implementation against its Definition of Done. For code projects: runs automated tests against the live application. For non-code projects: reviews deliverables against acceptance criteria. Runs incrementally on re-runs. Ends with human validation guidance.
argument-hint: [version, e.g. v0.1-core-push]




Install manually

```sh
# Clone the repo
git clone https://github.com/jaisonerick/spec-plugin.git

# Copy into Claude Code skills folder (global)
cp -r spec-plugin/plugins/spec-plugin/skills/validate-execution ~/.claude/skills/
```


Repository: jaisonerick/spec-plugin

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT