Code Verification Skill

Verify code against requirements using a main agent / sub-agent loop with structured feedback and automatic retry.

Workflow Overview

Copy this checklist and track progress:

Code Verification Progress:
- [ ] Step 1: Parse verification instructions
- [ ] Step 2: Pre-flight validation
- [ ] Step 3: Verification loop (per instruction)
- [ ] Step 4: Fix attempts (if failed)
- [ ] Step 5: Update checklist with results
- [ ] Step 6: Generate verification report
- [ ] Step 7: Log results to verification-log.jsonl

Step 1: Parse Verification Instructions

Extract each verification instruction into a discrete, testable item:

ID: Unique identifier (e.g., V-001)
Instruction: The requirement text
Test approach: How to verify (file inspection, run tests, lint, type check, etc.)
Files involved: Which files to examine
Requires Browser: Whether the instruction needs Playwright MCP verification
- Auto-detect from keywords: UI, render, display, visible, hidden, show, hide, click, hover, focus, blur, scroll, DOM, element, component, layout, responsive, style, CSS, color, font, screenshot, visual, appearance, console, error, warning, log, network, request, response, accessibility, a11y, ARIA, animation, transition, loading, performance
- Mark as: browser: true or browser: false
Browser Verification Type (if browser: true):
- DOM_INSPECTION - Element presence, visibility, content via accessibility tree snapshots
- SCREENSHOT - Visual appearance, layout verification
- CONSOLE - Browser console errors, warnings, logs
- NETWORK - API requests, responses, status codes (via network interception)
- PERFORMANCE - Load times, Core Web Vitals (via tracing)
- ACCESSIBILITY - ARIA attributes, semantic HTML, accessibility tree analysis

Step 2: Pre-flight Validation

Before the verification loop, confirm each instruction is testable:

Instruction is specific and unambiguous
Success criteria are clear
Required files/resources exist

Flag untestable instructions immediately rather than attempting verification.

Verify that all referenced files and resources actually exist before entering the verification loop. Log any missing files and mark their associated instructions as BLOCKED.

Browser-Specific Pre-Flight

For instructions with browser: true:

HTTP-First Check (before browser tools)

Many "browser" criteria can be satisfied with a simple HTTP check. Before launching browser tools, evaluate if the criterion only requires:
- Page accessibility (HTTP 200 status)
- API response validation
- Redirect verification
- Basic content presence
Attempt HTTP verification first using curl:
```
# Page loads check
curl -sf "{devServer.url}{route}" -o /dev/null && echo "PASS" || echo "FAIL"

# Response contains text
curl -s "{url}" | grep -q "{expected}" && echo "PASS" || echo "FAIL"

# API returns expected status
curl -sf -o /dev/null -w "%{http_code}" "{url}"
```
Curl error handling:
- Use -m 10 (10-second timeout) to prevent hanging on unresponsive services
- If curl returns exit code 7 (connection refused): dev server may not be running
- If curl returns exit code 28 (timeout): service is slow or unreachable
- On any curl failure, fall through to browser verification (do not mark as FAIL yet)
- Check the curl exit code ($?) immediately after execution before interpreting output
If HTTP check passes AND criterion does NOT explicitly require:
- DOM element inspection (selector, visibility)
- Visual appearance verification
- User interaction simulation
- Console log inspection
- Network timing/performance
Then: Mark as PASS (HTTP-first), skip browser verification.

Otherwise: Continue to browser tool fallback chain.
Check browser tool availability (fallback chain)
- Try tools in order: ExecuteAutomation Playwright → Browser MCP → Microsoft Playwright → Chrome DevTools
- Use first available tool for browser verification
If NO tools available (SOFT BLOCK):

Before marking as BLOCKED, attempt HTTP fallback for any remaining criteria. EXECUTE these commands via Bash tool (substitute actual URLs):
- Page loading: curl -sf {url}
- API responses: curl -s {url} | jq .
- Redirects: curl -sI {url}
Only mark as BLOCKED if both browser tools AND HTTP fallback are insufficient.

If still blocked after HTTP fallback:
- Display warning:
```
⚠️  NO BROWSER TOOLS AVAILABLE

This verification includes browser-based criteria but no browser
MCP tools are available, and HTTP fallback is insufficient.

Criteria requiring DOM/visual inspection:
- {list criteria that truly need browser}

Options:
1. Continue anyway (these criteria become manual verification)
2. Stop and configure browser tools first
```
- Use AskUserQuestion to let user choose:
  - "Continue with manual verification" → Mark browser instructions as BLOCKED, continue with non-browser criteria
  - "Stop to configure tools" → Halt verification, provide setup instructions
Verify dev server is running (if browser tools available)
- Check if configured dev server URL responds (e.g., http://localhost:3000)
- If not running, attempt to start using the configured dev server command
- Wait for configured startup time before proceeding
- If unable to start, mark as BLOCKED: "Dev server not accessible at {URL}"
Confirm target route exists (if browser tools available)
- Navigate to the page specified in the instruction using the selected browser tool
- If 404 or error, mark as BLOCKED: "Target route not found: {route}"

Step 3: Sub-Agent Verification Protocol

Spawn a sub-agent to verify each instruction. The sub-agent MUST return structured output:

VERIFICATION RESULT
-------------------
Instruction ID: [ID]
Status: PASS | FAIL | BLOCKED
Location: [file:line or "N/A"]
Severity: BLOCKING | MINOR
Finding: [What was found]
Expected: [What was expected]
Suggested Fix: [Specific fix recommendation]

Sub-agent rules:

Check ONLY the specific instruction assigned
Do not attempt fixes—report findings only
Be precise about location (file, line number, function name)
Distinguish between blocking failures and minor issues

Browser-Enhanced Verification Output

For instructions with browser: true, the sub-agent MUST use Playwright MCP and return:

BROWSER VERIFICATION RESULT
---------------------------
Instruction ID: [ID]
Status: PASS | FAIL | BLOCKED
Type: DOM | VISUAL | CONSOLE | NETWORK | PERFORMANCE | ACCESSIBILITY
URL: [URL] | Viewport: [width]x[height]

Finding: [What was observed]
Expected: [What was expected]

Details: [Type-specific information]
  - DOM: selector, found, visible, content
  - Visual: screenshot path, description
  - Console: errors, warnings, logs
  - Network: endpoint, method, status, response summary
  - Performance: load time, LCP, FID, CLS
  - Accessibility: ARIA, semantic HTML, contrast, keyboard nav

Suggested Fix: [Specific fix recommendation]

Browser Sub-Agent Rules

In addition to standard sub-agent rules, browser verification sub-agents MUST:

Start with an accessibility tree snapshot (browser_snapshot) of the initial state
Use stable selectors (prefer data-testid over complex CSS paths, or use accessibility tree element refs)
Wait for dynamic content to load before inspecting (browser_wait_for_text or browser_wait)
Capture console output before and after actions
Take screenshots (browser_screenshot) when verifying visual appearance
Test at default viewport unless criterion specifies responsive/mobile (use browser_resize to change)

Step 4: Main Agent Fix Protocol

When sub-agent reports FAIL:

Review the finding - Understand what failed and why
Check fix history - Do not repeat a previously attempted fix
Apply targeted fix - Make the minimum change to address the issue
Verify the fix was applied - Read back the edited file to confirm the change took effect. If the file is unchanged, retry the edit with corrected context.
Log the attempt - Record what was changed

Fix attempt tracking

Maintain a fix log per instruction:

FIX LOG: [Instruction ID]
--------------------------
Attempt 1: [Description of change] → [Result]
Attempt 2: [Description of change] → [Result]
...

Strategy escalation

Attempts 1-2: Direct fix based on sub-agent suggestion
Attempt 3: Try alternative approach
Attempts 4-5: Broaden scope, consider architectural changes

If the same failure pattern repeats twice, explicitly try a different strategy.

After applying fix, re-verify the specific criterion:

Re-run the sub-agent check for the failed criterion only
If still failing after 2 fix attempts, mark as FAIL with evidence from both attempts
Do NOT re-run all criteria — only the failing one

Browser-Specific Fix Strategies

| Failure Type | Common Fixes | |--------------|--------------| | DOM/Visibility | Conditional rendering, CSS display/visibility, z-index, prop passing | | Console errors | JS exceptions, missing mocks, env vars, CORS | | Network | Endpoint URLs, auth headers, payload format, CORS config | | Visual | CSS cascade, responsive breakpoints, font loading | | Performance | Bundle size, image optimization, lazy loading, render-blocking | | Accessibility | ARIA attributes, color contrast, heading hierarchy, keyboard handlers |

Step 5: Exit Conditions

Exit the verification loop when ANY condition is met:

| Condition | Action | |-----------|--------| | Sub-agent reports PASS | ✅ Check off instruction | | 5 attempts exhausted | ❌ Mark failed with notes | | Same failure 3+ times | ⚠️ Exit early, flag for review | | Fix introduces regression | ⚠️ Revert, flag for review | | Issue is MINOR severity | ⚠️ Note and continue |

Step 6: Regression Check

After each fix attempt, verify:

The targeted instruction (primary check)
Any previously-passing related instructions (regression check)

If a fix breaks something else, revert and note the conflict. After reverting, verify with git status or by reading the file that the revert was successful before proceeding.

Browser Regression Checks

After each browser-related fix, verify no regressions in: console errors, visual appearance, performance metrics, accessibility. If regression detected, capture before/after state and log in fix history.

Step 7: Generate Verification Report

After all instructions are processed:

VERIFICATION REPORT
===================
Total Instructions: [N]
Passed: [N] ✅
Failed: [N] ❌
Needs Review: [N] ⚠️

DETAILS
-------
[V-001] ✅ [Instruction summary]
[V-002] ❌ [Instruction summary]
  - Failed after 5 attempts
  - Last error: [description]
  - Attempts: [brief log]
[V-003] ⚠️ [Instruction summary]
  - Flagged: Repeated same failure pattern
  - Recommendation: [suggestion]

AUDIT TRAIL
-----------
[Timestamp] V-001: Verified PASS on first check
[Timestamp] V-002: Attempt 1 - Changed X → FAIL
[Timestamp] V-002: Attempt 2 - Changed Y → FAIL
...

BROWSER VERIFICATION (if applicable)
------------------------------------
Browser Checks: [passed]/[total] | Blocked: [N]
Playwright: Available | Unavailable
Dev Server: [URL] | Not Running

Issues Found:
- [V-XXX] {type}: {description}

Screenshots: [list of captured files]

Example

Given a checklist:

[ ] All functions have docstrings
[ ] No unused imports
[ ] Tests pass with >80% coverage

Workflow execution:

Parse into V-001, V-002, V-003
Pre-flight confirms all are testable
Sub-agent checks V-001 → FAIL (missing docstring in utils.py:45)
Main agent adds docstring
Sub-agent re-checks → PASS
Continue to V-002...
Final report shows 3/3 passed

Key Principles

Structured feedback: Sub-agent always returns actionable, located findings
No repeated fixes: Track what was tried to avoid loops
Early exit: Don't burn attempts on unfixable issues
Regression awareness: Fixes shouldn't break other things
Audit everything: The journey matters for debugging

Error Handling

| Situation | Action | |-----------|--------| | Verification instructions cannot be parsed from input | Report "Unable to extract testable instructions" and ask the user to clarify the requirements | | Referenced source files do not exist | Mark the instruction as BLOCKED and list the missing file paths in the report | | Sub-agent returns malformed or empty output | Retry the sub-agent once; if still malformed, mark instruction as BLOCKED with "sub-agent error" | | Fix attempt introduces a regression in a previously-passing instruction | Revert the fix immediately and flag the conflict for manual review | | Dev server fails to start for browser-based criteria | Report the startup error, mark browser criteria as BLOCKED, continue with non-browser criteria |

REMINDER: Sub-agents report findings only — do not attempt fixes.

Code Verification Skill

Verify code against requirements using a main agent / sub-agent loop with structured feedback and automatic retry.

Workflow Overview

Copy this checklist and track progress:

Code Verification Progress:
- [ ] Step 1: Parse verification instructions
- [ ] Step 2: Pre-flight validation
- [ ] Step 3: Verification loop (per instruction)
- [ ] Step 4: Fix attempts (if failed)
- [ ] Step 5: Update checklist with results
- [ ] Step 6: Generate verification report
- [ ] Step 7: Log results to verification-log.jsonl

Step 1: Parse Verification Instructions

Extract each verification instruction into a discrete, testable item:

ID: Unique identifier (e.g., V-001)
Instruction: The requirement text
Test approach: How to verify (file inspection, run tests, lint, type check, etc.)
Files involved: Which files to examine
Requires Browser: Whether the instruction needs Playwright MCP verification
- Auto-detect from keywords: UI, render, display, visible, hidden, show, hide, click, hover, focus, blur, scroll, DOM, element, component, layout, responsive, style, CSS, color, font, screenshot, visual, appearance, console, error, warning, log, network, request, response, accessibility, a11y, ARIA, animation, transition, loading, performance
- Mark as: browser: true or browser: false
Browser Verification Type (if browser: true):
- DOM_INSPECTION - Element presence, visibility, content via accessibility tree snapshots
- SCREENSHOT - Visual appearance, layout verification
- CONSOLE - Browser console errors, warnings, logs
- NETWORK - API requests, responses, status codes (via network interception)
- PERFORMANCE - Load times, Core Web Vitals (via tracing)
- ACCESSIBILITY - ARIA attributes, semantic HTML, accessibility tree analysis

Step 2: Pre-flight Validation

Before the verification loop, confirm each instruction is testable:

Instruction is specific and unambiguous
Success criteria are clear
Required files/resources exist

Flag untestable instructions immediately rather than attempting verification.

Verify that all referenced files and resources actually exist before entering the verification loop. Log any missing files and mark their associated instructions as BLOCKED.

Browser-Specific Pre-Flight

For instructions with browser: true:

HTTP-First Check (before browser tools)

Many "browser" criteria can be satisfied with a simple HTTP check. Before launching browser tools, evaluate if the criterion only requires:
- Page accessibility (HTTP 200 status)
- API response validation
- Redirect verification
- Basic content presence
Attempt HTTP verification first using curl:
```
# Page loads check
curl -sf "{devServer.url}{route}" -o /dev/null && echo "PASS" || echo "FAIL"

# Response contains text
curl -s "{url}" | grep -q "{expected}" && echo "PASS" || echo "FAIL"

# API returns expected status
curl -sf -o /dev/null -w "%{http_code}" "{url}"
```
Curl error handling:
- Use -m 10 (10-second timeout) to prevent hanging on unresponsive services
- If curl returns exit code 7 (connection refused): dev server may not be running
- If curl returns exit code 28 (timeout): service is slow or unreachable
- On any curl failure, fall through to browser verification (do not mark as FAIL yet)
- Check the curl exit code ($?) immediately after execution before interpreting output
If HTTP check passes AND criterion does NOT explicitly require:
- DOM element inspection (selector, visibility)
- Visual appearance verification
- User interaction simulation
- Console log inspection
- Network timing/performance
Then: Mark as PASS (HTTP-first), skip browser verification.

Otherwise: Continue to browser tool fallback chain.
Check browser tool availability (fallback chain)
- Try tools in order: ExecuteAutomation Playwright → Browser MCP → Microsoft Playwright → Chrome DevTools
- Use first available tool for browser verification
If NO tools available (SOFT BLOCK):

Before marking as BLOCKED, attempt HTTP fallback for any remaining criteria. EXECUTE these commands via Bash tool (substitute actual URLs):
- Page loading: curl -sf {url}
- API responses: curl -s {url} | jq .
- Redirects: curl -sI {url}
Only mark as BLOCKED if both browser tools AND HTTP fallback are insufficient.

If still blocked after HTTP fallback:
- Display warning:
```
⚠️  NO BROWSER TOOLS AVAILABLE

This verification includes browser-based criteria but no browser
MCP tools are available, and HTTP fallback is insufficient.

Criteria requiring DOM/visual inspection:
- {list criteria that truly need browser}

Options:
1. Continue anyway (these criteria become manual verification)
2. Stop and configure browser tools first
```
- Use AskUserQuestion to let user choose:
  - "Continue with manual verification" → Mark browser instructions as BLOCKED, continue with non-browser criteria
  - "Stop to configure tools" → Halt verification, provide setup instructions
Verify dev server is running (if browser tools available)
- Check if configured dev server URL responds (e.g., http://localhost:3000)
- If not running, attempt to start using the configured dev server command
- Wait for configured startup time before proceeding
- If unable to start, mark as BLOCKED: "Dev server not accessible at {URL}"
Confirm target route exists (if browser tools available)
- Navigate to the page specified in the instruction using the selected browser tool
- If 404 or error, mark as BLOCKED: "Target route not found: {route}"

Step 3: Sub-Agent Verification Protocol

Spawn a sub-agent to verify each instruction. The sub-agent MUST return structured output:

VERIFICATION RESULT
-------------------
Instruction ID: [ID]
Status: PASS | FAIL | BLOCKED
Location: [file:line or "N/A"]
Severity: BLOCKING | MINOR
Finding: [What was found]
Expected: [What was expected]
Suggested Fix: [Specific fix recommendation]

Sub-agent rules:

Check ONLY the specific instruction assigned
Do not attempt fixes—report findings only
Be precise about location (file, line number, function name)
Distinguish between blocking failures and minor issues

Browser-Enhanced Verification Output

For instructions with browser: true, the sub-agent MUST use Playwright MCP and return:

BROWSER VERIFICATION RESULT
---------------------------
Instruction ID: [ID]
Status: PASS | FAIL | BLOCKED
Type: DOM | VISUAL | CONSOLE | NETWORK | PERFORMANCE | ACCESSIBILITY
URL: [URL] | Viewport: [width]x[height]

Finding: [What was observed]
Expected: [What was expected]

Details: [Type-specific information]
  - DOM: selector, found, visible, content
  - Visual: screenshot path, description
  - Console: errors, warnings, logs
  - Network: endpoint, method, status, response summary
  - Performance: load time, LCP, FID, CLS
  - Accessibility: ARIA, semantic HTML, contrast, keyboard nav

Suggested Fix: [Specific fix recommendation]

Browser Sub-Agent Rules

In addition to standard sub-agent rules, browser verification sub-agents MUST:

Start with an accessibility tree snapshot (browser_snapshot) of the initial state
Use stable selectors (prefer data-testid over complex CSS paths, or use accessibility tree element refs)
Wait for dynamic content to load before inspecting (browser_wait_for_text or browser_wait)
Capture console output before and after actions
Take screenshots (browser_screenshot) when verifying visual appearance
Test at default viewport unless criterion specifies responsive/mobile (use browser_resize to change)

Step 4: Main Agent Fix Protocol

When sub-agent reports FAIL:

Review the finding - Understand what failed and why
Check fix history - Do not repeat a previously attempted fix
Apply targeted fix - Make the minimum change to address the issue
Verify the fix was applied - Read back the edited file to confirm the change took effect. If the file is unchanged, retry the edit with corrected context.
Log the attempt - Record what was changed

Fix attempt tracking

Maintain a fix log per instruction:

FIX LOG: [Instruction ID]
--------------------------
Attempt 1: [Description of change] → [Result]
Attempt 2: [Description of change] → [Result]
...

Strategy escalation

Attempts 1-2: Direct fix based on sub-agent suggestion
Attempt 3: Try alternative approach
Attempts 4-5: Broaden scope, consider architectural changes

If the same failure pattern repeats twice, explicitly try a different strategy.

After applying fix, re-verify the specific criterion:

Re-run the sub-agent check for the failed criterion only
If still failing after 2 fix attempts, mark as FAIL with evidence from both attempts
Do NOT re-run all criteria — only the failing one

Browser-Specific Fix Strategies

Step 5: Exit Conditions

Exit the verification loop when ANY condition is met:

Step 6: Regression Check

After each fix attempt, verify:

The targeted instruction (primary check)
Any previously-passing related instructions (regression check)

If a fix breaks something else, revert and note the conflict. After reverting, verify with git status or by reading the file that the revert was successful before proceeding.

Browser Regression Checks

Step 7: Generate Verification Report

After all instructions are processed:

VERIFICATION REPORT
===================
Total Instructions: [N]
Passed: [N] ✅
Failed: [N] ❌
Needs Review: [N] ⚠️

DETAILS
-------
[V-001] ✅ [Instruction summary]
[V-002] ❌ [Instruction summary]
  - Failed after 5 attempts
  - Last error: [description]
  - Attempts: [brief log]
[V-003] ⚠️ [Instruction summary]
  - Flagged: Repeated same failure pattern
  - Recommendation: [suggestion]

AUDIT TRAIL
-----------
[Timestamp] V-001: Verified PASS on first check
[Timestamp] V-002: Attempt 1 - Changed X → FAIL
[Timestamp] V-002: Attempt 2 - Changed Y → FAIL
...

BROWSER VERIFICATION (if applicable)
------------------------------------
Browser Checks: [passed]/[total] | Blocked: [N]
Playwright: Available | Unavailable
Dev Server: [URL] | Not Running

Issues Found:
- [V-XXX] {type}: {description}

Screenshots: [list of captured files]

Example

Given a checklist:

[ ] All functions have docstrings
[ ] No unused imports
[ ] Tests pass with >80% coverage

Workflow execution:

Parse into V-001, V-002, V-003
Pre-flight confirms all are testable
Sub-agent checks V-001 → FAIL (missing docstring in utils.py:45)
Main agent adds docstring
Sub-agent re-checks → PASS
Continue to V-002...
Final report shows 3/3 passed

Key Principles

Structured feedback: Sub-agent always returns actionable, located findings
No repeated fixes: Track what was tried to avoid loops
Early exit: Don't burn attempts on unfixable issues
Regression awareness: Fixes shouldn't break other things
Audit everything: The journey matters for debugging

Error Handling

REMINDER: Sub-agents report findings only — do not attempt fixes.

Adoption

benjaminshoemaker/code-verification

$ install --global

Security Scan Results

SKILL.md

Code Verification Skill

Workflow Overview

Step 1: Parse Verification Instructions

Step 2: Pre-flight Validation

Browser-Specific Pre-Flight

Step 3: Sub-Agent Verification Protocol

Browser-Enhanced Verification Output

Browser Sub-Agent Rules

Step 4: Main Agent Fix Protocol

Fix attempt tracking

Strategy escalation

Browser-Specific Fix Strategies

Step 5: Exit Conditions

Step 6: Regression Check

Browser Regression Checks

Step 7: Generate Verification Report

Example

Key Principles

Error Handling

Related Skills

benjaminshoemaker/vision-audit

benjaminshoemaker/verify-task

benjaminshoemaker/vercel-preview

benjaminshoemaker/update-target-projects

benjaminshoemaker/code-verification

$ install --global

Security Scan Results

SKILL.md

Code Verification Skill

Workflow Overview

Step 1: Parse Verification Instructions

Step 2: Pre-flight Validation

Browser-Specific Pre-Flight

Step 3: Sub-Agent Verification Protocol

Browser-Enhanced Verification Output

Browser Sub-Agent Rules

Step 4: Main Agent Fix Protocol

Fix attempt tracking

Strategy escalation

Browser-Specific Fix Strategies

Step 5: Exit Conditions

Step 6: Regression Check

Browser Regression Checks

Step 7: Generate Verification Report

Example

Key Principles

Error Handling

Related Skills

benjaminshoemaker/vision-audit

benjaminshoemaker/verify-task

benjaminshoemaker/vercel-preview

benjaminshoemaker/update-target-projects