synaptiai/criterion-verification-map

Name: criterion-verification-map
Author: synaptiai

plugins/flow/skills/criterion-verification-map/SKILL.md

npx skillsauth add synaptiai/synapti-marketplace criterion-verification-map

Criterion Verification Map

Domain skill that treats acceptance criteria as eval sources. Each criterion produces a concrete, runnable check at plan time — not at verify time. Planning a criterion without a runnable command is a blocking error.

Iron Law

EVERY ACCEPTANCE CRITERION IS AN EVAL SOURCE. At plan time, each criterion must produce a runnable verification command. No criterion is deferred to verify time with "we'll figure out how to test this later." No criterion passes by assumption. No criterion is "too obvious to verify."

Criteria as Eval Sources (Plan Time)

An acceptance criterion is useful only if it can be evaluated mechanically. At plan time, the criterion must be transformed into:

1. A verification type (classification — see table below)
2. A runnable command (the exact bash/test/curl/script invocation that will be executed at verify time)
3. Expected evidence shape (what the command's output must contain to count as PASS)
4. What the criterion does NOT promise (non-goals scoped to this criterion — prevents scope creep and false-positive verdicts)

If any of the four cannot be filled in at plan time, the criterion is not ready. Escalate through the Spec Validation Gate in the start.md EXPLORE phase, not here.
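The four-field rule above can be pictured as a small gate. The following is only a sketch under assumptions: the function name `criterion_ready`, the argument order, and the output wording are hypothetical and are not part of the skill.

```shell
# Hypothetical helper (not part of the skill): succeeds only when all four
# plan-time fields are non-empty.
#   $1 verification type   $2 runnable command
#   $3 expected evidence   $4 "does NOT promise" non-goals
criterion_ready() {
  _missing=""
  [ -n "$1" ] || _missing="$_missing verification-type"
  [ -n "$2" ] || _missing="$_missing command"
  [ -n "$3" ] || _missing="$_missing expected-evidence"
  [ -n "$4" ] || _missing="$_missing does-not-promise"
  if [ -n "$_missing" ]; then
    # Report which fields are absent so the criterion can be escalated.
    echo "NOT READY: missing$_missing"
    return 1
  fi
  echo "READY"
}
```

A non-zero exit status here would correspond to the "criterion is not ready" case, which the skill says to escalate rather than resolve in place.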

Criterion Classification Table

| Criterion Type | Signal Words | Verification Method | Evidence Format |
|---------------|-------------|-------------------|-----------------|
| Behavioral (logic) | "when X then Y", "should return", "must validate" | Unit/integration test | Test runner output (pass/fail + relevant lines) |
| API endpoint | "status code", "response", "endpoint", "header" | curl/fetch command | HTTP status + response body snippet |
| UI rendering | "displays", "shows", "page", "renders", "layout" | Screenshot + visual analysis | Screenshot path + analysis summary |
| Error handling | "error message", "invalid", "fails gracefully" | Test with invalid input | Error output + expected vs actual |
| Performance | "within N ms", "rate limit", "timeout" | Benchmark/timing command | Timing output |
| Configuration | "config", "environment", "setting" | Build/load test | Build success log |
| Data processing | "transforms", "converts", "output matches" | Run with test data | Input/output comparison |
| Contract (schema/type) | "schema", "type", "interface", "signature", "shape" | Schema validator / type-check | Validator output or type-check diagnostic |
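As a rough illustration of how the signal words could drive a first-pass classification, here is a minimal sketch. The function `classify_criterion` is hypothetical, the match order is an assumption (first match wins), and ambiguous signal words such as "type", "shape", "page", and "response" are deliberately left out, so the result is a hint for the planner and not a verdict.

```shell
# Hypothetical first-pass classifier based on the signal words in the table.
# Prints one of: behavioral api ui error performance config data contract,
# or "unclassified" when no signal word matches.
classify_criterion() {
  # Lowercase the text so matching is case-insensitive.
  _text=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$_text" in
    *schema*|*interface*|*signature*)                 echo contract ;;
    *"status code"*|*endpoint*|*header*)              echo api ;;
    *"error message"*|*invalid*|*"fails gracefully"*) echo error ;;
    *"within "*ms*|*"rate limit"*|*timeout*)          echo performance ;;
    *displays*|*shows*|*renders*|*layout*)            echo ui ;;
    *config*|*environment*|*setting*)                 echo config ;;
    *transforms*|*converts*|*"output matches"*)       echo data ;;
    *"should return"*|*"must validate"*|*when*then*)  echo behavioral ;;
    *)                                                echo unclassified ;;
  esac
}

classify_criterion "Login endpoint returns status code 401 for a wrong password"
```

Because a criterion can contain signal words from several rows, an "unclassified" or surprising result is a cue to classify by hand.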

Plan-Time Task Creation

Per commands/start.md Phase 2 (PLAN), tasks are atomic: implementation, test, and evidence collection are bundled into one task. This skill's role at plan time is to produce the verification command and expected evidence shape that gets embedded into that atomic task's description.

For each acceptance criterion, the atomic task description must include:

Criterion: {full criterion text}
Verification type: {type from classification table}
Verification command: {exact runnable command}
Expected evidence: {what successful output looks like}
Does NOT promise: {non-goals scoped to this criterion}

The atomic task flows through implementation → test → evidence collection within Phase 3 (CODE). Evidence is captured at task-completion time.
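The five-field task description above could be assembled with a helper along these lines. This is a sketch only: `render_task_description` and its positional arguments are hypothetical, and the sample values in the usage line are invented for illustration.

```shell
# Hypothetical helper: prints the five plan-time fields in the order listed above.
#   $1 criterion text   $2 verification type   $3 verification command
#   $4 expected evidence   $5 non-goals
render_task_description() {
  printf 'Criterion: %s\n' "$1"
  printf 'Verification type: %s\n' "$2"
  printf 'Verification command: %s\n' "$3"
  printf 'Expected evidence: %s\n' "$4"
  printf 'Does NOT promise: %s\n' "$5"
}

# Illustrative values only.
render_task_description \
  "Login rejects a wrong password" \
  "error" \
  'npm run test -- --grep "invalid credentials" 2>&1 | tail -10' \
  "test runner reports the matching tests as passing" \
  "does not cover account lockout"
```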

Evidence Collection Protocol

During VERIFY phase, for each atomic task's verification command:

1. TaskUpdate(taskId, status: "in_progress")
2. Execute the verification command captured at plan time
3. Capture output as evidence
4. TaskUpdate(taskId, status: "completed", result: "EVIDENCE_COLLECTED")
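The execute-and-capture steps in the middle of this protocol might look like the sketch below. The `TaskUpdate` calls are agent tool calls and are not modeled here. The function name `collect_evidence` and the "Exit status" line are assumptions added for illustration; the bundle format itself does not define an exit-status field.

```shell
# Hypothetical helper: runs a plan-time verification command and prints
# the command, its exit status, and its combined stdout/stderr.
#   $1 the verification command string captured at plan time
collect_evidence() {
  # Run in a child shell and merge stderr into stdout, mirroring the
  # 2>&1 used in the skill's own examples. The && / || chain records the
  # exit status without tripping "set -e" in a calling script.
  _output=$(sh -c "$1" 2>&1) && _status=0 || _status=$?
  printf 'Verification command: %s\n' "$1"
  printf 'Exit status: %s\n' "$_status"
  printf 'Evidence:\n%s\n' "$_output"
}

collect_evidence 'echo "2 passing"; echo "1 deprecation warning" >&2'
```

Recording the exit status alongside the raw output keeps a failing command from being mistaken for evidence just because it printed something.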

Evidence Bundle Format

After all verification commands have run, assemble the evidence bundle — a structured text document that the verdict-judge agent receives. Every criterion block MUST include all fields below. The "Does NOT promise" field is captured at plan time; the three completeness subsections (What was NOT tested, Known limitations, Negative/adversarial cases covered) are populated at verify time and may not be omitted — they force the implementer to state the shape of the evidence honestly so the judge can reason about gaps.

## Evidence Bundle for Issue #{N}

Generated: {timestamp}
Branch: {branch name}
Commits: {count} since branch creation

### Criterion 1: {full criterion text}
- **Type**: {behavioral|api|ui|error|performance|config|data|contract}
- **Verification command**: `{command that was run}`
- **Evidence**:

{raw output from verification command}

- **What the criterion does NOT promise**:
- {non-goal 1 — e.g. "does not guarantee idempotency across retries"}
- {non-goal 2 — e.g. "does not cover the admin flow"}
- {non-goal 3 — e.g. "does not handle concurrent writes"}
- **Screenshot**: {path, if UI type — otherwise omit}
- **What was NOT tested**: {Explicit list of related behaviors, inputs, code paths, environments, or configurations that this evidence does not cover. Never "N/A" — if you cannot think of anything, you have not thought hard enough. State at minimum: untested environments, untested edge inputs, untested concurrency/scale conditions, untested integrations.}
- **Known limitations of this evidence**: {How the evidence could be misleading even though it looks positive. Examples: "test uses a mocked external API," "smoke test only hits the happy path," "screenshot was taken at desktop viewport only," "timing numbers taken on an idle machine, not under load." If the verification command is self-reported (agent-run test output), state that explicitly.}
- **Negative/adversarial cases covered**: {List the specific failure modes, invalid inputs, and abuse cases this evidence demonstrates the system rejects or handles safely. Examples: "rejects empty email with 400," "returns 401 on expired token," "displays error state on network failure." If none were tested, state "none" — do not leave blank — and expect the verdict-judge to treat this as a gap.}

### Criterion 2: {full criterion text}
- **Type**: {type}
- **Verification command**: `{command}`
- **Evidence**:

{output}

- **What the criterion does NOT promise**:
- {non-goal items}
- **What was NOT tested**: {as above}
- **Known limitations of this evidence**: {as above}
- **Negative/adversarial cases covered**: {as above}

{repeat for all criteria — every criterion MUST have all four subsections}

Why "Does NOT Promise" Is a First-Class Field

A verdict judge receiving only criterion text and evidence tends to over-credit passing commands — e.g., a passing unit test is treated as proof the whole behavior is correct, when the test only covered one path. The "does NOT promise" field explicitly fences each criterion's scope so the judge does not inflate a narrow PASS into a broad guarantee. It also gives reviewers and future readers a shared understanding of what was intentionally out of scope.

Populate this field at plan time from the non-goals captured in the EXPLORE phase Specification capture sub-step. Each criterion inherits the global non-goals plus any criterion-specific non-goals discovered during planning.

Completeness Subsections Are Mandatory

The verdict-judge treats any criterion missing "Does NOT promise" or any of the three completeness subsections ("What was NOT tested", "Known limitations of this evidence", "Negative/adversarial cases covered") as having incomplete evidence and will FAIL that criterion. Do not omit them. If a subsection is genuinely empty (e.g., no adversarial cases tested), write "none" explicitly rather than removing the heading.
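Before handing a bundle to the judge, a pre-check along these lines could catch missing subsections early. This is a sketch under assumptions: `check_criterion_block` is a hypothetical helper, it expects one criterion block per file, and it only checks that the four bold field labels are present. It does not reproduce the verdict-judge's own logic.

```shell
# Hypothetical pre-check: verifies that a criterion block contains the
# four mandatory field labels from the evidence bundle format.
#   $1 path to a file holding one criterion block
check_criterion_block() {
  _rc=0
  for _field in \
    'What the criterion does NOT promise' \
    'What was NOT tested' \
    'Known limitations of this evidence' \
    'Negative/adversarial cases covered'
  do
    # -F treats the label as a fixed string, so the asterisks are literal.
    if ! grep -qF -- "**${_field}**" "$1"; then
      echo "MISSING: ${_field}"
      _rc=1
    fi
  done
  return "$_rc"
}
```

A subsection whose value is "none" still passes this check, which matches the rule that empty subsections are written out explicitly instead of being removed.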

What the Evidence Bundle Does NOT Include

The bundle is passed to the verdict-judge agent, which must judge independently. Therefore:

- NO diff — the judge doesn't see the code changes
- NO decision journal — the judge doesn't see the rationale
- NO planning notes — the judge doesn't see why approaches were chosen
- NO self-review findings — the judge evaluates from spec + evidence only

The judge receives ONLY:

- The acceptance criteria (from the issue)
- The evidence bundle (from this skill)
- The holdout-validation output (from the holdout-validation skill — added in v2.0 to detect self-review claims that don't match file state)

Verification Method Examples

Behavioral (test output):

npm run test -- --grep "user authentication" 2>&1 | tail -20

API endpoint (curl):

curl -s -w "\nHTTP_STATUS:%{http_code}" http://localhost:3000/api/login \
  -H "Content-Type: application/json" \
  -d '{"email":"[email protected]","password":"wrong"}' 2>&1
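Since the curl command above appends an `HTTP_STATUS:` line to the response body, the status can be pulled back out of the captured evidence. A minimal sketch, assuming the output is piped in on stdin; `http_status_from_evidence` is a hypothetical helper name and the sample body is invented:

```shell
# Hypothetical helper: reads captured curl output on stdin and prints the
# numeric code from the last line of the form HTTP_STATUS:<digits>.
http_status_from_evidence() {
  sed -n 's/^HTTP_STATUS:\([0-9][0-9]*\)$/\1/p' | tail -n 1
}

# Example with an illustrative captured body followed by the status line.
printf '{"error":"invalid credentials"}\nHTTP_STATUS:401\n' | http_status_from_evidence
```

Comparing the extracted code against the expected evidence shape (for example, 401 for a wrong password) turns the raw output into a mechanical PASS or FAIL check.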

UI rendering (screenshot path):

Screenshot saved to: .screenshots/login-page-desktop.png
Visual analysis: Login form visible with email and password fields, submit button enabled.

Error handling:

npm run test -- --grep "invalid credentials" 2>&1 | tail -10

Contract (schema/type):

npx tsc --noEmit 2>&1 | tail -20
# or
npx ajv validate -s schemas/payload.json -d fixtures/sample.json

Integration with Start Command

The criterion-verification-map skill is invoked in two phases of /flow:start:

1. EXPLORE / PLAN phase (plan time) — Classify each criterion, produce a runnable verification command, and capture "does NOT promise" non-goals. These become the Verification command and Does NOT promise fields of the atomic task created in PLAN.
2. VERIFY phase (verify time) — Execute the verification commands captured at plan time and assemble the evidence bundle.

Stars: 4
Category: development
Updated: May 7, 2026

Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):

npx skillsauth add synaptiai/synapti-marketplace criterion-verification-map

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status.

| Scanner | Description | Status |
|---------|-------------|--------|
| Trivy | Container and dependency vulnerability scanner | Clean (95%) |
| Semgrep | Static code analysis for vulnerabilities | Clean (95%) |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean (95%) |
| Snyk (dep) | Open source security scanning | Skipped (50%) |
| Socket.dev | Supply chain security analysis | Skipped (50%) |
| VirusTotal | Multi-engine malware detection | Skipped (50%) |
| CrowdStrike | Advanced threat intelligence | Skipped (50%) |
| OSV-Scanner | Open Source Vulnerability database check | Skipped (50%) |
| OWASP Dep-Check | | Skipped (50%) |

Last scanned: May 7, 2026, 4:44 AM (132.8s, 1 file scanned)

SKILL.md

name: criterion-verification-map
description: Transform acceptance criteria into plan-time runnable verification commands (behavioral, API, UI, error, config, data, contract types) with expected evidence shapes, then execute at verify time and assemble evidence bundles with honest completeness subsections (untested paths, known limitations, adversarial cases covered). Use when planning implementation against issue acceptance criteria or verifying completeness. This skill MUST be consulted because deferring verification to later causes incomplete PRs, and suppressing evidence gaps prevents the verdict judge from reasoning about gaps.
allowed-tools: Bash, Read, Grep, Glob, TaskCreate, TaskList, TaskUpdate, TaskGet
context: fork
agent: general-purpose

Install manually

# Clone the repo
git clone https://github.com/synaptiai/synapti-marketplace.git

# Copy into Claude Code skills folder (global)
cp -r synapti-marketplace/plugins/flow/skills/criterion-verification-map ~/.claude/skills/

Repository: synaptiai/synapti-marketplace

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT