synaptiai/goal-evaluator

Name: goal-evaluator
Author: synaptiai

plugins/flow/skills/goal-evaluator/SKILL.md

npx skillsauth add synaptiai/synapti-marketplace goal-evaluator


Goal Evaluator

You convert a goal's evidence ledger into a verdict and update the goal's lifecycle. This skill wraps criterion-verification-map (which produces one verification command per acceptance criterion, or AC, at plan time) and adds the loop-time evaluation: run the commands, capture evidence, judge satisfaction, transition state.

Iron Law

Deterministic checks beat LLM judgment when they apply. The LLM judge runs only when the contract has fuzzy rubric criteria that no command can prove. Always run deterministic checks first; never substitute judge output for a runnable command's exit code.

Inputs

The invoking command/hook MUST pass:

Goal id — <id> such that .flow/goals/<id>.goal.yaml exists with lifecycle.status == active (or waiting_for_user, waiting_for_ci, blocked — evaluator can resurrect these on resume).
Run id — for evidence ledger writes (.flow/runs/<run-id>/evidence/). If absent, evaluator infers from the goal's scope.run_id.
Trigger — manual | stop-hook | command. Affects whether judge subprocess runs (Stop hook in evaluator-loop mode auto-runs judge; manual invocation runs judge per the goal's evaluator.type).

Outputs

Updated .flow/goals/<id>.goal.yaml with new lifecycle.last_evaluation and possibly new lifecycle.status.
New *.evidence.yaml sidecars under .flow/runs/<run-id>/evidence/ for each verification command run.
goal-evaluation artifact appended to the linked decision journal.
Updated AC entries: status transitions from pending → evidence_collected → pass | fail; evidence_ref set to the new sidecar path.

Workflow

Step 1: Load contract + ledger

Read .flow/goals/<id>.goal.yaml. Verify schema. Read existing evidence sidecars under .flow/runs/<run-id>/evidence/ for any AC with evidence_ref already set.

Step 2: Run deterministic checks

For each AC that has a verification_command set, and for which either must_pass is true or all-pass evaluation is required:

# Capture combined stdout/stderr and the exit code.
# VERIFICATION_COMMAND stands in for the AC's verification_command value.
OUTPUT=$(mktemp)
bash -c "$VERIFICATION_COMMAND" > "$OUTPUT" 2>&1
EXIT_CODE=$?

Then assemble a FlowEvidence YAML and write via bin/flow-record-evidence.sh:

apiVersion: flow.synapti.ai/v1
kind: FlowEvidence
metadata:
  id: evidence-<AC.id>-eval-<turn>
  goal: <goal-id>
  run_id: <run-id>
  created_at: <now>
evidence:
  type: command_result
  command: <AC.verification_command>
  exit_code: <captured>
  output_ref: <relative path to .txt copy>
  proves:
    - <AC.id>
  limitations:
    - <list from criterion-verification-map's "Does NOT promise" field if present>

Update the AC entry: status: evidence_collected, evidence_ref: <sidecar path>, last_evaluated_at: <now>, last_result: <exit-code or summary>.
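A minimal sketch of Step 2 for a single AC, assuming a single-line command. The function name run_check, its arguments, and the file naming are hypothetical, and the YAML is written directly only for illustration: the actual write goes through bin/flow-record-evidence.sh, whose interface is not shown here.

```shell
# Sketch only: run one verification command and write a FlowEvidence-shaped sidecar.
# run_check, its arguments, and the file names are illustrative assumptions.
run_check() {
  local ac_id="$1" cmd="$2" evidence_dir="$3"
  local output_file exit_code=0
  mkdir -p "$evidence_dir"
  output_file="$evidence_dir/$ac_id.output.txt"
  # Combined stdout/stderr goes to the output copy; the exit code is kept separately.
  bash -c "$cmd" > "$output_file" 2>&1 || exit_code=$?
  # The block scalar below assumes a single-line command.
  cat > "$evidence_dir/$ac_id.evidence.yaml" <<EOF
kind: FlowEvidence
evidence:
  type: command_result
  command: |
    $cmd
  exit_code: $exit_code
  output_ref: $ac_id.output.txt
  proves:
    - $ac_id
EOF
  # Print the captured exit code so the caller can record last_result.
  echo "$exit_code"
}
```

The exit code is printed rather than returned so that a failing check does not abort a caller running under set -e.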

Step 3: Deterministic verdict

After all deterministic checks:

All AC with must_pass: true have exit_code == 0 → status candidate = pass.
Any must_pass: true AC with exit_code != 0 → status candidate = fail.
AC missing verification_command (= fuzzy criterion) → status candidate = incomplete (LLM judge required).
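The three rules above can be sketched as a small reducer. The input format (one AC per line) and the precedence when rules overlap (fail, then incomplete, then pass) are assumptions made for illustration; the function name is hypothetical.

```shell
# Sketch only: input format and rule precedence are assumptions.
# stdin: one AC per line as "<id> <must_pass: true|false> <exit_code, or - when no command>"
deterministic_candidate() {
  local id must_pass exit_code candidate=pass
  while read -r id must_pass exit_code; do
    [ -n "$id" ] || continue
    if [ "$exit_code" = "-" ]; then
      # No verification_command: a fuzzy criterion that needs the judge.
      [ "$candidate" = "fail" ] || candidate=incomplete
    elif [ "$must_pass" = "true" ] && [ "$exit_code" -ne 0 ]; then
      candidate=fail
    fi
  done
  echo "$candidate"
}
```

Under this reading, a failing must_pass AC dominates any remaining fuzzy criteria, and a failing AC with must_pass set to false does not affect the candidate.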

Step 4: Path-boundary check

If the goal has constraints.allowed_paths, run git diff --name-only (current branch vs. base). Any modified file outside allowed_paths → emit a path_boundary_check FlowEvidence with proves: [] and the violating filenames; transition status to blocked with reason path_boundary_violation.
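A rough sketch of the boundary check, assuming each allowed_paths entry is a directory prefix or an exact file path (the real contract may use globs). The function name is hypothetical; the changed-file list would come from git diff --name-only.

```shell
# Sketch only: prefix matching is an assumption about how allowed_paths is interpreted.
# stdin:  changed files, one per line
# args:   allowed path prefixes (or exact file paths)
# stdout: every changed file that falls outside all allowed prefixes
path_boundary_violations() {
  local file prefix allowed
  while read -r file; do
    [ -n "$file" ] || continue
    allowed=0
    for prefix in "$@"; do
      case "$file" in
        "${prefix%/}"/*|"$prefix") allowed=1; break ;;
      esac
    done
    if [ "$allowed" -eq 0 ]; then
      echo "$file"
    fi
  done
  return 0
}
```

Non-empty output would correspond to the path_boundary_violation case; the trailing-slash handling keeps a prefix such as src from matching srcx/.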

Step 5: LLM judge (conditional)

Run the judge subprocess ONLY when:

evaluator.type == hybrid AND deterministic candidate is incomplete (= fuzzy criteria remain), OR
evaluator.type == flow_verdict_judge and the user explicitly invoked /flow:goal evaluate (manual review).
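The two conditions can be expressed as a predicate. This is a sketch under assumptions: the function name and the "explicit" marker for a user-invoked /flow:goal evaluate are hypothetical, not part of the skill's interface.

```shell
# Sketch only: argument conventions are assumptions.
# $1: evaluator.type
# $2: deterministic candidate
# $3: "explicit" when the user ran /flow:goal evaluate, anything else otherwise
should_run_judge() {
  if [ "$1" = "hybrid" ] && [ "$2" = "incomplete" ]; then
    return 0
  fi
  if [ "$1" = "flow_verdict_judge" ] && [ "${3:-}" = "explicit" ]; then
    return 0
  fi
  return 1
}
```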

Spawn Agent(goal-evaluator-judge) with:

The goal's outcome + AC table
The just-written evidence sidecars (paths only — the judge reads them itself)
The transcript-level evidence bundle (Bundle format: plugins/flow/references/evidence-bundle-format.md)
The denied_context list (passed verbatim)

The judge returns verdict + confidence + delta + next_step_hint as a structured table.

Step 6: Compose lifecycle update

| Deterministic candidate | Judge verdict | Final lifecycle.status |
|---|---|---|
| pass (all must_pass green, no fuzzy) | (judge skipped) | achieved |
| pass + fuzzy criteria | achieved | achieved |
| pass + fuzzy criteria | not_achieved | active (continue) |
| fail | (judge may run for context) | active (continue, surface failing AC) |
| incomplete | not_achieved | active |
| incomplete | blocked (with blocker_type) | blocked |
| incomplete | needs_human_review | waiting_for_user |
| path_boundary_violation | (judge skipped) | blocked |
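The table above can be read as a lookup on the pair (deterministic candidate, judge verdict). A minimal sketch, with a hypothetical function name and the literal value "skipped" standing in for a judge that did not run; combinations the table does not list are reported as unmapped rather than guessed.

```shell
# Sketch only: "skipped" and the unmapped fallback are illustrative assumptions.
# $1: deterministic candidate; $2: judge verdict, or "skipped"
final_status() {
  case "$1:$2" in
    path_boundary_violation:*)     echo blocked ;;
    fail:*)                        echo active ;;
    pass:skipped|pass:achieved)    echo achieved ;;
    pass:not_achieved)             echo active ;;
    incomplete:not_achieved)       echo active ;;
    incomplete:blocked)            echo blocked ;;
    incomplete:needs_human_review) echo waiting_for_user ;;
    *) echo "unmapped: $1/$2" >&2; return 1 ;;
  esac
}
```

Note that achieved is a terminal status, so a result of achieved here would be returned as a proposed transition rather than written by the skill.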

Non-terminal transitions (active, blocked, waiting_for_user, waiting_for_ci): Update lifecycle.status, lifecycle.turns_evaluated += 1, lifecycle.last_evaluation = {result, reason, at}. Write back via bin/flow-goal-record.sh immediately.

Terminal transitions (achieved, failed, cancelled) — F10 contract: The skill does NOT write the terminal status itself. Instead, it returns proposed_transition: {to: <achieved|failed|cancelled>, reason: ..., turns_evaluated: ...} in its structured response and leaves the goal's persisted lifecycle.status at its current non-terminal value. The caller is responsible for invoking AskUserQuestion and, on user confirmation, calling bin/flow-goal-record.sh --update-lifecycle to write the terminal status.

The Stop-hook evaluator-loop path is an exception: when the hook calls this skill (or the deterministic path produces a terminal verdict), Tier 2 confirmation cannot run inside the hook (no AskUserQuestion in hook context). The hook persists the verdict via bin/flow-record-verdict.sh and emits a decision: "approve" with a next_step_hint pointing to /flow:goal evaluate <id> — the user explicitly confirms via the command path on the next turn.

Step 7: Record manifest artifact

bin/journal-record.sh --issue {N} --type goal-evaluation \
  --metadata goal_id=<id> \
  --metadata result=<lifecycle.status> \
  --metadata evidence_bundle=<run-dir relative path> \
  --metadata failures=<comma-list of failing AC ids or 'none'>

Step 8: Return the structured verdict to the caller (skill does NOT write)

This skill does NOT write .flow/runs/<run-id>/last-verdict.json. The skill computes the verdict (verdict, confidence, delta, reason, next_step_hint, criterion_results) and returns it to the calling command or hook. The caller is the single owner of verdict persistence.

Callers responsible for the write (one per invocation context):

/flow:goal evaluate <id> (commands/goal.md) — invokes bin/flow-record-verdict.sh after the skill returns. source: "command".
hooks/scripts/flow-goal-evaluator.sh (Stop-hook evaluator-loop mode) — invokes bin/flow-record-verdict.sh via its internal _record_verdict() helper after the judge subprocess returns. source: "evaluator-loop".

Contract:

The skill MUST return verdict + confidence + delta + reason + next_step_hint in its structured response (the format the caller parses for the persistence call).
The skill MUST NOT itself invoke bin/flow-record-verdict.sh. Centralizing persistence in the caller prevents the double-write where the skill wrote first and the command's heredoc immediately overwrote — with the skill's source: "skill" silently lost.
The caller MUST invoke bin/flow-record-verdict.sh and MUST handle helper failure as non-fatal (surface to stderr via ||; do NOT abort the evaluation; the in-memory verdict is still correct, only next-turn delta semantics are lost).

Why this split: Three callers (skill, command, hook) writing through the same helper produced last-writer-wins races. Two callers (command, hook) with no skill-side write is race-free.
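On the caller side, the non-fatal handling in the contract might look like the following. This is a sketch only: record_verdict is a hypothetical stand-in for the caller's invocation of bin/flow-record-verdict.sh, whose arguments are not shown here.

```shell
# Sketch only: record_verdict simulates the persistence helper; it is not the real script.
record_verdict() {
  # Fails when the run directory does not exist or is not writable.
  [ -d "$1" ] && printf '%s\n' "$2" > "$1/last-verdict.json"
}

# $1: run directory; $2: verdict payload already computed in memory
persist_verdict_nonfatal() {
  # A helper failure is surfaced on stderr and the evaluation carries on.
  record_verdict "$1" "$2" || echo "warning: verdict not persisted for $1" >&2
  return 0
}
```

The evaluation result stays valid when the write fails; only the next turn's delta comparison loses its baseline.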

Step 9: Stuck detection (Stop-hook evaluator-loop mode only)

If trigger == stop-hook AND the new pass-set hash matches the previous turn's hash for flow.goals.failAfterStuckTurns consecutive turns (default 3), transition status to failed with reason stuck_no_progress. This prevents the evaluator loop from churning indefinitely on a goal that can't make forward progress.
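A sketch of the stuck check, under stated assumptions: the pass-set hash is computed with POSIX cksum over the sorted passing AC ids, the history is one hash per turn (oldest first), and "N consecutive turns" is read as N consecutive matches against the previous turn. Function names are hypothetical.

```shell
# Sketch only: hashing scheme, history format, and the counting rule are assumptions.
pass_set_hash() {
  # Sort so the hash does not depend on the order the ACs were evaluated in.
  printf '%s\n' "$@" | sort | cksum | cut -d' ' -f1
}

# $1: threshold (flow.goals.failAfterStuckTurns); stdin: per-turn hashes, oldest first
is_stuck() {
  local threshold="$1" hash prev="" streak=0
  while read -r hash; do
    # Count how many turns in a row matched the turn before them.
    if [ "$hash" = "$prev" ]; then
      streak=$((streak + 1))
    else
      streak=0
    fi
    prev=$hash
  done
  [ "$streak" -ge "$threshold" ]
}
```

With a threshold of 3, four identical hashes in a row (three matches) would count as stuck; any change in the pass set resets the streak.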

Anti-patterns

❌ Running LLM judge when deterministic checks suffice — costs money, slower, less reliable.
❌ Updating lifecycle.status without writing a goal-evaluation artifact — breaks audit trail.
❌ Marking an AC as pass without an evidence_ref — bypasses the evidence ledger.
❌ Skipping path-boundary check when allowed_paths is set — goals exist to fence scope.

Reuse map

plugins/flow/skills/criterion-verification-map/SKILL.md — AC → verification command shape.
plugins/flow/agents/verdict-judge.md — independence protocol the LLM judge inherits.
plugins/flow/agents/goal-evaluator-judge.md — the specialized judge this skill dispatches.
plugins/flow/bin/flow-record-evidence.sh — atomic evidence sidecar writes.
plugins/flow/bin/flow-goal-record.sh — atomic goal lifecycle updates.
plugins/flow/bin/flow-record-verdict.sh — last-verdict.json producer; invoked by the callers listed in Step 8, never by this skill.
plugins/flow/references/evidence-bundle-format.md — canonical evidence layout.


Evaluate a FlowGoal against its evidence ledger and update lifecycle status to one of {pass, incomplete, fail, needs_human_review, blocked} by running deterministic verification commands first, then (when stopHookEnforcement=evaluator-loop or explicit /flow:goal evaluate invocation) dispatching the goal-evaluator-judge agent for fuzzy rubric criteria. Use when /flow:goal evaluate is invoked, when the Stop hook fires in evaluator-loop mode, or when /flow:start Phase 4 needs to convert AC evidence into a verdict. This skill MUST be consulted because lifecycle transitions without deterministic evidence enable silent premature completion — the goal contract is only as good as the evaluator that proves or disproves it.

4 stars

development

Updated May 23, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add synaptiai/synapti-marketplace goal-evaluator

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status.

| Scanner | Description | Status |
|---|---|---|
| Trivy | Container and dependency vulnerability scanner | Clean (95%) |
| Semgrep | Static code analysis for vulnerabilities | Clean (95%) |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean (95%) |
| Snyk (dep) | Open source security scanning | Skipped (50%) |
| Socket.dev | Supply chain security analysis | Skipped (50%) |
| VirusTotal | Multi-engine malware detection | Skipped (50%) |
| CrowdStrike | Advanced threat intelligence | Skipped (50%) |
| OSV-Scanner | Open Source Vulnerability database check | Skipped (50%) |
| OWASP Dep-Check | | Skipped (50%) |

Last scanned: May 23, 2026, 4:35 AM (21.9s, 1 file scanned)

SKILL.md

name: goal-evaluator
description: Evaluate a FlowGoal against its evidence ledger and update lifecycle status to one of {pass, incomplete, fail, needs_human_review, blocked} by running deterministic verification commands first, then (when stopHookEnforcement=evaluator-loop or explicit /flow:goal evaluate invocation) dispatching the goal-evaluator-judge agent for fuzzy rubric criteria. Use when /flow:goal evaluate is invoked, when the Stop hook fires in evaluator-loop mode, or when /flow:start Phase 4 needs to convert AC evidence into a verdict. This skill MUST be consulted because lifecycle transitions without deterministic evidence enable silent premature completion — the goal contract is only as good as the evaluator that proves or disproves it.
allowed-tools: Bash, Read, Edit, Agent
agent: general-purpose




Install manually


# Clone the repo
git clone https://github.com/synaptiai/synapti-marketplace.git

# Copy into Claude Code skills folder (global)
cp -r synapti-marketplace/plugins/flow/skills/goal-evaluator ~/.claude/skills/


Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT