plugins/flow/skills/goal-evidence-ledger/SKILL.md
Maintain an append-only evidence ledger as `.flow/runs/<run-id>/evidence/*.evidence.yaml` sidecars (structured metadata) plus matching `.txt` raw-output captures, written exclusively via `bin/flow-record-evidence.sh`. Use when goal-evaluator runs a verification command, when a Stop hook captures a deterministic check, or when /flow:goal evaluate produces a judge report. This skill MUST be consulted because evidence-by-transcript dies with the session — only file-backed, schema-validated sidecars survive across sessions, prove ACs durably, and satisfy the verdict-judge's Independence Protocol (judges only see surfaced evidence, not free-form transcripts).
You record evidence durably. Every assertion about an AC must be backed by a file-backed FlowEvidence sidecar — not a transcript message, not a console log that vanishes when the session ends, not an LLM's recollection. This skill enforces the negative space discipline from evidence-based-development: every evidence entry MUST declare what it does NOT prove.
No AC transitions to pass without a corresponding FlowEvidence sidecar. The sidecar's proves: [<AC.id>] field is the load-bearing link. Without it, the verdict-judge has no surfaced evidence to evaluate and falls back to transcript text — defeating the Independence Protocol.
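The rule above can be pictured as a small gate that a caller might run before moving an AC forward. This is a minimal sketch, not the project's own check: `sidecar_proves` is a hypothetical name, and it assumes `proves:` is written as a block-style YAML list (one `- <AC.id>` per line); a flow-style `proves: [AC1]` would need a real YAML parser.

```shell
# Hypothetical gate: succeeds only if the sidecar file exists and
# lists the given AC id under its block-style "proves:" list.
sidecar_proves() {
  sidecar="$1"
  ac_id="$2"
  [ -f "$sidecar" ] || return 1
  awk -v ac="$ac_id" '
    /^[ \t]*proves:/ { in_list = 1; next }   # start of the proves list
    in_list && /^[ \t]*-[ \t]*/ {            # a list item while inside proves
      item = $0
      sub(/^[ \t]*-[ \t]*/, "", item)
      sub(/[ \t]+$/, "", item)
      if (item == ac) found = 1
      next
    }
    { in_list = 0 }                          # any other line ends the list
    END { exit(found ? 0 : 1) }
  ' "$sidecar"
}
```

A caller could leave the AC untouched whenever `sidecar_proves <sidecar-path> AC1` returns non-zero.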
This skill wraps evidence-based-development (which encodes ASSERTION/EVIDENCE/VERIFIED discipline). It adds:
- … (`_journal_atomic.py`)

If evidence-based-development has produced findings in a session, this skill materializes those findings as `.evidence.yaml` sidecars.
The invoking command/skill MUST pass:
- Evidence ID in the form `evidence-<AC.id>-<descriptor>-<turn>`. Lowercase + digits + `_-`.
- Evidence type from the enum in `evidence.schema.json` (`command_result`, `test_result`, `lint_result`, `runtime_smoke_result`, `visual_result`, `git_diff`, `holdout_validation`, `verdict`, `human_approval`, `review_comment_snapshot`, `ci_status`, `llm_judge_report`, `artifact_check`, `path_boundary_check`).
- `command`, `exit_code`, raw output path, `limitations` list, `negative_cases` list.

Outputs:

- `.flow/runs/<run-id>/evidence/<evidence-id>.evidence.yaml`: structured sidecar.
- `.flow/runs/<run-id>/evidence/<evidence-id>.txt`: raw stdout/stderr capture (when applicable).
- `evidence-captured` artifact in the linked decision journal.
- `.flow/runs/<run-id>/events.jsonl`.

```yaml
apiVersion: flow.synapti.ai/v1
kind: FlowEvidence
metadata:
  id: <evidence-id>
  goal: <goal-id>
  run_id: <run-id>
  activity_id: <activity-id, if any>
  created_at: <ISO-8601 UTC>
evidence:
  type: <enum-value>
  command: <bash command, if applicable>
  exit_code: <captured, if command type>
  output_ref: <relative path to .txt, if captured>
proves:
  - <AC.id>
limitations:
  - <what this evidence does NOT prove; required for non-trivial evidence>
negative_cases:
  - <adversarial cases or boundary conditions tested>
```
Mandatory fields when applicable:
| Evidence type | Mandatory negative-space field | Rationale |
|---|---|---|
| command_result, test_result | limitations | What the command did NOT test (other code paths, edge cases) |
| runtime_smoke_result | limitations + negative_cases | Smoke tests are inherently shallow; surface that explicitly |
| visual_result | limitations | Visual diffs don't catch behavior; name that |
| holdout_validation, verdict | none (already structured) | The verdict format owns its own negative space |
| llm_judge_report | limitations | LLM reasoning is fuzzy; surface confidence band |
A sidecar of type command_result without a limitations field is rejected by the schema (the rejection happens at write time, not at read time — fail fast).
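The table can be encoded as a pre-flight check that runs before a sidecar is handed to the writer. The sketch below is illustrative only: `required_negative_space` and `check_negative_space` are hypothetical names, the check tests only that the key is present (not that the list is non-empty), and types the table does not list are treated as having no requirement, which is an assumption.

```shell
# Prints the negative-space fields the table marks as mandatory for a type.
required_negative_space() {
  case "$1" in
    command_result|test_result|visual_result|llm_judge_report)
      echo "limitations" ;;
    runtime_smoke_result)
      echo "limitations negative_cases" ;;
    *)
      # holdout_validation, verdict: none; other types are not in the table.
      echo "" ;;
  esac
}

# Hypothetical pre-flight check: fails if a mandatory key is absent.
check_negative_space() {
  sidecar="$1"
  ev_type="$2"
  for field in $(required_negative_space "$ev_type"); do
    grep -Eq "^[[:space:]]*${field}:" "$sidecar" || {
      echo "missing ${field} for ${ev_type}" >&2
      return 1
    }
  done
}
```

Running a check like this before the write keeps the failure close to the author of the evidence, in the same fail-fast spirit as the schema rejection.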
Invoke bin/flow-record-evidence.sh:
```shell
bin/flow-record-evidence.sh \
  --run-id <run-id> \
  --evidence-file <path-to-composed-yaml> \
  --raw-output <path-to-stdout-capture>
```
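For orientation, an end-to-end capture might look roughly like the following. Everything concrete here is a placeholder: the run and goal ids, the stand-in `echo` command, the temp directory, and the key nesting of the composed YAML are assumptions, and `output_ref` is written as a sibling file name only for illustration. The helper is called only if it is present, using the three flags shown above.

```shell
# Sketch only: ids and the checked command are placeholders.
run_id="run-example"
goal_id="goal-example"
evidence_id="evidence-AC1-unit-tests-turn1"
workdir="$(mktemp -d)"
raw="$workdir/$evidence_id.txt"
sidecar="$workdir/$evidence_id.evidence.yaml"

# Capture stdout and stderr together, and keep the exit code.
exit_code=0
sh -c 'echo "2 passed"' >"$raw" 2>&1 || exit_code=$?

# Compose the sidecar; the key nesting below is an assumption.
cat >"$sidecar" <<EOF
apiVersion: flow.synapti.ai/v1
kind: FlowEvidence
metadata:
  id: $evidence_id
  goal: $goal_id
  run_id: $run_id
  created_at: $(date -u +%Y-%m-%dT%H:%M:%SZ)
evidence:
  type: command_result
  command: echo "2 passed"
  exit_code: $exit_code
  output_ref: $evidence_id.txt
proves:
  - AC1
limitations:
  - Stand-in command; exercises no real code path
EOF

# Hand both files to the helper when it is available.
if [ -x bin/flow-record-evidence.sh ]; then
  bin/flow-record-evidence.sh \
    --run-id "$run_id" \
    --evidence-file "$sidecar" \
    --raw-output "$raw"
fi
```

Capturing the exit code with `|| exit_code=$?` keeps a failing check from aborting the script, so a failed command can still be recorded as evidence.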
The helper handles:
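- Atomic write (`_journal_atomic.py`)
- Schema validation (if `jsonschema` is available)

The `evidence-captured` journal entry is recorded with:

```shell
bin/journal-record.sh --issue {N} --type evidence-captured \
  --metadata evidence_id=<id> \
  --metadata goal_id=<goal-id> \
  --metadata proves=<comma-list of AC ids>
```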
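As background on the atomic write the helper is described as handling: the internals of `_journal_atomic.py` are not shown here, so the following is only the common temp-file-plus-rename pattern, with a best-effort refusal to overwrite added to reflect the append-only ledger. `atomic_write` is a hypothetical name.

```shell
# Hypothetical stand-in for an atomic writer; content arrives on stdin.
atomic_write() {
  target="$1"
  # Append-only assumption: never replace an existing sidecar (best-effort check).
  if [ -e "$target" ]; then
    echo "refusing to overwrite $target" >&2
    return 1
  fi
  # The temp file lives in the target's directory so the rename
  # stays on one filesystem.
  tmp="$(mktemp "$(dirname "$target")/.tmp.XXXXXX")" || return 1
  if cat >"$tmp" && mv "$tmp" "$target"; then
    return 0
  fi
  rm -f "$tmp"
  return 1
}
```

A reader of the ledger then sees either no file or a complete one, which a plain `echo >` redirect cannot guarantee if the writer is interrupted mid-write.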
The goal-evaluator skill (the typical caller) updates the AC entry's `evidence_ref` in `.flow/goals/<id>.goal.yaml` to point at the new sidecar:

```yaml
acceptance_criteria:
  - id: AC1
    text: '...'
    status: evidence_collected # was: pending
    evidence_ref: .flow/runs/<run-id>/evidence/<evidence-id>.evidence.yaml
    last_evaluated_at: <now>
```
- Writing via `echo > .evidence.yaml` instead of via the helper: bypasses atomicity + schema validation.
- Omitting `limitations` on a `command_result`: claim without scope = useless evidence.
- … `proves: [AC1, AC2]`: the link is bidirectional.
- … (`evidence-AC1-retest-turn2`) and the AC's `evidence_ref` is updated to the new one. The old sidecar stays as audit trail.
- … `achieved`: evidence is captured BEFORE the verdict, not after.

References:

- `plugins/flow/skills/evidence-based-development/SKILL.md`: ASSERTION/EVIDENCE/VERIFIED protocol.
- `plugins/flow/bin/flow-record-evidence.sh`: atomic writer.
- `plugins/flow/schemas/v1/evidence.schema.json`: sidecar schema.
- `plugins/flow/references/evidence-bundle-format.md`: bundle layout the verdict-judge consumes.