plugins/flow/skills/autonomous-workflow/SKILL.md
Execute development workflows through Explore-Plan-Code-Verify phases with task-driven tracking, Tier 1/2/3 action classification, decision journaling, and bounded debug loops. Use when executing any development workflow autonomously or orchestrating multi-step implementation tasks. This skill MUST be consulted because skipping phases causes rework, and unbounded verification loops cause agents to loop forever on unsolvable problems.
npx skillsauth add synaptiai/synapti-marketplace autonomous-workflow

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Foundation skill governing how Claude executes development workflows autonomously.
NO SKIPPING PHASES. Explore before Plan, Plan before Code, Code before Verify. Every phase produces an artifact.
Jumping to code without exploration is the #1 cause of rework. Jumping to "done" without verification is the #1 cause of bugs reaching review.
Every multi-step workflow follows this loop:
- Use goToDefinition to trace code paths from issue keywords to implementation, and findReferences to assess the impact of planned changes. This enhances text-based grep searches with semantic understanding.
- Use hover to understand types and signatures of existing code before modifying it.
- When LSP diagnostics are available (lsp.diagnosticsAsQuality), collect them as an additional quality signal: errors are P1, warnings are P2. LSP diagnostics complement, never replace, CLI-based checks.
b. Runtime: Build the project, start it, verify at runtime. If anything fails, enter the debug-fix-retest loop (bounded by closedLoop.maxDebugIterations).
c. Review: Self-review with fix-forward — fix P1/P2 findings immediately, don't just report them.
d. Verdict: Independent judgment. Dispatch the verdict-judge agent (when verdict.enabled) with acceptance criteria + evidence bundle. The judge has no access to code-writing rationale, diff, or decision journal. It evaluates outcomes, not process. Each criterion receives PASS/FAIL/NEEDS-HUMAN-REVIEW. FAIL verdicts trigger fix loops; NEEDS-HUMAN-REVIEW escalates to user.

Use Task tools as first-class workflow primitives:

| Tool | When |
|------|------|
| TaskCreate | Start of PLAN phase, one task per deliverable |
| TaskUpdate(in_progress) | Before starting work on a task |
| TaskUpdate(completed) | After task passes verification |
| TaskList | At checkpoints to confirm progress |
| TaskGet | Before working on a task to get full context |

Tasks have clear subjects (imperative form) and descriptions with acceptance criteria.
A task may NOT be marked completed (TaskUpdate(taskId, status: "completed")) until ALL of the following conditions are met:
1. All tests pass: both existing tests and any new tests written for this task. If any test fails, the task enters the debug-fix-retest loop and remains in_progress until tests pass or the problem is escalated to the user.
2. Verification evidence captured: the verification command from the task description has been run and its output recorded as evidence for this task's acceptance criterion. Evidence must be collected at task-completion time, not deferred to VERIFY phase.
3. No out-of-context files: all files modified during this task have been classified. Any out-of-context files must be resolved (moved to a separate commit, removed, or explicitly approved by the user) before the task completes.
4. TDD cycle completed: when settings.json → testing.tddMode is enforce (the default), the full RED-GREEN-REFACTOR cycle must be observed.

If any condition is not met, TaskUpdate(completed) is blocked. The workflow must not advance to the next task. This gate is the primary quality enforcement point — the VERIFY phase provides independent confirmation, not first-pass verification.
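
The completion gate above can be sketched as a single check that returns every unmet condition. This is a minimal illustration, not the skill's actual implementation: the `TaskState` class, its field names, and the blocker strings are all hypothetical, and only the four conditions and the `enforce` default come from the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaskState:
    """Hypothetical snapshot of the gate inputs for one task."""
    tests_pass: bool
    evidence_captured: bool
    unresolved_out_of_context_files: List[str] = field(default_factory=list)
    tdd_mode: str = "enforce"  # mirrors testing.tddMode; "enforce" is the documented default
    tdd_cycle_observed: bool = False


def completion_blockers(task: TaskState) -> List[str]:
    """Return the unmet gate conditions; an empty list means the task may be completed."""
    blockers = []
    if not task.tests_pass:
        blockers.append("tests failing")
    if not task.evidence_captured:
        blockers.append("verification evidence missing")
    if task.unresolved_out_of_context_files:
        blockers.append("out-of-context files unresolved")
    if task.tdd_mode == "enforce" and not task.tdd_cycle_observed:
        blockers.append("TDD cycle not observed")
    return blockers
```

A caller would invoke TaskUpdate(completed) only when the returned list is empty, and otherwise keep the task in_progress.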

| Tier | Actions | Behavior |
|------|---------|----------|
| Tier 1 (Autonomous) | Commits, branch creation, file edits, staging | Execute without asking. Local and reversible. |
| Tier 2 (Journal) | Push, PR creation, issue assignment | Execute and log to decision journal. Team-visible but recoverable. |
| Tier 3 (Confirm) | Merge, release, force operations | Always require human confirmation. Non-negotiable. |

Tier configuration is in settings.json under tiers. Actions can be promoted (journal→confirm) but never demoted (confirm→journal).
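
The promote-but-never-demote rule can be illustrated with a small resolver. This is a sketch under stated assumptions: the action keys, the `DEFAULT_TIERS` mapping, and the shape of the overrides dictionary are hypothetical and do not reflect the real settings.json schema. Silently ignoring a demotion is also an assumption; a real implementation might reject the configuration instead.

```python
from enum import IntEnum


class Tier(IntEnum):
    AUTONOMOUS = 1  # execute without asking
    JOURNAL = 2     # execute and log to the decision journal
    CONFIRM = 3     # always require human confirmation


# Defaults follow the tier table; the action keys are hypothetical names.
DEFAULT_TIERS = {
    "commit": Tier.AUTONOMOUS,
    "branch_create": Tier.AUTONOMOUS,
    "push": Tier.JOURNAL,
    "pr_create": Tier.JOURNAL,
    "merge": Tier.CONFIRM,
    "release": Tier.CONFIRM,
}


def effective_tier(action: str, overrides: dict) -> Tier:
    """Apply a configured override only when it promotes the action to a stricter tier."""
    default = DEFAULT_TIERS[action]
    requested = Tier(overrides.get(action, default))
    # max() keeps the stricter of the two, so a demotion request has no effect.
    return max(default, requested)
```

For example, `effective_tier("push", {"push": 3})` yields `Tier.CONFIRM`, while an attempt to lower `merge` to tier 2 leaves it at `Tier.CONFIRM`.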
When a command or skill says "use the AskUserQuestion tool", you MUST invoke the AskUserQuestion tool — do not substitute plain text output. The tool provides structured selectable options that plain text cannot replicate. Supply contextual options appropriate to the situation.
{journal-dir}/issue-{N}.md at branch creation

Journal dir defaults to .decisions/, configurable in settings.
Anti-estimation guard for journal entries: journal entries MUST NOT include calendar-time estimates (weeks, days, hours, sprints, ETAs, "by Friday"). Use t-shirt sizing (S/M/L) only when the user has explicitly asked for size context. Describe work in terms of artifacts and tool calls, not wall-clock duration. See skills/llm-operator-principles/SKILL.md. This guard exists because journal entries are the most common surface where calendar-time framings leak through and anchor downstream deferral.
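
One way to enforce the anti-estimation guard is a lint pass over journal text before it is written. The sketch below is illustrative only: the pattern list is a hypothetical starting point built from the examples named above (weeks, days, hours, sprints, ETAs, "by Friday"), and it will produce false positives on legitimate uses of those words.

```python
import re

# Hypothetical pattern: duration words (optionally preceded by a number),
# "ETA"/"ETAs", and "by <weekday>" phrasings.
CALENDAR_TERMS = re.compile(
    r"\b(?:\d+\s*)?(?:hours?|days?|weeks?|sprints?)\b"
    r"|\bETAs?\b"
    r"|\bby\s+(?:mon|tues|wednes|thurs|fri|satur|sun)day\b",
    re.IGNORECASE,
)


def calendar_time_violations(entry: str) -> list:
    """Return every calendar-time phrase found in a journal entry, in order of appearance."""
    return [match.group(0) for match in CALENDAR_TERMS.finditer(entry)]
```

An entry such as "Refactor auth module; touches 4 files, 2 test suites" passes cleanly, because it describes the work in artifacts and contains no duration wording.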
Dispatch independent operations in a single message:
Quality check loops have max iterations from settings.json. These ceilings are safety nets against true infinite loops, NOT planned stop points — see skills/llm-operator-principles/SKILL.md:
Reaching qualityCheckMaxIterations without convergence is a signal to re-check understanding (are two findings in tension? are you fixing the wrong thing?), not a budget to stop at. Continue iterating until convergence.

| Trigger | Action |
|---------|--------|
| Genuine non-convergence (same findings persist 3+ iterations AND ceiling reached) | File a six-field Proactive-Autonomy escalation per skills/llm-operator-principles/SKILL.md § Genuine non-convergence. Do NOT silently exit the loop. |
| Plan has >10 tasks for a single issue | Decompose the issue first. One PR should not span 10 tasks. |
| EXPLORE phase yields contradictory signals | Stop. Ask the user for clarification before planning. |
| >5 files modified without staging or committing | Stop. What you have should be committable. If not, the tasks are too large. |
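
The first trigger in the table combines two conditions, which is easy to get wrong in practice. A minimal sketch of that check follows, assuming each iteration's findings are tracked as a set of stable identifiers; the function name, the history representation, and the reading of "persist" as "present in each of the most recent iterations" are all assumptions for illustration.

```python
def is_genuine_non_convergence(history, ceiling, persist_threshold=3):
    """Return True only when the ceiling is reached AND some finding persisted.

    history: list of sets of finding IDs, one set per completed iteration.
    ceiling: the configured iteration ceiling for this loop.
    """
    if len(history) < ceiling or len(history) < persist_threshold:
        return False  # ceiling not reached yet: keep iterating
    recent = history[-persist_threshold:]
    # Findings that appear in every one of the most recent iterations.
    persistent = set.intersection(*(set(findings) for findings in recent))
    return bool(persistent)
```

Reaching the ceiling with a different set of findings each time returns False under this sketch, because nothing persisted across iterations.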
The debug-fix-retest loop is mandatory — do NOT report failures and move on, DO fix them yourself.
Minimum verification by project type:
- Whitelisted skips (markdown-only, config-only, dependency-bump-only; see runtime-verification skill): Static checks only, with the specific evidence the whitelist requires. Any other skip requires a Proactive-Autonomy escalation.

The loop is bounded by closedLoop.maxDebugIterations (default 5). After max iterations, escalate to user; never silently skip.
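
The bounded debug-fix-retest loop described above can be sketched as follows. The `verify` and `fix` callables and the `EscalationRequired` exception are hypothetical stand-ins; only the default bound of 5 and the escalate-rather-than-skip behavior come from the text.

```python
from typing import Callable


class EscalationRequired(Exception):
    """Raised when the loop is exhausted; the caller must escalate to the user."""


def debug_fix_retest(
    verify: Callable[[], bool],
    fix: Callable[[], None],
    max_iterations: int = 5,  # mirrors closedLoop.maxDebugIterations
) -> int:
    """Run fix/verify cycles until verification passes.

    Returns the number of fix attempts used. Raises EscalationRequired
    instead of returning silently when the bound is exhausted.
    """
    if verify():
        return 0
    for attempt in range(1, max_iterations + 1):
        fix()
        if verify():
            return attempt
    raise EscalationRequired(
        f"still failing after {max_iterations} debug iterations"
    )
```

Raising an exception rather than returning a failure flag is a deliberate choice in this sketch: it makes it impossible for a caller to ignore exhaustion and move on.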
| Missing | Fallback |
|---------|----------|
| No agent teams | Single-session sequential |
| No quality commands | Attempt to discover them first (Skill(capability-discovery)), then proceed with runtime verification only |
| No LSP server | Fall back to grep-based references and CLI-only diagnostics. No error — LSP is additive. |
| No gh CLI | Warn, continue with git-only |
| No decision journal | Proceed without logging, note in PR |

| Excuse | Response |
|--------|----------|
| "I already know what to do, skip EXPLORE" | Then exploring should take 10 seconds. Do it. |
| "The plan is obvious, no need to TaskCreate" | Untracked work is invisible work. Create the tasks. |
| "Just one more fix, then I'll verify" | Verify now. The loop exists because one-more-fix never ends. |
| "This is too simple for the full loop" | Simple tasks, same phases. Just faster. |
| "Runtime verification isn't possible" | It is, for any project that does something. Build it, run it, check it. |
| "Tests pass, so it works" | Tests verify what's tested. Runtime verifies what's real. |
| "I can't start the server" | Fix why. Server startup failure IS a bug. |
| "Self-review is enough" | Self-review checks code quality. The verdict checks requirements. Both are needed. |
| "I wrote the tests, so the criteria are met" | Tests prove the code does what you thought was wanted. The verdict proves it does what was actually wanted. |

Agents are teammates, not tools waiting for instructions. The operating principle is:
Every escalation to a human MUST follow this structure. Omitting fields is not permitted.

| Field | Purpose |
|-------|---------|
| Situation | What happened: the specific state or finding that requires a decision |
| What I tried | What you attempted before escalating: research, alternatives considered, commands run |
| Options | 2-3 concrete paths forward, each with trade-offs. Label one "(Recommended)" |
| My recommendation | Which option you recommend and why. Never leave this blank |
| Blocking? | Yes (blocks the current command), Soft (advisory), or No (informational). Do NOT use calendar-time language. |
| Risk if wrong | What happens if the chosen option turns out to be the wrong call, and who is affected |

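
The six-field structure lends itself to a validated record, so an incomplete escalation cannot be constructed at all. The sketch below is one possible shape, not the skill's actual format: the `Escalation` class and its attribute names are hypothetical, while the field list, the 2-3 option rule, the "(Recommended)" label, and the Yes/Soft/No values come from the table.

```python
from dataclasses import dataclass, fields
from typing import Tuple


@dataclass(frozen=True)
class Escalation:
    """Hypothetical container for the six required escalation fields."""
    situation: str
    what_i_tried: str
    options: Tuple[str, ...]
    recommendation: str
    blocking: str  # "Yes", "Soft", or "No"
    risk_if_wrong: str

    def __post_init__(self):
        # Omitting fields is not permitted: reject empty or whitespace-only values.
        for f in fields(self):
            value = getattr(self, f.name)
            if not value or (isinstance(value, str) and not value.strip()):
                raise ValueError(f"missing field: {f.name}")
        if not 2 <= len(self.options) <= 3:
            raise ValueError("provide 2-3 options")
        if sum("(Recommended)" in option for option in self.options) != 1:
            raise ValueError('label exactly one option "(Recommended)"')
        if self.blocking not in {"Yes", "Soft", "No"}:
            raise ValueError("blocking must be Yes, Soft, or No")
```

Validating in `__post_init__` means the check runs at construction time, before any escalation text reaches the user.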
- markdown-only, config-only, or dependency-bump-only whitelist
- maxDebugIterations or fixForwardMaxIterations have been fully exhausted AND the agent has re-checked whether findings are in tension or being misunderstood. Approaching a ceiling is not a trigger; the ceiling is a safety net, not a budget. See skills/llm-operator-principles/SKILL.md.
- skills/llm-operator-principles/SKILL.md and references/escalation-format.md.
- fixForwardMaxIterations, reviewCycleLimit, qualityCheckMaxIterations are safety nets, not budgets. Iteration 7 of 10 is the middle of the safety margin, not "the ceiling."

| Anti-Pattern | Description | The Right Way |
|-------------|-------------|---------------|
| Lazy Verification | Tests pass does not equal works. "Theoretically works" is not "actually works." The proof is running it — build it, start it, hit the endpoint, check the output. | Run the code. Capture the output. Show the evidence. |
| Lazy Escalation | Asking the user without trying first. Open-ended questions with no research, no options, no recommendation. "What should I do?" is never acceptable. | Try to resolve it yourself. If you still need input, use the six-field template with your recommendation. |
| Punt-to-User | "What should I do?" or "How should we proceed?" without options. Agents are teammates, not tools waiting for instructions. Every escalation must include 2-3 options with a recommended path. | Present structured options. Label one "(Recommended)." Explain the trade-offs. |
| Silent Deferral | Downgrading findings to avoid escalation, or routing findings to follow-up issues instead of fixing them. A finding that matters enough to mention matters enough to act on. | Fix it. Finding triage is NEVER a valid escalation trigger; see skills/llm-operator-principles/SKILL.md. Default mode does not create follow-up issues for findings. |
| Triage Escalation | Drafting a six-field escalation about a P1/P2/P3 finding to ask whether to fix it. | Stop. Fix the finding. Escalations are for true decisions (product, architecture, irreversible actions), not for work the agent can do. |
| Convergence Surrender | "We hit iteration 3 of fix-forward, escalating remaining findings." | Continue iterating. Iteration ceilings are safety nets, not budgets. The default ceilings (10) are designed to make the LLM converge well before the ceiling, not to stop at it. |
| Calendar Anchoring | "Multi-week effort," "defer to next sprint," "ETA: end of quarter." | Re-frame in tool calls. The LLM operator does not have weeks or sprints. Use t-shirt sizing (S/M/L) only when the user explicitly asks for size. |