plugins/handsonai/skills/design/SKILL.md
This skill should be used when the user has a Workflow Requirements document and wants to design an AI workflow. It gathers architecture decisions, assesses workflow autonomy level, chooses an orchestration mechanism and involvement mode, classifies steps, maps building blocks, identifies skill candidates, configures agents, and produces a Design Spec for approval. Supports both step-decomposed and goal-driven Workflow Requirements. This is Step 3 (Design) of the AI Workflow Framework.
Take a Workflow Requirements document (produced by Step 2 — Deconstruct) and produce the Design deliverable: a Design Spec that captures architecture decisions, autonomy assessment, orchestration mechanism, per-step classifications (step-decomposed) or capability domain mapping (goal-driven), skill candidates, and agent blueprints.
| File | When to read it |
|---|---|
| references/goal-driven-path.md | At Step 1, the moment the Workflow Requirements shows Definition Type: Goal-Driven (or legacy Outcome-Driven) |
| references/spec-template.md | At Step 9, before assembling the Design Spec — the spec's structure exists only in this file |
| references/self-test-checklist.md | At Step 9, before running the self-test — the checklist items exist only in this file |
This SKILL.md deliberately does not restate the spec's section structure or the checklist items. A spec assembled without reading the template will have wrong headings and a wrong spec_version, and Build's frontmatter parse will fail on it.
Source of truth: The Workflow Requirements document is canonical. The Design Spec must NOT restate sections that already exist there (Goal, Metadata, Context Inventory, Acceptance Criteria, Example Scenarios, Human Gates, Steps Overview). Instead, reference the Workflow Requirements file. The Design Spec adds only what Design produces: architecture decisions, per-step or per-domain building-block classifications, skill candidates, agent configurations, integration options, model recommendations, safety mitigations, and implementation order.
Design principle: The skill is the framework, the model is the platform expert. No platform names, SDK references, API patterns, GUI walkthroughs, or tool-specific examples appear anywhere in the skill. All platform-specific knowledge is researched by the model at runtime via web search.
Role: You are an Agentic AI Architect. Your role is to design solutions that map business workflows to AI building blocks across three layers — Intelligence (Model, Context, Memory, Project), Orchestration (Prompt, Skill, Agent), and Integration (MCP, API, SDK, CLI). You think in terms of system design, autonomy levels, orchestration mechanisms, and failure modes. Carry this framing through all of Design.
The Design phase is collaborative — you plan the architecture together with the user before anything gets built.
Collaboration mode — environment-aware: At the start of Design, set expectations based on where the user is running:
- Claude Code (plan mode available): "The Design phase is collaborative. Layer 1 (Architecture) is a quick conversation; before we start the detailed design work in Layer 2, I'll recommend you enter plan mode (shift+tab or /plan) so we plan the spec without writing any files yet. I'll present the full spec for your approval, and only write outputs/[name]/design-spec.md after you approve and exit plan mode."
- Cowork (no plan mode): "The Design phase is collaborative — we'll work through it conversationally. I'll present the full spec for your approval in chat, and only write outputs/[name]/design-spec.md once you say go."
Plan mode is the preferred path where available (Claude Code). Timing: keep Layer 1 conversational, then surface a clear recommendation to enter plan mode before Layer 2. Do not bury the recommendation, and do not tell the user to enter plan mode "at the very start." Plan mode is not available in Cowork, where collaboration is conversational. Either way, never write the Design Spec file before the user approves it (see Step 9/10).
Reviewing the spec in plan mode (Claude Code). In plan mode the harness carries your plan/spec in a plan file and the user approves it through the plan-approval dialog — so when you reach the approval gate, also present the full spec content in the conversation so the user can actually read it. If the user asks "how do I view the plan?", paste the spec inline. Don't leave the spec only in the plan file where the user may not see it.
Read the workflow's manifest (outputs/[workflow-name]/workflow.yaml) to locate the Workflow Requirements and confirm you're working on the right workflow, then read the requirements from the path registered there (normally outputs/[workflow-name]/requirements.md). If the user specifies a file path, use that. If no manifest exists but a legacy flat file (outputs/[name]-requirements.md) does, use the legacy path and offer to migrate it into a workflow folder + manifest first. Otherwise, look for the most recent Workflow Requirements in outputs/.
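The lookup order above can be sketched as a small resolver. This is a minimal illustration, not the skill's implementation: the function name `locate_requirements` and the `source` labels are hypothetical, and the sketch assumes the manifest registers the default `requirements.md` location rather than parsing `workflow.yaml`.

```python
from pathlib import Path


def locate_requirements(outputs_dir, workflow_name, user_path=None):
    """Return (path, source) for the Workflow Requirements, or (None, "missing")."""
    # An explicit path from the user always wins.
    if user_path:
        return Path(user_path), "user"

    outputs = Path(outputs_dir)
    folder = outputs / workflow_name
    if (folder / "workflow.yaml").is_file():
        # Assumption: the manifest registers the default location. A fuller
        # version would read the registered path out of workflow.yaml.
        return folder / "requirements.md", "manifest"

    legacy = outputs / f"{workflow_name}-requirements.md"
    if legacy.is_file():
        # Legacy flat file: usable, but offer to migrate it first.
        return legacy, "legacy"

    if outputs.is_dir():
        # Last resort: the most recently modified requirements file in outputs/.
        candidates = sorted(
            outputs.rglob("*requirements.md"), key=lambda p: p.stat().st_mtime
        )
        if candidates:
            return candidates[-1], "most-recent"
    return None, "missing"
```

Returning the source alongside the path lets the caller decide whether to offer a migration (for `"legacy"`) or to stop and tell the user (for `"missing"`).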
Verify the requirements file exists and is parseable before relying on it. If the file is missing, stop and tell the user — don't proceed against a path that doesn't resolve. Confirm the required headings exist (Goal — accept the legacy heading Outcome in older files — Metadata, Context Inventory, Acceptance Criteria, Example Scenarios, Human Gates, and either Steps Overview/Step Details or the goal-driven Inputs/Rules & Constraints). If any are missing or mis-named, say exactly which are missing and ask the user to re-run /deconstruct or fix the file — don't guess at the contents.
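The heading check above might look like the following sketch. It assumes the requirements file uses Markdown `#`-style headings and that both headings in a pair (for example Steps Overview and Step Details) must be present; the names `find_headings` and `missing_headings` are hypothetical.

```python
import re

# Headings that must always be present; "Goal" may appear as the legacy "Outcome".
ALWAYS_REQUIRED = [
    ("Goal", "Outcome"),
    ("Metadata",),
    ("Context Inventory",),
    ("Acceptance Criteria",),
    ("Example Scenarios",),
    ("Human Gates",),
]
# Assumption: one of these two groups must be present in full.
STEP_HEADINGS = ("Steps Overview", "Step Details")
GOAL_HEADINGS = ("Inputs", "Rules & Constraints")

HEADING_RE = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*#*[ \t]*$", re.MULTILINE)


def find_headings(markdown_text):
    """Collect the text of every Markdown heading in the document."""
    return {match.group(1).strip() for match in HEADING_RE.finditer(markdown_text)}


def missing_headings(markdown_text):
    """Return the names of required headings that are absent, in a stable order."""
    present = find_headings(markdown_text)
    missing = []
    for alternatives in ALWAYS_REQUIRED:
        if not any(name in present for name in alternatives):
            missing.append(alternatives[0])
    has_steps = all(name in present for name in STEP_HEADINGS)
    has_goal = all(name in present for name in GOAL_HEADINGS)
    if not (has_steps or has_goal):
        missing.append("Steps Overview/Step Details or Inputs/Rules & Constraints")
    return missing
```

An empty result means the file is safe to rely on; a non-empty result is the exact list to report back to the user.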
Read the Definition Type field from the Metadata table. If Goal-Driven (or the legacy value Outcome-Driven — treat it as Goal-Driven): STOP — read references/goal-driven-path.md now, in full, before proceeding. It modifies Steps 3–9 and the spec template; do not run the goal-driven path from memory. If Step-Decomposed (or no Definition Type field is present), use the standard step-decomposed path below.
For step-decomposed requirements: Summarize the workflow name, step count, and goal (from the Goal section of the Workflow Requirements — legacy files title it Outcome). Ask the user to confirm before proceeding.
For goal-driven requirements: Summarize the workflow name, goal, and the headline rules and constraints (from the Goal and Rules & Constraints sections). Ask the user to confirm before proceeding.
Before assessing autonomy and orchestration, gather the information needed to make platform-aware recommendations. The approach: one question, then extract everything else from the Workflow Requirements.
a. One question: Where will you use this?
Platform is the only thing not already in the Workflow Requirements. This question is always asked or confirmed explicitly in plain language — never skipped, even when the platform seems obvious from earlier conversation. Most users are non-technical; do not assume they remember saying which tool they use.
Use AskUserQuestion with a short list of the most common options pulled from the platform registry (do not list every offering — keep it to 3–4 choices plus the built-in "Other" escape hatch). Example phrasing:
"Where do you want to use this workflow? Tell me the AI tool you use day-to-day — for example, ChatGPT in your browser, Claude in your browser, Claude Code in your terminal, or something else."
This decides where the final workflow lives and what format I'll build it in.
Mapping the answer to a specific offering — done internally, not asked of the user. When the user names an ecosystem that maps to multiple offerings (e.g., "Claude" → Claude.ai, Claude Code, Claude Agent SDK, Cowork), pick the single best default for a non-technical user (the browser/no-code option in almost all cases — e.g., "Claude" → Claude.ai; "Google" → Gemini web; "OpenAI" → ChatGPT) and confirm back in plain language with an easy correction path:
"Got it — I'll design this for Claude.ai (the browser app you sign into at claude.ai). If you actually meant Claude Code in your terminal, or something else, say so and I'll switch."
Only default to a code-mode offering (Claude Code, Codex, Gemini CLI, an SDK) when there's strong signal in the conversation that the user is writing code. Never ask a non-technical user to disambiguate between technical artifact forms — that's the model's job.
What this resolves for downstream steps:
b. Extract everything else from the Workflow Requirements
After confirming the platform, read the Workflow Requirements and extract:
Tool integrations — from per-step Inputs, Context Needed, and the Context Inventory. Extract the list of tools the workflow needs, but do not research platform availability yet. That happens in Build. Simply list the tools identified.
Trigger/schedule — from the Metadata table. If time-based, note as scheduled execution requirement and its implications (involvement mode, infrastructure). If manual, no action needed.
Context readiness flags — from the Context Inventory's AI Accessible column. Summarize items flagged as Partial or No — these may be structured data, but also documents, transcripts, or reference materials that aren't AI-accessible. These inform step classification — a step that depends on inaccessible context may need:
Browser access — deferred to Build. If any step's Data In references a web portal, CRM login, or authenticated website, flag it during step classification (Step 6) as a "requires browser access" note on that step. Do not ask about it here.
Shareability — deferred to Build. The model asks about team sharing when generating artifacts in the Build phase, not during Design.
c. Present architecture analysis for confirmation
Present a single confirmation block:
"Here's what I found in your Workflow Requirements:
- Platform: [confirmed platform]
- Tools needed: [extracted list]
- Trigger: [extracted trigger] → [implications for involvement mode]
- [Any flags: e.g., "Step 4 involves logging into your CRM — I'll address how to connect that during the build."]
- Context readiness: [count] of [total] context items are not directly AI-accessible. [Brief summary of gaps — may include structured data, documents, transcripts, or reference materials]. These gaps may affect step autonomy and will need resolution before or during Build.
- [Organizational lens: stakeholder implications — different platform access levels, notification needs for handoffs, shareability defaults to "yes"]
Integration availability on [platform] will be researched during the Build phase.
Anything I missed or got wrong?"
d. Downstream propagation — architecture decisions gate subsequent steps:
Packaging is determined later, in Step 5. Once the mechanism is selected, Step 5 proposes a Packaging value based on platform and mechanism (e.g., single skill → Standalone Skill; agent + skills on ChatGPT → Workspace Agent). Do not ask about Packaging during Step 3 — it depends on Step 5's mechanism decision.
Before choosing an orchestration mechanism, assess where the whole workflow sits on the autonomy spectrum. This is the same spectrum used for per-step classification (Step 6), applied at the workflow level.
The autonomy spectrum:
Human ———— Deterministic ———————— Guided ———————— Autonomous
(human-performed) (fixed path) (bounded decisions) (context-driven path)
| Level | Signals | Orchestration implications |
|-------|---------|--------------------------|
| Human | Step requires human judgment, creativity, or physical action; AI cannot perform | No AI artifact — captured as Human step in the Decomposition table |
| Deterministic | Steps always execute in the same order, no branching on output quality, failure = stop or retry same step | Prompt or skill-powered prompt likely sufficient |
| Guided | Some steps involve bounded AI judgment, human steers at checkpoints, sequence is mostly fixed but with bounded flexibility | Skill-powered prompt or agent |
| Autonomous | Executor backtracks, re-invokes based on feedback, adjusts approach on failure, human checkpoints can redirect flow | Agent required |
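As a rough illustration of how the signals in the table narrow down a level, here is a sketch that reduces them to three yes/no observations. The function `assess_autonomy` and its boolean parameters are hypothetical simplifications; the real assessment is a judgment made from the whole workflow, not a lookup.

```python
# Orchestration implications, summarized from the table above.
IMPLICATIONS = {
    "Human": "No AI artifact",
    "Deterministic": "Prompt or skill-powered prompt likely sufficient",
    "Guided": "Skill-powered prompt or agent",
    "Autonomous": "Agent required",
}


def assess_autonomy(ai_can_perform, adapts_on_feedback, bounded_judgment):
    """Pick a level from three simplified signals (an assumed reduction)."""
    if not ai_can_perform:
        return "Human"
    if adapts_on_feedback:
        # Backtracking, re-invoking, or changing approach on failure.
        return "Autonomous"
    if bounded_judgment:
        # Mostly fixed sequence with human steering at checkpoints.
        return "Guided"
    return "Deterministic"
```

The checks run from the strongest signal to the weakest, so a workflow that both adapts on feedback and involves bounded judgment lands on Autonomous.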
Present as a confident assessment with a teaching frame. For most users this is the first time they're hearing the word "autonomy" in this context — introduce the concept briefly before applying it, so the playback educates rather than labels. Example phrasing:
"Now I want to assess how much autonomy this workflow needs. Autonomy is just how much room the AI has to decide what to do next — it runs on a scale from Human (you do all the work) → Deterministic (AI follows a fixed script) → Guided (AI works, you steer at checkpoints) → Autonomous (AI figures out its own path).
Your workflow looks [level] because [1-2 sentence reasoning tied to specific traits of their workflow — e.g., 'each step always runs in the same order and there's no branching based on the AI's output' for Deterministic, or 'the AI generates a draft and you decide if it's good enough to send' for Guided].
Why this matters: the autonomy level shapes what kind of AI building block fits best — a fixed script needs less machinery than something that has to make its own decisions.
Does that match how you want it to work? If you'd rather it be more or less autonomous, say so and I'll adjust."
If the user disagrees, discuss and adjust. The autonomy level chosen here drives the mechanism recommendation in Step 5.
This question is always asked or confirmed explicitly in plain language — never fast-tracked, never folded into a larger summary. Most users are non-technical; do not assume they understand the difference between a "prompt", a "skill", and an "agent" without plain-language framing.
Internal mapping (model-only — do not show this table to the user):
| User-facing label | Internal mechanism | When it fits |
|---|---|---|
| Step-by-step prompt | Prompt | One-off workflow, user copy-pastes instructions and runs them manually |
| Reusable skill | Skill-Powered Prompt | Repeated workflow with similar inputs, user triggers by name when needed |
| Agent | Agent | Tool use, autonomous decisions, multi-step reasoning, or scheduled/unattended runs |
How to present this to the user — recommendation first, then alternatives. Pick the best fit based on the autonomy assessment and how often the workflow will run, then use AskUserQuestion with three plain-language options (recommended option first, marked "(Recommended)"). Example phrasing for the question text:
"Now the big choice: how do you want to run this workflow day-to-day? There are three common shapes — I'll explain each, then recommend the one that fits you best.
- Step-by-step prompt — A set of instructions you copy and paste into your AI tool each time. Lowest setup, no install. Best for one-off workflows you won't repeat often.
- Reusable skill — A saved set of instructions you trigger by name (e.g., 'run the weekly review skill'). The AI loads them automatically when the situation matches. Best for workflows you'll run repeatedly with similar inputs.
- Agent — A system that drives the whole workflow end-to-end on its own, calling tools and making decisions as it goes. Best when you want it to run on a schedule, handle decisions autonomously, or coordinate multiple steps without you in the loop.
Based on your [autonomy level] workflow and the fact that [1-sentence signal from the workflow — e.g., 'you'll run this every Friday'], I recommend a reusable skill.
Why this matters: this choice shapes the file the next step actually builds — a prompt is just instructions you keep handy, a skill is a saved capability the AI can invoke by name, and an agent is a system with its own decision-making. Each fits a different way of working.
Which shape works for you?"
Use AskUserQuestion with three options (recommended first, marked "(Recommended)"). Option labels:
If the user pushes back, discuss in plain language — never drop into the internal jargon (Prompt / Skill-Powered Prompt / Agent) when talking to them.
Artifact form is resolved internally, not asked. Once platform + mechanism are confirmed, the model picks the specific artifact form (e.g., a SKILL.md file, a Claude Code subagent markdown, an Agent SDK Python script, a ChatGPT Workspace Agent) using the platform's mode field in the registry (code vs guided) and the user's apparent technical level. Default to the simplest no-code option for that platform. Never ask a non-technical user to pick between technical artifact forms. Build generates the right artifact from the platform + mechanism the model recorded.
Human Involvement — derive internally, mention only as plain language. Determine the involvement mode (Augmented vs Automated) from the trigger (manual = Augmented; scheduled/unattended = Automated). Mention it to the user in plain language as part of the Layer 1 confirmation ("Who's in the loop") — do not ask a separate question.
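The two internal derivations in this step (user-facing label to mechanism, trigger to involvement mode) can be written down as a short sketch. The trigger strings and the choice to raise on an unrecognized trigger are assumptions made for illustration.

```python
# User-facing label -> internal mechanism, from the internal mapping table.
LABEL_TO_MECHANISM = {
    "Step-by-step prompt": "Prompt",
    "Reusable skill": "Skill-Powered Prompt",
    "Agent": "Agent",
}


def involvement_mode(trigger):
    """Derive the involvement mode from the workflow's trigger."""
    normalized = trigger.strip().lower()
    if normalized == "manual":
        return "Augmented"
    if normalized in ("scheduled", "unattended"):
        return "Automated"
    # Assumption: anything else is surfaced rather than guessed at.
    raise ValueError(f"Unrecognized trigger: {trigger!r}")
```

Neither value is asked of the user; both are recorded internally and played back in plain language at the Layer 1 confirmation.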
Single-agent vs. multi-agent is an architecture detail decided during Agent Configuration (Step 8) if Agent is selected — not a top-level choice here.
Who is the orchestrator? (read before designing any Agent workflow.) On Claude Code or Cowork — any platform with a primary agentic loop — the primary session IS the orchestrator. Do not design a separate "orchestrator agent" file. What you build are:
disable-model-invocation: true for a user-triggered workflow; no context: fork — it must run in the primary loop so it can dispatch sub-agents) and/or a CLAUDE.md run section the primary loop follows (scan/classify/dispatch/label/summarize, etc.). This is not an "agent" artifact. Prefer a skill over a legacy slash command: custom commands are merged into skills, so a skill still invokes as /name but adds portability (agentskills.io), distribution via skill tooling, and a supporting-files directory. Naming convention: the orchestrator skill takes the workflow name; component/worker artifacts (synthesizers, researchers, etc.) take capability-specific names — so the one user-facing entry point never collides with a sub-skill (a same-named skill silently shadows everything else, including any command).

.claude/agents/*.md).

The agent artifacts you generate are always the workers the orchestrator delegates to, never the orchestrator itself. Reserve a standalone, self-running "agent" artifact for SDK platforms where you deploy the agent process yourself. (This is the single most common design mistake on Claude Code: inventing an orchestrator agent when the primary loop already is one.)
Fast-track for complete Workflow Requirements: If the Workflow Requirements + conversation context provide enough information to resolve the autonomy level, tool extraction, and step classifications, you may present those internal/technical dimensions as a single summary block instead of stepping through questions one at a time.
Platform (Step 3a) and Mechanism (Step 5) are never fast-tracked. They are always asked or confirmed explicitly in plain language, in their own discrete confirmations, even when the answer seems obvious from earlier conversation. Non-technical users must see and approve these two choices on their own — they should not be embedded inside a larger summary block.
Packaging decision: Pick the Packaging value from platform + mechanism (single skill → Standalone Skill; multiple related artifacts → Plugin; ChatGPT with agent + skills → Workspace Agent; ad-hoc files → Loose Files). Include it in the playback below — but always pair the technical label with a plain-language explanation so the user learns what it means.
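The packaging rule above can be sketched as a small decision function. The parameter names (`artifact_count`, `related`) and the order in which the rules are checked are assumptions; the source only lists the four outcomes.

```python
def propose_packaging(platform, mechanism, artifact_count, related=True):
    """Propose a Packaging value from platform + mechanism (illustrative only)."""
    # ChatGPT with an agent plus skills -> Workspace Agent.
    if platform == "ChatGPT" and mechanism == "Agent" and artifact_count > 1:
        return "Workspace Agent"
    # A single skill -> Standalone Skill.
    if mechanism == "Skill-Powered Prompt" and artifact_count == 1:
        return "Standalone Skill"
    # Multiple related artifacts -> Plugin.
    if artifact_count > 1 and related:
        return "Plugin"
    # Anything else is treated as ad-hoc files.
    return "Loose Files"
```

Whatever value comes out still needs its plain-language explanation in the playback; the label alone is not enough.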
Before confirming Layer 1, walk four safety questions. This matters most when the workflow writes to live systems, runs unattended, or consumes content the user didn't author — exactly the workflows non-technical users are most likely to deploy and forget. Keep it plain-language and proportionate; for a read-only, human-triggered workflow this is one sentence, not an interrogation.
Present findings in plain language as part of the Layer 1 confirmation below ("Safety: this workflow can create drafts in your email — it will never send without you"). Record them in the spec's Safety & Permissions section (see the template). If untrusted input meets write access with no human gate between them, say so plainly and recommend one — that combination is how prompt-injection incidents happen.
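The one combination called out above (untrusted input meeting write access with no human gate in between) is simple enough to express as a check. This is a sketch under assumed inputs: the three boolean parameters and the wording of the finding are hypothetical.

```python
def safety_findings(reads_untrusted_input, has_write_access, has_human_gate):
    """Return plain-language findings for the Safety & Permissions section."""
    findings = []
    if reads_untrusted_input and has_write_access and not has_human_gate:
        # The prompt-injection risk pattern: recommend a gate explicitly.
        findings.append(
            "Untrusted input reaches write access with no human gate: "
            "recommend adding a human review step between them."
        )
    return findings
```

For a read-only, human-triggered workflow the list is empty, which matches the guidance to keep the safety pass proportionate.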
Layer 1 confirmation — hard gate, rich playback in plain English (after Step 5b, before moving to Step 6):
This is a hard gate. Do not proceed to Step 6 without explicit user approval here. This is also a teaching moment — play back the full design analysis so the user can see and learn the building blocks involved, not just rubber-stamp a stripped-down summary.
By this point the user has already confirmed where (Step 3a) and how it runs (Step 5) in their own discrete confirmations. This gate plays the full architecture analysis back so they can verify, learn the vocabulary, and redirect anything that's wrong before any detailed decomposition work begins.
How to write the playback: Use the technical term, then immediately explain it in plain language in the same line. Never drop a bare technical label on its own. Every row teaches as it confirms.
For step-decomposed workflows:
"Here's the design analysis based on your workflow definition. I'll explain each piece as I go — push back on anything that's off:
- Platform: [Claude.ai] — the [browser app you sign into at claude.ai]. This is where your workflow will live.
- Packaging: [Standalone Skill] — a [single self-contained set of instructions you upload once and reuse]. (Other options: Plugin, Workspace Agent, Loose Files — yours is Standalone Skill because [reason].)
- Autonomy level: [Guided] — meaning [AI handles most of the work, you steer at key checkpoints]. (The scale runs Human → Deterministic → Guided → Autonomous.)
- Mechanism: [Skill-Powered Prompt] — the [reusable skill you confirmed in the last step]. Runs in [Augmented] mode, which means [you're in the loop reviewing at checkpoints, not running on a schedule].
- Safety: [one-line summary of the Step 5b findings — e.g., 'this workflow can create drafts in your email; it never sends without your review']
- Tools needed: [list] — these are the external services your workflow will touch. I'll figure out exact integration options (MCP server, API, CLI, SDK) during Build.
- Steps classified: [N steps — brief summary, e.g., '6 steps: 2 use AI directly, 3 are reusable skills, 1 is a human review']
- Skill candidates: [list of skill names you'll be building, with one-line purpose each]
- Agent blueprints: [summary if any agents are involved, or 'None — this workflow doesn't need an agent']
Is this right? If yes, I'll work out the step-by-step details next. If anything's off — even small wording — tell me what to change."
For goal-driven workflows, use the playback substitutions in references/goal-driven-path.md.
Wait for explicit approval ("yes", "looks good", "go ahead", etc.) before moving to Step 6. If the user pushes back, revise the relevant decision (which may mean reopening Step 3a or Step 5) and re-present this gate.
Why every row pairs jargon + plain English: The Design skill is also an education tool. Users who run it repeatedly should start recognizing terms like "Standalone Skill", "Augmented", "Guided" — but only because they've seen them explained in context, not because they were dumped on them as labels. This playback is where that learning happens.
For every refined step, classify across all three building-block layers plus autonomy and role.
Per-step classification dimensions:
Integration layer blocks:
| Block | Description | Tag |
|-------|-------------|-----|
| MCP | Model Context Protocol server | Use existing / Build new |
| API | REST, GraphQL, or other web API | Use existing |
| SDK | Client library / framework | Use existing / Build new (rare) |
| CLI | Command-line tool | Use existing |
Most integration blocks are "use existing." "Build new" applies primarily to MCP (custom data sources) and rarely to SDKs.
Plain-language gloss (for non-technical users — explain these the first time they come up):
Intelligence layer blocks:
| Block | Description | Per-step classification |
|-------|-------------|----------------------|
| Model | Which model capability | Reasoning-heavy / Fast / Vision |
| Context | Files, docs, libraries needed | List specific sources |
| Memory | Persistent state across runs | Yes / No + what's stored |
| Project | Workspace or project scope | Yes / No |
Per-step classification table format:
| Step | Orchestration | Integration (use/build) | Intelligence | Human Gate |
|------|--------------|------------------------|--------------|------------|
| Pull calendar events | Skill | MCP: Google Calendar (use) | Model: fast | No |
| Generate coaching questions | Agent | — | Model: reasoning; Context: powerful-questions.md | Yes |
| Save prep notes | Skill | CLI: git (use) | Model: fast | No |
Each row captures one step. The Orchestration column shows the block from that layer. The Integration column lists block(s) with use/build tags, or "—" if the step needs no external tool access. The Intelligence column lists applicable blocks with their per-step classification values.
Additionally, for each step record the autonomy level and role (these appear in the full spec output but are omitted from the compact table above for readability).
If a step's inputs include items flagged as "No" or "Partial" in the Context Inventory, note this in the classification. A step classified as Autonomous but dependent on inaccessible data should be flagged: "Autonomy contingent on resolving data access for [item]."
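The flagging rule above can be sketched as follows. The function name `access_notes`, the dictionary shape for the Context Inventory, and the wording of the non-Autonomous note are assumptions; only the "Autonomy contingent on..." sentence comes from the rule itself.

```python
def access_notes(autonomy, inputs, ai_accessible):
    """Return classification notes for inputs that are not fully AI-accessible.

    ai_accessible maps a context item to "Yes", "Partial", or "No".
    Assumption: items missing from the map are treated as accessible.
    """
    notes = []
    for item in inputs:
        status = ai_accessible.get(item, "Yes")
        if status not in ("No", "Partial"):
            continue
        if autonomy == "Autonomous":
            notes.append(f"Autonomy contingent on resolving data access for {item}.")
        else:
            notes.append(f"Input '{item}' is flagged {status} in the Context Inventory.")
    return notes
```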
Present the mapping as a clear table. Walk through reasoning for non-obvious classifications. Ask if the user wants to adjust anything.
Integration Discovery
After classifying every step, recommend available integration options for each tool need identified in the Integration layer. This helps students who don't know what CLIs, APIs, MCP servers, or SDKs exist for a given tool.
Discovery process (4-part chain):
Curated tool catalog — Resolve the platform registry local-first: if this skill is installed as part of the handsonai plugin, read the local copy at ${CLAUDE_PLUGIN_ROOT}/registries/platform-registry.json; otherwise (standalone install) fetch the remote copy from https://raw.githubusercontent.com/jamesgray-ai/handsonai/main/plugins/handsonai/registries/platform-registry.json. Cache whichever copy you load for the rest of the session. Match workflow tool needs against each curated-tools entry's integrations field. Curated tools are instructor-vetted recommendations — present them first, marked as recommended.
Model knowledge — Supplement with additional integration options the model knows about. For well-known integrations (Google Calendar, Gmail, Slack, GitHub, etc.), skip web search — model knowledge is sufficient.
Integration registries — Read the integration-registries list from the same cached registry JSON. For each cataloged source, search for integrations matching the tool need:
{
"integration-registries": [
{
"name": "Context7",
"type": "mcp",
"tool": "query-docs",
"notes": "Library docs, API references, SDK docs via MCP"
},
{
"name": "context-hub",
"type": "local",
"check": "context-hub --version",
"notes": "Community-maintained integration registry (CLI)"
},
{
"name": "MCP Registry",
"type": "web-search",
"url": "https://mcpregistry.dev",
"notes": "MCP server directory"
}
]
}
MCP tool availability: Before querying an MCP-type registry source (e.g., Context7), check the user's configured MCP servers. If the required MCP server is not configured, skip it and proceed to the next source in the chain.
Web search (validation + fallback) — For less common tools, when uncertain, or when no match is found in prior steps, search the web to verify existence and find current docs. Catches new releases and uncataloged tools. Batch searches when multiple tool needs are identified to avoid latency.
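The MCP availability check in the chain above can be sketched as a filter over the `integration-registries` entries. It assumes a registry source's `name` is what identifies the MCP server in the user's configuration, which the source does not state; `usable_sources` is a hypothetical helper.

```python
def usable_sources(registries, configured_mcp_servers):
    """Drop MCP-type registry sources whose server is not configured.

    Assumption: a source's "name" matches the configured MCP server name.
    """
    usable = []
    for source in registries:
        if source.get("type") == "mcp" and source.get("name") not in configured_mcp_servers:
            # Skip and let the chain fall through to the next source.
            continue
        usable.append(source)
    return usable


# Entries mirroring the sample registry shown earlier.
SAMPLE_REGISTRIES = [
    {"name": "Context7", "type": "mcp", "tool": "query-docs"},
    {"name": "context-hub", "type": "local", "check": "context-hub --version"},
    {"name": "MCP Registry", "type": "web-search", "url": "https://mcpregistry.dev"},
]
```

With no MCP servers configured, only the local and web-search sources remain, and the chain continues with those.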
Latency management: Use judgment about when web search adds value. Well-known integrations (Google Calendar, Gmail, Slack, GitHub) don't need validation searches. Reserve web search for new or niche tools.
Precedence rule: When web search results contradict model knowledge (e.g., model proposes an MCP server that web search reveals was deprecated), web search takes precedence. Flag the discrepancy and present only verified options.
Fallback ladder (never hard-fail). Any of the lookups above can fail — the local registry may be absent (standalone install), the remote JSON may be unreachable, or web search may be unavailable on the platform. Degrade gracefully in this order, and tell the user what was degraded: local plugin copy → session cache → remote fetch → model knowledge → web search → best-effort note. If you end on model-knowledge-only or best-effort, add a one-line flag like "Integration options below are unverified (registry/web unavailable) — confirm before relying on them." Never block Design because a fetch failed.
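A minimal sketch of the fallback ladder, assuming each rung is a callable that returns a result or `None` (or raises on failure). The helper names `local_plugin_copy` and `run_ladder` and the returned dictionary shape are hypothetical; the rung labels and the `CLAUDE_PLUGIN_ROOT` path come from the text above.

```python
import json
import os
from pathlib import Path

# Ending on either of these means the options were never verified.
UNVERIFIED_RUNGS = {"model knowledge", "best-effort note"}


def local_plugin_copy():
    """First rung: the registry bundled with the plugin, if it is installed."""
    root = os.environ.get("CLAUDE_PLUGIN_ROOT")
    if not root:
        return None
    path = Path(root) / "registries" / "platform-registry.json"
    if not path.is_file():
        return None
    return json.loads(path.read_text(encoding="utf-8"))


def run_ladder(rungs):
    """Try (label, loader) pairs in order; never raise, always report degradation."""
    degraded = []
    for label, loader in rungs:
        try:
            result = loader()
        except Exception:
            result = None  # a failed rung degrades, it does not block Design
        if result is not None:
            return {
                "source": label,
                "result": result,
                "degraded": degraded,
                "unverified": label in UNVERIFIED_RUNGS,
            }
        degraded.append(label)
    return {"source": "best-effort note", "result": None,
            "degraded": degraded, "unverified": True}
```

The `degraded` list is what gets reported to the user, and `unverified` signals when the one-line "unverified" flag should accompany the integration options.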
Matching semantics: Matching is model-driven, not exact string matching. The model reads the workflow's tool needs (e.g., "Google Calendar access" from the step classification) and matches them against the integrations array values (e.g., "google-calendar") using semantic understanding. This allows natural language tool needs to match standardized integration tags without requiring exact normalization.
Presentation format:
For step-decomposed: **[Tool] access needed (Steps N, M):**
For goal-driven: **[Tool] access needed (Domains: X, Y):**
[Tool] access needed ([Steps N, M / Domains: X, Y]):
Curated (recommended):

| Block | Option | Trade-off |
|-------|--------|-----------|
| MCP | [Name] MCP | Easiest — plug-and-play |
| CLI | [Name] CLI | Good for automation/scripting |

Also available:

| Block | Option | Trade-off |
|-------|--------|-----------|
| API | [Name] REST API | Most flexible, more code |
| SDK | [Name] Client Library | Best DX for code-heavy builds |
Recommendation: [block] for [rationale]
Layer 2 confirmation moment (after Step 6, before Skill Discovery and Component Blueprints):
The decomposition is complete. Before generating detailed component blueprints (the most expensive work to redo), confirm the L2 decisions are right:
"Decomposition confirmed:
- Steps requiring new skills: [count] — [list step IDs and proposed skill names]
- Steps using existing skills: [count] — [list step IDs and existing skill names]
- Steps as inline prompts: [count] — [list step IDs]
- Steps requiring agents: [count] — [list]
- Human-performed steps: [count] — [list]
Moving to Layer 3 — Component Blueprints. I'll write the field-level spec for each new skill and agent. Confirm to proceed, or push back on the decomposition."
If the user pushes back, revise the L2 decomposition (and possibly L1 if the disagreement is architectural). Re-confirm before proceeding. Unlike the L1 confirmation, which is a hard gate, this one is lightweight, but it's the last cheap moment to catch decomposition mistakes before the detailed spec work.
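The counts in the Layer 2 confirmation can be assembled from the step classifications with a simple grouping. This is a sketch: the category labels and the `(step_id, category)` input shape are assumptions chosen to mirror the five bullets in the confirmation message.

```python
# Categories in the order the Layer 2 confirmation lists them.
CATEGORIES = ["new skill", "existing skill", "inline prompt", "agent", "human"]


def summarize_decomposition(steps):
    """Group step IDs by category; steps is a list of (step_id, category)."""
    grouped = {category: [] for category in CATEGORIES}
    for step_id, category in steps:
        grouped[category].append(step_id)
    return grouped


def confirmation_lines(grouped):
    """Render one 'category: count (ids)' line per category."""
    return [
        f"{category}: {len(ids)} ({', '.join(ids) if ids else 'none'})"
        for category, ids in grouped.items()
    ]
```

Keeping empty categories in the output makes it obvious to the user when, for example, no agents are involved.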
For every step classified as needing a Skill in Step 6, search for existing skills before assuming one needs to be built.
Search order:
Local skills — Search the user's own .claude/skills/, plugin skills directories, and any project-level skill directories. These are pre-vetted and can be recommended directly.
External registries — Read the skill-registries list from the platform registry (same local-first resolution and session cache as Integration Discovery above: plugin-local copy at ${CLAUDE_PLUGIN_ROOT}/registries/platform-registry.json first, remote fetch for standalone installs).
This provides a curated, always-current list of sites to search. For each registry, search for skills matching the step's requirements.
```json
{
  "skill-registries": [
    {
      "name": "skills.sh",
      "type": "web-search",
      "url": "https://skills.sh",
      "notes": "Community skill marketplace"
    },
    {
      "name": "Context7",
      "type": "mcp",
      "tool": "query-docs",
      "notes": "Library docs and skills via MCP"
    }
  ]
}
```
New registries are added by pushing to the JSON file — all users get them on their next plugin update (or immediately on standalone installs that fetch the remote copy).
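The local-first resolution with a session cache might look like the following Python sketch. The function name `load_skill_registries`, the `fetch_remote` callback, and the module-level cache are assumptions made for illustration; the remote location is not stated in this section, so the caller supplies it. A fetch or parse failure returns an empty list, which is the cue to move on to the web search fallback.

```python
import json
import os
from pathlib import Path

_session_cache = {}  # one successful load per session, keyed by registry path

def load_skill_registries(fetch_remote, plugin_root=None):
    """Return the skill-registries list: plugin-local copy first, remote fetch otherwise.

    fetch_remote is a caller-supplied function (hypothetical) that returns the
    registry JSON as text; it is only called when no local copy exists.
    """
    root = plugin_root or os.environ.get("CLAUDE_PLUGIN_ROOT")
    local = Path(root) / "registries" / "platform-registry.json" if root else None
    key = str(local)
    if key in _session_cache:
        return _session_cache[key]
    try:
        if local is not None and local.is_file():
            raw = local.read_text(encoding="utf-8")
        else:
            raw = fetch_remote()
        registries = json.loads(raw).get("skill-registries", [])
    except (OSError, ValueError):
        # Fetch or parse failure: return nothing so the caller falls back to web search.
        return []
    _session_cache[key] = registries
    return registries
```

Failures are deliberately not cached, so a later call in the same session can retry the fetch.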
Web search fallback — If no match found in cataloged registries, or if the registry fetch fails, search the web for community skills that could fulfill the step. This also catches new skill registries not yet in the catalog.
User approval gate — Present all discovered skills as candidates, clearly separated into pre-vetted local skills and external skills that require review.
Presentation format:
For each step (or capability domain, for goal-driven workflows) that needs a skill, present candidates in a table:
[Step 3 / Domain: Research] needs a skill: "Format coaching prep notes"

| Source | Skill | Status |
|--------|-------|--------|
| Local | `coaching-prep-notes-assembly` (your plugin) | Pre-vetted — include? |
| skills.sh | `markdown-document-builder` by @community | Requires review — [link] |
| Web search | `doc-formatter` on GitHub | Requires review — [link] |
| None found | Build new | Fallback |

External skills run with model permissions. Review source code before approving.
If no suitable existing skill is found for a step, tag that step as "build new" — it flows into Step 7 (Identify Skill Candidates).
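The candidate table and the "build new" fallback can be sketched as below. The function names and argument shapes are hypothetical; the only behavior taken from the text is that local skills are pre-vetted, external ones require review, and a step with no approved skill is tagged "build new".

```python
def candidate_rows(local, external):
    """Build (source, skill, status) rows for one step, in search order.

    local is a list of skill names; external is a list of (source, name) pairs.
    Both shapes are illustrative, not a format defined by the skill.
    """
    rows = [("Local", name, "Pre-vetted") for name in local]
    rows += [(source, name, "Requires review") for source, name in external]
    # The fallback row is always offered so the user can reject every candidate.
    rows.append(("None found", "Build new", "Fallback"))
    return rows

def tag_step(approved_skill):
    """Tag the step after the approval gate; None means no candidate was approved."""
    return f"use existing: {approved_skill}" if approved_skill else "build new"
```

For example, `tag_step(None)` yields `"build new"`, which is the tag that routes a step into Step 7.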
For steps where Skill Discovery (Step 6b) found an existing skill, skip to the next step.
This step only applies to steps tagged "build new" in Step 6b. Tag those steps that should become skills. For each skill candidate, gather all 12 fields during the collaborative session so the spec is complete on first write (the field-by-field format lives in references/spec-template.md):
(When orchestration mechanism is Agent.) For each agent the workflow needs, gather all 13 fields during the collaborative session so the spec is complete on first write (the field-by-field format lives in references/spec-template.md):
| Field | What to specify |
|-----------|----------------|
| ID | Stable agent ID (A1, A2, …) |
| Name | Unique agent name (lowercase-hyphenated, matches the agent filename without extension) |
| Description | ≤1024 chars, MUST start with "Use this agent when..." — verbatim text for the agent file frontmatter; drives invocation |
| Mission | One-sentence primary purpose |
| Responsibilities | Bulleted list of what the agent does once invoked |
| Output Format | Structured description of what the agent's output should look like |
| Tone & Style | Voice and register (e.g., "concise, technical, no hedging") |
| Constraints | Must-not-dos, scope boundaries, source restrictions |
| Model | Capability tier: reasoning-heavy / fast / vision |
| Memory Scope | user / project / local / none — cross-session learning scope. Heuristic: default none; choose memory only when the workflow genuinely benefits from cross-run state (tracking an entity over time, learned user preferences). Avoid memory for research/freshness workflows — stale recall becomes a liability when each run should re-gather current data. When the "learning" should be human-visible/editable, prefer a curated context file over opaque agent memory. |
| Tools | External tools the agent needs (reference Integration Options entries by tool name) |
| Skills | Skill IDs the agent has access to (S1, S2, …) |
| Trigger Examples | 2-3 structured examples (context → user message → expected behavior → invocation) — Build uses these verbatim as <example> blocks in the description |
The build skill maps these to platform-specific fields at runtime (e.g., "reasoning-heavy" → opus on Claude Code, trigger examples → <example> blocks).
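Several of the field rules in the table are mechanical enough to check before the spec is assembled. The Python sketch below assumes a blueprint held as a dict with hypothetical keys (`name`, `description`, `model`, `memory_scope`, `trigger_examples`); the name pattern allowing digits is also an assumption. It covers only the rules stated in the table and is not the self-test checklist.

```python
import re

MEMORY_SCOPES = {"user", "project", "local", "none"}
MODEL_TIERS = {"reasoning-heavy", "fast", "vision"}

def validate_agent(agent):
    """Return a list of problems found in an agent blueprint dict (empty if none)."""
    problems = []
    if not re.fullmatch(r"[a-z0-9]+(-[a-z0-9]+)*", agent.get("name", "")):
        problems.append("name must be lowercase-hyphenated")
    description = agent.get("description", "")
    if len(description) > 1024:
        problems.append("description exceeds 1024 chars")
    if not description.startswith("Use this agent when"):
        problems.append('description must start with "Use this agent when..."')
    if agent.get("model") not in MODEL_TIERS:
        problems.append("model must be a capability tier")
    # Memory scope defaults to "none", matching the heuristic in the table.
    if agent.get("memory_scope", "none") not in MEMORY_SCOPES:
        problems.append("memory_scope must be user, project, local, or none")
    if not 2 <= len(agent.get("trigger_examples", [])) <= 3:
        problems.append("expected 2-3 trigger examples")
    return problems
```

Returning every problem at once, instead of raising on the first, suits the collaborative session: the user can fix all fields in one pass.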
For multi-agent: orchestration pattern, agent handoffs, human review gates — see the Multi-Agent Configuration section in references/spec-template.md.
The Workflow Requirements already includes Acceptance Criteria (what good looks like, dimensions that matter, minimum bar) and Example Scenarios (3-5 representative inputs with what to look for, plus Golden Examples where the user supplied them) from the Deconstruct step. Do not ask the user to re-state these.
Confirm them briefly:
"Your Workflow Requirements includes Acceptance Criteria and [N] Example Scenarios ([M] with golden examples). These feed directly into Step 5 (Test). Anything to add or adjust before I generate the Design Spec?"
If the user adds or adjusts anything, update the Workflow Requirements file (not the Design Spec) — that file remains the canonical source of acceptance criteria and test scenarios. The Design Spec references them by file path; it does not duplicate them.
If the Workflow Requirements is missing Acceptance Criteria or Example Scenarios entirely (which shouldn't happen if Deconstruct was run), pause and ask the user to run /deconstruct again or fill them in manually before continuing.
STOP — do not assemble the spec from memory. Read references/spec-template.md now. The spec's exact section order, heading names, frontmatter schema, and spec_version literal exist only in that file. A from-memory spec will have drifted headings that break Build's parse. For goal-driven workflows, also apply the template substitutions from references/goal-driven-path.md (which you read at Step 1).
Assemble the full Design Spec content following the template — but do not write it to disk yet. The file is written only after the user approves it in Step 10. Target path (written in Step 10): outputs/[workflow-name]/design-spec.md.
Why not write yet: In Claude Code plan mode, writing files is blocked until the user approves and exits plan mode; in Cowork there is no plan mode but the same rule applies — never persist a deliverable before approval. So assemble + self-test in memory, present for approval (Step 10), then write.
Assembly order:
1. Assemble the spec content following references/spec-template.md — in memory. Honor the template's conditional-section rules (Orchestrator Prompt Outline, Agent Configuration, Multi-Agent Configuration, Stakeholders).
2. Read references/self-test-checklist.md now, then run every item against the assembled content. Do not run the checklist from memory — a recalled checklist silently shrinks.

This is a hard gate. Do not write the spec file or proceed without explicit approval.
Present a summary of the assembled (not-yet-written) Design Spec:
"Here's the Design Spec summary:
- Autonomy: [level] (for goal-driven: Autonomous)
- Mechanism: [orchestration mechanism] ([involvement mode])
- Structure: [count] steps, [count] skill candidates, [count] agents (for goal-driven: [count] capability domains, [count] skill candidates, [count] agents)
- Integration options: [count] tools with recommended integration approaches
- Safety: [one-line summary — write surfaces, untrusted input handling, gates]
- Implementation order: [brief summary]
The full spec is ready for approval — I'll save it to `outputs/[workflow-name]/design-spec.md` once you approve.

Do you approve this spec? I won't write the file or generate any artifacts until you confirm. If you want changes, tell me what to adjust and I'll revise."
If the user requests changes, revise the assembled content in memory, re-run the self-test, and re-present — still without writing the file.
Only after explicit approval:
1. Write the spec to `outputs/[workflow-name]/design-spec.md`. If a spec already exists from a previous run, rename the old one with a date suffix (e.g., `design-spec-2026-06-10.md`) before writing. (In Claude Code, this is the point where the user exits plan mode so the write can happen; in Cowork, write directly.)
2. Update the workflow file (`outputs/[workflow-name]/workflow.yaml`): set `current_step: 3`, `last_updated`, and add `design_spec` under `artifacts`.

- Claude Code: "Spec approved and saved to `outputs/[workflow-name]/design-spec.md`. If you're in plan mode, exit now (`shift+tab` or `/plan`) so the Build phase can generate artifacts."
- Cowork: "Spec approved and saved to `outputs/[workflow-name]/design-spec.md`."

"To build the workflow, run the `build` skill (Step 4) (or say 'Build the workflow from my Design Spec')."
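The approve-then-write rule, including archiving a previous spec, can be sketched in Python as follows. The function name and the `approved` flag are hypothetical, and using the current date for the suffix is an assumption (the text only says "a date suffix"). The `workflow.yaml` update is left out because it would need a YAML library.

```python
from datetime import date
from pathlib import Path

def write_spec(output_dir, content, approved, today=None):
    """Write design-spec.md only after approval, archiving any previous spec first."""
    if not approved:
        raise PermissionError("Design Spec not approved; nothing written")
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    spec = out / "design-spec.md"
    if spec.exists():
        # Assumption: the suffix is the date of this run, in ISO format.
        stamp = (today or date.today()).isoformat()
        spec.rename(out / f"design-spec-{stamp}.md")
    spec.write_text(content, encoding="utf-8")
    return spec
```

Two approved writes on the same day would collide on the archive name; a real implementation would need to decide whether to overwrite or add a counter.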
`outputs/[workflow-name]/design-spec.md` — Design Spec

Uses the mandatory template defined in references/spec-template.md. The Design Spec references the Workflow Requirements as canonical source — it does not restate Goal, Metadata, Context Inventory, Acceptance Criteria, Example Scenarios, Human Gates, Steps Overview, or per-step requirements.
The spec opens with YAML frontmatter (workflow, requirements_file, spec_version, definition_type, mechanism, involvement, platform, platform_mode, packaging, counts) so Build and downstream skills can summarize the spec without parsing prose. It is organized into three layered groups — Architecture (L1, including Safety & Permissions), Decomposition (L2), Component Blueprints (L3) — plus cross-layer sections (Evaluation Inputs, Deferred to Build, Stakeholders, Self-Test Summary). The exact structure lives in the template file, not here.
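A quick presence check for the frontmatter keys named above might look like this Python sketch. It assumes each key appears as an unindented `key:` line between the opening and closing `---` markers and does not validate values; the template file remains the authority on the schema and on the `spec_version` literal.

```python
REQUIRED_KEYS = [
    "workflow", "requirements_file", "spec_version", "definition_type", "mechanism",
    "involvement", "platform", "platform_mode", "packaging", "counts",
]

def missing_frontmatter_keys(spec_text):
    """Return the listed keys that do not appear as top-level frontmatter keys."""
    lines = spec_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return list(REQUIRED_KEYS)
    found = set()
    for line in lines[1:]:
        if line.strip() == "---":
            break
        # Top-level keys are unindented "key: value" lines; nested entries are skipped.
        if line and not line[0].isspace() and ":" in line:
            found.add(line.split(":", 1)[0].strip())
    return [key for key in REQUIRED_KEYS if key not in found]
```

A naive line scan is used here to stay within the standard library; a YAML parser would be the sturdier choice for anything beyond a presence check.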
For goal-driven workflows, the template substitutions in references/goal-driven-path.md apply.
outputs/[name]/design-spec.md. Read it alongside the Workflow Requirements at outputs/[name]/requirements.md."