skills/agent-spec-writer/SKILL.md
Interactive spec design for AI agents from first principles. Guides through vision, PRD structure, three-tier boundary definition, and measurable success criteria to produce a complete, deployable spec. Use when designing a new skill, agent, or any AI system that needs explicit behavioral boundaries. Triggers on "write agent spec", "create agent spec", "spec for ai agent", "agent specification", "define agent", "new skill spec", "new agent spec", "design agent behavior", "agent boundaries", "spec kit", "spec.md", "plan.md", "github spec kit", "specify workflow", "write a spec", "software requirements spec", "SRS", "write requirements".
npx skillsauth add michaelalber/ai-toolkit agent-spec-writer
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
"A spec is a promise — a promise the system makes to its users and a promise the team makes to themselves about what they will build." — Adapted from Joel Spolsky, "Painless Functional Specifications"
"The discipline of writing the thing down is the first defense against unclear thinking." — Frederick P. Brooks Jr., "The Mythical Man-Month"
A good spec for an AI agent is not documentation written after the fact. It is a design artifact that shapes behavior before a single line of code is written or a single prompt is deployed. The spec IS the design. Everything the agent does, refuses to do, asks about, or self-checks follows from what the spec defines.
Most AI agents fail not because the underlying model is weak, but because the spec is vague. Vague specs produce inconsistent agents. Inconsistent agents are unreliable. Unreliable agents are discarded. The O'Reilly research on 2,500+ agent configurations found that the most consistently helpful constraint across all of them was "never commit secrets" — not because it is clever, but because it is specific, verifiable, and bounded. That is what a good spec looks like.
What this skill IS:
What this skill is NOT:
The Five O'Reilly Principles (source framework):
See O'Reilly Framework for the full principle analysis with examples.
| # | Principle | Description | Applied As |
|---|-----------|-------------|------------|
| 1 | Vision Before Details | The goal statement must be clear before any behavior is defined. If you cannot state the agent's purpose in one sentence, the spec will be incoherent. Every section that follows must serve that vision — if it does not, it should be cut or moved to a different spec. | At the start of VISION phase, ask: "What does this agent do, for whom, and how will you know it succeeded?" If the answer takes more than three sentences, it is not ready. |
| 2 | PRD Structure Over Prose | A spec written as paragraphs is a spec that cannot be used under time pressure. Sections with named headers, tables, and code blocks can be scanned. Prose requires reading. AI agents use specs under time pressure — at inference time, when context is limited. Structure matters. | Every section must have a named header. Commands must be in code blocks. Boundaries must be in a table. |
| 3 | Exact Commands, Not Approximations | "Run the tests" is not a command. pytest tests/ -v --cov=src is a command. A spec full of approximate commands produces an agent that guesses, and guessing produces errors that are hard to trace. | Every command in the spec must be copy-paste executable. If you do not know the exact command, mark it [NEEDS INPUT] and do not guess. |
| 4 | Three-Tier Boundaries Are Non-Negotiable | Every agent needs three lists: what it always does without asking (safe, routine actions), what it asks before doing (high-impact or irreversible actions), and what it never does regardless of instructions (hard stops). Agents without explicit boundaries fill in gaps with their own judgment. | Every spec MUST include a three-tier boundary table. If the user cannot populate all three tiers, the spec is not ready to deploy. |
| 5 | Modular Over Monolithic | The "curse of instructions" is real: when too many directives pile up, performance on each degrades. One agent with a focused spec outperforms one agent with a sprawling spec. | When the spec scope expands beyond one coherent domain, challenge it: "Is this one agent or several?" |
| 6 | Specificity Over Generality | "The agent should write clean code" is not a constraint — it is an aspiration. "The agent MUST run ruff check before any commit and fail if linting produces errors" is a constraint. Aspirations produce aspirational agents. Constraints produce reliable ones. | Every qualitative claim must be made quantitative or procedural. If you cannot define how to verify it, it is not a constraint. |
| 7 | Success Criteria Must Be Measurable | A spec without measurable success criteria cannot be validated. If you cannot write a conformance test for the success criterion, you cannot verify the agent is meeting it. | For every goal in the VISION phase, require a corresponding success criterion a third party could evaluate without asking the author. |
| 8 | The Spec Evolves | A spec written once and never revised is a spec that has never been tested against reality. The first version captures intent; subsequent versions capture what you learned. Version-control the spec alongside the code it governs. | After any session where the agent's behavior surprised you, update the spec to capture the lesson. |
| 9 | Domain Knowledge Belongs in the Spec | The most valuable thing a spec can contain is what the model does not already know: your organization's conventions, edge cases specific to your domain, error conditions it must handle. Generic specs produce generic agents. Domain-injected specs produce domain-expert agents. | During STRUCTURE phase, ask: "What does a skilled human in this domain know that a general-purpose AI would not?" That knowledge belongs in the spec. |
| 10 | Boundaries Protect the System | The "Never" tier is the most important — it contains the list of actions that, if taken, would cause irreversible harm: deleting production data, committing secrets, pushing to main without review, sending unauthorized messages. | The "Never" list must be populated by asking: "What is the worst thing this agent could do?" Then explicitly forbid it. |
The spec design session proceeds in six phases. Each phase has an entry question, an exit criterion, and a transition that feeds the next phase.
Before writing anything, establish what type of spec is being produced and what context is available.
Entry question: "What are you building, and do you have existing code to work from?"
Actions:
- skill (SKILL.md) — interactive coaching workflow for the ai-toolkit
- claude-agent (claude/agents/<name>.md) — autonomous Claude Code agent
- opencode-agent (opencode/agents/<name>.md) — autonomous OpenCode agent
- github-spec-kit (specs/[###-feature-name]/ directory) — Specify → Plan → Tasks gated workflow supporting Claude Code, GitHub Copilot, Gemini CLI, opencode, Cursor, Windsurf, and more
- If existing code is present, run spec-extractor-agent first to generate a draft, then bring it here for refinement

Exit criterion: Spec type, starting point, and domain are all explicit and agreed.
GitHub Spec Kit Detection and Grounded Knowledge Protocol:
If the user mentions "spec kit", "spec.md", "plan.md", "specify workflow", "speckit.specify", "speckit.plan", "speckit.tasks", or wants to produce files in a specs/ directory, select github-spec-kit as the type and immediately retrieve the authoritative templates from the grounded-code-mcp knowledge base:
# Mandatory KB lookups when github-spec-kit type is detected:
search_knowledge(
query="github spec kit spec template feature user stories"
)
search_knowledge(
query="spec kit plan template research data model contracts"
)
search_knowledge(
query="spec kit tasks template user story parallel"
)
# If constitution.md is needed:
search_knowledge(
query="spec kit constitution template principles"
)
The KB returns the authoritative template structure. Use it — not training data — as the ground truth for what the files must contain. The sources are:
- internal/github-spec-kit-spec-template.md — feature spec with user stories, P1/P2/P3 priorities, independent testability
- internal/github-spec-kit-plan-template.md — plan with technical context, research, data-model, quickstart, contracts
- internal/github-spec-kit-tasks-template.md — tasks organized by user story, [P] parallel markers
- internal/github-spec-kit-constitution-template.md — constitution with Nine Articles structure

The Spec Kit workflow produces a specs/[###-feature-name]/ directory per feature containing:
- spec.md — feature specification (VISION + STRUCTURE phases)
- plan.md — implementation plan with architecture approach (architecture planning pass)
- tasks.md — executable task list derived from plan + contracts + data model
- research.md — technical research output (phase 0 of /speckit.plan)
- data-model.md — entity/schema definitions (phase 1 of /speckit.plan)
- contracts/ — API contract definitions (phase 1 of /speckit.plan)
- quickstart.md — validation scenarios guide (phase 1 of /speckit.plan)
- memory/constitution.md — non-negotiable project principles (maps to GUARDRAILS Never tier)

Route the session accordingly: VISION + STRUCTURE → spec.md; architecture planning pass → plan.md; GENERATE → tasks.md. See Spec Formats, Format 5, for the complete, KB-grounded templates.
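The per-feature layout listed above can be checked mechanically. The following is a minimal Python sketch, not part of the skill itself. The function name is hypothetical, treating every listed file as required is an assumption, and memory/constitution.md is left out because the text describes it as needed only in some projects.

```python
from typing import Iterable

# File and directory names copied from the list above.
# Treating all of them as required is an assumption of this sketch.
EXPECTED_FILES = (
    "spec.md", "plan.md", "tasks.md",
    "research.md", "data-model.md", "quickstart.md",
)
EXPECTED_DIRS = ("contracts",)


def missing_spec_kit_outputs(existing: Iterable[str]) -> list[str]:
    """Return expected entries absent from `existing`.

    `existing` holds paths relative to specs/[###-feature-name]/,
    using forward slashes (for example "contracts/api.yaml").
    """
    paths = {p.strip("/") for p in existing}
    missing = [name for name in EXPECTED_FILES if name not in paths]
    for directory in EXPECTED_DIRS:
        # A directory counts as present if it, or anything inside it, is listed.
        if not any(p == directory or p.startswith(directory + "/") for p in paths):
            missing.append(directory + "/")
    return missing
```

One way to build the input is to walk the feature directory with `pathlib.Path.rglob("*")` and convert each result with `relative_to(...).as_posix()`.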
ORIENT Intake Template:
## Spec Design Intake
**Spec type**: [skill | claude-agent | opencode-agent | generic-prd | github-spec-kit]
**Domain**: [what this agent does and for whom]
**Starting point**: [greenfield | existing-code + draft from spec-extractor-agent]
**Target file**: [where the spec will live — for spec-kit: specs/[###-feature-name]/]
**Related specs**: [any existing skills or agents this extends or replaces]
**Known constraints**: [things the agent must or must never do that you already know]
**Open questions**: [things you do not know yet that the spec will need]
**KB lookup complete**: [yes — for github-spec-kit type only | n/a — for all other types]
If existing code is present:
Recommend running spec-extractor-agent on the codebase first. Its output pre-fills the STRUCTURE phase with exact commands, conventions, and boundaries discovered from actual files. The workflow is: spec-extractor-agent (draft) → agent-spec-writer (VISION + GUARDRAILS refinement).
Entry question: "What is this agent's purpose in one sentence?"
Actions:
Vision Statement Test:
WEAK: "An agent that helps developers with code quality."
Problem: What is "help"? What is "code quality"? How do you know it succeeded?
STRONG: "An agent that autonomously runs linting and the test suite after every
commit, reports failures with file-and-line citations, and never modifies
code without explicit developer approval."
✓ Specific action (run lint/tests)
✓ Clear trigger (every commit)
✓ Verifiable output (failure report with citations)
✓ Explicit boundary (never modify without approval)
Exit criterion: A goal statement that passes all three tests and is agreed on by the user.
Entry question: "What does the agent need to know to do its job?"
Build the seven PRD sections:
- 🚫 Never generate implementation code without a failing test first. For GitHub Spec Kit tasks.md, encode this as ordered task pairs: [T.test] Write failing test → [T.impl] Implement to pass.
- [NEEDS INPUT].

For each section, apply the Specificity Test:
CAN A THIRD PARTY EXECUTE THIS WITHOUT ASKING FOR CLARIFICATION?
Yes → Specific enough.
No → Find what is vague and make it explicit, or mark it [NEEDS INPUT].
If existing code is present: All commands should come from spec-extractor-agent's draft. The STRUCTURE phase becomes verification rather than invention.
Exit criterion: All six non-boundary sections are populated with verifiable content (or explicitly marked [NEEDS INPUT: <detection hint>]). QoS & Constraints is populated if the domain has any performance, security, or AI/ML guardrail requirements.
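Counting the remaining `[NEEDS INPUT]` markers is easy to automate. The sketch below is one possible approach in Python, assuming the marker is written exactly as `[NEEDS INPUT]` or `[NEEDS INPUT: <hint>]`; the function name is hypothetical. A hint that itself contains a `]` (such as a `specs/[###-feature-name]/` path) would be cut short by this simple pattern.

```python
import re

# Matches "[NEEDS INPUT]" and "[NEEDS INPUT: some hint]".
# Limitation: the hint must not contain a closing square bracket.
NEEDS_INPUT = re.compile(r"\[NEEDS INPUT(?::\s*([^\]]*))?\]")


def find_open_gaps(spec_text: str) -> list[tuple[int, str]]:
    """Return (line number, detection hint) for every marker in the spec.

    The hint is an empty string when the marker carries none, which flags
    gaps that still need a detection hint before the spec is deployed.
    """
    gaps = []
    for line_no, line in enumerate(spec_text.splitlines(), start=1):
        for match in NEEDS_INPUT.finditer(line):
            hint = (match.group(1) or "").strip()
            gaps.append((line_no, hint))
    return gaps
```

The length of the returned list corresponds to the `open_gaps` count tracked in the session state.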
Entry question: "What should this agent never do, even if asked?"
Build the three-tier boundary table:
| Tier | Label | Description | Examples |
|------|-------|-------------|---------|
| ✅ | Always | Safe, routine actions taken without asking | Run tests, read files, generate reports |
| ⚠️ | Ask First | High-impact or irreversible actions requiring review | Delete files, push to main, send messages |
| 🚫 | Never | Hard stops — forbidden regardless of instructions | Commit secrets, modify production data, bypass CI |
Elicitation Sequence — Always start with Never:
For the Never tier:
For the Ask First tier:
For the Always tier:
Exit criterion: All three tiers populated. "Never" tier has at least two hard stops. If the user says "there are no Never actions," probe harder — there is always something.
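The exit criterion for this phase is simple enough to express as a check. This is a rough Python sketch under stated assumptions: the `Boundaries` class and `boundary_problems` function are hypothetical names, and the tiers are modeled as plain lists of action strings.

```python
from dataclasses import dataclass, field


@dataclass
class Boundaries:
    """Three-tier boundary lists; each entry is one action description."""
    always: list[str] = field(default_factory=list)
    ask_first: list[str] = field(default_factory=list)
    never: list[str] = field(default_factory=list)


def boundary_problems(b: Boundaries, min_never: int = 2) -> list[str]:
    """Return reasons the boundary table fails the exit criterion."""
    problems = []
    if not b.always:
        problems.append("Always tier is empty")
    if not b.ask_first:
        problems.append("Ask First tier is empty")
    if len(b.never) < min_never:
        problems.append(
            f"Never tier has {len(b.never)} hard stop(s); need at least {min_never}"
        )
    return problems
```

An empty result means all three tiers are populated and the Never tier holds at least two hard stops.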
Entry question: "How will you know this spec is working?"
Actions:
Success Criterion Template:
### Success Criteria
| Goal | Success Criterion | How to Verify |
|------|-----------------|---------------|
| [goal from vision] | [specific, measurable outcome] | [third-party verification method] |
Exit criterion: Every goal has a success criterion. Every criterion has a verification method.
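A filled-in criteria table can be screened for missing cells before moving on. The following Python sketch assumes the three-column layout of the template above and assumes no cell contains a literal `|`; the function name is hypothetical.

```python
def incomplete_criteria(table_md: str) -> list[str]:
    """Return goals whose success criterion or verification method is blank."""
    incomplete = []
    for line in table_md.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 3:
            continue  # not a row of the three-column table
        goal, criterion, verify = cells
        # Skip the header row and the |---|---|---| separator row.
        if goal == "Goal" or set(goal) <= set("-: "):
            continue
        if not criterion or not verify:
            incomplete.append(goal)
    return incomplete
```

This only detects blank cells. Whether a criterion is actually measurable still needs the third-party review described in this phase.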
Assemble the spec in the target format. All sections complete, all gaps resolved or explicitly marked. Output is a file ready to commit.
For GitHub Spec Kit format — mandatory KB lookup before generating:
Before writing any file, retrieve the authoritative templates from grounded-code-mcp. Do not skip this step; the KB contains the ground-truth format that differs from common training data assumptions.
# Before generating spec.md:
search_knowledge(query="github spec kit spec template feature user stories priority")
# Before generating plan.md:
search_knowledge(query="spec kit plan template technical context research data model")
# Before generating tasks.md:
search_knowledge(query="spec kit tasks template user story parallel task format")
# Before generating constitution.md (only if needed):
search_knowledge(query="spec kit constitution template principles nine articles")
Use the KB-returned content as the authoritative template. Key differences from the naive structure:
- spec.md uses user stories with P1/P2/P3 priorities and an Independent Test field per story — not just requirement lists
- plan.md includes research.md, data-model.md, contracts/, and quickstart.md as companion outputs of /speckit.plan
- tasks.md uses [ID] [P?] [Story] Description format organized by user story — not generic phase-based task lists
- Files live under specs/[###-feature-name]/ — not .specify/
- For TDD projects, tasks.md never lists [impl] before [test] for the same deliverable:
- [T1.1] [P] Write failing test for <feature> ← test task first
- [T1.2] Implement <feature> to pass T1.1 ← impl task follows
See Spec Formats for the complete KB-grounded templates in all five target formats.
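The test-before-implementation ordering can be verified on a generated tasks.md. This is a minimal Python sketch, assuming task lines follow the `- [T1.1] ...` shape of the example above and that an implementation task names its test with the phrase "to pass T1.1". Both conventions are taken from that example, not from the KB template, and the function name is hypothetical.

```python
import re

# "- [T1.2] Implement login to pass T1.1" -> id "T1.2", rest of the line.
TASK = re.compile(r"^\s*-\s*\[(T\d+(?:\.\d+)*)\]\s*(.*)$")
# Finds the test task an implementation task claims to satisfy.
PASSES = re.compile(r"to pass (T\d+(?:\.\d+)*)")


def tdd_order_violations(tasks_md: str) -> list[str]:
    """Report implementation tasks listed before the test they reference."""
    seen: set[str] = set()
    violations = []
    for line in tasks_md.splitlines():
        match = TASK.match(line)
        if not match:
            continue
        task_id, description = match.groups()
        ref = PASSES.search(description)
        if ref and ref.group(1) not in seen:
            violations.append(
                f"{task_id} implements before test {ref.group(1)} is listed"
            )
        seen.add(task_id)
    return violations
```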
Pre-Generate Checklist:
## Pre-Generate Spec Checklist
- [ ] Spec type confirmed (skill / claude-agent / opencode-agent / generic-prd / github-spec-kit)
- [ ] For github-spec-kit type: grounded-code-mcp KB lookup complete — templates confirmed via search_knowledge queries
- [ ] Goal statement passes three-test review (valuable, sufficient, verifiable)
- [ ] All seven PRD sections populated (or gaps marked [NEEDS INPUT])
- [ ] QoS & Constraints section addressed (even if empty by explicit decision)
- [ ] Three-tier boundary table populated in all three tiers
- [ ] "Never" tier has at least two explicit hard stops
- [ ] TDD gate: if project enforces test-first, Never tier contains
"🚫 Never generate implementation code without a failing test first"
- [ ] For github-spec-kit + TDD projects: tasks.md orders test tasks before
implementation tasks ([T.test] → [T.impl]) per user story
- [ ] At least one measurable success criterion per goal
- [ ] Self-check loop defined
- [ ] Spec fits the target format's required sections
- [ ] No invented commands (all extracted or marked [NEEDS INPUT])
- [ ] State block XML tag is unique across the toolkit (if skill or agent)
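If the checklist is kept as Markdown, the unchecked items can be listed with a few lines of Python. This is an illustrative sketch with a hypothetical function name; it reads only the first line of each item, so wrapped continuation lines are ignored.

```python
import re

# "- [ ] text" is unchecked; "- [x] text" or "- [X] text" is checked.
CHECKBOX = re.compile(r"^\s*-\s*\[( |x|X)\]\s*(.+)$")


def unchecked_items(checklist_md: str) -> list[str]:
    """Return the text of every checklist item that is still unchecked."""
    items = []
    for line in checklist_md.splitlines():
        match = CHECKBOX.match(line)
        if match and match.group(1) == " ":
            items.append(match.group(2).strip())
    return items
```

An empty list suggests the GENERATE phase can proceed.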
Maintain state across conversation turns:
<spec-writer-state>
phase: orient | vision | structure | guardrails | validate | generate
spec_type: skill | claude-agent | opencode-agent | generic-prd | github-spec-kit
domain: [brief description of what the agent does]
target_file: [where the spec will live — for spec-kit: specs/[###-feature-name]/]
vision_statement: [one-sentence goal, or "pending"]
sections_complete: [comma-separated: commands, testing, structure, style, git, boundaries]
open_gaps: [count of [NEEDS INPUT] markers remaining]
never_tier_populated: true | false
success_criteria_defined: true | false
kb_lookup_complete: true | false | n/a (n/a for non-spec-kit types)
last_action: [what was just done]
next_action: [what should happen next]
</spec-writer-state>
<spec-writer-state>
phase: orient
spec_type: claude-agent
domain: autonomous linting and test runner
target_file: claude/agents/quality-gate-agent.md
vision_statement: pending
sections_complete: none
open_gaps: unknown
never_tier_populated: false
success_criteria_defined: false
kb_lookup_complete: n/a
last_action: Intake complete
next_action: Begin VISION phase — ask for goal statement
</spec-writer-state>
<spec-writer-state>
phase: guardrails
spec_type: claude-agent
domain: autonomous linting and test runner
target_file: claude/agents/quality-gate-agent.md
vision_statement: "Run linting and tests after every commit, report failures with citations, never modify code without approval"
sections_complete: commands, testing, structure, style, git
open_gaps: 2
never_tier_populated: false
success_criteria_defined: false
kb_lookup_complete: n/a
last_action: STRUCTURE phase complete — 2 gaps marked [NEEDS INPUT]
next_action: Elicit three-tier boundary system, starting with Never tier
</spec-writer-state>
<spec-writer-state>
phase: generate
spec_type: claude-agent
domain: autonomous linting and test runner
target_file: claude/agents/quality-gate-agent.md
vision_statement: "Run linting and tests after every commit, report failures with citations, never modify code without approval"
sections_complete: commands, testing, structure, style, git, boundaries
open_gaps: 0
never_tier_populated: true
success_criteria_defined: true
kb_lookup_complete: n/a
last_action: Pre-generate checklist passed
next_action: Generate final spec and write to target file
</spec-writer-state>
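Because the state block uses a fixed tag and `key: value` lines, it can be read back programmatically. The Python sketch below shows one way to do that. `parse_state` and `ready_to_generate` are hypothetical helpers, and the readiness rule is one interpretation of the pre-generate checklist, not a rule the skill defines in code.

```python
import re

STATE_BLOCK = re.compile(
    r"<spec-writer-state>\s*(.*?)\s*</spec-writer-state>", re.DOTALL
)


def parse_state(text: str) -> dict[str, str]:
    """Extract the first state block as a dict of raw string values."""
    match = STATE_BLOCK.search(text)
    if match is None:
        raise ValueError("no <spec-writer-state> block found")
    state = {}
    for line in match.group(1).splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            state[key.strip()] = value.strip()
    return state


def ready_to_generate(state: dict[str, str]) -> bool:
    """Assumed gate: boundaries and criteria done, KB lookup done for spec-kit."""
    kb_ok = True
    if state.get("spec_type") == "github-spec-kit":
        kb_ok = state.get("kb_lookup_complete") == "true"
    return (
        state.get("never_tier_populated") == "true"
        and state.get("success_criteria_defined") == "true"
        and kb_ok
    )
```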
## Agent Spec Design Session
Welcome. I will coach you through designing a complete, deployable spec for your AI agent.
**How this works:**
1. We clarify what you are building and what type of spec you need (skill, Claude agent, OpenCode agent, generic PRD, or GitHub Spec Kit)
2. We craft a clear goal statement — one sentence that captures purpose, workflow, and expected outcome
3. We build the PRD sections: commands, testing, project structure, code style, git workflow, and QoS constraints
4. We define the three-tier boundary system: what the agent always does, asks about, and never does
5. We define measurable success criteria and self-checks
6. I generate the complete spec in your target format, ready to commit
**If you have existing code:** Run `spec-extractor-agent` on your codebase first. It produces a draft with commands and conventions extracted from actual files. Bring that draft here for VISION and GUARDRAILS refinement.
**If you are starting from scratch:** We build from intent. Some sections will have `[NEEDS INPUT]` markers that you fill in as the codebase takes shape.
**If you want a GitHub Spec Kit:** I will retrieve the authoritative templates from the grounded-code-mcp knowledge base before generating any output. Do not skip this step — training-data assumptions about spec-kit structure diverge from the actual format.
To begin: **What are you building, and do you have existing code to work from?**
<spec-writer-state>
phase: orient
spec_type: pending
domain: pending
target_file: pending
vision_statement: pending
sections_complete: none
open_gaps: unknown
never_tier_populated: false
success_criteria_defined: false
kb_lookup_complete: n/a
last_action: Session opened
next_action: Awaiting user's description of what they are building
</spec-writer-state>
---
### Moving to: [Phase Name]
We have completed the [previous phase]. Here is what we have established:
- [Key decision 1]
- [Key decision 2]
**Open gaps:** [N] items marked [NEEDS INPUT]
Now we move to [next phase], which will [brief description of what this phase establishes].
---
---
## Spec Complete
**[Agent/Skill Name]** spec is ready for review.
**Summary:**
- **Type**: [skill / claude-agent / opencode-agent / generic-prd / github-spec-kit]
- **Goal**: [one-sentence vision statement]
- **Always actions**: [N] defined
- **Ask First actions**: [N] defined
- **Never actions**: [N] defined
- **Success criteria**: [N] defined
- **Gaps remaining**: [N] — marked [NEEDS INPUT]
**Next steps:**
1. Review the spec for accuracy against your domain knowledge
2. Fill in any [NEEDS INPUT] markers
3. Commit the spec to version control alongside the code it governs
4. After first use, revisit the spec and update based on what you learned
<spec-writer-state>
phase: generate
spec_type: [type]
domain: [domain]
target_file: [path]
vision_statement: [statement]
sections_complete: commands, testing, structure, style, git, boundaries
open_gaps: [N]
never_tier_populated: true
success_criteria_defined: true
kb_lookup_complete: [true | n/a]
last_action: Spec generated and delivered
next_action: User reviews, fills gaps, commits spec
</spec-writer-state>
When producing any GitHub Spec Kit output (spec.md, plan.md, tasks.md, or constitution.md), you MUST call search_knowledge(query="...") with the appropriate query before generating the content. Training data assumptions about spec-kit structure are unreliable — the KB contains the authoritative templates.
WRONG: Generating github-spec-kit output directly from memory:
- Using .specify/ directory structure
- Generating tasks/ subdirectory instead of tasks.md
- Omitting user story priorities (P1/P2/P3) from spec.md
- Producing a plan.md without research.md/data-model.md companion files
RIGHT: Call search_knowledge first, then generate:
search_knowledge(query="github spec kit spec template feature user stories")
→ Use returned template structure as the authoritative format
→ spec.md lives in specs/[###-feature-name]/spec.md
→ User stories have P1/P2/P3 priorities and Independent Test descriptions
→ plan.md documents companion files: research.md, data-model.md, contracts/, quickstart.md
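The lookup-before-generate rule can be enforced with a small guard. In this Python sketch the `search` parameter is a stand-in for the `search_knowledge` tool; its real signature and return type are not specified in this document, so a function that takes a query string and returns template text is an assumption. `fetch_template` is a hypothetical name.

```python
from typing import Callable

# Query strings copied from the lookups listed in this document.
KB_QUERIES = {
    "spec.md": "github spec kit spec template feature user stories priority",
    "plan.md": "spec kit plan template technical context research data model",
    "tasks.md": "spec kit tasks template user story parallel task format",
    "constitution.md": "spec kit constitution template principles nine articles",
}


def fetch_template(filename: str, search: Callable[[str], str]) -> str:
    """Run the KB lookup for `filename` and refuse to continue without a result.

    `search` is assumed to accept a query string and return template text.
    """
    template = search(KB_QUERIES[filename])
    if not template.strip():
        raise LookupError(
            f"KB returned no template for {filename}; do not generate from memory"
        )
    return template
```

Raising on an empty result keeps the failure visible rather than letting generation fall back to remembered structure.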
The KB sources to retrieve for each file:
- internal/github-spec-kit-spec-template.md → for spec.md
- internal/github-spec-kit-plan-template.md → for plan.md
- internal/github-spec-kit-tasks-template.md → for tasks.md
- internal/github-spec-kit-constitution-template.md → for constitution.md

Each phase has multiple decisions. Do not ask all of them at once. Overwhelming the user with questions produces vague answers. Ask the most important question for the current phase, wait for the answer, then ask the next.
WRONG: "What type of spec do you need, what's the goal, what commands do you use,
and what should the agent never do?"
RIGHT: "What are you building, and do you have existing code to work from?"
[wait for answer]
"What type of spec do you need — a skill, a Claude agent, an OpenCode
agent, a generic PRD, or a GitHub Spec Kit?"
Commands in a spec must be exactly executable or explicitly marked as needing input. A command that the developer has to correct creates agent errors downstream.
WRONG: "Test: [your test command here]" — vague placeholder
WRONG: "Test: npm test" — guessed without verifying npm is used
RIGHT: "Test: [NEEDS INPUT: run `ls package.json bun.lockb yarn.lock Makefile`
to identify package manager, then inspect scripts]"
RIGHT: "Test: npm run test:coverage" — verified by reading package.json
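Verifying a command "by reading package.json" can be as small as checking that the script name exists. A minimal Python sketch follows; the function name is hypothetical. It confirms only that the script is declared, not that it runs successfully.

```python
import json


def npm_script_exists(package_json_text: str, script: str) -> bool:
    """Return True if `script` is declared under "scripts" in package.json."""
    scripts = json.loads(package_json_text).get("scripts", {})
    return script in scripts
```

If the check fails, the command stays marked `[NEEDS INPUT]` rather than being guessed.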
The "Never" tier is the most important and the hardest to elicit — users tend to skip it because they focus on what the agent should do, not what it must not do. Always populate Never before Ask First.
WRONG: Starting boundary elicitation with "What should the agent always do?"
RIGHT: Starting boundary elicitation with "What is the worst thing this agent
could do? What would require significant cleanup if it happened?
What would be a security or compliance violation?"
If the spec scope expands beyond one coherent domain during the session, stop and address it before continuing. A spec for two agents is two specs.
WRONG: Adding "and also handle database migrations and deployment orchestration"
to a spec that started as "code quality gate" without raising scope concern.
RIGHT: "I notice we've now covered code quality, database migrations, and
deployment. These are three different domains. Should we scope this
spec to just code quality and create separate specs for the others?"
Vagueness is not acceptable in a deployed spec. When information is unavailable, mark it explicitly with a detection hint.
WRONG: [blank command field]
WRONG: Guessing based on common conventions
RIGHT: "[NEEDS INPUT: run `cat package.json | grep -A10 scripts` to get exact
test command, or check .github/workflows/ for CI-verified commands]"
| Anti-Pattern | Description | Why It Fails | Correct Approach |
|---|---|---|---|
| Vague Vision | Goal statement like "an AI that helps with coding" or "an agent for DevOps." | Vague vision produces vague behavior. Every downstream section inherits the ambiguity. The agent cannot prioritize when choices conflict. | Apply the three-test review: specificity, sufficiency, verifiability. Refuse to move to STRUCTURE until the vision passes. |
| Missing Never Tier | Boundary table where the "Never" tier is empty or has only one entry. | Without explicit "Never" actions, the agent fills in its own judgment for edge cases. In agentic systems, edge cases are when judgment matters most. | Begin boundary elicitation with "What is the worst thing this agent could do?" Require at least two "Never" entries before proceeding. |
| Monolithic Spec | One spec covering multiple unrelated domains (CI, security, database, deployment). | Each directive dilutes the others. Context windows are finite. When the spec fills the context, the spec becomes noise. | One domain per spec. When scope expands, create new specs and coordinate with task-decomposition. |
| Invented Commands | Commands in the spec not verified against the actual codebase (e.g., assuming npm test when the project uses bun test). | The agent runs the wrong command, fails with confusing errors, and the developer loses trust in the spec. | Extract every command from actual project files or mark [NEEDS INPUT]. Never guess. |
| Aspirational Success Criteria | Criteria like "agent writes clean code" or "agent helps the team move faster." | Cannot be verified. Cannot be tested. Provide no feedback signal for spec improvement. | Every criterion must answer: "How would a third party verify this without asking me?" |
| Format Mismatch | Writing a generic PRD spec and expecting it to work as a SKILL.md, or writing Claude agent frontmatter for an OpenCode agent, or generating github-spec-kit output using .specify/ directory layout instead of specs/[###-feature-name]/. | Different platforms have different required sections, frontmatter, and loading mechanisms. A mismatched spec fails silently. Generating spec-kit output from memory produces the wrong directory structure and missing companion files. | Confirm the target format in ORIENT. For github-spec-kit type, always perform KB lookup before generating. Consult Spec Formats for exact required structure. |
| No Self-Check Loops | Spec defines what the agent should do but not how it verifies it did so correctly. | Agents without self-checks cannot distinguish "I completed the task" from "I completed the task correctly." | Every spec must define at least one self-check — a verification the agent runs on its own output before reporting completion. |
| Scope Creep | Spec starts as one thing and gradually absorbs unrelated domains through "and also..." additions. | Each addition weakens the focus of the whole. The agent becomes a generalist with no clear priority ordering. | When scope expands, stop and address it explicitly. Create separate specs. Use task-decomposition for coordination. |
| Spec as Documentation | Writing the spec after the agent is built to document what it does, rather than designing what it should do. | Documentation specs describe past behavior. Design specs constrain future behavior. Without a design spec, the agent has no explicit behavioral boundaries. | Write the spec before building. Start with VISION, then STRUCTURE, then GUARDRAILS. Spec precedes implementation. |
| Skipping Domain Injection | Spec relies entirely on the model's general knowledge and does not inject domain-specific conventions, edge cases, or constraints. | General knowledge produces general behavior. The difference between a mediocre agent and a reliable one is what the spec tells the model that the model did not already know. | During STRUCTURE, ask: "What does a skilled human in this domain know that a general-purpose AI would not?" Put that in the spec. |
The user has a vague sense of wanting "an AI agent" but cannot articulate the goal clearly.
Indicators:
Recovery Actions:
- Run spec-extractor-agent on existing code to surface what the codebase already does — the agent's purpose may be implicit in the project structure.

Every time a section is completed, the user adds a new domain.
Indicators:
Recovery Actions:
- Use the task-decomposition skill for multi-agent coordination.

The user does not know the exact build, test, or lint commands.
Indicators:
- "... npm test? Or maybe yarn test?"

Recovery Actions:
- Mark each unverified command as [NEEDS INPUT: verify with <detection command>]
- Suggest: "Run `ls Makefile package.json bun.lockb yarn.lock` to identify the package manager."
- Suggest: "Check .github/workflows/ for canonical commands — those are the ones the project actually trusts."
- Recommend spec-extractor-agent — it automates command detection.

The user says "I can't think of anything the agent should never do" or gives only one entry.
Indicators:
Recovery Actions:
spec-extractor-agent — Run this agent on an existing codebase before the STRUCTURE phase. It autonomously extracts commands, conventions, and boundaries from actual files and produces a draft. The workflow is: spec-extractor-agent (draft) → agent-spec-writer (VISION + GUARDRAILS refinement). The extractor eliminates invented commands; the spec writer ensures the vision and boundaries are right.
task-decomposition — When the scope of a single spec expands beyond one domain, use this skill to decompose goals into multiple specs and define coordination between agents. The agent-spec-writer produces individual specs; task-decomposition coordinates the multi-agent system.
architecture-review — When a new agent introduces architectural decisions (new data flows, new service boundaries, new infrastructure dependencies), stress-test the design with this skill. Agent specs define behavior; architecture review stress-tests the design assumptions.
architecture-journal — After a spec is finalized, record the key design decisions — especially accepted tradeoffs in the three-tier boundary system — as Architecture Decision Records. The spec captures WHAT was decided; the architecture journal captures WHY.
session-context — When refining an existing spec, use this skill to understand what has changed in the codebase since the spec was last updated. Stale specs are a common source of agent unreliability.