skills/personal/spec-coach/SKILL.md
Interactive spec design coach. Guides through vision, PRD structure, three-tier boundary definition, user story quality (INVEST), specification by example, and measurable success criteria to produce a complete, deployable spec. Use when designing a new skill, agent, feature, or any AI system that needs explicit behavioral boundaries. Use for "spec coach", "write agent spec", "create agent spec", "spec for ai agent", "agent specification", "define agent", "new skill spec", "new agent spec", "design agent behavior", "agent boundaries", "spec kit", "spec.md", "plan.md", "github spec kit", "specify workflow", "write a spec", "software requirements spec", "SRS", "write requirements", "design spec".
npx skillsauth add michaelalber/ai-toolkit spec-coachInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
"A spec is a promise — a promise the system makes to its users and a promise the team makes to themselves about what they will build." — Adapted from Joel Spolsky, "Painless Functional Specifications"
A good spec for an AI agent is not documentation written after the fact. It is a design artifact that shapes behavior before a single line of code is written. The spec IS the design. Vague specs produce inconsistent agents. Inconsistent agents are discarded.
What this skill IS: A structured coaching conversation that produces a complete, deployable spec. A design session, not a documentation exercise.
What this skill is NOT: A template filler; a substitute for domain expertise; a code generator.
The Five O'Reilly Principles (source framework):
See O'Reilly Framework for the full principle analysis.
| # | Principle | Description | Applied As |
|---|-----------|-------------|------------|
| 1 | Vision Before Details | The goal statement must be clear before any behavior is defined. | At the start of VISION phase, require: "What does this agent do, for whom, and how will you know it succeeded?" |
| 2 | PRD Structure Over Prose | Sections with named headers, tables, and code blocks can be scanned at inference time. Prose requires reading. | Every section must have a named header. Commands in code blocks. Boundaries in a table. |
| 3 | Exact Commands | "Run the tests" is not a command. pytest tests/ -v --cov=src is. Approximate commands produce guessing. | Every command must be copy-paste executable. Unknown commands are marked [NEEDS INPUT]. |
| 4 | Three-Tier Boundaries | Always / Ask First / Never. Agents without explicit boundaries fill in gaps with their own judgment. | Every spec MUST include a three-tier boundary table. All three tiers must be populated. |
| 5 | Modular Over Monolithic | One agent with a focused spec outperforms one agent with a sprawling spec. | When scope expands beyond one domain, challenge it: "Is this one agent or several?" |
| 6 | Specificity Over Generality | "Write clean code" is an aspiration. "Run ruff check and fail if errors" is a constraint. | Every qualitative claim must be made quantitative or procedural, and verifiable. |
| 7 | Measurable Success Criteria | A spec without measurable success criteria cannot be validated. | For every goal in VISION, require a corresponding criterion a third party can verify. |
| 8 | The Spec Evolves | The first version captures intent; subsequent versions capture what you learned. | After any session where the agent's behavior surprised you, update the spec. |
| 9 | Domain Knowledge Belongs in the Spec | The most valuable content is what the model does not already know: your conventions, edge cases, domain-specific error conditions. | During STRUCTURE: "What does a skilled human here know that a general-purpose AI would not?" |
| 10 | Boundaries Protect the System | The Never tier is the most important — it contains irreversible harms. | Populate Never by asking: "What is the worst thing this agent could do?" Then explicitly forbid it. |
| Query | When to Call |
|-------|--------------|
| search_knowledge("github spec kit spec template feature user stories") | Mandatory when github-spec-kit type detected — before generating spec.md |
| search_knowledge("spec kit plan template research data model contracts") | Mandatory before generating plan.md |
| search_knowledge("spec kit tasks template user story parallel task format") | Mandatory before generating tasks.md |
| search_knowledge("github spec kit constitution template principles") | Before generating constitution.md (only if needed) |
| search_knowledge("INVEST user story independent valuable testable") | During STRUCTURE phase — confirm user story quality criteria |
Determine: spec type (skill SKILL.md / claude-agent / opencode-agent / generic-prd / github-spec-kit), starting point (greenfield or existing code + spec-extractor-agent draft), and domain.
GitHub Spec Kit detection: If the user mentions "spec kit", "spec.md", "plan.md", "specify workflow", or files in a specs/ directory, select github-spec-kit type and immediately run the mandatory KB lookups above. The KB returns authoritative templates; use them — not training data. Key differences from naive assumptions:
spec.md uses user stories with P1/P2/P3 priorities and an Independent Test fieldplan.md includes research.md, data-model.md, contracts/, quickstart.md as companion outputstasks.md uses [ID] [P?] [Story] Description format by user story — not generic phase-based listsspecs/[###-feature-name]/ — not .specify/If existing code is present, recommend running spec-extractor-agent first. Its draft pre-fills STRUCTURE with exact commands and conventions from actual files.
Exit criterion: Spec type, starting point, and domain are all explicit and agreed.
Ask for a one-sentence goal capturing purpose, primary workflow, and expected outcome. Apply the three-test review: (1) If the agent did only this, would it be valuable? (2) If it did nothing else, would the goal be met? (3) Can a third party verify success? Refine until all three pass.
Weak: "An agent that helps developers with code quality." — "help" is undefined, "quality" is undefined, success is unverifiable.
Strong: "An agent that runs linting and the test suite after every commit, reports failures with file-and-line citations, and never modifies code without explicit developer approval." — specific action, clear trigger, verifiable output, explicit boundary.
Exit criterion: A goal statement that passes all three tests.
Build the seven PRD sections:
🚫 Never generate implementation without a failing test first[NEEDS INPUT]For each section: can a third party execute this without asking for clarification? Yes → specific enough. No → make it explicit or mark [NEEDS INPUT].
Story Quality Gate (Spec Kit and feature-scoped PRDs): Apply INVEST per story before accepting:
| Check | Question | If it fails | |---|---|---| | Independent | Can this be built, tested, and demonstrated without other stories? | Split or reorder | | Valuable | Does implementing just this deliver observable value? | Merge with related story | | Testable | Can you write a concrete acceptance test now, without asking anyone? | The story is not ready |
Specification by Example: After drafting acceptance criteria, ask for at least one concrete example per criterion: specific input, action taken, exact expected output. A criterion with no concrete example cannot be tested — it is not yet a criterion.
Definition of Ready: Before leaving STRUCTURE — "Could a developer write a failing test for each criterion right now, without asking anyone?"
Exit criterion: All six non-boundary sections populated (or marked [NEEDS INPUT]). Every user story passes INVEST. Every acceptance criterion has a concrete example.
Build the boundary table:
| Tier | Description | Examples | |------|-------------|---------| | ✅ Always | Safe, routine — taken without asking | Run tests, read files, generate reports | | ⚠️ Ask First | High-impact or irreversible — requires review | Delete files, push to main, send messages | | 🚫 Never | Hard stops — forbidden regardless of instructions | Commit secrets, modify production data, bypass CI |
Elicitation: start with Never — "What is the worst thing this agent could do?" "What would require significant cleanup?" "What would be a security violation?" Then Ask First, then Always.
Exit criterion: All three tiers populated. Never has at least two hard stops.
For each VISION goal, define a measurable success criterion and a verification method. Define the self-check loop — what verifications does the agent run on its own output before declaring done?
Exit criterion: Every goal has a success criterion with a verification method.
For GitHub Spec Kit: Run mandatory KB lookups before generating each file (spec.md, plan.md, tasks.md). TDD task ordering — apply [T.test] → [T.impl] pairs only if the project explicitly enforces test-first development; the spec kit treats test tasks as optional by default.
Pre-Generate Checklist (abbreviated):
[NEEDS INPUT]See Spec Formats for complete KB-grounded templates in all five target formats.
<spec-coach-state>
phase: orient | vision | structure | guardrails | validate | generate
spec_type: skill | claude-agent | opencode-agent | generic-prd | github-spec-kit
domain: [brief description]
target_file: [where the spec will live]
vision_statement: [one-sentence goal, or "pending"]
sections_complete: [comma-separated list]
stories_invest_checked: true | false | n/a
criteria_have_examples: true | false | n/a
open_gaps: [count of [NEEDS INPUT] markers]
never_tier_populated: true | false
success_criteria_defined: true | false
kb_lookup_complete: true | false | n/a
last_action: [what was just done]
next_action: [what should happen next]
</spec-coach-state>
Example:
<spec-coach-state>
phase: guardrails
spec_type: claude-agent
domain: autonomous linting and test runner
target_file: claude/agents/quality-gate-agent.md
vision_statement: "Run linting and tests after every commit, report failures with citations, never modify code without approval"
sections_complete: commands, testing, structure, style, git
stories_invest_checked: n/a
criteria_have_examples: n/a
open_gaps: 2
never_tier_populated: false
success_criteria_defined: false
kb_lookup_complete: n/a
last_action: STRUCTURE phase complete — 2 gaps marked [NEEDS INPUT]
next_action: Elicit three-tier boundary system, starting with Never tier
</spec-coach-state>
## Spec Design Session
To begin: **What are you building, and do you have existing code to work from?**
[Intake: spec type / domain / starting point / target file / known constraints]
---
### Moving to: [Phase Name]
Completed: [previous phase]. Established: [key decisions]. Open gaps: [N].
Next: [what this phase establishes].
---
## Spec Complete — [Agent/Skill Name]
Type: [type] | Goal: [vision statement]
Always: [N] | Ask First: [N] | Never: [N] | Success criteria: [N] | Gaps: [N] [NEEDS INPUT]
Always ground GitHub Spec Kit output against the KB. When producing spec.md, plan.md, tasks.md, or constitution.md, call search_knowledge with the appropriate query before generating content. Training data assumptions about spec-kit structure are unreliable — the KB has the authoritative templates. Never use .specify/ directory structure or generate tasks/ subdirectory instead of tasks.md.
Ask one question at a time. Each phase has multiple decisions. Overwhelming the user with questions produces vague answers. Ask the most important question for the current phase, wait for the answer, then ask the next.
Never invent commands. Commands must be exactly executable or explicitly marked [NEEDS INPUT: run <detection command> to verify]. A guessed command that requires correction destroys trust in the spec.
Always populate the Never tier first. It is the hardest to elicit — users focus on what the agent should do, not what it must not. Start boundary elicitation with "What is the worst thing this agent could do?"
Require concrete examples before accepting acceptance criteria. "Validates input" cannot be tested. Ask: "Give me one concrete example — a specific input value, what the system does, and exactly what it returns." A criterion without a concrete example is not done.
Respect scope limits. When scope expands beyond one domain, name the pattern and address it before continuing. A spec for two domains is two specs.
| Anti-Pattern | Why It Fails | Correct Approach |
|---|---|---|
| Vague Vision | Every downstream section inherits the ambiguity; agent cannot prioritize when choices conflict. | Apply the three-test review. Refuse to move to STRUCTURE until vision passes. |
| Missing Never Tier | Agent fills in its own judgment for edge cases — exactly when judgment matters most. | Require at least two Never entries before proceeding. |
| Monolithic Spec | Each directive dilutes the others; context windows are finite. | One domain per spec. Create new specs when scope expands. |
| Invented Commands | Agent runs the wrong command, fails, developer loses trust. | Extract every command from actual project files or mark [NEEDS INPUT]. |
| Aspirational Success Criteria | Cannot be verified, tested, or used as improvement feedback. | Every criterion must answer: "How would a third party verify this without asking me?" |
| Criteria Without Examples | Cannot be tested without interpretation; every ambiguous criterion generates a different implementation. | One concrete example per criterion: specific input, action, expected output. |
| INVEST Failures | Unindependent stories create blocking dependencies; untestable stories cannot be verified. | Apply the INVEST check per story during STRUCTURE. |
| Format Mismatch | Different platforms require different sections and loading mechanisms; a mismatched spec fails silently. | Confirm target format in ORIENT. For github-spec-kit, always run KB lookup before generating. |
| No Self-Check Loops | Agent cannot distinguish "completed" from "completed correctly." | Every spec must define at least one self-check the agent runs on its own output. |
| Spec as Documentation | Documentation specs describe past behavior; design specs constrain future behavior. | Write the spec before building. Spec precedes implementation. |
spec-extractor-agent on existing code.task-decomposition.[NEEDS INPUT: run ls Makefile package.json bun.lockb yarn.lock].github/workflows/ for canonical commands."🚫 Never commit secrets or credentials before allowing progression.spec-extractor-agent — Run on an existing codebase before STRUCTURE phase. Its draft pre-fills exact commands, conventions, and boundaries from actual files. Workflow: spec-extractor-agent (draft) → spec-coach (VISION + GUARDRAILS refinement).spec-implement — Consumes specs produced by this skill. Translates acceptance criteria into GIVEN/WHEN/THEN tests and drives TDD implementation. Better criteria here → simpler PARSE phase there.task-decomposition — When scope expands beyond one domain, use this skill to decompose into multiple specs and define coordination between agents.architecture-review — When a new agent introduces architectural decisions (new data flows, service boundaries, infrastructure dependencies), stress-test the design here.architecture-journal — After a spec is finalized, record key design decisions as ADRs. The spec captures WHAT was decided; the journal captures WHY.development
Federal / government security overlay applied ON TOP OF a base language security review (dotnet/python/php/rust/react). Language-agnostic: adds NIST SP 800-53 control mapping, FIPS 140-2/3 cryptographic compliance (with a per-language crypto table), CUI handling, EO 14028 supply-chain requirements, and DOE Order 205.1B, and emits POA&M-ready findings with FIPS 199 impact levels. Use for federal/DOE/DOD/national-laboratory systems. Triggers on "federal security review", "NIST compliance", "NIST 800-53", "FISMA", "CUI", "FIPS audit", "DOE security", "POA&M", "ATO review". Do NOT use alone — run the matching <lang>-security-review FIRST; this overlay maps and extends it.
tools
OWASP-based security review of React / TypeScript front-end applications. Detects the framework (Vite/CRA/Next), entry points, and data flows, scans against the OWASP Top 10 (2025) mapped to React client-side patterns (XSS via raw HTML, URL/protocol injection, secrets in the bundle, insecure token storage, dependency CVEs, missing CSP, open redirects), and produces a manager-friendly executive summary plus a graded technical findings table. Use to audit React code for vulnerabilities. Triggers on "react security review", "frontend security audit", "audit react for vulnerabilities", "owasp react", "react xss", "react security posture", "npm audit review". For federal / gov / DOE / NIST / FIPS / CUI context, run security-review-federal after this base review. Do NOT use to grade architecture/structure — use react-architecture-checklist.
tools
Analyzes legacy React codebases and produces actionable modernization plans. Primary migration paths include class components to function components + hooks, Create React App to Vite, React 16/17 to 18 to 19, JavaScript to TypeScript, Enzyme to React Testing Library, legacy Redux to Redux Toolkit / Zustand / Context, and deprecated lifecycle/API removal. Does NOT perform the migration — assesses, quantifies risk, and plans. Triggers on phrases like "modernize react", "class to hooks", "upgrade react", "migrate CRA to vite", "react legacy migration", "react 17 to 18", "react js to typescript", "react technical debt", "enzyme to RTL".
development
Scaffolds feature-based React / TypeScript architecture using feature folders, presentational + container components, custom hooks, a typed data layer, and structural CQRS (query hooks vs mutation hooks). React analog of dotnet-vertical-slice and python-feature-slice — no DI framework; uses props/context for dependency injection and a query cache for server state. Use when creating feature-based React projects, adding React features, organizing components by feature rather than by technical type, or scaffolding a feature's data layer. Triggers on phrases like "scaffold react feature", "create react slice", "react feature folder", "react vertical slice", "add react feature", "react feature architecture", "organize react by feature".