
Use when performing thorough code review with historical context tracking. Triggers: 'thorough review', 'deep review', 'review this branch in detail', 'full code review with report'. More heavyweight than code-review; for quick review, use code-review instead.
Use when reviewing PRs to triage, categorize, or summarize changes requiring human attention. Triggers: 'summarize this PR', 'what changed in PR #X', 'triage PR', 'which files need review', 'PR overview', 'categorize changes', or pasting a PR URL. NOT for: deep code analysis (use advanced-code-review) or quick review (use code-review).
Use when generating, improving, or auditing project documentation. Triggers: 'document this project', 'write docs', 'generate documentation', 'docs are outdated', 'need a README', 'create tutorials', 'API reference docs', 'doc audit', 'documentation review', '/document-project'. Standalone README: use /write-readme. NOT for: code changes (use develop), code review (use code-review), or research (use deep-research).
Behavioral protocol for all code changes. Invoked automatically by develop and test-driven-development. Triggers: 'code quality', 'no shortcuts', 'production quality', 'enforce standards'. NOT for: reviewing others' code (use code-review) or test quality (use fixing-tests).
Use when generating flowcharts, diagrams, dependency graphs, or visual representations of processes, relationships, architecture, or state machines. Triggers: 'diagram this', 'flowchart', 'visualize', 'dependency graph', 'ER diagram', 'state machine diagram', 'class diagram', 'sequence diagram', 'map the relationships', 'draw the architecture', 'how does X connect to Y'. NOT for: simple bullet point explanations, runtime monitoring, or text-only documentation.
Use when generating documents, reports, plans, audits, or deciding where to save output files. Triggers: 'save report', 'write plan', 'where should I put this', 'where does this go', 'output directory', 'save this somewhere'.
Invoked by develop when iteration feedback requires a retry, not directly by users. Prevents repeating the same mistakes across attempts. Also relevant when: 'why did this fail again', 'same error twice', 'what should I do differently', 'keep making the same mistake'.
Subagent trust tier system for handling external and untrusted content. Invoked by dispatching-parallel-agents when external content is involved. Triggers: 'review this PR from external contributor', 'untrusted content', 'third-party code', 'what trust tier', 'quarantine', 'review_untrusted', 'external PR', 'security tier', 'trust boundary', 'session protection'.
Session resume protocol and session repairs handling. Loaded when spellbook_session_init returns resume_available: true, or when session_init returns a repairs array. Triggers: 'resume', 'continue', 'where were we', session resume, session repairs.
Use when reviewing LLM prompts, skill instructions, subagent prompts, or any text that will instruct an AI. Triggers: "review this prompt", "audit instructions", "sharpen prompt", "is this clear enough", "would an LLM understand this", "ambiguity check". Also invoked by instruction-engineering, reviewing-design-docs, and reviewing-impl-plans for instruction quality gates.
Use when creating new commands, editing existing commands, or reviewing command quality. Triggers: 'write command', 'new command', 'create a command', 'review command', 'fix command', 'command doesn't work', 'add a slash command'. NOT for: skill creation (use writing-skills).
Use when creating new skills, editing existing skills, or verifying skills work before deployment. Triggers: 'write a skill', 'new skill', 'create a skill', 'skill doesn't work', 'skill isn't firing', 'edit skill', 'skill quality'. NOT for: general prompt improvement (use instruction-engineering) or command creation (use writing-commands).
Use when writing subagent prompts, skill instructions, or any text where accuracy is critical and hallucination would cause harm. Triggers: 'make this accurate', 'high-stakes prompt', 'this needs to be truthful', 'critical instructions', 'get this right'. NOT for: general prompt improvement (use instruction-engineering) or prompt ambiguity review (use sharpening-prompts).
Use when verifying technical claims in code, docs, or comments before merge. Triggers: 'is this claim correct', 'verify this', 'fact check', 'is this accurate', 'check these assertions', 'are these comments true'. NOT for: checking if AI hallucinated references (use dehallucination).
Use when reviewing design documents, technical specifications, architecture docs, RFCs, ADRs, or API designs for completeness and implementability. Triggers: 'review this design', 'is this spec complete', 'can someone implement from this', 'what's missing from this design', 'review this RFC', 'is this ready for implementation', 'audit this spec'. Core question: could an implementer code against this without guessing?
Meta-audit skill for spellbook development. Spawns parallel subagents to factcheck docs, optimize instructions, find token savings, and identify MCP candidates. Produces actionable report.
Use when exploring design approaches, generating ideas, or making architectural decisions. Triggers: 'explore options', 'what are the tradeoffs', 'how should I approach', 'let's think through', 'sketch out an approach', 'I need ideas for', 'how would you structure', 'what are my options'. Also invoked by develop when design decisions are needed.
Use when you have an implementation plan ready to execute. Triggers: 'run the plan', 'start building', 'execute the tasks', 'implement the steps', 'next task in the plan', 'work through the plan'. Also invoked by develop after planning phase completes. NOT for: creating plans (use writing-plans).
Use when deeply exploring uncertainty, systematically decomposing complex questions, or gaining certainty about multi-faceted problems. Triggers: 'think deeply about', 'explore this recursively', 'I need certainty about', 'decompose this question', 'what am I missing'. Invoked by brainstorming, fact-checking, debugging, and deep-research. NOT for: simple questions with known answers or linear task execution.
Behavioral protocol for reading files or command output of unknown size. Loaded automatically for all file reading operations. Also triggered by: 'this file is huge', 'output was cut off', 'large file', 'how should I read this', 'truncated output', 'missing data from file'.
Use when evaluating skill effectiveness or comparing skill versions. Triggers: 'how are skills performing', 'skill metrics', 'which skills fire correctly', 'skill invocation analysis', 'compare skill versions', 'analyze skill usage'. Also invoked by skill improvement workflows.
Use when preparing context for subagent dispatch or managing token budgets. Triggers: 'prepare context for', 'assemble context', 'token budget', 'context package', 'what context does the subagent need'. Also invoked by develop during planning and execution phases.
Use when writing JavaScript or TypeScript code with asynchronous operations, fixing promise-related bugs, or converting callback/promise patterns to async/await. Triggers: 'promise chain', 'unhandled rejection', 'race condition in JS', 'callback hell', 'Promise.all', 'sequential vs parallel async', 'missing await'. Enforces async/await discipline over raw promises.
Reference for TTS and OS notification configuration. Auto-loads when TTS is enabled (session_init reports TTS active). Also triggered by: 'mute', 'unmute', 'change voice', 'volume', 'notify', 'notification settings', '/tts', '/notify', 'kokoro', 'speak', 'audio feedback'.
Use when auditing whether tests genuinely catch failures, or when user expresses doubt about test quality. Triggers: 'are these tests real', 'do tests catch bugs', 'tests pass but I don't trust them', 'test quality audit', 'green mirage', 'shallow tests', 'tests always pass suspiciously', 'would this test fail if code was broken'. NOT for: fixing broken tests (use fixing-tests).
DEPRECATED: This skill has been absorbed into the develop skill. Use develop instead. The capabilities of autonomous-roundtable (project decomposition, roundtable gating, reflexion on ITERATE) are now available through develop's dialectic_mode and token_enforcement preferences. Set dialectic_mode to "roundtable" in Phase 0.4 for equivalent behavior.
Use when reviewing code. Triggers: 'review my code', 'check my work', 'look over this', 'review PR #X', 'PR comments to address', 'reviewer said', 'address feedback', 'self-review before PR', 'audit this code'. For heavyweight multi-phase analysis, use advanced-code-review instead.
Use when creating GitHub pull requests or issues with template compliance. Triggers: 'create a PR', 'open a pull request', 'file an issue', 'create issue'. Also invoked by finishing-a-development-branch. NOT for: deciding whether to merge or PR (use finishing-a-development-branch).
Use when debugging bugs, test failures, or unexpected behavior. Triggers: 'why isn't this working', 'this doesn't work', 'X is broken', 'something's wrong', 'getting an error', 'exception in', 'stopped working', 'regression', 'crash', 'hang', 'flaky test', 'intermittent failure', or when user pastes a stack trace/error output. NOT for: test quality issues (use fixing-tests), adding new behavior (use develop).
Use when researching complex topics, evaluating technologies, investigating domains, or answering multi-faceted questions requiring web research. Triggers: 'research X', 'investigate Y', 'evaluate options for Z', 'what are the best approaches to', 'help me understand', 'deep dive into', 'compare alternatives', 'look into', 'find out about'. NOT for: exploring design approaches (use brainstorming) or domain modeling (use analyzing-domains).
Use when verifying that AI-generated claims, references, or assertions are grounded in reality. Triggers: 'does this actually exist', 'is this real', 'did you hallucinate', 'verify these references', 'check if this is fabricated', 'reality check', 'ground truth'. Invoked as quality gate by develop and deep-research. NOT for: verifying technical claims in code (use fact-checking).
Use when designing systems with explicit states, transitions, or multi-step flows. Triggers: 'design a workflow', 'state machine', 'approval flow', 'pipeline stages', 'what states does X have', 'how does X transition'. Also invoked by develop when workflow patterns are detected.
Use when building, creating, modifying, or planning any code change. Triggers: "implement X", "build Y", "add feature Z", "create X", "change how X works", "modify Y", "update the Z", "refactor X", "rework Y", "restructure Z", "make X do Y", "let's plan how to", "plan the implementation", "how should we implement", "how would you build", "what's the best way to implement", "I want to...", "We need...", "Would be great to...", "Can we add...", "Let's add...", "Let's build...", "Let's make...", "start a new project". Also for: new projects, repos, templates, greenfield development, refactoring, migrations, multi-file modifications, any code change requiring planning. PREFER THIS OVER plan mode or ad-hoc implementation for ANY substantive code change. NOT for: bug fixes (use debugging), pure research (use deep-research), questions about existing code without intent to change it, or test-only fixes (use fixing-tests).
Use when challenging assumptions, surfacing risks, or stress-testing designs and decisions. Triggers: 'challenge this', 'play devil's advocate', 'what could go wrong', 'poke holes', 'find the flaws', 'what am I missing', 'is this solid', 'red team this', 'what are the weaknesses', 'risk assessment', 'sanity check'. Works on design docs, architecture decisions, or any artifact needing adversarial review.
Use when deciding whether to dispatch subagents, when to stay in main context, or when facing 2+ independent parallel tasks. Triggers: 'should I use a subagent', 'parallelize', 'multiple independent tasks', 'run these at the same time', 'split this up', 'do both at once', 'dispatch template', 'context minimization'.
Use after modifying library skills, library commands, or agents to ensure CHANGELOG, README, and docs are updated
Use when writing MCP tools, API endpoints, CLI commands, or any function that an LLM will invoke. Also use when LLMs misuse tools due to poor descriptions. Triggers: 'document this tool', 'write tool docs', 'MCP tool', 'tool description quality', 'model keeps calling this wrong', 'improve tool description'. For human-facing API docs, standard documentation practices apply instead.
Use when reviewing code changes, auditing new features, or cleaning up. Triggers: 'find dead code', 'find unused code', 'check for unnecessary additions', 'what can I remove', 'is this used anywhere', 'can I delete this', 'orphaned code', 'unused imports'.
Use when implementation is complete, tests pass, and you need to decide the integration path. Triggers: 'done with this branch', 'ready to merge', 'ship it', 'wrap this up', 'how should I integrate this', 'what next after implementation'. NOT for: PR creation mechanics (use creating-issues-and-pull-requests).
Use when tests themselves are broken, test quality is poor, or user wants to fix/improve tests. Triggers: 'test is broken', 'test is wrong', 'test is flaky', 'make tests pass', 'tests need updating', 'green mirage', 'tests pass but shouldn't', 'audit report findings', 'run and fix tests'. NOT for: bugs in production code caught by correct tests (use debugging).
Use when starting a session and wanting creative engagement. Triggers: '/fun', 'use a persona', 'be creative', 'make this fun', 'use a character', 'spice it up'. Session-level mode, not task-level.
Use when eliciting or clarifying feature requirements, defining scope, identifying constraints, or capturing user needs. Triggers: 'what are the requirements', 'define the requirements', 'scope this feature', 'user stories', 'acceptance criteria', 'what should this do', 'what problem are we solving', 'what are the constraints'. Also invoked by develop during discovery.
Use when crafting, improving, or reviewing prompts, system prompts, skill instructions, or any text that instructs an LLM. Triggers: 'write a prompt', 'prompt engineering', 'improve this prompt', 'design a system prompt', 'write skill instructions', 'craft agent instructions'. Also invoked by writing-skills. NOT for: auditing existing prompts for ambiguity (use sharpening-prompts).
Use when testing theories during debugging, or when chaos is detected. Triggers: "let me try", "maybe if I", "what about", "quick test", "see if", rapid context switching, multiple changes without isolation. Enforces one-theory-one-test discipline. Invoked automatically by debugging, scientific-debugging, systematic-debugging before any experiment execution.
Use when merging parallel worktrees back together after parallel implementation, combining parallel development tracks, or unifying branches from dispatched parallel agents. Triggers: 'merge worktrees', 'combine parallel branches', 'integrate parallel work', 'all tracks complete', 'bring everything together'.
Triggers after completing substantive work (finishing a todo, returning from subagent, applying non-obvious convention, receiving user correction). Also: 'what should we capture', 'reusable pattern', 'should this be a skill', 'AGENTS.md update', 'knowledge gap'. Behavioral skill loaded at natural pause points.
Use when improving project discoverability, attracting users/contributors, or presenting open source work. Triggers: 'write a README', 'improve README', 'get more users', 'get more contributors', 'add badges', 'create a logo', 'set up issue templates', 'audit this project', 'project presence', 'make this discoverable', 'why isn't anyone using this', 'prepare for launch', 'repo presentation', 'open source marketing', 'attract contributors', 'project storefront'. Also triggers on: naming a project, writing taglines, GitHub metadata, community infrastructure, signs of life.
Use when implementation is done and you need a structured pre-PR review workflow. Triggers: 'ready for review', 'review my changes before PR', 'pre-merge check', 'is this ready', 'submit for review'. NOT for: post-merge review (use code-review) or deciding how to integrate (use finishing-a-development-branch).
Use when git merge or rebase fails with conflicts, you see 'unmerged paths' or conflict markers (<<<<<<< =======), or need help resolving conflicted files. Triggers: 'merge conflict', 'fix the conflicts', 'conflicting changes', 'resolve conflicts', 'can't merge'.
System skill loaded before dispatching any PR review subagent. Ensures correct file version selection based on branch and worktree state. Not invoked directly by users. Required by: code-review, advanced-code-review, distilling-prs when reviewing PRs.
Use when auditing skills, commands, hooks, and MCP tools for security vulnerabilities. Triggers: 'security audit', 'scan for vulnerabilities', 'check security', 'audit skills', 'audit MCP tools', 'is this safe', 'check for injection', 'OWASP'. NOT for: general code review (use code-review --audit).
Use when user explicitly requests test-driven development. Triggers: 'TDD', 'write tests first', 'red green refactor', 'test-first', 'start with the test'. Also invoked by develop and executing-plans for implementation tasks. NOT for: full feature work (use develop, which includes TDD internally).
Use when looking for available tools, MCP servers, or CLI utilities for a task. Triggers: 'what tools do I have', 'is there an MCP for this', 'what's available', 'find a tool for', 'discover tooling', 'what CLI tools exist'. NOT for: documenting existing tools (use documenting-tools).
Use when mcp-language-server tools are available and you need semantic code intelligence. Triggers: 'find definition', 'find references', 'who calls this', 'rename symbol', 'type hierarchy', 'go to definition', 'where is this used', 'where is this defined', 'what type is this'. Provides navigation, refactoring, and type analysis via LSP.
System skill loaded at session start to initialize skill routing. Not invoked directly by users. Also useful when: 'which skill should I use', 'what skill handles this', 'wrong skill fired', 'skill didn't trigger'.
Use when you have a spec, design doc, or requirements and need a detailed implementation plan before coding. Triggers: 'write a plan', 'create implementation plan', 'plan this out', 'break this down into steps', 'convert design to tasks', 'implementation order'. Also invoked by develop during planning. NOT for: reviewing existing plans (use reviewing-impl-plans).
Use when instruction files (skills, prompts, CLAUDE.md) are too long or need token reduction while preserving capability. Triggers: 'optimize instructions', 'reduce tokens', 'compress skill', 'make this shorter', 'too verbose', 'this skill is too big', 'over the line limit', 'trim this down'.
[DEPRECATED] Use project-level AGENTS.md files instead. Previously used for first-session codebase onboarding and persistent glossary creation.
Use when session returns mode.type='tarot', user says '/tarot', or requests roundtable dialogue with archetypes. Triggers: '/tarot', 'use tarot mode', 'roundtable with archetypes', 'tarot personas'. Session-level mode, not task-level.
Test selection strategy and scope guidance. Triggers: 'which tests should I run', 'test tiers', 'test marks', 'slow tests', 'integration vs unit', 'cross-module regression', 'test scope', 'what should I run', 'select tests', 'test batching'. NOT for: writing tests (use test-driven-development) or fixing broken tests (use fixing-tests).
Use when about to claim discovery during debugging. Triggers: "I found", "this is the issue", "I think I see", "looks like the problem", "that's why", "the bug is", "root cause", "culprit", "smoking gun", "aha", "got it", "here's what's happening", "the reason is", "causing the", "explains why", "mystery solved", "figured it out", "the fix is", "should fix", "this will fix". Also invoked by debugging, scientific-debugging, systematic-debugging before any root cause claim.
Use when reviewing implementation plans before execution. Triggers: 'is this plan solid', 'review the plan', 'check before I start building', 'anything missing from this plan', 'will this plan work', 'audit the implementation plan'. NOT for: reviewing design documents (use reviewing-design-docs) or creating plans (use writing-plans).
Use when entering unfamiliar domains, modeling complex business logic, or when terms/concepts are unclear. Triggers: 'what are the domain concepts', 'define the entities', 'model this domain', 'DDD', 'ubiquitous language', 'bounded context'. Also invoked by develop during research phase.
Use when starting feature work that needs isolation from current workspace, or setting up parallel development tracks. Triggers: 'worktree', 'separate branch', 'isolate this work', 'don't mess up current work', 'work on two things at once', 'parallel workstreams', 'new branch for this', 'keep my current work safe'.
Triggers: 'branch diff', 'what changed on this branch', 'merge base', 'stacked branches', 'branch-context.sh', 'what does this branch do', 'PR description', 'changelog', 'branch comparison', 'diff since', 'what work is on this branch'. Also relevant during PR creation and finishing-a-development-branch workflows.
Loaded at session start when spellbook_session_init returns mode data, or when mode.type is 'unset'. Triggers: session init mode handling, '/mode', mode selection, 'fun mode', 'tarot mode'.