
Artifact-backed interface critique and polish for hierarchy, typography, layout, density, IA, interaction feel, content, brand fit, and taste. Requires screenshot, URL, rendered artifact, or explicit file plus intent. Use when: "make this look better", "improve the design", "polish the UI", "critique this screen", "design pass", "art direction", "scaffold design". Trigger: /design.
Pick browser automation for web/Electron: CI E2E, scripted flows, scraping, visual regression, exploratory QA, persona walks, monitoring, or browser agents. Deterministic Playwright first; harden exploratory findings into repeatable tests. Use for "automate the browser", "test this web app", "test this electron app", "Playwright or Stagehand", "scrape this site", "browser agent", "visual regression", "E2E tests". Trigger: /browser.
Investigate, audit, triage, and fix. Systematic debugging, incident lifecycle, domain auditing, and issue logging. Four-phase protocol: root cause → pattern analysis → hypothesis test → fix. Use for: any bug, test failure, production incident, error spikes, audit, triage, postmortem, "diagnose", "why is this broken", "debug this", "production down", "is production ok", "audit stripe", "log issues". Trigger: /diagnose.
Audit CI gates, strengthen weak coverage, then drive green. Dagger owns the canonical pipeline; missing Dagger is auto-scaffolded. Acts directly on mechanical fixes and never returns red without structured diagnosis. Use when: "run ci", "check ci", "fix ci", "audit ci", "is ci passing", "run the gates", "dagger check", "why is ci failing", "strengthen ci", "tighten ci", "ci is red", "gates failing". Trigger: /ci, /gates.
Outer-loop shipping orchestrator. Composes /shape, /implement, /yeet, /deliver --polish-only, /ship, and /monitor per backlog item. Closure (archive, reflect, harness routing) lives in /ship; flywheel does not invoke /reflect directly. Use when: "flywheel", "run the outer loop", "next N items", "overnight queue", "cycle". Trigger: /flywheel.
Shape a raw idea into something buildable. Product + technical exploration. Spec, design, critique, plan. Output is a context packet. Use when: "shape this", "write a spec", "design this feature", "plan this", "spec out", "context packet", "technical design". Trigger: /shape, /spec, /plan, /cp.
Four LLM-agent guardrails: surface assumptions, prefer simplicity, make surgical changes, and drive by verifiable goals. Reference for scope, simplicity, assumptions, or success-criteria judgment calls. Use when: "am I overcomplicating this", "what are my assumptions", "how do I verify success", "/karpathy", "/principles".
Capture evidence for what changed: blurb, paste, screenshot, GIF, video, launch note, or repo-local demo skill. Pick by change shape, audience, and budget. Use when: "make a demo", "record walkthrough", "PR evidence", "upload screenshots", "show what changed", "release notes blurb", "scaffold demo", "generate demo skill". Trigger: /demo.
Redact and package local agent-session excerpts for PRs, issues, receipts, or review evidence. Use when: "add agent transcript", "attach session proof", "show agent provenance", "redact transcript", "PR transcript". Trigger: /agent-transcript.
Branch-aware simplification. On feature branches, compare against base and simplify the diff before merge. On primary branch, scan for the highest impact deletion, consolidation, or boundary improvement. Use when: "refactor this", "simplify this diff", "clean this up", "reduce complexity", "pay down tech debt", "make this easier to maintain", "make this more elegant", "reduce the number of states", "clarify naming". Trigger: /refactor.
DEPRECATED redirect. /settle, /pr-fix, and /pr-polish now route to /deliver --polish-only <branch|PR> — the single owner of "existing branch -> merge-ready" (backlog 080 collapsed settle into deliver). Slated for deletion next release. Use when: muscle-memory "polish this", "address PR reviews", "get this merge-ready" — then run /deliver --polish-only. Trigger: /settle, /pr-fix, /pr-polish.
End-to-end "ship it to the remote": read worktree, classify changes, tidy debris, split semantic commits, and push. Judgment layer over git, not a wrapper; decides what belongs and how reviewers should read the diff. Use when: "yeet", "yeet this", "commit and push", "ship local changes", "tidy and commit", "wrap this up and push", "get this off my machine". Trigger: /yeet, /ship-local (alias).
Session retrospective, operator coaching, harness postmortem, codification, and outer-loop cycle critique. Turns evidence into hooks, rules, skills, backlog mutations, or explicit non-actions. Use when: "done", "wrap up", "what did we learn", "retro", "calibrate", "prompt better", "teach me from this session", "reflect on cycle", post-/flywheel critique. Trigger: /reflect, /retro, /calibrate, /reflect checkpoint <topic>, /reflect cycle <cycle-ulid>.
Inner-loop composer for one backlog item to merge-ready code. Composes /shape, /implement, /code-review, /ci, /refactor, and /qa; stops before push, merge, or deploy. Emits receipt.json plus operator brief + /reflect. Use for "deliver this", "make it merge-ready", shaped-ticket builds, and `--polish-only <branch|PR>` for existing branch/PR cleanup. Trigger: /deliver.
Harness engineering for Harness Kit primitives: skills, shared doctrine, provider roster, harness configs, gates, evals, bootstrap, and sync logic. Use for "improve the harness", "harness engineering", "bootstrap is wrong", "AGENTS.md is stale", "skill health", "eval skill", "sync primitives", "roster defaults". Trigger: /harness-engineering, /harness, /skill.
Web research, multi-AI delegation, and multi-perspective validation. /research [query], /research delegate [task], /research thinktank [topic]. Triggers: "search for", "look up", "research", "delegate", "get perspectives", "web search", "find out", "investigate", "introspect", "session analysis", "check readwise", "saved articles", "reading list", "highlights", "what are people saying", "X search", "social sentiment", "trending".
Verify the running thing works. Browser walks for web, request replay for APIs, shell smoke for CLIs, consumer builds for libraries, tool-call replay for MCP. "Tests pass" is not QA. Use when: "run QA", "verify the feature", "test this", "check the app", "exploratory test", "QA this PR", "smoke test", "manual testing", "capture evidence". For generated repo QA skills, use /create-repo-skill qa. Trigger: /qa.
Assess and improve codebase readiness for AI coding agents across style, tests, docs, architecture, CI, observability, security, and dev setup. Produces a scored report, prioritizes remediation, then executes the highest-impact fixes. Use when: "agent readiness", "is this codebase agent-ready", "readiness report", "make this codebase agent-friendly", "agent-ready assessment", "readiness audit", "prepare for agents". Trigger: /agent-readiness, /readiness.
Lightweight evidence-backed retro and catch-up reports for a current repo, branch, PR, backlog slice, or recent agent session. Use when the user asks for a debrief, catch me up, what changed, why it matters, product implications, end-user implications, developer experience implications, current app state, backlog state, workspace state, alternatives considered, or context rebuild after losing the thread. Trigger: /debrief.
Parallel multi-agent code review. Launch reviewer team, synthesize findings, auto-fix blocking issues, loop until clean. Use when: "review this", "code review", "is this ready to ship", "check this code", "review my changes". Trigger: /code-review, /review.
Build and repair Spellbook primitives: skills, shared doctrine, provider roster, harness configs, gates, evals, bootstrap, and sync logic. Use for "improve the harness", "bootstrap is wrong", "AGENTS.md is stale", "skill health", "eval skill", "sync primitives", "roster defaults". Trigger: /harness, /skill, /primitive.
Final mile from merge-ready branch to shipped: squash-merge, archive backlog tickets with trailers, update touched docs, run /reflect, apply outputs. Assumes /deliver or /deliver --polish-only already made the branch ready. Use when: "ship it", "merge and close out", "final mile", "land and reflect", "finish this ticket". Trigger: /ship.
Vendor the system-wide Harness Kit catalog into the current repo when a project needs checked-in local copies instead of relying on global bootstrap symlinks. Copies first-party skills and agents into a repo-local shared skill layer, then bridges harness-specific entrypoints back to that shared copy. Use when: "seed this repo", "vendor harness kit here", "initialize the agent here", "set me up offline". Trigger: /seed.
Generate a repository-local skill from live repo discovery and user intent. Use when: "create QA skill", "generate repo skill", "make a local skill", "persona acceptance skill", "value proposition QA", "bespoke repo QA", "scaffold local skill". Creates concrete `.agents/skills/<name>/` guidance with harness-specific bridges when useful. Trigger: /create-repo-skill, /repo-skill, /create-qa-skill.
Capture agent-session work records as local JSONL audit evidence. Links a backlog/spec, branch, commits, review verdicts, QA/demo evidence, transcript refs, and shipped ref without storing raw private transcripts. Use when: "trace this work", "write work record", "agent session trace", "journal this delivery", "link transcript evidence". Trigger: /trace, /journal.
Watch signals after deploy, release, local run, CI change, or repeated workflow. Uses telemetry when present; otherwise healthchecks, logs, CI, flaky tests, benchmark drift, daemon output, regressions, or agent traces. Emits events, escalates to /diagnose on trip, closes on green. Use when: "monitor signals", "watch the deploy", "watch production", "watch CI", "watch logs", "watch benchmark drift", "is it healthy". Trigger: /monitor.
Run one targeted, read-only architecture or quality critique through a named lens from the shared rubric. Use when: "critique this module", "run an Ousterhout pass", "lens critique", "architecture critique". Trigger: /critique.
Turn proven agent-session patterns into first-party Harness Kit skills. Use when: "skillify this conversation", "make this into a skill", "generate a skill from current transcript", "extract reusable workflow". Trigger: /skillify.
Ship merged code to one deploy target. Thin router: detect target, run the platform recipe, capture receipt (sha, version, URL, rollback handle), stop when healthy. Does not monitor, triage, or decide whether to deploy. Use when: "deploy", "deploy to prod", "release", "push to staging", "deploy this branch", "release cut". Trigger: /deploy, /release.
Agent-driven test hardening for functions, specs, and acceptance surfaces: property testing, mutation testing, acceptance mutation, CRAP/SCRAP risk, and structural DRY analysis. Use when: "property test this", "mutation test", "CRAP analysis", "DRY analysis", "harden tests", "find weak tests", "kill surviving mutants", "acceptance mutation", "uncle bob hardening". Trigger: /hardening.
Always-on backlog grooming. Tidy, brainstorm, interrogate, investigate, research, and simplify in a single loop. Tidy is not a mode — it happens every time. Strategic-layer work fans out parallel interrogation, design-critique, technical-review, and research lanes. Use when: "groom", "what should we build", "rethink this", "biggest opportunity", "backlog", "prioritize", "backlog session", "audit skills", "skill quality audit". Trigger: /groom, /groom audit, /backlog, /rethink, /moonshot, /scaffold.
Atomic TDD build skill. Takes a context packet (shaped ticket) and produces code + tests on a feature branch. Red → Green → Refactor. Does not shape, review, QA, or ship — single concern: spec to green tests. Use when: "implement this spec", "build this", "TDD this", "code this up", "write the code for this ticket", after /shape has produced a context packet. Trigger: /implement, /build (alias).
YC-partner-style interrogation of a raw idea. Six forcing questions before you shape anything: demand reality, status quo, desperate specificity, narrowest wedge, observation & surprise, future-fit. Operationalizes the AGENTS.md "Diverge Before You Converge" doctrine at the ideation stage — problem-diamond, pre-/shape. Use when: user arrives with a rough idea, backlog item is fuzzy, you can't already write a one-sentence goal with a testable outcome, or the phrase "what should we build" would be useful. Trigger: /office-hours, /oh, /interrogate.
Dialectical premise-and-alternatives audit for a plan, spec, or context packet. Four moves: premise challenge (is this the right problem?), mandatory structurally-distinct alternatives, cross-model outside voice, user-ratified convergence. Operationalizes the AGENTS.md "Diverge Before You Converge" doctrine at the plan-review stage. Use when: about to commit to a plan/spec/design, reviewing a ticket before shape, when "is this the right problem" would be useful, or any time a proposal smells like a symptom fix instead of a root-cause fix. Trigger: /ceo-review, /challenge, /premise-check.
Tailor this repository's harness. Explore the repo, read prior session history, browse the spellbook catalog, and install a per-repo set of skills into a shared repo-local skill layer, with harness-specific entrypoints bridged back to that shared copy. Workflow skills get rewritten with this repo's commands and conventions embedded throughout — not a generic body with a repo-notes appendix. Use when: "tailor this repo", "configure the agent for this codebase", "set up a harness", "what skills apply here". Trigger: /tailor.
Research, compare, and select LLM models for AI-powered apps and workflows. Finds latest models per family, verifies availability on target platform, compares pricing/benchmarks/tool-calling, produces ranked recommendations. Use when: "which model", "compare models", "find a model", "model research", "best model for", "cheapest model", "tool calling models", "model selection", "upgrade model", "swap model", "fallback models", "model chain".
Evolve the Spellbook library. Scan external sources for new skills worth indexing, review observations for improvement opportunities, brainstorm new primitives, investigate existing skills for consolidation or deletion, research power user patterns and best practices. Use when: "curate", "evolve spellbook", "what should we build", "find new skills", "audit the library", "consolidate skills", "what's new in the ecosystem", "spellbook maintenance", "improve primitives".
Outer-loop delivery orchestrator. Composes cycles of /deliver → /deploy → /monitor → /investigate → /reflect, mutates the backlog, and emits harness suggestions to a branch. Inner loop is /deliver (one ticket → merge-ready, a black box here). Outer loop is this: continuous, unattended, budgeted. Use when: continuous delivery, "autopilot", "run the outer loop", "next N items", "overnight queue", "outer loop", "cycle". Trigger: /autopilot.
Accessibility audit, remediation, and verification. WCAG 2.2 AA compliance. Three-agent protocol: audit (find issues) → remediate (fix them) → critique (verify fixes). Use when: "accessibility audit", "a11y", "WCAG", "screen reader", "keyboard navigation", "contrast check", "aria fix", "accessibility sprint", "audit accessibility", "fix accessibility", "a11y issues", "a11y check". Trigger: /a11y.
Analyze, test, and upgrade dependencies. One curated PR, not 47 version bumps. Reachability analysis, behavioral diffs, risk assessment. Package-manager agnostic. Use when: "upgrade deps", "dependency audit", "check for updates", "outdated packages", "security audit deps", "update dependencies", "vulnerable dependencies", "deps". Trigger: /deps.
Per-repo skill subset selector. Reads the current project, reads the spellbook catalog, picks 10-20 skills that fit this repo, writes the selection to .spellbook.yaml so bootstrap only symlinks that subset. Use when: "tailor skills", "select skills", "prune skills", "tailor catalog for this repo", "scope skills to this repo", "reduce context bloat". Trigger: /tailor-skills.
Investigate, audit, triage, and fix. Systematic debugging, incident lifecycle, domain auditing, and issue logging. Four-phase protocol: root cause → pattern analysis → hypothesis test → fix. Use for: any bug, test failure, production incident, error spikes, audit, triage, postmortem, "investigate", "why is this broken", "debug this", "production down", "is production ok", "audit stripe", "log issues".