Skip to main content

About Getting Started

Verify Personas AI News Submit Get In Touch

Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub

Goose Amp Cursor Claude Code

Letta OpenCode Claude OpenAI Codex

Factory VS Code Gemini CLI GitHub

Goose Amp Cursor Claude Code

Letta OpenCode Claude OpenAI Codex

Get in touch

Let's build the future of AI skills together

We're building this out of love for the community and would really love your feedback, suggestions, and ideas. Whether you're a skill author, an enterprise team, or just curious — we want to hear from you.

Community-driven

Built by developers, for developers. Your feedback directly shapes the platform.

Security-first

Every skill passes a 4-layer security scan before it's published.

Open ecosystem

Contribute skills, report issues, or suggest new features on GitHub.

Stockholm, Sweden

Name

Email

Phone (optional)

Message

The marketplace for AI agent skills. Discover, download, and share.

Browse

All Skills
Blog
Publishers
Personas
Trending This Week
Security Scanning
Submit a Skill
Claude Code vs Cursor
Best Skills for Cursor
Python Skills
TypeScript Skills
Token Economics

Popular

Skills for Claude Code
Skills for Cursor
Skills for Codex CLI
Skills for Windsurf
Skills for Gemini CLI
Skills for Copilot
Skills for Product Managers
Skills for Superpower
Skills for Graphify

Top Skills

Anthropic Skills
OpenAI Skills
Cloudflare Skills
Vercel Skills
Hugging Face Skills
Most Installed

About

About Us
Docs
Getting Started
News
Blog
GitHub
Discord

© 2026 SkillsAuth. All rights reserved.

Terms of Service Privacy

Home/Skills/phrazzld

phrazzld

120 verified skills1,531 total stars

github.com/phrazzld

research

Web research, multi-AI delegation, and multi-perspective validation. /research [query], /research delegate [task]. Use when: "search for", "look up", "research", "delegate", "get perspectives", "web search", "find out", "investigate", "introspect", "check readwise", "saved articles", "reading list", "what are people saying", "X search", "trending", "which model", "compare models", "best model for", "model selection".

showcase

Turn a working repo into credible external-facing proof: demoability audit, deterministic demo path, marketing site, case study, screenshots, demo video, launch copy, and consulting portfolio assets. Use when: "productize this", "make this demoable", "make this polished", "make a marketing site", "show this off", "demo video", "case study", "portfolio piece", "consulting asset", "launch page", "sales demo". Trigger: /showcase, /productize, /demoability.

flywheel

Outer-loop shipping orchestrator. Composes /shape, /implement, /yeet, /deliver --polish-only, /ship, and /monitor per backlog item. Closure (archive, reflect, harness routing) lives in /ship; flywheel does not invoke /reflect directly. Use when: "flywheel", "run the outer loop", "next N items", "overnight queue", "cycle". Trigger: /flywheel.

demo

Capture evidence for what changed: blurb, paste, screenshot, GIF, video, launch note, or repo-local demo skill. Pick by change shape, audience, and budget. Use when: "make a demo", "record walkthrough", "PR evidence", "upload screenshots", "show what changed", "release notes blurb", "scaffold demo", "generate demo skill". Trigger: /demo.

content-media13

settle

DEPRECATED redirect. /settle, /pr-fix, and /pr-polish now route to /deliver --polish-only <branch|PR> — the single owner of "existing branch -> merge-ready" (backlog 080 collapsed settle into deliver). Slated for deletion next release. Use when: muscle-memory "polish this", "address PR reviews", "get this merge-ready" — then run /deliver --polish-only. Trigger: /settle, /pr-fix, /pr-polish.

browser

Pick browser automation for web/Electron: CI E2E, scripted flows, scraping, visual regression, exploratory QA, persona walks, monitoring, or browser agents. Deterministic Playwright first; harden exploratory findings into repeatable tests. Use for "automate the browser", "test this web app", "test this electron app", "Playwright or Stagehand", "scrape this site", "browser agent", "visual regression", "E2E tests". Trigger: /browser.

agent-transcript

Redact and package local agent-session excerpts for PRs, issues, receipts, or review evidence. Use when: "add agent transcript", "attach session proof", "show agent provenance", "redact transcript", "PR transcript". Trigger: /agent-transcript.

karpathy-guidelines

Four LLM-agent guardrails: surface assumptions, prefer simplicity, make surgical changes, and drive by verifiable goals. Reference for scope, simplicity, assumptions, or success-criteria judgment calls. Use when: "am I overcomplicating this", "what are my assumptions", "how do I verify success", "/karpathy", "/principles". Trigger: /karpathy, /principles.

create-repo-skill

Generate a repository-local skill from live repo discovery and user intent. Use when: "create QA skill", "generate repo skill", "make a local skill", "persona acceptance skill", "value proposition QA", "bespoke repo QA", "scaffold local skill". Creates concrete `.agents/skills/<name>/` guidance with harness-specific bridges when useful. Trigger: /create-repo-skill, /repo-skill, /create-qa-skill.

debrief

Lightweight evidence-backed retro and catch-up reports for a current repo, branch, PR, backlog slice, or recent agent session. Use when the user asks for a debrief, catch me up, what changed, why it matters, product implications, end-user implications, developer experience implications, current app state, backlog state, workspace state, alternatives considered, or context rebuild after losing the thread. Trigger: /debrief.

deps

Analyze, test, and upgrade dependencies. One curated PR, not 47 version bumps. Reachability analysis, behavioral diffs, risk assessment. Package-manager agnostic. Use when: "upgrade deps", "dependency audit", "check for updates", "outdated packages", "security audit deps", "update dependencies", "vulnerable dependencies", "deps". Trigger: /deps.

deploy

Ship merged code to one deploy target. Thin router: detect target, run the platform recipe, capture receipt (sha, version, URL, rollback handle), stop when healthy. Does not monitor, triage, or decide whether to deploy. Use when: "deploy", "deploy to prod", "release", "push to staging", "deploy this branch", "release cut". Trigger: /deploy, /release.

flywheel

Outer-loop shipping orchestrator. Composes /shape, /implement, /yeet, /deliver --polish-only, /ship, and /monitor per backlog item. Closure (archive, reflect, harness routing) lives in /ship; flywheel does not invoke /reflect directly. Use when: "flywheel", "run the outer loop", "next N items", "overnight queue", "cycle". Trigger: /flywheel.

implement

Atomic TDD build skill. Takes a context packet (shaped ticket) and produces code + tests on a feature branch. Red → Green → Refactor. Does not shape, review, QA, or ship — single concern: spec to green tests. Use when: "implement this spec", "build this", "TDD this", "code this up", "write the code for this ticket", after /shape has produced a context packet. Trigger: /implement, /build (alias).

hardening

Agent-driven test hardening for functions, specs, and acceptance surfaces: property testing, mutation testing, acceptance mutation, CRAP/SCRAP risk, and structural DRY analysis. Use when: "property test this", "mutation test", "CRAP analysis", "DRY analysis", "harden tests", "find weak tests", "kill surviving mutants", "acceptance mutation", "uncle bob hardening". Trigger: /hardening.

model-research

Research, compare, and select LLM models for AI-powered apps and workflows. Finds latest models per family, verifies availability on target platform, compares pricing/benchmarks/tool-calling, produces ranked recommendations. Use when: "which model", "compare models", "find a model", "model research", "best model for", "cheapest model", "tool calling models", "model selection", "upgrade model", "swap model", "fallback models", "model chain". Trigger: /model-research.

agent-readiness

Assess and improve codebase readiness for AI coding agents across style, tests, docs, architecture, CI, observability, security, and dev setup. Produces a scored report, prioritizes remediation, then executes the highest-impact fixes. Use when: "agent readiness", "is this codebase agent-ready", "readiness report", "make this codebase agent-friendly", "agent-ready assessment", "readiness audit", "prepare for agents". Trigger: /agent-readiness, /readiness.

monitor

Watch signals after deploy, release, local run, CI change, or repeated workflow. Uses telemetry when present; otherwise healthchecks, logs, CI, flaky tests, benchmark drift, daemon output, regressions, or agent traces. Emits events, escalates to /diagnose on trip, closes on green. Use when: "monitor signals", "watch the deploy", "watch production", "watch CI", "watch logs", "watch benchmark drift", "is it healthy". Trigger: /monitor.

a11y

Accessibility audit, remediation, and verification. WCAG 2.2 AA compliance. Three-agent protocol: audit (find issues) → remediate (fix them) → critique (verify fixes). Use when: "accessibility audit", "a11y", "WCAG", "screen reader", "keyboard navigation", "contrast check", "aria fix", "accessibility sprint", "audit accessibility", "fix accessibility", "a11y issues", "a11y check". Trigger: /a11y

ship

Final mile from current committed work to shipped: get the work onto master, archive backlog tickets with trailers, update touched docs, run /reflect, apply outputs. Recovers detached HEAD or headless work by creating a shipping ref before squash-merging to master. Use when: "ship it", "merge and close out", "final mile", "land and reflect", "finish this ticket". Trigger: /ship.

documentation13

trace

Capture agent-session work records as local JSONL audit evidence. Links a backlog/spec, branch, commits, review verdicts, QA/demo evidence, transcript refs, and shipped ref without storing raw private transcripts. Use when: "trace this work", "write work record", "agent session trace", "journal this delivery", "link transcript evidence". Trigger: /trace, /journal.

yeet

End-to-end "ship it to the remote": read worktree, classify changes, tidy debris, split semantic commits, and push. Judgment layer over git, not a wrapper; decides what belongs and how reviewers should read the diff. Use when: "yeet", "yeet this", "commit and push", "ship local changes", "tidy and commit", "wrap this up and push", "get this off my machine". Trigger: /yeet, /ship-local (alias).

agent-transcript

Redact and package local agent-session excerpts for PRs, issues, receipts, or review evidence. Use when: "add agent transcript", "attach session proof", "show agent provenance", "redact transcript", "PR transcript". Trigger: /agent-transcript.

a11y

Accessibility audit, remediation, and verification. WCAG 2.2 AA compliance. Three-agent protocol: audit (find issues) → remediate (fix them) → critique (verify fixes). Use when: "accessibility audit", "a11y", "WCAG", "screen reader", "keyboard navigation", "contrast check", "aria fix", "accessibility sprint", "audit accessibility", "fix accessibility", "a11y issues", "a11y check". Trigger: /a11y

browser

Pick browser automation for web/Electron: CI E2E, scripted flows, scraping, visual regression, exploratory QA, persona walks, monitoring, or browser agents. Deterministic Playwright first; harden exploratory findings into repeatable tests. Use for "automate the browser", "test this web app", "test this electron app", "Playwright or Stagehand", "scrape this site", "browser agent", "visual regression", "E2E tests". Trigger: /browser.

create-repo-skill

Generate a repository-local skill from live repo discovery and user intent. Use when: "create QA skill", "generate repo skill", "make a local skill", "persona acceptance skill", "value proposition QA", "bespoke repo QA", "scaffold local skill". Creates concrete `.agents/skills/<name>/` guidance with harness-specific bridges when useful. Trigger: /create-repo-skill, /repo-skill, /create-qa-skill.

debrief

Lightweight evidence-backed retro and catch-up reports for a current repo, branch, PR, backlog slice, or recent agent session. Use when the user asks for a debrief, catch me up, what changed, why it matters, product implications, end-user implications, developer experience implications, current app state, backlog state, workspace state, alternatives considered, or context rebuild after losing the thread. Trigger: /debrief.

deploy

Ship merged code to one deploy target. Thin router: detect target, run the platform recipe, capture receipt (sha, version, URL, rollback handle), stop when healthy. Does not monitor, triage, or decide whether to deploy. Use when: "deploy", "deploy to prod", "release", "push to staging", "deploy this branch", "release cut". Trigger: /deploy, /release.

deps

Analyze, test, and upgrade dependencies. One curated PR, not 47 version bumps. Reachability analysis, behavioral diffs, risk assessment. Package-manager agnostic. Use when: "upgrade deps", "dependency audit", "check for updates", "outdated packages", "security audit deps", "update dependencies", "vulnerable dependencies", "deps". Trigger: /deps.

showcase

Turn a working repo into credible external-facing proof: demoability audit, deterministic demo path, marketing site, case study, screenshots, demo video, launch copy, and consulting portfolio assets. Use when: "productize this", "make this demoable", "make this polished", "make a marketing site", "show this off", "demo video", "case study", "portfolio piece", "consulting asset", "launch page", "sales demo". Trigger: /showcase, /productize, /demoability.

oracle

Browser-mode-only Oracle consults: bundle a prompt plus selected files and ask a signed-in ChatGPT GPT-5.5 Pro browser session for a second opinion. Use when stuck, debugging hard bugs, reviewing an architecture plan, or cross-checking a substantive diff with large file context. Never use Oracle API mode from Harness Kit. Trigger: /oracle, /consult.

ci

Audit, design, and run repo-owned CI gates. Host-agnostic by default: local, GitHub Actions, Azure, or another runner should call the same repo-owned contract. Harness Kit's own gate is the Rust command `cargo run --locked -p harness-kit-checks -- check --repo .`; consumer repos keep their own native gate. Use when: "run ci", "check ci", "fix ci", "audit ci", "design CI", "host-agnostic CI", "Dagger", "is ci passing", "run the gates", "why is ci failing", "strengthen ci", "tighten ci", "ci is red", "gates failing", "feedback loop is slow", "run gates less often". Trigger: /ci, /gates.

orient

Fast session-start repository orientation from live local evidence. Use when: "orient yourself", "start of session", "new session", "where are we", "catch me up before acting", after compaction, after switching worktrees, or before choosing a Harness Kit workflow. Trigger: /orient, /ground, /session-start.

todoist

Manage the operator's Todoist as the system of record for tasks, reminders, and follow-ups — capture, triage, complete, and organize. Agent-native via the Todoist MCP; scriptable via the `td` CLI. Use when: "add this to my todoist", "add a task", "remind me to", "capture this", "what's on my todoist", "what's due today", "what's in my inbox", "mark this done", "create a project/label", "log a follow-up". Trigger: /todoist, /task, /todo.

sprites

Run lane cards on Fly Sprites: remote, isolated, scale-to-zero sandboxes for heavy or parallel agent work. Golden-checkpoint provisioning so lanes start on a ready sprite with zero setup tokens. Use when: "run this on a sprite", "remote lane", "offload to a sandbox", "dispatch to sprites", "bake a sprite", "sprite fleet", heavy/long-running/parallel sub-agent work that should not run on this machine. Trigger: /sprites, /sprite-lane.

factory-apps

Route Misty Step factory application capabilities. Use when choosing, auditing, integrating, or operating Canary, Powder, Landmark, Aesthetic, or Bitterblossom: production observability, incidents, health checks, error logging, backlog/work-card state, release intelligence, UI/UX system adoption, or supervised/unsupervised agent dispatch. Trigger: /factory-apps, /factory-stack.

shape

Shape a raw idea into something buildable. Product + technical exploration. Spec, design, critique, plan. Output is a context packet. Use when: "shape this", "write a spec", "design this feature", "plan this", "spec out", "context packet", "technical design". Trigger: /shape, /spec, /plan, /cp.

code-review

Dispatch-shaped code review: fan the diff out to fresh-context reviewers across diverse providers and model families, synthesize, fix blockers, re-review until clean. Use when: "review this", "code review", "is this ready to ship", "second-model review". Trigger: /code-review, /review.

compound

Capture one compounding repo-technical learning while a solved problem is still fresh. Use when: after a bug fix, diagnosis, delivery, review, or incident reveals a reusable pattern worth adding to `docs/solutions/`. Trigger: /compound, /capture-learning, /learning.

diagnose

Investigate, audit, triage, and fix. Systematic debugging, incident lifecycle, domain auditing, and issue logging. Feedback-loop-first protocol: reproduce or replay before root cause, pattern analysis, hypothesis test, and fix. Use for: any bug, test failure, production incident, error spikes, audit, triage, postmortem, "diagnose", "why is this broken", "debug this", "production down", "is production ok", "audit stripe", "log issues". Trigger: /diagnose.

document

Generate world-class, source-verified reference documentation for a codebase: a multi-agent loop that surveys the repo, plans the information architecture, writes facet-scoped pages, and adversarially verifies every claim against live source before committing markdown + HTML + diagrams to docs/. Always runs the full verify loop; scope is incremental by provenance. Use when: "document this codebase", "generate the docs", "build a codebase wiki", "write architecture docs", "onboarding docs", "documentation site", "keep the docs in sync", "world-class docs". Trigger: /document, /docs, /wiki.

council

Convene a council/thinktank: fan one question out to several DISTINCT high-quality OpenRouter model families (via opencode/pi), each carrying a different generative persona/perspective, then synthesize the divergent thinking as chair. Generative deliberation — brainstorm, explore the option space, weigh tradeoffs, decorrelated ideation, lock a contested direction. Distinct from /roster's adversarial critique bench (that finds bugs in an artifact; this generates and reframes). Use when: "convene a council", "thinktank", "council of models", "brainstorm with different models", "get diverse perspectives", "panel of AIs", "what would different experts think", "divergence pass", "ideate broadly", "stress-test this direction with other models". Trigger: /council, /thinktank.

reflect

Session retrospective, operator coaching, harness postmortem, codification, and outer-loop cycle critique. Turns evidence into hooks, rules, skills, backlog mutations, or explicit non-actions. Use when: "done", "wrap up", "what did we learn", "retro", "calibrate", "prompt better", "teach me from this session", "reflect on cycle", post-/flywheel critique. Trigger: /reflect, /retro, /calibrate, /reflect checkpoint <topic>, /reflect cycle <cycle-ulid>.

agent-readiness

Assess and improve codebase readiness for AI coding agents across style, tests, docs, architecture, CI, observability, security, and dev setup. Produces a scored report, prioritizes remediation, then executes the highest-impact fixes. Use when: "agent readiness", "is this codebase agent-ready", "readiness report", "make this codebase agent-friendly", "agent-ready assessment", "readiness audit", "prepare for agents". Trigger: /agent-readiness, /readiness.

critique

Run one targeted, read-only architecture or quality critique through a named lens from the shared rubric. Use when: "critique this module", "run an Ousterhout pass", "lens critique", "architecture critique". Trigger: /critique.

hardening

Agent-driven test hardening for functions, specs, and acceptance surfaces: property testing, mutation testing, acceptance mutation, CRAP/SCRAP risk, and structural DRY analysis. Use when: "property test this", "mutation test", "CRAP analysis", "DRY analysis", "harden tests", "find weak tests", "kill surviving mutants", "acceptance mutation", "uncle bob hardening". Trigger: /hardening.

implement

Atomic TDD build skill. Takes a context packet (shaped ticket) and produces code + tests on a feature branch. Red → Green → Refactor. Does not shape, review, QA, or ship — single concern: spec to green tests. Use when: "implement this spec", "build this", "TDD this", "code this up", "write the code for this ticket", after /shape has produced a context packet. Trigger: /implement, /build (alias).

karpathy-guidelines

Four LLM-agent guardrails: surface assumptions, prefer simplicity, make surgical changes, and drive by verifiable goals. Reference for scope, simplicity, assumptions, or success-criteria judgment calls. Use when: "am I overcomplicating this", "what are my assumptions", "how do I verify success", "/karpathy", "/principles". Trigger: /karpathy, /principles.

model-research

Research, compare, and select LLM models for AI-powered apps and workflows. Finds latest models per family, verifies availability on target platform, compares pricing/benchmarks/tool-calling, produces ranked recommendations. Use when: "which model", "compare models", "find a model", "model research", "best model for", "cheapest model", "tool calling models", "model selection", "upgrade model", "swap model", "fallback models", "model chain". Trigger: /model-research.

monitor

Watch signals after deploy, release, local run, CI change, or repeated workflow. Uses telemetry when present; otherwise healthchecks, logs, CI, flaky tests, benchmark drift, daemon output, regressions, or agent traces. Emits events, escalates to /diagnose on trip, closes on green. Use when: "monitor signals", "watch the deploy", "watch production", "watch CI", "watch logs", "watch benchmark drift", "is it healthy". Trigger: /monitor.

reflect

Session retrospective, operator coaching, harness postmortem, codification, and outer-loop cycle critique. Turns evidence into hooks, rules, skills, backlog mutations, or explicit non-actions. Use when: "done", "wrap up", "what did we learn", "retro", "calibrate", "prompt better", "teach me from this session", "reflect on cycle", post-/flywheel critique. Trigger: /reflect, /retro, /calibrate, /reflect checkpoint <topic>, /reflect cycle <cycle-ulid>.

settle

DEPRECATED redirect. /settle, /pr-fix, and /pr-polish now route to /deliver --polish-only <branch|PR> — the single owner of "existing branch -> merge-ready" (backlog 080 collapsed settle into deliver). Slated for deletion next release. Use when: muscle-memory "polish this", "address PR reviews", "get this merge-ready" — then run /deliver --polish-only. Trigger: /settle, /pr-fix, /pr-polish.

ship

Final mile from current committed work to shipped: get the work onto master, archive backlog tickets with trailers, update touched docs, run /reflect, apply outputs. Recovers detached HEAD or headless work by creating a shipping ref before squash-merging to master. Use when: "ship it", "merge and close out", "final mile", "land and reflect", "finish this ticket". Trigger: /ship.

documentation13

skillify

Turn proven agent-session patterns into first-party Harness Kit skills. Use when: "skillify this conversation", "make this into a skill", "generate a skill from current transcript", "extract reusable workflow". Trigger: /skillify.

trace

Capture agent-session work records as local JSONL audit evidence. Links a backlog/spec, branch, commits, review verdicts, QA/demo evidence, transcript refs, and shipped ref without storing raw private transcripts. Use when: "trace this work", "write work record", "agent session trace", "journal this delivery", "link transcript evidence". Trigger: /trace, /journal.

yeet

End-to-end "ship it to the remote": read worktree, classify changes, tidy debris, split semantic commits, and push. Judgment layer over git, not a wrapper; decides what belongs and how reviewers should read the diff. Use when: "yeet", "yeet this", "commit and push", "ship local changes", "tidy and commit", "wrap this up and push", "get this off my machine". Trigger: /yeet, /ship-local (alias).

code-review

Dispatch-shaped code review: fan the diff out to fresh-context reviewers across diverse providers and model families, synthesize, fix blockers, re-review until clean. Use when: "review this", "code review", "is this ready to ship", "second-model review". Trigger: /code-review, /review.

factory-apps

Route Misty Step factory application capabilities. Use when choosing, auditing, integrating, or operating Canary, Powder, Landmark, Aesthetic, or Bitterblossom: production observability, incidents, health checks, error logging, backlog/work-card state, release intelligence, UI/UX system adoption, or supervised/unsupervised agent dispatch. Trigger: /factory-apps, /factory-stack.

artifact

Produce a consistently-styled, self-contained HTML report served privately over Tailscale. One house template (Silver Age comic-ops palette, dark/light toggle, and a mandatory copy-page button) so every report an agent hands the operator looks and behaves the same. Use when: "make an HTML artifact/report", "serve this over tailscale", "write up a brief/report/dashboard as a page", or any time you'd otherwise dump a long analysis into chat. Trigger: /artifact.

artifact

Produce a consistently-styled, self-contained HTML report served privately over Tailscale. One house template (Silver Age comic-ops palette, dark/light toggle, and a mandatory copy-page button) so every report an agent hands the operator looks and behaves the same. Use when: "make an HTML artifact/report", "serve this over tailscale", "write up a brief/report/dashboard as a page", or any time you'd otherwise dump a long analysis into chat. Trigger: /artifact.

demo

Capture evidence for what changed: blurb, paste, screenshot, GIF, video, launch note, or repo-local demo skill. Pick by change shape, audience, and budget. Use when: "make a demo", "record walkthrough", "PR evidence", "upload screenshots", "show what changed", "release notes blurb", "scaffold demo", "generate demo skill". Trigger: /demo.

content-media13

seed

Vendor the system-wide Harness Kit catalog into the current repo when a project needs checked-in local copies instead of relying on global bootstrap symlinks. Copies first-party skills and agents into a repo-local shared skill layer, then bridges harness-specific entrypoints back to that shared copy. Use when: "seed this repo", "vendor harness kit here", "initialize the agent here", "set me up offline". Trigger: /seed.

dispatch

Compose and launch roster-backed specialist lanes with prompt-native lane cards and receipts. Use when: "dispatch agents", "use subagents", "compose a team", "run provider lanes", "make lane cards". Trigger: /dispatch, /subagents, /lanes.

qa

Verify the running thing works. Browser walks for web, request replay for APIs, local API emulation for supported third-party services, shell smoke for CLIs, consumer builds for libraries, tool-call replay for MCP. "Tests pass" is not QA. Use when: "run QA", "verify the feature", "test this", "check the app", "smoke test", "exploratory test", "capture evidence". Trigger: /qa.

critique

Run one targeted, read-only architecture or quality critique through a named lens from the shared rubric. Use when: "critique this module", "run an Ousterhout pass", "lens critique", "architecture critique". Trigger: /critique.

skillify

Turn proven agent-session patterns into first-party Harness Kit skills. Use when: "skillify this conversation", "make this into a skill", "generate a skill from current transcript", "extract reusable workflow". Trigger: /skillify.

seed

Vendor the system-wide Harness Kit catalog into the current repo when a project needs checked-in local copies instead of relying on global bootstrap symlinks. Copies first-party skills and agents into a repo-local shared skill layer, then bridges harness-specific entrypoints back to that shared copy. Use when: "seed this repo", "vendor harness kit here", "initialize the agent here", "set me up offline". Trigger: /seed.

next

Recommend the best next move from live thread and repo state. Use when: "what's next", "what next", "now what", "what should I do next", "what should we do next", "anything else to do", "where are we now, what's next", "what next in the backlog". Trigger: /next, /what-next, /now-what.

design

Artifact-backed interface design: critique, polish, redesign, generate, and a repo-owned design contract. One front door over a bench of design specialists — routes to exactly one primary per role; you pick the aesthetic. Requires screenshot, URL, rendered artifact, or explicit file plus intent. Use when: "make this look better", "improve the design", "polish the UI", "critique this screen", "design pass", "art direction", "make it premium", "make it brutalist/minimalist", "deslop this", "scaffold design", "DESIGN.md", "design system", "prototype this", "show me a few options", "mock up variations", "is this accessible", docs layout, report polish, generated diagrams/images, dashboards, charts, or any product-facing visual artifact. Trigger: /design, /prototype.

roster

Enumerates the peer AI agent CLIs installed on this machine (codex, pi, goose, opencode, claude, cursor-agent, grok, agy, hermes, oracle) and how to invoke each headlessly. A capability map, not a quota: useful for fresh-context adversarial review on a different model family, second opinions, competing attempts, and wide benches. Use when: "ask codex", "ask another model", "second opinion", "cross-model review", "what AI tools do I have", "other agents", "different model family", "adversarial critique from another provider". Trigger: /roster.

qa

Verify the running thing works. Browser walks for web, request replay for APIs, local API emulation for supported third-party services, shell smoke for CLIs, consumer builds for libraries, tool-call replay for MCP. "Tests pass" is not QA. Use when: "run QA", "verify the feature", "test this", "check the app", "smoke test", "exploratory test", "capture evidence". Trigger: /qa.

roster

Enumerates the peer AI agent CLIs installed on this machine (codex, pi, goose, opencode, claude, cursor-agent, grok, agy, hermes, oracle) and how to invoke each headlessly. A capability map, not a quota: useful for fresh-context adversarial review on a different model family, second opinions, competing attempts, and wide benches. Use when: "ask codex", "ask another model", "second opinion", "cross-model review", "what AI tools do I have", "other agents", "different model family", "adversarial critique from another provider". Trigger: /roster.

design

Artifact-backed interface design: critique, polish, redesign, generate, and a repo-owned design contract. One front door over a bench of design specialists — routes to exactly one primary per role; you pick the aesthetic. Requires screenshot, URL, rendered artifact, or explicit file plus intent. Use when: "make this look better", "improve the design", "polish the UI", "critique this screen", "design pass", "art direction", "make it premium", "make it brutalist/minimalist", "deslop this", "scaffold design", "DESIGN.md", "design system", "prototype this", "show me a few options", "mock up variations", "is this accessible", docs layout, report polish, generated diagrams/images, dashboards, charts, or any product-facing visual artifact. Trigger: /design, /prototype.

shape

Shape a raw idea into something buildable. Product + technical exploration. Spec, design, critique, plan. Output is a context packet. Use when: "shape this", "write a spec", "design this feature", "plan this", "spec out", "context packet", "technical design". Trigger: /shape, /spec, /plan, /cp.

diagnose

Investigate, audit, triage, and fix. Systematic debugging, incident lifecycle, domain auditing, and issue logging. Feedback-loop-first protocol: reproduce or replay before root cause, pattern analysis, hypothesis test, and fix. Use for: any bug, test failure, production incident, error spikes, audit, triage, postmortem, "diagnose", "why is this broken", "debug this", "production down", "is production ok", "audit stripe", "log issues". Trigger: /diagnose.

vision

Create or update root VISION.md as a first-class project north-star artifact. Conversational project interrogation, repo/workspace research, competitive or exemplar scan, lifespan clarification, philosophy distillation, and wiring repo-local harness primitives to read it. Use when: "vision", "vision.md", "project vision", "north star", "what is this project", "clarify product direction", "write/update VISION.md", "project philosophy", "why does this repo exist". Trigger: /vision, /north-star.

documentation13

dispatch

Compose and launch roster-backed specialist lanes with prompt-native lane cards and receipts. Use when: "dispatch agents", "use subagents", "compose a team", "run provider lanes", "make lane cards". Trigger: /dispatch, /subagents, /lanes.

deliver

Take one ticket or idea from raw intent to merge-ready (or shipped, when asked): context-first, docs→tests→code, live QA, refactor at three altitudes, semantic commits, diverse-provider review, adversarial pre-ship thinking. Use for "deliver this", "build this ticket", "make it merge-ready", "take this end to end". Trigger: /deliver.

harness

Build and repair Spellbook primitives: skills, shared doctrine, provider roster, harness configs, gates, evals, bootstrap, and sync logic. Use for "improve the harness", "bootstrap is wrong", "AGENTS.md is stale", "skill health", "eval skill", "sync primitives", "roster defaults". Trigger: /harness, /skill, /primitive.

harness

Build and repair Spellbook primitives: skills, shared doctrine, provider roster, harness configs, gates, evals, bootstrap, and sync logic. Use for "improve the harness", "bootstrap is wrong", "AGENTS.md is stale", "skill health", "eval skill", "sync primitives", "roster defaults". Trigger: /harness, /skill, /primitive.

compound

Capture one compounding repo-technical learning while a solved problem is still fresh. Use when: after a bug fix, diagnosis, delivery, review, or incident reveals a reusable pattern worth adding to `docs/solutions/`. Trigger: /compound, /capture-learning, /learning.

groom

Always-on backlog grooming. Tidy, brainstorm, interrogate, investigate, research, and simplify in a single loop. Tidy is not a mode — it happens every time. Strategic-layer work is a mega-sweep: swarm investigation, external research, critique, synthesis, and backlog shaping across product, codebase, docs, infrastructure, ops, architecture, system design, value prop, and agent readiness. Use when: "groom", "what should we build", "rethink this", "biggest opportunity", "backlog", "prioritize", "backlog session", "audit skills", "skill quality audit". Trigger: /groom, /groom audit, /backlog, /rethink, /moonshot, /scaffold.

harness-engineering

Harness engineering for Harness Kit primitives: skills, shared doctrine, provider roster, harness configs, gates, evals, bootstrap, and sync logic. Use for "improve the harness", "harness engineering", "bootstrap is wrong", "AGENTS.md is stale", "skill health", "skill usage", "undertriggering skill", "description tax", "eval skill", "sync primitives", "roster defaults", "preferred stack", "stack defaults", "hosting defaults", "CI defaults", "observability defaults", "release defaults", "design system defaults", "storage defaults", "agent substrate defaults", "one core many faces", "API CLI MCP SDK skill template", "factory product template", "generate repo-local skill", "bespoke skill subset", "domain agent skill". Trigger: /harness-engineering, /harness, /skill.

skills/harness-engineering/templates/repo-local-skill

> Template. Copy to `<target-repo>/.agents/skills/<repo>-<domain>/SKILL.md` > and fill every bracketed placeholder from the live target repo. Delete this > line and every other `> ` guidance line before committing. See > `../../references/repo-local-skill-generation.md` for the full process. --- name: <repo>-<domain> description: | [One paragraph: what this skill verifies/runs/operates for <repo>, stated in terms of the repo's real shape (service/CLI/library/etc.), not generic process. En

next

Recommend the best next move from live thread and repo state. Use when: "what's next", "what next", "now what", "what should I do next", "what should we do next", "anything else to do", "where are we now, what's next", "what next in the backlog". Trigger: /next, /what-next, /now-what.

council

Convene a council/thinktank: fan one question out to several DISTINCT high-quality OpenRouter model families (via opencode/pi), each carrying a different generative persona/perspective, then synthesize the divergent thinking as chair. Generative deliberation — brainstorm, explore the option space, weigh tradeoffs, decorrelated ideation, lock a contested direction. Distinct from /roster's adversarial critique bench (that finds bugs in an artifact; this generates and reframes). Use when: "convene a council", "thinktank", "council of models", "brainstorm with different models", "get diverse perspectives", "panel of AIs", "what would different experts think", "divergence pass", "ideate broadly", "stress-test this direction with other models". Trigger: /council, /thinktank.

ci

Audit, design, and run repo-owned CI gates. Host-agnostic by default: local, GitHub Actions, Azure, or another runner should call the same repo-owned contract. Harness Kit's own gate is the Rust command `cargo run --locked -p harness-kit-checks -- check --repo .`; consumer repos keep their own native gate. Use when: "run ci", "check ci", "fix ci", "audit ci", "design CI", "host-agnostic CI", "Dagger", "is ci passing", "run the gates", "why is ci failing", "strengthen ci", "tighten ci", "ci is red", "gates failing", "feedback loop is slow", "run gates less often". Trigger: /ci, /gates.

groom

Always-on backlog grooming. Tidy, brainstorm, interrogate, investigate, research, and simplify in a single loop. Tidy is not a mode — it happens every time. Strategic-layer work is a mega-sweep: swarm investigation, external research, critique, synthesis, and backlog shaping across product, codebase, docs, infrastructure, ops, architecture, system design, value prop, and agent readiness. Use when: "groom", "what should we build", "rethink this", "biggest opportunity", "backlog", "prioritize", "backlog session", "audit skills", "skill quality audit". Trigger: /groom, /groom audit, /backlog, /rethink, /moonshot, /scaffold.

document

Generate world-class, source-verified reference documentation for a codebase: a multi-agent loop that surveys the repo, plans the information architecture, writes facet-scoped pages, and adversarially verifies every claim against live source before committing markdown + HTML + diagrams to docs/. Always runs the full verify loop; scope is incremental by provenance. Use when: "document this codebase", "generate the docs", "build a codebase wiki", "write architecture docs", "onboarding docs", "documentation site", "keep the docs in sync", "world-class docs". Trigger: /document, /docs, /wiki.

research

Web research, multi-AI delegation, and multi-perspective validation. /research [query], /research delegate [task]. Use when: "search for", "look up", "research", "delegate", "get perspectives", "web search", "find out", "investigate", "introspect", "check readwise", "saved articles", "reading list", "what are people saying", "X search", "trending", "which model", "compare models", "best model for", "model selection".

refactor

Architecture refactor mode: set a concrete improvement goal, refactor until the architecture is simpler and coherent, live-test after each significant step, autoreview, commit green milestones, and track progress in /tmp/refactor-{project}.md. Use when: "refactor this", "clean up the architecture", "make the design better", "refactor until you're happy", "pay down design debt", "simplify this subsystem". Trigger: /refactor.

harness-engineering

Harness engineering for Harness Kit primitives: skills, shared doctrine, provider roster, harness configs, gates, evals, bootstrap, and sync logic. Use for "improve the harness", "harness engineering", "bootstrap is wrong", "AGENTS.md is stale", "skill health", "skill usage", "undertriggering skill", "description tax", "eval skill", "sync primitives", "roster defaults", "preferred stack", "stack defaults", "hosting defaults", "CI defaults", "observability defaults", "release defaults", "design system defaults", "storage defaults", "agent substrate defaults", "one core many faces", "API CLI MCP SDK skill template", "factory product template", "generate repo-local skill", "bespoke skill subset", "domain agent skill". Trigger: /harness-engineering, /harness, /skill.

vision

Create or update root VISION.md as a first-class project north-star artifact. Conversational project interrogation, repo/workspace research, competitive or exemplar scan, lifespan clarification, philosophy distillation, and wiring repo-local harness primitives to read it. Use when: "vision", "vision.md", "project vision", "north star", "what is this project", "clarify product direction", "write/update VISION.md", "project philosophy", "why does this repo exist". Trigger: /vision, /north-star.

documentation13

skill-eval

Prove a skill beats no-skill with a falsifiable A/B eval, or retire it. Design, generate, run, and maintain a skill-specific eval: name the one claim the skill must earn, run it skill-on vs raw same-model, grade blind with objective checks first, return a keep/adapt/cut verdict. Use when: "eval this skill", "does this skill help", "prove the skill beats no skill", "write an eval for", "benchmark a skill", "is this skill worth it", "skill A/B", "skill regression test", "generate skill evals". Trigger: /skill-eval, /eval-skill, /prove-skill.

skill-eval

Prove a skill beats no-skill with a falsifiable A/B eval, or retire it. Design, generate, run, and maintain a skill-specific eval: name the one claim the skill must earn, run it skill-on vs raw same-model, grade blind with objective checks first, return a keep/adapt/cut verdict. Use when: "eval this skill", "does this skill help", "prove the skill beats no skill", "write an eval for", "benchmark a skill", "is this skill worth it", "skill A/B", "skill regression test", "generate skill evals". Trigger: /skill-eval, /eval-skill, /prove-skill.

skills/harness-engineering/templates/repo-local-skill

> Template. Copy to `<target-repo>/.agents/skills/<repo>-<domain>/SKILL.md` > and fill every bracketed placeholder from the live target repo. Delete this > line and every other `> ` guidance line before committing. See > `../../references/repo-local-skill-generation.md` for the full process. --- name: <repo>-<domain> description: | [One paragraph: what this skill verifies/runs/operates for <repo>, stated in terms of the repo's real shape (service/CLI/library/etc.), not generic process. En

human-writing

Edit, audit, or rewrite prose so it sounds like a specific human wrote it, not a generic AI draft. Removes AI tells, filler, formulaic structure, fake polish, vague claims, and detector-bait phrasing while preserving truth, voice, and audience fit. Use when: "humanize this", "make this sound less AI", "remove AI slop", "de-slop this", "edit this prose", "make this sound natural", "fix the writing voice", "rewrite this copy". Trigger: /human-writing, /deslop.

refactor

Architecture refactor mode: set a concrete improvement goal, refactor until the architecture is simpler and coherent, live-test after each significant step, autoreview, commit green milestones, and track progress in /tmp/refactor-{project}.md. Use when: "refactor this", "clean up the architecture", "make the design better", "refactor until you're happy", "pay down design debt", "simplify this subsystem". Trigger: /refactor.

deliver

Take one ticket or idea from raw intent to merge-ready (or shipped, when asked): context-first, docs→tests→code, live QA, refactor at three altitudes, semantic commits, diverse-provider review, adversarial pre-ship thinking. Use for "deliver this", "build this ticket", "make it merge-ready", "take this end to end". Trigger: /deliver.

sprites

Run lane cards on Fly Sprites: remote, isolated, scale-to-zero sandboxes for heavy or parallel agent work. Golden-checkpoint provisioning so lanes start on a ready sprite with zero setup tokens. Use when: "run this on a sprite", "remote lane", "offload to a sandbox", "dispatch to sprites", "bake a sprite", "sprite fleet", heavy/long-running/parallel sub-agent work that should not run on this machine. Trigger: /sprites, /sprite-lane.

human-writing

Edit, audit, or rewrite prose so it sounds like a specific human wrote it, not a generic AI draft. Removes AI tells, filler, formulaic structure, fake polish, vague claims, and detector-bait phrasing while preserving truth, voice, and audience fit. Use when: "humanize this", "make this sound less AI", "remove AI slop", "de-slop this", "edit this prose", "make this sound natural", "fix the writing voice", "rewrite this copy". Trigger: /human-writing, /deslop.

oracle

Browser-mode-only Oracle consults: bundle a prompt plus selected files and ask a signed-in ChatGPT GPT-5.5 Pro browser session for a second opinion. Use when stuck, debugging hard bugs, reviewing an architecture plan, or cross-checking a substantive diff with large file context. Never use Oracle API mode from Harness Kit. Trigger: /oracle, /consult.

orient

Fast session-start repository orientation from live local evidence. Use when: "orient yourself", "start of session", "new session", "where are we", "catch me up before acting", after compaction, after switching worktrees, or before choosing a Harness Kit workflow. Trigger: /orient, /ground, /session-start.

todoist

Manage the operator's Todoist as the system of record for tasks, reminders, and follow-ups — capture, triage, complete, and organize. Agent-native via the Todoist MCP; scriptable via the `td` CLI. Use when: "add this to my todoist", "add a task", "remind me to", "capture this", "what's on my todoist", "what's due today", "what's in my inbox", "mark this done", "create a project/label", "log a follow-up". Trigger: /todoist, /task, /todo.

ceo-review

Dialectical premise-and-alternatives audit for a plan, spec, or context packet. Four moves: premise challenge (is this the right problem?), mandatory structurally-distinct alternatives, cross-model outside voice, user-ratified convergence. Operationalizes the AGENTS.md "Diverge Before You Converge" doctrine at the plan-review stage. Use when: about to commit to a plan/spec/design, reviewing a ticket before shape, when "is this the right problem" would be useful, or any time a proposal smells like a symptom fix instead of a root-cause fix. Trigger: /ceo-review, /challenge, /premise-check.

office-hours

YC-partner-style interrogation of a raw idea. Six forcing questions before you shape anything: demand reality, status quo, desperate specificity, narrowest wedge, observation & surprise, future-fit. Operationalizes the AGENTS.md "Diverge Before You Converge" doctrine at the ideation stage — problem-diamond, pre-/shape. Use when: user arrives with a rough idea, backlog item is fuzzy, you can't already write a one-sentence goal with a testable outcome, or the phrase "what should we build" would be useful. Trigger: /office-hours, /oh, /interrogate.

tailor

Tailor this repository's harness. Explore the repo, read prior session history, browse the spellbook catalog, and install a per-repo set of skills into a shared repo-local skill layer, with harness-specific entrypoints bridged back to that shared copy. Workflow skills get rewritten with this repo's commands and conventions embedded throughout — not a generic body with a repo-notes appendix. Use when: "tailor this repo", "configure the agent for this codebase", "set up a harness", "what skills apply here". Trigger: /tailor.

tailor

Tailor this repository's harness. Explore the repo, read prior session history, browse the spellbook catalog, and install a per-repo set of skills into a shared repo-local skill layer, with harness-specific entrypoints bridged back to that shared copy. Workflow skills get rewritten with this repo's commands and conventions embedded throughout — not a generic body with a repo-notes appendix. Use when: "tailor this repo", "configure the agent for this codebase", "set up a harness", "what skills apply here". Trigger: /tailor.

ceo-review

Dialectical premise-and-alternatives audit for a plan, spec, or context packet. Four moves: premise challenge (is this the right problem?), mandatory structurally-distinct alternatives, cross-model outside voice, user-ratified convergence. Operationalizes the AGENTS.md "Diverge Before You Converge" doctrine at the plan-review stage. Use when: about to commit to a plan/spec/design, reviewing a ticket before shape, when "is this the right problem" would be useful, or any time a proposal smells like a symptom fix instead of a root-cause fix. Trigger: /ceo-review, /challenge, /premise-check.

office-hours

YC-partner-style interrogation of a raw idea. Six forcing questions before you shape anything: demand reality, status quo, desperate specificity, narrowest wedge, observation & surprise, future-fit. Operationalizes the AGENTS.md "Diverge Before You Converge" doctrine at the ideation stage — problem-diamond, pre-/shape. Use when: user arrives with a rough idea, backlog item is fuzzy, you can't already write a one-sentence goal with a testable outcome, or the phrase "what should we build" would be useful. Trigger: /office-hours, /oh, /interrogate.

investigate

Investigate, audit, triage, and fix. Systematic debugging, incident lifecycle, domain auditing, and issue logging. Four-phase protocol: root cause → pattern analysis → hypothesis test → fix. Use for: any bug, test failure, production incident, error spikes, audit, triage, postmortem, "investigate", "why is this broken", "debug this", "production down", "is production ok", "audit stripe", "log issues".

curate

Evolve the Spellbook library. Scan external sources for new skills worth indexing, review observations for improvement opportunities, brainstorm new primitives, investigate existing skills for consolidation or deletion, research power user patterns and best practices. Use when: "curate", "evolve spellbook", "what should we build", "find new skills", "audit the library", "consolidate skills", "what's new in the ecosystem", "spellbook maintenance", "improve primitives".

autopilot

Outer-loop delivery orchestrator. Composes cycles of /deliver → /deploy → /monitor → /investigate → /reflect, mutates the backlog, and emits harness suggestions to a branch. Inner loop is /deliver (one ticket → merge-ready, a black box here). Outer loop is this: continuous, unattended, budgeted. Use when: continuous delivery, "autopilot", "run the outer loop", "next N items", "overnight queue", "outer loop", "cycle". Trigger: /autopilot.

curate

Evolve the Spellbook library. Scan external sources for new skills worth indexing, review observations for improvement opportunities, brainstorm new primitives, investigate existing skills for consolidation or deletion, research power user patterns and best practices. Use when: "curate", "evolve spellbook", "what should we build", "find new skills", "audit the library", "consolidate skills", "what's new in the ecosystem", "spellbook maintenance", "improve primitives".

curate

Evolve the Spellbook library. Scan external sources for new skills worth indexing, review observations for improvement opportunities, brainstorm new primitives, investigate existing skills for consolidation or deletion, research power user patterns and best practices. Use when: "curate", "evolve spellbook", "what should we build", "find new skills", "audit the library", "consolidate skills", "what's new in the ecosystem", "spellbook maintenance", "improve primitives".

curate

Evolve the Spellbook library. Scan external sources for new skills worth indexing, review observations for improvement opportunities, brainstorm new primitives, investigate existing skills for consolidation or deletion, research power user patterns and best practices. Use when: "curate", "evolve spellbook", "what should we build", "find new skills", "audit the library", "consolidate skills", "what's new in the ecosystem", "spellbook maintenance", "improve primitives".

tailor-skills

Per-repo skill subset selector. Reads the current project, reads the spellbook catalog, picks 10-20 skills that fit this repo, writes the selection to .spellbook.yaml so bootstrap only symlinks that subset. Use when: "tailor skills", "select skills", "prune skills", "tailor catalog for this repo", "scope skills to this repo", "reduce context bloat". Trigger: /tailor-skills.

documentation11

frontend-design

Create distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.

autopilot

Full delivery pipeline: plan, build, ship, settle. Covers: shape/spec/design, TDD build, commit, PR creation, PR fix (CI/reviews/conflicts), PR polish, simplify, test coverage, verify ACs, walkthrough, issue management. Use when: shipping features, fixing PRs, creating PRs, building issues, simplifying code, checking quality, writing commits, managing issues. Trigger: /autopilot, /build, /shape, /commit, /issue, /check-quality, /test-coverage, /verify-ac, /pr-walkthrough.

design-lab

Conduct design interviews, generate five distinct UI variations in a temporary design lab, collect feedback, and produce implementation plans. Use when the user wants to explore UI design options, redesign existing components, or create new UI with multiple approaches to compare.

content-media10