
Create, update, or refactor Codex skills in this repo: SKILL.md, frontmatter triggers, agents/openai.yaml, scripts, references, and assets. Use for creating a skill or direct in-place skill edits now; not for read-only analysis or "should we update this?" turns. Treat `$refine` as higher-level evidence-driven existing-skill refinement.
Diagnose and optimize Codex skills from orthogonal evidence. Trigger for `$tune`, in-flight or historical skill usage analysis, intended-vs-observed behavior, missed/false/partial activations, `$seq` session evidence, skill-gap classification, `$refine` briefs, or applying a validated edit. Stop at audit/proposal for analysis asks; apply, commit, and push only when explicitly asked.
Refine an existing Codex skill in place with minimal diffs, then run quick_validate. Trigger for improving a skill's trigger description/frontmatter, workflow text, metadata, scripts/references/assets, or agents/openai.yaml; also iteration, refactor, rename, or fixes using usage/session-mining evidence such as `$seq`.
Classify a local codebase's current architecture from collector-backed, code-first evidence. Use for prompts asking what architecture a repo or slice actually uses, whether it is layered/hexagonal/MVC/plugin/pipeline/etc., what the strongest runner-up is, which coexisting patterns are directly evidenced, whether docs match implementation, or when an implementation agent needs a narrow repo-dialect preflight. Do not use for broad repo onboarding, layer removal, structural redesign, domain algebra, invariant design, implementation specs, or execution planning.
Shrink and unify code without changing behavior. Use when: simplify, refactor, reduce duplication, remove lines, extract helper, reuse component, DRY, collapse, better abstraction.
Orchestrate Codex skill optimization during active sessions through $cas goal control, $shadow single-session evidence, $tune diagnosis/refinement briefs, and the skill-optimizer custom subagent. Trigger for $opt, skill optimization loops, session-driven skill tuning, meta-skill audits, or explicit validated skill edits. Do not use for general code optimization, product optimization, or performance tuning.
Generate high-fidelity Deckset markdown presentations from conversation context. Use for decks, slides, presentations, speaker notes, Deckset markdown, or converting a conversation into a narrative slide flow. Checks upstream Deckset docs/examples without volatile refresh metadata.
Run Zig CAS helpers (`cas`, `cas_goal`, `cas_smoke_check`, `cas_instance_runner`, `cas_review_session`, `cas_conformance_suite`) for v2 app-server smoke checks, thread goal lifecycle control, direct thread/turn execution, detached review control, multi-instance fanout, and `$st` swarm conformance/retry-policy checks.
Mine Codex session JSONL (`~/.codex/sessions`) and memories (`~/.codex/memories`) for explicit `$seq` and artifact-forensics questions. Prefer `artifact-search`, then `adjudication-audit` for selected/rejected review-adjudication audits, then `skill-success-rank`, `skill-audit`, `workflow-audit`, `tool-audit`, `memory-inventory`, `message-search`, `workdir-report`, `skill-blocks`, `plan-search`, `session-prompts`, `session-tooling`, `orchestration-concurrency`, then `query-diagnose`, then generic `query`. Run opencode only when the request literally says `opencode`.
Capture, browse, and query evidence-backed execution learnings in `.learnings.jsonl`. Trigger cues: `$learnings`, browse/recent/search learnings, lessons learned, takeaways, wrap up, handoff, before commit/PR, after tests pass, fail-to-pass, strategy pivot, footgun, retry loop.
Land GitHub PRs end-to-end: update branch/PR, confirm reviews resolved, monitor CI until green, squash-merge, and clean local/remote state. Use for `$land`, finish/land/merge/close a PR, watch checks/runs, squash-merge, delete branch, or sync local state.
Run a targeted fresh-eyes blunder pass over code, specs, plans, adjudications, closure gates, skill edits, or negative-evidence ledgers. Trigger when asked to reread with fresh eyes, find obvious bugs, catch mistakes/oversights/omissions, check for embarrassing misses, or perform a second independent blunder pass before closure. Do not use as a substitute for implementation, adjudication, or verification; use it as the final falsification/check pass for those workflows.
Implement, adapt, harden, or repair non-trivial code in a narrow, reviewable, contract-first, witness-backed way. Trigger for planned features, new code, design/plan implementation, migrations, correctness-sensitive refactors, review fixes, bugs, regressions, failing tests, or single-change hardening. Accept review-adjudication/fixed-point-driver handoffs unless stale or contradictory. Do not use for trivial formatting, rote renames, or informational questions.
Convert markdown plans into beads with dependencies using br CLI. Use when creating task graphs, polishing beads before implementation, or bridging planning to agent swarm execution.
Turn ambiguous project, architecture, implementation, or product requests into decision-complete implementation specs by researching first, emitting a spec-pipeline receipt, asking only material judgment questions or justifying skipped grilling, gating readiness before planning, compiling a spec, running one invariant challenge, running a fresh-eyes pass, linting final output, and preventing execution from outrunning decisions. Use for `$spec-pipeline`, write a spec, turn this plan into a spec, grill me then spec, spec automation, strict `$grill-me` to `$plan` handoff, spec receipts, or spec readiness automation. Never emit a proposed_plan block.
Challenge non-trivial code artifacts with authority-gated adversarial review. Surface only material, current, owned, witness-backed findings; separate candidate concern validity from material-finding eligibility; require no-finding countercases, soundness rows, authority clearance, verification paths, and change-agenda consistency before emitting a remediation agenda. Trigger for exhaustive review, fresh-eyes second pass, re-review after fixes, patch hardening, full-scope de novo challenge, or material fixed-point review. Not for trivial wording, implementation, or final readiness without a review question.
Discriminately adjudicate PR review comments before implementation. Treat each comment as a claim to test, preserve raw comment identity and input inventory, bind decisions to artifact state, build the strongest no-change countercase, separate valid concerns from valid proposed fixes, recover PR rationale with explicit `$seq` when needed, and emit a stale-proof gated ledger, resolve-selection map, adversarial action matrix, ablative/isomorphic counterproposal, resolution warrants, and surface budgets that decide what to address, validate only, resolve with proof only, rebut, defer, investigate, delete/collapse, or route. Trigger for `$review-adjudication`, review the review, adjudicate PR comments, are these comments relevant, which comments matter, should we act on these comments, gate review comments before implementation, refine this list to just those worth resolving, or select review comments to resolve. Not for implementing fixes, writing rebuttals only, or final merge closure.
Decide whether a grill/handoff packet is complete enough for `$plan`, spec generation, or downstream mutation. Use for `$spec-gate`, is this ready to plan, block planning, handoff packet, decision packet, premature specs, no-grill justification, mutation gate, or underspecified questions, proof bar, scope, non-goals, rollout/rollback, and receipts.
Run exactly one strongest project-specific invariant/adversarial challenge against a generated spec or plan, then decide whether to regenerate it. Use for `$spec-challenge`, A+ this plan, pressure-test the invariant, does this preserve X, single strongest critique, or after `$plan`/`$spec-pipeline` before implementation.
Lint generated implementation specs/proposed plans for missing non-goals, weak proof, unmapped requirements, absent rollback/abort criteria, unresolved material questions, missing primary invariant, missing receipts, unaccounted subagents, skipped challenge/fresh-eyes pass, oversized audit prose, or plan churn. Use for `$spec-lint`, lint this spec, implementation-ready plan checks, proof/rollback/traceability checks, or is this more plan or better plan.
Drive exhaustive build-review-improve-verify loops to Truth-Owner Ablative-Isomorphic Normal Form: one canonical owner per material invariant, no duplicate truth surfaces, no unresolved review counterexamples, no unresolved adversarial or ablation vetoes, no unretired additive scaffolding, no dominated/vestigial/subsumed surface without a keep warrant, and behavior-preserving proof-gated closure. Trigger when coding needs de novo re-litigation, PR review closure, repeated review/fix loops, invariant repair, proof-surface hardening, negative-evidence pruning, CAS/Codex review resolution, parallel adversarial action, optional architecture fingerprint preflight, or when agents risk adding local patches instead of deleting/refactoring/canonicalizing. Do not use for trivial one-step tasks or when the user wants one narrow phase.
Decide final readiness after material coding work by checking closure gates, current-head proof, soundness rows, invariant witnesses, stale handoffs, and residual risk. Trigger for final readiness, closure gates, fixed-point claims, proof receipts, or 'is this ready?' after material work. Do not use as the initial reviewer or implementer.
Precision language surface: sharpen wording, names, labels, headings, PR replies, commit/PR text, docs, user-facing explanations, and doctrine stacks without semantic drift. Trigger implicitly when the task asks for wording, naming, terminology, phrasing, language polish, final copy, doctrine words, mode names, or human-facing text. Do not trigger for ordinary implementation, verification, code review, or machine-consumed artifacts unless wording/naming output is part of the requested result.
Manage durable task plans in `.step/st-plan.jsonl`, including first-use choice between repo-committed storage and local-only `.git/info/exclude`, so state survives turns/sessions and can stay reviewable. Use for `use $st`, resume/export/import plan state, checkpoint milestones, track dependencies/blocked work, show ready next tasks, keep shared TODO state on disk, store backlog tasks off `update_plan`, select durable tasks for the mirror, map a `$select` plan into durable state, prove `$st` implementation tracking, mirror durable plan into Codex `update_plan` or OpenCode `TodoWrite`, or diagnose/repair `st-plan.jsonl` semantics, lock-file ignore policy, or seq/checkpoint integrity.
Resolve the current branch with a CAS-first review loop, native review as fallback-only, deterministic base/HEAD pinning, PR comment adjudication, validation, commit, push, and final PR sweep.
Mine historical Codex sessions, plans, reports, and workflows to improve future spec automation. Use for `$spec-retro`, learn from my past planning, mine plan churn, update exemplar library, derive automation from past practice, spec telemetry, receipt metrics, or which historical sessions should tune this spec workflow.
Capture, query, map, compact, reopen, and hand off evidence-backed failed hypotheses, reverted approaches, benchmark regressions, no-effect attempts, and strategy pivots so future work avoids repeated dead ends while stale evidence remains reopenable. Trigger cues: negative ledger, failed attempts, what have we already tried, avoid retrying dead ends, benchmark regressions, reverted approaches, strategy pivot, why did the last approach fail, before another optimization, search-space pruning.
Turn should-never-happen into cannot-happen with authority-gated invariant design. Define owned inductive invariants, separate predicates from pre/postconditions and derived facts, require counterexample traces, owner/scope/source-of-truth proof, transition preservation, policy/exception authority, witness parity, enforcement-boundary choice, and verification. Use for invariants, impossible states, validation sprawl, cache/index drift, idempotency/versioning, retries/duplicates/out-of-order events, races, loop correctness, policy-owned exceptions, generator/validator parity, descriptor-vs-occurrence identity, certificate/proof witness drift, fixture precondition alignment, or invariant-first hardening. Not for generic refactors, broad architecture essays, or implementation without an invariant gate.
Explicitly shadow, tail, watch, follow, monitor, supervise, or companion exactly one Codex session id/path through `$seq`, then apply a named target skill as an interpretation/reporting/proposal/action lens until the watched session stops.
Produce decision-complete, self-contained proposed_plan blocks, including lean greenfield bootstrap for sparse new projects and an execution spine for implementation campaigns/architecture work, especially when concrete APIs/protocols must be preserved.
Use for nontrivial code changes, refactors, bug fixes, PR reviews, AI-generated edits, blast-radius analysis, verification planning, regression tests, rollout/rollback, closure/readiness claims, implementation handoffs, or correctness under incomplete context/hidden constraints. Do not use for textual edits or trivial formatting unless risk analysis is requested. Alias: context-bound-verification.
Analyze OpenAI Responses API logs, Agents SDK traces, OpenTelemetry/LangSmith/Langfuse/custom spans, transcripts, and agent-loop code as effectful execution graphs. Use for agentic latency reduction, serial round-trip elimination, prompt-cache stability, model routing, tool-loop optimization, speculative execution validation, or proof-carrying rewrite design. Produces Latency Treaty IR, critical path, counterfactual schedules, prioritized fixes, instrumentation gaps, and CI-ready regression checks.
Finalize work after validation: confirm a signal, capture proof in the PR description, and open a PR without merging. Use for `$ship`, ship/finalize a branch, prepare/open a PR, or publish validation proof before handoff.
Create a language-agnostic ghost package (spec + portable tests) from a repo: SPEC.md, exhaustive tests.yaml, INSTALL.md, README.md, VERIFY.md, and upstream LICENSE provenance/regeneration. Use for `$ghost`, ghostify, spec-ify/spec-package this library, ghost library, or portable spec/tests for libraries or tool-using agent loops. For Lean-aided/formal/proved/machine-checked ghost extraction, keep Ghost as artifact authority and route Lean modeling/proof through `$lean`; do not use for implementation or skill edits.
Run Codex-native domain audits for security, UX/accessibility, performance, API design, copy, and CLI quality. Use for code audits, quality assessment, issue finding, pre-launch review, or explicit parallel Codex subagent audits.
Create/manage Codex app automations in local SQLite (~/.codex/sqlite/codex-dev.db). Use to add, list, update, enable/disable, delete, run now, edit names/prompts/RRULE/cwd scopes, or inspect automation records while troubleshooting.
Apply Algebra-Driven Design from `codex/algebra-driven-design`. Use for ADD, denotational design, combinator models, law-driven architecture, domain algebra, property tests, codebase modeling, event sourcing, workflow design, or agentic skill design.
Run one civilizational-scale ASI reframing pass verbatim: expand ambition 10x, collapse to the smallest artifact preserving the 10x insight, and require a mechanism, interface, proof surface, or strategy. Use for `$asi` or adequate answers needing ambition expansion without hype.
Mitigate incidental complexity when control flow is tangled, names are opaque, or reasoning crosses files. Use when reviews stall on readability, analysis-first refactor planning is needed, or you want essential-vs-incidental verdicts, dominant-risk triage, ranked simplification steps, one visibility artifact, and TRACE assessment. Do not use for greenfield requirements, architecture selection, or delivery planning.
View the user's screen plus hours of history. MUST use when resolving ambiguous requests lacking enough context, including recent work, specific app/document/error references, Chronicle questions, or asks about what you can see onscreen.
Performance optimization with measurement-driven latency, throughput, memory/GC, tail, algorithmic, systems, and micro-architectural work; profile evidence, score-gated experiments, behavior proofs, golden oracles, and regression guards. Use for optimize, speed up, reduce p95/p99, increase throughput/QPS, lower CPU/memory/allocations/GC/syscalls/round trips, profiling, bottlenecks, algorithmic improvement, or benchmarked perf passes. Without a runnable workload, operate in labelled UNMEASURED mode with exact benchmark/profiling/proof commands. Prove Zig-only bench_stats/perf_report CLI iteration before shipping.
Use before convergence for non-obvious but testable frames, analogies, inversions, or recombinations. Produces bounded frame packets with proof signals, assumptions, risks, and handoff guidance. Do not use for final selection, execution, routine brainstorming, wording polish, or option portfolios unless unusually distant frames are explicitly requested.
Clarify ambiguous or conflicting requests by researching first, then exhaustively interrogating assumptions, constraints, dependencies, trade-offs, edge cases, and failure modes before planning or implementation. Use for `$grill-me`, "grill me", hard questions, relentless interrogation, pressure-testing assumptions, scope/success clarification, or product/system-design decisions before implementation. Reply in the user's language. Stop before implementation. Each question round must include compact context explaining the downstream decision.
Manage AI coding CLI accounts with sub-100ms switching. Use when Claude Max, GPT Pro, or Gemini Ultra rate limits require instant account swapping without browser OAuth.
Activate maximal ambition, add a non-obvious frame, then choose one dominant accretive move grounded in current project state. Preserve original accretive prompt wording verbatim except for one injectable target parameter.
Host-driven ten-turn proof/disproof gauntlet for absolute claims. Valid use runs exactly 10 separate assistant turns, one numbered round each, through the bundled autoturn driver. No step, pause, incremental, single-reply, partial-run, or early-terminal mode.
Lateral-thinking playbook returning a five-tier strategy portfolio (Quick Win through Moonshot) with a default-basin check. Use for options, alternatives, trade-offs, stalled/repeated failures, creative reframing, or strategic path selection before execution.
Use for Lean 4 work: programs, proof fixes, formal specs, implementation correctness proofs, external behavior models, termination proofs, trust-boundary audits, mathlib, and Lake/toolchain diagnosis. Do not use for Coq, Isabelle, Agda, or generic pseudocode unless comparison is requested.
Systematically explore unfamiliar codebases and build reusable architecture summaries. Use for repo onboarding, legacy-code understanding, data-flow maps, entry-point discovery, or explicit parallel Codex subagent exploration.
Use when delimited continuations or defunctionalization are central: shift/reset, prompt/control, prompts, subcontinuations, effect handlers as control operators, CPS translations, answer-type modification, abstract machines, first-orderizing higher-order interpreters, continuation runtimes, or source-backed study/research roadmaps. Do not use for ordinary async/await, generators, monads, compiler optimization, PL theory, or functional-programming questions unless delimited control, continuations, CPS/control translation, or defunctionalization is explicit.
Orchestrate evidence-backed autonomous improvements to the local Codex skills ecosystem. Use when asked to auto-update skills, optimize skills from session evidence, bootstrap per-skill AUTO.md policies, scan skills for improvement candidates, create autonomous skill improvement PRs, or inspect auto-update status.
Mine a codebase for breakthrough, evidence-backed changes, additions, refactors, simplifications, DX, UX, reliability, performance, or architecture cleanup. Always run Glaze and ASI escalation gates before choosing. Output ranked opportunities, escalation ledger, and plan seed; do not implement or create tickets.
Use for implementing, reviewing, migrating, zig fmt steering, linting, testing, fuzzing, profiling, optimizing, or hardening Zig 0.16.0 code, including hazardous-code/Illegal Behavior audits: .zig, build.zig/build.zig.zon, std.Io/process.Init migration, C interop, comptime/reflection/codegen, allocator ownership, FFI, concurrency, safety-disabled scopes, raw pointers/layout/ABI hazards, dependencies, cache hygiene/disk pressure, and measured performance.
Cross-modal diagnostic/review workflow for software systems. Use to understand, explain, compare, critique, debug, profile, review, or refactor code by mapping technical signals into sensory models, then translating them back into engineering language. Best for architecture review, readability/maintainability, strange/flaky behavior, performance bottlenecks, API/UX critique, onboarding, and comparing implementations/designs by feel, friction, weight, rhythm, sharpness, smoothness, coupling, or complexity. Also use when prompts ask what a codebase, bug, logs, API, or system feels/sounds/looks like, or to make it lighter, smoother, cleaner, tighter, quieter, or more coherent. Do not use for exact API syntax, compliance/legal interpretation, security sign-off, rote code edits, or terse factual tasks.
Use when repeated validation, policy branching, protocol/state-machine drift, generated provenance loss, callback/effect boundaries, or semantic-consumer context point to structural refactor over local polish. Select one signal, one world boundary, and the smallest honest construction: typed state, free syntax, observations/projections, lifted interpreter/handler, explicit IR, law tests, canonical boundary artifact, Composition Certificate, Boundary Normal Form step, or Exact Context/Context Certificate. Prefer adapter-first staging and one falsifiable proof signal; escalate to Kan/codensity/free-builder mechanics only after a real cross-world boundary is identified.
Use when universalist or the user names a concrete world/boundary requiring Kan mechanics: Kan extensions/lifts, pre/postcomposition, Freyd/AFT free-builder diagnostics, Yoneda/Coyoneda boundary representations, defunctionalized boundary IRs, codensity/density and dense probes/duality, Exact Context Doctrine, context compilation, task-indexed exchange, Context Certificates, pointwise formulas, free/cofree completions, functorial data migration, compatibility facades, lifted implementations, residual obligations, Composition Certificates, Boundary Normal Form audits, or categorical law tests. Do not use for generic architecture unless worlds, boundary kind, known side, unknown location, witness slice, proof signal, and when applicable Composition Certificate are named or must be recovered.
Audit over-engineered codebases. Use when change latency or agent difficulty comes from frameworks, plugins, DI, codegen, task runners, config indirection, ORMs, GraphQL, monorepo/infra tooling, bundled web stacks, or requests to remove layers while preserving behavior. Produces evidence-backed cuts, lower-level replacements, phased migration, proof signals, rollback, and an essential-abstraction check.
Run one explicit glaze escalation pass verbatim, requiring a material new frame, invariant, mechanism, or artifact. Use for `$glaze`, merely adequate first answers, or preserving original rhetoric while pushing to a materially stronger direction.
Adversarially judge candidate moves: name a dominant winner or reject the set. Use after `$accretive` or ideation when strict dominance, anti-theater filtering, and concrete proof requirements are needed.
Score and rigorously improve a CLI tool's ergonomics for AI agents as the primary user. Use when "agent ergonomics", "make CLI agent-friendly", "robot mode audit", "intuitiveness scoring", "score my CLI for agents", or rebuilding a CLI's --help / --json / robot surface. Produces a sibling `__agent_ergonomics_audit/` workspace with surfaces, scorecard, heatmap, recommendations, playbook, regression tests, applied on an `agent-ergonomics-pass-N` branch.
Finalize GitHub PRs end-to-end: update branch/PR, confirm review conversations are resolved, monitor CI until green, squash-merge, and clean up local/remote state. Use when asked to $fin or to finish/land/merge/close a PR, watch checks or runs, squash-merge, delete the branch, and sync local state.
Rank hot paths by CPU, memory, I/O, and contention; hand the optimization skill a scored target list. Use when: profile, flamegraph, hotspot, bottleneck, p95/p99, IOPS, fsync, "why is this slow".
Use this skill as a read-only workflow-starting composite skill for coding move selection before implementation. It must visibly run or emulate the sequence $latent-diver -> $creative-problem-solver -> $accretive -> $dominance, optionally using .codex/agents read-only subagents only as evidence lenses, then stop with a Dominant Move Brief. Trigger for ambiguous architecture, refactor, debugging, performance, integration, migration, stalled work, repeated failures, competing implementation paths, or explicit requests to use latent-move, latent-diver, creative-problem-solver, accretive, and dominance together. Do not edit code, apply patches, or invoke an executor from this skill.
Launch and manage Codex Cloud tasks from the CLI, including detached background watchers that track completion. Use when users ask to run coding work in cloud/background agents, queue multiple cloud tasks, poll task status, fetch cloud diffs, apply cloud outputs locally, or pair cloud kickoff with `$cas` orchestration.
Use when tasks need real-browser web automation in Chrome/Chromium via CDP: open or navigate URLs, click/type/select in forms, run page JS, wait for selectors, scrape structured content, capture screenshots, validate UI flows, or run measured web-browser latency checks (`bench:eval`, `bench:all`) for perf regressions.
Review an agentic system’s configuration and implementation quality. Use when the user wants an opinionated assessment of a system prompt, tool surface, orchestration, guardrails, context handling, or eval setup, and wants concrete recommendations or a redesign plan.
Publish provided content to a secret GitHub gist bucket and surface it to a human via a local macOS notification that opens the gist when tapped. Use when Codex needs to show or notify a human with raw text or file content, wants a gist-backed human handoff, or is explicitly invoked as `$yo`. YO accepts inline text or file paths, creates or reuses a repo-scoped secret gist, and notifies through macOS Notification Center.
Use when the user wants to improve, optimize, debug, test, or iterate on a prompt, agent instruction, Claude Skill, or workflow through a measured eval loop. Runs baseline tests, creates binary success checks, changes one thing at a time, retests, keeps/reverts changes, and returns an optimized final prompt or skill.
Use when a repo has `.xit/` or the user asks for xit: translate git-like intents to non-interactive `xit` CLI commands (`status/diff/log --cli`, add/commit/branch/merge/cherry-pick), avoid the TUI, and do not use git unless explicitly requested.
Use when designing, auditing, migrating, or fixing OpenAI agent harnesses where prompt caching, Responses API state, cached_tokens, prompt_cache_key, prompt_cache_retention, tool/schema stability, reasoning-item carryover, or compaction affect latency or cost. Do not use for generic HTTP caching, answer memoization, CDN/browser caching, or vector-store caches.
Research software tools via source code, GitHub, web. Use when creating skills, learning new tools, finding undocumented features, or bleeding-edge patterns.
Continuously evaluate and improve AGENTS.md-style harness instructions through explicit-trigger OpenCode loops with an explicit model. Use when you want recurring harness reliability runs, especially for Gemini 2.5 Pro/OpenCode harness tuning, clean-repo eval cycles, curated exact-output probes, automatic eval-branch commits and PR updates for passing harness/doc changes, and external-blocker detection or regression auto-revert without scheduler/cron automation.
Systematic UX evaluation using Nielsen heuristics and accessibility checks. Use when reviewing UI, "is this usable", improving user experience, or pre-launch.
Create micro-patches from staged git changes (minimal incision) with at least one validation signal per patch. Use when asked to split work into small .patch files, export/share diffs, or produce patches instead of commits.
Operationalize expert methods into corpus, quote bank, triangulated kernel, operator library, and validators. Use when distilling a methodology or mining session history into executable rules.
Systematic audit-fix-rescan cycle for comprehensive bug elimination. Use when code review, deep audit, "find all bugs", or pre-release hardening.
PR autopilot via `gh` only: create/manage PRs, keep branches current, enforce required CI gates, apply surgical code patches, and publish merge-ready handoff without merging. Use when asked to run or monitor PR automation, fix failing required checks, keep local/remote branch state clean, or prepare branch/PR cleanup for human merge.
Generate provider-agnostic AI agent guardrail blueprints and control matrices from a use case. Use when designing or reviewing agent safety architecture, prompt-injection and tool-misuse defenses, risk-tiered human approval gates, or auditable enterprise guardrail policies using industry patterns across top providers.
Control Ghostty terminal emulator via CLI. Use when managing windows, tabs, splits, fonts, or configuration for Ghostty.
Remove telltale signs of AI-generated "slop" writing from documentation. Use when polishing README files, API docs, or any public-facing text to sound authentically human.
Convert ChatGPT, Gemini, Grok, and Claude share links to clean Markdown + HTML. Use when archiving AI conversations, preserving code fences, or publishing transcripts to GitHub Pages.
Fetch and summarize upcoming unreleased Codex features using a durable local clone synced from GitHub, with source-file mining as primary evidence. Use when asked for latest upcoming/openai-codex features, what is coming next but not in the latest stable release, or a live release-gap summary with links and as-of timestamp.
Produce reusable technical architecture documents from codebase exploration. Use when onboarding, "write up what this does", architecture docs, or handoff.
Mine patterns that recur across multiple projects and generalize into reusable artifacts. Use when "I've seen this before", DRY across repos, or building shared libraries.
Mine past agent sessions for working prompts, decisions, and patterns. Use when "what did I ask?", "find that prompt", session archaeology, or agent history.
CASS Memory System (cm) for procedural memory. Use when starting non-trivial tasks, learning from past sessions, building playbooks, or preventing repeated mistakes via trauma guard.
Fungible agent architecture for multi-agent coding. Use when scaling agent swarms, designing multi-agent workflows, recovering from agent failures, or choosing specialized vs. interchangeable agent patterns.
Audit SaaS billing security: payment bypass, webhook integrity, auth gaps, RLS, secrets. Use when "security audit", "billing security", or pre-launch review.
Run NTM for multi-agent tmux orchestration, work triage, robot mode, safety, coordination, and local APIs. Use when spawning swarms, dispatching work, or operating `ntm` as an agent or human operator.
Assess project status against README/plan vision. Use when "where are we", "reality check", "what's missing", "are we on track", "gap analysis", or "does this actually work".
Find and fix concurrency bugs - deadlocks, races, livelocks, await-holding-lock, database locks, LD_PRELOAD init, swarm races. Use when processes hang, tests flake, or auditing concurrency.
Run Ultimate Bug Scanner (UBS) for code review. Use when reviewing code, checking for bugs, scanning for security issues, validating AI-generated code, or pre-commit quality checks.
Convert local PDF files or folders of PDFs into Markdown files using the bundled converter in this skill. Use this when the task is PDF-to-Markdown conversion inside the current workspace. Do not use it for OCR-heavy scanned PDFs, image extraction, or unrelated PDF summarization.
Comprehensive markdown planning methodology for software projects. Use when starting a new project, creating implementation plans, or refining architecture before coding.
Profile-driven performance optimization with behavior proofs. Use when: optimize, slow, bottleneck, hotspot, profile, p95, latency, throughput, or algorithmic improvements.
Generate and operationalize improvement ideas for projects. Use when brainstorming features, planning improvements, creating beads from ideas, or "what should we build next".
Coordinate heterogeneous MultiAgentV2 task trees with `update_plan`, `spawn_agent`, `assign_task`, `send_message`, `list_agents`, and built-in `explorer`/`worker` roles. Hand only homogeneous leaf batches to `$mesh`.
Use when a task involves ambiguous or shifting software requirements, architecture or system design choices, build-vs-buy decisions, thin prototypes, incremental delivery planning, or evaluating tools/frameworks/AI systems without treating them as a silver bullet. Use it to separate essential complexity from accidental friction and to produce a grounded plan before or alongside implementation. Do not use for straightforward code edits, isolated bug fixes with clear reproduction steps, rote migrations, or purely syntactic refactors unless the user explicitly asks for broader design guidance.
Swarm-ready work selector: choose one source (invocation list, `SLICES.md`, or `plan-N.md`), refine it into dependency-aware atomic tasks, and emit an OrchPlan (waves + delegation) plus optional pipelines. Use for prompts like `$select`, `use $select`, `pick the next safe wave`, `pick the next ready slice`, `orchestrate workers from SLICES`, or `what should run in parallel next`. Plan-only; no writeback; orchestration-agnostic.
Iteratively apply a named skill or slash command N times with progressive deepening. Use when "apply 10 times", "keep improving", "run again", iterative polish, improvement loop, or multi-pass refinement.
Create micro-commits (minimal incision) with at least one validation signal per commit. Use when requests say "split this into micro commits," "stage only the minimal change and commit," "keep commits tiny while checks pass," or when parallel workers/slices need isolated, reviewable commits.
Install strict Xcode Makefile tooling for iOS/macOS projects, including build/run/test scripts with AGENT_NAME-based per-agent isolation under build/. Use when a project needs reproducible local CLI builds without full app scaffolding.
Orchestrate iOS/macOS app scaffolding and optional skill adoption for existing projects. Use when users want a guided wizard that can scaffold with XcodeGen and optionally install xcode-makefiles and simple-tasks.
Use `$mesh` only for homogeneous leaf-batch execution over `spawn_agents_on_csv`: once planning has shaped repeated independent units, prefer one substantive row per unit with structured results and explicit concurrency.
Run a second, harder escalation pass using the original glazer prompt words verbatim. Use when prompts say `$glazer`, when a first escalation still feels incremental, or when you want the exact original rhetoric preserved while forcing a sharper replacement rather than more polish.
Restore machine responsiveness via safe, selective process cleanup. Use when system unresponsive, high CPU/load average, IO pressure, filesystem cache bloat, memory pressure from btrfs/ext4, stuck tests, competing cargo builds, confused agents in loops, swap thrashing, disk full, systemd-oomd kills, or tmux/zellij session sprawl.