tkersey

134 verified skills, 7,729 total stars

review-fold

Classify and quotient review findings, failing tests, incidents, bug reports, migration failures, and other witnessed falsifiers against accepted intent and the current Construction. Author counterexample-set/v1 without selecting repairs, counting review credit, or granting mutation.

testing 64

learnings

Capture, browse, query, supersede, migrate, and selectively admit evidence-backed execution learnings through the repo-local `ledger --source learnings` API. Trigger for `$learnings`, browse/recent/search learnings, lessons learned, takeaways, wrap up, handoff, validation transitions, strategy pivots, footguns, retry loops, or memory admission of a durable learning.

development 64

opt

Orchestrate evidence-backed optimization of user-owned Codex skills through $seq or $shadow evidence, $tune diagnosis, and $refine package editing and outcome observation. Use for explicit skill audits, missed/false/ceremonial activation, decision-contract tuning, regression repair, or authorized skill edits. Not for application-code optimization or autonomous portfolio mutation.

development 64

proof-patch

Render a concise human proof from Actuating's current complete closure receipt. Use after implementation or review closeout to bind the Goal, Construction, subject, Evidence Ledger head, Counterexample disposition, proof, retirements, applicable review convergence, publication, residual risk, and human review focus without deciding closure or publishing.

development 64

retrace

Reconstruct and experimentally challenge decisions from prior Codex sessions. Use for `$retrace`, historical decision replay, counterfactual forks, alternative-route challenges, hindsight-separated retrospectives, workflow-governance audits, skill decision attribution, or 'why did that session choose this?'. `$seq` owns deterministic history and source-governance evidence; `$cas` owns safe thread/rollout replay and FIR lifecycle; `$retrace` owns bounded experiments and DRR synthesis. Never present fork output as the source model's hidden chain of thought.

development 64

ship

Finalize validated work into a proof-backed pull request without merging, and return immutable SHIP-v1 publication evidence to Actuating without taking architecture, review, or closure authority. Use for $ship, opening or updating a PR, promoting a draft, publishing validation proof, or producing a PR handoff.

testing 64

prove-it

Run an artifactless parallel subagent gauntlet for absolute or suspiciously clean claims. Rounds 1-9 are independent lens packets produced concurrently by the reusable prove_it_lens custom agent; round 10 is the prove_it_oracle custom agent that runs only after all nine packets return and owns the final verdict plus final response text.

testing 64

memory-source-notes

Safely append, inspect, validate, deploy, and materialize derived digests for typed source-evidence notes in controlled Codex memory extensions. Use only after a handoff from learnings, negative-ledger, or synesthesia, or an explicit custom source capture or diagnostic request. Never edits compiled memory.

tools 64

tune

Tune an existing Codex skill by comparing its intended decision contract with observed decision episodes and outcomes. Prefer `seq skill-decision-audit --mode tune-packet`; use for `$tune`, intended-vs-observed behavior, missed/false/ceremonial activations, ignored clauses, wrong routes, outcome regressions, repeated workarounds, STE-v1 packets, skill-delta candidates, explicit `$refine` handoff, or commit/push authorization for skill-refinement changes. Stop at audit/proposal unless apply or skill-refinement publication is explicit. Commit/push only with explicit publish intent. `$seq` CLI changes require a separate special spec.

tools 64

synesthesia

Reversible cross-modal diagnostic lens for software. Use when the user asks what code, architecture, behavior, logs, APIs, or alternatives feel, sound, look, or move like; for compare-by-feel analysis; when literal analysis leaves multiple plausible structural, temporal, interaction, or boundary interpretations that cross-modal recoding could distinguish; or after an owning technical workflow documents such an ambiguity. Start from literal evidence and translate every sensory statement into a technical hypothesis, uncertainty, falsifier, and next move. Not for ordinary architecture, performance, readability, or UX audits; exact syntax; legal/compliance or security sign-off; or code mutation by itself.

development 64

glaze

Run one explicit glaze escalation pass verbatim, using EXCAVATORY depth, APORETIC resistance to premature closure, AUDACIOUS option expansion, POIETIC creation, and SALTATORY discontinuity-seeking to require a material new frame, invariant, mechanism, artifact, or breakthrough candidate. Use for `$glaze`, merely adequate first answers, or preserving original rhetoric while pushing to a materially stronger direction.

testing 64

ms

Create or directly edit Codex skill packages: SKILL.md, triggers, agents/openai.yaml, scripts, references, assets, and optional decision instrumentation. Use for explicit skill creation or direct skill surgery now; classify the skill as decision/execution/evidence/orchestration/mixed, and scaffold SKDC-v1 only when stable decision rules need future `$seq`/`$tune` observability. Use `$refine` instead for usage-backed existing-skill refinement.

development 64

refine

Own and apply bounded, evidence-backed optimization of an existing Codex skill. Use after `$tune` supplies STE-v1/SDC-v2 or a complete REFINE-SKILL-v3 brief; inspect the target package, select one smallest intervention, edit only authorized files, preserve stable decision-contract IDs, and retain the named behavioral `$seq` observation query. Not for broad historical diagnosis or system-managed skill optimization.

development 64

shadow

Watch exactly one Codex session through one target-skill lens and emit only decision-relevant deltas. Prefer `seq skill-decision-audit --mode delta` for `$shadow $tune`; use for shadow/tail/follow/monitor one session, missed or contrary skill decisions, validation/outcome changes, worker decisions, or goal-cycle status. Do not scan broad history, inspect raw JSONL, create a second continual controller, or repeat full analysis when the cursor contains no new decision evidence.

development 64

logophile

Precision language and doctrine compiler: sharpen wording, names, labels, headings, PR replies, commit/PR text, docs, explanations, doctrine words, activation phrases, and agent doctrine stacks without semantic drift. Trigger for wording, naming, terminology, phrasing, language polish, final copy, doctrine words, activation phrases, mode names, operator naming, persona naming, human-facing text, or finding language that activates better agent behavior. Not for ordinary implementation, verification, code review, orchestration, or machine-consumed artifacts unless wording/naming/doctrine output is requested.

development 64

seq

Mine Codex session JSONL and memory artifacts with the Zig `seq` CLI. Use for explicit `$seq`, artifact/session/tool/memory/plan forensics, skill activation and outcome audits, decision provenance, `$tune` evidence, `$retrace` source capsules, review-compiler provenance, watched-session deltas, worker attribution, or reproducible historical reports. Prefer the narrowest lifted command and preserve denominators, provenance, contamination, and uncertainty.

tools 64

codebase-doctrine

Compile deep repository evidence into artifact-bound correctness doctrine, authority/law/proof maps, strongest knowledge destinations, and an optional minimal repository-skill portfolio. Use when the user wants both deep codebase understanding and durable doctrine, knowledge routing, or repository-specific skill recommendations. Research discoverable facts before asking; use `$grill-me` only for material user-owned intent choices. Not for quick onboarding, one isolated invariant, ordinary implementation, generic review, or direct skill creation.

development 64

grill-me

Clarify ambiguous or conflicting requests by researching first, then interrogating assumptions, constraints, dependencies, trade-offs, edge cases, and failure modes before planning or implementation. Use for `$grill-me`, "grill me", hard questions, pressure-testing, scope/success clarification, or product/system-design decisions. Reply in the user's language. Stop before implementation. Each question round needs compact downstream-decision context.

testing 64

universalist

Use whenever implementation, review, migration, or resolution creates, changes, preserves, validates, bypasses, or removes an owned code boundary. Universalist synthesizes the smallest context-relative, correct-by-construction boundary candidate from current requirements and host capabilities. In Actuating composition it nominates that candidate without selecting or reopening the Construction; standalone work may select a route under its root authority. Make invalid states and illegal compositions unrepresentable where possible, centralize residual checks, preserve observations and compatibility, record invalidation triggers, and return obstruction rather than invent correctness. Includes double-category square calculus when processes and architecture changes compose independently. Implicit invocation on; team mode only by explicit request.

development 64

reduce

Audit over-engineered codebases by factoring layers into live obligations, quotienting redundant distinctions, ablating unearned surface, and normalizing the survivors while preserving required behavior. Use when change latency or agent difficulty comes from frameworks, plugins, DI, codegen, task runners, config indirection, ORMs, GraphQL, monorepo/infra tooling, web stacks, or requests to remove layers. In Actuating composition, return one compact non-authoritative minimization challenge before Construction selection; use RC-v1 for standalone audits, migrations, or independently durable handoffs.

tools 64

goal-workgraph

Project current source-bound work into the smallest verifier-first owner graph only when decomposition changes execution. Omit the graph for one bounded operation; compress repeated classes, derive the frontier from current evidence, and invalidate stale graphs.

development 64

complexity-mitigator

Existing-code comprehension and local winnowing preflight. Use for simplify/refactor/clean up/untangle, nested branches, boolean soup, opaque names, mixed responsibilities, cross-file state, or review stalls. Factor the local whole, separate essential/incidental/specification-risk factors, winnow dominated or duplicated factors, and emit the smallest clarity cut. Not for broad architectural layer removal, kernel quotienting, invariant remediation, or greenfield planning.

development 64

plan

Compile accepted intent or a `$spec-pipeline` PSC-v1 source contract into a source-bound execution policy and immutable `plan_id`, then exhaustively refine it to a policy-synthesis fixed point before handoff to `$actuating`. Use for `$plan`, spec-to-execution lowering, adaptive probes, stabilization plans, or plan revision. Preserve semantic authority; never mutate the repository or silently select an execution plan.

development 64

evidence-fold

Fold tests, diffs, logs, benchmarks, screenshots, review results, and artifact state into a structured verdict: done, continue, regress, blocked, invalid-proof, ask-human, or refactor-kernel.

development 64

ledger

Ensure a `ledger` command is available on PATH; materialize, validate, record, replay, and project requested Actuating artifacts without taking semantic or execution authority; coordinate the shared Learnings/Synesthesia/Negative Ledger lifecycle checkpoint and repo-local source-memory reconciliation; address Universalist plans and receipts; and perform pure artifact validation.

testing 64

actuating

Turn accepted intent and review evidence into correct-by-construction software through Goal Contracts, Counterexample Sets, Construction Contracts, and an Evidence Ledger. Use bare $actuating for implementation, Ship publication, and review convergence; use explicit implement, triage, remediation-plan, or review-closeout for their bounded routes. Actuating owns construction selection, orchestration, Counterexample evaluation, and the next action; Ledger is a non-executing artifact substrate and Ship alone owns public effects.

development 64

hylo

Compile historical Codex sessions into governed counterfactual evidence, evaluate an existing owner-applied candidate through blinded paired HCTP trials, and fold observable evidence into RUN, OBSERVE, or STOP. Use for `$hylo`, CRF extraction, counterfactual replay, source-governed direct or historical trials, sealed evidence, paired baseline/candidate evaluation, causal frontiers, or evidence-governed improvement.

development 64

cybernetic

Systems-thinking and feedback-control skill. Use for `$cybernetic`, cybernetics, complex adaptive systems, root-cause vs structure, feedback loops, stocks/flows, leverage points, incentives, delayed effects, unintended consequences, DART diagnosis, clear/complicated/complex/chaotic classification, intervention/policy design, organizational dynamics, product/business/ecosystem diagnosis, workflow loops, same-cluster review recurrence, or avoiding local optimizations that harm the whole. Produces cybernetic_context or cybernetic_packet; read-only unless routed to implementation.

data-ai 64

codebase-archaeology

Systematically explore unfamiliar codebases and build reusable architecture summaries. Use for repo onboarding, legacy-code understanding, data-flow maps, entry-point discovery, or explicit parallel Codex subagent exploration.

development 64

zig

Use for Zig 0.16.0 implementation, review, migration, build/package, comptime/codegen, formatting/lint, testing/fuzzing, profiling, hazardous low-level code, FFI/layout, concurrency, cache operations, boundedness/assertion/control-flow review, and semantic failures involving proof binding, borrowed-lifetime escape, fallible mutation atomicity, parser/verifier completeness, repository contract drift, or stale proof context. Verify the installed Zig version before version-sensitive work.

development 64

spec-pipeline

Canonical current-spec engine. Turn ambiguous project, architecture, implementation, or product requests into decision-complete implementation specs; operate narrowly in gate-only, challenge-only, or repair mode; default full-mode plan-ready specs to lane=spec_to_plan; and tail-call `$plan` when SGR-v2 and execution handoff authorize planning. Never emit a proposed_plan block.

testing 64

spec-retro

Historical learning skill for the specification system. Mine multiple prior specs, SGR-v2 receipts, plans, sessions, reports, and churn evidence into concrete updates to `$spec-pipeline` contracts, tools, subagent policy, exemplars, and measurement. Use for `$spec-retro`, improve my spec process from history, analyze spec usage reports, mine plan churn, missing phase impact, repeated gate/challenge/lint failures, or report-to-automation work. Do not use for producing or linting one current spec.

tools 64

invariant-ace

Turn should-never-happen into cannot-happen with authority-gated invariant design: owned inductive invariants, counterexample traces, source-of-truth proof, transition preservation, exception authority, witness parity, enforcement boundary, and verification. Use for invariants, impossible states, validation sprawl, cache/index drift, idempotency/versioning, retries/duplicates/out-of-order events, races, loop correctness, policy exceptions, generator/validator parity, descriptor identity, witness drift, fixture preconditions, or invariant-first hardening. Not for generic refactors, architecture essays, or implementation without an invariant gate.

development 64

negative-ledger

Implicitly invoke when implementation, debugging, review, or validation encounters a witnessed failed/no-effect attempt, benchmark or test regression, revert, repeated same-cluster retry, or abandoned strategy, or when the user asks what has already been tried. Query/map before repeating a route; capture only inspectable decision-shaping negative evidence through the repo-local `ledger` source API; reopen only after proved applicability changes; selectively admit complete projections to Codex memory.

development 64

accts

Manage local Codex account switching with a metadata-only TOML config, safe auth.json vault backups, pending manual account activation, and weekly reset-cycle rotation. Use when the user asks to manage Codex accounts, switch Codex accounts, inspect account status, or rotate through accounts after weekly limits reset.

development 64

goal-contract

Compile accepted intent into the sole source-bound goal-contract/v3 artifact. Use before multi-step implementation, review closeout, migration, or hard debugging to bind outcomes, laws, authority, scope, compatibility, and acceptance without selecting architecture, choosing operations, or granting mutation.

development 64

ideate

Mine a codebase or product surface for evidence-backed breakthrough opportunities. Use for `$ideate`, repo/product improvement discovery, idea portfolios, non-obvious refactors, DX/UX/reliability/performance opportunities, or choosing what to plan next. Mode-aware: fast, standard, deep, or audit-only. Run Glaze and ASI prompt gates before choosing; output a ranked opportunity portfolio, escalation ledger, IDR-v1 receipt, and a planning handoff seed when evidence is sufficient. Do not implement, create tickets, or emit task graphs.

development 64

ghost

Create a language-agnostic ghost package from a repo: SPEC.md, exhaustive tests.yaml, INSTALL.md, README.md, VERIFY.md, and LICENSE provenance/regeneration. Use for `$ghost`, ghostify, spec-ify/spec-package this library, ghost library, or portable spec/tests for libraries or tool-using agent loops. For Lean-aided/formal/proved extraction, keep Ghost as artifact authority and route Lean modeling/proof through `$lean`. Not for implementation or skill edits.

tools 64

creative-problem-solver

Generate a compact five-tier strategy portfolio when the next task is choosing among materially different paths. Implicitly invoke for explicit requests for options, alternatives, trade-offs, reframing, or help escaping repeated failure. A name-only or meta mention does not authorize the portfolio route. Do not activate for direct implementation, single-answer advice, skill analysis/tuning, repository-evidence opportunity mining ($ideate), or detailed planning ($plan).

development 64

fm

Invokes Apple's macOS 27 fm command-line tool from a local Mac to use the on-device system model or Private Cloud Compute, including instructions, image prompts, schema-constrained JSON, and noninteractive automation. Use when the user asks to run Apple Foundation Models through fm, compare system versus pcc, generate structured output, or automate fm without Swift or an app.

tools 64

land

Safely finish an explicitly selected GitHub PR: bind exact repository/base/head identity, close review blockers, verify required checks, merge or wait for queue/auto-merge completion, prove live MERGED state, and then clean remote/local branches and associated worktrees. Use only for explicit `$land` or unmistakable merge/land intent. Do not use merely to watch CI, close an unmerged PR, delete a branch, sync local state, or open/update a PR.

testing 64

cas

Run the Zig CAS app-server helpers for account and goal control, direct app-server methods, detached review attempts, session inquiry, and bounded fanout. For reviews, CAS owns target capture, attempt lifecycle, transport recovery, principal quality, structured tuple-bound verdicts, and finding provenance through the current cas review run, start, and wait surface.

tools 64

goal-grind

Execute exactly one lead-selected Zig actuation operation and return event-bound evidence to the coordinator. Use after $goal-actuating has prepared a capability; do not create authority, choose scope, recurse, resolve review findings, or claim goal completion.

development 63

goal-actuating

Coordinate an accepted implementation goal or explicit review workflow through actuation-kernel generations. Select one bounded operation, delegate execution to $goal-grind, fold evidence, consume review-resolution/v1, acquire CAS review evidence when required, and request a kernel-derived closure-decision/v1 without owning publication.

development 63

agent-loop-schemes

Translate an accepted GoalContract into normalized Zig actuation-operation topology. Use when work has repeated classes, migrations, debugging history, review campaigns, proof fanout, branch comparison, or nontrivial stopping conditions; remain advisory and never grant mutation or completion.

development 63

fresh-eyes

Run one explicit fresh-eyes review pass verbatim, rechecking the whole target for blunders, mistakes, errors, oversights, omissions, problems, misconceptions, bugs, and related defects. Use for `$fresh-eyes` or as the mandatory fresh-eyes auxiliary lane in `$actuating` closure-grade review.

testing 63

emulator

Instantiate Ghost-style behavior contracts as executable, replayable, mutatable synthetic implementations. Use for `$emulator`, emulator runs, generated worlds from Ghost packages, synthetic implementations, scenario mutation, counterexamples, implementation divergence, trace reports, or EER-v1 execution reports. Not for deciding what to specify, producing Ghost contracts, editing target skills, or assuming emulator failures imply skill defects.

testing 62

recursion-scheme-planner

Use after an implementation spec or direct goal is accepted when deciding how to break it into planning, implementation, review, proof, memory, and parallel subagent loops. Selects the recursion-scheme topology that $agent-loop-schemes compiles into ALSR/HYL.

development 62

accretive-implementer

Implement exactly the owned contract with the smallest sufficient surface. In Minimum Behavioral Kernel mode, realize an accepted kernel design in a disposable worktree without introducing new distinctions, orphan constructs, or wound-specific proof.

testing 61

dominance

Adversarially judge candidate moves: name a dominant winner or reject the set. Use after `$accretive` or ideation when strict dominance, anti-theater filtering, and concrete proof requirements are needed.

tools 61

lean

Use for deliberate Lean 4 work: proof repair, theorem development, verified programs, model/specification design, external-code models, state-machine or trace invariants, termination proofs, Std/mathlib theorem discovery, Lake/toolchain diagnosis, and high-assurance trust audits. Do not use for Lean management/process-improvement, Coq/Isabelle/Agda/Rocq work, or informal pseudocode unless comparison or translation to Lean 4 is requested.

tools 61

doctrine-compiler

Use when non-trivial work needs Challenge Escalation, latent-intelligence activation, frame-market selection, doctrine operators, dominant-move selection, ablation/surface-tax judgment, reification, review comment law, negative capability, route receipts, or proof-bearing refusal to mutate.

development 61

lift

Performance optimization with measurement-driven latency, throughput, memory/GC, tail, algorithmic, systems, and micro-architectural work; profile evidence, score-gated experiments, behavior proofs, golden oracles, and regression guards. Use for optimize, speed up, reduce p95/p99, increase throughput/QPS, lower CPU/memory/allocations/GC/syscalls/round trips, profiling, bottlenecks, algorithmic improvement, or benchmarked perf passes. Without a runnable workload, operate in labelled UNMEASURED mode with exact benchmark/profiling/proof commands. Prove Zig-only bench_stats/perf_report CLI iteration before shipping.

tools 61

context-bounded-verification

Use for nontrivial code changes, refactors, bug fixes, PR reviews, AI-generated edits, blast-radius analysis, verification planning, regression tests, rollout/rollback, closure/readiness claims, handoffs, or correctness under incomplete context/hidden constraints. Not for textual edits or trivial formatting unless risk analysis is requested. Alias: context-bound-verification.

development 61

fixed-point-driver

Realize one already selected normal form or execution-policy action inside a fenced `$st` workspace claim. Use only with explicit workspace, plan, claim, fencing token, GCR-v2, external worktree, resource boundary, and proof obligations. Emit a bounded realization result/change-set candidate; never widen scope, edit another plan, or advance the shared target branch.

testing 61

algebra-driven-design

Apply Algebra-Driven Design. Use for ADD, denotational design, combinator models, law-driven architecture, domain algebra, property tests, codebase modeling, event sourcing, workflow design, or agentic skill design. If the canonical bundle is unavailable, use this wrapper as the minimal ADD kernel and report the missing bundle path.

development 61

review-adjudication

Convert review claims into minimal, intent-anchored counterexamples. Verify current behavior, branch liability, AC-v2 horizon relation, novelty, kernel impact, and the only legal disposition. Use for review findings, PR comments, CAS findings, terminal holdouts, CEX-v1, or deciding whether a valid issue belongs in the current campaign. Never issue direct code-mutation authority or hand raw review prose to an implementer.

development 61

verification-closure

Make the final current-artifact readiness decision. For Minimum Behavioral Kernel `$resolve`, require a current MBKC-v1, fixed campaign base, accepted kernel, whole-PR realization, semantic-surface conservation, zero orphan code, compressed proof, current holdouts, physical commit/push, and explicit closure horizon.

development 61

cron

Create/manage Codex app automations in local SQLite (~/.codex/sqlite/codex-dev.db). Use to add, list, update, enable/disable, delete, run now, edit names/prompts/RRULE/cwd scopes, or inspect automation records while troubleshooting.

tools 61

beads-workflow

Convert markdown plans into beads with dependencies using br CLI. Use when creating task graphs, polishing beads before implementation, or bridging planning to agent swarm execution.

tools 61

adversarial-reviewer

Authority-gated adversarial review for non-trivial code artifacts. Surface only material, current, owned, witness-backed findings; require countercases, soundness rows, authority clearance, verification paths, and change-agenda consistency before remediation. Trigger for exhaustive review, fresh-eyes second pass, re-review after fixes, patch hardening, de novo challenge, or material fixed-point review. Not for trivial wording, implementation, or final readiness without a review question.

development 61

forensic-elicitation

Forensic elicitation from prior Codex/coding-agent sessions: mine $seq/$cas/session JSONL, memories, tool traces, commits, PRs, and review receipts into provenance-preserving maps. Trigger for learning from past sessions, reconstructing what happened, extracting improvements/lessons, auditing closure, comparing memories vs traces, or resolving contradictions.

tools 61

invariant-stewardship

Use before local patching when bugs, regressions, malformed state, crashes, parser failures, migrations, cache drift, protocol problems, compatibility requests, tolerant readers, fallbacks, coercions, retries, catch-and-continue, or local workarounds may broaden accepted invalid state.

testing 61

latent-diver

Use before convergence for non-obvious but testable frames, analogies, inversions, or recombinations. Produces bounded frame packets with proof signals, assumptions, risks, and handoff guidance. Not for final selection, execution, routine brainstorming, wording polish, or option portfolios unless unusually distant frames are requested.

testing 61

failure-memory

Use when goal or review loops repeat failures, oscillate, regress, or encounter many same-shaped compiler/test/review findings. Clusters failure signatures, memoizes solved and invalid strategies, and prevents repeated work.

development 61

resolve

Intent-closed counterexample-guided review synthesis with fail-closed authority gates. Use for `$resolve`, material branch review/fix/prove/push/closure, repeated CAS/PR findings, review-driven growth, semantic-surface conservation, MBK/RC realization, or deciding exactly which review observations may change code. Raw review text is never executable: mutation requires RAC-v1 from claim to AC/CEX/RB/CEB/MBK/RC/proof/realization, and closure requires terminal closure-gate proof. Not for one-shot review, PR creation, merge/land, or isolated implementation.

development 61

deckset

Generate high-fidelity Deckset markdown presentations from conversation context. Use for decks, slides, presentations, speaker notes, Deckset markdown, or converting a conversation into a narrative slide flow. Checks upstream Deckset docs/examples without volatile refresh metadata.

testing 61

st

Repository-level durable graph workspace under `.ledger/st/`. Use for `$st`, one or many plans, dependency graphs, proof-carrying completion, execution-policy horizons, multi-agent allocation, same-repo/same-target-branch coordination, session-local Codex/OpenCode projections, or resuming durable work. Every material mutation requires a current plan-scoped GCR-v2 plus a workspace claim with an unexpired fencing token.

development 61

footgun-finder

Read-only review lens for latent misuse hazards: APIs, defaults, flags, fallbacks, examples, config, state, cleanup, permissions, and workflows where the easy or obvious use is unsafe, surprising, irreversible, or likely to be copied wrong. Use for `$footgun-finder`, footguns, sharp edges, dangerous affordances, trap doors, misleading names, unsafe defaults, partial-success ambiguity, hidden coupling, or review requests focused on future misuse. Not for generic bugs, invariant ownership, or local simplification unless the hazard is a misuse trap.

development 61

harness-memory

Capture durable, evidence-backed corrections and steering about how Codex should operate, then hand accepted rules to memory-source-notes for append-only harness admission. Use for explicit durable operating corrections, repeated harness rules, verification gates, stop rules, or escalation rules.

development 61

chronicle

Allows you to view the user's screen as well as several hours of history. Use when the user refers to their recent work and seeing the screen would help. This skill MUST be used whenever you need to resolve ambiguity in a user request where the user hasn't specified enough context to do the task, for example to disambiguate the specific user, app, document, or error the user is referring to. You must also use this skill if the user asks any question about Chronicle or asks what you can see on the screen.

testing 61

simplify-and-refactor-code-isomorphically

Run a proof-heavy simplification campaign that factors code, classifies duplication, quotients proven-equivalent distinctions, ablates redundant surface, and normalizes the survivors. Use when simplification must preserve a declared observation set or exact structure. This skill treats isomorphism as an optional strict preservation relation, not as the reduction objective. Route intentional contractions of obsolete, invalid, or legacy behavior to `reduce` or `resolve` under a refinement-preserving contract.

development 61

review-compression-compiler

Compile a sealed batch of intent-anchored CEX-v1 counterexamples into CEB-v2, one Minimum Behavioral Kernel, RC-v1, targeted review apertures, realization constraints, and an initial review-potential baseline. Use for repeated CAS/PR findings, same-family recurrence, behavioral quotienting, review batching, conformance planning, proof compression, MBK/RC synthesis, or review-driven growth. Read-only; never output patch hunks, mutate delivery, or admit raw review prose.

development61

evidence-discipline

Use for bug reports, PR/issue prose, reviewer comments, user diagnoses, generated summaries, memories, retrieved context, public tracker context, claimed root causes/fixes, fake-minimal repro risk, or investigations where natural-language context could anchor implementation scope.

testing61

asi

Run one civilizational-scale ASI reframing pass verbatim: expand ambition 10x, collapse to the smallest artifact preserving the 10x insight, and require a mechanism, interface, proof surface, or strategy. Use for `$asi` or adequate answers needing ambition expansion without hype.

testing61

parse

Classify a local codebase's current architecture from collector-backed, code-first evidence. Use for prompts asking what architecture a repo or slice actually uses, whether it is layered/hexagonal/MVC/plugin/pipeline/etc., what the strongest runner-up is, which coexisting patterns are directly evidenced, whether docs match implementation, or when an implementation agent needs a narrow repo-dialect preflight. Do not use for broad repo onboarding, layer removal, structural redesign, domain algebra, invariant design, implementation specs, or execution planning.

tools61

codebase-audit

Run Codex-native domain audits for security, UX/accessibility, performance, API design, copy, and CLI quality. Use for code audits, quality assessment, issue finding, pre-launch review, or explicit parallel Codex subagent audits.

tools60

fresh-eyes-blunder-pass

Run a targeted fresh-eyes blunder pass over code, specs, plans, adjudications, closure gates, skill edits, or negative-evidence ledgers. Trigger when asked to reread with fresh eyes, find obvious bugs, catch mistakes/oversights/omissions, check for embarrassing misses, or perform a second independent blunder pass before closure. Do not use as a substitute for implementation, adjudication, or verification; use it as the final falsification/check pass for those workflows.

development57

spec-lint

Lint generated implementation specs/proposed plans for missing non-goals, weak proof, unmapped requirements, absent rollback/abort criteria, unresolved material questions, missing primary invariant, missing receipts, unaccounted subagents, skipped challenge/fresh-eyes pass, oversized audit prose, or plan churn. Use for `$spec-lint`, lint this spec, implementation-ready plan checks, proof/rollback/traceability checks, or is this more plan or better plan.

development57

spec-challenge

Run exactly one strongest project-specific invariant/adversarial challenge against a generated spec or plan, then decide whether to regenerate it. Use for `$spec-challenge`, A+ this plan, pressure-test the invariant, does this preserve X, single strongest critique, or after `$plan`/`$spec-pipeline` before implementation.

testing57

spec-gate

Decide whether a grill/handoff packet is complete enough for `$plan`, spec generation, or downstream mutation. Use for `$spec-gate`, is this ready to plan, block planning, handoff packet, decision packet, premature specs, no-grill justification, mutation gate, or underspecified questions, proof bar, scope, non-goals, rollout/rollback, and receipts.

development57

auto

Orchestrate evidence-backed autonomous improvements to the local Codex skills ecosystem. Use when asked to auto-update skills, optimize skills from session evidence, bootstrap per-skill AUTO.md policies, scan skills for improvement candidates, create autonomous skill improvement PRs, or inspect auto-update status.

development56

liminal

Use when delimited continuations or defunctionalization are central: shift/reset, prompt/control, prompts, subcontinuations, effect handlers as control operators, CPS translations, answer-type modification, abstract machines, first-orderizing higher-order interpreters, continuation runtimes, or source-backed study/research roadmaps. Do not use for ordinary async/await, generators, monads, compiler optimization, PL theory, or functional-programming questions unless delimited control, continuations, CPS/control translation, or defunctionalization is explicit.

development56

caam

Manage AI coding CLI accounts with sub-100ms switching. Use when Claude Max, GPT Pro, or Gemini Ultra rate limits require instant account swapping without browser OAuth.

tools56

accretive

Activate maximal ambition, add a non-obvious frame, then choose one dominant accretive move grounded in current project state. Preserve original accretive prompt wording verbatim except for one injectable target parameter.

data-ai56

kan

Use when universalist or the user names a concrete world/boundary requiring Kan mechanics: Kan extensions/lifts, pre/postcomposition, Freyd/AFT free-builder diagnostics, Yoneda/Coyoneda boundary representations, defunctionalized boundary IRs, codensity/density and dense probes/duality, Exact Context Doctrine, context compilation, task-indexed exchange, Context Certificates, pointwise formulas, free/cofree completions, functorial data migration, compatibility facades, lifted implementations, residual obligations, Composition Certificates, Boundary Normal Form audits, or categorical law tests. Do not use for generic architecture unless worlds, boundary kind, known side, unknown location, witness slice, proof signal, and when applicable Composition Certificate are named or must be recovered.

tools56

tracepact

Analyze OpenAI Responses API logs, Agents SDK traces, OpenTelemetry/LangSmith/Langfuse/custom spans, transcripts, and agent-loop code as effectful execution graphs. Use for agentic latency reduction, serial round-trip elimination, prompt-cache stability, model routing, tool-loop optimization, speculative execution validation, or proof-carrying rewrite design. Produces Latency Treaty IR, critical path, counterfactual schedules, prioritized fixes, instrumentation gaps, and CI-ready regression checks.

tools56

fin

Finalize GitHub PRs end-to-end: update branch/PR, confirm review conversations are resolved, monitor CI until green, squash-merge, and clean up local/remote state. Use when asked to $fin or to finish/land/merge/close a PR, watch checks or runs, squash-merge, delete the branch, and sync local state.

testing54

agent-ergonomics-and-intuitiveness-maximization-for-cli-tools

Score and rigorously improve a CLI tool's ergonomics for AI agents as the primary user. Use when "agent ergonomics", "make CLI agent-friendly", "robot mode audit", "intuitiveness scoring", "score my CLI for agents", or rebuilding a CLI's --help / --json / robot surface. Produces a sibling `__agent_ergonomics_audit/` workspace with surfaces, scorecard, heatmap, recommendations, playbook, regression tests, applied on an `agent-ergonomics-pass-N` branch.

tools54

profiling-software-performance

Rank hot paths by CPU, memory, I/O, and contention; hand the optimization skill a scored target list. Use when: profile, flamegraph, hotspot, bottleneck, p95/p99, IOPS, fsync, "why is this slow".

content-media53

puff

Launch and manage Codex Cloud tasks from the CLI, including detached background watchers that track completion. Use when users ask to run coding work in cloud/background agents, queue multiple cloud tasks, poll task status, fetch cloud diffs, apply cloud outputs locally, or pair cloud kickoff with `$cas` orchestration.

tools51

web-browser

Use when tasks need real-browser web automation in Chrome/Chromium via CDP: open or navigate URLs, click/type/select in forms, run page JS, wait for selectors, scrape structured content, capture screenshots, validate UI flows, or run measured web-browser latency checks (`bench:eval`, `bench:all`) for perf regressions.

tools51

yo

Publish provided content to a secret GitHub gist bucket and surface it to a human via a local macOS notification that opens the gist when tapped. Use when Codex needs to show or notify a human with raw text or file content, wants a gist-backed human handoff, or is explicitly invoked as `$yo`. YO accepts inline text or file paths, creates or reuses a repo-scoped secret gist, and notifies through macOS Notification Center.

development51

karpathy-loop

Use when the user wants to improve, optimize, debug, test, or iterate on a prompt, agent instruction, Claude Skill, or workflow through a measured eval loop. Runs baseline tests, creates binary success checks, changes one thing at a time, retests, keeps/reverts changes, and returns an optimized final prompt or skill.

development51

xit

Use when a repo has `.xit/` or the user asks for xit: translate git-like intents to non-interactive `xit` CLI commands (`status/diff/log --cli`, add/commit/branch/merge/cherry-pick), avoid the TUI, and do not use git unless explicitly requested.

tools51

prompt-caching

Use when designing, auditing, migrating, or fixing OpenAI agent harnesses where prompt caching, Responses API state, cached_tokens, prompt_cache_key, prompt_cache_retention, tool/schema stability, reasoning-item carryover, or compaction affect latency or cost. Do not use for generic HTTP caching, answer memoization, CDN/browser caching, or vector-store caches.

tools51

harness

Review an agentic system’s configuration and implementation quality. Use when the user wants an opinionated assessment of a system prompt, tool surface, orchestration, guardrails, context handling, or eval setup, and wants concrete recommendations or a redesign plan.

tools51

latent-move

Use this skill as a read-only workflow-starting composite skill for coding move selection before implementation. It must visibly run or emulate the sequence $latent-diver -> $creative-problem-solver -> $accretive -> $dominance, optionally using .codex/agents read-only subagents only as evidence lenses, then stop with a Dominant Move Brief. Trigger for ambiguous architecture, refactor, debugging, performance, integration, migration, stalled work, repeated failures, competing implementation paths, or explicit requests to use latent-move, latent-diver, creative-problem-solver, accretive, and dominance together. Do not edit code, apply patches, or invoke an executor from this skill.

development51

saddle-up

Continuously evaluate and improve AGENTS.md-style harness instructions through explicit-trigger OpenCode loops with an explicit model. Use when you want recurring harness reliability runs, especially for Gemini 2.5 Pro/OpenCode harness tuning, clean-repo eval cycles, curated exact-output probes, automatic eval-branch commits and PR updates for passing harness/doc changes, and external-blocker detection or regression auto-revert without scheduler/cron automation.

tools49

patch

Create micro-patches from staged git changes (minimal incision) with at least one validation signal per patch. Use when asked to split work into small .patch files, export/share diffs, or produce patches instead of commits.

testing49

operationalizing-expertise

Operationalize expert methods into corpus, quote bank, triangulated kernel, operator library, and validators. Use when distilling a methodology or mining session history into executable rules.

development49

multi-pass-bug-hunting

Systematic audit-fix-rescan cycle for comprehensive bug elimination. Use for code review, deep audits, "find all bugs" requests, or pre-release hardening.

development49

agent-fungibility-philosophy

Fungible agent architecture for multi-agent coding. Use when scaling agent swarms, designing multi-agent workflows, recovering from agent failures, or choosing specialized vs. interchangeable agent patterns.

testing49

join

PR autopilot via `gh` only: create/manage PRs, keep branches current, enforce required CI gates, apply surgical code patches, and publish merge-ready handoff without merging. Use when asked to run or monitor PR automation, fix failing required checks, keep local/remote branch state clean, or prepare branch/PR cleanup for human merge.

tools49

guards

Generate provider-agnostic AI agent guardrail blueprints and control matrices from a use case. Use when designing or reviewing agent safety architecture, prompt-injection and tool-misuse defenses, risk-tiered human approval gates, or auditable enterprise guardrail policies using industry patterns across top providers.

tools49

deadlock-finder-and-fixer

Find and fix concurrency bugs: deadlocks, races, livelocks, await-holding-lock, database locks, LD_PRELOAD init, swarm races. Use when processes hang, tests flake, or you are auditing concurrency.

testing49

csctf

Convert ChatGPT, Gemini, Grok, and Claude share links to clean Markdown + HTML. Use when archiving AI conversations, preserving code fences, or publishing transcripts to GitHub Pages.

development49

codebase-pattern-extraction

Mine patterns that recur across multiple projects and generalize into reusable artifacts. Use when "I've seen this before", DRY across repos, or building shared libraries.

development49

cass

Mine past agent sessions for working prompts, decisions, and patterns. Use when "what did I ask?", "find that prompt", session archaeology, or agent history.

testing49

cass-memory

CASS Memory System (cm) for procedural memory. Use when starting non-trivial tasks, learning from past sessions, building playbooks, or preventing repeated mistakes via trauma guard.

development49

pdf-to-markdown

Convert local PDF files or folders of PDFs into Markdown files using the bundled converter in this skill. Use this when the task is PDF-to-Markdown conversion inside the current workspace. Do not use it for OCR-heavy scanned PDFs, image extraction, or unrelated PDF summarization.

content-media49

codex-upcoming-features

Fetch and summarize upcoming unreleased Codex features using a durable local clone synced from GitHub, with source-file mining as primary evidence. Use when asked for latest upcoming/openai-codex features, what is coming next but not in the latest stable release, or a live release-gap summary with links and as-of timestamp.

development49

system-performance-remediation

Restore machine responsiveness via safe, selective process cleanup. Use when system unresponsive, high CPU/load average, IO pressure, filesystem cache bloat, memory pressure from btrfs/ext4, stuck tests, competing cargo builds, confused agents in loops, swap thrashing, disk full, systemd-oomd kills, or tmux/zellij session sprawl.

development49

ubs

Run Ultimate Bug Scanner (UBS) for code review. Use when reviewing code, checking for bugs, scanning for security issues, validating AI-generated code, or pre-commit quality checks.

development49

glazer

Run a second, harder escalation pass using the original glazer prompt words verbatim. Use when prompts say `$glazer`, when a first escalation still feels incremental, or when you want the exact original rhetoric preserved while forcing a sharper replacement rather than more polish.

testing49

planning-workflow

Comprehensive markdown planning methodology for software projects. Use when starting a new project, creating implementation plans, or refining architecture before coding.

tools49

ntm

Run NTM for multi-agent tmux orchestration, work triage, robot mode, safety, coordination, and local APIs. Use when spawning swarms, dispatching work, or operating `ntm` as an agent or human operator.

development49

ux-audit

Systematic UX evaluation using Nielsen heuristics and accessibility checks. Use when reviewing UI, "is this usable", improving user experience, or pre-launch.

testing49

ghostty

Control Ghostty terminal emulator via CLI. Use when managing windows, tabs, splits, fonts, or configuration for Ghostty.

tools49

de-slopify

Remove telltale signs of AI-generated "slop" writing from documentation. Use when polishing README files, API docs, or any public-facing text to sound authentically human.

development49

security-audit-for-saas

Audit SaaS billing security: payment bypass, webhook integrity, auth gaps, RLS, secrets. Use when "security audit", "billing security", or pre-launch review.

development49

research-software

Research software tools via source code, GitHub, and the web. Use when creating skills, learning new tools, finding undocumented features, or exploring bleeding-edge patterns.

tools49

idea-wizard

Generate and operationalize improvement ideas for projects. Use when brainstorming features, planning improvements, creating beads from ideas, or "what should we build next".

development49

teams

Coordinate heterogeneous MultiAgentV2 task trees with `update_plan`, `spawn_agent`, `assign_task`, `send_message`, `list_agents`, and built-in `explorer`/`worker` roles. Hand only homogeneous leaf batches to `$mesh`.

data-ai49

tame-software-complexity

Use when a task involves ambiguous or shifting software requirements, architecture or system design choices, build-vs-buy decisions, thin prototypes, incremental delivery planning, or evaluating tools/frameworks/AI systems without treating them as a silver bullet. Use it to separate essential complexity from accidental friction and to produce a grounded plan before or alongside implementation. Do not use for straightforward code edits, isolated bug fixes with clear reproduction steps, rote migrations, or purely syntactic refactors unless the user explicitly asks for broader design guidance.

tools49

repeatedly-apply-skill

Iteratively apply a named skill or slash command N times with progressive deepening. Use when "apply 10 times", "keep improving", "run again", iterative polish, improvement loop, or multi-pass refinement.

data-ai49

commit

Create micro-commits (minimal incision) with at least one validation signal per commit. Use when requests say "split this into micro commits," "stage only the minimal change and commit," "keep commits tiny while checks pass," or when parallel workers/slices need isolated, reviewable commits.

testing49

xcode-makefiles

Install strict Xcode Makefile tooling for iOS/macOS projects, including build/run/test scripts with AGENT_NAME-based per-agent isolation under build/. Use when a project needs reproducible local CLI builds without full app scaffolding.

tools49

app-creator

Orchestrate iOS/macOS app scaffolding and optional skill adoption for existing projects. Use when users want a guided wizard that can scaffold with XcodeGen and optionally install xcode-makefiles and simple-tasks.

development49

mesh

Use `$mesh` only for homogeneous leaf-batch execution over `spawn_agents_on_csv`: once planning has shaped repeated independent units, prefer one substantive row per unit with structured results and explicit concurrency.

testing49

codebase-report

Produce reusable technical architecture documents from codebase exploration. Use when onboarding, "write up what this does", architecture docs, or handoff.

development49

reality-check-for-project

Assess project status against README/plan vision. Use when "where are we", "reality check", "what's missing", "are we on track", "gap analysis", or "does this actually work".

testing49

extreme-software-optimization

Profile-driven performance optimization with behavior proofs. Use when: optimize, slow, bottleneck, hotspot, profile, p95, latency, throughput, or algorithmic improvements.

development49

select

Swarm-ready work selector: choose one source (invocation list, `SLICES.md`, or `plan-N.md`), refine it into dependency-aware atomic tasks, and emit an OrchPlan (waves + delegation) plus optional pipelines. Use for prompts like `$select`, `use $select`, `pick the next safe wave`, `pick the next ready slice`, `orchestrate workers from SLICES`, or `what should run in parallel next`. Plan-only; no writeback; orchestration-agnostic.

devops49