skills/autonomous-codex-supervision/SKILL.md
Use when planning, launching, supervising, or integrating Codex/agent implementation work: tmux/cron supervisors, TaskNode task boards, Release-the-Hounds parallel worktrees, L1 full-auto project ownership, validation gates, bounded repair, and safe checkpointing.
npx skillsauth add escapewu/skills autonomous-codex-supervision
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
This is the umbrella skill for class-level Codex/agent implementation supervision. It covers the whole ladder from one supervised tmux Codex task to a project-level L1 owner that decomposes work into TaskNodes, releases parallel hounds in isolated worktrees, validates results, integrates accepted diffs, and safely checkpoints progress.
Use this skill instead of narrow one-session variants named around a specific supervisor mode, hound metaphor, TaskNode export, or global L1 run. Session-specific mechanics and historical variants live in references/; reusable scripts live in scripts/.
Load this skill when the user asks for any of these:
Do not use this skill for a quick one-off code edit that you can do directly with file/terminal tools.
Pitfall (2026-05-11): Running codex exec --full-auto "<prompt>" without a structured task spec leads to Codex choosing the minimum-effort path (e.g., only modifying prompt/markdown files instead of implementing actual functionality). This happened with MAR-10 where Codex added 18 lines to a prompt file instead of implementing group_by='trigger_status' in Python.
Mandatory before any Codex delegation:
- acceptance criteria (what "done" looks like, verifiable)
- validation_commands (e.g., make test, specific curl calls)
- safety_rules / constraints (e.g., "DO NOT only modify .md files")
Use for one bounded implementation task. Build a task spec before launching:
mode: semi-auto | full-auto
original_task: <user request>
normalized_goal: <concrete target>
workdir: <absolute repo path>
tmux_session: <session>
tmux_window: codex
max_rounds: 3
repo_map: <compact navigation context>
in_scope: []
out_of_scope: []
acceptance: []
validation_commands: []
safety_rules: []
stop_when: []
Semi-auto monitors, validates, reports, and stops. Full-auto may relaunch focused repair rounds until acceptance passes or a stop condition triggers.
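A pre-launch check of the task spec can be sketched as follows. This is a minimal illustration, not part of the skill's scripts: the function name spec_problems and the choice of which fields to enforce are assumptions, based on the fields this document calls mandatory (acceptance, validation_commands, safety_rules, stop_when).

```python
# Hypothetical pre-launch check; field names follow the task spec above.
REQUIRED_NONEMPTY = ("acceptance", "validation_commands", "safety_rules", "stop_when")
VALID_MODES = ("semi-auto", "full-auto")


def spec_problems(spec: dict) -> list[str]:
    """Return human-readable problems; an empty list means the spec looks launchable."""
    problems = []
    if spec.get("mode") not in VALID_MODES:
        problems.append("mode must be semi-auto or full-auto")
    # The spec asks for an absolute repo path (POSIX-style paths assumed here).
    if not str(spec.get("workdir", "")).startswith("/"):
        problems.append("workdir must be an absolute path")
    for field in REQUIRED_NONEMPTY:
        if not spec.get(field):
            problems.append(f"{field} must list at least one entry")
    return problems
```

A supervisor could refuse to launch Codex whenever the returned list is non-empty and report the problems instead.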
Use when the task is too large for one Codex prompt. Create a milestone-gated task tree and save it in taskBoard.md as the execution source of truth.
TaskNodes should include: ID, title, milestone, parent, layer, status, dependencies, preconditions, input context, expected output, acceptance criteria, validation commands, safety rules, supervisor mode, max rounds, worktree, Codex session, and done evidence.
Status flow:
planned → ready → running → validating → done
validating → repair_needed → running
running/validating → blocked | failed
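The status flow above can be encoded as a small transition table. This is a sketch under stated assumptions: the names ALLOWED and advance are hypothetical, and treating done, blocked, and failed as terminal is an assumption the flow does not state explicitly.

```python
# Transition table derived from the status flow above.
ALLOWED = {
    "planned": {"ready"},
    "ready": {"running"},
    "running": {"validating", "blocked", "failed"},
    "validating": {"done", "repair_needed", "blocked", "failed"},
    "repair_needed": {"running"},
    # Assumption: these three states have no outgoing transitions.
    "done": set(),
    "blocked": set(),
    "failed": set(),
}


def advance(current: str, new: str) -> str:
    """Return the new status, or raise ValueError for a transition the flow does not allow."""
    if new not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal TaskNode transition: {current} -> {new}")
    return new
```

Rejecting illegal transitions keeps a node from jumping from planned to done without passing through validation.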
Milestones are gates, not topic buckets. If a milestone contains a serial chain longer than two nodes, split it into foundation, parallel implementation, and integration milestones.
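One way to detect a milestone that needs splitting is to measure its longest dependency chain. The sketch below is illustrative: longest_serial_chain is a hypothetical helper, the input maps each TaskNode ID in one milestone to its dependency IDs, and the dependency graph is assumed to be acyclic.

```python
from functools import lru_cache


def longest_serial_chain(dependencies: dict[str, list[str]]) -> int:
    """Count the nodes on the longest dependency path inside one milestone."""

    @lru_cache(maxsize=None)
    def depth(node: str) -> int:
        # Dependencies that live in other milestones are ignored.
        parents = [p for p in dependencies.get(node, []) if p in dependencies]
        return 1 + max((depth(p) for p in parents), default=0)

    return max((depth(node) for node in dependencies), default=0)


def needs_split(dependencies: dict[str, list[str]]) -> bool:
    """Apply the rule above: a serial chain longer than two nodes calls for a split."""
    return longest_serial_chain(dependencies) > 2
```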
Only release hounds after each target has a leash:
Use .worktrees/<task-id> and feature/<task-id> by default. Never stage .worktrees/ into the main repo.
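The default naming can be sketched as a small helper that builds the git worktree add command and flags any staged path under .worktrees/. The helper names are hypothetical; only the .worktrees/<task-id> and feature/<task-id> conventions come from this document.

```python
def worktree_plan(task_id: str) -> dict:
    """Build the default worktree path, branch name, and git command for one task."""
    path = f".worktrees/{task_id}"
    branch = f"feature/{task_id}"
    return {
        "path": path,
        "branch": branch,
        # `git worktree add -b <branch> <path>` creates the branch and checks it out at <path>.
        "command": ["git", "worktree", "add", "-b", branch, path],
    }


def staged_worktree_paths(staged_paths: list[str]) -> list[str]:
    """Return staged paths that must not be committed to the main repo."""
    return [p for p in staged_paths if p.startswith(".worktrees/")]
```

The command list can be passed to subprocess.run from the main repo root; a non-empty result from staged_worktree_paths is a reason to unstage before committing.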
Use only when the user has granted broad project-level authorization. L1 replaces the user for routine implementation decisions inside an approved plan, but remains bounded by that plan.
Allowed without repeated confirmation inside an approved plan:
Stop for owner decision when the task requires a new product lane, safety boundary change, secrets/private data access, destructive operation, unresolved strategy tradeoff, repeated validation failure, merge conflict requiring architecture choice, plan completion with only speculative follow-up remaining, or pushing/merging to a protected remote that the user has not explicitly included in the L1 authorization.
Inspect project rules and current git state.
Build or refresh repo map: purpose, stack, entrypoints, tests, docs/rules, task-relevant files, generated/runtime/secret-adjacent paths.
Write task spec or taskBoard before launch.
Create isolated worktree/branch if running parallel work.
ATOMIC: Launch Codex + Create Cron Supervisor (indivisible). This is ONE step. You are NOT done until BOTH sub-actions are complete. Do NOT verify the launch, do NOT check output, and do NOT respond to the user between 5a and 5b.
5a. Write the task spec to /tmp/codex-taskspec-<timestamp>-<pid>.md, then run codex exec --full-auto - < /tmp/codex-taskspec-xxx.md 2>&1 with background=true, notify_on_complete=true. NEVER use "$(cat ...)" shell substitution.
5b. Create the cron supervisor.
⚠️ If you find yourself reporting "Codex launched" without having created the cron: STOP, go back, create it NOW. This failure has occurred 3+ times historically.
Monitor tmux/cron and process state.
On exit, collect git status, diff stat, pane output, validation output, static acceptance checks, and safety scan evidence.
Update acceptance booleans in persisted state.
If incomplete and safe, relaunch with a focused repair prompt only for failed criteria.
Integrate accepted work deliberately; preserve the union of accepted behavior when hounds touched the same files.
Run milestone-level validation and safety checks.
Update taskBoard Done Evidence and create a safe checkpoint commit.
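The decision taken after each Codex exit can be sketched as a pure function over the persisted acceptance booleans. This is a minimal sketch under stated assumptions: next_action and repair_prompt are hypothetical names, the action labels mirror the report template's "Next action" values, and exhausting max_rounds is treated as a stop-for-decision condition.

```python
def next_action(acceptance: dict[str, bool], rounds_used: int,
                max_rounds: int = 3, mode: str = "full-auto") -> tuple[str, list[str]]:
    """Decide what to do after Codex exits, given per-criterion acceptance results."""
    failed = [criterion for criterion, passed in acceptance.items() if not passed]
    if not failed:
        return ("integrate", [])
    # Semi-auto only reports and stops; full-auto repairs until rounds run out.
    if mode != "full-auto" or rounds_used >= max_rounds:
        return ("stop for decision", failed)
    return ("repair", failed)


def repair_prompt(failed: list[str]) -> str:
    """Build a focused repair prompt that names only the failed criteria."""
    lines = ["Repair round. Fix ONLY these unmet acceptance criteria:"]
    lines += [f"- {criterion}" for criterion in failed]
    lines.append("Do not change behavior that already passes validation.")
    return "\n".join(lines)
```

Keeping the repair prompt limited to failed criteria is what makes each round bounded rather than a fresh attempt at the whole task.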
codex exec --full-auto without task spec
Never run codex exec --full-auto "do X" without acceptance criteria. Codex will choose the minimum-effort path (e.g., editing a prompt file instead of implementing code). Always build a task spec first with acceptance, validation_commands, and stop_when fields.
Root cause (2026-05-13): When codex exec --full-auto "$(cat TASK_SPEC.md)" is run in a background process, Codex may attempt to read additional input from stdin. Background processes have no interactive stdin, so Codex blocks indefinitely — zero files changed, process alive but idle.
Mandatory pattern — always use /tmp file + stdin redirect:
# 1. Write spec to /tmp with unique name
SPEC_FILE="/tmp/codex-taskspec-$(date +%s)-$$.md"
cp TASK_SPEC.md "$SPEC_FILE"
# 2. Launch with explicit stdin redirect (prevents stdin starvation)
codex exec --full-auto - < "$SPEC_FILE" 2>&1
NEVER do this:
# BAD: shell substitution — Codex may still try to read stdin
codex exec --full-auto "$(cat TASK_SPEC.md)" 2>&1
Why /tmp: Task specs can be large (3KB+). Shell argument substitution has length limits and doesn't close stdin. Redirecting from a file guarantees Codex receives the full spec and sees EOF on stdin immediately.
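The same pattern can be driven from Python by handing the child process an open file as stdin, so it reads the spec and then sees EOF. This is a sketch, not the skill's launch_codex_round.py: run_with_spec_on_stdin is a hypothetical helper, it waits for the child to exit for simplicity (a real supervisor would launch in the background), and the command is passed in so the helper can be exercised without Codex installed.

```python
import os
import subprocess
import tempfile
import time

# Command form taken from the pattern above; "-" tells Codex to read the prompt from stdin.
CODEX_COMMAND = ["codex", "exec", "--full-auto", "-"]


def run_with_spec_on_stdin(command: list[str], spec_text: str) -> subprocess.CompletedProcess:
    """Write the spec to a uniquely named temp file and run command with that file as stdin."""
    spec_path = os.path.join(
        tempfile.gettempdir(),
        f"codex-taskspec-{int(time.time())}-{os.getpid()}.md",
    )
    with open(spec_path, "w", encoding="utf-8") as spec_file:
        spec_file.write(spec_text)
    # A regular file on stdin always ends in EOF, so the child cannot block waiting for input.
    with open(spec_path, "rb") as stdin_file:
        return subprocess.run(
            command,
            stdin=stdin_file,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # same effect as 2>&1
            text=True,
            check=False,
        )
```

Calling run_with_spec_on_stdin(CODEX_COMMAND, spec_text) would mirror the shell form; no shell is involved, so argument length limits on the spec do not apply.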
git add new files
Codex creates new files but does NOT stage or commit them. After Codex exits, git status will show new files as ?? (untracked) while modified files show as M. The supervisor must git add new files explicitly before committing. Always run git status --short after Codex exits to catch untracked files.
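Catching untracked files can be sketched as a parser over git status --short output. The helper names are hypothetical, and the sketch assumes plain paths; git quotes paths that contain special characters, which this does not unquote.

```python
def untracked_paths(status_short: str) -> list[str]:
    """Extract paths marked ?? (untracked) from `git status --short` output."""
    # Each line is a two-character status code, a space, then the path.
    return [line[3:] for line in status_short.splitlines() if line.startswith("?? ")]


def git_add_command(paths: list[str]) -> list[str]:
    """Build an explicit `git add` for the given paths; `--` guards against odd file names."""
    return ["git", "add", "--", *paths]
```

A supervisor would still review the untracked list before staging, so that generated or secret-adjacent files are not committed by accident.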
deliver: "origin" thread affinity
deliver: "origin" sends reports to the specific conversation thread where the cron was created. If the user moves to a different thread/channel, they won't see reports. For long-running supervisors (>1 session), consider using a fixed delivery target (e.g., a dedicated Discord channel) instead of origin. Always do a manual cronjob(action="run") immediately after creation to verify the delivery path works.
Launching Codex without creating the cron supervisor has happened multiple times. Step 5 of the Standard Harness Workflow makes creating the cron supervisor part of the atomic launch step; it is NOT optional. If you find yourself reporting "Codex launched, waiting for completion" without having created a cron job: stop, go back, create it. The user should never have to ask "where's my status update?"
When the user raises any issue, do not stop at fixing the immediate problem. Always ask: "What mechanism prevents this from recurring?" Then take action (patch a skill, add a checklist step, update a rule). This is a persistent work habit, not a one-time instruction.
When Codex is triggered via Linear issue (Symphony path), the issue body must follow the structured spec format:
- ## Problem Statement (not hypothesis/suggestion)
- ## Acceptance Criteria (verifiable conditions)
- ## Constraints (include "DO NOT only modify prompt/skill markdown files")
- ## Baseline (backtest metrics for comparison)
- ## Target Files: Codex must self-locate
Without this structure, Codex picks the laziest path (proven by MAR-9/MAR-10 vs MAR-11 comparison).
Do NOT unify these into one path:
Both share the same quality standard (task spec + acceptance criteria + validation), just different execution environments.
This umbrella carries reusable scripts under scripts/:
- generate_repo_map.py: create compact repo navigation context. If it fails, fix the script rather than silently falling back; the known-good version defines SECRET_NAME_RE as a normal re.compile(...) and must pass python -m py_compile.
- launch_codex_round.py: build/launch a round prompt from spec + repo map + state.
- supervisor_validate.py: collect validation/static-check evidence and update supervisor state.
Example:
python ~/.hermes/skills/autonomous-ai-agents/autonomous-codex-supervision/scripts/generate_repo_map.py \
/absolute/path/to/project --task "<normalized task>" --json
Cron prompts must be self-contained. Include workdir, state paths, task spec, validation commands, safety rules, reporting language, delivery target, and self_job_id when known.
For L1 / full-auto Codex supervision, tmux alone is not sufficient. Before reporting that L1 supervision is active, ensure all four pieces exist: saved task spec, isolated worktree + tmux Codex, persisted supervisor state, and a cron job that monitors, validates, and launches bounded repair rounds. See references/l1-cron-supervisor-checklist.md.
After creating or updating a cron supervisor, inspect next_run_at and trigger one manual cronjob(action="run") to verify the schedule/prompt/delivery path. Prefer explicit cron expressions such as */5 * * * * over natural-language schedules when the job must run in the near future; if next_run_at is unexpectedly far away, patch the schedule immediately.
For chat-bound supervisor reports, explicitly require a non-empty report every run unless the user asked for silent watchdog behavior. Do not let the agent append [SILENT] after a real report: Hermes cron injects a [SILENT] convention, and a final response that mixes useful content with [SILENT] can be suppressed by delivery. Add wording such as: “Every run must output a non-empty report; never include [SILENT], even when there is no new progress.” After the first manual run, inspect the saved output under ~/.hermes/cron/output/<job_id>/ to confirm the response contains no accidental [SILENT] marker.
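The post-run inspection for stray [SILENT] markers can be sketched as a directory scan. The layout of ~/.hermes/cron/output/<job_id>/ is not described here, so this assumes plain text files directly inside that directory; outputs_with_silent_marker is a hypothetical helper.

```python
from pathlib import Path


def outputs_with_silent_marker(output_dir: str) -> list[str]:
    """Return names of saved output files that contain the [SILENT] marker."""
    flagged = []
    for path in Path(output_dir).expanduser().iterdir():
        # Assumption: each saved run is a text file directly in output_dir.
        if path.is_file() and "[SILENT]" in path.read_text(encoding="utf-8", errors="replace"):
            flagged.append(path.name)
    return sorted(flagged)
```

A non-empty result after the first manual run suggests the cron prompt still allows the marker and should be patched.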
For self-pausing chat-bound supervisors, use a two-phase create/update so the returned job_id is injected into the prompt and state file. Prefer pausing completed chat-bound supervisors over removing them immediately, so final reports remain inspectable if platform delivery is swallowed.
Do not create recursively spawning cron trees. One authorized master owns continuation; mirrors, if any, are read-only.
Tests passing are not enough. Verify the original acceptance criteria:
Mandatory: Every cron supervisor prompt MUST embed this template verbatim as the required output format. Do not create cron supervisors with free-form reporting — the template ensures consistent, parseable status updates.
Owner, supervisor status:
- Scope: <single task | TaskNode wave | L1 plan>
- Current stage: <TaskNode or milestone>
- Codex/hounds: <running/exited/done/blocked counts>
- Validation: <pass/fail/not run + key evidence>
- Acceptance: <complete/incomplete/blocked>
- Safety: <clean/issue>
- Next action: <wait / repair / integrate / stop for decision>
When creating a cron supervisor (step 5 of the Standard Harness Workflow), copy this template into the cron prompt's "Output Format" section and instruct the cron agent to follow it strictly.
Historical narrow skills absorbed into this umbrella are preserved as support files:
- references/codex-supervisor-cron-modes.md
- references/release-the-hounds.md
- references/tasknode-supervisor.md
- references/global-release-the-hounds.md
- references/codex-post-exit-checklist.md
Use those references for detailed legacy wording, examples, TradingSignal defaults, delivery-reliability notes, and tool-agnostic export instructions.