MJ-skill/codex-dispatch-watchdog/SKILL.md
Use for ANY codex CLI dispatch via dispatch wrapper (no time threshold; presence-of-risk triggers, not estimated wall — stdin-EOF stalls occur at <60s). Combines internal log-inactivity watchdog wrapper + external Claude-session cron probe + sentinel hook enforcement. Detects stalls in ≤60-270s vs hours-without-detection failure mode.
npx skillsauth add develata/deve-skills codex-dispatch-watchdogInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Triggers — set up probe for any of:
dispatch_codex.sh (project wrapper) — regardless of expected wall timecodex exec with expected wall >2 min (rare — prefer wrapper)Does NOT trigger:
codex exec "<short prompt>" whose stdout returns in seconds — stall is immediately visiblecodex exec resume <session-id> for short queries (inline visibility)Why no time threshold for wrapper: stdin-EOF silent-stall has occurred at <60s wall (PID 70280 2026-05-15 initial freeze). Time-based filtering misses these; presence-of-risk is the only safe trigger. Cost: probe = 1 cron tick / 3 min for dispatch duration ≈ trivial vs cost of an undetected stall (updated 2026-05-19; was 1/min before).
codex exec can fail invisibly in multiple ways — process alive but log frozen, or process exits cleanly but produces no deliverable. Single-mechanism defenses fail:
| Defense | Failure mode |
|---|---|
| task-notification only | Fires on process exit only; silent-stall never exits |
| Log inactivity watchdog inside dispatch wrapper | Wrapper itself can be killed externally (SessionStart hook re-fire, OOM, kill -9 of parent) — no marker written |
| Single ScheduleWakeup probe | One-shot; cascade depends on Claude correctly re-arming each fire (failed multiple times in practice 2026-05-16/17) |
| Exit-code-based success check | codex CAN exit 0 with no deliverable produced (acknowledges task, runs reasoning, exits cleanly without writing the expected output file) — wrapper sees exit 0 + no stall → reports "completed" → Claude believes failure didn't happen until they check artifact path |
Defense in depth required: (1) wrapper-internal watchdog + (1.5) deliverable-existence check + (2) Claude-session-external cron probe + (3) PreToolUse hook enforcing probe presence before dispatch + (4) Claude post-completion re-verification protocol.
Historical baseline: 2026-05-15 PID 70280 wasted 2h17min in stdin EOF silent-stall before user-led discovery. 2026-05-17 E1 v2 wrapper externally killed → no .stall marker → 1h 50min undetected. 2026-05-19 #52+#53 cowork: codex completed successfully but Claude checked artifact paths 3s after dispatch start (before codex started exec calls) and treated the early snapshot as final → wrote a "codex no-show" report based on stale data; the actual codex report was written ~5 min later. Root cause: Claude treated pre-completion snapshot as truth instead of treating the wrapper completion notification as the trigger to re-verify.
scripts/dispatch_codex.sh <log_path> <stall_min> <prompt> [expected_output_path] (project-specific implementation). Contract:
/dev/null (defends against stdin-EOF stall — known codex bug)stall_min minutes → kill -9 codex + write <log_path>.stall markerDISPATCH_STATUS=completed (exit N) OR DISPATCH_STATUS=stalled to logWhy insufficient alone: (a) external kill of the wrapper process leaves no markers; (b) codex can exit 0 with no deliverable written (next layer).
ADDED 2026-05-19 after #52+#53 cowork failure-mode discovery.
When the caller passes the optional expected_output_path argument to the wrapper, after codex exits cleanly the wrapper checks whether the deliverable file exists with size > 0. If NOT:
DISPATCH_STATUS=no_output (codex exit N but expected deliverable not produced) to logEXPECTED_OUTPUT_EXISTS, EXPECTED_OUTPUT_SIZE, DIAGNOSIS=...)MANDATORY when caller knows the expected output path — almost always true for cowork dispatches following a brief that specifies an output file.
Why: codex can complete reasoning, output acknowledgment + analysis to STDOUT (which goes to log), but never invoke write_file for the deliverable — particularly when the prompt asks codex to "write report" but codex interprets it as "describe the report". Exit-code-based success looks healthy in this scenario; deliverable-existence check catches it.
MANDATORY for every wrapper dispatch — no time threshold (see Scope/§"Why no time threshold": stdin-EOF stalls occur at <60s, so wall-time gating is unsafe). Run AT SAME TURN as dispatch_codex.sh invocation:
CronCreate(
cron="*/3 * * * *", # every 3 minutes (was * * * * * pre-2026-05-19)
recurring=true,
durable=false, # session-only OK; cron survives Claude turns
prompt=<probe instructions below>
)
Why CronCreate not ScheduleWakeup (first-principles): ScheduleWakeup is one-shot — cascade continuation depends on Claude manually re-arming each fire, an AND-chain failure mode (any miss breaks the cascade; observed multiple times). Cron fires deterministically every tick without Claude state.
Why 3-min cadence (updated 2026-05-19; was 60s before): 60s cadence produces excessive REPL noise during long codex tasks (30+ fires for a 30-min dispatch, each forcing context refresh). 3-min cadence ~10 fires/30-min still well within 5-min Anthropic prompt cache TTL. Wrapper Layer 1 watchdog (60s log-silence kill) provides faster intra-dispatch stall detection independently; cron probe is the orthogonal external observer for wrapper-death case, where 3-min detection latency is acceptable. Threshold pair: cron */3 + hook 270s = 1.5× cron with ~10% scheduler jitter margin.
Probe prompt contents (each tick):
touch <sentinel_file> — proves probe alive (required by Layer 3 hook)tail -3 <log_file> — latest activitygrep DISPATCH_STATUS <log_file> — completion linels <expected_artifact_path> — has deliverable landedls <log_basename>.stall — watchdog kill markerNOW=$(date +%s); MTIME=$(stat -f %m <log>); echo log_age_seconds=$((NOW-MTIME)) — silent-stall detectionCOUNTER=.claude/codex_probe_terminal_count_<tag>.stall exists OR DISPATCH_STATUS= present OR artifact landed → TERMINAL:
$COUNTER does NOT exist → write 1 to $COUNTER; report [<task>] TERMINAL DETECTED (tick 1/2 — grace; will self-delete on next fire if still terminal). Claude should advance now.$COUNTER exists → report [<task>] TERMINAL DETECTED (tick 2/2 — self-deleting now), then CronDelete <own_id> and rm $COUNTERlog_age > 5 * 60s → STALL → same 2-tick grace pattern using the same counter filePROBE_OK: ...; if $COUNTER exists (state recovered, no longer terminal) → rm $COUNTER to resetPreToolUse hook on Bash matcher, regex matches dispatch_codex.sh:
<sentinel_file> exists AND mtime within threshold (~1.5× cron interval; e.g. 270s for */3 cron — matches project hook 2026-05-19)# OVERRIDE_HOOK_<reason>) present in command → allow (rare cases; auto-mode classifier may block bypass without user authorization)Implementation: project supplies hook script (e.g., scripts/hooks/codex_probe_guard.sh) reading sentinel mtime.
T=0 create cron probe (Layer 2) + touch sentinel
T=+ε dispatch via wrapper (Layer 1; Layer 3 hook validates Layer 2 alive)
T=~3min first cron fire — touch sentinel + status check
T=~6min second cron fire — same
...
T=N min codex completes / stalls / errors
→ probe detects DISPATCH_STATUS / .stall marker / log silence > threshold
→ Fire 1 (tick 1/2): writes counter file, reports "TERMINAL DETECTED (tick 1/2 — grace)"
→ Fire 2 (≥3 min later, tick 2/2): re-reports terminal, then self-CronDelete + rm counter
→ Claude (on any wake): step-0 full state sweep, advance state machine.
Claude MAY also CronDelete explicitly any time after tick 1 (faster cleanup);
redundant-delete is safe (CronDelete is idempotent for already-gone IDs).
Each dispatch gets its own cron probe with a unique tag — recommended format:
codex_probe_<short_task>_<utc_yyyymmddThhmmZ> (e.g., codex_probe_e1_r2_brief_20260518T143022Z).
Why: CronList can disambiguate parallel probes; the timestamp identifies orphan probes during GC.
Parallel dispatches each get their own probe (per unique tag). Multi-account (CODEX_HOME=~/.codex-b) uses the same pattern — one probe per dispatch regardless of account.
Soft cap: ≤4 concurrent probes per Claude session. >4 → serialize (per codex-orchestration § 3 Parallel dispatch threshold — user permission UI floods at ≥3 parallel codex jobs anyway).
Evolution:
CronDeleted on first terminal detect → 2026-05-18 E2 R0 silent-stuck (fire-then-delete is single-point-of-failure; terminal message lost in transit, probe gone, chain stuck)CronDelete after advance → reliable but Claude often forgot, leaving probes firing forever (orphan accumulation, REPL noise)CronDelete + rm counter). Guarantees ≥2 terminal fires (≥3 min apart) before deletion: the redundant fire path eliminates the v1 single-point failure while restoring auto-cleanup.Why this satisfies both constraints:
CronDelete explicitly any time after tick 1 (faster cleanup); redundant delete is safeCounter file convention: .claude/codex_probe_terminal_count_<tag> (gitignored; same <tag> as the probe's unique-tag naming). On non-terminal tick where counter exists (i.e., state flipped back to healthy — should not happen in practice, but defensive), probe rms the counter to reset.
Accumulation defense is now: 2-tick auto-delete (primary) + orphan-probe GC (§next, secondary safety net for v3 counter-file corruption / probe killed mid-cycle).
In multi-hour autonomous mode where Claude session may compact / restart / lose track:
CronList → for each codex_probe_* entry, verify the dispatch is still active (check sentinel file mtime + corresponding log file)DISPATCH_STATUS= line OR .stall marker → probe is orphaned → CronDeleteCronList shows no codex_probe_* entries but a dispatch is ongoing (log file growing) → probe failed to spawn → STOP and re-create probe before any further dispatchEach running probe = 1 cron tick / 3 min ≈ trivial cost (was 1/min pre-2026-05-19; reduced for less REPL noise). With ≤4 concurrent + per-dispatch 30-60 min lifetime, no quota concern for a 20h autonomous session. The concern is orphan accumulation, not active count.
ADDED 2026-05-19 after Claude-side procedural error in #52+#53 cowork: Claude checked artifact paths 3s after dispatch start (before codex started exec calls), saw only the brief file + 1700-byte log with just "I'll read the brief first...", concluded "codex no-show", and wrote a fallback report. Codex actually completed correctly 5 minutes later; the early snapshot was misleading.
The rule (binding when dispatching codex via wrapper):
run_in_background:true + sleep 3 snapshots are NEVER ground truth. Pre-completion snapshots can show: (a) just the brief file, (b) log with only acknowledgment line, (c) no deliverable yet. These are normal mid-dispatch states.DISPATCH_STATUS=... (completed / stalled / no_output)The mistake pattern to avoid:
dispatch (run_in_background:true)
→ sleep 3
→ ls $RUN_DIR/ → only brief + 1700-byte log
→ CONCLUSION (premature): codex no-show
→ write _CLAUDE_REPORT.md saying "codex failed"
→ commit
→ ... 5 min later: completion notification arrives, codex actually completed
correctly — but Claude has already moved past
Right pattern:
dispatch (run_in_background:true)
→ keep working on parallel tasks (Claude's own analysis, etc.)
→ wait for explicit completion notification
→ THEN re-check the run dir for deliverable
→ form judgment based on artifact + log DISPATCH_STATUS, not premature snapshot
If a partial-progress snapshot is needed, it should be informational only, with explicit "this is mid-dispatch, not conclusive" framing in any user-facing report.
CronDelete after advance → under v3 the probe self-cleans on the 2nd terminal tick, so explicit CronDelete after advance is just faster cleanup (tidier, not required)CronDeletes on the FIRST terminal tick (deprecated v1 pattern, 2026-05-18 — the single fire's message can be lost in transit → silent-stuck); v3 self-deletes only on the SECOND terminal tick, after a grace re-reporttail -f log | grep -m 1 thinking it'll exit on completion (it won't — log goes quiet, pipe hangs)expected_output_path for cowork briefs that specify an output file — gives up Layer 1.5 protection for no good reason.codex-orchestration § 3 Call Efficiency Rules — dispatch invocation basicscodex-account-switching — multi-account isolationautonomous-chain-control-flow — wake-signal semantic handling in autonomous chains (this skill = probe mechanics; that one = how Claude reacts to a probe fire)loop skill — alternative for recurring tasks (NOT a substitute; codex dispatch monitoring needs cron probe pattern specifically)CLAUDE.mdtools
Collect and audit Codex token usage with a bundled Python CLI and optional Windows batch launchers. Use this skill when the user asks to check Codex token usage, generate daily token audit logs, calculate monthly CostUSD totals, review Codex spending, or run Codex token usage scripts on Windows, Bash, WSL, or Linux.
tools
Use when giving the user an INLINE reply that carries a trade-off, a decision, a verdict, or a non-trivial finding (decision brief / round verdict / failure root-cause). NOT for "done"/status confirmations, one-line answers, or pure data dumps. Forces a compact decision-brief shape and blocks internal tool-name / file-path bleed into user-facing text.
development
Use for cross-file or cross-chapter terminology audits and corpus-wide term unification in thesis/paper sources — extract candidate term drift, build a decision queue, classify each occurrence, apply accepted replacements safely, and verify counts/build. Trigger on "术语审计", "术语统一", "术语一致性", "逐词审", "这个词全文怎么用", "把 X 全文改成 Y", "terminology audit", or "unify term X". Do NOT use for ordinary prose drafting or a single known-location edit; use academic-writing for prose quality and claim-boundary judgment.
testing
Use BEFORE proposing any multi-hour task, recommending compromise-vs-clean choice, citing a "named" rule/mechanism, or skipping plan iteration. Five guards against Claude's training-time biases that systematically override user-aligned principles. Triggered by proposal, recommendation, citation, or plan authoring.