/SKILL.md
Autonomous long-running iteration for Codex CLI. Use when the user wants Codex to plan or run an unattended improve-verify loop toward a measurable or verifiable outcome, especially for overnight runs; it also covers repeated debugging, fixing, security auditing, and ship-readiness workflows. Do not use for ordinary one-shot coding help or casual Q&A.
`npx skillsauth add leo-lilinxiao/codex-autoresearch codex-autoresearch`

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Autonomous goal-directed iteration. Modify -> Verify -> Keep/Discard -> Repeat.
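The Modify -> Verify -> Keep/Discard -> Repeat cycle can be illustrated with a small sketch. This is not the skill's implementation: `Candidate`, `run_loop`, and the numeric "state" are hypothetical stand-ins for a repo change and its verify command, used only to show the control flow.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Candidate:
    """One proposed change (hypothetical); `apply` returns the modified state."""
    label: str
    apply: Callable[[float], float]


def run_loop(
    state: float,
    candidates: List[Candidate],
    verify: Callable[[float], float],
    higher_is_better: bool = True,
) -> Tuple[float, List[Tuple[str, float, str]]]:
    """Try each candidate once and keep it only if the metric improves."""
    best = verify(state)  # baseline measurement before any change
    log: List[Tuple[str, float, str]] = []
    for cand in candidates:
        trial = cand.apply(state)   # Modify
        score = verify(trial)       # Verify
        improved = score > best if higher_is_better else score < best
        if improved:                # Keep: the trial becomes the new baseline
            state, best = trial, score
            log.append((cand.label, score, "keep"))
        else:                       # Discard: the previous state is retained
            log.append((cand.label, score, "discard"))
    return state, log
```

In a real run the "state" would be the working tree, `apply` an edit plus commit, and `verify` the configured verification command; discarding would correspond to rolling the commit back.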
- `loop`, `plan`, `debug`, `fix`, `security`, `ship`, or `exec`, and parse any inline config from the prompt.
- `references/core-principles.md` and `references/structured-output-spec.md`. For active execution modes (`loop`, `debug`, `fix`, `security`, `ship`, `exec`), also load `references/runtime-hard-invariants.md`.
- `references/session-resume-protocol.md` for every interactive launch or existing-run control path, before deciding fresh vs resumable
- `references/environment-awareness.md` before choosing hardware-sensitive work
- `references/interaction-wizard.md` for every new interactive launch (`loop`, `debug`, `fix`, `security`, `ship`) before execution begins
- `references/results-logging.md` only when debugging TSV/state semantics or helper behavior directly
- `lessons`, `pivot`, `health-check`, `parallel`, `web-search`, `hypothesis-perspectives`).
- `<skill-root>/scripts/...`), not the target repo root. In the common repo-local install this means commands such as `python3 .agents/skills/codex-autoresearch/scripts/autoresearch_init_run.py --repo <primary_repo> --workspace-root <workspace_root> ...`. New-run helpers (`autoresearch_init_run.py` and `autoresearch_runtime_ctl.py launch/create-launch`) require both `--repo <primary_repo>` and `--workspace-root <workspace_root>`. Existing-run control-plane helpers (`autoresearch_resume_check.py`, `autoresearch_resume_prompt.py`, `autoresearch_supervisor_status.py`, `autoresearch_health_check.py`, `autoresearch_runtime_ctl.py status/stop/start`) require `--repo <primary_repo>` and resolve the workspace-owned Results directory from the repo-local pointer plus canonical context. `autoresearch_launch_gate.py --repo <primary_repo>` is the pre-wizard gate: it returns `fresh` for a clean repo with no prior artifacts and otherwise uses the same pointer/context recovery path.

| Mode | Purpose | Primary Reference |
|------|---------|-------------------|
| loop | Run the autonomous improvement loop | references/loop-workflow.md |
| plan | Convert a vague goal into a launch-ready config | references/plan-workflow.md |
| debug | Hunt bugs with evidence and hypotheses | references/debug-workflow.md |
| fix | Iteratively reduce errors to zero | references/fix-workflow.md |
| security | Run a structured security audit | references/security-workflow.md |
| ship | Gate and execute a ship workflow | references/ship-workflow.md |
| exec | Non-interactive CI/CD mode with JSON output | references/exec-workflow.md |
Use Mode: <name> in the prompt to force a specific subworkflow.
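As an illustration of the mode table and the `Mode: <name>` override, here is a minimal sketch of how such a directive could be recognized. In the skill itself the prompt is interpreted by Codex, not by a parser; the function `forced_mode`, the regular expression, and the assumption that the directive sits on its own line are all hypothetical.

```python
import re
from typing import Optional

# Mode names and primary references, copied from the table above.
MODE_REFERENCES = {
    "loop": "references/loop-workflow.md",
    "plan": "references/plan-workflow.md",
    "debug": "references/debug-workflow.md",
    "fix": "references/fix-workflow.md",
    "security": "references/security-workflow.md",
    "ship": "references/ship-workflow.md",
    "exec": "references/exec-workflow.md",
}

# Assumption: the directive appears on a line of its own, case-insensitively.
_MODE_RE = re.compile(r"^\s*Mode:\s*(\w+)\s*$", re.IGNORECASE | re.MULTILINE)


def forced_mode(prompt: str) -> Optional[str]:
    """Return the mode named by a `Mode: <name>` line, or None if absent or unknown."""
    match = _MODE_RE.search(prompt)
    if match is None:
        return None
    name = match.group(1).lower()
    return name if name in MODE_REFERENCES else None
```

When no directive is present the sketch returns `None`, which corresponds to the default behavior of inferring the mode from the request.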
For the generic loop, the following fields are needed internally. Codex infers them from the user's natural language input and repo context, then fills gaps through guided conversation:
- Goal
- Scope
- Metric
- Direction
- Verify

Optional but recommended:

- Guard
- Iterations
- Run tag
- Stop condition

For every new interactive run, use the wizard contract in `references/interaction-wizard.md`.
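To make the required and optional fields concrete, the following sketch models them as a dataclass. The skill does not define this schema here: the class name `LoopConfig`, the field types, and the helper `missing_required` are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LoopConfig:
    """Hypothetical container for the loop fields listed above."""
    # Required fields
    goal: str
    scope: str
    metric: str
    direction: str  # e.g. "higher" or "lower" (assumed spellings)
    verify: str     # command or check that measures the metric
    # Optional but recommended fields
    guard: Optional[str] = None
    iterations: Optional[int] = None
    run_tag: Optional[str] = None
    stop_condition: Optional[str] = None

    def missing_required(self) -> List[str]:
        """Names of required fields that are still empty."""
        required = ("goal", "scope", "metric", "direction", "verify")
        return [name for name in required if not getattr(self, name).strip()]
```

A wizard-style flow could keep asking questions until `missing_required()` returns an empty list, which mirrors the idea of filling gaps through guided conversation.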
- `$codex-autoresearch` for interactive autoresearch launches and follow-up controls.
- `get_goal`; reuse a matching non-complete current goal, or call `create_goal` with the confirmed objective when no goal exists. If an existing goal cannot be reused, surface it in the confirmation summary before launch and do not create a second one. Mark the goal complete with `update_goal` only when the autoresearch stop condition is actually satisfied; mark it blocked only when the run truly cannot continue without external input or an environment change. Use the shared helper scripts (`autoresearch_init_run.py --repo <primary_repo> --workspace-root <workspace_root>`, `autoresearch_record_iteration.py`, `autoresearch_select_parallel_batch.py`, `autoresearch_supervisor_status.py --repo <primary_repo>`) and do not create launch/runtime control artifacts.
- `autoresearch_runtime_ctl.py launch --repo <primary_repo> --workspace-root <workspace_root>` to persist the confirmed launch manifest and start the detached runtime controller in one step, then return a short handoff summary instead of tailing or polling the run unless the user explicitly asked you to wait. Do not create or mutate official Codex goals for background runs; the runtime controller owns detached continuation. The runtime itself should execute non-interactive `codex exec` sessions with the generated runtime prompt supplied on stdin. Detached sessions default to `danger_full_access` (`--dangerously-bypass-approvals-and-sandbox`) unless the user explicitly asks for the sandboxed `workspace_write` path. If the mini-wizard outcome is "fresh start", call `autoresearch_runtime_ctl.py launch --repo <primary_repo> --workspace-root <workspace_root> --fresh-start` so prior persistent run-control artifacts are archived as part of the same handoff.
- `autoresearch-results/state.json` internally before continuing.
  Background start already performs that sync automatically before it relaunches nested Codex sessions; `autoresearch_set_session_mode.py` remains an internal/scripted recovery helper, not a normal user-facing step.
- `workspace_root` from the launch context: if Codex started inside a git repo, use that repo root; otherwise use the current launch directory. Do not silently widen to a parent workspace just because sibling repos or old artifacts exist. Only widen when the user explicitly confirms a broader multi-repo workspace, and show the resulting Results directory in the confirmation summary.
- `autoresearch-results/` artifacts at the same time.
- `python3 <skill-root>/scripts/autoresearch_hooks_ctl.py status` and then follow the readiness flow in `references/interaction-wizard.md`. Capture the first `startup_tip_needed` value from that status; if it is `true`, include one product-facing launch tip in the confirmation summary. If setup is missing, stale, disabled, or untrusted, run `python3 <skill-root>/scripts/autoresearch_hooks_ctl.py install` before clarification continues. Treat setup details as internal preparation unless a setup failure blocks launch. Use model-visible goal tools when they are actually available.
- `status`, `stop`, or `resume` requests, stay on the same skill entry. `status` and `stop` apply to background runs only; foreground runs stay in the current session.
- `exec` remains the advanced / CI path. It is fully specified upfront and does not use the interactive handoff.
- `loop`, `debug`, `fix`, `security`, and `ship`, scan the repo, run the session-resume launch gate, and ask at least one repo-grounded confirmation round before the run starts. Load and follow `references/interaction-wizard.md` for every new interactive launch. The launch wizard must include an explicit run-mode choice: foreground or background. `exec` mode is the exception: it is fully configured upfront and must not stop for a launch question.
- `autoresearch_runtime_ctl.py launch`.
  Background calls `autoresearch_runtime_ctl.py launch --repo <primary_repo> --workspace-root <workspace_root>`, creating the confirmed launch manifest and detached runtime as a single script-level action; after launch, return a short handoff summary and do not monitor in the foreground unless explicitly asked. Background must not create or update official Codex goals. Detached sessions use the confirmed launch manifest's `execution_policy` and default to `danger_full_access` unless the user explicitly asks for sandboxed `workspace_write`. If the chosen background path is a fresh start after recovery analysis, use `autoresearch_runtime_ctl.py launch --repo <primary_repo> --workspace-root <workspace_root> --fresh-start` so stale persistent run-control artifacts are archived automatically. `exec` mode has no launch question; once safety checks pass, it begins immediately.
- `go` in either foreground or background mode, do not pause mid-run to ask anything -- not for clarification, not for confirmation, not for permission. If you encounter ambiguity during the loop, apply best practices and keep going. The user may be asleep.
- `git reset --hard HEAD~1` is allowed; otherwise use `git revert --no-edit HEAD`.
- `Iterations: N`.
- `references/autonomous-loop-protocol.md` Stop Conditions for the full definition).
- `references/runtime-hard-invariants.md` as the primary runtime checklist. Foreground's core persistent artifacts are `autoresearch-results/results.tsv`, `autoresearch-results/state.json`, `autoresearch-results/context.json`, and `autoresearch-results/lessons.md`; background also uses `autoresearch-results/launch.json`, `autoresearch-results/runtime.json`, and `autoresearch-results/runtime.log`.
- `references/pivot-protocol.md` instead of brute-force retrying.
- `autoresearch-results/results.tsv`, `autoresearch-results/state.json`, `autoresearch-results/context.json`, or runtime-control files.
  Always call them via the skill-bundle path (`<skill-root>/scripts/...`); never call bare `scripts/autoresearch_*.py` from the target repo root unless the skill bundle itself is actually installed there.
- `exec` mode, never leave repo-root state artifacts behind. If helper scripts need state, use the exec scratch path and explicitly clean it up before exit. New schema artifacts still belong under the workspace-owned `autoresearch-results/` directory; legacy repo-root artifacts trigger the unsupported-layout error unless the user explicitly chooses a fresh start.
- `references/runtime-hard-invariants.md`, `references/core-principles.md`, and the selected mode workflow from disk before the next iteration. Do not rely on memory of those documents after compaction.
- `references/runtime-hard-invariants.md`. Use Phase 8.7 of `references/autonomous-loop-protocol.md` only for the detailed re-anchoring procedure. If any item fails, re-read all loaded runtime docs from disk before continuing.

Every mode should follow `references/structured-output-spec.md`.
Minimum requirement:
- `exec`, emit no prose; every assistant-visible payload must be one of the JSON lines defined in `references/exec-workflow.md`,

$codex-autoresearch
I want to get rid of all the `any` types in my TypeScript code

$codex-autoresearch
I want to make our API faster but I don't know where to start

$codex-autoresearch
pytest is failing, 12 tests broken after the refactor
Codex scans the repo, asks targeted questions to clarify your intent, asks you to choose foreground or background for interactive runs, then starts the loop. You never need to write key-value config.
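Part of that repo scan is settling the default `workspace_root`: per the launch rules, it is the repo root when Codex starts inside a git repo and the current launch directory otherwise. A minimal sketch of that rule follows. The function name `default_workspace_root` is hypothetical, and using `git rev-parse --show-toplevel` is an assumption about how the repo root might be detected, not a description of the skill's helper scripts.

```python
import subprocess
from pathlib import Path


def default_workspace_root(launch_dir: Path) -> Path:
    """Return the git repo root containing launch_dir, else launch_dir itself."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--show-toplevel"],
            cwd=launch_dir,
            capture_output=True,
            text=True,
            check=False,  # a non-zero exit just means "not inside a repo"
        )
    except FileNotFoundError:
        # git is not installed; fall back to the launch directory.
        return launch_dir.resolve()
    top = result.stdout.strip()
    if result.returncode == 0 and top:
        return Path(top).resolve()
    return launch_dir.resolve()
```

The sketch never widens to a parent workspace on its own, which matches the rule that a broader multi-repo workspace is chosen only after explicit user confirmation.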
- `references/core-principles.md`
- `references/runtime-hard-invariants.md`
- `references/loop-workflow.md`
- `references/autonomous-loop-protocol.md`
- `references/interaction-wizard.md`
- `references/structured-output-spec.md`
- `references/modes.md`
- `references/plan-workflow.md`
- `references/debug-workflow.md`
- `references/fix-workflow.md`
- `references/security-workflow.md`
- `references/ship-workflow.md`
- `references/exec-workflow.md`
- `references/results-logging.md`
- `references/lessons-protocol.md`
- `references/pivot-protocol.md`
- `references/web-search-protocol.md`
- `references/environment-awareness.md`
- `references/parallel-experiments-protocol.md`
- `references/session-resume-protocol.md`
- `references/health-check-protocol.md`
- `references/hypothesis-perspectives.md`