skills/task-orchestrator/SKILL.md
Autonomous multi-agent task orchestration with dependency analysis, parallel tmux/Codex execution, and self-healing heartbeat monitoring. Use for large projects with multiple issues/tasks that need coordinated parallel execution.
npx skillsauth add jdrhyne/agent-skills task-orchestratorInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Autonomous orchestration of multi-agent builds using tmux + Codex with self-healing monitoring.
Load the senior-engineering skill alongside this one for engineering principles.
A JSON file defining all tasks, their dependencies, files touched, and status.
{
"project": "project-name",
"repo": "owner/repo",
"workdir": "/path/to/worktrees",
"created": "2026-01-17T00:00:00Z",
"model": "gpt-5.2-codex",
"modelTier": "high",
"phases": [
{
"name": "Phase 1: Critical",
"tasks": [
{
"id": "t1",
"issue": 1,
"title": "Fix X",
"files": ["src/foo.js"],
"dependsOn": [],
"status": "pending",
"worktree": null,
"tmuxSession": null,
"startedAt": null,
"lastProgress": null,
"completedAt": null,
"prNumber": null
}
]
}
]
}
dependsOn array enforces ordering# 1. Create working directory
WORKDIR="${TMPDIR:-/tmp}/orchestrator-$(date +%s)"
mkdir -p "$WORKDIR"
# 2. Clone repo for worktrees
git clone https://github.com/OWNER/REPO.git "$WORKDIR/repo"
cd "$WORKDIR/repo"
# 3. Create tmux socket
SOCKET="$WORKDIR/orchestrator.sock"
# 4. Initialize manifest
cat > "$WORKDIR/manifest.json" << 'EOF'
{
"project": "PROJECT_NAME",
"repo": "OWNER/REPO",
"workdir": "WORKDIR_PATH",
"socket": "SOCKET_PATH",
"created": "TIMESTAMP",
"model": "gpt-5.2-codex",
"modelTier": "high",
"phases": []
}
EOF
# Fetch all open issues
gh issue list --repo OWNER/REPO --state open --json number,title,body,labels > issues.json
# Group by files mentioned in issue body
# Tasks touching same files should serialize
# For each task, create isolated worktree
cd "$WORKDIR/repo"
git worktree add -b fix/issue-N "$WORKDIR/task-tN" main
SOCKET="$WORKDIR/orchestrator.sock"
# Create session for task
tmux -S "$SOCKET" new-session -d -s "task-tN"
# Launch Codex (uses gpt-5.2-codex with reasoning_effort=high from ~/.codex/config.toml)
# Note: Model config is in ~/.codex/config.toml, not CLI flag
tmux -S "$SOCKET" send-keys -t "task-tN" \
"cd $WORKDIR/task-tN && codex --yolo 'Fix issue #N: DESCRIPTION. Run tests, commit with good message, push to origin.'" Enter
#!/bin/bash
# check_progress.sh - Run via heartbeat
WORKDIR="$1"
SOCKET="$WORKDIR/orchestrator.sock"
MANIFEST="$WORKDIR/manifest.json"
STALL_THRESHOLD_MINS=20
check_session() {
local session="$1"
local task_id="$2"
# Capture recent output
local output=$(tmux -S "$SOCKET" capture-pane -p -t "$session" -S -50 2>/dev/null)
# Check for completion indicators
if echo "$output" | grep -qE "(All tests passed|Successfully pushed|❯ $)"; then
echo "DONE:$task_id"
return 0
fi
# Check for errors
if echo "$output" | grep -qiE "(error:|failed:|FATAL|panic)"; then
echo "ERROR:$task_id"
return 1
fi
# Check for stall (prompt waiting for input)
if echo "$output" | grep -qE "(\? |Continue\?|y/n|Press any key)"; then
echo "STUCK:$task_id:waiting_for_input"
return 2
fi
echo "RUNNING:$task_id"
return 0
}
# Check all active sessions
for session in $(tmux -S "$SOCKET" list-sessions -F "#{session_name}" 2>/dev/null); do
check_session "$session" "$session"
done
When a task is stuck, the orchestrator should:
Waiting for input → Send appropriate response
tmux -S "$SOCKET" send-keys -t "$session" "y" Enter
Error/failure → Capture logs, analyze, retry with fixes
# Capture error context
tmux -S "$SOCKET" capture-pane -p -t "$session" -S -100 > "$WORKDIR/logs/$task_id-error.log"
# Kill and restart with error context
tmux -S "$SOCKET" kill-session -t "$session"
tmux -S "$SOCKET" new-session -d -s "$session"
tmux -S "$SOCKET" send-keys -t "$session" \
"cd $WORKDIR/$task_id && codex --model gpt-5.2-codex-high --yolo 'Previous attempt failed with: $(cat error.log | tail -20). Fix the issue and retry.'" Enter
No progress for 20+ mins → Nudge or restart
# Check git log for recent commits
cd "$WORKDIR/$task_id"
LAST_COMMIT=$(git log -1 --format="%ar" 2>/dev/null)
# If no commits in threshold, restart
# Add to cron (every 15 minutes)
cron action:add job:{
"label": "orchestrator-heartbeat",
"schedule": "*/15 * * * *",
"prompt": "Check orchestration progress at WORKDIR. Read manifest, check all tmux sessions, self-heal any stuck tasks, advance to next phase if current is complete. Do NOT ping human - fix issues yourself."
}
# 1. Fetch issues
gh issue list --repo OWNER/REPO --state open --json number,title,body > /tmp/issues.json
# 2. Analyze for dependencies (files mentioned, explicit deps)
# Group into phases:
# - Phase 1: Critical/blocking issues (no deps)
# - Phase 2: High priority (may depend on Phase 1)
# - Phase 3: Medium/low (depends on earlier phases)
# 3. Within each phase, identify:
# - Parallel batch: Different files, no deps → run simultaneously
# - Serial batch: Same files or explicit deps → run in order
Write manifest.json with all tasks, dependencies, file mappings.
# Create worktrees for Phase 1 tasks
for task in phase1_tasks; do
git worktree add -b "fix/issue-$issue" "$WORKDIR/task-$id" main
done
# Launch tmux sessions
for task in phase1_parallel_batch; do
tmux -S "$SOCKET" new-session -d -s "task-$id"
tmux -S "$SOCKET" send-keys -t "task-$id" \
"cd $WORKDIR/task-$id && codex --model gpt-5.2-codex-high --yolo '$PROMPT'" Enter
done
Heartbeat checks every 15 mins:
# When task completes successfully
cd "$WORKDIR/task-$id"
git push -u origin "fix/issue-$issue"
gh pr create --repo OWNER/REPO \
--head "fix/issue-$issue" \
--title "fix: Issue #$issue - $TITLE" \
--body "Closes #$issue
## Changes
[Auto-generated by Codex orchestrator]
## Testing
- [ ] Unit tests pass
- [ ] Manual verification"
# After all PRs merged or work complete
tmux -S "$SOCKET" kill-server
cd "$WORKDIR/repo"
for task in all_tasks; do
git worktree remove "$WORKDIR/task-$id" --force
done
rm -rf "$WORKDIR"
| Status | Meaning |
|--------|---------|
| pending | Not started yet |
| blocked | Waiting on dependency |
| running | Codex session active |
| stuck | Needs intervention (auto-heal) |
| error | Failed, needs retry |
| complete | Done, ready for PR |
| pr_open | PR created |
| merged | PR merged |
{
"project": "nuri-security-framework",
"repo": "jdrhyne/nuri-security-framework",
"phases": [
{
"name": "Phase 1: Critical",
"tasks": [
{"id": "t1", "issue": 1, "files": ["ceo_root_manager.js"], "dependsOn": []},
{"id": "t2", "issue": 2, "files": ["ceo_root_manager.js"], "dependsOn": ["t1"]},
{"id": "t3", "issue": 3, "files": ["workspace_validator.js"], "dependsOn": []}
]
},
{
"name": "Phase 2: High",
"tasks": [
{"id": "t4", "issue": 4, "files": ["kill_switch.js", "container_executor.js"], "dependsOn": []},
{"id": "t5", "issue": 5, "files": ["kill_switch.js"], "dependsOn": ["t4"]},
{"id": "t6", "issue": 6, "files": ["ceo_root_manager.js"], "dependsOn": ["t2"]},
{"id": "t7", "issue": 7, "files": ["container_executor.js"], "dependsOn": []},
{"id": "t8", "issue": 8, "files": ["container_executor.js", "egress_proxy.js"], "dependsOn": ["t7"]}
]
}
]
}
Parallel execution in Phase 1:
Parallel execution in Phase 2:
--model gpt-5.2-codex-highWhen using codex exec --full-auto, the sandbox:
git push fails with "Could not resolve host"~/nuri_workspaceThe heartbeat should check for:
username@hostname path %, worker is donegit log @{u}.. --oneline shows commits not on remoteWhen detected, the orchestrator (not the worker) should:
gh pr create# In heartbeat, for each task:
cd /tmp/orchestrator-*/task-tN
if tmux capture-pane shows shell prompt; then
# Worker finished, check for unpushed work
if git log @{u}.. --oneline | grep -q .; then
git push -u origin HEAD
gh pr create --title "$(git log --format=%s -1)" --body "Closes #N" --base main
fi
fi
development
Manage Zendesk tickets, users, and support workflows through the Zendesk API. Use when searching tickets, updating support state, checking users, or exporting queue data.
tools
Query and manage Salesforce CRM data via the Salesforce CLI (`sf`). Run SOQL/SOSL queries, inspect object schemas, create/update/delete records, bulk import/export, execute Apex, deploy metadata, and make raw REST API calls.
development
Best practices for Remotion video creation in React — compositions, sequences, animation, timing, and rendering. Use when building, reviewing, or debugging Remotion videos.
testing
Create structured plans for multi-task projects that can be used by the task-orchestrator skill. Use when breaking down complex work into parallel and sequential tasks with dependencies.