archived/skills/burst-supervisor/SKILL.md
Long-running iterative supervisor — dispatch work items to polecat workers across multiple bursts with state recovery.
Install: `npx skillsauth add nicsuzor/academicops burst-supervisor`
Iterative supervisor for long-running workflows. Dispatches work items to polecat workers in small batches across multiple invocations, with full state recovery between bursts.
State lives in the tracking task body. The tracking task's body carries structured state as a YAML code block (queue, dispatches, counters) plus a human-readable progress log. PKB strips unrecognized frontmatter fields, so all supervisor state must be in the body. Any agent can resume by reading the task — no external state files needed.
Dispatch via polecat. Workers are invoked through polecat run -t <task-id>, which provides worktree isolation, agent invocation (Claude or Gemini via -g), auto-finish, and transcript capture. The supervisor never calls agent CLIs directly.
Burst model. Each invocation is idempotent: check active dispatches, evaluate completed results, dispatch new items, persist state, HALT. Workers run asynchronously — the supervisor never waits for them. Schedule recurring bursts with /loop 20m /burst-supervisor <tracking-task-id>. Each /loop tick re-enters the same phases and picks up wherever the last burst left off.
/burst-supervisor <tracking-task-id> # Resume existing supervisor
/burst-supervisor init <workflow-description> # Initialize new supervisor
Read the tracking task:
task = mcp__pkb__get_task(id=<tracking-task-id>)
Parse the supervisor state from the YAML code block in the task body (fenced with ```yaml ... ```). PKB strips unrecognized frontmatter fields, so supervisor state lives in the body, not frontmatter. If no YAML code block with supervisor state exists, this task hasn't been initialized — error and halt.
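Here is a minimal Python sketch of the extraction step. It assumes the body arrives as a plain markdown string and that the state is the first fence tagged `yaml`. `extract_state_block` is a hypothetical helper. Turning the returned text into a dict would need a YAML parser (for example PyYAML's `yaml.safe_load`), which is an assumed dependency and is left out here.

```python
import re
from typing import Optional

# Build the fence marker programmatically so this example cannot close its own fence.
FENCE = "`" * 3

# Match the first fenced block tagged "yaml" and capture everything up to its closing fence.
_STATE_BLOCK = re.compile(
    re.escape(FENCE) + r"yaml[ \t]*\n(.*?)\n" + re.escape(FENCE),
    re.DOTALL,
)


def extract_state_block(body: str) -> Optional[str]:
    """Return the raw YAML text of the supervisor state, or None if there is none.

    None means the tracking task was never initialized: the caller should error and halt.
    """
    match = _STATE_BLOCK.search(body)
    return match.group(1) if match else None


if __name__ == "__main__":
    body = "# Supervisor: spec audit\n\n" + FENCE + "yaml\nversion: 1\nworkflow: spec-audit\n" + FENCE + "\n"
    print(extract_state_block(body))
```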
Key state fields (inside the body YAML block):

- `queue[]`: ordered work items with per-item status
- `active_dispatches[]`: tasks sent to workers, not yet returned
- `config`: burst size, max attempts, worker type
- `plan`: aggregate counters (completed, failed, remaining)

For each entry in `supervisor.active_dispatches`:
1. Fetch the worker task: `mcp__pkb__get_task(id=dispatch.task_id)`
2. Act on its status:
   - `done` or `merge_ready`: proceed to evaluation (Phase 2a)
   - `in_progress`:
   - `failed`: set `status=pending`, increment attempts
   - `active` or `ready` (never claimed): pending.

No code. No regex. No scoring functions. Evaluation is a semantic judgment call: the supervisor agent reads the worker's output and decides whether it meets the workflow's criteria. This is what agents are good at.
For each completed worker task:
Step 1: Gather evidence. Read the actual work product:
- `gh pr diff <pr-number>`
- `git diff main..polecat/<task-id> -- <relevant-files>`

Step 2: Read the evaluation criteria. The tracking task body contains an Evaluation Criteria section written in plain language. These criteria are workflow-specific: the person who initialized the supervisor wrote them. Examples:
Read these criteria. They are the standard — not a checklist to mechanically tick, but guidance for your judgment.
Step 3: Make the call. Apply your judgment to the evidence against the criteria. There are three outcomes: accept, revise, or fail and escalate.

Accept. The work meets the criteria. It doesn't need to be perfect; it needs to be good enough that further revision would yield diminishing returns.
queue_item.status = "done"
queue_item.result = "accepted"
Log in the progress section: Evaluated {item.id}: ACCEPTED. {one-sentence rationale}
Revise. The work has specific, addressable problems. You can articulate what's wrong and what "fixed" looks like. Do not revise for style preferences or minor issues. Revise only for substantive gaps against the criteria.
queue_item.status = "pending"
queue_item.result = "revision_needed"
queue_item.attempts += 1
Create a new worker task with revision instructions:
mcp__pkb__create_task(
title="[Burst] {workflow}: {item.id} (revision {attempts})",
parent=<tracking-task-id>,
project=<project>,
assignee="polecat",
body=<revision instructions>
)
The revision instructions must include:
Log in the progress section: Evaluated {item.id}: REVISION NEEDED (attempt {N}). {what was wrong}
Log in the decisions section: {timestamp}: {item.id} sent for revision — {specific feedback summary}
Fail and escalate. The work has fundamental problems that revision won't fix, OR the item has exhausted its retry budget (`attempts >= config.max_attempts`, default 3).
queue_item.status = "failed"
queue_item.result = "failed"
Add to the Escalations section in the tracking task body:
### {item.id} — ESCALATED ({timestamp})
**Attempts**: {N}
**Last worker**: {task-id}
**Problem**: {clear description of why this can't be resolved by another worker attempt}
**Recommendation**: {what a human should do — e.g., "rewrite the spec manually", "clarify requirements first"}
Log in the progress section: Evaluated {item.id}: FAILED after {N} attempts. Escalated.
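The three outcomes and the retry budget can be sketched as a small state transition. The judgment itself stays with the agent; this code only records a verdict the agent has already made. `QueueItem` and `apply_verdict` are hypothetical names. The sketch also assumes the budget is checked right after a revision increments `attempts`, which the skill text implies but does not pin down.

```python
from dataclasses import dataclass


@dataclass
class QueueItem:
    id: str
    status: str = "pending"
    result: str = ""
    attempts: int = 0


def apply_verdict(item: QueueItem, verdict: str, max_attempts: int = 3) -> QueueItem:
    """Record the supervisor's verdict ("accept", "revise" or "fail") on a queue item."""
    if verdict == "accept":
        item.status, item.result = "done", "accepted"
    elif verdict == "revise":
        item.attempts += 1
        if item.attempts >= max_attempts:
            # Retry budget exhausted: treat as a failure and escalate to a human.
            item.status, item.result = "failed", "failed"
        else:
            # Back to the queue; a new revision worker task gets created.
            item.status, item.result = "pending", "revision_needed"
    elif verdict == "fail":
        item.status, item.result = "failed", "failed"
    else:
        raise ValueError(f"unknown verdict: {verdict!r}")
    return item
```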
These apply regardless of workflow type:
Read the work, not just the metadata. Don't accept based on task status alone. The worker may have marked itself done without actually completing the work well. Read the actual output.
Judge against the stated criteria, not your own preferences. The workflow author defined what "good" means. Evaluate against that, not against what you would have done differently.
Be specific in revision feedback. "Not good enough" is useless. "The user expectations section lists 3 items but none are testable — each should have a clear pass/fail condition" is actionable.
Err toward accepting adequate work. The goal is throughput on a long queue, not perfection on each item. Accept work that meets the criteria even if you'd do it differently. Reserve revision for substantive gaps.
Escalate early if the approach is wrong. If a worker's output shows it fundamentally misunderstood the task (not just quality issues), don't waste retry budget. Escalate with a clear note about what went wrong so the human can adjust the worker instructions or handle it manually.
Own every problem you discover (P#30). If a worker finds that a queue item was already completed, or that a task is malformed, or that the queue snapshot is stale — that is YOUR problem. Do not dismiss it as "a PKB issue" or "not a burst-supervisor bug." File a follow-up task to investigate the systemic cause (e.g., "why aren't completed tasks being finalized?"). Nothing is someone else's responsibility.
Calculate available slots:
available = config.items_per_burst - len(active_dispatches)
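Here is a sketch of slot calculation plus pending-item selection. It assumes queue items are dicts with a `status` key, kept in queue order. `select_batch` is a hypothetical helper. Clamping at zero is a defensive choice: the formula as written goes negative if `items_per_burst` is lowered while dispatches are still in flight.

```python
from typing import Any


def select_batch(queue: list[dict[str, Any]], items_per_burst: int,
                 active_dispatches: list[Any]) -> list[dict[str, Any]]:
    """Pick pending items, in queue order, for the currently free dispatch slots."""
    available = max(0, items_per_burst - len(active_dispatches))
    pending = [item for item in queue if item.get("status") == "pending"]
    return pending[:available]
```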
For each pending item (up to available slots):
worker_task = mcp__pkb__create_task(
title="[Burst] {workflow}: {item.id}",
parent=<tracking-task-id>,
project=<project>,
assignee="polecat",
priority=1,
body=<rendered worker instructions>
)
The worker instructions are rendered from the Worker Instructions template in the tracking task body, with {source}, {item.id}, and any other item-specific variables substituted.
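Rendering can be as simple as literal string replacement. This sketch assumes items are dicts with an `id` and, optionally, a `source`. `render_instructions` and the `extra` mapping are hypothetical. Plain `str.replace` is used instead of `str.format`, so any other braces in a template (JSON snippets, code) are left alone.

```python
from typing import Optional


def render_instructions(template: str, item: dict, extra: Optional[dict] = None) -> str:
    """Substitute {item.id}, {source} and any extra {name} placeholders into a template."""
    values = {"{item.id}": str(item["id"])}
    if "source" in item:
        values["{source}"] = str(item["source"])
    for key, value in (extra or {}).items():
        values["{" + key + "}"] = str(value)
    rendered = template
    for placeholder, value in values.items():
        rendered = rendered.replace(placeholder, value)
    return rendered
```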
Invoke each worker through polecat infrastructure:
# Claude worker (default)
polecat run -t <worker-task-id> -p <project> -c burst-supervisor
# Gemini worker
polecat run -t <worker-task-id> -p <project> -c burst-supervisor -g
Use `config.worker_type` to determine the runner:

- `"claude"` or `"claude-cli"` → `polecat run -t <id> -p <project> -c burst-supervisor`
- `"gemini"` or `"gemini-cli"` → `polecat run -t <id> -p <project> -c burst-supervisor -g`

Launch all workers in parallel using separate Bash tool calls with `run_in_background: true`. Do not wait for workers to finish. Each `polecat run` handles the full lifecycle (claim, worktree, agent, auto-finish) autonomously. The supervisor's job is to dispatch and HALT; the next `/loop` iteration will check results.
Do NOT call gemini -p, claude -p, or any agent CLI directly. Always go through polecat run.
HALT if `polecat` does not resolve. If the `polecat` command is not found (e.g., alias not loaded in a non-interactive shell), do NOT fall back to a direct invocation. Log the failure and HALT. This is an infrastructure bug that must be fixed, not worked around (P#25, P#9).
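Here is a sketch of runner selection and the resolve check. It assumes the flags shown above make up the full polecat invocation. `polecat_command` and `polecat_available` are hypothetical helpers. `shutil.which` only searches `PATH`, so it reports `False` for a shell alias that is not loaded. That is exactly the failure case that should trigger a HALT rather than a fallback.

```python
import shutil


def polecat_command(worker_type: str, task_id: str, project: str) -> list[str]:
    """Build the polecat argv for one worker, following config.worker_type."""
    base = ["polecat", "run", "-t", task_id, "-p", project, "-c", "burst-supervisor"]
    if worker_type in ("claude", "claude-cli"):
        return base
    if worker_type in ("gemini", "gemini-cli"):
        return base + ["-g"]
    # Never guess a runner: an unknown worker_type is a configuration error.
    raise ValueError(f"unsupported worker_type: {worker_type!r}")


def polecat_available() -> bool:
    """True if a `polecat` executable is on PATH (shell aliases are not visible here)."""
    return shutil.which("polecat") is not None
```

Outside the Bash tool, each argv could be launched without waiting via `subprocess.Popen(argv)`. Inside the skill, the Bash tool's `run_in_background: true` plays that role.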
After dispatch (do not wait for completion), update each dispatched queue item:

- `status=in_progress`
- `worker_task=<worker-task-id>`
- `dispatched=<timestamp>`
- add it to `active_dispatches`

Update the tracking task with new state:
plan.completed = count(queue where status == "done")
plan.failed = count(queue where status == "failed")
plan.in_progress = count(queue where status == "in_progress")
plan.remaining = count(queue where status == "pending")
burst_count += 1
last_burst = <now>
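The counter updates amount to a tally over the queue. This sketch assumes `state` is the parsed YAML block as a dict with `queue` and `plan` keys, plus top-level `burst_count` and `last_burst` keys. `finish_burst` is a hypothetical name, and the UTC ISO-8601 timestamp format is an assumption.

```python
from collections import Counter
from datetime import datetime, timezone


def finish_burst(state: dict) -> dict:
    """Recompute plan counters from queue statuses and stamp the burst."""
    counts = Counter(item["status"] for item in state["queue"])  # missing statuses count as 0
    state["plan"].update(
        completed=counts["done"],
        failed=counts["failed"],
        in_progress=counts["in_progress"],
        remaining=counts["pending"],
    )
    state["burst_count"] = state.get("burst_count", 0) + 1
    state["last_burst"] = datetime.now(timezone.utc).isoformat()
    return state
```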
### Burst {N} -- {timestamp}
- Checked: {n} dispatches ({accepted} accepted, {revised} revised, {failed} failed)
- Dispatched: {n} new items → {worker_type}
- Progress: {done}/{total} complete, {in_progress} in progress, {remaining} remaining
Use the Edit tool to update the YAML code block in the tracking task body with the new state, and append the progress log below it. For PKB-tracked fields (status, assignee, tags), use mcp__pkb__update_task.
Output a progress summary to the terminal:
Burst {N} complete.
Queue: {done}/{total} done, {in_progress} in flight, {remaining} pending, {failed} failed
Active dispatches: {list of task IDs}
If all items are done or failed: mark the tracking task as done and summarize final results.
If work remains and no `/loop` is already scheduled, create one:
/loop 20m /burst-supervisor <tracking-task-id>
If a /loop is already active (check with CronList), just HALT — the next tick will resume.
HALT. Do not loop or wait — each invocation is one burst. The /loop handles recurrence.
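The end-of-burst decision reduces to three branches. `next_action` is hypothetical. `loop_active` stands in for whatever the `CronList` check returns; how that check is performed is outside this sketch. Every branch still ends the invocation; the return value only says what to do before halting.

```python
def next_action(queue: list[dict], loop_active: bool) -> str:
    """Decide what to do before halting at the end of a burst."""
    if all(item["status"] in ("done", "failed") for item in queue):
        return "finish"         # mark the tracking task done and summarize
    if not loop_active:
        return "schedule_loop"  # e.g. /loop 20m /burst-supervisor <tracking-task-id>
    return "halt"               # the next /loop tick resumes
```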
When invoked with `init`:

Determine the worker type (`claude` or `gemini`) from the user's instruction. If not specified, ASK the user; do not default silently. Dispatching the wrong runner type wastes all worker execution and cannot be recovered.

Scan the queue source (e.g., `ls specs/*.md` for spec-audit) to build the item list.

Create the tracking task:

mcp__pkb__create_task(
title="Supervisor: {workflow description}",
project=<project>,
parent=<epic-id if applicable>,
assignee="polecat",
tags=["supervisor", "long-running"],
body=<tracking task body with workflow config + empty progress log>
)
Initialize state as a YAML code block in the body (from [[aops-f22cf622]]):
- `version: 1`
- `workflow: <name>`
- `queue: [...]` (populated from scan)
- `active_dispatches: []`
- `config: {max_attempts: 3, items_per_burst: 3, worker_type: "<REQUIRED: claude or gemini, must match user's instruction, never default silently>", review_mode: "auto"}`
- `plan: {total_items: N, completed: 0, failed: 0, in_progress: 0, remaining: N}`

PKB strips unrecognized frontmatter fields, so state MUST live in the body.

Proceed to the first burst (Phases 2-5).
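Here is a sketch of building that initial state. It assumes queue items start as `{id, status, attempts}` dicts; the exact item shape is not specified in the skill. `initial_state` is a hypothetical helper. It refuses to run without an explicit `claude` or `gemini` runner rather than picking a default.

```python
from typing import Any

ALLOWED_WORKERS = {"claude", "gemini"}


def initial_state(workflow: str, item_ids: list[str], worker_type: str) -> dict[str, Any]:
    """Build version-1 supervisor state for a freshly scanned queue."""
    if worker_type not in ALLOWED_WORKERS:
        # Never default the runner silently: make the caller ask the user.
        raise ValueError("worker_type must be 'claude' or 'gemini'; ask the user")
    n = len(item_ids)
    return {
        "version": 1,
        "workflow": workflow,
        "queue": [{"id": i, "status": "pending", "attempts": 0} for i in item_ids],
        "active_dispatches": [],
        "config": {"max_attempts": 3, "items_per_burst": 3,
                   "worker_type": worker_type, "review_mode": "auto"},
        "plan": {"total_items": n, "completed": 0, "failed": 0,
                 "in_progress": 0, "remaining": n},
    }
```

YAML 1.2 is essentially a superset of JSON, so `json.dumps(state, indent=2)` is one dependency-free way to produce text for the fenced block. A YAML library would give more readable output.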
On load, check supervisor.last_burst. If it was updated < 5 minutes ago AND there are active dispatches, warn:
Another supervisor burst may be running (last burst: {timestamp}, {N} active dispatches).
Proceed anyway? [y/N]
This is advisory, not a hard lock.
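Here is a sketch of the advisory check. It assumes `last_burst` has already been parsed into a timezone-aware `datetime`. `concurrent_burst_warning` is a hypothetical helper, and `now` is injectable so the five-minute window can be tested.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def concurrent_burst_warning(last_burst: Optional[datetime], active_dispatches: int,
                             now: Optional[datetime] = None) -> Optional[str]:
    """Return the advisory warning text, or None if no overlapping burst is suspected."""
    if last_burst is None or active_dispatches == 0:
        return None
    if now is None:
        now = datetime.now(timezone.utc)
    if now - last_burst < timedelta(minutes=5):
        return (f"Another supervisor burst may be running (last burst: "
                f"{last_burst.isoformat()}, {active_dispatches} active dispatches).\n"
                "Proceed anyway? [y/N]")
    return None
```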
# Supervisor: {workflow description}
## Mission
{User-provided description of what this supervisor is doing}
## Workflow Config
### Queue Source
{How to populate the queue, e.g., "Scan specs/*.md, one item per file"}
### Worker Instructions
{Template for what each worker should do. Use {source} and {item.id} as placeholders.}
**Default commit verification block** (MUST be included in all worker templates for Gemini workers):
> Before finishing, verify your changes are committed:
>
> 1. `git add -A && git status` — check what's staged
> 2. `git commit -m "feat: <description>"` — commit changes
> 3. `git log --oneline main..HEAD` — confirm commits exist
> 4. If no commits show, do NOT proceed to PR — something went wrong.
### Evaluation Criteria
{Checklist the supervisor uses to evaluate worker output}
## Progress
{Burst logs appended here by Phase 4b}
## Decisions
{Supervisor decision log — why items were re-queued, skipped, etc.}
## Escalations
{Items that exceeded max_attempts or need human judgment}
| Pattern | Use case | Dispatch model |
| ---------------------- | -------------------------------- | -------------------------------------------------------- |
| polecat swarm | Parallel batch, drain queue | Pull (workers claim from queue) |
| polecat supervise | LLM-driven parallel rounds | Push (supervisor selects + dispatches) |
| burst-supervisor | Iterative long-running workflows | Push (supervisor creates tasks + dispatches via polecat) |
| swarm-supervisor skill | Full lifecycle orchestration | Push (decompose → review → dispatch) |
The burst-supervisor is for workflows that process N items iteratively across multiple sessions — spec audits, document reviews, email processing. It creates worker tasks in PKB and dispatches via polecat run, giving workers full worktree isolation and autonomous execution.
See also:

- `/pull`: single task workflow (what each worker runs internally)
- `polecat run`: single autonomous polecat worker
- `polecat supervise`: LLM-driven parallel dispatch (`supervisor_loop.py`)
- `swarm-supervisor`: full lifecycle orchestration skill