toolkit/packages/skills/autonomous-loops/SKILL.md
Six proven autonomous agent loop patterns with guard rails. Provides reusable patterns for generate->validate->fix, explore->hypothesize->test, and other autonomous workflows. Includes the reviewer-never-authored principle for quality assurance. Use when: (1) Building autonomous agent workflows, (2) Designing self-correcting pipelines, (3) Implementing agent retry/fix loops, (4) Setting up multi-agent review processes, (5) User asks about agent loop patterns.
The agent that reviews work must never be the agent that authored it.
This is the single most important principle for autonomous quality. Self-review is unreliable: the same blind spots that caused an error will also hide it during review.
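One way to enforce this in an orchestrator is to record which agent produced each artifact and refuse any review by that same agent. The sketch below assumes such an author identifier exists. `Artifact`, `pick_reviewer`, and `review` are hypothetical names. They are not an API from this skill or from any agent framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    content: str
    author: str  # identifier of the agent that produced the content (assumed)


def pick_reviewer(artifact: Artifact, pool: list[str]) -> str:
    """Return the first agent in the pool that did not author the artifact."""
    for candidate in pool:
        if candidate != artifact.author:
            return candidate
    raise LookupError("no eligible reviewer: every agent in the pool is the author")


def review(artifact: Artifact, reviewer: str) -> str:
    """Hand the artifact to a reviewer, rejecting self-review outright."""
    if reviewer == artifact.author:
        raise ValueError(f"{reviewer!r} authored this artifact and cannot review it")
    # A real orchestrator would invoke the reviewer agent here; this stub
    # only reports who performed the review.
    return f"reviewed by {reviewer}"


draft = Artifact("def add(a, b): return a + b", author="coder")
print(review(draft, pick_reviewer(draft, ["coder", "critic"])))  # reviewed by critic
```

The check lives in `review` itself rather than only in `pick_reviewer`. This means a caller that bypasses reviewer selection still cannot approve its own work.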
Implementation:
subagent_type or name) for review

Generate -> Validate -> Fix is the most common autonomous loop: generate output, validate it against criteria, and fix it if needed.
```
+----------+     +----------+     +----------+
| Generate |---->| Validate |---->|   Fix    |--+
|          |     |          |     |          |  |
+----------+     +----+-----+     +----------+  |
                      | Pass                    |
                      v                         |
                 +----------+                   |
                 |  Accept  |<------------------+
                 +----------+  (max 3 iterations)
```
When to use: Code generation, document creation, configuration authoring
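```python
MAX_ITERATIONS = 3


def generate_validate_fix(prompt, context, acceptance_criteria):
    # generate(), fix(), validate(), accept() and escalate_to_human() are the
    # task-specific steps of the loop and are assumed to be defined elsewhere.
    output, errors = None, []
    for iteration in range(MAX_ITERATIONS):
        if iteration == 0:
            output = generate(prompt, context)     # first attempt
        else:
            output = fix(output, errors, context)  # repair using the last validation errors
        is_valid, errors = validate(output, acceptance_criteria)
        if is_valid:
            return accept(output)
    # Still invalid after MAX_ITERATIONS attempts: stop and hand off to a human.
    return escalate_to_human(output, errors)
```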
Guard rails:
Explore -> Hypothesize -> Test is for debugging and investigation: gather evidence, form a theory, then validate it.
```
+----------+     +-------------+     +----------+
| Explore  |---->| Hypothesize |---->|   Test   |---+
| (gather  |     | (form       |     | (verify  |   |
| evidence)|     | theory)     |     | theory)  |   |
+----------+     +-------------+     +----+-----+   |
                                          | Fail    |
                                          v         |
                                     +-----------+  |
                                     |  Refine   |--+
                                     | hypothesis|
                                     +-----------+
```
When to use: Bug investigation, root cause analysis, codebase exploration
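The loop can be sketched with each phase supplied as a callable. Rejected hypotheses are fed back into `hypothesize`, which is one way to model the Refine step. `investigate`, `next_suspect`, `MAX_HYPOTHESES`, and the callable signatures are illustrative assumptions, not part of the skill.

```python
from typing import Callable, Optional

MAX_HYPOTHESES = 5  # assumed budget before escalating to a human


def investigate(
    explore: Callable[[], list[str]],
    hypothesize: Callable[[list[str], list[str]], Optional[str]],
    test: Callable[[str], bool],
) -> Optional[str]:
    """Return the first hypothesis that passes its test, or None if none does."""
    evidence = explore()                              # Explore: gather evidence
    rejected: list[str] = []
    for _ in range(MAX_HYPOTHESES):
        hypothesis = hypothesize(evidence, rejected)  # Hypothesize, informed by rejects
        if hypothesis is None:                        # no theories left to try
            break
        if test(hypothesis):                          # Test: confirm or refute
            return hypothesis
        rejected.append(hypothesis)                   # Refine: rule this theory out
    return None                                       # budget spent: escalate


# Toy run: three suspect modules, and only "cache" reproduces the bug.
def next_suspect(evidence: list[str], rejected: list[str]) -> Optional[str]:
    return next((s for s in evidence if s not in rejected), None)


print(investigate(lambda: ["parser", "cache", "network"], next_suspect,
                  lambda h: h == "cache"))  # cache
```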
Guard rails:
Plan -> Execute -> Verify -> Adjust is for multi-step implementation tasks.
```
+----------+     +----------+     +----------+     +----------+
|   Plan   |---->| Execute  |---->|  Verify  |---->|  Adjust  |--+
| (steps)  |     | (step N) |     | (tests)  |     |  (plan)  |  |
+----------+     +----------+     +----------+     +----------+  |
     ^                                                           |
     +-----------------------------------------------------------+
```
When to use: Feature implementation, refactoring, migration tasks
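Here is a minimal sketch of this loop, assuming steps are plain strings and that `execute`, `verify`, and `adjust` are supplied by the caller. `run_plan` and `MAX_ROUNDS` are hypothetical. When a step fails verification, `adjust` rewrites the remaining plan instead of blindly retrying. A hard round cap stops a plan that never converges.

```python
from typing import Callable

MAX_ROUNDS = 10  # assumed cap on execute/verify cycles before escalating


def run_plan(
    plan: list[str],
    execute: Callable[[str], None],
    verify: Callable[[str], bool],
    adjust: Callable[[list[str], str], list[str]],
) -> list[str]:
    """Run steps in order; when a step fails verification, re-plan the rest."""
    done: list[str] = []
    remaining = list(plan)
    for _ in range(MAX_ROUNDS):
        if not remaining:
            break
        step = remaining.pop(0)
        execute(step)                            # Execute step N
        if verify(step):                         # Verify (e.g. run the tests)
            done.append(step)
        else:
            remaining = adjust(remaining, step)  # Adjust the plan, then loop
    if remaining:
        raise RuntimeError(f"plan unfinished after {MAX_ROUNDS} rounds; escalate")
    return done


# Toy run: "run migration" fails once, so adjust() inserts a fix and retries it.
attempts: dict[str, int] = {}


def verify(step: str) -> bool:
    attempts[step] = attempts.get(step, 0) + 1
    return not (step == "run migration" and attempts[step] == 1)


def adjust(remaining: list[str], failed: str) -> list[str]:
    return ["fix migration", failed] + remaining


executed: list[str] = []
print(run_plan(["write migration", "run migration", "update models"],
               executed.append, verify, adjust))
# ['write migration', 'fix migration', 'run migration', 'update models']
```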
Guard rails:
Diverge -> Converge -> Select is for creative or design tasks where multiple approaches are valid.
```
+------------+     +------------+     +----------+
|  Diverge   |---->|  Converge  |---->|  Select  |
| (generate  |     | (evaluate  |     |  (pick   |
| N options) |     | trade-offs)|     |  best)   |
+------------+     +------------+     +----------+
```
When to use: Architecture decisions, API design, UI alternatives
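A minimal sketch of the three phases follows, assuming trade-offs can be expressed as weighted numeric criteria. `diverge_converge_select`, the criteria, and the scores below are hypothetical and illustrative only.

```python
from typing import Callable


def diverge_converge_select(
    generate_options: Callable[[int], list[str]],
    criteria: dict[str, Callable[[str], float]],
    weights: dict[str, float],
    n: int = 3,
) -> tuple[str, dict[str, float]]:
    """Generate n options, score them on weighted criteria, return the winner."""
    options = generate_options(n)                     # Diverge: N candidates
    scores = {                                        # Converge: weigh trade-offs
        option: sum(weights[name] * score(option) for name, score in criteria.items())
        for option in options
    }
    best = max(scores, key=scores.get)                # Select: highest total score
    return best, scores


# Hypothetical API-design comparison; the numbers are illustrative only.
simplicity = {"REST": 0.9, "GraphQL": 0.5, "gRPC": 0.4}
flexibility = {"REST": 0.5, "GraphQL": 0.9, "gRPC": 0.6}
best, scores = diverge_converge_select(
    generate_options=lambda n: ["REST", "GraphQL", "gRPC"][:n],
    criteria={"simplicity": lambda o: simplicity[o], "flexibility": lambda o: flexibility[o]},
    weights={"simplicity": 0.6, "flexibility": 0.4},
)
print(best)  # REST
```

Every option is scored on the same criteria before anything is chosen. This keeps the Select step from simply favoring the first idea generated.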
Guard rails:
Seed -> Expand -> Prune is for building up content or code incrementally.
```
+----------+     +----------+     +----------+
|   Seed   |---->|  Expand  |---->|  Prune   |--+
| (minimal |     | (add     |     | (remove  |  |
| version) |     | features)|     | bloat)   |  |
+----------+     +----------+     +----------+  |
                      ^                         |
                      +-------------------------+
                      (until scope complete)
```
When to use: MVP development, documentation, test suite building
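The sketch below shows the loop, assuming content is a list of items and that "scope complete" is a predicate the caller supplies. `seed_expand_prune`, `MAX_CYCLES`, and the test-suite example are hypothetical. Pruning runs after every expansion, so bloat is removed before it accumulates.

```python
from typing import Callable

MAX_CYCLES = 5  # assumed cap on expand/prune cycles


def seed_expand_prune(
    seed: list[str],
    expand: Callable[[list[str]], list[str]],
    prune: Callable[[list[str]], list[str]],
    scope_complete: Callable[[list[str]], bool],
) -> list[str]:
    """Grow a minimal version, pruning after each expansion, until scope is met."""
    current = list(seed)                  # Seed: smallest useful version
    for _ in range(MAX_CYCLES):
        if scope_complete(current):
            break
        current = prune(expand(current))  # Expand, then Prune right away
    return current                        # may be incomplete if the cap was hit


# Hypothetical test-suite build: a flaky test gets added, then pruned.
required = {"happy path", "empty input", "invalid input"}
backlog = ["empty input", "flaky sleep test", "invalid input"]


def expand(tests: list[str]) -> list[str]:
    return (tests + [backlog.pop(0)]) if backlog else tests


def prune(tests: list[str]) -> list[str]:
    return [t for t in tests if "flaky" not in t]


suite = seed_expand_prune(["happy path"], expand, prune,
                          lambda tests: required <= set(tests))
print(suite)  # ['happy path', 'empty input', 'invalid input']
```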
Guard rails:
OODA (Observe -> Orient -> Decide -> Act) is for reactive, event-driven agent workflows.
```
+----------+     +----------+     +----------+     +----------+
| Observe  |---->|  Orient  |---->|  Decide  |---->|   Act    |
| (monitor |     | (analyze |     | (choose  |     | (execute |
|  events) |     | context) |     | action)  |     | action)  |
+----+-----+     +----------+     +----------+     +----+-----+
     ^                                                  |
     +--------------------------------------------------+
```
When to use: Monitoring, incident response, CI/CD automation
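A minimal sketch of one OODA pass per event follows, assuming events arrive as dicts from some monitored source. `ooda`, the severity rule, and the CI/CD events are hypothetical. The `max_actions` cap is one possible guard against runaway automation.

```python
from typing import Callable, Iterable, Optional


def ooda(
    events: Iterable[dict],
    orient: Callable[[dict], dict],
    decide: Callable[[dict], Optional[str]],
    act: Callable[[str], None],
    max_actions: int = 10,
) -> list[str]:
    """Run Observe -> Orient -> Decide -> Act over a stream of events."""
    taken: list[str] = []
    for event in events:               # Observe: next event from the source
        situation = orient(event)      # Orient: add context to the raw event
        action = decide(situation)     # Decide: pick an action, or None to ignore
        if action is None:
            continue
        act(action)                    # Act: execute the chosen action
        taken.append(action)
        if len(taken) >= max_actions:  # stop runaway automation
            break
    return taken


# Hypothetical CI/CD events; deploy failures are treated as high severity.
events = [
    {"type": "build", "status": "passed", "job": "lint"},
    {"type": "build", "status": "failed", "job": "unit-tests"},
    {"type": "deploy", "status": "failed", "job": "prod"},
]


def orient(event: dict) -> dict:
    return {**event, "severity": "high" if event["type"] == "deploy" else "low"}


def decide(situation: dict) -> Optional[str]:
    if situation["status"] != "failed":
        return None
    if situation["severity"] == "high":
        return f"page on-call: {situation['job']}"
    return f"retry {situation['job']}"


executed: list[str] = []
print(ooda(events, orient, decide, executed.append))
# ['retry unit-tests', 'page on-call: prod']
```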
Guard rails:
| Task Type | Recommended Pattern |
|-----------|---------------------|
| Code generation / editing | Generate -> Validate -> Fix |
| Bug investigation | Explore -> Hypothesize -> Test |
| Feature implementation | Plan -> Execute -> Verify -> Adjust |
| Architecture / design | Diverge -> Converge -> Select |
| Incremental building | Seed -> Expand -> Prune |
| Monitoring / ops | OODA |
Patterns can be nested. For example:
Apply these to ALL patterns: