internal/embed/claude/skills/autonomous-iteration/SKILL.md
Use when user mentions autonomous iteration, metric-driven optimization, $research-loop plan, $research-loop debug, $research-loop fix, $research-loop security, $research-loop ship, $research-loop scenario, $research-loop predict, $research-loop learn, $research-loop reason, $research-loop probe, or mentions "research-loop" with a goal/metric. Autonomous Goal-directed Iteration — apply Karpathy's autoresearch principles: modify, verify, keep/discard, repeat. Supports bounded mode via Iterations: N inline config.
npx skillsauth add moralespanitz/research-loop autonomous-iteration
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Port of Autoresearch (uditgoenka) for Research Loop. Adds Karpathy-style constraint-driven autonomous iteration to scientific experiment workflows.
Core loop: Modify → Verify → Keep/Discard → Repeat.
The existing loop skill handles PROPOSE→MUTATE→BENCHMARK→ANNOTATE experiment
cycles for hypothesis testing. This skill extends Research Loop with a
metric-driven optimization loop — one change per iteration, mechanical
verification, automatic rollback via git revert.
| Skill | When to Use |
|-------|-------------|
| loop | You have a hypothesis and want to run experiments with ranking |
| autonomous-iteration | You have a measurable metric and want automated optimization |
The two skills compose: use loop to design the experiment, then use
autonomous-iteration to execute the optimization loop.
The autonomous-iteration loop grants the agent broad iterative authority — read, edit, run shell, commit. Every command operates inside fixed guardrails:
- Commits use the experiment: prefix; each discard is git revert-clean.
- No destructive or unsafe commands: rm -rf /, fork bombs, fetch-and-execute (curl ... | sh), embedded credentials, or unannounced outbound writes.
- Prefer Iterations: N over unbounded loops.

CRITICAL: READ THIS FIRST BEFORE ANY ACTION
For ALL commands ($research-loop, $research-loop plan,
$research-loop debug, $research-loop fix, $research-loop security,
$research-loop ship, $research-loop scenario, $research-loop predict,
$research-loop learn, $research-loop reason, $research-loop probe):
| Command | Required Context | If Missing → Ask |
|---------|-----------------|-----------------|
| $research-loop | Goal, Scope, Metric, Direction, Verify | Batch 1 (4 questions) + Batch 2 (3 questions) |
| $research-loop plan | Goal | Ask per references/plan-workflow.md |
| $research-loop debug | Issue/Symptom, Scope | 4 batched questions per references/debug-workflow.md |
| $research-loop fix | Target, Scope | 4 batched questions per references/fix-workflow.md |
| $research-loop security | Scope, Depth | 3 batched questions per references/security-workflow.md |
| $research-loop ship | What/Type, Mode | 3 batched questions per references/ship-workflow.md |
| $research-loop scenario | Scenario, Domain | 4-8 adaptive questions per references/scenario-workflow.md |
| $research-loop predict | Scope, Goal | 3-4 batched questions per references/predict-workflow.md |
| $research-loop learn | Mode, Scope | 4 batched questions per references/learn-workflow.md |
| $research-loop reason | Task, Domain | 3-5 adaptive questions per references/reason-workflow.md |
| $research-loop probe | Topic | 4-7 adaptive questions per references/probe-workflow.md |
Never start any loop, phase, or execution without completing interactive setup when context is missing. This is a BLOCKING prerequisite.
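The blocking check described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the skill's actual implementation: the names `REQUIRED_CONTEXT`, `missing_context`, and `may_start` are hypothetical, and the field lists are copied from the Required Context column of the table.

```python
# Required context per command, copied from the table above.
# All names in this sketch are hypothetical, not part of the skill.
REQUIRED_CONTEXT = {
    "$research-loop": ["Goal", "Scope", "Metric", "Direction", "Verify"],
    "$research-loop plan": ["Goal"],
    "$research-loop debug": ["Issue/Symptom", "Scope"],
    "$research-loop fix": ["Target", "Scope"],
    "$research-loop security": ["Scope", "Depth"],
    "$research-loop ship": ["What/Type", "Mode"],
    "$research-loop scenario": ["Scenario", "Domain"],
    "$research-loop predict": ["Scope", "Goal"],
    "$research-loop learn": ["Mode", "Scope"],
    "$research-loop reason": ["Task", "Domain"],
    "$research-loop probe": ["Topic"],
}


def missing_context(command, provided):
    """Return the required fields that are absent or blank for `command`."""
    required = REQUIRED_CONTEXT[command]
    return [field for field in required if not provided.get(field, "").strip()]


def may_start(command, provided):
    """A loop may start only when no required field is missing."""
    return not missing_context(command, provided)
```

For example, `missing_context("$research-loop", {"Goal": "Raise coverage", "Scope": "src/**/*.ts"})` would report Metric, Direction, and Verify as the fields still to be collected interactively.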
| Subcommand | Purpose |
|------------|---------|
| $research-loop | Run the autonomous loop (default) |
| $research-loop plan | Interactive wizard to build Scope, Metric, Direction & Verify from a Goal |
| $research-loop security | Autonomous security audit: STRIDE threat model + OWASP Top 10 |
| $research-loop ship | Universal shipping workflow: 8 phases |
| $research-loop debug | Autonomous bug-hunting loop: scientific method + iterative investigation |
| $research-loop fix | Autonomous fix loop: iteratively repair errors until zero remain |
| $research-loop scenario | Scenario-driven use case generator: 12 exploration dimensions |
| $research-loop predict | Multi-persona swarm prediction: pre-analyze from multiple expert perspectives |
| $research-loop learn | Autonomous codebase documentation engine: scout, learn, generate/update |
| $research-loop reason | Adversarial refinement: multi-agent generate→critique→synthesize→blind judge |
| $research-loop probe | Adversarial requirement / assumption interrogation |
The core Modify→Verify→Keep/Discard→Repeat loop.
Load: references/core-loop.md for full protocol.
Usage:
# Unlimited — iterate until plateau or interrupted
$research-loop
Goal: Increase test coverage from 72% to 90%
Scope: src/**/*.ts
Metric: coverage % (higher is better)
Verify: npm test -- --coverage | grep "All files"
# Bounded — exactly N iterations
$research-loop
Goal: Reduce bundle size below 200KB
Iterations: 25
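The inline config in the usage examples above is a list of Key: value lines. A minimal parsing sketch follows, assuming that comment lines start with # and that the command line starts with $. The function names `parse_inline_config` and `iteration_limit` are hypothetical and not part of the skill.

```python
from typing import Dict, Optional


def parse_inline_config(text: str) -> Dict[str, str]:
    """Parse 'Key: value' lines, skipping blanks, # comments, and the $ command line."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("$"):
            continue
        # Split on the first colon only, so values may contain colons or pipes.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            config[key.strip()] = value.strip()
    return config


def iteration_limit(config: Dict[str, str]) -> Optional[int]:
    """Return N for bounded mode, or None when no Iterations field is given."""
    raw = config.get("Iterations")
    if raw is None:
        return None
    n = int(raw)
    if n < 1:
        raise ValueError("Iterations must be a positive integer")
    return n
```

Under these assumptions, a config with no Iterations field maps to unlimited mode, and Iterations: 25 maps to a bounded run of 25 iterations.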
Scientific-method bug hunting. Doesn't stop at one bug: it keeps investigating until the codebase is clean or the run is interrupted.
Load: references/debug-workflow.md for full protocol.
Takes a broken state and iteratively repairs it until everything passes.
Load: references/fix-workflow.md for full protocol.
Runs a comprehensive security audit using the autonomous loop pattern. Generates a full STRIDE threat model, maps attack surfaces, then iteratively tests each vulnerability vector.
Load: references/security-workflow.md for full protocol.
Ship anything through a structured 8-phase workflow.
Load: references/ship-workflow.md for full protocol.
Autonomous scenario exploration engine that generates, expands, and stress-tests use cases from a seed scenario.
Load: references/scenario-workflow.md for full protocol.
Multi-perspective code analysis using swarm intelligence. Simulates 3-5 expert personas that independently analyze, debate, and reach consensus.
Load: references/predict-workflow.md for full protocol.
Scouts codebase structure, learns patterns and architecture, generates/updates comprehensive documentation.
Load: references/learn-workflow.md for full protocol.
Isolated multi-agent adversarial refinement loop for subjective domains.
Load: references/reason-workflow.md for full protocol.
Multi-persona probe that interrogates user and codebase until net-new constraints saturate, then emits ready-to-run research-loop config.
Load: references/probe-workflow.md for full protocol.
Converts a plain-language goal into a validated, ready-to-execute research-loop configuration.
Load: references/plan-workflow.md for full protocol.
- $research-loop → run the loop
- $research-loop plan → run the planning wizard

By default, loops continue until the metric plateaus (no improvement for
15 consecutive measured iterations), then ask the user whether to stop,
continue, or change strategy. To run exactly N iterations instead, add
Iterations: N to your inline config.
Unlimited (default):
$research-loop
Goal: Increase test coverage to 90%
Bounded (N iterations):
$research-loop
Goal: Increase test coverage to 90%
Iterations: 25
In unlimited mode, the loop tracks whether the best metric is still improving. If 15
consecutive measured iterations pass without a new best, the loop pauses and
asks the user to decide: stop, continue, or change strategy. Configure with
Plateau-Patience: N (default 15), or disable with Plateau-Patience: off.
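The plateau rule can be sketched as a small counter that resets on every new best. This is an illustration under stated assumptions, not the skill's implementation: the class name `PlateauTracker` is hypothetical, `patience=None` stands in for Plateau-Patience: off, and a "measured iteration" is taken to mean one that produced a metric value (crashed iterations are not recorded).

```python
from typing import Optional


class PlateauTracker:
    """Counts consecutive measured iterations without a new best metric."""

    def __init__(self, patience: Optional[int] = 15, higher_is_better: bool = True):
        self.patience = patience  # None models "Plateau-Patience: off"
        self.higher_is_better = higher_is_better
        self.best: Optional[float] = None
        self.stale = 0  # measured iterations since the last new best

    def record(self, metric: float) -> bool:
        """Record one measured iteration; return True when the loop should pause and ask."""
        if self.best is None:
            improved = True
        elif self.higher_is_better:
            improved = metric > self.best
        else:
            improved = metric < self.best

        if improved:
            self.best = metric
            self.stale = 0
        else:
            self.stale += 1
        return self.patience is not None and self.stale >= self.patience
```

Note that an equal value does not count as a new best in this sketch, which matches the SAME/WORSE branch of the loop's decision step.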
By default, guards are pass/fail (exit code 0 = pass). For guards that measure a number, you can set a regression threshold instead:
$research-loop
Goal: Increase test coverage to 95%
Verify: npx jest --coverage 2>&1 | grep 'All files' | awk '{print $4}'
Guard: npx esbuild src/index.ts --bundle --minify | wc -c
Guard-Direction: lower is better
Guard-Threshold: 5%
This means: "optimize coverage, but reject any change that grows bundle size more than 5% from baseline."
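The threshold check can be expressed as a percentage regression against the baseline guard value. The following is a hedged sketch: `parse_threshold` and `guard_passes` are hypothetical names, and treating a change of exactly the threshold as a pass is an assumption drawn from the wording "more than 5%".

```python
def parse_threshold(raw: str) -> float:
    """Turn a Guard-Threshold value such as '5%' into the number 5.0."""
    return float(raw.strip().rstrip("%"))


def guard_passes(baseline: float, current: float, threshold_pct: float,
                 lower_is_better: bool = True) -> bool:
    """Return True unless `current` regresses more than `threshold_pct` percent from `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero to compute a percentage change")
    change_pct = (current - baseline) / abs(baseline) * 100.0
    # A regression is growth when lower is better, and shrinkage when higher is better.
    regression_pct = change_pct if lower_is_better else -change_pct
    return regression_pct <= threshold_pct
```

With a baseline bundle of 100,000 bytes and a 5% threshold, a change producing 104,000 bytes would pass the guard, while one producing 106,000 bytes would be rejected.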
If the user provides Goal, Scope, Metric, and Verify inline → extract them and proceed to step 5.
CRITICAL: If ANY critical field is missing, use direct prompting to collect them interactively. Never proceed without completing this setup.
Scan the codebase first for smart defaults, then ask ALL questions in batched direct prompting calls (max 4 per call).
Batch 1 — Core config (4 questions in one call):
| # | Header | Question | Options |
|---|--------|----------|---------|
| 1 | Goal | "What do you want to improve?" | "Test coverage (higher)", "Bundle size (lower)", "Performance (faster)", "Code quality (fewer errors)" |
| 2 | Scope | "Which files can be modified?" | Suggested globs from project structure |
| 3 | Metric | "What number tells you if it got better?" | Detected options from project tooling |
| 4 | Direction | "Higher or lower is better?" | "Higher is better", "Lower is better" |
Batch 2 — Verify + Guard + Launch (3 questions):
| # | Header | Question | Options |
|---|--------|----------|---------|
| 5 | Verify | "What command produces the metric?" | Suggested commands from detected tooling |
| 6 | Guard | "Any command that must ALWAYS pass?" | "npm test", "tsc --noEmit", "npm run build", "Skip" |
| 7 | Launch | "Ready to go?" | "Launch (unlimited)", "Launch with iteration limit", "Edit config", "Cancel" |
See references/results-logging.md for the results log format.
LOOP (FOREVER or N times):
1. Review: Read current state + git history + results log
2. Ideate: Pick next change based on goal, past results, what hasn't been tried
3. Modify: Make ONE focused change to in-scope files
4. Commit: Git commit the change (before verification)
5. Verify: Run the mechanical metric (tests, build, benchmark, etc.)
6. Guard: If guard is set, run the guard command
7. Decide:
   - IMPROVED + guard passed → Keep commit, log "keep", advance
   - IMPROVED + guard FAILED → Revert, try to rework (max 2 attempts)
   - SAME/WORSE → Git revert, log "discard"
   - CRASHED → Try to fix (max 3 attempts), else log "crash" and move on
8. Log: Record result in results log
9. Repeat: Go to step 1
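   - If unbounded: do not stop or ask "should I continue?" between iterations; pause only when the plateau check triggers (see Plateau-Patience) or the user interrupts
   - If bounded (N): Stop after N iterations, print final summary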
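The Decide step of the loop above can be sketched as a pure function. This is an illustrative sketch, not the protocol in references/core-loop.md: the names `Outcome`, `decide`, and `rollback_command` are hypothetical, a metric of `None` is assumed to represent a crashed verify run, and `guard_passed=None` is assumed to mean that no guard is configured. The retry limits come from step 7; applying them is left to the caller.

```python
from enum import Enum
from typing import List, Optional

MAX_REWORK_ATTEMPTS = 2     # from step 7: rework after a failed guard
MAX_CRASH_FIX_ATTEMPTS = 3  # from step 7: fix attempts after a crash


class Outcome(Enum):
    KEEP = "keep"
    REWORK = "rework"
    DISCARD = "discard"
    CRASH = "crash"


def decide(previous_best: float, metric: Optional[float],
           guard_passed: Optional[bool], higher_is_better: bool = True) -> Outcome:
    """Map one iteration's measurements onto the four branches of step 7."""
    if metric is None:  # verify crashed and produced no metric
        return Outcome.CRASH
    improved = metric > previous_best if higher_is_better else metric < previous_best
    if not improved:  # SAME/WORSE
        return Outcome.DISCARD
    if guard_passed is False:  # improved, but the guard command failed
        return Outcome.REWORK
    return Outcome.KEEP  # improved and guard passed (or no guard set)


def rollback_command(commit_sha: str) -> List[str]:
    """Build the rollback as a git revert, which keeps history, rather than a hard reset."""
    return ["git", "revert", "--no-edit", commit_sha]
```

Keeping the decision separate from the git and shell calls makes the branching easy to test without a repository; the list returned by `rollback_command` could then be passed to `subprocess.run`.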
- Commit each experiment with the experiment: prefix.
- Use git revert (not git reset --hard) for rollbacks.

| Domain | Metric | Scope | Verify Command | Guard |
|--------|--------|-------|----------------|-------|
| Backend code | Tests pass + coverage % | src/**/*.ts | npm test | — |
| Frontend UI | Lighthouse score | src/components/** | npx lighthouse | npm test |
| ML training | val_bpb / loss | train.py | uv run train.py | — |
| Blog/content | Word count + readability | content/*.md | Custom script | — |
| Performance | Benchmark time (ms) | Target files | npm run bench | npm test |
| Refactoring | Tests pass + LOC reduced | Target module | npm test && wc -l | npm run typecheck |
| Security | OWASP + STRIDE coverage | API/auth/middleware | $research-loop security | — |
| Shipping | Checklist pass rate (%) | Any artifact | $research-loop ship | Domain-specific |
| Debugging | Bugs found + coverage | Target files | $research-loop debug | — |
| Fixing | Error count (lower) | Target files | $research-loop fix | npm test |
| Documentation | Validation pass rate | docs/*.md | $research-loop learn | npm test |
| Subjective refinement | Judge consensus | Any content | $research-loop reason | — |
Adapt the loop to your domain. The PRINCIPLES are universal; the METRICS are domain-specific.