skills/autoresearch/SKILL.md
Autonomous iteration toward a measurable outcome. Use when the user wants to optimize a numeric metric through repeated modify-verify cycles — reduce bundle size, increase test coverage, improve query time, lower readability score. Not for exploratory research, subjective judgment, or tasks without a verification command.
npx skillsauth add xoai/sage autoresearch
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Autonomous iteration toward a measurable outcome. The agent modifies code, commits, runs a verify command, keeps improvements, reverts regressions — repeating until a target is hit, a budget is exhausted, or the user interrupts.
Core principles (from Karpathy's autoresearch pattern):
Before the loop can start, capture these (skip if already provided):
| Field | Required | Example |
|-------|----------|---------|
| Goal | Yes | "Reduce bundle below 200KB" |
| Metric name | Yes | bundle_kb |
| Direction | Yes | lower or higher |
| Target | Optional | 200 |
| Verify command | Yes | pnpm build && measure.sh |
| Writable scope | Recommended | src/**/*.ts |
| Frozen scope | Recommended | package.json, *.lock |
| Per-run budget | Yes (default 120s) | 120 seconds |
| Max iterations | Optional | 100 |
| Termination | Auto | target if target given, else interrupt |
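The fields above could be held in a small typed record. The following is a minimal sketch, not the runtime's actual data model: the class name `Brief` and its attribute names are hypothetical, and only the defaults (120 seconds per run, termination derived from whether a target is set) come from the table.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Brief:
    """Hypothetical container for the configuration fields in the table above."""
    goal: str
    metric_name: str
    direction: str                      # "lower" or "higher"
    verify_command: str
    target: Optional[float] = None      # optional
    writable_scope: Tuple[str, ...] = ()
    frozen_scope: Tuple[str, ...] = ()
    per_run_seconds: int = 120          # default per-run budget from the table
    max_iterations: Optional[int] = None

    def __post_init__(self) -> None:
        if self.direction not in ("lower", "higher"):
            raise ValueError("direction must be 'lower' or 'higher'")

    @property
    def termination(self) -> str:
        # "target if target given, else interrupt"
        return "target" if self.target is not None else "interrupt"


brief = Brief(
    goal="Reduce bundle below 200KB",
    metric_name="bundle_kb",
    direction="lower",
    verify_command="pnpm build && measure.sh",
    target=200,
    writable_scope=("src/**/*.ts",),
    frozen_scope=("package.json", "*.lock"),
)
print(brief.termination)
```

Deriving `termination` from `target` instead of storing it keeps the two fields from drifting apart when the user revises the brief.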
Present as a brief for user approval:
Sage: Autoresearch session configured.
Goal: [goal statement]
Metric: [name] ([direction]), target: [target or "none — runs until interrupted"]
Verify: [command]
Scope: writable [globs], frozen [globs]
Budget: [seconds]s per run, [max iterations or "unlimited"]
[A] Start — begin autonomous iteration
[R] Revise — change configuration
Each iteration follows 8 phases. Read references/loop-protocol.md
for per-phase detail.
| # | Phase | Actor | What happens |
|---|-------|-------|-------------|
| 1 | REVIEW | agent | Read current state, recent history (last 20 iterations from JSONL) |
| 2 | IDEATE | agent | Propose ONE change, ≤1 sentence. If stuck, load references/stuck-recovery.md |
| 3 | MODIFY | agent | Make the change. Stay within writable scope. |
| 4 | COMMIT | runtime | git add -A && git commit on autoresearch/<slug> branch |
| 5 | VERIFY | runtime | Run verify command with wall-clock budget |
| 6 | DECIDE | runtime | Parse METRIC, compare to best → keep / discard / crash |
| 7 | LOG | runtime+agent | Append JSONL, rebuild TSV, agent updates living doc |
| 8 | REPEAT | runtime | Check termination → loop or exit |
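Phase 5 runs the verify command under a wall-clock budget. One way this might look, as a sketch only: the function name `run_verify` and the status labels `"ok"`, `"crash"`, and `"timeout"` are assumptions made for illustration and are not taken from the runtime at core/autoresearch/.

```python
import subprocess
from typing import Tuple


def run_verify(command: str, budget_seconds: float) -> Tuple[str, str]:
    """Run a verify command and return (status, stdout).

    Status labels are hypothetical: "ok" for exit code 0, "crash" for a
    non-zero exit, "timeout" when the wall-clock budget is exceeded.
    """
    try:
        # shell=True because verify commands may chain steps with "&&".
        proc = subprocess.run(
            command,
            shell=True,
            capture_output=True,
            text=True,
            timeout=budget_seconds,
        )
    except subprocess.TimeoutExpired:
        return "timeout", ""
    status = "ok" if proc.returncode == 0 else "crash"
    return status, proc.stdout
```

A call such as `run_verify("pnpm build && measure.sh", 120)` would return the captured stdout for the DECIDE phase to parse.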
Decision rules (Phase 6):
- crash, reset to HEAD
- crash, reset
- crash, reset
- keep, advance branch
- discard, reset

The Python runtime at core/autoresearch/ handles deterministic phases
(COMMIT, VERIFY, DECIDE, LOG, REPEAT). The agent handles creative
phases (REVIEW, IDEATE, MODIFY).
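The keep / discard / crash comparison in Phase 6 can be sketched as a pure function. This is an illustration under stated assumptions, not the runtime's code: the name `decide` is hypothetical, a missing metric is treated as a crash, the first measured value is kept as the baseline, and a tie counts as no improvement.

```python
from typing import Optional


def decide(metric: Optional[float], best: Optional[float], direction: str) -> str:
    """Return "keep", "discard", or "crash" for one iteration.

    Assumptions (not from the source): metric=None means the run produced
    no usable number; best=None means no baseline exists yet.
    """
    if direction not in ("lower", "higher"):
        raise ValueError("direction must be 'lower' or 'higher'")
    if metric is None:
        return "crash"
    if best is None:
        return "keep"
    improved = metric < best if direction == "lower" else metric > best
    return "keep" if improved else "discard"


print(decide(187.5, 210.0, "lower"))
```

Keeping the rule free of side effects means the git reset or branch advance can be driven from the returned label alone.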
Running the runtime:
python -m core.autoresearch run --brief .sage/work/<slug>/brief.md --project .
Harness contract: The verify command must print METRIC name=number
to stdout. See references/harness-conventions.md.
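A parser for the METRIC line might look like the sketch below. The exact grammar lives in references/harness-conventions.md, which is not reproduced here, so the regular expression (one `METRIC name=number` per line, with optional sign, decimals, and exponent) and the name `parse_metrics` are assumptions.

```python
import re
from typing import Dict

# Assumed line shape: "METRIC <name>=<number>", alone on its line.
METRIC_RE = re.compile(
    r"^METRIC[ \t]+([A-Za-z_][\w.-]*)=([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)[ \t]*$",
    re.MULTILINE,
)


def parse_metrics(stdout: str) -> Dict[str, float]:
    """Collect every METRIC line from verify output; later lines win on duplicates."""
    return {name: float(value) for name, value in METRIC_RE.findall(stdout)}


sample = "building...\nMETRIC bundle_kb=187.5\ndone\n"
print(parse_metrics(sample))
```

An empty result corresponds to verify output that contains no METRIC line at all.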
All state lives in .sage/work/<YYYYMMDD-slug>/:
| File | Role |
|------|------|
| brief.md | Configuration (goal, metric, scope, budget) |
| autoresearch.md | Living doc — ideas tried, wins, dead ends |
| autoresearch.jsonl | Structured log (one line per iteration) |
| results.tsv | Human-readable view (derived from JSONL) |
| runs/NNNN-*.log | Per-iteration stdout+stderr |
| .autoresearch-state.json | Crash recovery state (not committed) |
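Since results.tsv is derived from the JSONL log, it can be rebuilt at any time. The sketch below shows the idea; the record keys (`iteration`, `metric`, `decision`) and the function name `jsonl_to_tsv` are hypothetical, because the actual log schema is not given here.

```python
import csv
import io
import json
from typing import Sequence


def jsonl_to_tsv(jsonl_text: str, columns: Sequence[str]) -> str:
    """Render one TSV row per JSONL record, using only the chosen columns."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines in the log
        record = json.loads(line)
        writer.writerow([record.get(col, "") for col in columns])
    return out.getvalue()


log = (
    '{"iteration": 1, "metric": 210.0, "decision": "keep"}\n'
    '{"iteration": 2, "metric": 215.5, "decision": "discard"}\n'
)
print(jsonl_to_tsv(log, ["iteration", "metric", "decision"]))
```

Treating the JSONL file as the single source of truth means a corrupted or stale TSV never needs repair, only regeneration.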
On resume (new session, context reset, platform switch):
- autoresearch.md for high-level context
- autoresearch.jsonl for recent history
- git log on the branch

See references/session-continuity.md for full protocol.
Session end: Store a structured summary in sage-memory:
Session start: Search sage-memory for priors on this repo + metric. Inject into IDEATE as "known-good starting points" and "known dead ends."
| Gate | When | Check |
|------|------|-------|
| scope | After MODIFY | Changed files ⊆ writable, frozen untouched |
| pre-verify | After COMMIT | git status is clean |
| metric-parseable | After VERIFY | At least one METRIC line in stdout |
| budget | During VERIFY | Wall-clock ≤ per_run_seconds |
Gates are enforced by the runtime, not by prose. The agent cannot bypass them.
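The scope gate (changed files ⊆ writable, frozen untouched) could be checked along these lines. This is a sketch with assumptions: `check_scope` is a hypothetical name, and `fnmatchcase` only approximates glob semantics (its `*` also matches `/`, and `src/**/*.ts` will not match a file directly under `src/`), so the real runtime may match differently.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Sequence, Tuple


def check_scope(
    changed: Iterable[str],
    writable: Sequence[str],
    frozen: Sequence[str],
) -> List[Tuple[str, str]]:
    """Return (path, reason) pairs for every changed file that violates scope."""
    violations = []
    for path in changed:
        if any(fnmatchcase(path, pattern) for pattern in frozen):
            violations.append((path, "frozen"))
        elif writable and not any(fnmatchcase(path, pattern) for pattern in writable):
            violations.append((path, "outside writable scope"))
    return violations


result = check_scope(
    changed=["src/lib/util.ts", "package.json", "README.md"],
    writable=["src/**/*.ts"],
    frozen=["package.json", "*.lock"],
)
print(result)
```

An empty list means the gate passes; frozen patterns are checked first so a file that is both writable and frozen is still rejected.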
- references/loop-protocol.md: per-phase inputs, outputs, failure modes
- references/metric-design.md: what makes a good metric
- references/harness-conventions.md: METRIC line contract
- references/stuck-recovery.md: escape local minima
- references/crash-handling.md: retry vs skip decision tree
- references/session-continuity.md: resume protocol