xoai/autoresearch

Name: autoresearch
Author: xoai

skills/autoresearch/SKILL.md

npx skillsauth add xoai/sage autoresearch


Autoresearch

Autonomous iteration toward a measurable outcome. The agent modifies code, commits, runs a verify command, keeps improvements, reverts regressions — repeating until a target is hit, a budget is exhausted, or the user interrupts.

Core principles (from Karpathy's autoresearch pattern):

One change per iteration
Commit before verify
Metrics must be mechanical (deterministic, fast, parseable)
Keep improvements, revert regressions — no exceptions
The branch is sacred — never touch main
State survives crashes — resume from last known good
Memory spans sessions — what worked/failed carries forward

When to Use

Task has a measurable numeric metric (size, time, count, score, coverage)
A verify command exists that outputs the metric deterministically
"Better" means the number going consistently in one direction
The agent can make changes autonomously within a defined scope

When NOT to Use

Subjective goals ("make the UI prettier")
No verify command available
Metric requires manual evaluation
Task needs human judgment per iteration
Exploratory research without a target

Elicitation Checklist

Before the loop can start, capture these (skip if already provided):

| Field | Required | Example |
|-------|----------|---------|
| Goal | Yes | "Reduce bundle below 200KB" |
| Metric name | Yes | bundle_kb |
| Direction | Yes | lower or higher |
| Target | Optional | 200 |
| Verify command | Yes | pnpm build && measure.sh |
| Writable scope | Recommended | src/**/*.ts |
| Frozen scope | Recommended | package.json, *.lock |
| Per-run budget | Yes (default 120s) | 120 seconds |
| Max iterations | Optional | 100 |
| Termination | Auto | target if target given, else interrupt |
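As a rough illustration, the checklist fields could be held in a small Python structure like the one below. The class name `Brief` and its attribute names are hypothetical and are not taken from the runtime; only the defaults (120 seconds per run, termination derived from the target) come from the checklist.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Brief:
    """Hypothetical container for the checklist fields (not the runtime's real schema)."""
    goal: str
    metric_name: str
    direction: str                        # "lower" or "higher"
    verify_command: str
    target: Optional[float] = None
    writable_scope: Tuple[str, ...] = ()
    frozen_scope: Tuple[str, ...] = ()
    per_run_seconds: int = 120            # default taken from the checklist
    max_iterations: Optional[int] = None

    def __post_init__(self):
        if self.direction not in ("lower", "higher"):
            raise ValueError("direction must be 'lower' or 'higher'")

    @property
    def termination(self) -> str:
        # "Auto" row: stop at the target if one was given, else run until interrupted.
        return "target" if self.target is not None else "interrupt"


brief = Brief(
    goal="Reduce bundle below 200KB",
    metric_name="bundle_kb",
    direction="lower",
    verify_command="pnpm build && measure.sh",
    target=200,
    writable_scope=("src/**/*.ts",),
    frozen_scope=("package.json", "*.lock"),
)
print(brief.termination, brief.per_run_seconds)
```

Deriving `termination` from `target` rather than storing it keeps the two fields from drifting apart.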

Present as a brief for user approval:

Sage: Autoresearch session configured.

  Goal: [goal statement]
  Metric: [name] ([direction]), target: [target or "none — runs until interrupted"]
  Verify: [command]
  Scope: writable [globs], frozen [globs]
  Budget: [seconds]s per run, [max iterations or "unlimited"]

[A] Start — begin autonomous iteration
[R] Revise — change configuration

The 8-Phase Loop

Each iteration follows 8 phases. Read references/loop-protocol.md for per-phase detail.

| # | Phase | Actor | What happens |
|---|-------|-------|-------------|
| 1 | REVIEW | agent | Read current state, recent history (last 20 iterations from JSONL) |
| 2 | IDEATE | agent | Propose ONE change, ≤1 sentence. If stuck, load references/stuck-recovery.md |
| 3 | MODIFY | agent | Make the change. Stay within writable scope. |
| 4 | COMMIT | runtime | git add -A && git commit on autoresearch/<slug> branch |
| 5 | VERIFY | runtime | Run verify command with wall-clock budget |
| 6 | DECIDE | runtime | Parse METRIC, compare to best → keep / discard / crash |
| 7 | LOG | runtime+agent | Append JSONL, rebuild TSV, agent updates living doc |
| 8 | REPEAT | runtime | Check termination → loop or exit |
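The VERIFY phase (run a command under a wall-clock budget) might look roughly like the sketch below. The helper name `run_verify` and the separate "timeout" status are assumptions for illustration; how the actual runtime classifies a budget overrun is not stated here.

```python
import subprocess
import sys


def run_verify(command, budget_seconds):
    """Run the verify command and return (status, stdout). Hypothetical helper."""
    try:
        proc = subprocess.run(
            command,
            shell=isinstance(command, str),  # strings go through the shell, lists do not
            capture_output=True,
            text=True,
            timeout=budget_seconds,          # wall-clock budget for this run
        )
    except subprocess.TimeoutExpired:
        return "timeout", ""
    status = "ok" if proc.returncode == 0 else "crash"
    return status, proc.stdout


# Stand-in verify command that prints a METRIC line.
status, out = run_verify([sys.executable, "-c", "print('METRIC bundle_kb=187.4')"], 30)
print(status, out.strip())
```

With `subprocess.run`, the `timeout` argument kills the child process and raises `TimeoutExpired`, which is why the budget check can live in one `try` block.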

Decision rules (Phase 6):

Exit code ≠ 0 → crash, reset to HEAD
No METRIC line → crash, reset
nan/inf → crash, reset
Metric improved → keep, advance branch
Metric equal or worse → discard, reset
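The decision rules above can be sketched as a single pure function. The name `decide`, its parameters, and the treatment of a first run with no previous best are assumptions for illustration, not the runtime's API.

```python
import math


def decide(exit_code, metric, best, direction):
    """Return 'keep', 'discard', or 'crash' following the rules listed above."""
    if exit_code != 0:
        return "crash"                      # non-zero exit code
    if metric is None:
        return "crash"                      # no METRIC line was found
    if math.isnan(metric) or math.isinf(metric):
        return "crash"                      # nan/inf is never a valid result
    if best is None:
        return "keep"                       # assumption: first valid run sets the baseline
    improved = metric < best if direction == "lower" else metric > best
    return "keep" if improved else "discard"  # equal counts as "not improved"


print(decide(0, 190.0, 200.0, "lower"))
print(decide(0, 200.0, 200.0, "lower"))
print(decide(0, float("nan"), 200.0, "lower"))
```

Keeping this step free of I/O makes it easy to test each rule in isolation.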

Runtime Integration

An optional Python runtime handles the deterministic phases (COMMIT, VERIFY, DECIDE, LOG, REPEAT); the agent handles the creative phases (REVIEW, IDEATE, MODIFY). The runtime was extracted from core into its own package in Phase 3 (like sage-memory) — install it with sage add xoai/sage-autoresearch.

Running the runtime — probe first, and degrade LOUDLY if it is absent (announce + log to decisions.md; never silently skip):

if python3 -c 'import autoresearch' 2>/dev/null; then
  python3 -m autoresearch run --brief .sage/work/<slug>/brief.md --project .
else
  echo "Sage: autoresearch runtime not installed — running in degraded (manual)"
  echo "mode. For the deterministic runtime: sage add xoai/sage-autoresearch"
fi

Harness contract: The verify command must print METRIC name=number to stdout. See references/harness-conventions.md.
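A minimal sketch of parsing such lines is shown below, assuming one `METRIC name=number` pair per line. The regular expression and the function name `parse_metrics` are illustrative guesses; the authoritative grammar is whatever references/harness-conventions.md specifies.

```python
import re

# Assumed shape: the word METRIC, whitespace, a name, "=", a value, end of line.
METRIC_RE = re.compile(r"^METRIC[ \t]+([A-Za-z_][\w.-]*)=(\S+)[ \t]*$", re.MULTILINE)


def parse_metrics(stdout):
    """Return {name: value} for every well-formed METRIC line in stdout."""
    metrics = {}
    for name, raw in METRIC_RE.findall(stdout):
        try:
            metrics[name] = float(raw)
        except ValueError:
            continue  # skip values that are not numbers
    return metrics


sample = "building...\nMETRIC bundle_kb=187.4\ndone\n"
print(parse_metrics(sample))
```

An empty result corresponds to the "No METRIC line" rule. Note that `float()` also accepts "nan" and "inf", so those values still need a separate check.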

Session State

All state lives in .sage/work/<YYYYMMDD-slug>/:

| File | Role |
|------|------|
| brief.md | Configuration (goal, metric, scope, budget) |
| autoresearch.md | Living doc: ideas tried, wins, dead ends |
| autoresearch.jsonl | Structured log (one line per iteration) |
| results.tsv | Human-readable view (derived from JSONL) |
| runs/NNNN-*.log | Per-iteration stdout+stderr |
| .autoresearch-state.json | Crash recovery state (not committed) |
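The relationship between the append-only JSONL log and the derived TSV view could be sketched as follows. The record fields (`iteration`, `status`, `metric`, `commit`) and both function names are hypothetical; the real log schema is not shown in this document.

```python
import csv
import json
import tempfile
from pathlib import Path


def append_iteration(session_dir, record):
    """Append one iteration record as a single JSON line."""
    with open(Path(session_dir) / "autoresearch.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def rebuild_tsv(session_dir, columns=("iteration", "status", "metric", "commit")):
    """Regenerate results.tsv from the JSONL log and return the row count."""
    session = Path(session_dir)
    text = (session / "autoresearch.jsonl").read_text(encoding="utf-8")
    rows = [json.loads(line) for line in text.splitlines() if line.strip()]
    with open(session / "results.tsv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(columns)
        for row in rows:
            writer.writerow([row.get(col, "") for col in columns])
    return len(rows)


with tempfile.TemporaryDirectory() as demo_dir:
    append_iteration(demo_dir, {"iteration": 1, "status": "keep", "metric": 198.2, "commit": "abc1234"})
    append_iteration(demo_dir, {"iteration": 2, "status": "discard", "metric": 201.0, "commit": "def5678"})
    row_count = rebuild_tsv(demo_dir)
    tsv_lines = (Path(demo_dir) / "results.tsv").read_text(encoding="utf-8").splitlines()
print(row_count, tsv_lines[0])
```

Because the TSV is always rebuilt from the JSONL, the JSONL stays the single source of truth and the TSV can be deleted at any time.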

Session Resume

On resume (new session, context reset, platform switch):

Read autoresearch.md for high-level context
Read last 20 lines of autoresearch.jsonl for recent history
Verify last JSONL commit matches git log on the branch
Continue from next iteration number

See references/session-continuity.md for full protocol.
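The resume steps above can be sketched as below. This assumes each JSONL record carries an `iteration` number and a `head` field holding the branch commit after that iteration's decision; both field names and both helpers are hypothetical, and the full protocol lives in references/session-continuity.md.

```python
import json
import os
import tempfile
from collections import deque


def load_recent(jsonl_path, limit=20):
    """Return the last `limit` records of the JSONL log, oldest first."""
    with open(jsonl_path, encoding="utf-8") as f:
        tail = deque((line for line in f if line.strip()), maxlen=limit)
    return [json.loads(line) for line in tail]


def next_iteration(recent, branch_head):
    """Check the log against the branch head, then return the next iteration number."""
    if not recent:
        return 1
    last = recent[-1]
    if last["head"] != branch_head:
        raise RuntimeError("JSONL log and branch disagree; inspect before resuming")
    return last["iteration"] + 1


# Demo with 25 fake records; in practice branch_head would come from git.
with tempfile.TemporaryDirectory() as demo_dir:
    log_path = os.path.join(demo_dir, "autoresearch.jsonl")
    with open(log_path, "w", encoding="utf-8") as f:
        for i in range(1, 26):
            f.write(json.dumps({"iteration": i, "head": f"c{i:04d}"}) + "\n")
    recent = load_recent(log_path)
print(len(recent), recent[0]["iteration"], next_iteration(recent, "c0025"))
```

Passing the branch head in as an argument keeps the check testable without a git repository.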

Memory Integration

Session end: Store a structured summary in sage-memory:

Winning patterns (what worked)
Losing patterns (what didn't)
Best achieved value
Iteration count

Session start: Search sage-memory for priors on this repo + metric. Inject into IDEATE as "known-good starting points" and "known dead ends."

Quality Gates

| Gate | When | Check |
|------|------|-------|
| scope | After MODIFY | Changed files ⊆ writable, frozen untouched |
| pre-verify | After COMMIT | git status is clean |
| metric-parseable | After VERIFY | At least one METRIC line in stdout |
| budget | During VERIFY | Wall-clock ≤ per_run_seconds |

Gates are enforced by the runtime, not by prose. The agent cannot bypass them.
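As one example, the scope gate could be approximated with the sketch below. The function name `scope_violations` is hypothetical, and Python's `fnmatch` is used as a stand-in for whatever glob matcher the runtime really uses; its `*` also matches `/`, so it is looser than git-style globs, and a pattern like src/**/*.ts will not match a file directly under src/.

```python
from fnmatch import fnmatchcase


def scope_violations(changed, writable, frozen):
    """Return changed paths that are frozen or fall outside the writable globs."""
    def matches(path, patterns):
        return any(fnmatchcase(path, pattern) for pattern in patterns)

    return [
        path
        for path in changed
        if matches(path, frozen) or not matches(path, writable)
    ]


changed = ["src/app/main.ts", "package.json", "docs/notes.md"]
violations = scope_violations(changed, ["src/**/*.ts"], ["package.json", "*.lock"])
print(violations)
```

An empty list means the gate passes; anything else names the files that would cause the iteration to be rejected.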

References

references/loop-protocol.md — per-phase inputs, outputs, failure modes
references/metric-design.md — what makes a good metric
references/harness-conventions.md — METRIC line contract
references/stuck-recovery.md — escape local minima
references/crash-handling.md — retry vs skip decision tree
references/session-continuity.md — resume protocol

Autonomous iteration toward a measurable outcome. Use when the user wants to optimize a numeric metric through repeated modify-verify cycles — reduce bundle size, increase test coverage, improve query time, lower readability score. Not for exploratory research, subjective judgment, or tasks without a verification command.

22 stars

testing

Updated Jul 12, 2026

npx skillsauth add xoai/sage autoresearch

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy (container and dependency vulnerability scanner): Clean, 95%
Semgrep (static code analysis for vulnerabilities): Clean, 95%
mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
Snyk (dep) (open source security scanning): Skipped, 50%
Socket.dev (supply chain security analysis): Skipped, 50%
VirusTotal (multi-engine malware detection): Skipped, 50%
CrowdStrike (advanced threat intelligence): Skipped, 50%
OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: Jul 12, 2026, 2:21 AM · 129.9s · 13 files scanned

SKILL.md

name: autoresearch
description: >
version: 1.0.0
type: process
activates-when: >
  improve a measurable quantity. Also: "iterate until", "keep trying",


Related Skills

xoai/fix (testing)
Verified · Trusted · Community
Root cause diagnosis with evidence, reproducing test, minimal patch
23 · SKILL.md · Updated Jul 12, 2026

xoai/continue (tools)
Verified · Trusted · Community
Session resumption with context
23 · SKILL.md · Updated Jul 12, 2026

xoai/configure (tools)
Verified · Trusted · Community
Configure Sage preset and project settings. Switch between base, startup, enterprise, or opensource constitution presets. Use when the user says "configure sage", "change preset", or "sage settings".
23 · SKILL.md · Updated Jul 12, 2026

xoai/build (development)
Verified · Trusted · Community
Brief (medium+ tasks), spec, implementation plan
23 · SKILL.md · Updated Jul 12, 2026

Install manually

# Clone the repo
git clone https://github.com/xoai/sage.git

# Copy into Claude Code skills folder (global)
cp -r sage/skills/autoresearch ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

xoai/sage

22 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT