Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

paulrberg/autoresearch

Name: autoresearch
Author: paulrberg

skills/autoresearch/SKILL.md

npx skillsauth add paulrberg/dot-agents autoresearch

Autoresearch

Run isolated experiments, measure them consistently, keep verified improvements, and stop on explicit resource or convergence limits.

Session Contract

Resolve the objective, primary metric and direction, benchmark and correctness commands, allowed/off-limits paths, run/runtime/command/cost/regression limits, convergence window, and reporting cadence before the baseline. Infer safe facts from the request and repository; ask only when a missing choice changes the experiment.

Defaults: 20 runs, two hours wall time, 10 minutes per benchmark, five minutes per correctness check, no new paid API spend, and convergence after five consecutive valid runs without a new retained best. Explicit --max-runs and --max-runtime values are hard limits.
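The defaults and the "hard limit" rule above can be pictured as a small data structure plus a stop check. This is only a sketch under stated assumptions: `SessionLimits`, `stop_reason`, and every field name are hypothetical and are not taken from the skill's session module.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SessionLimits:
    """Hypothetical container for the defaults stated in the session contract."""
    max_runs: int = 20
    max_runtime_seconds: int = 2 * 60 * 60     # two hours wall time
    benchmark_timeout_seconds: int = 10 * 60   # per benchmark
    checks_timeout_seconds: int = 5 * 60       # per correctness check
    max_new_paid_cost: float = 0.0             # no new paid API spend
    convergence_runs: int = 5                  # valid runs without a new retained best


def stop_reason(
    limits: SessionLimits,
    runs_done: int,
    elapsed_seconds: float,
    cost_spent: float,
    valid_runs_since_best: int,
) -> Optional[str]:
    """Return why the session must stop, or None to continue.

    Hard limits are checked before convergence so that a resource
    limit is never reported as convergence.
    """
    if runs_done >= limits.max_runs:
        return "max-runs"
    if elapsed_seconds >= limits.max_runtime_seconds:
        return "max-runtime"
    if cost_spent > limits.max_new_paid_cost:
        return "max-cost"
    if valid_runs_since_best >= limits.convergence_runs:
        return "converged"
    return None
```

Checking the resource limits first is a deliberate choice in this sketch: when a run budget and the convergence window are reached on the same run, the reported reason is the budget.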

Isolation

Prefer a dedicated branch in a separate Git worktree. Record its starting commit, path, initial status, allowed paths, and session files in autoresearch.md. If isolation is unavailable, require a clean worktree or explicit authorization to share it.

Never run repository-wide clean, checkout, stash, or reset commands. Revert only paths changed by the current experiment from its recorded pre-run state, and remove only newly created in-scope paths. Preserve unrelated files and all session evidence.
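One way to honor the scoped-revert rule without any repository-wide Git command is to snapshot the allowed paths before a run and restore only what differs afterwards. The following is a minimal sketch, assuming the allowed paths are small enough to hold in memory; `snapshot` and `scoped_revert` are hypothetical helper names, not part of the skill.

```python
from pathlib import Path
from typing import Dict, Iterable


def snapshot(root: Path, allowed: Iterable[str]) -> Dict[str, bytes]:
    """Record the bytes of every file under the allowed paths (relative to root)."""
    state: Dict[str, bytes] = {}
    for rel in allowed:
        base = root / rel
        if not base.exists():
            continue
        files = [base] if base.is_file() else sorted(p for p in base.rglob("*") if p.is_file())
        for path in files:
            state[path.relative_to(root).as_posix()] = path.read_bytes()
    return state


def scoped_revert(root: Path, allowed: Iterable[str], before: Dict[str, bytes]) -> None:
    """Undo one experiment: touch only in-scope files that changed since `before`."""
    after = snapshot(root, allowed)
    for rel, data in before.items():
        if after.get(rel) != data:
            target = root / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(data)  # restore modified or deleted files
    for rel in after.keys() - before.keys():
        (root / rel).unlink()  # remove only files created during this experiment
```

Files outside `allowed` are never read or written, so unrelated files and session evidence such as autoresearch.jsonl survive as long as they are kept out of the allowed list.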

Session Module

Create autoresearch.md, deterministic autoresearch.sh, optional autoresearch.checks.sh, append-only autoresearch.jsonl, and optional autoresearch.ideas.md. Resolve the module from this SKILL.md and initialize the JSONL before the baseline:

uv run "<skill-dir>/scripts/autoresearch-session.py" init \
  --file autoresearch.jsonl --metric <name> --direction <higher|lower> \
  --max-runs <n> --max-runtime-seconds <seconds> \
  --max-cost <amount> --convergence-runs <n>

The first config record declares direction. Record each completed attempt only after the agent assigns its status:

uv run "<skill-dir>/scripts/autoresearch-session.py" record \
  --file autoresearch.jsonl --metric <number> \
  --status <keep|discard|crash|checks_failed> \
  [--commit <id>] [--description <text>] \
  [--elapsed-seconds <n>] [--estimated-cost <amount>]

Zero and negative metrics are valid values. The agent owns keep versus discard; the module validates records and uses the declared direction. When the primary metric changes, pass --metric-name <new> and --direction <higher|lower> on the first new record. The module appends a new segment config.
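To illustrate why zero and negative metrics need care, here is a small sketch of an append-only JSONL log and a direction-aware "retained best" lookup. The record fields (`type`, `metric`, `status`, `description`) are assumptions made for illustration; the real schema written by autoresearch-session.py is not shown in this document.

```python
import json
from pathlib import Path
from typing import List, Optional

VALID_STATUSES = {"keep", "discard", "crash", "checks_failed"}


def append_record(path: Path, metric: float, status: str, description: str = "") -> None:
    """Append one settled attempt; earlier lines are never rewritten."""
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status}")
    record = {"type": "run", "metric": metric, "status": status, "description": description}
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def retained_best(path: Path, direction: str) -> Optional[float]:
    """Best metric among kept runs, honoring the declared direction."""
    kept: List[float] = []
    for line in path.read_text(encoding="utf-8").splitlines():
        rec = json.loads(line)
        if rec.get("type") == "run" and rec["status"] == "keep":
            kept.append(rec["metric"])
    if not kept:
        return None  # "no kept run yet" is distinct from a metric of 0
    return max(kept) if direction == "higher" else min(kept)
```

The sketch tests for an empty list rather than the truthiness of a metric, so a kept value of `0.0` or `-1.5` is still reported as a valid best.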

Use status --format json for best/delta/MAD/confidence, counts, convergence, budgets, and exact progress rendering:

uv run "<skill-dir>/scripts/autoresearch-session.py" status --file autoresearch.jsonl

scripts/confidence.sh [jsonl] and scripts/summary.sh [jsonl] remain compatibility adapters. Malformed records or violated invariants fail; noisy, equivalent, or agent-discarded results are reported facts, not helper failures.
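The status output mentions MAD and confidence without defining them here. Assuming MAD means median absolute deviation, the sketch below shows how an improvement can be compared against measurement noise. The `noise_ratio` function is a hypothetical illustration and may differ from whatever formula the module uses.

```python
from statistics import median
from typing import List, Optional


def mad(values: List[float]) -> float:
    """Median absolute deviation: a robust estimate of run-to-run spread."""
    center = median(values)
    return median(abs(v - center) for v in values)


def noise_ratio(baseline: float, best: float, samples: List[float]) -> Optional[float]:
    """How many MADs separate best from baseline; None when the spread is zero."""
    spread = mad(samples)
    if spread == 0:
        return None  # no measurable spread, so the ratio is undefined
    return abs(best - baseline) / spread


# Repeated benchmark measurements with one outlier (made-up numbers).
samples = [10.0, 12.0, 11.0, 13.0, 30.0]
print(mad(samples))                     # 1.0
print(noise_ratio(12.0, 9.0, samples))  # 3.0
```

The median-based spread barely moves when a single run is an outlier, which is why it suits noisy benchmark timings better than a standard deviation would.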

Experiment Loop

1. Inspect all in-scope source plus relevant tests or profiles. Create isolation and session files, then record an unchanged baseline.
2. Before each run, snapshot allowed paths. Choose one focused hypothesis, implement it, and run the benchmark within its timeout.
3. Parse the declared metric. Missing metrics, crashes, timeouts, and failed correctness checks cannot be improvements.
4. Run correctness checks for every candidate the agent might retain.
5. Use the session status plus repeated measurements to judge noise or equivalence. Keep only a verified improvement within all hard constraints; prefer simpler code when results are equivalent. Otherwise perform the scoped revert.
6. Append the agent-assigned record and update autoresearch.md when evidence changes the retained best or rules out an approach.
7. Incorporate user steering between completed runs. Stop at the first hard limit, user interruption, satisfied target, or helper-reported convergence.
8. Read references/loop-rules.md only for ambiguous keep/discard judgment, noise handling, backlog maintenance, or thrash recovery.
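The gating rules in the loop above (missing metrics, crashes, timeouts, and failed checks cannot be improvements) can be sketched as a single status function. This is a simplified illustration: `RunResult` and `settle` are hypothetical names, mapping timeouts and missing metrics to the `crash` status is an assumption, and the noise and simplicity judgment that the agent owns is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunResult:
    metric: Optional[float]      # None when the benchmark printed no metric
    crashed: bool = False
    timed_out: bool = False
    checks_passed: bool = True


def is_improvement(candidate: float, best: float, direction: str) -> bool:
    """Strict comparison in the declared direction."""
    return candidate > best if direction == "higher" else candidate < best


def settle(result: RunResult, best: float, direction: str) -> str:
    """Pick a status for one completed attempt (noise handling omitted)."""
    if result.crashed or result.timed_out or result.metric is None:
        return "crash"            # assumption: all three share one status
    if not result.checks_passed:
        return "checks_failed"    # a faster but incorrect candidate is never kept
    if is_improvement(result.metric, best, direction):
        return "keep"
    return "discard"              # caller then performs the scoped revert
```

Because the metric is compared with `is None` rather than by truthiness, a measured value of `0.0` is treated as a real result and not as a missing metric.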

Progress and Completion

Send sparse updates at the baseline, every five settled runs or material best change, and the final stop. Render the module's exact bar, counts, metrics, budgets, and convergence facts; never infer progress from time or activity. Include the next agent-chosen hypothesis without recording it as settled work.

Finish with ### 🏁 Autoresearch complete — <stop reason>, baseline/best/delta/confidence, status counts, kept-file tree, exact checks, worktree/branch, and remaining cleanup or integration. Keep METRIC lines, JSONL, commands, and diagnostics undecorated. A resource limit is not convergence.

Use for autoresearch or "optimize X overnight/in a loop"; sets up bounded iterative trials for a measurable optimization target.

5 stars

research

Updated Jul 23, 2026

npx skillsauth add paulrberg/dot-agents autoresearch

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy (container and dependency vulnerability scanner): Clean, 95%
Semgrep (static code analysis for vulnerabilities): Clean, 95%
mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
Snyk (dep) (open source security scanning): Skipped, 50%
Socket.dev (supply chain security analysis): Skipped, 50%
VirusTotal (multi-engine malware detection): Skipped, 50%
CrowdStrike (advanced threat intelligence): Skipped, 50%
OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: Jul 23, 2026, 6:12 AM (159.5s, 7 files scanned)

SKILL.md

argument-hint: <goal> [--max-runs N] [--max-runtime DURATION]
disable-model-invocation: false
name: autoresearch
user-invocable: true

Install manually

# Clone the repo
git clone https://github.com/paulrberg/dot-agents.git

# Copy into Claude Code skills folder (global); create the folder first if it does not exist
mkdir -p ~/.claude/skills
cp -r dot-agents/skills/autoresearch ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

paulrberg/dot-agents

5 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT