
paulrberg/autoresearch

Name: autoresearch
Author: paulrberg

skills/autoresearch/SKILL.md

npx skillsauth add paulrberg/agent-skills autoresearch


Autoresearch

Run isolated experiments, measure them consistently, keep verified improvements, and stop on explicit resource or convergence limits.

Session Contract

Resolve these before the baseline. Infer them from the request and repository when safe; ask only for a missing choice that changes the experiment:

- objective and primary metric, including direction;
- benchmark command and correctness checks;
- files allowed to change and paths that are off limits;
- maximum runs, wall-clock runtime, per-command timeout, paid-service cost, and acceptable regression budgets;
- convergence rule and any user-requested reporting cadence.

Defaults when the user gives none: 20 runs, two hours wall time, 10 minutes per benchmark, five minutes per correctness check, no new paid API spend, and convergence after five consecutive valid runs without a new best result. An explicit --max-runs N is exact unless another hard limit is reached first; --max-runtime DURATION (for example 90m or 2h) overrides the wall-clock limit the same way.
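
The defaults and the two flags above could be modelled roughly as follows. This is a minimal sketch, not the skill's own implementation: the `Limits` class, its field names, and `apply_flags` are hypothetical, and the duration parser assumes only the `s`, `m`, and `h` suffixes because the source shows just `90m` and `2h`.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Limits:
    """Hypothetical container for session limits; values mirror the stated defaults."""
    max_runs: int = 20                    # 20 runs
    max_runtime_s: int = 2 * 3600         # two hours wall time
    benchmark_timeout_s: int = 10 * 60    # 10 minutes per benchmark
    checks_timeout_s: int = 5 * 60        # five minutes per correctness check
    max_paid_cost: float = 0.0            # no new paid API spend
    convergence_window: int = 5           # five valid runs without a new best


# Assumed suffixes; the source only shows "90m" and "2h".
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text: str) -> int:
    """Convert a duration such as '90m' or '2h' to seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_SECONDS[unit]


def apply_flags(limits: Limits, max_runs: int | None = None,
                max_runtime: str | None = None) -> Limits:
    """Override the run and wall-clock limits when the user passed explicit flags."""
    if max_runs is not None:
        limits = replace(limits, max_runs=max_runs)
    if max_runtime is not None:
        limits = replace(limits, max_runtime_s=parse_duration(max_runtime))
    return limits
```

For example, `apply_flags(Limits(), max_runs=7, max_runtime="90m")` leaves every other default untouched while setting the run cap to 7 and the wall-clock cap to 5400 seconds.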

Isolation

Prefer a dedicated branch in a separate Git worktree so experiments cannot overwrite unrelated files. Record the starting commit, worktree path, initial status, allowed paths, and session-file paths in autoresearch.md. If isolation is unavailable, require a clean worktree or explicit authorization to share it.

Never use broad cleanup commands such as git clean -fd, git checkout -- ., or a hard reset. Revert only the paths changed by the current experiment, using the recorded pre-experiment state; remove only newly created in-scope files identified by that snapshot. Preserve unrelated tracked and untracked files.
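
One way to honor the "revert only what this experiment touched" rule is to hash the allowed paths before each run and derive a targeted revert plan afterwards. The sketch below is illustrative only: `snapshot` and `revert_plan` are hypothetical helper names, and it only computes which paths to restore or remove, leaving the restore mechanism itself (for example a per-path Git checkout) to the caller.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def snapshot(root: Path, allowed: list[str]) -> dict[str, str]:
    """Map every file under the allowed paths to a SHA-256 content hash."""
    state: dict[str, str] = {}
    for rel in allowed:
        base = root / rel
        if not base.exists():
            continue
        files = [base] if base.is_file() else sorted(p for p in base.rglob("*") if p.is_file())
        for path in files:
            key = path.relative_to(root).as_posix()
            state[key] = hashlib.sha256(path.read_bytes()).hexdigest()
    return state


def revert_plan(before: dict[str, str], after: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (paths to restore, paths to remove) for the current experiment only."""
    # Files that existed before and were modified or deleted must be restored.
    restore = sorted(p for p in before if after.get(p) != before[p])
    # Files that did not exist before were created by the experiment.
    remove = sorted(p for p in after if p not in before)
    return restore, remove
```

Because both snapshots cover only the allowed paths, unrelated tracked and untracked files never appear in either list.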

Session Files

Create these inside the experiment worktree:

- autoresearch.md: objective, metrics, limits, commands, scope, off-limits paths, baseline, best result, and concise tried/learned notes.
- autoresearch.sh: deterministic benchmark that emits METRIC name=value lines.
- autoresearch.checks.sh: correctness checks, only when correctness constraints require it.
- autoresearch.jsonl: append-only run evidence. Each run record needs a numeric metric (primary metric value; 0 for crashes), a status of keep, discard, crash, or checks_failed, and an integer segment that increments when the primary metric changes; lines without status are treated as config and skipped.
- autoresearch.ideas.md: optional backlog for deferred hypotheses.
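
The two machine-readable formats above can be read with a few lines of code. The following is a sketch under stated assumptions: it treats `METRIC name=value` as one metric per line with a single space after `METRIC`, and the function names plus any record fields other than `metric`, `status`, and `segment` (such as `run` or `type`) are hypothetical.

```python
from __future__ import annotations

import json

VALID_STATUSES = {"keep", "discard", "crash", "checks_failed"}


def parse_metrics(output: str) -> dict[str, float]:
    """Collect 'METRIC name=value' lines from benchmark output, ignoring everything else."""
    metrics: dict[str, float] = {}
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("METRIC "):
            continue
        name, sep, value = line[len("METRIC "):].partition("=")
        if not sep:
            continue  # malformed line without '='
        try:
            metrics[name.strip()] = float(value)
        except ValueError:
            continue  # non-numeric value
    return metrics


def _is_number(value: object) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def load_runs(jsonl_text: str) -> list[dict]:
    """Return validated run records; lines without 'status' are config and skipped."""
    runs: list[dict] = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        if "status" not in record:
            continue
        if record["status"] not in VALID_STATUSES:
            raise ValueError(f"unknown status: {record['status']!r}")
        if not _is_number(record.get("metric")):
            raise ValueError("run record needs a numeric metric")
        segment = record.get("segment")
        if not isinstance(segment, int) or isinstance(segment, bool):
            raise ValueError("run record needs an integer segment")
        runs.append(record)
    return runs
```

Validating on read keeps a malformed record from silently skewing the best-result comparison later in the session.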

Use set -euo pipefail in shell helpers. For noisy fast benchmarks, report a median from repeated samples. Keep correctness-check time outside the primary metric. The bundled helpers scripts/confidence.sh [jsonl-path] (MAD-based confidence for the current segment) and scripts/summary.sh [jsonl-path] (session dashboard) read these records.
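
To make the median and MAD ideas concrete, here is one common way to express a candidate's gain relative to benchmark noise. It is an assumption-laden sketch, not the formula used by the bundled scripts/confidence.sh, which the source does not show; `mad` and `improvement_in_mads` are hypothetical names.

```python
from statistics import median


def mad(values):
    """Median absolute deviation: the median distance of samples from their median."""
    center = median(values)
    return median(abs(v - center) for v in values)


def improvement_in_mads(baseline_samples, candidate_samples, lower_is_better=True):
    """Express the candidate's median gain as a multiple of the baseline's MAD."""
    base = median(baseline_samples)
    cand = median(candidate_samples)
    gain = base - cand if lower_is_better else cand - base
    noise = mad(baseline_samples)
    if noise == 0:
        # A noiseless baseline makes any real gain unbounded relative to noise.
        return float("inf") if gain > 0 else 0.0
    return gain / noise
```

With baseline samples `[100, 102, 98, 101, 99]` the median is 100 and the MAD is 1, so a candidate whose median is 95 scores 5.0 MADs of improvement. A gain that is small relative to the MAD is the kind of marginal win worth re-running before accepting.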

Workflow

1. Inspect every in-scope source and the relevant tests or profiling data. Create the isolated worktree/branch and session files, then record a no-change baseline.
2. For each run, snapshot the allowed paths, choose one focused hypothesis, implement it, and execute the benchmark within the per-command timeout.
3. Parse the declared primary metric. A missing metric, crash, timeout, or failed correctness check is not an improvement.
4. Run correctness checks for every benchmark candidate that would otherwise be kept.
5. Compare against the best valid result:
   - Keep a result only when the primary metric improves and every hard constraint passes. Re-run marginal/noisy wins before accepting them.
   - Prefer simpler code when results are equivalent; otherwise revert the current experiment's paths only.
6. Append one JSONL record with run number, commit or snapshot ID, metric, status, segment, elapsed time, estimated paid cost, description, and confidence. Update autoresearch.md when a result changes the best value or rules out an approach.
7. Between completed run cycles, incorporate user steering immediately. Do not wait for the entire session when the user changes scope, limits, or priorities.
8. Stop at the first hard limit, user interruption, satisfied target, or convergence condition. Read references/loop-rules.md only for ambiguous keep/discard calls, noise handling, backlog maintenance, or thrash recovery.
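
The keep/discard and stop rules in the workflow above can be sketched as two small pure functions. Several choices here are assumptions rather than the skill's documented behavior: a missing metric or timeout is recorded as `crash`, equal metrics are discarded (the "prefer simpler code" tie-break is left out), and crashes or failed checks neither count toward nor reset the convergence streak. All function names are hypothetical.

```python
from __future__ import annotations


def classify_run(metric: float | None, best: float | None, checks_passed: bool,
                 crashed: bool = False, lower_is_better: bool = True) -> str:
    """Map one run to keep, discard, crash, or checks_failed."""
    if crashed or metric is None:
        return "crash"  # assumption: missing metric and timeout are logged as crash
    if best is None:
        improved = True  # nothing valid to beat yet
    else:
        improved = metric < best if lower_is_better else metric > best
    if not improved:
        return "discard"
    # Correctness checks only matter for candidates that would otherwise be kept.
    return "keep" if checks_passed else "checks_failed"


def runs_since_best(statuses: list[str]) -> int:
    """Count valid non-improving runs since the most recent kept result."""
    count = 0
    for status in reversed(statuses):
        if status == "keep":
            break
        if status == "discard":
            count += 1
    return count


def stop_reason(statuses: list[str], elapsed_s: float, max_runs: int = 20,
                max_runtime_s: float = 7200, window: int = 5) -> str | None:
    """Return why the loop should stop, or None to continue."""
    # Hard limits are checked first so a resource stop is never reported as convergence.
    if len(statuses) >= max_runs:
        return "max_runs"
    if elapsed_s >= max_runtime_s:
        return "max_runtime"
    if runs_since_best(statuses) >= window:
        return "converged"
    return None
```

Keeping these functions free of I/O makes the loop's decisions easy to replay against the recorded statuses in autoresearch.jsonl.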

Progress and Completion

For long runs, send sparse updates at the baseline, after every five completed runs or any major best-result change, and at the final stop. Ground every claim in the current session's logs: current/best metric, runs completed, elapsed time, cost used, and next hypothesis.

Finish with the baseline, best verified result and delta, kept changes, limits reached, checks run, discarded approaches worth remembering, worktree/branch location, and any cleanup or integration action the user still owns. Do not claim convergence when the session merely hit a resource limit.


Use for autoresearch or "optimize X overnight/in a loop"; sets up bounded iterative trials for a measurable optimization target.

68 stars

research

Updated Jul 18, 2026


npx skillsauth add paulrberg/agent-skills autoresearch

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | % |
| --- | --- | --- | --- |
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Jul 18, 2026, 5:51 AM · 94.3s · 5 files scanned

SKILL.md

argument-hint: <goal> [--max-runs N] [--max-runtime DURATION]
disable-model-invocation: false
name: autoresearch
user-invocable: true




Install manually


# Clone the repo
git clone https://github.com/paulrberg/agent-skills.git

# Copy into Claude Code skills folder (global), creating it first if needed
mkdir -p ~/.claude/skills
cp -r agent-skills/skills/autoresearch ~/.claude/skills/


Repository: paulrberg/agent-skills

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT