skills/pi-autoresearch-loop/SKILL.md
---
name: pi-autoresearch-loop
description: Autonomous experiment loop for pi that continuously tries optimizations, measures results, and keeps what works
triggers:
- autoresearch
- autonomous experiment loop
- optimize automatically
- run experiment loop
- continuous optimization
- benchmark and improve
- start autoresearch session
- keep what works discard what doesnt
---
# pi-autoresearch — Autonomous Experiment Loop
> Skill by [ara.so](https://ara.so) — Daily 2026 Skills collection
Autonomous experiment loop extension for [pi](https://github.com/antiwork/pi). Continuously proposes changes, benchmarks them, commits wins, reverts losses, and repeats — forever. Works for any measurable target: test speed, bundle size, build time, LLM training loss, Lighthouse scores.
---
## Installation
```bash
pi install https://github.com/davebcn87/pi-autoresearch

# Manual install (alternative):
cp -r extensions/pi-autoresearch ~/.pi/agent/extensions/
cp -r skills/autoresearch-create ~/.pi/agent/skills/
```

Then run `/reload` in pi.

To start a session, run `/skill:autoresearch-create`.
The agent will write the session files `autoresearch.md` and `autoresearch.sh`.

Every session is fully recoverable from two files:
| File | Purpose |
|------|---------|
| autoresearch.jsonl | Append-only log — one JSON line per run (metric, status, commit, description) |
| autoresearch.md | Living document — objective, what's been tried, dead ends, key wins |
A fresh agent with zero memory can read these two files and continue exactly where the previous session left off.

The session works with these files:

| File | Purpose |
|------|---------|
| autoresearch.md | Session document — objective, metrics, files in scope, experiment history |
| autoresearch.sh | Benchmark script — pre-checks, runs the workload, outputs METRIC name=number lines |
| autoresearch.checks.sh | (optional) Backpressure checks — tests, types, lint. Failures block keep |
### init_experiment
One-time session configuration. Call once at session start.
await init_experiment({
name: "vitest-speed",
metric: "seconds",
unit: "s",
direction: "lower", // "lower" | "higher"
});
### run_experiment
Runs any shell command, times wall-clock duration, captures stdout/stderr.
const result = await run_experiment({
command: "pnpm test --run",
timeout_seconds: 120, // optional, default 300
checks_timeout_seconds: 300, // optional, for checks script
});
// result: { exit_code, duration_seconds, stdout, stderr }
### log_experiment
Records the result, auto-commits on keep, updates the status widget and dashboard.
await log_experiment({
metric_value: 42.3,
status: "keep", // "keep" | "discard" | "crash" | "checks_failed"
description: "Enable parallel test workers in vitest config",
commit_message: "perf: parallel vitest workers → 42.3s (-18%)",
});
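
The `commit_message` above embeds the new metric value and its percent change. As a minimal sketch, assuming the change is measured against the 51.7s baseline from the sample session document, such a message could be built like this. `formatDelta` and `buildCommitMessage` are hypothetical helpers for illustration, not part of pi-autoresearch.

```javascript
// Hypothetical helpers, not part of the pi-autoresearch API.

// Percent change relative to the baseline, rounded to a whole percent.
function formatDelta(baseline, value) {
  const pct = Math.round(((value - baseline) / baseline) * 100);
  const sign = pct > 0 ? "+" : ""; // negative numbers already carry "-"
  return `${sign}${pct}%`;
}

// Builds a message in the same shape as the commit_message example.
function buildCommitMessage(summary, value, unit, baseline) {
  return `perf: ${summary} → ${value}${unit} (${formatDelta(baseline, value)})`;
}

const message = buildCommitMessage("parallel vitest workers", 42.3, "s", 51.7);
// message === "perf: parallel vitest workers → 42.3s (-18%)"
```

Keeping the delta in the commit subject makes `git log --oneline` readable as a history of wins.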
Once started, the agent runs this cycle indefinitely:
propose change → edit files → run_experiment → measure metric
↓
metric improved?
YES → log_experiment(keep) → auto-commit → update autoresearch.md
NO → log_experiment(discard) → git revert → try next idea
↓
repeat forever (until interrupted)
Interrupt anytime with Escape, then ask for a summary of what was tried.
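
The keep/discard decision in the cycle above can be sketched as a small pure function. This is an illustration under stated assumptions, not the extension's actual logic: the field names (`exitCode`, `checksPassed`, `metric`, `best`) are invented here, and the ordering assumes checks run after any benchmark that exits cleanly.

```javascript
// Illustrative only: field names are assumptions, not the extension's data model.

// True when the new metric beats the best value seen so far.
function isImprovement(metric, best, direction) {
  if (best === null) return true; // first successful run becomes the baseline
  return direction === "lower" ? metric < best : metric > best;
}

// Maps one run's outcome to a log_experiment status.
function decideStatus({ exitCode, checksPassed, metric, best, direction }) {
  if (exitCode !== 0 || metric === null) return "crash";
  if (!checksPassed) return "checks_failed"; // failed checks block a keep
  return isImprovement(metric, best, direction) ? "keep" : "discard";
}

const kept = decideStatus({
  exitCode: 0, checksPassed: true, metric: 42.3, best: 47.8, direction: "lower",
});
const discarded = decideStatus({
  exitCode: 0, checksPassed: true, metric: 53.1, best: 47.8, direction: "lower",
});
// kept === "keep", discarded === "discard"
```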
autoresearch.sh must output at least one METRIC line:
#!/bin/bash
set -euo pipefail
# Pre-checks
[ -f package.json ] || { echo "No package.json"; exit 1; }
# Run workload
pnpm test --run
# Output metric — required format
echo "METRIC seconds=$SECONDS"
Multiple metrics are supported:
echo "METRIC duration_seconds=42.3"
echo "METRIC test_count=847"
echo "METRIC memory_mb=512"
The primary metric (set in init_experiment) drives keep/discard decisions. Others are recorded for analysis.
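
To make the `METRIC name=number` format concrete, here is a rough sketch of how such lines could be pulled out of benchmark stdout. `parseMetrics` is a hypothetical function, and the accepted number syntax (plain decimals, no exponents) is an assumption; the extension's real parser may differ.

```javascript
// Sketch of a METRIC line parser; not the extension's actual implementation.
function parseMetrics(stdout) {
  const metrics = {};
  for (const line of stdout.split("\n")) {
    // Matches "METRIC name=number" with no spaces around "=".
    const match = /^METRIC ([A-Za-z_][A-Za-z0-9_]*)=(-?\d+(?:\.\d+)?)$/.exec(line.trim());
    if (match) metrics[match[1]] = Number(match[2]);
  }
  return metrics;
}

const stdout = [
  "running tests...",
  "METRIC duration_seconds=42.3",
  "METRIC test_count=847",
  "METRIC memory_mb=512",
].join("\n");

const metrics = parseMetrics(stdout);
// metrics: { duration_seconds: 42.3, test_count: 847, memory_mb: 512 }

// Only the primary metric would drive keep/discard; the rest are recorded.
const primary = metrics["duration_seconds"];
```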
Create autoresearch.checks.sh to guard correctness after every passing benchmark:
#!/bin/bash
set -euo pipefail
pnpm test --run # full test suite
pnpm typecheck # TypeScript
pnpm lint # ESLint / Biome
Behavior:
- A failing check is logged as `checks_failed` and the changes are reverted (same as crash).
- Runs are logged as `checks_failed` separately from crash so you can distinguish correctness failures from benchmark errors.

The status widget is always visible above the editor:
🔬 autoresearch 12 runs 8 kept │ best: 42.3s
Open the dashboard with `/autoresearch` for a full results table with status, metric values, descriptions, and the best run highlighted.
- Ctrl+X: toggle dashboard
- Escape: close dashboard / interrupt loop

// Test speed
{
command: "pnpm test --run",
metric: "seconds",
direction: "lower",
scope: ["vitest.config.ts", "src/**/*.test.ts"],
}
// Bundle size
{
command: "pnpm build && du -sb dist | cut -f1",
metric: "bytes",
direction: "lower",
scope: ["vite.config.ts", "src/index.ts"],
}
// LLM training loss
{
command: "uv run train.py --epochs 1",
metric: "val_bpb",
direction: "lower",
scope: ["train.py", "model.py", "config.yaml"],
}
// Build speed
{
command: "pnpm build",
metric: "seconds",
direction: "lower",
scope: ["tsconfig.json", "vite.config.ts"],
}
// Lighthouse performance
{
command: "lighthouse http://localhost:3000 --output=json | jq '.categories.performance.score'",
metric: "score",
direction: "higher",
scope: ["src/pages/index.tsx", "public/"],
}
The skill writes and maintains `autoresearch.md` throughout the session:
# autoresearch: vitest-speed
## Objective
Reduce test suite wall-clock time. Baseline: 51.7s.
## Metric
- Name: seconds
- Direction: lower is better
- Baseline: 51.7s
- Best so far: 42.3s (run 8)
## Files in scope
- vitest.config.ts
- src/**/*.test.ts
## What's been tried
- [kept] Run 8: Enable parallel workers → 42.3s (-18%)
- [discarded] Run 5: Increase pool size to 16 → 53.1s (+3%)
- [kept] Run 3: Disable coverage in CI → 47.8s (-8%)
## Dead ends
- Increasing pool beyond 8 causes memory pressure, net negative
## Next ideas
- [ ] Try forks pool instead of threads
- [ ] Investigate slow test files with --reporter=verbose
`autoresearch.jsonl` holds one JSON object per line:
{"run":1,"metric_value":51.7,"status":"keep","description":"baseline","commit":"a1b2c3d","timestamp":"2025-01-15T10:00:00Z"}
{"run":2,"metric_value":49.2,"status":"keep","description":"disable coverage","commit":"e4f5g6h","timestamp":"2025-01-15T10:03:21Z"}
{"run":3,"metric_value":53.1,"status":"discard","description":"increase pool to 16","commit":null,"timestamp":"2025-01-15T10:07:45Z"}
{"run":4,"metric_value":null,"status":"crash","description":"invalid vitest config syntax","commit":null,"timestamp":"2025-01-15T10:09:12Z"}
Read the log programmatically:
import { readFileSync } from "fs";
const runs = readFileSync("autoresearch.jsonl", "utf-8")
.trim()
.split("\n")
.map((line) => JSON.parse(line));
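// Only kept runs count toward the best result.
const kept = runs.filter((r) => r.status === "keep");
// Assumes direction "lower" and at least one kept run; flip the comparison when higher is better.
const best = kept.reduce((a, b) =>
a.metric_value < b.metric_value ? a : b
);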
console.log(`Best: ${best.metric_value} — ${best.description}`);
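
A slightly more defensive variant of the same idea, as a sketch: it works on the log text directly, tolerates blank lines and an empty log, and takes the metric direction as a parameter. `summarize` is a hypothetical helper, and the sample lines are trimmed copies of the log entries shown earlier.

```javascript
// Sketch: summarize a JSONL run log into a one-line status string.
function summarize(jsonl, unit, direction) {
  const runs = jsonl
    .split("\n")
    .filter((line) => line.trim() !== "") // tolerate blank or trailing lines
    .map((line) => JSON.parse(line));
  const kept = runs.filter((r) => r.status === "keep");
  // Compare two kept runs according to the metric direction.
  const better = (a, b) =>
    direction === "lower" ? a.metric_value < b.metric_value : a.metric_value > b.metric_value;
  // Start from null so an empty log does not throw.
  const best = kept.reduce((a, b) => (a === null || better(b, a) ? b : a), null);
  const bestText = best === null ? "n/a" : `${best.metric_value}${unit}`;
  return `${runs.length} runs, ${kept.length} kept, best: ${bestText}`;
}

const log = [
  '{"run":1,"metric_value":51.7,"status":"keep","description":"baseline"}',
  '{"run":2,"metric_value":49.2,"status":"keep","description":"disable coverage"}',
  '{"run":3,"metric_value":53.1,"status":"discard","description":"increase pool to 16"}',
  '{"run":4,"metric_value":null,"status":"crash","description":"invalid vitest config syntax"}',
].join("\n");

const summary = summarize(log, "s", "lower");
// summary === "4 runs, 2 kept, best: 49.2s"
```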
The agent can resume from either file. Recommended resume prompt:
Read autoresearch.jsonl and autoresearch.md, then continue the experiment loop.
Don't restart — pick up from run N and keep going.
Or use the skill:
/skill:autoresearch-create resume
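
The `N` in the resume prompt can be derived from the log. A minimal sketch, assuming `N` means the next unused run number and that every log line carries a `run` field as in the sample log; `nextRunNumber` and `buildResumePrompt` are hypothetical helpers:

```javascript
// Hypothetical helpers: derive the run number to resume from.
function nextRunNumber(jsonl) {
  const runNumbers = jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line).run);
  // An empty log means the session starts at run 1.
  return runNumbers.length === 0 ? 1 : Math.max(...runNumbers) + 1;
}

function buildResumePrompt(jsonl) {
  const n = nextRunNumber(jsonl);
  return (
    "Read autoresearch.jsonl and autoresearch.md, then continue the experiment loop.\n" +
    `Don't restart: pick up from run ${n} and keep going.`
  );
}

const sampleLog = '{"run":1,"status":"keep"}\n{"run":2,"status":"discard"}\n';
const prompt = buildResumePrompt(sampleLog);
// The second line of prompt ends with "pick up from run 3 and keep going."
```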
┌──────────────────────┐ ┌──────────────────────────┐
│ Extension (global) │ │ Skill (per-domain) │
│ │ │ │
│ run_experiment │◄────│ command: pnpm test │
│ log_experiment │ │ metric: seconds (lower) │
│ widget + dashboard │ │ scope: vitest configs │
│ │ │ ideas: pool, parallel… │
└──────────────────────┘ └──────────────────────────┘
│
▼
autoresearch.jsonl ← append-only run log
autoresearch.md ← living session document
The extension is domain-agnostic infrastructure. The skill encodes domain knowledge. One extension serves unlimited domains.
## Troubleshooting
### Loop not starting after skill runs
- Check that `autoresearch.sh` is executable: `chmod +x autoresearch.sh`
- Make sure it prints at least one `METRIC name=number` line on success
- Run `bash autoresearch.sh` manually to debug

### Widget not showing
- Run `/reload` in pi to reload the extension
- Check that the extension is installed at `~/.pi/agent/extensions/pi-autoresearch/`

### run_experiment times out
- Increase `timeout_seconds` in your `run_experiment` call

### Checks script blocking everything
- Check `autoresearch.checks.sh` exit codes manually: `bash autoresearch.checks.sh`
- Increase `checks_timeout_seconds` if tests are slow

### Session lost after context reset
- Have the agent read `autoresearch.jsonl` + `autoresearch.md` to resume

### Metric value not captured
- The `METRIC` line must be on stdout, not stderr
- Use the exact format `METRIC name=number` (no spaces around `=`)

## License
MIT