skills/experiment/SKILL.md
Automated optimization loop with a scalar fitness function. Proposes changes in isolated worktrees, measures each with a metric command, keeps improvements, and discards failures. Stops early on convergence or diminishing returns.
Install this skill globally with one command: `npx skillsauth add SethGammon/Citadel experiment`. Works with Claude Code, Cursor, and Windsurf.
The user provides three things:
(example metric command: `npm run build 2>&1 | tail -1 | grep -oP '\d+'`)

If any input is missing, ask for it. The metric MUST output a single number to stdout.
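The single-number requirement can be checked before any iteration runs. The following is a minimal sketch, not part of the skill itself; the function name `parse_metric` and the accepted number syntax are assumptions made for illustration.

```python
import re

# Assumed number syntax: optional sign, integer or decimal, optional exponent.
_NUMBER = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")


def parse_metric(stdout: str) -> float:
    """Return the metric value, or raise ValueError unless stdout is exactly one number."""
    text = stdout.strip()
    if not _NUMBER.fullmatch(text):
        # Covers empty output and mixed text such as "12 passing".
        raise ValueError(f"metric must print a single number, got: {text!r}")
    return float(text)
```

A rejected output is the point at which the user would be asked for a different command, rather than starting the loop with an unusable metric.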
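Baseline: {value} ({metric command})

For each iteration (up to budget):
- Propose a change in an isolated worktree (`isolation: "worktree"`)
- Measure with the metric command under a timeout (`node scripts/run-with-timeout.js 300`)
- Keep the change if the metric improved; otherwise discard it

Iteration {N}: {value} ({delta from baseline}) → {KEEP|DISCARD}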
Change: {one-line description of what was tried}
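The keep/discard decision and the per-iteration log line can be sketched as below. This is an illustration under stated assumptions: the helper names are hypothetical, and comparing each candidate against the best value so far (rather than the baseline) is an assumed policy, while the printed delta stays relative to the baseline as in the log format above.

```python
def is_improvement(candidate: float, best: float, direction: str) -> bool:
    """True when candidate strictly beats best; direction is 'lower' or 'higher'."""
    if direction not in ("lower", "higher"):
        raise ValueError(f"direction must be 'lower' or 'higher', got {direction!r}")
    return candidate < best if direction == "lower" else candidate > best


def iteration_line(n: int, value: float, baseline: float, keep: bool) -> str:
    """Format one log line in the 'Iteration {N}: ...' shape used by the skill."""
    delta = value - baseline  # delta is always reported against the baseline
    verdict = "KEEP" if keep else "DISCARD"
    return f"Iteration {n}: {value:g} ({delta:+g}) → {verdict}"
```

A tie is treated as no improvement here, so a change that leaves the metric unchanged is discarded.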
After each iteration, check the stop conditions: convergence, diminishing returns, or an exhausted budget.
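One way to express the per-iteration stop check is sketched below. The stop-reason labels match the report template (`convergence`, `diminishing`, `budget`), but the definitions are assumptions: `patience`, `min_gain`, and `window` are hypothetical parameters with illustrative defaults, not thresholds taken from the skill.

```python
from typing import Optional, Sequence


def stop_reason(
    verdicts: Sequence[bool],   # True = KEEP, one entry per finished iteration
    gains: Sequence[float],     # relative gain of each kept iteration, e.g. 0.02 for 2%
    budget: int,
    patience: int = 3,          # assumed: consecutive DISCARDs that count as convergence
    min_gain: float = 0.01,     # assumed: kept gains below 1% count as diminishing
    window: int = 2,            # assumed: number of recent kept gains to inspect
) -> Optional[str]:
    """Return 'convergence', 'diminishing', 'budget', or None to keep iterating."""
    if len(verdicts) >= patience and not any(verdicts[-patience:]):
        return "convergence"
    if len(gains) >= window and all(g < min_gain for g in gains[-window:]):
        return "diminishing"
    if len(verdicts) >= budget:
        return "budget"
    return None
```

The budget check runs last so that a run which converges on its final allowed iteration is reported with the more informative reason.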
Write results to .planning/research/experiment-{slug}.md:
# Experiment: {Description}
> Metric: `{command}`
> Direction: {lower|higher} is better
> Scope: {glob pattern}
> Budget: {N iterations}
> Date: {ISO date}
## Results
| Iteration | Value | Delta | Verdict | Change |
|-----------|-------|-------|---------|--------|
| baseline | {N} | — | — | — |
| 1 | {N} | {+/-} | KEEP | {desc} |
| 2 | {N} | {+/-} | DISCARD | {desc} |
## Outcome
- **Start**: {baseline}
- **End**: {final value}
- **Improvement**: {percentage}
- **Iterations**: {kept}/{total}
- **Stop reason**: {convergence|diminishing|budget}
## Kept Changes
{List of changes that were kept, with commit hashes}
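The `Improvement` field and the rows of the results table depend on the metric direction. A minimal sketch follows; the function names are hypothetical, and raising on a zero baseline is an assumed choice, since a percentage is undefined in that case.

```python
def improvement_pct(baseline: float, final: float, direction: str) -> float:
    """Percentage improvement; positive means better in the chosen direction."""
    if baseline == 0:
        raise ValueError("percentage improvement is undefined for a zero baseline")
    change = (baseline - final) if direction == "lower" else (final - baseline)
    return 100.0 * change / abs(baseline)


def result_row(iteration: int, value: float, baseline: float, verdict: str, change: str) -> str:
    """Render one markdown row of the Results table."""
    return f"| {iteration} | {value:g} | {value - baseline:+g} | {verdict} | {change} |"
```

For example, going from 200 to 150 with "lower is better" is a 25% improvement, and the same magnitude in the wrong direction comes out negative.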
Also log to .planning/telemetry/agent-runs.jsonl:
{"event":"experiment-complete","slug":"{slug}","baseline":0,"final":0,"improvement":"0%","kept":0,"total":0,"timestamp":"ISO"}
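Appending the completion event as one JSON line might look like the sketch below. The field names come from the example above; the function name `log_experiment`, the UTC ISO timestamp, and creating the parent directory on demand are assumptions.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_experiment(path, slug, baseline, final, improvement, kept, total):
    """Append one 'experiment-complete' event to a JSONL file and return it."""
    event = {
        "event": "experiment-complete",
        "slug": slug,
        "baseline": baseline,
        "final": final,
        "improvement": improvement,  # string such as "25%"
        "kept": kept,
        "total": total,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # assumed: create the directory if missing
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")  # one event per line, never rewrite old lines
    return event
```

Opening the file in append mode keeps earlier runs intact, which is what makes the JSONL file usable as a running log.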
| Goal | Metric Command |
|------|---------------|
| Reduce bundle size | npm run build 2>&1 \| grep -oP 'Total size: \K\d+' |
| Reduce type errors | npx tsc --noEmit 2>&1 \| grep -c 'error TS' |
| Increase test pass rate | npm test 2>&1 \| grep -oP '\d+(?= passing)' |
| Reduce file count | find src -name '*.ts' \| wc -l |
| Reduce line count | wc -l src/**/*.ts \| tail -1 \| awk '{print $1}' |
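Commands like those in the table can be run under a timeout roughly as follows. This is a sketch and not the skill's `run-with-timeout.js`: the name `run_metric` is hypothetical, and the 300-second default assumes the `300` seen earlier is a value in seconds. In the table the pipes are escaped as `\|` only for markdown; the real command uses a plain `|`.

```python
import subprocess


def run_metric(command: str, timeout_s: float = 300.0) -> str:
    """Run a shell metric command and return its stdout.

    Raises subprocess.TimeoutExpired if the command exceeds timeout_s.
    """
    result = subprocess.run(
        command,
        shell=True,           # the metric commands rely on shell pipes
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    # The exit status is deliberately ignored: `grep -c` prints 0 but exits 1
    # when nothing matches, which is the best case for an error-count metric.
    return result.stdout
```

The returned text still has to be validated as a single number before it is used as a fitness value.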
Disclosure: "Running experiment loop on [target] with fitness: [function]. Each iteration commits. Budget: [N iterations]."
Reversibility: amber — modifies source files across iterations; each iteration is committed; undo with git revert on kept commits.
Trust gates:
Report at .planning/research/experiment-{slug}.md with all iteration rows filled

Metric command outputs nothing or non-numeric text: Treat as a metric failure. Ask the user to provide a command that outputs a single number to stdout before starting iterations.
No worktree support (e.g., shallow clone): Fall back to branch isolation. Create a branch, run changes there, measure, then delete or merge the branch. Never modify the working tree directly.
If .planning/research/ does not exist: Create it before writing the experiment report. If .planning/ itself doesn't exist, create the full path or output the report inline.
Budget exhausted with zero kept iterations: Report outcome as "no improvement found". This is a valid result — do not continue past the budget.
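The missing-directory case above can be handled when the report is written. A minimal sketch, assuming a hypothetical `write_report` helper that takes the repository root; printing the report is used here as the inline fallback.

```python
from pathlib import Path
from typing import Optional


def write_report(root, slug: str, body: str) -> Optional[Path]:
    """Write the report under .planning/research/, creating the path if needed.

    Returns the report path, or None if the report was printed inline instead.
    """
    target = Path(root) / ".planning" / "research" / f"experiment-{slug}.md"
    try:
        target.parent.mkdir(parents=True, exist_ok=True)  # creates .planning/ too
        target.write_text(body, encoding="utf-8")
    except OSError:
        print(body)  # fall back to inline output when the path cannot be created
        return None
    return target
```

Returning `None` lets the caller note in the handoff that no report file exists.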
---HANDOFF---
- Experiment: {description}
- Result: {baseline} → {final} ({improvement}%)
- Kept: {N}/{total} iterations
- Stop reason: {reason}
- Report: .planning/research/experiment-{slug}.md
- Reversibility: amber — undo kept iterations with `git revert` on each kept commit
---