skills/experiment/SKILL.md
Automated optimization loop with scalar fitness function. Proposes changes in isolated worktrees, measures with a metric command, keeps improvements, discards failures. Detects convergence and diminishing returns.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):

`npx skillsauth add special-place-administrator/citadel_codex experiment`
experiment is an automated optimization loop with a scalar fitness function.
It takes a hypothesis, runs isolated experiments in git worktrees, measures results
with a metric command, and keeps improvements or discards failures. Think of it as
automated A/B testing for code changes.
The user provides three things:
Example metric command: `npm run build 2>&1 | tail -1 | grep -oP '\d+'`

If any input is missing, ask for it. The metric MUST output a single number to stdout.
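A minimal bash sketch of the single-number check follows. It assumes the metric command is run through `bash -c`; `run_metric` is a hypothetical helper name, not part of the skill. It deliberately ignores the command's exit status, because commands such as `grep -c` exit non-zero while still printing a valid count of 0.

```bash
#!/usr/bin/env bash
# run_metric (hypothetical helper): run a metric command and print its value,
# failing unless stdout is exactly one integer or decimal number.
run_metric() {
  local cmd="$1" out
  local re='^[[:space:]]*(-?[0-9]+(\.[0-9]+)?)[[:space:]]*$'
  # Ignore the exit status: `grep -c` exits 1 when it prints 0.
  out="$(bash -c "$cmd")" || true
  # Allow surrounding whitespace, but nothing besides one number.
  if [[ $out =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  else
    printf 'metric did not print a single number: %q\n' "$out" >&2
    return 1
  fi
}
```

For example, `run_metric "npx tsc --noEmit 2>&1 | grep -c 'error TS'"` prints the error count. It fails if the command prints something like `12 passing`.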
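Measure the baseline before changing anything and report it as:

`Baseline: {value} ({metric command})`

For each iteration (up to budget), propose a change in an isolated worktree (`isolation: "worktree"`) and measure it with the metric command. Keep the change if the metric improved; discard it otherwise. Then report:

`Iteration {N}: {value} ({delta from baseline}) → {KEEP|DISCARD}`

`Change: {one-line description of what was tried}`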
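One way to compute the delta and verdict is sketched below in bash. It rests on two assumptions the skill does not state here:

- a new value is compared against the best kept value so far (the baseline until the first KEEP);
- a tie counts as DISCARD.

`verdict` and `delta` are hypothetical helper names. `awk` does the arithmetic because bash arithmetic cannot handle decimal metrics.

```bash
#!/usr/bin/env bash
# verdict DIRECTION BEST VALUE -> prints KEEP or DISCARD (hypothetical helper).
# DIRECTION is "lower" or "higher"; ties are treated as DISCARD (assumption).
verdict() {
  awk -v d="$1" -v best="$2" -v v="$3" 'BEGIN {
    improved = (d == "lower") ? (v + 0 < best + 0) : (v + 0 > best + 0)
    print (improved ? "KEEP" : "DISCARD")
  }'
}

# delta VALUE BASELINE -> signed difference such as -10 or +5 (hypothetical helper).
delta() {
  awk -v v="$1" -v b="$2" 'BEGIN { printf "%+g\n", v - b }'
}
```

For example, `verdict lower 100 90` prints `KEEP`, and `delta 90 100` prints `-10` for the `({delta from baseline})` field.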
After each iteration, check whether to stop: convergence, diminishing returns, or an exhausted budget.
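The skill names these stop reasons but not their exact criteria. The bash sketch below is one possible reading, with hypothetical thresholds:

- *convergence* means `MAX_DISCARDS` consecutive DISCARDs (default 3);
- *diminishing* means the most recent KEEP improved the metric by less than `MIN_GAIN_PCT` percent (default 1).

Both names and both defaults are assumptions, not values taken from the skill.

```bash
#!/usr/bin/env bash
# should_stop ITER BUDGET CONSEC_DISCARDS LAST_GAIN_PCT (hypothetical helper)
# Prints budget|convergence|diminishing, or nothing if the loop should continue.
# LAST_GAIN_PCT is the % improvement of the most recent KEEP; pass "" if none yet.
should_stop() {
  local iter="$1" budget="$2" discards="$3" gain="$4"
  local max_discards="${MAX_DISCARDS:-3}"  # assumed threshold
  local min_gain="${MIN_GAIN_PCT:-1}"      # assumed threshold, in percent
  if (( iter >= budget )); then
    echo budget
  elif (( discards >= max_discards )); then
    echo convergence
  elif [[ -n "$gain" ]] && awk -v g="$gain" -v m="$min_gain" 'BEGIN { exit !(g + 0 < m + 0) }'; then
    echo diminishing
  fi
}
```

The budget check runs first, so a run that hits its iteration limit always reports `budget`, even when the other conditions also hold.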
Write results to `.citadel/research/experiment-{slug}.md`:
# Experiment: {Description}
> Metric: `{command}`
> Direction: {lower|higher} is better
> Scope: {glob pattern}
> Budget: {N iterations}
> Date: {ISO date}
## Results
| Iteration | Value | Delta | Verdict | Change |
|-----------|-------|-------|---------|--------|
| baseline | {N} | — | — | — |
| 1 | {N} | {+/-} | KEEP | {desc} |
| 2 | {N} | {+/-} | DISCARD | {desc} |
## Outcome
- **Start**: {baseline}
- **End**: {final value}
- **Improvement**: {percentage}
- **Iterations**: {kept}/{total}
- **Stop reason**: {convergence|diminishing|budget}
## Kept Changes
{List of changes that were kept, with commit hashes}
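The Outcome section's improvement percentage can be computed relative to the baseline, as in the bash sketch below. It makes two assumptions: a positive percentage always means "better" in the chosen direction, and the helper prints `n/a` when the baseline is 0, since a relative change is undefined there. `improvement_pct` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# improvement_pct DIRECTION BASELINE FINAL -> e.g. "12.5%" (hypothetical helper).
# Positive means the metric moved in the desired direction.
improvement_pct() {
  awk -v d="$1" -v b="$2" -v f="$3" 'BEGIN {
    b += 0; f += 0
    if (b == 0) { print "n/a"; exit }
    gain = (d == "lower") ? (b - f) : (f - b)
    # Divide by |baseline| so the sign reflects direction even for negative metrics.
    printf "%.1f%%\n", gain / (b < 0 ? -b : b) * 100
  }'
}
```

For example, `improvement_pct lower 200 150` prints `25.0%`, and a regression such as `improvement_pct lower 100 110` prints `-10.0%`.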
Also log to `.citadel/telemetry/agent-runs.jsonl`:
{"event":"experiment-complete","slug":"{slug}","baseline":0,"final":0,"improvement":"0%","kept":0,"total":0,"timestamp":"ISO"}
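The bash sketch below appends that telemetry line. It assumes the slug contains no characters that need JSON escaping (quotes, backslashes) and that the numeric fields are plain numbers. `log_experiment` and the `TELEMETRY_LOG` path override are hypothetical.

```bash
#!/usr/bin/env bash
# log_experiment SLUG BASELINE FINAL IMPROVEMENT KEPT TOTAL (hypothetical helper)
# Appends one JSON object per line; TELEMETRY_LOG overrides the default path.
log_experiment() {
  local log="${TELEMETRY_LOG:-.citadel/telemetry/agent-runs.jsonl}"
  mkdir -p "$(dirname "$log")"
  # Timestamp is ISO 8601 in UTC, e.g. 2024-01-31T12:00:00Z.
  printf '{"event":"experiment-complete","slug":"%s","baseline":%s,"final":%s,"improvement":"%s","kept":%s,"total":%s,"timestamp":"%s"}\n' \
    "$1" "$2" "$3" "$4" "$5" "$6" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" >> "$log"
}
```

For example, `log_experiment bundle-size 1200 1050 12.5% 3 8` appends one line in the format shown above.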
| Goal | Metric Command |
|------|---------------|
| Reduce bundle size | npm run build 2>&1 \| grep -oP 'Total size: \K\d+' |
| Reduce type errors | npx tsc --noEmit 2>&1 \| grep -c 'error TS' |
| Increase test pass rate | npm test 2>&1 \| grep -oP '\d+(?= passing)' |
| Reduce file count | find src -name '*.ts' \| wc -l |
| Reduce line count | find src -name '*.ts' -exec cat {} + \| wc -l |
---HANDOFF---
- Experiment: {description}
- Result: {baseline} → {final} ({improvement}%)
- Kept: {N}/{total} iterations
- Stop reason: {reason}
- Report: .citadel/research/experiment-{slug}.md
---