special-place-administrator/experiment

Name: experiment
Author: special-place-administrator

skills/experiment/SKILL.md

npx skillsauth add special-place-administrator/citadel_codex experiment

experiment — Metric-Driven Optimization Loop

Identity

experiment is an automated optimization loop with a scalar fitness function. It takes a hypothesis, runs isolated experiments in git worktrees, measures results with a metric command, and keeps improvements or discards failures. Think of it as automated A/B testing for code changes.

Inputs

The user provides three things:

scope: Files to modify (glob pattern, e.g., "src/api/**/*.ts")
metric: Shell command that outputs a single number (e.g., npm run build 2>&1 | tail -1 | grep -oP '\d+')
budget: Iteration cap (default: 5) or time cap (e.g., "10 minutes")

If any input is missing, ask for it. The metric MUST output a single number to stdout.
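The single-number requirement can be checked before the loop starts. The following is a minimal sketch in Python, assuming the metric is run through the shell; `run_metric` is a hypothetical helper name, not something the skill defines. It deliberately ignores the exit status, because pipelines ending in `grep -c` exit with status 1 when the count is zero, and judges the command only by whether stdout holds exactly one finite number.

```python
import math
import subprocess
from typing import Optional


def run_metric(command: str, cwd: Optional[str] = None) -> Optional[float]:
    """Run the metric command and return its value, or None if unusable.

    Hypothetical helper: "unusable" means stdout is not exactly one
    finite number. The exit status is not consulted (assumption).
    """
    proc = subprocess.run(
        command, shell=True, cwd=cwd, capture_output=True, text=True
    )
    tokens = proc.stdout.split()
    if len(tokens) != 1:
        return None  # empty output or more than one token
    try:
        value = float(tokens[0])
    except ValueError:
        return None  # the single token is not numeric
    return value if math.isfinite(value) else None
```

A `None` result at baseline time is a reason to ask the user for a different metric command; during iteration it maps naturally onto a discard.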

Protocol

Step 1: BASELINE

Stash any uncommitted changes (restore on exit)
Run the metric command. Record the baseline value.
Determine direction: does lower = better (bundle size, error count) or higher = better (FPS, test count)? Ask the user if ambiguous.
Log: Baseline: {value} ({metric command})
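Once the direction is known, every later comparison can be expressed as a signed percentage. A small sketch, assuming direction is stored as the string "lower" or "higher"; the function names are illustrative only.

```python
def improvement_pct(baseline: float, value: float, direction: str) -> float:
    """Percent change relative to baseline, positive when value is better.

    direction is "lower" or "higher" (hypothetical encoding).
    A zero baseline yields 0.0 to avoid dividing by zero.
    """
    if baseline == 0:
        return 0.0
    change = baseline - value if direction == "lower" else value - baseline
    return change * 100.0 / abs(baseline)


def baseline_log(value: float, command: str) -> str:
    """Format the Step 1 log line."""
    return f"Baseline: {value:g} ({command})"
```

For example, a bundle that shrinks from 500 to 450 with lower-is-better gives `improvement_pct(500, 450, "lower")` equal to 10.0, while the same numbers with higher-is-better give -10.0.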

Step 2: ITERATE

For each iteration (up to budget):

Create isolation: Spawn a sub-agent in a worktree (isolation: "worktree")
Propose change: The agent modifies files within scope to improve the metric. Provide context: baseline value, metric direction, scope, what previous iterations tried.
Measure: Run the metric command in the worktree
Gate: Run typecheck. If it fails, discard immediately.
Evaluate:
- Improved? → KEEP. Merge the worktree branch. New baseline = new value.
- Same or worse? → DISCARD. Delete the worktree.

Log iteration:

Iteration {N}: {value} ({delta from baseline}) → {KEEP|DISCARD}
Change: {one-line description of what was tried}
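The gate and evaluate steps reduce to one decision per iteration. The sketch below is an assumption-laden illustration: `evaluate` and `log_line` are hypothetical names, a metric value of `None` stands for a failed metric command, and creating or merging the worktree is left to the harness.

```python
from typing import Optional


def evaluate(value: Optional[float], typecheck_ok: bool,
             baseline: float, direction: str) -> str:
    """Return "KEEP" or "DISCARD" for one iteration.

    A failed typecheck or a missing metric value is always a discard;
    otherwise the value must be strictly better than the baseline.
    """
    if value is None or not typecheck_ok:
        return "DISCARD"
    improved = value < baseline if direction == "lower" else value > baseline
    return "KEEP" if improved else "DISCARD"


def log_line(n: int, value: float, baseline: float,
             verdict: str, change: str) -> str:
    """Format the two-line iteration log entry."""
    delta = value - baseline
    return (f"Iteration {n}: {value:g} ({delta:+g}) \u2192 {verdict}\n"
            f"Change: {change}")
```

After a KEEP, the caller replaces the baseline with the new value so that the next iteration is compared against the improved state.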

Step 3: CONVERGENCE CHECK

After each iteration, check:

Local optimum: Last 3 iterations all discarded → stop ("no more improvements found")
Diminishing returns: Last kept improvement was < 0.5% → stop ("diminishing returns")
Budget exhausted: Iteration count or time exceeded → stop
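The three stop conditions can be sketched as a single function evaluated after each iteration. This is an illustration under stated assumptions: the history is a list of `(verdict, gain_pct)` pairs, only an iteration-count budget is modelled (a time cap would need a clock check), and the returned strings reuse the stop reasons from the report template.

```python
from typing import List, Optional, Tuple


def stop_reason(history: List[Tuple[str, float]], budget: int,
                min_gain_pct: float = 0.5, patience: int = 3) -> Optional[str]:
    """Return why the loop should stop, or None to continue.

    history holds one (verdict, gain_pct) pair per iteration, oldest
    first; gain_pct is the relative improvement of a kept iteration.
    """
    # Local optimum: the last `patience` iterations were all discarded.
    recent = history[-patience:]
    if len(recent) == patience and all(v == "DISCARD" for v, _ in recent):
        return "convergence"
    # Diminishing returns: the most recent kept gain was too small.
    kept_gains = [gain for verdict, gain in history if verdict == "KEEP"]
    if kept_gains and kept_gains[-1] < min_gain_pct:
        return "diminishing"
    # Budget exhausted: iteration cap reached.
    if len(history) >= budget:
        return "budget"
    return None
```

The checks run in the order listed above, so a run that hits its third straight discard on the final allowed iteration reports convergence rather than budget.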

Step 4: REPORT

Write results to .citadel/research/experiment-{slug}.md:

# Experiment: {Description}

> Metric: `{command}`
> Direction: {lower|higher} is better
> Scope: {glob pattern}
> Budget: {N iterations}
> Date: {ISO date}

## Results

| Iteration | Value | Delta | Verdict | Change |
|-----------|-------|-------|---------|--------|
| baseline  | {N}   | —     | —       | —      |
| 1         | {N}   | {+/-} | KEEP    | {desc} |
| 2         | {N}   | {+/-} | DISCARD | {desc} |

## Outcome
- **Start**: {baseline}
- **End**: {final value}
- **Improvement**: {percentage}
- **Iterations**: {kept}/{total}
- **Stop reason**: {convergence|diminishing|budget}

## Kept Changes
{List of changes that were kept, with commit hashes}

Also log to .citadel/telemetry/agent-runs.jsonl:

{"event":"experiment-complete","slug":"{slug}","baseline":0,"final":0,"improvement":"0%","kept":0,"total":0,"timestamp":"ISO"}
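Writing the telemetry line amounts to appending one JSON object per run. A minimal sketch, assuming the field names from the template above; the helper names and the one-decimal percentage format are illustrative choices, not part of the skill.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def telemetry_record(slug: str, baseline: float, final: float,
                     kept: int, total: int, direction: str = "lower") -> dict:
    """Build the experiment-complete event (hypothetical helper)."""
    if baseline == 0:
        pct = 0.0
    else:
        change = baseline - final if direction == "lower" else final - baseline
        pct = change * 100.0 / abs(baseline)
    return {
        "event": "experiment-complete",
        "slug": slug,
        "baseline": baseline,
        "final": final,
        "improvement": f"{pct:.1f}%",
        "kept": kept,
        "total": total,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def append_jsonl(path: str, record: dict) -> None:
    """Append one record as a single line, creating parent folders."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
```

Appending rather than rewriting keeps earlier runs intact, which is the usual reason for choosing the JSONL format.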

Common Metrics

| Goal | Metric Command |
|------|---------------|
| Reduce bundle size | `npm run build 2>&1 \| grep -oP 'Total size: \K\d+'` |
| Reduce type errors | `npx tsc --noEmit 2>&1 \| grep -c 'error TS'` |
| Increase test pass rate | `npm test 2>&1 \| grep -oP '\d+(?= passing)'` |
| Reduce file count | `find src -name '*.ts' \| wc -l` |
| Reduce line count | `wc -l src/**/*.ts \| tail -1 \| awk '{print $1}'` |

Safety Rules

NEVER modify files outside scope
ALWAYS use worktree isolation for changes
ALWAYS run typecheck before keeping a change
Restore stashed changes on exit (even on error)
If the metric command fails, treat as DISCARD (not crash)
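The first rule can be enforced mechanically by checking the changed file paths against the scope glob before a change is kept. The sketch below is one possible approach, assuming `**/` means "zero or more directories" and `*` does not cross a `/`; `glob_to_regex` and `outside_scope` are hypothetical helpers, and the list of changed files is assumed to come from the harness.

```python
import re
from typing import List


def glob_to_regex(pattern: str) -> "re.Pattern":
    """Translate a scope glob into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")   # zero or more directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")         # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")      # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")       # one character within a segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def outside_scope(changed_files: List[str], scope: str) -> List[str]:
    """Return the changed paths that the scope glob does not cover."""
    regex = glob_to_regex(scope)
    return [path for path in changed_files if not regex.fullmatch(path)]
```

A non-empty result is grounds for discarding the iteration, in the same way as a failed typecheck.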

Exit Protocol

---HANDOFF---
- Experiment: {description}
- Result: {baseline} → {final} ({improvement}%)
- Kept: {N}/{total} iterations
- Stop reason: {reason}
- Report: .citadel/research/experiment-{slug}.md
---

Automated optimization loop with scalar fitness function. Proposes changes in isolated worktrees, measures with a metric command, keeps improvements, discards failures. Supports convergence detection and diminishing returns.

data-ai

Updated Apr 23, 2026

npx skillsauth add special-place-administrator/citadel_codex experiment

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy: Container and dependency vulnerability scanner. Clean (95%)
Semgrep: Static code analysis for vulnerabilities. Clean (95%)
mcp-scan (Snyk): Model Context Protocol security validation. Clean (95%)
Snyk (dep): Open source security scanning. Skipped (50%)
Socket.dev: Supply chain security analysis. Skipped (50%)
VirusTotal: Multi-engine malware detection. Skipped (50%)
CrowdStrike: Advanced threat intelligence. Skipped (50%)
OSV-Scanner: Open Source Vulnerability database check. Skipped (50%)
OWASP Dep-Check: Skipped (50%)

Last scanned: Apr 24, 2026, 12:59 AM (22.9s, 1 file scanned)

SKILL.md

name: experiment
description: >-
user-invocable: true
auto-trigger: false
last-updated: 2026-03-21

Related Skills

special-place-administrator/triage

development

Verified, Trusted, Community

GitHub issue and PR investigator. Pulls open issues/PRs, classifies them, searches the codebase for root cause or reviews contributed code, proposes fixes with file:line references, and optionally implements fixes. Handles both issues and pull requests.

SKILL.md, updated Apr 23, 2026

special-place-administrator/test-gen

development

Verified, Trusted, Community

Generates and verifies tests (happy path, edge cases, error paths) using the project's own framework and patterns.

SKILL.md, updated Apr 23, 2026

special-place-administrator/systematic-debugging

development

Verified, Trusted, Community

Four-phase root cause analysis: observe, hypothesize, verify, fix. Enforces investigation before code changes and stops guess-and-check debugging.

SKILL.md, updated Apr 23, 2026

special-place-administrator/setup

testing

Verified, Trusted, Community

First-run experience for the harness. Detects the project stack, scaffolds the .citadel/ state directory, generates configuration, runs one real task as a demo, and prints a reference card of all available skills. Gets someone from install to first `do` command in 5 minutes.

SKILL.md, updated Apr 23, 2026

Install manually

# Clone the repo
git clone https://github.com/special-place-administrator/citadel_codex.git

# Copy into the Claude Code skills folder (global), creating it if needed
mkdir -p ~/.claude/skills
cp -r citadel_codex/skills/experiment ~/.claude/skills/

See the official Claude Code Skills documentation for the skills path.

Repository

special-place-administrator/citadel_codex

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT