Knowledge about rad-experiment CLI and cc.experiment COBs — publishing, reproducing, and curating optimization experiments in Radicle repos. Use when working with rad-experiment, experiment COBs, publish-tape, autoresearch publishing, or cc.experiment.
TRIGGER when: user mentions rad-experiment, experiment COBs, publish-tape, publish-evo, autoresearch publishing, cc.experiment, or benchmark optimization workflows.
This skill provides knowledge about Experiment COBs (cc.experiment) — a Collaborative Object type for AI-generated optimization experiments in Radicle repositories.
Experiment COBs are first-class collaborative objects that capture optimization experiments: a before/after benchmark comparison produced by an AI agent or human. Experiments can also be imported from the autoresearch (publish-tape) and evo (publish-evo) session formats.

The COB type is cc.experiment. Each experiment is stored under refs/cobs/cc.experiment/<EXPERIMENT-ID> in the Git repository.
Each experiment COB contains:
| Field | Description |
|-------|-------------|
| description | Hypothesis: what was tried and why |
| base | Base (baseline) commit OID |
| oid | Candidate (head) commit OID |
| metrics | Primary + secondary metrics with measurements |
| env | Auto-detected environment (CPU arch, OS, CPU brand, RAM) |
| schema_version | COB schema version (current: 5) |
| reproductions | Independent reproductions by other peers |
| labels | Curation labels (delegates only) |
| redacted | Whether the experiment has been redacted |
Each metric in the experiment:
| Field | Description |
|-------|-------------|
| name | Metric name (must match autoresearch.yaml / optimize.yaml) |
| unit | Unit string (e.g. "ms", "µs", "" for unitless) |
| criteria | lower_is_better or higher_is_better |
| baseline | Baseline measurement (median × 1000, std, samples, n) |
| candidate | Candidate measurement (median × 1000, std, samples, n) |
| is_primary | Whether this is the primary optimization target |
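Measurements are stored as integers scaled by 1000 (the "median × 1000" in the table above): 1.500 s → 1500, 14.327 ms → 14327. This avoids floating-point drift in COB serialization.

Deltas are direction-aware, so a positive percentage always means an improvement: a 5% drop in a lower_is_better metric such as latency shows as +5.00%, and a higher_is_better throughput climb of 1.0 → 1.1 also shows as +10.00%.

Publish a new experiment with explicit measurements (this publishes immediately; there is no confirmation prompt):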
```bash
rad-experiment publish \
  --base <SHA> --head <SHA> \
  --metric duration_ms \
  --baseline-median 1500 --baseline-n 5 \
  --candidate-median 1425 --candidate-n 5 \
  --description "Hoist allocation out of inner loop"
```
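To sanity-check the delta the example above should report, here is a minimal POSIX shell sketch of the value × 1000 encoding and of a direction-aware percentage. The helper names to_fixed and delta_pct are hypothetical, and the formula (change relative to the baseline, signed so that positive means better) is an assumption inferred from the examples in this document, not rad-experiment's actual code.

```bash
# Hypothetical helpers, not part of rad-experiment.

# to_fixed VALUE: scale a decimal measurement by 1000 and round to an integer.
to_fixed() {
  awk -v v="$1" 'BEGIN { printf "%.0f\n", v * 1000 }'
}

# delta_pct CRITERIA BASELINE CANDIDATE: percentage change relative to the
# baseline, signed so that a positive result always means "improved".
delta_pct() {
  awk -v c="$1" -v b="$2" -v n="$3" 'BEGIN {
    if (b == 0) { print "baseline must be non-zero" > "/dev/stderr"; exit 1 }
    if (c == "lower_is_better")       d = (b - n) / b * 100
    else if (c == "higher_is_better") d = (n - b) / b * 100
    else { print "unknown criteria: " c > "/dev/stderr"; exit 1 }
    printf "%+.2f%%\n", d
  }'
}

to_fixed 14.327                       # prints 14327
delta_pct lower_is_better 1500 1425   # prints +5.00%
delta_pct higher_is_better 1.0 1.1    # prints +10.00%
```

A regression comes out negative under either criteria, for example a higher_is_better metric falling from 1500 to 1425 gives -5.00%.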
With secondary metrics, per-run samples, and environment overrides:
```bash
rad-experiment publish \
  --base 9b32764 --head 5574144 \
  --metric duration_ms \
  --baseline-median 1500 --baseline-std 23 \
  --baseline-samples 1488,1502,1497,1510,1503 --baseline-n 5 \
  --candidate-median 1425 --candidate-std 18 \
  --candidate-samples 1420,1432,1418,1428,1425 --candidate-n 5 \
  --secondary "binary_size_bytes:1000000:950000" \
  --description "Hoist allocation"
```
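When passing per-run samples, it can help to derive the median, standard deviation, and n from those same samples so the flags agree with each other. A minimal POSIX shell sketch; the sample_stats name is hypothetical, and using the sample standard deviation (n − 1 denominator) is an assumption about what --baseline-std and --candidate-std expect.

```bash
# sample_stats "v1,v2,...": print the median, sample standard deviation, and n
# of a comma-separated list of numbers. Hypothetical helper, not part of
# rad-experiment.
sample_stats() {
  printf '%s\n' "$1" | tr ',' '\n' | sort -n | awk '
    NF { v[++n] = $1; sum += $1 }
    END {
      if (n == 0) { print "no samples" > "/dev/stderr"; exit 1 }
      # Median of the sorted values: middle element, or mean of the two middles.
      if (n % 2) med = v[(n + 1) / 2]
      else       med = (v[n / 2] + v[n / 2 + 1]) / 2
      # Two-pass sample standard deviation (n - 1 denominator).
      mean = sum / n
      for (i = 1; i <= n; i++) ss += (v[i] - mean) ^ 2
      sd = (n > 1) ? sqrt(ss / (n - 1)) : 0
      printf "median=%g std=%.2f n=%d\n", med, sd, n
    }'
}

sample_stats 1420,1432,1418,1428,1425   # prints median=1425 std=5.73 n=5
```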
When publishing from a source without autoresearch.yaml at the base commit (e.g. publish-tape), provide --unit and --criteria:
```bash
rad-experiment publish \
  --base 9b32764 --head 5574144 \
  --metric total_us --unit µs --criteria lower_is_better \
  --baseline-median 15200 --baseline-n 5 \
  --candidate-median 13800 --candidate-n 5
```
publish-tape imports a pi-autoresearch autoresearch.jsonl session file as COBs. This is the primary integration point for autoresearch workflows.

```bash
# Dry run: show what would be published
rad-experiment publish-tape autoresearch.jsonl --dry-run

# Publish every unpublished keep result
rad-experiment publish-tape autoresearch.jsonl --yes
```
For each segment (delimited by type:config header lines), the first result is the segment baseline. Every subsequent result with status:keep becomes a published experiment. Discards, crashes, and checks_failed results are skipped (their code was already reverted).
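The segment rule can be illustrated with a short awk sketch that prints which lines of a tape would be treated as the baseline and which would be published. It assumes each line is a JSON object carrying "type" and "status" keys (for example {"type":"config",...} and {"status":"keep",...}); the real autoresearch.jsonl schema may differ, it matches keys with regexes rather than a JSON parser, and select_keeps is a hypothetical name.

```bash
# select_keeps FILE: for each segment of a JSONL tape, print the line number
# of the baseline result and of every later status:keep result.
# Hypothetical illustration of the rule above, not rad-experiment code.
select_keeps() {
  awk '
    function emit() {
      if (seg > 0 && base > 0)
        printf "segment %d: baseline %d; keep%s\n", seg, base, (keeps == "" ? " none" : keeps)
    }
    NF == 0 { next }                                    # ignore blank lines
    /"type" *: *"config"/ { emit(); seg++; base = 0; keeps = ""; next }
    seg == 0 { next }                                   # results before any header
    base == 0 { base = NR; next }                       # first result: baseline
    /"status" *: *"keep"/ { keeps = keeps " " NR }      # later keeps: published
    END { emit() }
  ' "$1"
}
```

Discards, crashes, and checks_failed lines fall through every rule, so they are neither baselines nor published.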
Idempotent: an index file at <jsonl_parent>/.cc-experiment/published.json tracks which (base,head) pairs have been published. Re-running the command only publishes new results.
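The idempotency idea can be sketched with a plain-text stand-in for the index. The real published.json is JSON and its layout is not described here; the one-pair-per-line file and the publish_once name are hypothetical.

```bash
# publish_once INDEX BASE HEAD: print "publish" and record the pair the first
# time a (base,head) pair is seen; print "skip" on every later call.
# Hypothetical helper: a plain-text stand-in for .cc-experiment/published.json.
publish_once() {
  po_index=$1
  po_pair="$2,$3"
  mkdir -p "$(dirname "$po_index")" || return 1
  touch "$po_index" || return 1
  if grep -qxF -- "$po_pair" "$po_index"; then
    echo skip
  else
    # A real publisher would create the COB here and record the pair only
    # after that succeeded.
    printf '%s\n' "$po_pair" >> "$po_index"
    echo publish
  fi
}
```

In this sketch, recording a pair only after its publish succeeds is what makes re-runs safe: anything that failed part-way is retried on the next run.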
publish-evo imports an evo-hq/evo .evo/ session directory as COBs:

```bash
rad-experiment publish-evo .evo --dry-run
rad-experiment publish-evo .evo --yes
```
List experiments:

```bash
rad-experiment list                      # all experiments, grouped by branch
rad-experiment list --json               # JSONL output for piping to jq
rad-experiment list --reproduced         # only reproduced experiments
rad-experiment list --unmerged           # branches not yet in main
rad-experiment list --landable           # branches that 3-way merge cleanly
rad-experiment list --author z6MkfEaY    # by author (DID prefix)
rad-experiment list --label shipped      # by label
rad-experiment list --since 2026-04-01   # since date
rad-experiment list --delegates-only     # only delegates
```
Show a single experiment:

```bash
rad-experiment show <ID>
rad-experiment show <ID> --json
rad-experiment show <ID> --diff   # include code diff
```
WARNING: reproduction runs untrusted code. It checks out a branch you may not control and executes its bench_cmd. Review the candidate diff first, or run inside a container/VM.

```bash
# Auto mode: re-runs benchmarks from autoresearch.yaml
rad-experiment reproduce <ID>
rad-experiment reproduce <ID> --runs 10

# Manual mode: provide your own measurements
rad-experiment reproduce <ID> \
  --baseline-median 1498 --baseline-n 5 \
  --candidate-median 1430 --candidate-n 5 \
  --notes "warm cache, perf governor"
```
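One way to follow the warning above is to make the diff review a required step. The wrapper below is a hypothetical sketch that only combines the show --diff and reproduce invocations documented here; it asks for confirmation on stdin and does not sandbox anything by itself.

```bash
# reproduce_reviewed ID [ARGS...]: show the experiment with its diff, ask for
# confirmation, and only then run rad-experiment reproduce with ARGS.
# Hypothetical wrapper; run it inside a container/VM for real isolation.
reproduce_reviewed() {
  [ $# -ge 1 ] || { echo 'usage: reproduce_reviewed ID [ARGS...]' >&2; return 2; }
  rr_id=$1
  shift
  rad-experiment show "$rr_id" --diff || return 1
  printf 'Run the candidate bench_cmd on this machine? [y/N] ' >&2
  read -r rr_answer || rr_answer=
  case $rr_answer in
    y | Y | yes) rad-experiment reproduce "$rr_id" "$@" ;;
    *) echo 'aborted' >&2; return 1 ;;
  esac
}
```

For example, reproduce_reviewed <ID> --runs 10 prints the diff, waits for a y, and then runs the auto-mode reproduction.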
rad-experiment benchmark is a stateless helper: it runs benchmarks on a worktree and writes JSON to stdout. It does not touch the COB store.

```bash
rad-experiment benchmark \
  --worktree /tmp/repo-base --config autoresearch.yaml \
  --runs 5 --label baseline > /tmp/baseline.json
```
rad-experiment compute-delta is a stateless helper that computes direction-aware deltas from two benchmark JSON files:

```bash
rad-experiment compute-delta \
  --baseline /tmp/baseline.json --candidate /tmp/candidate.json \
  --config autoresearch.yaml \
  --base-commit 9b32764 --head-commit 5574144 \
  --description "Hoist allocation" --pending=false
```
Add or remove labels (delegates only):

```bash
rad-experiment label <ID> shipped
rad-experiment label <ID> reviewed nominated
rad-experiment label <ID> nominated --remove
```
List all labels in use across the repo:

```bash
rad-experiment labels
rad-experiment labels --json
```
Mark an experiment as unreliable. This is not a delete: the experiment still replicates, but it is hidden by default.

```bash
rad-experiment redact <ID>
rad-experiment redact <ID> --reason "benchmark used a stale input dataset"
```
- The write commands (publish, publish-tape, publish-evo, reproduce, label, redact) call announce_refs_for to broadcast new refs to peers.
- publish additionally pins the base and candidate commits under refs/heads/experiments/{oid} via git push rad, so peers receive the actual git objects.
- rad sync

The main workflow for publishing autoresearch results to the community-computer network has two required steps:
Step 1: push the branch. The autoresearch branch must be pushed so peers can access the git objects (commits, diffs):

```bash
git push rad <branch-name>
```

Or push the current branch:

```bash
git push rad HEAD
```

Step 2: publish the tape. After the autoresearch session completes (or at any point during it), publish the kept experiments:

```bash
rad-experiment publish-tape autoresearch.jsonl --yes
```
The complete workflow:

```bash
# 1. Run autoresearch (handled by the autoresearch skill)
#    - Creates autoresearch.md, autoresearch.sh
#    - Runs experiments, keeps improvements, discards regressions
#    - All results logged to autoresearch.jsonl

# 2. Push the branch to Radicle
git push rad autoresearch/optimize-liquid

# 3. Publish the experiment tape
rad-experiment publish-tape autoresearch.jsonl --yes

# 4. Sync with network
rad sync --announce
```
- Only results with status:keep are published
- Discards, crashes, and checks_failed results are skipped
- No autoresearch.yaml is required at the base commit
- publish-tape is idempotent: only new results are published

autoresearch.yaml is the benchmark configuration file at the repository root (optimize.yaml is also accepted for backward compatibility):
```yaml
bench_cmd: ./autoresearch.sh
metrics:
  - name: total_us
    unit: µs
    criteria: lower_is_better
    regex: "METRIC total_µs=(\\d+)"
  - name: compile_µs
    unit: µs
    criteria: lower_is_better
    regex: "METRIC compile_µs=(\\d+)"
```
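To see what a metric regex captures, here is a minimal POSIX shell sketch that pulls one metric out of captured benchmark output and reports the median across runs. It assumes bench_cmd prints lines of the form METRIC <name>=<integer>, as the regexes above expect; metric_median is a hypothetical helper, and sed's basic regex stands in for the \d+ pattern, so this is not how rad-experiment itself parses output.

```bash
# metric_median NAME: read benchmark output on stdin (one METRIC line per run)
# and print the median of the integer values reported for NAME.
# Hypothetical helper; NAME must not contain sed regex metacharacters.
metric_median() {
  sed -n "s/^METRIC $1=\([0-9][0-9]*\).*/\1/p" | sort -n | awk '
    { v[++n] = $1 }
    END {
      if (n == 0) { print "metric not found" > "/dev/stderr"; exit 1 }
      if (n % 2) med = v[(n + 1) / 2]
      else       med = (v[n / 2] + v[n / 2 + 1]) / 2
      print med
    }'
}

# Three runs of a benchmark that prints total_µs
printf 'METRIC total_µs=%s\n' 15310 15180 15240 | metric_median 'total_µs'   # prints 15240
```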
- bench_dir defaults to "bench" if omitted
- build_cmd and test_cmd fields define pre-benchmark steps

Installation:

```bash
# Recommended: prebuilt binary
curl -sSf https://community.computer/install | sh

# Or build from source
rad clone rad:z3trgPnc9KqoFHpZj8KD9s7iX7nwX
cd radicle-experiment
cargo install --path .

# Verify
rad-experiment --version
```
Ensure rad-experiment is on $PATH. Extensions that need it can register { name: "rad-experiment" } with detectTools() from rad-shared.ts.