skills/slot-machine/SKILL.md
Use when a well-specified task has meaningful design choices and you want to maximize quality by comparing multiple independent attempts. Works for coding, writing, and custom task types. Triggers on "slot-machine", "best-of-N", "pull the lever", "parallel implementations", or when quality matters more than speed and the spec is clear enough for independent work.
Best-of-N parallel implementation for any task type.
Run N independent attempts at the same spec in parallel. Review each. Pick the best — or synthesize the best elements into a single winner.
Core principle: LLM output is probabilistic, so more independent attempts raise the odds that at least one is strong. Trade compute for quality.
Announce at start: "I'm using the slot-machine skill ({profile_name} profile) to run N parallel implementations."
Standard multi-agent patterns split DIFFERENT tasks across agents (frontend, backend, tests in parallel). Every major tool does this — it's table stakes.
Slot-machine gives the SAME spec to N agents and compares their FULL attempts. The value isn't parallelism — it's competition and selection. Each slot is an independent attempt at the same task, not a piece of a divided workload. This applies to any task type — coding, writing, or custom profiles.
If you want to split a plan into parallel tasks, use superpowers:dispatching-parallel-agents instead.
```dot
digraph when_to_use {
  "Have a clear spec?" [shape=diamond];
  "Design choices exist?" [shape=diamond];
  "Quality worth the compute?" [shape=diamond];
  "Use slot-machine" [shape=box, style=bold];
  "Write spec first (brainstorm)" [shape=box];
  "Single implementation" [shape=box];
  "Single implementation (mechanical)" [shape=box];
  "Have a clear spec?" -> "Design choices exist?" [label="yes"];
  "Have a clear spec?" -> "Write spec first (brainstorm)" [label="no"];
  "Design choices exist?" -> "Quality worth the compute?" [label="yes"];
  "Design choices exist?" -> "Single implementation (mechanical)" [label="no"];
  "Quality worth the compute?" -> "Use slot-machine" [label="yes"];
  "Quality worth the compute?" -> "Single implementation" [label="no"];
}
```
Use when:
Don't use when:
Project config can live in AGENTS.md, CLAUDE.md, or both; treat them as equal first-class sources. Read whichever exists, or both if both exist. When both exist, merge non-conflicting slot-machine config from both files. The prefixed project keys are slot-machine-profile and slot-machine-slots; the remaining settings from the table below use their bare names (slots, quiet, cleanup, manual_handoff, auto_synthesize, max_retries, approach_hints, and the *_model overrides). If both files define the same key, prefer the file for the active host: AGENTS.md in Codex, CLAUDE.md in Claude. User can override inline (e.g., "slot-machine this with 3 slots").
| Setting | Default | Description |
|---------|---------|-------------|
| slots | 3 | Number of parallel attempts |
| approach_hints | true | Give each slot a different architectural direction |
| auto_synthesize | true | Allow judge to combine elements from multiple slots |
| max_retries | 1 | Re-run failed slots (0 = no retry) |
| manual_handoff | false | Stop after per-slot review and hand reviewed candidates back to the user for manual selection and merge |
| cleanup | true | Delete worktrees after completion |
| quiet | false | Suppress progress tables — only show final verdict + output path. For autonomous loops. |
| implementer_model | inherit | Model for implementer subagents (inherits from session if not set) |
| reviewer_model | inherit | Model for reviewer subagents (inherits from session if not set) |
| judge_model | inherit | Model for judge subagent (inherits from session if not set) |
| synthesizer_model | inherit | Model for synthesizer subagent (inherits from session if not set) |
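Here is a minimal sketch of the AGENTS.md / CLAUDE.md precedence rule, written in the same shell as the setup snippets. It assumes each setting sits on a single key: value line. The names read_key and config_value are hypothetical helpers, and list-valued keys such as slot-machine-slots are not handled here.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of slot-machine itself.
# Assumption: each setting is a single "key: value" line.

# read_key FILE KEY: print the trimmed value of KEY, or return 1 if absent.
read_key() {
  local file=$1 key=$2 line
  [ -f "$file" ] || return 1
  line=$(grep -m 1 -E "^${key}:" "$file") || return 1
  printf '%s\n' "${line#*:}" | sed -E 's/^[[:space:]]+//; s/[[:space:]]+$//'
}

# config_value HOST KEY [DEFAULT]: prefer the active host's file
# (AGENTS.md on Codex, CLAUDE.md on Claude), then the other file, then DEFAULT.
config_value() {
  local host=$1 key=$2 default=${3-} preferred other value
  if [ "$host" = codex ]; then
    preferred=AGENTS.md other=CLAUDE.md
  else
    preferred=CLAUDE.md other=AGENTS.md
  fi
  if value=$(read_key "$preferred" "$key"); then
    printf '%s\n' "$value"
  elif value=$(read_key "$other" "$key"); then
    printf '%s\n' "$value"
  else
    printf '%s\n' "$default"
  fi
}
```

For example, config_value codex slots 3 prints the slots value from AGENTS.md if it is set there, otherwise the value from CLAUDE.md, otherwise 3. Non-conflicting keys merge naturally: a key defined in only one file is read from that file.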
The orchestrator trace is the normalized execution record for the orchestrator itself. It is not a shared slot context channel and must not be repurposed as one; slot-local prompts, outputs, and review artifacts remain separate from this trace.
Read references/orchestrator-trace.md before creating or updating trace/history artifacts.
Read references/harness-execution.md before using Claude or Codex external harness execution paths.
Read references/result-artifacts.md before writing final run artifacts, history pointers, or handoff files.
Core per-run trace artifacts:

- {RUN_DIR}/events.jsonl
- {RUN_DIR}/state.json

Cross-run discovery artifacts:

- .slot-machine/history/active.json
- .slot-machine/history/latest.json
- .slot-machine/history/index.jsonl

Inline contract:

- {RUN_DIR}/events.jsonl is append-only. Every event record must include schema_version, seq, ts, run_id, phase, and event; include slot, attempt, and data when applicable.
- Event names: run_started, phase_entered, artifact_written, slot_dispatched, slot_finished, slot_retry_scheduled, precheck_started, precheck_finished, review_dispatched, review_finished, judge_dispatched, judge_finished, synthesis_dispatched, synthesis_finished, cleanup_started, cleanup_finished, run_finished, run_failed.
- {RUN_DIR}/state.json is the current snapshot and must include "schema_version", "run_id", "status", "current_phase", "run_dir", "events_path", "state_path", "result_path", and "last_event_seq".
- .slot-machine/history/active.json is the pointer/current-run metadata document, not a copy of {RUN_DIR}/state.json. While a run is active it must include "schema_version", "status": "running", "run_id", "run_dir", "events_path", "state_path", "started_at", and "updated_at".
- When no run is active, .slot-machine/history/active.json must be the idle sentinel: {"schema_version": 1, "status": "idle"}
- After each append to {RUN_DIR}/events.jsonl, rewrite {RUN_DIR}/state.json and refresh .slot-machine/history/active.json so it keeps pointing at the current snapshot.
- result.json must carry "events_path" and "state_path" alongside the run artifact fields.
- Write .slot-machine/history/latest.json and append to .slot-machine/history/index.jsonl on every terminal path. Use status: "finished" for judged completion and manual handoff completion. Use status: "failed" for blocked or failed terminal exits while keeping the canonical per-run {RUN_DIR} paths.
- … events.jsonl or state.json.

Maintenance rule: Any change that adds a new orchestration phase, terminal path, retry path, or required artifact must update the trace documentation and trace-aware tests in the same change. SKILL.md and skills/slot-machine/SKILL.md must stay byte-for-byte synchronized after trace changes.
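The append-then-snapshot cycle and the terminal history writes can be sketched as three shell functions. This is an illustration under assumptions, not the trace implementation:

- write_state, emit_event, and finish_run are hypothetical names.
- RUN_DIR, HISTORY_DIR, RUN_ID, and STARTED_AT are assumed to be set as in the setup step.
- Values are assumed to need no JSON escaping.
- The fields written to latest.json and index.jsonl are a guess, so references/orchestrator-trace.md and references/result-artifacts.md stay authoritative.
- Resetting active.json to the idle sentinel when the run ends is assumed.

```bash
#!/usr/bin/env bash
# Hypothetical trace helpers; not the skill's own implementation.
# Assumes RUN_DIR, HISTORY_DIR, RUN_ID and STARTED_AT are set and that no
# value needs JSON escaping.

# write_state STATUS PHASE SEQ: rewrite the current snapshot.
write_state() {
  mkdir -p "$RUN_DIR"
  cat > "$RUN_DIR/state.json" <<JSON
{"schema_version": 1, "run_id": "$RUN_ID", "status": "$1", "current_phase": "$2", "run_dir": "$RUN_DIR", "events_path": "$RUN_DIR/events.jsonl", "state_path": "$RUN_DIR/state.json", "result_path": "$RUN_DIR/result.json", "last_event_seq": $3}
JSON
}

# emit_event PHASE EVENT [EXTRA]: EXTRA holds optional top-level fields,
# for example '"slot": 1, "attempt": 1'.
emit_event() {
  local phase=$1 event=$2 extra=${3-} events="$RUN_DIR/events.jsonl" seq=1 ts record
  mkdir -p "$RUN_DIR" "$HISTORY_DIR"
  if [ -f "$events" ]; then
    seq=$(( $(wc -l < "$events") + 1 ))  # one past the last appended record
  fi
  ts=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  record="\"schema_version\": 1, \"seq\": $seq, \"ts\": \"$ts\", \"run_id\": \"$RUN_ID\", \"phase\": \"$phase\", \"event\": \"$event\""
  if [ -n "$extra" ]; then
    record="$record, $extra"
  fi
  printf '{%s}\n' "$record" >> "$events"   # append-only
  write_state running "$phase" "$seq"       # snapshot after every append
  cat > "$HISTORY_DIR/active.json" <<JSON
{"schema_version": 1, "status": "running", "run_id": "$RUN_ID", "run_dir": "$RUN_DIR", "events_path": "$events", "state_path": "$RUN_DIR/state.json", "started_at": "$STARTED_AT", "updated_at": "$ts"}
JSON
}

# finish_run STATUS: STATUS is "finished" or "failed".
finish_run() {
  local status=$1 seq pointer
  if [ "$status" = finished ]; then
    emit_event finalization run_finished
  else
    emit_event finalization run_failed
  fi
  seq=$(wc -l < "$RUN_DIR/events.jsonl")
  write_state "$status" finalization $((seq))
  pointer="{\"schema_version\": 1, \"status\": \"$status\", \"run_id\": \"$RUN_ID\", \"run_dir\": \"$RUN_DIR\", \"events_path\": \"$RUN_DIR/events.jsonl\", \"state_path\": \"$RUN_DIR/state.json\", \"result_path\": \"$RUN_DIR/result.json\"}"
  printf '%s\n' "$pointer" > "$HISTORY_DIR/latest.json"    # latest terminal run
  printf '%s\n' "$pointer" >> "$HISTORY_DIR/index.jsonl"   # one line per run
  printf '%s\n' '{"schema_version": 1, "status": "idle"}' > "$HISTORY_DIR/active.json"
}
```

Deriving seq from the line count of events.jsonl keeps it monotonic only while a single writer appends to the file. That holds when the orchestrator alone emits trace events.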
Profiles define the task-specific content for a slot-machine run: approach hints, agent prompts, isolation strategy, and pre-check commands. SKILL.md is a domain-agnostic orchestration engine — all task-specific content comes from the active profile.
Profile resolution order:

1. --profile X or profile: X in the user's command
2. AGENTS.md, CLAUDE.md, or both set slot-machine-profile: X. Treat them as equal first-class sources and read whichever exists, or both if both exist. Merge non-conflicting slot-machine config from both files; if both define the same key, prefer the active host file: AGENTS.md in Codex, CLAUDE.md in Claude.
3. ./profiles/ folders in the project
4. ~/.slot-machine/profiles/ (community or personal profiles)
5. profiles/ in the slot-machine skill directory (the built-in profiles)
6. coding (default)

Resolve profile names against exact roots. Do not rely on open-ended recursive globs as the primary lookup path.
Exact roots:

- PROJECT_PROFILE_ROOT="$PWD/profiles"
- USER_PROFILE_ROOT="$HOME/.slot-machine/profiles"
- SKILL_PROFILE_ROOT="profiles/" under the physical slot-machine skill directory
- REAL_SKILL_DIR=$(cd "{skill_dir}" 2>/dev/null && pwd -P), the canonicalized physical skill directory

On Claude-hosted runs, when you need the installed skill directory and only have the installed path, canonicalize that installed path the same way before using Glob or Read. Built-in profile discovery must work whether the installed skill directory is a real directory or a symlink. Preserve REAL_SKILL_DIR for all built-in assets, not just profiles: the supported Codex runtime helper lives at "$REAL_SKILL_DIR/scripts/codex-slot-runner.py".

- For a named profile X, check exact directories in precedence order: PROJECT_PROFILE_ROOT/X, then USER_PROFILE_ROOT/X, then REAL_SKILL_DIR/profiles/X.
- If extends: X appears, rerun the same named-profile lookup for X. Do not assume the base profile lives beside the extending profile.
- Use Read on exact numbered files once you know the resolved directory. If you need to enumerate built-in files inside the canonicalized skill directory, scope the lookup to the physical path, for example:
```bash
find -L "$REAL_SKILL_DIR/profiles/$PROFILE_NAME" -maxdepth 1 -type f -name '*.md' | sort
```
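The exact-root lookup can be sketched as two shell functions. The sketch rests on three assumptions:

- resolve_profile_dir and profile_extends are hypothetical names.
- The three roots are already set, with REAL_SKILL_DIR canonicalized through pwd -P.
- extends: appears in 0-profile.md as an unquoted extends: name line.

```bash
#!/usr/bin/env bash
# Hypothetical helpers. Assumes PROJECT_PROFILE_ROOT, USER_PROFILE_ROOT and
# REAL_SKILL_DIR are set; REAL_SKILL_DIR must already be canonicalized.

# resolve_profile_dir NAME: print the first existing profile directory,
# checking exact roots in precedence order. No recursive globbing.
resolve_profile_dir() {
  local name=$1 root
  for root in "$PROJECT_PROFILE_ROOT" "$USER_PROFILE_ROOT" "$REAL_SKILL_DIR/profiles"; do
    if [ -d "$root/$name" ]; then
      printf '%s\n' "$root/$name"
      return 0
    fi
  done
  return 1
}

# profile_extends DIR: print the base profile named by "extends:", if any.
profile_extends() {
  if [ -f "$1/0-profile.md" ]; then
    sed -n 's/^extends:[[:space:]]*//p' "$1/0-profile.md" | head -n 1
  fi
}
```

For an extending profile, call resolve_profile_dir again on the name profile_extends prints. Do not look beside the extending profile.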
Instead of find, a scoped Glob inside REAL_SKILL_DIR/profiles/$PROFILE_NAME is also acceptable after canonicalization. Do not use open-ended `**/profiles/...` fallbacks.

- If the active profile has extends: X, resolve base profile X using the deterministic procedure above, then read base profile X first.
- When you encounter extends: X, resolve X immediately using the same precedence order. Built-in coding and writing profiles at the skill layer are valid base fallbacks.
- The effective file set is 0-profile.md, 1-implementer.md, 2-reviewer.md, 3-judge.md, and 4-synthesizer.md. For each file, use the extending profile's copy if present; otherwise use the resolved base profile's file.
- If resolution fails, report BLOCKED and explain the missing profile or file. Do not continue setup-time discovery loops, repeated directory scans, or repeated config reads.

If the selected profile or any inherited base cannot be resolved:

1. Write the blocked-mode {RUN_DIR}/result.json (see Result artifact).
2. Update .slot-machine/runs/latest to point to that run directory.
3. Report BLOCKED with the missing profile/base and stop before slot parsing or dispatch.

SKILL.md injects these variables into ALL profile prompts. If a variable isn't relevant for the active profile (e.g., {{PRE_CHECK_RESULTS}} for writing), pass an empty string.
| Variable | Description |
|----------|-------------|
| {{SPEC}} | Full text of the spec/brief |
| {{APPROACH_HINT}} | The hint assigned to this slot |
| {{PROJECT_CONTEXT}} | README, architecture notes, AGENTS.md / CLAUDE.md conventions, reference materials |
| {{SLOT_NUMBER}} | This slot's number |
| {{PRE_CHECK_RESULTS}} | Output from pre-check commands (empty string if pre_checks is null) |
| {{IMPLEMENTER_REPORT}} | The implementer's status report |
| {{WORKTREE_PATH}} | Path to this slot's worktree or output file |
| {{ALL_SCORECARDS}} | All reviewer scorecards concatenated |
| {{WORKTREE_PATHS}} | List of all slot worktree/output paths |
| {{SLOT_COUNT}} | Number of successful slots |
| {{SYNTHESIS_PLAN}} | The judge's synthesis plan |
| {{BASE_SLOT_PATH}} | The worktree/output path of the base slot |
| {{APPROACH_HINT_USED}} | The approach hint given to the implementer (used in reviewer context) |
| {{TEST_COMMAND}} | How to run the test suite (empty string if not applicable) |
When filling {{TEST_COMMAND}} for Python repos, prefer python3 -m pytest ... unless the project already standardizes on another command. Do not assume a bare python executable exists.
Each slot can be configured individually instead of all slots using the same profile implementer. Two axes compose with +:

- Skill (e.g., /superpowers:test-driven-development, $superpowers:test-driven-development, /ce:work): methodology guidance. Accept both Claude-style / and Codex-style $ prefixes at parse time. The skill is injected into the prompt of whatever harness runs the slot.
- Harness (claude, codex, gemini): which AI system executes. No skill prefix. Determines the dispatch mechanism.
- Joining a skill and a harness with + composes them. default means profile implementer + approach hint.

Project config: read slot-machine-profile and slot-machine-slots from AGENTS.md, CLAUDE.md, or both if both exist. Merge non-conflicting slot-machine config from both files; if both define the same key, prefer the active host file (AGENTS.md in Codex, CLAUDE.md in Claude). For example:
```yaml
slot-machine-profile: coding
slot-machine-slots:
  - /superpowers:test-driven-development
  - $superpowers:test-driven-development + codex
  - claude
  - codex
  - default
```
Each entry normalizes to a (normalized_skill, harness) pair:

- default → (null, null): profile implementer + hint
- /superpowers:test-driven-development → ("superpowers:test-driven-development", null): skill-only slot
- $superpowers:test-driven-development → ("superpowers:test-driven-development", null): same normalized skill-only slot
- claude → (null, "claude"): Claude harness with generic prompt
- codex → (null, "codex"): Codex harness with generic prompt
- /superpowers:test-driven-development + codex → ("superpowers:test-driven-development", "codex"): Codex with skill
- $superpowers:test-driven-development + codex → ("superpowers:test-driven-development", "codex"): same normalized Codex-with-skill slot

Slot definitions accept both Claude-style /skill-name and Codex-style $skill-name syntax. Normalize each parsed skill internally to a host-neutral skill reference with no leading sigil, such as superpowers:test-driven-development.
When dispatching that host-neutral skill reference to a harness, translate it into the harness-native syntax:
- Claude: superpowers:test-driven-development → /superpowers:test-driven-development
- Codex: superpowers:test-driven-development → $superpowers:test-driven-development

The skill is invoked natively by the target harness: Claude and Codex load their own copy of the skill, not a text summary. The user is responsible for ensuring the skill is installed on the target harness.
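The normalization and translation rules fit in a few lines of shell. This is a sketch that assumes one slot entry per string. The names parse_slot, native_skill, and trim are hypothetical. The (normalized_skill, harness) pair is printed as skill|harness, with an empty field standing in for null.

```bash
#!/usr/bin/env bash
# Hypothetical helpers. A slot entry becomes "skill|harness"; an empty field
# stands for null, so "default" becomes "|".

trim() { printf '%s\n' "$1" | sed -E 's/^[[:space:]]+//; s/[[:space:]]+$//'; }

parse_slot() {  # parse_slot ENTRY
  local entry skill="" harness="" part
  local -a parts
  entry=$(trim "$1")
  if [ "$entry" != default ]; then
    IFS='+' read -r -a parts <<<"$entry"   # the two axes compose with "+"
    for part in "${parts[@]}"; do
      part=$(trim "$part")
      case $part in
        /*|\$*) skill=${part#?} ;;            # drop the leading / or $ sigil
        claude|codex|gemini) harness=$part ;;
        *) echo "unrecognized slot part: $part" >&2 ;;
      esac
    done
  fi
  printf '%s|%s\n' "$skill" "$harness"
}

# native_skill SKILL HOST: translate a host-neutral skill reference.
native_skill() {
  case $2 in
    claude) printf '/%s\n' "$1" ;;
    codex) printf '$%s\n' "$1" ;;
  esac
}
```

For example, parse_slot '$superpowers:test-driven-development + codex' prints superpowers:test-driven-development|codex, and native_skill superpowers:test-driven-development claude prints /superpowers:test-driven-development.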
Approach hints only apply to default slots. Skill-based slots do NOT get approach hints — the skill IS the diversity mechanism. When mixing skill and default slots, assign hints only to the default slots.
Apply this warning after normalizing the parsed skill name to its host-neutral form. If the normalized skill name matches a known multi-agent orchestrator (superpowers:subagent-driven-development, superpowers:executing-plans), warn the user: "⚠ {skill} is a multi-agent orchestrator — running it inside a slot creates nested pipelines (slower, redundant review). Consider using a single-session skill like /superpowers:test-driven-development instead." Do not block — the user may have a reason.
When the user says "all my skills", "all implementation skills", or uses --discover, the orchestrator scans for available slot-compatible skills and proposes a slot configuration.
| User says | Discovery fires? |
|-----------|-----------------|
| /slot-machine this | No — default profile + hints |
| /slot-machine this with 3 slots | No — default hints |
| /slot-machine this with /superpowers:test-driven-development and codex | No — explicit list |
| /slot-machine this with all my skills | Yes |
| /slot-machine this using all implementation skills | Yes |
| /slot-machine --discover | Yes |
Discovery ONLY fires on explicit "all my/implementation skills" language or --discover. Never as a suggestion. Never as a default.
When discovery fires:

1. Scan installed skills, skipping known multi-agent orchestrators: superpowers:subagent-driven-development, superpowers:executing-plans
2. Check external harnesses: which codex, which gemini via Bash
3. Present the proposal, for example:

```
I scanned your installed skills and detected these slot-compatible workflows:
1. /superpowers:test-driven-development — test-first development
2. /ce:work — pattern-matching execution
3. codex — OpenAI Codex (external harness)

Use all 3 as slots? Or adjust?
```
User confirms or edits. Save selection to ~/.slot-machine/config.md:
```markdown
## Discovered Implementation Skills
- /superpowers:test-driven-development
- /ce:work
- codex
```
"All my skills" loads the saved list without re-scanning. User can re-trigger a fresh scan with --discover.
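Loading the saved list without re-scanning might look like the sketch below. It assumes ~/.slot-machine/config.md keeps the ## Discovered Implementation Skills heading with one - entry per line, exactly as shown. The name load_discovered_skills is hypothetical, and the config path is a parameter so the function can be tested.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the saved slot entries, one per line, from the
# "## Discovered Implementation Skills" section; return 1 if none are saved.
load_discovered_skills() {  # load_discovered_skills [CONFIG]
  local config=${1:-$HOME/.slot-machine/config.md} entries
  [ -f "$config" ] || return 1
  entries=$(awk '
    /^## / { in_section = ($0 == "## Discovered Implementation Skills"); next }
    in_section && /^- / { sub(/^- /, ""); print }
  ' "$config")
  [ -n "$entries" ] || return 1
  printf '%s\n' "$entries"
}
```

A non-zero return means nothing is saved yet. One reasonable response is a fresh scan, the same path --discover takes.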
Create run directory. Create the run storage directory and add .slot-machine/ to .gitignore if not already present:
```bash
RUN_DIR_REL=".slot-machine/runs/$(date +%Y-%m-%d)-{feature_slug}"
RUN_DIR="$PWD/$RUN_DIR_REL"
TRACE_EVENTS_FILE="$RUN_DIR/events.jsonl"
TRACE_STATE_FILE="$RUN_DIR/state.json"
HISTORY_DIR=".slot-machine/history"
ACTIVE_TRACE_FILE="$HISTORY_DIR/active.json"
LATEST_TRACE_FILE="$HISTORY_DIR/latest.json"
HISTORY_INDEX_FILE="$HISTORY_DIR/index.jsonl"
mkdir -p "$RUN_DIR" "$HISTORY_DIR"
# Add .slot-machine/ to .gitignore unless that exact line is already present.
grep -qxF '.slot-machine/' .gitignore 2>/dev/null || echo '.slot-machine/' >> .gitignore
```
Immediately bootstrap the run snapshot and active-run pointer metadata:
```bash
cat > "$TRACE_STATE_FILE" << JSON
{
  "schema_version": 1,
  "run_id": "{run_id}",
  "status": "running",
  "current_phase": "setup",
  "run_dir": "$RUN_DIR",
  "events_path": "$TRACE_EVENTS_FILE",
  "state_path": "$TRACE_STATE_FILE",
  "result_path": "$RUN_DIR/result.json",
  "last_event_seq": 0
}
JSON
cat > "$ACTIVE_TRACE_FILE" << JSON
{
  "schema_version": 1,
  "status": "running",
  "run_id": "{run_id}",
  "run_dir": "$RUN_DIR",
  "events_path": "$TRACE_EVENTS_FILE",
  "state_path": "$TRACE_STATE_FILE",
  "started_at": "{started_at_iso8601}",
  "updated_at": "{started_at_iso8601}"
}
JSON
```
Persist RUN_DIR as the absolute path for this run. All review, verdict, and result artifacts must be written via that absolute path, not a cwd-relative redirect. Before every artifact write later in the run, re-run mkdir -p "$RUN_DIR" so artifact persistence never depends on shell state.
All artifacts from this run will be saved to {RUN_DIR}/.
This is the first setup checkpoint. Create RUN_DIR before profile resolution, further exploratory reads, repeated profile scans, or slot-introspection loops. If you cannot create it, emit BLOCKED with the missing prerequisite instead of continuing setup-time introspection.
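The absolute-path rule can be wrapped in a small helper. This is only a sketch: write_artifact is a hypothetical name, and it assumes artifact content arrives on stdin.

```bash
#!/usr/bin/env bash
# Hypothetical helper: persist stdin as an artifact under the absolute RUN_DIR.
# Re-creates RUN_DIR first so the write never depends on earlier shell state.
write_artifact() {  # write_artifact NAME < CONTENT
  case $RUN_DIR in
    /*) ;;
    *) echo "RUN_DIR must be absolute: $RUN_DIR" >&2; return 1 ;;
  esac
  mkdir -p "$RUN_DIR"
  cat > "$RUN_DIR/$1"
  printf '%s\n' "$RUN_DIR/$1"   # print the path, e.g. for an artifact_written event
}
```

For example, printf '%s\n' "$verdict" | write_artifact verdict.md writes the file and prints its absolute path. An artifact_written trace event could then record that path.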
Load profile. Follow the Profile Loading section to determine the active profile name, resolve its directory once, and read 0-profile.md for config. If the active profile has extends: X, resolve the base profile immediately and build the effective file set (0-profile.md through 4-synthesizer.md) before moving on. Do not keep scanning profile directories or re-reading config once those effective files are known. If resolution fails at any point, write the blocked-mode {RUN_DIR}/result.json, refresh .slot-machine/runs/latest, report BLOCKED with the exact missing profile or file, and stop before slot parsing or dispatch. Report to user: "Using profile: {profile_name}" and, if applicable, "inherits: {base_profile_name}".
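Building the effective file set takes one loop over the five profile files. This sketch makes three assumptions:

- effective_profile_files is a hypothetical name.
- Both directories were already resolved with the named-profile lookup.
- A missing file is reported on stderr so the caller can write the blocked-mode result.

```bash
#!/usr/bin/env bash
# Hypothetical helper: for each of the five profile files, prefer the
# extending profile's copy and fall back to the resolved base profile.
effective_profile_files() {  # effective_profile_files EXT_DIR [BASE_DIR]
  local ext=$1 base=${2-} f
  for f in 0-profile.md 1-implementer.md 2-reviewer.md 3-judge.md 4-synthesizer.md; do
    if [ -f "$ext/$f" ]; then
      printf '%s\n' "$ext/$f"
    elif [ -n "$base" ] && [ -f "$base/$f" ]; then
      printf '%s\n' "$base/$f"
    else
      echo "BLOCKED: missing $f for profile at $ext" >&2
      return 1
    fi
  done
}
```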
Parse slot definitions. Check for slot definitions in precedence order: (1) inline in the user's command; (2) slot-machine-slots in AGENTS.md, CLAUDE.md, or both, using the same rules as the profile key (equal first-class sources, non-conflicting config merged, the active host file wins when both define the same key); (3) the profile defaults. Record the slot list: each slot is (normalized_skill, harness) or default. Check harness availability (see below).
Check harness availability and detect model. For each slot that specifies a harness:
- codex: Run which codex via Bash. If not found, warn: 'Codex CLI not found — slot {i} will fall back to the native host path. Install: npm install -g @openai/codex'. Change the slot's harness to null so it falls back to the native host with the same skill guidance, if any. If found, read the Codex model version from ~/.codex/config.toml (the model = "..." line). Record this as the slot's model identifier (e.g., gpt-5.4).
- claude: Run which claude via Bash and record the reported or configured Claude model identifier if available; otherwise record unknown. For native Claude host rows in the execution matrix, that is enough because the slot stays on the host-native path. For explicit Claude slots that use the external Claude harness path, do not silently fall back: dispatch the slot through claude -p and normalize the actual command outcome per slot.

For slots without a harness, record the session model (e.g., claude-opus-4-6) or the configured implementer_model override.

Validate the spec. The spec (plan, requirements doc, or inline description) must be concrete enough for independent attempts. If it is ambiguous, stop and ask for clarification before spending compute.
Red flags that mean "not ready":
Gather project context. Collect what implementers need:
Keep context focused — don't dump everything. Implementers should get just enough to orient themselves.
Prepare isolation. Check the profile's isolation field:
worktree: The project MUST be a git repository with at least one commit before Phase 2 can create worktrees. If the directory is not a git repo or has no commits:
```bash
git init && git add -A && git commit -m "initial commit"
```
Without this, isolation: "worktree" on Agent calls will fail and agents will not get isolated workspaces.
Record the original checkout before dispatching any slots so Phase 4 can restore it if needed:
```bash
ORIGINAL_HEAD=$(git rev-parse HEAD)
ORIGINAL_BRANCH=$(git symbolic-ref --short -q HEAD || true)
```
file: No git repo required. Each slot will write its output to {RUN_DIR}/slot-{i}.md.

Run pre-checks (if configured). Read the active profile's 0-profile.md frontmatter for the pre_checks field.

- null → skip this step.
- Otherwise, run each command, replacing {test_command} with the detected test command. These establish the baseline. If baseline checks fail, stop and fix first.

Assign approach hints. If approach_hints is enabled, read hints from the active profile's 0-profile.md. Randomly assign one hint per default slot (without replacement); skill-based slots get no hint. Each hint steers toward a different approach, and the profile defines what diversity means for this task type.
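Random assignment without replacement amounts to shuffling the hint list and dealing hints out in order. This sketch assumes the hints arrive as arguments and that the caller passes the number of default slots as COUNT. The name assign_hints is hypothetical. It uses bash's $RANDOM, which is adequate for spreading approaches but is not cryptographically random. The source does not say what happens when default slots outnumber hints, so this sketch treats that as an error.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print COUNT distinct hints, one per line, chosen at
# random without replacement (Fisher-Yates shuffle using bash's $RANDOM).
assign_hints() {  # assign_hints COUNT HINT...
  local count=$1 i j tmp
  shift
  local -a hints=("$@")
  if [ "$count" -gt "${#hints[@]}" ]; then
    # Assumption: more default slots than hints is treated as an error.
    echo "need $count hints, profile defines ${#hints[@]}" >&2
    return 1
  fi
  for (( i = ${#hints[@]} - 1; i > 0; i-- )); do
    j=$(( RANDOM % (i + 1) ))
    tmp=${hints[i]}; hints[i]=${hints[j]}; hints[j]=$tmp
  done
  printf '%s\n' "${hints[@]:0:count}"
}
```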
Report setup to user using this format (top-level markdown, not inside a code block):
Slot Machine — {profile_name} profile
Feature: {feature_name}
Slots: {N} | /ce:work (claude-opus-4-6), /ce:work + codex (gpt-5.4), codex (gpt-5.4), 2x default hints (claude-opus-4-6)
When all slots use profile defaults (no slot definitions):
Slots: {N} | Hints: {hint_1}, {hint_2}, ...
Formatting rules for ALL orchestrator output (apply throughout Phases 1-4):
- … DONE, PASS, FAIL, HIGH, MEDIUM, LOW
- … (---); no blockquotes (they render as dim italics in terminals)
- … (#) for the Final Output header only

Dispatch all N slots in a SINGLE parallel wave from a SINGLE message. Group 1 native-host slots run through the host's native execution path in the assigned isolated slot workspace. Group 2 external-harness slots launch external CLI processes in parallel using the active profile isolation. This is critical: start the full wave together for true parallel execution.
Orchestrator trace emission rules for this phase and the downstream lifecycle:
- Emit phase_entered when moving into setup, implementation, review, judgment, synthesis, manual_handoff, cleanup, or finalization.
- Emit slot_dispatched, slot_finished, and slot_retry_scheduled for slot lifecycle changes. slot_finished must record one of DONE, DONE_WITH_CONCERNS, BLOCKED, or NEEDS_CONTEXT.
- Emit precheck_started and precheck_finished around required precheck commands.
- Emit review_dispatched and review_finished for reviewer lifecycle changes.
- Emit judge_dispatched and judge_finished for judge lifecycle changes.
- Emit synthesis_dispatched and synthesis_finished for synthesis lifecycle changes.
- Emit cleanup_started and cleanup_finished for cleanup lifecycle changes.
- Emit artifact_written immediately after each required artifact write completes.
- Emit run_finished on successful terminal completion, or run_failed on terminal failure.
- … events.jsonl or state.json.

Use this execution matrix to choose the path per slot:
| Active host | Slot harness | Execution path |
|-------------|--------------|----------------|
| Claude | Claude | Native Claude orchestration/subagent path |
| Claude | Codex | Profile-isolated slot workspace + codex exec |
| Codex | Codex | Native Codex slot workspace + codex exec |
| Codex | Claude | Profile-isolated slot workspace + claude -p |
Group 1 — Native-host slots: slots whose harness_ref is empty or matches the active host. Dispatch them through the host's native execution path in the assigned isolated slot workspace.
Group 2 — External-harness slots: slots whose harness_ref names the other CLI. If profile isolation is worktree, create one worktree per slot and launch one external CLI process per worktree. If profile isolation is file, create one per-slot run directory/output target under {RUN_DIR} and launch one external CLI process there, telling it where to write the output file.
Host-relative routing is deterministic: claude on Claude and codex on Codex are Group 1 native-host slots; claude on Codex and codex on Claude are Group 2 external-harness slots. Apply the same rule to skill + harness slots.
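The host-relative routing rule reduces to a single comparison plus a case statement. This sketch assumes harness values are already normalized: claude, codex, or empty for no harness. The name route_slot is hypothetical. The execution matrix has no gemini row, so the sketch rejects that combination rather than guess a path.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the dispatch group for a slot.
#   group1 = native-host path, group2 = external CLI in a per-slot workspace.
route_slot() {  # route_slot ACTIVE_HOST SLOT_HARNESS
  local host=$1 harness=${2-}
  if [ -z "$harness" ] || [ "$harness" = "$host" ]; then
    echo group1
    return 0
  fi
  case "$host:$harness" in
    claude:codex|codex:claude) echo group2 ;;
    *) echo "no execution path for harness '$harness' on host '$host'" >&2; return 1 ;;
  esac
}
```

Skill + harness slots route on the harness part alone. For example, /superpowers:test-driven-development + codex on a Claude host is group2.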
Never launch Codex slots as background Bash jobs from the orchestrator. The Codex wrapper agent must wait for codex exec to finish, harvest the final report or synthesize one from post-run inspection, and only then return control to the review/judge pipeline.
Path A — Native-host default slots (no skill, no harness):
Unchanged from Phase 1. Read 1-implementer.md from the active profile, fill universal {{VARIABLES}}, include the assigned approach hint, and dispatch through the native-host implementer path. On Claude, the native-host implementer path is an Agent tool call with isolation: "worktree" (or omitted for file). On Codex, the native-host implementer path uses the assigned isolated slot workspace and the same prompt contract.
For each slot i (1 to N), use this native-host dispatch contract:
| Parameter | Value |
|-----------|-------|
| description | "Slot {i}: Implement {feature_name}" |
| isolation | "worktree" if profile isolation is worktree; omit if file |
| model | Omit unless user configured implementer_model — inherits from session by default |
| prompt | Read 1-implementer.md from the active profile's folder and fill in all universal {{VARIABLES}} |
The universal variables to fill in the implementer prompt:
| Variable | Source |
|----------|--------|
| {{SPEC}} | Full text of the spec — paste it, don't make the subagent read a file |
| {{APPROACH_HINT}} | The hint assigned to this slot (or omit section if hints disabled) |
| {{PROJECT_CONTEXT}} | README, architecture notes, AGENTS.md / CLAUDE.md conventions, reference materials gathered in Phase 1. Include any user-specified skill guidance. |
| {{TEST_COMMAND}} | How to run the test suite (empty string if not applicable) |
For Python projects, prefer python3 -m pytest ... unless the repo already provides an explicit test command. Do not invent python -m pytest on systems that only guarantee python3.
For file isolation: Each slot writes its output to {RUN_DIR}/slot-{i}.md. Include this path in the prompt so the implementer knows where to write. No worktrees, no git branches.
For worktree isolation — worktree fallback: If isolation: "worktree" fails (e.g., git repo not detected, permission issues), fall back to manual worktree creation:
```bash
mkdir -p .slot-machine/worktrees
for i in $(seq 1 "$N"); do
  # One branch and worktree per slot; the slug keeps the branch name free of spaces.
  git worktree add ".slot-machine/worktrees/slot-$i" -b "slot-machine/{feature_slug}/slot-$i"
done
```
Then dispatch implementers WITHOUT isolation: "worktree", pointing each to its worktree directory. Track worktree paths manually for cleanup in Phase 4. For worktree isolation, save each slot's diff to {RUN_DIR}/slot-{i}.diff before cleanup.
Path B — Native-host skill-only slots (e.g., /superpowers:test-driven-development, no harness):
Do NOT read the profile's 1-implementer.md. Dispatch through the native-host implementer path with this prompt:
```
You are implementing a feature in an isolated workspace.

IMPORTANT: You MUST invoke the normalized host-neutral skill reference `{skill_name}` using the active host's native skill mechanism before beginning implementation. On Claude, translate it to Claude skill syntax and invoke it through Claude's native skill flow. On Codex, translate it to Codex skill syntax and invoke it through the native Codex path. Follow its workflow exactly.

Specification:
{spec}

Project Context:
{project_context}

After implementation is complete, end with this report format:
**Status:** [DONE | DONE_WITH_CONCERNS | BLOCKED | NEEDS_CONTEXT]
**What I implemented:** [bullet list]
**Files changed:** [list]
**Test results:** [if applicable]
**Concerns (if any):** [issues]
```
On Claude, this is an Agent tool call with isolation: "worktree" for worktree profiles, or with isolation omitted for file profiles and an explicit {RUN_DIR}/slot-{i}.md destination in the prompt; translate superpowers:test-driven-development to /superpowers:test-driven-development and invoke it through Claude's native skill flow. On Codex, follow the same native-host slot workspace contract and active profile isolation; translate superpowers:test-driven-development to $superpowers:test-driven-development and invoke it through the native Codex path. Do NOT include an approach hint — the skill is the diversity mechanism.
Path C — Native Claude harness slots (harness = claude, active host = Claude):
These are Group 1 native-host slots. Use the native Claude implementation path in the assigned isolated slot workspace. On Claude, that means an Agent tool call with the same generic spec prompt contract used for explicit harness slots. For skill + claude slots, translate the normalized skill to Claude syntax such as /superpowers:test-driven-development.
Path D — Native Codex harness slots (harness = codex, active host = Codex):
These are Group 1 native-host slots. Use the shared Codex slot runtime helper in the assigned isolated slot workspace: "$REAL_SKILL_DIR/scripts/codex-slot-runner.py". The helper is the supported Codex execution path for slot-machine. It shells out to codex exec in the current slot workspace, captures codex-events.jsonl and codex-stderr.txt, writes codex-slot-report.md plus codex-slot-result.json, and records the Codex thread_id for artifacts or manual resume. For skill + codex slots, translate the normalized skill to Codex syntax such as $superpowers:test-driven-development, write the generic Codex prompt to codex-prompt.txt, and invoke the same helper-backed contract described in Path F.
Never launch Codex slots as background Bash jobs. Wait for codex exec to finish and return a normal implementer report before reviewers or the judge can run.
Path E — External Claude harness slots (harness = claude, active host = Codex):
Follow the active profile isolation. If isolation is worktree, create one worktree per slot, cd into that worktree, and launch claude -p directly. If isolation is file, create a per-slot directory under {RUN_DIR}, launch claude -p there, and tell it exactly which {RUN_DIR}/slot-{i}.md file to write. Do not wrap this in a native subagent.
Before using Claude or Codex external harness execution paths, read references/harness-execution.md.
Do not silently fall back to the native host path or to Codex for explicit Claude slots. Launch the configured claude -p command directly and let the slot outcome reflect the real external Claude execution result.
The detailed external Claude template lives in references/harness-execution.md. Use that exact claude -p --output-format stream-json contract, including the isolation-specific write target wording and the standard implementer report shape.
Failure normalization for external Claude runs:
- … BLOCKED.
- Not logged in · Please run /login → normalize that slot to BLOCKED with setup guidance.
- … (such as session-env errors) → normalize that slot to BLOCKED.
- … stream-json output → normalize to BLOCKED with the failure details attached.

Native skill prefix translation for external Claude:

- Plain claude slots need no skill translation.
- For skill + claude slots, translate the host-neutral skill reference to Claude syntax, for example superpowers:test-driven-development → /superpowers:test-driven-development.

Path F — External Codex harness slots (harness = codex, active host = Claude):
Follow the active profile isolation. If isolation is worktree, create one worktree per slot, cd into that worktree, and invoke the shared Codex slot runtime helper from that workspace. If isolation is file, create a per-slot directory under {RUN_DIR}, invoke the same helper there, and tell it exactly which {RUN_DIR}/slot-{i}.md file to write via --expected-output-path. Do not re-implement Codex JSON parsing inline; the helper is the supported runtime path.
Before using Claude or Codex external harness execution paths, read references/harness-execution.md.
The helper saves the raw --json JSONL stream to codex-events.jsonl, the stderr stream to codex-stderr.txt, and a normalized result/report pair to codex-slot-result.json and codex-slot-report.md. Do not assume a single event mix — current Codex runs may expose item.completed, turn.completed, or both. Never launch Codex slots as background Bash jobs; wait for codex exec to finish before reviewers or the judge can run.
The detailed helper-backed Codex template lives in references/harness-execution.md. Use that exact codex-prompt.txt plus codex-slot-runner.py contract, including the --prompt-file codex-prompt.txt, --sandbox workspace-write, and file-isolation --expected-output-path handling.
Post-run normalization contract for Codex slots:
- codex-slot-result.json is the source of truth. Do not re-parse codex-events.jsonl yourself unless the helper failed before writing the normalized result.
- If the result has status: DONE or DONE_WITH_CONCERNS, use codex-slot-report.md as the authoritative implementer report.
- If the result has status: BLOCKED, normalize the slot to BLOCKED with the helper's failure_reason.
- Record the thread_id, raw log paths, observed commands, and changed-file list. Persist that metadata into per-slot artifacts and result.json so the user can inspect or resume Codex work later.
- Run git status --short --untracked-files=all to build the changed-file list.
- Status: DONE.
- Status: DONE_WITH_CONCERNS.
- turn.completed without a structured agent_message report.
- If codex exec exits zero but there is no structured agent message and no meaningful workspace output from post-run inspection, normalize the slot to BLOCKED.

Native skill prefix translation for external Codex:
- codex slots.
- skill + codex slots, translate the host-neutral skill reference to Codex syntax, for example superpowers:test-driven-development → $superpowers:test-driven-development.

| Slot definition | Dispatch | Prompt | Isolation | Hint? |
|----------------|----------|--------|-----------|-------|
| default | Native host subagent path | Profile 1-implementer.md + hint | Profile setting | Yes |
| /superpowers:test-driven-development | Native host execution path | "Invoke normalized skill via host-native skill mechanism" + spec | Profile setting (worktree or file) | No |
| claude | Native Claude path on Claude; external Claude harness on Codex | Native-host generic spec prompt or claude -p with spec | Profile setting (worktree or file) | No |
| /superpowers:test-driven-development + claude | Native Claude path on Claude; external Claude harness on Codex | Native-host skill prompt or claude -p with /superpowers:test-driven-development | Profile setting (worktree or file) | No |
| codex | Native Codex path on Codex; external Codex harness on Claude | codex exec with spec | Profile setting (worktree or file) | No |
| /superpowers:test-driven-development + codex | Native Codex path on Codex; external Codex harness on Claude | codex exec with $superpowers:test-driven-development | Profile setting (worktree or file) | No |
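The host-specific skill prefixes used in these prompts can be captured in a small helper. This is a hypothetical sketch, and translate_skill is not part of the skill. It assumes a skill reference is the host-neutral plugin:skill string. It only adds the leading / for Claude or $ for Codex, as the translation rules describe, and it first strips either prefix if one is already present.

```shell
# Hypothetical helper: translate a host-neutral skill reference such as
# superpowers:test-driven-development into a harness's native syntax.
translate_skill() {
  local harness="$1" skill="$2"
  skill="${skill#/}"    # tolerate a reference that already has a "/" prefix
  skill="${skill#\$}"   # ...or a "$" prefix
  case "$harness" in
    claude) printf '/%s\n' "$skill" ;;   # Claude: /plugin:skill
    codex)  printf '$%s\n' "$skill" ;;   # Codex: $plugin:skill
    *)      printf 'unknown harness: %s\n' "$harness" >&2; return 1 ;;
  esac
}
```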
After all slots return, process each result:
| Result | Action |
|--------|--------|
| Slot execution succeeded, implementer status DONE | Record worktree path or output file. Save implementer report. Run pre-checks and dispatch reviewer (see Phase 3 streaming). |
| Slot execution succeeded, status DONE_WITH_CONCERNS | Record path. Save report including concerns. Run pre-checks and dispatch reviewer. |
| Slot execution succeeded, status BLOCKED or NEEDS_CONTEXT | If max_retries > 0: retry with additional context using the same execution path type. Else: mark FAILED. |
| Slot execution errored/crashed | If max_retries > 0: retry fresh using the same execution path type. Else: mark FAILED. |
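The result table maps directly onto a small decision function. This is a minimal sketch. It assumes the orchestrator tracks a retries-left counter per slot and uses the hypothetical pseudo-status ERROR for a run that crashed without returning a status. The name next_action and its output strings are illustrative.

```shell
# Hypothetical sketch of the Phase 2 result table.
# $1 = implementer status (DONE, DONE_WITH_CONCERNS, BLOCKED, NEEDS_CONTEXT,
#      or ERROR for a crashed run), $2 = retries remaining for this slot.
next_action() {
  local status="$1" retries_left="$2"
  case "$status" in
    DONE|DONE_WITH_CONCERNS)
      echo "review" ;;                       # pre-checks, then reviewer
    BLOCKED|NEEDS_CONTEXT)
      if [ "$retries_left" -gt 0 ]; then echo "retry-with-context"; else echo "failed"; fi ;;
    ERROR)
      if [ "$retries_left" -gt 0 ]; then echo "retry-fresh"; else echo "failed"; fi ;;
    *)
      echo "failed" ;;                       # unknown statuses are not trusted
  esac
}
```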
Retry handling: When retrying, keep the same execution path type and do it one slot at a time with additional context addressing the block. Group 1 native-host slots retry via a fresh native subagent. Group 2 external-harness slots retry via a fresh external CLI run in a fresh worktree or fresh {RUN_DIR} slot directory, matching the active profile isolation. Don't try to continue the failed run in place.
Report progress using a top-level markdown table. For writing profiles, show the word count. For coding profiles, show the test count. Include a one-line summary of each slot's approach, meaning the approach hint or slot definition it ran with:
Phase 2: Implementation — done
| Slot | Status | Harness | Model | Words/Tests | Approach |
|------|--------|---------|-------|-------------|----------|
| 1 | DONE | Claude | claude-opus-4-6 | 13 tests | /superpowers:test-driven-development |
| 2 | DONE | Codex | gpt-5.4 | 15 tests | /superpowers:test-driven-development + codex |
| 3 | DONE | Claude | claude-opus-4-6 | 21 tests | claude |
| 4 | DONE_WITH_CONCERNS | Codex | gpt-5.4 | 8 tests | codex |
Model is the reported model identifier when available; otherwise use the configured model or unknown.
Do NOT show full implementer reports, self-review findings, or file lists. The table summarizes the essential information. Agent internals are pipeline noise.
Minimum viable: At least 2 successful slots needed for meaningful comparison. If fewer than 2 succeed, report to user and recommend: re-run with different slot count, fix spec issues, or manual implementation.
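Here is a sketch of the minimum-viable check. It assumes each slot's final outcome is recorded as an implementer status, or as FAILED once retries are exhausted. The name enough_contenders is hypothetical.

```shell
# Hypothetical helper: succeed only if at least 2 slots finished successfully.
# Arguments = final per-slot outcomes.
enough_contenders() {
  local ok=0 s
  for s in "$@"; do
    case "$s" in
      DONE|DONE_WITH_CONCERNS) ok=$((ok + 1)) ;;
    esac
  done
  [ "$ok" -ge 2 ]
}
```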
The review/judgment pipeline is the skill's core value. Baseline testing showed that Claude naturally does parallel dispatch and even synthesis — but it centralizes all evaluation in the orchestrator. This phase delegates evaluation to specialized agents for higher-quality, unbiased assessment.
Do NOT wait for all implementations to finish before starting reviews. As each slot completes, immediately run its pre-checks and dispatch its reviewer. This overlaps review work with implementation — a slot that finishes early gets reviewed while slower slots are still implementing.
For each slot, as it returns successfully:
Run pre-checks for that slot. If pre_checks is null, skip them and pass an empty string for {{PRE_CHECK_RESULTS}}. Otherwise, run them from the slot's execution location. For worktree isolation, cd into the slot worktree first. For file isolation, cd into the slot's parent directory (dirname {output_file}) first. Every pre-check Bash command must start with the appropriate cd ... && prefix for that slot's isolation mode. Do not assume the shell is already in the right directory.
Dispatch its reviewer immediately — do not wait for other slots. Use the native-host review path. On Claude, this is an Agent tool call. On Codex, use codex exec in the assigned review context for that slot's worktree or output-file directory, with the same prompt contract:
| Parameter | Value |
|-----------|-------|
| description | "Review Slot {i} implementation" |
| model | Omit unless user configured reviewer_model — inherits from session by default |
| prompt | Read 2-reviewer.md from the active profile's folder and fill in all universal {{VARIABLES}} |
The universal variables to fill in the reviewer prompt:
| Variable | Source |
|----------|--------|
| {{SPEC}} | Full text of the original spec |
| {{IMPLEMENTER_REPORT}} | The implementer's status report (what they claim they built) |
| {{WORKTREE_PATH}} | Path to this slot's worktree or output file (from Phase 2 results) |
| {{SLOT_NUMBER}} | This slot's number |
| {{PRE_CHECK_RESULTS}} | Pre-check output from the step above (empty string if pre_checks is null) |
| {{APPROACH_HINT_USED}} | The approach hint that was given to this slot's implementer |
The reviewer reads actual content in the worktree/output file — it does NOT have isolation: "worktree" (it inspects existing work, not its own workspace).
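The cd-prefix rule for pre-checks can be sketched as a helper that builds the command string. The function name precheck_cmd and its argument order are assumptions. For file isolation, it takes the slot's output file and changes into that file's parent directory, which is the pre-check rule for file isolation. printf %q shell-quotes the directory so that paths containing spaces still work.

```shell
# Hypothetical helper: prefix a pre-check with the cd for the slot's
# isolation mode, so nothing depends on the orchestrator's current directory.
# $1 = worktree|file, $2 = worktree path or output file, $3 = check command.
precheck_cmd() {
  local isolation="$1" location="$2" check="$3" dir
  case "$isolation" in
    worktree) dir="$location" ;;              # slot worktree path
    file)     dir="$(dirname "$location")" ;; # parent of the slot output file
    *)        return 1 ;;
  esac
  # %q shell-quotes the directory so paths with spaces survive.
  printf 'cd %q && %s\n' "$dir" "$check"
}
```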
When multiple slots complete close together, batch their reviewers into a single message for parallel dispatch — this is faster than dispatching one at a time. The key rule is: don't wait for stragglers. If 2 of 3 slots are done, dispatch their 2 reviewers now rather than waiting for the 3rd.
Collect reviews as they return. Save each reviewer's full scorecard to {RUN_DIR}/review-{i}.md immediately when that reviewer finishes. Use the absolute path from RUN_DIR when persisting the file. If you use Bash, run mkdir -p "$RUN_DIR" immediately before the write; if you use a file-write tool, pass the same absolute path. Do NOT postpone these writes until after the summary table, and do NOT replace the saved scorecard with only your orchestrator summary. Never rely on the current shell directory for artifact redirects.
Before dispatching the judge, verify the review artifacts exist. For every successful slot, confirm {RUN_DIR}/review-{i}.md exists and is non-empty. If any scorecard file is missing, write it before continuing. The judge phase is not allowed to start with missing review artifacts.
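Here is a minimal sketch of that check. It assumes the successful slot numbers are known and that RUN_DIR is absolute. verify_reviews is a hypothetical helper. It lists each missing or empty scorecard so the orchestrator knows exactly which files to write before the judge starts.

```shell
# Hypothetical helper: print every missing or empty {RUN_DIR}/review-{i}.md
# and return non-zero if any were found.
# $1 = absolute RUN_DIR, remaining args = successful slot numbers.
verify_reviews() {
  local run_dir="$1" i rc=0
  shift
  for i in "$@"; do
    if [ ! -s "$run_dir/review-$i.md" ]; then   # -s: exists and is non-empty
      echo "$run_dir/review-$i.md"
      rc=1
    fi
  done
  return "$rc"
}
```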
Report review results after all reviews are collected, using a top-level markdown table and standout bullets. Do NOT show full reviewer scorecards, evidence chains, or pass-by-pass analysis — those are pipeline internals the judge uses, not the user.
Phase 3: Review — done
| Slot | Compliance | Critical | Important | Minor | Verdict |
|------|------------|----------|-----------|-------|---------|
| 1 | PASS | 0 | 0 | 3 | Contender |
| 2 | PASS | 0 | 1 | 2 | Contender |
| 3 | FAIL | 1 | 0 | 1 | Eliminated |
Standout elements:
Extract standout elements from each reviewer's "Strengths" section. Pick the single most notable strength per slot — the one the judge is most likely to care about.
If manual_handoff is true:
This is the terminal path for the run. Skip the judge/verdict/merge finalization path below and use the manual handoff report instead.
- For worktree isolation, preserve all successful slot worktrees.
- For file isolation, preserve slot output files and reviews.
- For worktree isolation, restore the user's original checkout before the final report. Manual mode must not leave the main worktree on a slot branch, detached at a slot commit, or merged to a winner.
- {RUN_DIR}/handoff.md
- {RUN_DIR}/slot-manifest.json
- {RUN_DIR}/result.json
- Refresh .slot-machine/runs/latest before finalizing manual-mode result.json
- `# Manual Handoff`
- manual_handoff is true.

Manual handoff output for coding/worktree runs must include at minimum:
- `# Manual Handoff`
- handoff.md, slot-manifest.json, and manual-mode result.json

Before emitting the final manual handoff report for worktree isolation, restore the main checkout recorded in Phase 1:
if [ -n "${ORIGINAL_BRANCH:-}" ]; then
git switch "$ORIGINAL_BRANCH"
else
git checkout --detach "$ORIGINAL_HEAD"
fi
If the restore fails, report BLOCKED instead of silently leaving the user on the wrong checkout. Manual handoff is only complete when the main worktree is back on the original branch/HEAD and the reviewed slot worktrees remain available for inspection.
Persist the per-slot diff, branch, path, SHA, review, file-change, and test metadata in {RUN_DIR}/result.json under slot_details. For manual handoff, slot_details is the source of truth for per-slot file/test data and artifact metadata in both worktree and file isolation.
In manual mode, write the top-level handoff_path as the canonical absolute .slot-machine/runs/latest/... path for stable discovery, while keeping top-level run_dir as the canonical absolute {RUN_DIR} path.
{RUN_DIR}/slot-manifest.json mirrors the same per-slot metadata as the human-readable handoff summary so manual selection can happen without reading result.json.
If manual_handoff is false:
As soon as all reviews are collected, dispatch the judge — do not pause for orchestrator reporting. The review report table above can be shown after the judge is already running, or combined with the verdict output. The goal is to eliminate idle time between the last review returning and the judge starting.
Make a SINGLE native-host judge dispatch. On Claude, this is an Agent tool call. On Codex, use codex exec in the judge's assigned execution context with the same prompt contract. The judge should run on the most capable model available. By default that is the session model, or judge_model if the user configured one. This matters because the judge is where architectural judgment counts most:
| Parameter | Value |
|-----------|-------|
| description | "Judge Slot Machine results for {feature_name}" |
| model | Omit unless user configured judge_model — inherits from session by default. The judge benefits from the most capable model available. |
| prompt | Read 3-judge.md from the active profile's folder and fill in all universal {{VARIABLES}} |
The universal variables to fill in the judge prompt:
| Variable | Source |
|----------|--------|
| {{SPEC}} | Full text of the original spec |
| {{ALL_SCORECARDS}} | All reviewer scorecards concatenated |
| {{WORKTREE_PATHS}} | List of all slot worktree/output paths (for targeted inspection) |
| {{SLOT_COUNT}} | Number of successful slots |
The judge returns one of three verdicts:
Save the judge's full verdict and reasoning to {RUN_DIR}/verdict.md before composing the user-facing verdict block. Use the absolute RUN_DIR path, and if you use Bash run mkdir -p "$RUN_DIR" immediately before the write.
Before continuing to the final report, verify {RUN_DIR}/verdict.md exists and is non-empty. If the file is missing, write it before proceeding. The inline verdict shown to the user is not a substitute for the persisted run artifact.
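The write-then-verify pattern used for review-{i}.md and verdict.md can be sketched as a single helper. persist_artifact is hypothetical and is an illustration, not a required interface. It refuses a relative RUN_DIR, creates the directory immediately before writing, reads the artifact content from stdin, and fails if the written file is empty.

```shell
# Hypothetical helper: persist one run artifact and confirm it is non-empty.
# $1 = RUN_DIR (must be absolute), $2 = file name; content is read from stdin.
persist_artifact() {
  local run_dir="$1" name="$2"
  # Never rely on the current directory: require an absolute RUN_DIR.
  case "$run_dir" in
    /*) ;;
    *) echo "RUN_DIR must be absolute: $run_dir" >&2; return 1 ;;
  esac
  mkdir -p "$run_dir"          # immediately before the write
  cat > "$run_dir/$name"       # artifact content arrives on stdin
  [ -s "$run_dir/$name" ]      # non-zero status if the file ended up empty
}
```

For example, piping the judge's verdict text into persist_artifact "$RUN_DIR" verdict.md both writes and verifies the artifact in one step.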
Report the verdict bounded by horizontal rules. This is the most important output — include a one-sentence why summary explaining the decision in plain language. Every slot reference must include full identity: (Harness Model w/ skill).
Phase 4: Verdict
Verdict: SYNTHESIZE | Confidence: HIGH
Slot 3 has the cleanest code. Slot 1 has the best tests. Combining both produces something better than either.
- gpt-5.4) — cleanest implementation, no NaN bug, proper drain waiter pattern
- opus-4.6 w/ /ce:work) — 19-test suite: nested scheduling, timing verification, counter tracking

For PICK verdicts:
Verdict: PICK Slot 2 (Claude Code opus-4.6 w/ /ce:work) | Confidence: HIGH
Zero critical issues, strongest test coverage (45 tests), correct lock granularity. No synthesis needed — clear winner.
For worktree isolation:
The judge named a winning slot. Merge its branch:
# From the main working directory
git merge {winning_branch} --no-ff -m "feat: {feature_name} (slot-machine winner: slot {N})"
Run the full test suite to verify the merge is clean.
If tests fail: investigate. The worktree passed tests in isolation — merge conflicts or environment differences are the likely cause. Fix before proceeding.
For file isolation:
The judge produced a concrete synthesis plan (which base slot, what to port from where).
Dispatch the synthesizer as a SINGLE native-host synthesis dispatch. On Claude, this is an Agent tool call. On Codex, use codex exec in the synthesizer's assigned worktree or output-file directory with the same prompt contract:
| Parameter | Value |
|-----------|-------|
| description | "Synthesize best elements for {feature_name}" |
| isolation | "worktree" if profile isolation is worktree; omit if file |
| model | Omit unless user configured synthesizer_model — inherits from session by default |
| prompt | Read 4-synthesizer.md from the active profile's folder and fill in all universal {{VARIABLES}} |
The universal variables to fill in the synthesizer prompt:
| Variable | Source |
|----------|--------|
| {{SPEC}} | Full text of the spec |
| {{SYNTHESIS_PLAN}} | The judge's synthesis plan (which base, what to port) |
| {{WORKTREE_PATHS}} | All slot worktree/output paths the synthesizer needs to read from |
| {{BASE_SLOT_PATH}} | The worktree/output path of the base slot specifically |
Run the full test suite to verify the synthesis (worktree isolation). For file isolation, the synthesizer writes its output to {RUN_DIR}/output.md (or the appropriate extension). Tell the synthesizer this destination path in its prompt.
Post-synthesis review. Dispatch ONE reviewer to check the synthesized result for integration issues:
| Parameter | Value |
|-----------|-------|
| description | "Review synthesis for {feature_name}" |
| model | Omit unless user configured reviewer_model — inherits from session by default |
| prompt | Read 2-reviewer.md from the active profile's folder and fill in {{VARIABLES}} using the synthesis worktree/output |
The reviewer checks:
If the reviewer finds critical issues, fix them before finalizing. Important/minor issues can be noted in the final report.
Finalize the synthesis:
worktree isolation: merge the synthesis branch:
git merge {synthesis_branch} --no-ff -m "feat: {feature_name} (slot-machine synthesis: slot {base} base + elements from slots {donors})"
file isolation: copy the synthesized output file to the target location.

If cleanup is true (default):
For worktree isolation: remove all worktrees:
# For each worktree path tracked during the run:
git worktree remove {worktree_path} --force
# The --force handles uncommitted changes in non-winning slots
# git worktree remove does not delete the slot's branch,
# so delete each slot branch explicitly:
git branch -D {branch_name}
Slot diffs are preserved in {RUN_DIR}/.
For file isolation: Run artifacts are kept permanently — no cleanup needed. All slot outputs, reviews, and the verdict remain in {RUN_DIR}/.
If manual_handoff is true for worktree isolation, ignore cleanup: true and keep successful worktrees so the user can inspect and merge manually. In manual mode, do NOT write verdict.md; write handoff.md instead, and do not use the judged-run finalization path below. For each successful coding slot, persist {RUN_DIR}/slot-{i}.diff, branch name, head SHA, worktree path, and review artifact path in the manual handoff result metadata.
If cleanup is false, report worktree/output locations so the user can inspect them.
Judged runs only. Manual handoff already terminates with # Manual Handoff, handoff.md, slot-manifest.json, and manual-mode result.json; do not use this section when manual_handoff is true.
The final report has four parts: the H1 header, the output content, the result artifact, and the footer line.
Part 1: H1 header (use markdown # — this is the most visually distinct element):
Part 2: Output content — depends on profile isolation and output length:
For file isolation (writing): Count lines in the final output file. The winner (or synthesis) is saved to {RUN_DIR}/output.md.
Full output at `.slot-machine/runs/{date}-{feature}/output.md`

For worktree isolation (coding): Show a file change summary table:

{branch}

| File | Lines | What |
|------|-------|------|
| src/task_queue.py | +142 | TaskQueue class with priority support |
| tests/test_task_queue.py | +245 | 45 tests including concurrency |

3 files changed, 474 insertions
45 tests passing
Part 3: Result artifact — always write a machine-readable JSON file to the run directory. Before writing final run artifacts, history pointers, or handoff files, read references/result-artifacts.md.
Judged completion minimum contract for {RUN_DIR}/result.json:
- "verdict", "winning_slot", "confidence", "slots", "slots_succeeded", "files_changed", "tests_passing", "run_dir", "events_path", and "state_path".
- slot_details entries with "slot", "status", "report_path", "thread_id", "events_path", and "stderr_path" so the run can be inspected or resumed later.

Post-write discovery updates for judged completion:
- Refresh .slot-machine/runs/latest so scripts can resolve .slot-machine/runs/latest/result.json.
- Refresh .slot-machine/history/latest.json with status: "finished" and the canonical per-run run_dir, state_path, and result_path.
- Reset .slot-machine/history/active.json to the idle sentinel { "schema_version": 1, "status": "idle" }.
- Append a status: "finished" summary row to .slot-machine/history/index.jsonl.

The result artifact is always written, on every run. Humans ignore it. Autonomous loops and scripts parse it via .slot-machine/runs/latest/result.json. When a slot ran through Codex, persist the Codex thread_id plus the raw event/stderr paths under slot_details so the run can be inspected or resumed with codex resume {thread_id} later.
Use the status: "finished" history records above for judged completion and manual handoff completion. For blocked or failed terminal exits, refresh the same history files with status: "failed" while keeping the canonical per-run {RUN_DIR} paths.
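The discovery updates for both terminal statuses can be sketched as follows. publish_run is a hypothetical helper, and the sketch rests on several stated assumptions. .slot-machine/runs/latest is a symlink to the canonical run directory. state.json and result.json sit directly in {RUN_DIR}. The history rows contain only illustrative fields. The sketch also does not JSON-escape paths, so it expects paths without quotes or backslashes.

```shell
# Hypothetical helper: refresh discovery files after a terminal exit.
# $1 = absolute path of the project's .slot-machine directory,
# $2 = canonical absolute {RUN_DIR}, $3 = finished|failed.
publish_run() {
  local root="$1" run_dir="$2" status="$3"
  case "$status" in finished|failed) ;; *) return 1 ;; esac
  mkdir -p "$root/runs" "$root/history"
  # Point runs/latest at this run (-n replaces an existing symlink instead of following it).
  ln -sfn "$run_dir" "$root/runs/latest"
  # latest.json records canonical per-run paths, never the runs/latest alias.
  printf '{"status": "%s", "run_dir": "%s", "state_path": "%s", "result_path": "%s"}\n' \
    "$status" "$run_dir" "$run_dir/state.json" "$run_dir/result.json" \
    > "$root/history/latest.json"
  # No run is active any more: reset to the idle sentinel.
  printf '{ "schema_version": 1, "status": "idle" }\n' > "$root/history/active.json"
  # Append one summary row per terminal run (fields are illustrative).
  printf '{"status": "%s", "run_dir": "%s"}\n' "$status" "$run_dir" >> "$root/history/index.jsonl"
}
```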
Manual handoff writes the same run artifact path with unresolved result state:
- "resolution_mode": "manual", "verdict": null, "winning_slot": null, "confidence": null, "handoff_path": "/abs/path/.slot-machine/runs/latest/handoff.md", "files_changed": null, "tests_passing": null, "run_dir", "events_path", and "state_path".
- slot_details. Each item includes "slot", "status", "review_path", "review_summary", nested "files_changed", and "tests_passing".
- For worktree isolation, slot_details also carries "diff_path", "worktree_path", "branch", and "head_sha".
- For file isolation, each slot_details item uses "output_path" instead of "worktree_path", and the worktree-only fields ("diff_path", "branch", "head_sha") are omitted or null.
- For Codex slots, record "thread_id", "events_path", and "stderr_path" under the corresponding slot_details item.
- The top-level files_changed and tests_passing fields are null; per-slot file/test data lives under slot_details.
- .slot-machine/runs/latest, set the top-level "handoff_path" to the absolute latest path for operator convenience.

Manual handoff still writes events.jsonl, state.json, .slot-machine/history/latest.json, and .slot-machine/history/index.jsonl, resets .slot-machine/history/active.json to the idle sentinel, and must emit run_finished without any judge or synthesis events.
Only handoff_path points at .slot-machine/runs/latest; run_dir, events_path, and state_path remain canonical per-run {RUN_DIR} paths.
Deprecated shape to avoid in manual mode: "run_dir": "/abs/path/.slot-machine/runs/latest".
Profile-loading failures and other setup-time hard stops must still write the same run artifact path before exiting, then refresh .slot-machine/runs/latest to that run:
For profile-loading failures, set "blocked_stage" to "profile_loading" and describe the unresolved profile or base in "blocked_reason".
Blocked setup-time results use "resolution_mode": "blocked" with "verdict": null, "winning_slot": null, "confidence": null, "slots": 0, "slots_succeeded": 0, "files_changed": null, "tests_passing": null, "run_dir", "events_path", and "state_path".
Blocked setup-time exits still append run_failed, write state.json, refresh .slot-machine/history/latest.json with status: "failed", append a matching status: "failed" row to .slot-machine/history/index.jsonl, and reset .slot-machine/history/active.json to the idle sentinel.
Part 4: Footer — a horizontal rule followed by a one-line summary:
Complete — {word_count} words | {N} slots | {verdict}
Quiet mode: If quiet: true is set, suppress all Phase 2-3 progress tables and standout elements. Only output the Phase 4 verdict block, the Final Output section, and the footer. The run directory still has everything for post-hoc inspection.
If the project has a .slot-machine/ directory (or metrics_dir is configured), write run metrics:
mkdir -p .slot-machine
cat > .slot-machine/run-$(date +%Y%m%d-%H%M%S).json << 'METRICS'
{
"schema_version": 1,
"timestamp": "...",
"feature": "...",
"config": { "slots": N, ... },
"results": { "verdict": "...", ... },
"reviewers": { ... },
"agents": { "total_dispatched": N, ... },
"final_output": { "test_count": N, ... }
}
METRICS
See tests/fixtures/sample-metrics.json for the full schema. Metrics enable tracking improvement across runs — which approach hints win, how often synthesis triggers, whether reviewer differentiation improves over time.
The reviewers section tracks effectiveness per slot: findings_total, findings_acted_on (used by judge), findings_ignored (correct but unused), false_positives. The convergent_findings array lists issues found independently by multiple reviewers — these are the highest-confidence signals. When golden issues are available (from planted-bug test fixtures), precision and recall are computed.
Implementer subagents report one of four statuses in their output:
| Status | Meaning | Orchestrator Action |
|--------|---------|---------------------|
| DONE | Implementation complete, tests pass | Proceed to review |
| DONE_WITH_CONCERNS | Complete but implementer has reservations | Proceed to review — include concerns in reviewer context |
| BLOCKED | Can't proceed — architectural uncertainty, missing info | Retry with more context if retries remain. If still blocked, mark failed. |
| NEEDS_CONTEXT | Spec is ambiguous or missing information | Provide the missing context and re-dispatch. If context isn't available, mark failed. |
Never ignore BLOCKED or NEEDS_CONTEXT. These indicate real problems. Forcing a retry without changes produces the same failure.
By default, all agents inherit the model from your current session. If you're running Opus, every slot gets Opus. If you're running Sonnet, every slot gets Sonnet. This means you always get the quality level you're paying for.
To override, set model configs in your project's AGENTS.md, CLAUDE.md, or both. Treat them as equal first-class sources and read whichever exists, or both if both exist. Merge non-conflicting slot-machine config from both files; if both define the same key, prefer the active host file: AGENTS.md in Codex, CLAUDE.md in Claude. Inline overrides still win. Only pass a model override to the native-host dispatch mechanism when the user has explicitly configured one — otherwise omit it so the session model is inherited.
| Role | Default | Configurable As | When to override |
|------|---------|-----------------|------------------|
| Implementer | inherit | implementer_model | Downgrade to save cost on mechanical tasks |
| Reviewer | inherit | reviewer_model | Downgrade to save cost on structured evaluation |
| Judge | inherit | judge_model | Upgrade if running a cheaper session model |
| Synthesizer | inherit | synthesizer_model | Upgrade if running a cheaper session model |
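The precedence rules for a single config key can be sketched as a small function. It assumes each source has already been parsed into a plain value, with an empty string meaning unset. resolve_setting is a hypothetical name. An empty result means no override, so the session model is inherited.

```shell
# Hypothetical helper: resolve one slot-machine config key.
# $1 = active host (claude or codex), $2 = inline override,
# $3 = value from AGENTS.md, $4 = value from CLAUDE.md (empty = unset).
resolve_setting() {
  local host="$1" inline="$2" agents="$3" claude="$4"
  # Inline overrides always win.
  if [ -n "$inline" ]; then
    echo "$inline"
    return 0
  fi
  # On a conflict, prefer the active host's file; otherwise take whichever is set.
  if [ "$host" = codex ]; then
    echo "${agents:-$claude}"
  else
    echo "${claude:-$agents}"
  fi
}
```

For example, in Codex with model-a in AGENTS.md and model-b in CLAUDE.md, resolve_setting codex "" model-a model-b prints model-a.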
Approach hints are defined in the active profile's 0-profile.md. See profiles/coding/0-profile.md for the coding defaults and profiles/writing/0-profile.md for writing defaults.
When approach_hints is enabled (default: true), each slot gets a different hint to encourage genuinely divergent attempts. Assign randomly without replacement. The profile defines what "diversity" means for its task type — architectural diversity for coding, voice/structure diversity for writing.
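Here is a sketch of random assignment without replacement, using a Fisher-Yates shuffle over the profile's hint list. The function name and the example hint names are hypothetical. The sketch assumes there are at least as many hints as slots. $RANDOM % (i + 1) has a slight modulo bias, which is acceptable for spreading hints across slots.

```shell
# Hypothetical helper: print one "slot<TAB>hint" line per slot, with hints
# drawn uniformly at random without replacement.
# $1 = slot count, remaining args = available hints.
assign_hints() {
  local slots="$1"
  shift
  local -a hints=("$@")
  local n=$# i j tmp
  if [ "$slots" -gt "$n" ]; then
    echo "not enough hints for $slots slots" >&2
    return 1
  fi
  # Fisher-Yates: swap each position with a random earlier-or-equal one.
  for (( i = n - 1; i > 0; i-- )); do
    j=$(( RANDOM % (i + 1) ))
    tmp="${hints[i]}"; hints[i]="${hints[j]}"; hints[j]="$tmp"
  done
  for (( i = 0; i < slots; i++ )); do
    printf '%d\t%s\n' "$((i + 1))" "${hints[i]}"
  done
}
```

For example, assign_hints 3 simplicity performance readability extensibility prints three tab-separated slot/hint lines with three distinct hints.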
The #1 baseline failure mode (observed in Task 0 testing without the skill): the orchestrator reads all implementations itself, makes an ad hoc comparison, and writes the synthesis — centralizing everything instead of delegating to specialized reviewer/judge/synthesizer agents. The review and judgment pipeline IS the skill's value. Don't collapse it.
| Thought / Action | What's Wrong |
|-----------------|-------------|
| "I'll just read all the code and compare them myself" | This is the most common shortcut. Dispatch independent reviewer agents per slot. Your job is orchestration, not evaluation. Blind review by fresh agents prevents bias from seeing all implementations simultaneously. |
| "I'll make a quick comparison table instead of formal scorecards" | A comparison table is not a scorecard. Each slot needs independent review with 6 weighted criteria, issue categorization, and a verdict. The judge needs structured input, not your summary. |
| "I can write the synthesis myself, I've already read the code" | Dispatch the synthesizer agent. It works from the judge's plan with a single base + targeted ports. Ad hoc synthesis by the orchestrator produces Frankenstein code. |
| "I'll split the spec into different tasks for each slot" | That's standard parallel agents, not slot-machine. Each slot gets the FULL spec. |
| "3 slots is probably enough" | Use the configured count. User chose N for a reason. Don't second-guess. |
| "I'll skip the judge and just merge the highest scorer" | Judge does targeted code inspection. Scorecard alone isn't enough for the decision. |
| "Synthesis sounds risky, I'll just PICK" | If multiple slots have complementary strengths, synthesis produces a better result. Trust the process. |
| "This spec is probably clear enough" | Validate. Ambiguous specs × N slots = N different wrong implementations = expensive waste. |
| "Reviewing each separately is overkill" | Structured review is what makes judgment possible. Ad hoc comparison = ad hoc results. |
All of these mean you're about to shortcut the review/judgment pipeline. That pipeline is the entire point of this skill.
isolation: "worktree" which follows the same underlying git worktree mechanics.

Key difference from subagent-driven-development: SDD splits a plan into sequential tasks (one agent per task). Slot-machine gives the ENTIRE spec to N agents and compares their full implementations. They are complementary — you could use SDD within each slot for large features.