skills/agents-consilium/SKILL.md
Query external AI agents (Codex, Gemini, OpenCode, Claude Code headless) in parallel for independent second opinions, code review, bug investigation, and consensus on high-stakes decisions. Agents and models are configurable in config.json. Use for architecture choices, security review, or ambiguous problems where independent perspectives matter. Not for simple questions answerable from docs or the codebase — use web search or repo exploration instead.
npx skillsauth add codealive-ai/agents-reflection-skills agents-consiliumInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Query external AI agents for independent, unbiased expert opinions. Each agent has a distinct thinking role and responds in a structured format for easy comparison.
Different frontier models see different things. Each has a slightly different training distribution, tool-use style, and failure mode — so they latch onto different aspects of the same problem.
The skill keeps each agent independent (no debate, no cross-contamination) and lets the caller adjudicate — you get raw parallel perspectives, not a homogenized committee answer.
# 1. See what's configured (XML plan — dry-run, no agents run).
scripts/consensus-query.sh --list-agents
# 2. Ask the consensus (human-readable markdown).
scripts/consensus-query.sh "Should we use Postgres or SQLite for this CLI tool?"
# 3. Agent-friendly output (stable XML, escaped via CDATA).
scripts/consensus-query.sh --xml "Review this function" < src/auth.py
# 4. Code review mode (2 specialists, quoted-code validated, XML or markdown).
scripts/code-review.sh path/to/file.py
git diff HEAD | scripts/code-review.sh --xml --diff
Edit config.json to enable/disable agents or swap models. See config.example.json for a fuller template with multiple backends.
The prompt is a positional argument. Three ways to pass it — prefer file-based forms for anything containing backticks, $, !, or quotes (see Shell escaping):
# (a) RECOMMENDED: From a file via flag — RAW mode
# (no role/principles/template wrapping; agents see the file verbatim).
# Safe for any prompt content. Use this for benchmarks/evals too.
scripts/consensus-query.sh --xml --prompt-file prompt.txt
# (b) RECOMMENDED: From a file via stdin — best for long multi-line prompts.
# With NO positional argument, stdin is treated as the prompt.
# Template wrapping (principles + role + output schema) is applied.
scripts/consensus-query.sh --xml < prompt.txt
cat prompt.txt | scripts/consensus-query.sh --xml
# (c) Inline string — ONLY for short, shell-safe prompts
# (no backticks, no $, no !, no embedded quotes).
scripts/consensus-query.sh --xml "review this design"
When BOTH a positional prompt and stdin are given, stdin is appended to the prompt as --- Input --- context. That is the existing pattern for piping a file under review:
cat src/auth.py | scripts/consensus-query.sh "review this code"
# └── stdin = context ────┘ └── positional = the prompt ─┘
The trap. Technical prompts routinely contain backticks (around config keys, function names, code paths), $ (env vars, regex anchors, $() substitution), ! (interactive history), and unbalanced quotes. Inside a double-quoted positional argument, bash/zsh will execute the contents of backticks and $(...) as commands, expand $variable, and silently splice the result back into the prompt. Failed substitutions become empty strings. The agent then receives a mangled prompt and — worse — codex exec (and claude -p, opencode run) may stall in the foreground waiting on stdin for clarification, holding the process at 0% CPU for hours before you notice.
The exact symptom in ~/.codex/log/codex-tui.log or the agent's transcript:
(eval):1: command not found: index.knn.advanced.approximate_threshold=-1
(eval):1: command not found: POST
Reading additional input from stdin...
This is a shell-quoting bug, not a CLI bug. Single quotes around EOF in a heredoc, or a separate file, are the fix.
| Don't | Do |
|-------|----|
| tool "What does `foo()` do?" — backticks get eval'd by the shell | Use one of the three safe patterns below |
| tool "Explain $PATH precedence" — $PATH expands to your env var | Same |
| tool "Why does $(date) appear?" — $(date) runs as a subshell | Same |
| tool "Don't run !! again" — !! triggers history (zsh interactive) | Same |
| Inline prompts with any of: `, $, $(, !, embedded " | Files or heredocs |
# (1) RAW mode via --prompt-file — safest. No shell interpretation at any layer.
# No template/role wrapping either; agents see the file verbatim.
scripts/consensus-query.sh --xml --prompt-file prompt.md
# (2) Pipe via stdin — template wrapping (principles + role + output schema) IS applied.
cat prompt.md | scripts/consensus-query.sh --xml
scripts/consensus-query.sh --xml < prompt.md
# (3) Heredoc with SINGLE-quoted delimiter — single quotes around EOF disable
# expansion of $, backticks, !, and \ inside the body. This is the
# canonical bash idiom for "embed an arbitrary technical prompt inline".
scripts/consensus-query.sh --xml "$(cat <<'EOF'
Explain how `index.knn.advanced.approximate_threshold=-1` interacts with
`POST /_forcemerge?max_num_segments=1` in OpenSearch 3.5. Cover $variable
expansion semantics in the config parser too.
EOF
)"
# ^^^^ ^^^^
# single quotes around EOF — REQUIRED for safety
If you write <<EOF (no quotes) or <<"EOF", double-quoted-style expansion happens INSIDE the heredoc. Always single-quote the opening delimiter when the body is a prompt.
codex exec / claude -p directly)The same trap applies the moment you build a codex exec -m … "<prompt>" or claude -p "<prompt>" invocation yourself — e.g. because you want a non-default model, a custom -C <dir>, or a one-off -o <transcript>. Wrap the prompt the same way:
# Safe: prompt loaded from a file
codex exec -m gpt-5.5 --full-auto -C /tmp/work -o "$OUT" "$(cat prompt.md)"
# Safe: single-quoted heredoc
codex exec -m gpt-5.5 --full-auto -C /tmp/work -o "$OUT" "$(cat <<'EOF'
…prompt body with `backticks` and $vars…
EOF
)"
# UNSAFE: inline string with shell-special characters
codex exec -m gpt-5.5 --full-auto -C /tmp/work -o "$OUT" \
"Explain \`index.knn.advanced.approximate_threshold=-1\` semantics"
# ^^ even escaping the backticks here is fragile —
# the shell still has to interpret them once
If codex / claude / opencode appears to hang at 0% CPU and the output file is empty (or ~146 bytes of garbage prelude), this is almost always the bug. Check, in order:
tail -50 ~/.codex/log/codex-tui.log (or the agent's transcript / your captured 2> stderr) for command not found: or Reading additional input from stdin....set -x (or prepend echo to a copy) to print the expanded argv. If a backtick'd token vanished, expansion ate it.kill -TERM <pid>; kill -KILL if needed), reissue the prompt via --prompt-file or a single-quoted heredoc.Intellectual independence: Agents are instructed to think from first principles, challenge the framing of questions, and propose alternatives not mentioned in the query. They are free thinkers within the given context, not yes-men.
Role differentiation (set per agent in config.json):
Structured output: All agents respond using a common template (Assessment, Key Findings, Blind Spots, Alternatives, Recommendation with confidence level), making synthesis straightforward.
When formulating queries for consilium, follow these rules to maximize the value of independent opinions:
Agents are spawned in the caller's current working directory with their native agentic toolchain intact. They can:
Read, Grep, Glob, find_references, git log/blame across the real repositoryCLAUDE.md, AGENTS.md, README, config files, tests, call sites, neighboring modulesWhat they cannot do (enforced per backend):
| Backend | Read-only guard |
|---------|-----------------|
| Codex | --sandbox read-only --ask-for-approval never |
| Claude Code | --permission-mode plan |
| OpenCode | --agent plan (plan is opencode's built-in read-only agent) |
| Gemini CLI | --approval-mode plan |
No Edit, Write, Bash(git commit ...), Bash(rm ...), or any write-back tool is authorized. Implementation of recommendations is the caller's job. If a backend tries to escalate (e.g. needs to run a command that violates read-only), the call fails rather than silently escalating.
Agents are declared in config.json at the skill root. Each agent has:
| Field | Purpose |
|-------|---------|
| enabled | Whether it participates in consensus-query |
| backend | CLI that actually runs: codex-cli, gemini-cli, opencode, claude-code |
| model | Model id passed to that CLI |
| role | analyst (deep/precise) or lateral (broad/creative) |
| label | Display name in reports (optional) |
| effort | Reasoning effort. opencode: maps to opencode run --variant (e.g. low, medium, high, max) — provider-specific, see Discovering reasoning variants below. claude-code: maps to claude --effort (low, medium, high, xhigh, max). Other backends ignore it. |
Default config (config.json):
codex (backend=codex-cli, model=gpt-5.5, role=analyst) — enabledgemini-cli (backend=gemini-cli, model=gemini-3.1-pro-preview, role=lateral) — disabledopencode (backend=opencode, model=opencode/gemini-3.1-pro, role=lateral, effort=high) — enabledclaude-code (backend=claude-code, model=opus, effort=max, role=analyst) — disabledopencode-go-minimax (backend=opencode, model=opencode-go/minimax-m2.7, role=lateral, effort=high) — enabledopencode-go-deepseek (backend=opencode, model=opencode-go/deepseek-v4-pro, role=analyst, effort=max) — enabledopencode-go-mimo (backend=opencode, model=opencode-go/mimo-v2.5-pro, role=lateral, effort=high) — enabledopencode-go-kimi (backend=opencode, model=opencode-go/kimi-k2.6, role=analyst, effort=high) — enabledopencode-go-glm (backend=opencode, model=opencode-go/glm-5.1, role=lateral, effort=high) — enabledopencode-openai (backend=opencode, model=openai/gpt-5.5, role=analyst, effort=high) — disabled (reference entry; flip on if you want OpenAI direct)Effort policy: max is used wherever the model exposes it (claude-code, opencode-go/deepseek-v4-pro); high is the fallback for models that top out at high (opencode/gemini-3.1-pro, opencode-go/mimo-v2.5-pro, openai/gpt-5.5 if you don't want xhigh) or expose no variants at all (minimax, kimi, glm — effort is set but ignored by the provider).
Multiple agents can share one backend — the dispatcher passes the entry id through CONSILIUM_AGENT_ID, so each backend script reads its own slice of config.json.
Edit config.json to flip agents on/off or change models. Set CONSILIUM_CONFIG=/path/to/custom.json to use an override file.
The opencode backend works with any provider/model that OpenCode supports. For Gemini 3.1 Pro you have two options:
"model": "opencode/gemini-3.1-pro" — goes through OpenCode Zen. Works out of the box once opencode providers login opencode (or a valid Zen credential) is configured."model": "google/gemini-3.1-pro-preview" — goes straight to Google's v1beta API. Requires GOOGLE_GENERATIVE_AI_API_KEY (OpenCode does not pick up GEMINI_API_KEY for this provider).For OpenAI flagship models (GPT-5.5, GPT-5.4, etc.) there's a third path:
"model": "openai/gpt-5.5" — goes straight to OpenAI's API via the openai/* provider in OpenCode. Requires either an opencode auth login session for OpenAI (oauth) or OPENAI_API_KEY in the environment. The default config ships an opencode-openai entry disabled as a reference; flip enabled=true if you want a GPT-5.5 voice in the consilium. Variants: none / low / medium / high / xhigh — pick xhigh if you want the heaviest reasoning, otherwise high is the safe default.Flip between providers by editing the model field; the rest of the config stays the same.
opencode run --variant <effort> is provider-specific — each model exposes its own set (or none). Don't guess: enumerate them from the CLI before setting effort in config.json.
One-liner — list every model with its supported variants:
opencode models opencode --verbose 2>&1 | python3 -c '
import sys, json
lines = sys.stdin.read().split("\n")
i = 0
while i < len(lines):
line = lines[i].strip()
if line.startswith("opencode/") or line.startswith("opencode-go/"):
model_id, json_lines, depth, started = line, [], 0, False
i += 1
while i < len(lines):
s = lines[i]; json_lines.append(s)
for c in s:
if c == "{": depth += 1; started = True
elif c == "}": depth -= 1
i += 1
if started and depth == 0: break
try:
v = list(json.loads("\n".join(json_lines)).get("variants", {}).keys())
print(f"{model_id}\t{v}")
except Exception: pass
else:
i += 1
'
Swap opencode for opencode-go (or any other provider id) to scan a different namespace; drop the provider arg to scan everything opencode models knows.
Interpreting the result:
['low', 'medium', 'high', 'max']) → set effort to the highest one you want.[] → the model has no reasoning variants. --variant is silently ignored; setting effort in config is harmless but does nothing.opencode run rejects the call. Re-enumerate after upgrading opencode — providers add/remove tiers between releases.Snapshot of the currently configured opencode models (re-run the one-liner if you change the set):
| Model | Variants | effort in default config |
|-------|----------|----------------------------|
| opencode/gemini-3.1-pro | low, medium, high | high (no max) |
| opencode-go/deepseek-v4-pro | low, medium, high, max | max |
| opencode-go/mimo-v2.5-pro | low, medium, high | high (no max) |
| opencode-go/minimax-m2.7 | — | high (ignored) |
| opencode-go/kimi-k2.6 | — | high (ignored) |
| opencode-go/glm-5.1 | — | high (ignored) |
| openai/gpt-5.5 | none, low, medium, high, xhigh | high (entry disabled by default) |
The claude-code backend shells out to claude -p (headless mode, see docs). Useful when you want a second Claude in the consilium — e.g. Opus as analyst cross-checking Codex.
model: a shortname (opus, sonnet, haiku) or full id (claude-opus-4-7, claude-sonnet-4-6).effort: maps to claude --effort — accepts low, medium, high, xhigh, max. Default config sets max for opus; omit the field to fall back to the skill's default of max for the claude-code backend.--permission-mode plan — Claude can freely Read/Grep/Glob/Bash read-only across the project, but cannot Edit/Write. Override with CLAUDE_PERMISSION_MODE only if you know what you're doing.claude /login).Note: claude-code is disabled in the default config to avoid spawning another Claude session accidentally. Flip enabled to true in config.json (or CONSILIUM_CONFIG) when you want it in the consensus run.
All scripts in scripts/ directory. The skill auto-detects its install location.
Per-agent scripts always execute when invoked. The enabled field in config.json is consulted only by consensus-query.sh to build the default agent set (when neither -a nor -x is given). Direct invocation of a per-agent script ignores enabled — that's by design (single source of truth for the run/skip decision lives in the dispatcher).
When -a/-x causes consensus-query.sh to run an enabled=false agent, the dispatcher emits a stderr line like [<Label>] forced via --agents (enabled=false in config) so the override is visible.
# Codex (analyst by default)
scripts/codex-query.sh "question" [context_file]
cat file.py | scripts/codex-query.sh "review this"
# Gemini CLI (lateral by default; disabled in default config)
scripts/gemini-query.sh "question" [context_file]
cat file.py | scripts/gemini-query.sh "review this"
# OpenCode (lateral by default, model per config.json)
scripts/opencode-query.sh "question" [context_file]
cat file.py | scripts/opencode-query.sh "review this"
# Claude Code (analyst by default; disabled in default config)
scripts/claude-query.sh "question" [context_file]
cat file.py | scripts/claude-query.sh "review this"
scripts/consensus-query.sh "architecture question"
cat file.py | scripts/consensus-query.sh "review this code"
scripts/consensus-query.sh --xml "review this" # XML report for agent consumers
scripts/consensus-query.sh --list-agents # dry-run: dump plan, don't query
consensus-query.sh reads config.json, launches every agent with enabled=true in parallel, and prints their responses grouped by label. Add/remove agents permanently by editing the config; for ad-hoc runs use -a/--agents and -x/--exclude (see below).
All scripts accept -h / --help. Both consensus-query.sh and code-review.sh accept:
| Flag | Effect |
|------|--------|
| --xml | Emit <consilium-report> (or <code-review-report>) with each agent wrapped in <agent>…<response><![CDATA[…]]></response></agent>. Stable for agent consumers (no markdown-heading collision). |
| --list-agents (consensus only) | Print <consilium-plan> (every configured agent, enabled/disabled, with backend-available) and exit. No queries are run — use this as an inspection / dry-run. |
| -a, --agents <ID\|GLOB> | Override the active agent set with this id or glob (e.g. 'opencode-go-*'). Repeatable; comma-separated values also accepted (-a codex,opencode-go-kimi). When given, the per-agent enabled flag in config.json is ignored — only matched agents run. Falls back to env CONSILIUM_AGENTS. |
| -x, --exclude <ID\|GLOB> | Subtract matching agents from the active set. Repeatable. Combine with --agents for include-then-exclude composition. Falls back to env CONSILIUM_EXCLUDE. |
Ad-hoc agent selection examples:
# Single agent
scripts/consensus-query.sh -a opencode-go-kimi "Q"
# All OC-Go models (glob)
scripts/consensus-query.sh -a 'opencode-go-*' "Q"
# Everything-except-codex
scripts/consensus-query.sh -x codex "Q"
# Composition: only OC-Go but skip MiniMax
scripts/consensus-query.sh -a 'opencode-go-*' -x opencode-go-minimax "Q"
# Same via env (scriptable)
CONSILIUM_AGENTS='codex,opencode-go-kimi' scripts/consensus-query.sh "Q"
Exit codes (stable across all scripts):
| Code | Meaning |
|------|---------|
| 0 | Success (all queried agents replied; or, for consensus-query.sh, the active agent set may be smaller than the configured set if some are disabled or filtered) |
| 2 | Consensus only: partial failure (≥1 succeeded, ≥1 failed) |
| 3 | Consensus only: every queried agent failed |
| 4 | Config error (missing CLI, invalid config, unknown role/agent id) |
| 5 | Usage error (missing prompt, unknown flag) |
| other | Propagated from the backend CLI (e.g. 124 on timeout) |
scripts/code-review.sh is a focused pipeline for reviewing a single file or a unified diff. It runs exactly two specialist passes — security and correctness — in parallel, then validates each finding's quoted-code against the real source.
Design choices are grounded in the 2024-2026 multi-agent code review literature:
<quoted-code>, and the validator cross-checks it against the source file (quote-valid="true|false").# File on disk (quoted-code validated against the file)
scripts/code-review.sh path/to/file.py
scripts/code-review.sh --xml path/to/file.py
# Unified diff piped on stdin (quoted-code validation is skipped)
git diff HEAD | scripts/code-review.sh --diff
git diff HEAD | scripts/code-review.sh --xml --diff
<finding index="N" severity="critical|high|medium|low" category="security|correctness"
file="..." line-start="N" line-end="N" confidence="0.0..1.0"
source-agent="..." source-role="security|correctness"
quote-valid="true|false">
<title>...</title>
<rationale><![CDATA[includes one reason this might be a false positive]]></rationale>
<suggested-fix><![CDATA[...]]></suggested-fix>
<quoted-code><![CDATA[verbatim source at line-start..line-end]]></quoted-code>
</finding>
Findings are sorted severity desc, confidence desc. No severity filtering by default — triage is the caller's job.
Unified across security + correctness. Specialists score each finding on two axes (worst-case impact × likelihood/reachability) and pick the tier that matches. Synthesized from CVSS v4, OWASP Risk Rating, GitHub Advisory DB, Chromium, MSRC, SEI CERT, SonarQube, Semgrep.
| Severity | Action horizon | Operational definition | Security examples | Correctness examples |
|----------|----------------|------------------------|-------------------|----------------------|
| critical | Merge blocker | RCE / trust-boundary bypass / data loss / guaranteed outage, with a concrete exploit or dataflow trace | SQLi on public endpoint with concatenated query; unsafe deserialization of untrusted input; hardcoded prod credential | Payment/ledger math silently corrupts balances; unconditional null deref on hot request path; race on shared mutable state under prod load |
| high | Fix before release | Critical-tier impact gated by a non-trivial precondition (auth, specific config), OR moderate impact with high reachability | Stored XSS in authenticated admin view; CSRF on state-changing endpoint; path traversal behind login; missing authz on tenant resource | Unhandled exception on documented error path crashing a worker; file/DB-handle leak exhausting pools; retry logic that double-charges |
| medium | Schedule | Limited impact (info disclosure, localized incorrectness, degraded-but-recoverable), OR critical impact gated by implausible preconditions | Stack traces leaked to end users; missing HttpOnly/Secure on non-session cookie; weak-but-not-broken crypto parameter | Incorrect edge-case handling in non-critical helper; missing input validation that callers already satisfy; N+1 query degrading a list endpoint |
| low | Optional / backlog | Cosmetic, stylistic, defense-in-depth; minimal real-world impact | Missing nosniff header where CSP already mitigates; Math.random() for non-security id | Dead code; inconsistent naming; redundant null check after non-null assertion |
Adjustments: downgrade one level on mitigating factors (auth required, non-default config, unusual interaction). Speculative findings stay at the lower tier — upgrade only with a working PoC or trace.
You are the adjudicator. Specialists emit independent findings — your job is to select and synthesize, not re-review (RovoDev 2601.01129, RevAgent 2511.00517).
quote-valid="false") — likely hallucinations.critical = block the merge, high = fix before release, medium = track, low = optional.consensus-query.sh; specialists will be too narrow.code-review.sh is single-stage and caller-judged. For higher-stakes reviews
where you want the union of many panels filtered automatically by an LLM judge,
the skill ships two multi-stage pipelines ported from the ultrareview-bench
(see docs/blog/code-review-2pass-pilot/ if you have access). Each one
prescribes a fixed agent set and stage layout — they're not configurable
per-call, by design, because the configurations were tuned by marginal-uplift
analysis on a 65-issue ground-truth pilot.
Important: these are heavy modes. Don't run them on every diff. Use them when you'd otherwise pull two senior engineers off other work for a deep review, or for code that touches money / auth / persistence.
scripts/superreview.sh — small-swarm + 2 frontier add-ons10 LLM calls; ~$0.90–1.50 on a 12KB file (linear with size). Pareto sweet-spot in the bench at 67.7% recall / 82.7% sev-w on snippet1.cs.
Stage 1: discovery-small (parallel) 7 small/cheap passes
- opencode-go-deepseek-flash analyst uncapped
- opencode-go-qwen36-plus analyst uncapped
- opencode-go-qwen36-plus lateral uncapped
- opencode-go-deepseek-flash architecture uncapped
- opencode-go-deepseek-flash correctness cap=10
- opencode-go-qwen36-plus architecture cap=3
- opencode-go-qwen36-plus security uncapped
Stage 2: discovery-frontier (parallel) 2 hand-picked add-ons
- opencode-gpt5.5-xhigh analyst uncapped
- claude-code (Opus 4.7 max) lateral uncapped
Stage 3: dedup (deterministic union)
Stage 4: judge — claude-sonnet (default)
Usage:
scripts/superreview.sh path/to/file.cs
scripts/superreview.sh --xml path/to/file.cs
git diff HEAD | scripts/superreview.sh --diff
scripts/superreview.sh --dry-run path/to/file.cs # plan + config check, no LLM calls
scripts/superreview.sh --judge claude-code path/to/file.cs # override judge
scripts/ultrareview.sh — broad-grid + specialists + probe21 LLM calls; ~$1.50–3.00 on a 12KB file. Best severity-weighted recall in the bench (86.4%). Slower and more expensive than superreview; use when you need maximum coverage and lowest false-positive rate.
Stage 1: broad (parallel) 4 frontier analysts
- codex (gpt-5.5 high) analyst uncapped
- claude-code (Opus 4.7 max) analyst uncapped
- opencode (gemini-3.1-pro) lateral uncapped
- opencode-go-deepseek (Pro) analyst uncapped
Stage 2: specialists (parallel) 5×3 matrix, uniform cap=10
- 3 small models × 5 roles (security/correctness/performance/architecture/consistency)
Stage 3: probe (sequential) 1 generic gap probe (model picks focus)
- opencode-go-deepseek-flash auditor cap=10
Stage 4: dedup
Stage 5: judge — claude-code (Opus 4.7 max)
fallback: opencode-gpt5.5-xhigh on primary failure
Usage:
scripts/ultrareview.sh path/to/file.cs
scripts/ultrareview.sh --xml path/to/file.cs
scripts/ultrareview.sh --dry-run path/to/file.cs # plan check
scripts/ultrareview.sh --no-fallback path/to/file.cs # disable judge fallback
The Opus-judge fallback is intentional — Claude Code's claude -p backend
timed out at 1200s on 200+ findings during the bench. Setting --no-fallback
lets you treat a primary judge failure as fatal (useful in CI).
Both modes filter findings via the LLM judge before printing. Verdicts are:
new_severity from the judge.The default markdown output groups kept findings by severity. The --xml
form preserves the full <code-review-report> schema and adds a
<judge-summary> element with verdict counts.
config.json entriesThese modes hardcode their agent IDs. The default config.json already has
all of them defined (most are enabled=false, which is fine — multi-stage
modes ignore enabled and look up the entry by id directly):
| ID | Where used |
|---|---|
| codex | ultrareview broad |
| opencode | ultrareview broad |
| claude-code | both, plus ultrareview judge |
| claude-sonnet | superreview judge |
| opencode-go-deepseek | ultrareview broad |
| opencode-go-deepseek-flash | both, specialist + probe |
| opencode-go-qwen36-plus | both |
| opencode-gpt5.5-xhigh | superreview frontier add-on, ultrareview judge fallback |
| opencode-gemini-3-flash | ultrareview specialist |
If any are missing the script exits 4 with the list of missing IDs.
code-review.sh. Multi-stage adds 5-10× cost.code-review.sh.--xml from code-review.sh and parse
findings yourself.Pick by role, not by vendor. The default config has Codex (analyst) + OpenCode/Gemini-3.1-Pro (lateral) enabled; flip claude-code or gemini-cli on in config.json when you want an additional voice.
| Situation | Script | Role(s) involved |
|-----------|--------|-------------------|
| Code review, security audit | per-agent analyst script (codex-query.sh or claude-query.sh) | analyst — precision, edge cases |
| Architecture decision, design choice | consensus-query.sh | analyst + lateral — depth + breadth |
| "Are we solving the right problem?" | per-agent lateral script (opencode-query.sh or gemini-query.sh) | lateral — challenges premises |
| Bug investigation, root cause analysis | per-agent analyst script | analyst — goes deep into implementation |
| Exploring alternatives, brainstorming | per-agent lateral script | lateral — cross-domain analogies |
| High-stakes or irreversible decision | consensus-query.sh | all enabled — reduce blind spots |
| Agent-to-agent integration (downstream parser) | consensus-query.sh --xml | any — stable structured output |
Agents respond with a shared structure. Compare section by section:
When comparing the two responses, classify the pattern and act accordingly:
scripts/consensus-query.sh "We need real-time updates for ~100 concurrent users.
Updates are server-initiated only. Current stack: [describe your stack].
Latency target: under 500ms from event to UI update.
What approach would you recommend and why?"
cat src/services/auth.py | scripts/codex-query.sh \
"Review this authentication service. Focus on whatever concerns you most."
scripts/codex-query.sh "Database query returns empty result.
Direct query with same filter returns 5 documents.
[paste query here]
What's happening?"
CONSILIUM_CONFIG: Path to a custom JSON config (default: <skill>/config.json)CODEX_MODEL: Override Codex model at runtime (default: value from config)GEMINI_MODEL: Override Gemini CLI model at runtime (default: value from config)OPENCODE_MODEL: Override OpenCode model at runtime (default: value from config)OPENCODE_AGENT: Override OpenCode built-in agent (default: plan, read-only)OPENCODE_EFFORT: Override OpenCode reasoning effort (default: config effort field, or high)CLAUDE_MODEL: Override Claude Code model at runtime (alias like opus or full id)CLAUDE_PERMISSION_MODE: Override Claude Code permission mode (default: plan)CLAUDE_EFFORT: Override Claude Code reasoning effort (default: config effort field, or max if both unset). Levels: low, medium, high, xhigh, max.CODEX_EFFORT: Override Codex reasoning effort (default: config effort field, or high if both unset). Levels: minimal, low, medium, high, xhigh.GEMINI_API_KEY: Required for the gemini-cli backend (v1beta model access)GOOGLE_GENERATIVE_AI_API_KEY: Required if the opencode backend uses google/... modelsOPENAI_API_KEY: Required if the opencode backend uses openai/... models and OpenCode is not already logged in via opencode auth loginAGENT_TIMEOUT: Timeout seconds (default: 1200)codex --version) — for the codex-cli backendopencode --version) — for the opencode backend. For Zen models (opencode/...) run opencode providers login opencode once; for Google direct models (google/...) set GOOGLE_GENERATIVE_AI_API_KEY; for OpenAI direct models (openai/...) either run opencode auth login and pick OpenAI, or set OPENAI_API_KEY.gemini --version) — for the gemini-cli backend (optional; falls back to direct API)claude --version, claude /login) — for the claude-code backendGEMINI_API_KEY environment variable — required only when gemini-cli backend is enabled (get key at https://ai.google.dev/gemini-api/docs/api-key)development
First Principles Framework (FPF) — thinking amplifier. Use when user wants to think through a complex problem, architect a system, evaluate alternatives, decompose complexity, classify problems, define quality attributes, plan rigorously, make decisions under uncertainty, establish causality, reason about time and trends, describe architecture or structural views, check mathematical model fit, or improve pattern quality. Also triggers on: FPF, bounded contexts, SoTA packs, assurance calculus, decision theory, causal reasoning, temporal reasoning, architecture description, quality gates, FPF Parts A-K. Not for simple task planning, general philosophy, or Agile unrelated to FPF.
tools
Search, find, discover, install, remove, update, review, list, move, optimise, and iterate on skills for AI coding agents. Use when user asks "find a skill for X", "search for a skill", "is there a skill for X", "install skill", "remove skill", "update skills", "list skills", "review skill quality", "move skill", "check for updates", "optimise skill", "train skill on tasks", "iterate skill", "audit skill edits", "log skill edit", "diff skill versions", "trigger test skill", "transfer skill across agents", or "how do I do X" where X might have an existing skill. THE tool for skill discovery, ecosystem search, and SkillOpt-style training loops. Do not use for creating skills from scratch (use /skill-creator instead).
tools
Rename and refactor C# symbols in a .NET solution or multi-solution monorepo with a one-shot Roslyn CLI. Use when the user asks to rename a symbol, preview impact, update references across a solution, or refactor shared projects across several solutions.
tools
Hands-on playbook for macOS disk cleanup, dev-machine optimization, and proactive health alerting. Use when the Mac is full or slow, when a kernel panic / watchdog timeout / vm-compressor-space-shortage / Jetsam event happened, when the user asks to free disk space, audit storage, set up disk/memory alerts, or restore the same monitoring on a new Mac. Built around Mole (`mo` CLI) for safety guards plus a custom LaunchAgent-based alerter for active warnings. Covers Apple Silicon laptops with heavy AI/Docker workloads. Not for general macOS support, hardware diagnostics, networking issues, GUI / window-manager bugs, Time Machine recovery, broken app installs, or app-specific performance problems unrelated to disk or memory pressure.