Reference for invoking the consult-llm CLI. Workflow skills delegate here for mechanics; they focus on orchestration.

Invocation

Run consult-llm with the prompt on stdin, using a quoted heredoc.

cat <<'__CONSULT_LLM_END__' | consult-llm -m <selector> -f src/foo.rs -f src/bar.rs
<prompt body>
__CONSULT_LLM_END__

Rules:

Run Bash in the foreground (synchronous, no run_in_background). Only background the call when the caller explicitly passes --background. Always set timeout: 600000 (10 minutes) — LLM calls routinely exceed the 2-minute default.
ALWAYS use <<'__CONSULT_LLM_END__' (quoted, with this exact terminator). The single quotes prevent shell expansion of $var, backticks, and escapes. The specific terminator __CONSULT_LLM_END__ is chosen because it won't appear in model responses — never use EOF or PROMPT which commonly appear in code samples and would silently truncate the prompt.
Fallback to --prompt-file <path> if the prompt contains __CONSULT_LLM_END__, or on Windows/PowerShell. Write the prompt to a temp file with $(mktemp), then pass it via consult-llm --prompt-file "$f" ….
Stdout layout. First line is [model:<id>] [thread_id:<id>], then a blank line, then the response body. In --web mode the prefix is just [model:<id>] (no thread).
Multi-turn. Read [thread_id:xxx] from line 1 and pass it back with -t <id> on the next call. Thread IDs are opaque strings — don't modify them. Not portable across backends.
Stderr carries progress/spinner output. Ignore it.
Exit codes. 0 success, 1 backend/network error (includes thread-not-found), 2 usage error, 3 configuration error (missing API key, unsupported backend).

Models

Selectors and allowed models resolvable in this environment (availability depends on which API keys are configured):

!`consult-llm models`

Pass a selector or an exact model ID to -m. Only enabled selectors are listed — anything not shown has no available model. For workflow skills that fan out to multiple models, use the ordered Default models list from consult-llm models when the user did not pass explicit model flags; duplicates in that list are intentional and must be preserved. For same-prompt calls, translate that list to repeated -m <model> args (the Default -m args: line is a convenience). For --run, create one --run model=<model>,prompt-file=<path> entry per default model instead; do not paste -m args into a --run invocation. For ordinary single-response CLI use, omit -m to use default_model/fallback. -m is ignored when --web is used.

Multi-model: repeat -m to consult multiple model positions in parallel (e.g. -m gemini -m openai, max 5 total runs). You may repeat the same selector/model (e.g. -m openai -m openai) to get independent calls with the same prompt. The response is a group format: first line is [thread_id:group_xxx], each model's answer under a ## Model: <id> header preceded by [model:<id>] [thread_id:<per-model-id>]. When the same resolved model appears more than once, only those duplicate sections use ## Model: <id>#K and [model:<id>#K] labels. Pass -t group_xxx to resume all group positions together on the next turn; pass an individual per-model thread ID with a single -m <model> to resume just that model outside the group context.

Task modes

Pick a --task mode based on the kind of question. Omit for neutral general-purpose.

| Mode | When to use | | ------------------- | ------------------------------------------------------------------------------------------------- | | general (default) | Neutral prompt. Defers to instructions in the prompt body. Use for open questions. | | review | Critical code reviewer — bugs, security issues, quality problems. | | debug | Root-cause troubleshooter from errors/logs/stack traces. Ignores style. | | plan | Constructive architect — explore trade-offs, design solutions. Always ends with a recommendation. | | create | Generative writer for docs, content, or design output. |

Web mode

--web copies the formatted prompt (system prompt + user prompt + file context) to the clipboard and exits 0 instead of calling an LLM. Only use when the user specifically asks for browser/web mode. After invoking, wait for the user to paste the external LLM's response back — do not continue implementation on your own. -m is ignored in this mode.

Prompt authoring

Ask neutral, open-ended questions. Do not suggest specific solutions in the prompt body — that biases the analysis. Let the LLM form its own view.

Flags

| Flag | Purpose | | ---------------------------- | --------------------------------------------------------------- | | -m, --model <selector\|id> | See "Models" above. Usually omit. | | -f, --file <path> | Repeatable. File context — path + code block. | | -t, --thread-id <id> | Resume a multi-turn conversation. See "Multi-turn". | | --task <mode> | Persona. See "Task modes" above. | | --web | Clipboard mode. See "Web mode" above. | | --prompt-file <path> | Read prompt from file instead of stdin. | | --diff-files <path> | Repeatable. Include git diff for this file as context. | | --diff-base <ref> | Base ref for diff (default HEAD — shows uncommitted changes). | | --diff-repo <path> | Repo path (default cwd). | | --run <spec> | Per-model run. See "Per-model runs" below. |

Run consult-llm --help for the authoritative flag list.

File context (`-f`) best practices

The consulted LLM has no access to your conversation history. Anything it needs - source files, logs, command output, traces, timelines, error messages - must be attached with -f.

Include conversation artifacts. If the current session already produced diagnostic output relevant to the question (log excerpts, traces, reproduction steps, command output), attach it as a temp file. Prefer raw evidence over prose summaries when both exist.
Re-run the original command piping to a temp file (cmd > /tmp/artifact.txt) instead of writing output from memory. This is cheaper, faster, and preserves the exact output.
Source files and diagnostic artifacts are both first-class -f inputs. Do not limit context gathering to source code.

Per-model runs

Use --run when a workflow needs to query multiple models in parallel with different prompt bodies. Do not use it for ordinary multi-model calls where the same prompt goes to every model — repeat -m for that.

GEMINI_PROMPT=$(mktemp)
CODEX_PROMPT=$(mktemp)

cat <<'__CONSULT_LLM_END__' >| "$GEMINI_PROMPT"
[prompt for Gemini]
__CONSULT_LLM_END__

cat <<'__CONSULT_LLM_END__' >| "$CODEX_PROMPT"
[prompt for Codex]
__CONSULT_LLM_END__

# First call — no existing threads yet
consult-llm \
  --run "model=gemini,prompt-file=$GEMINI_PROMPT" \
  --run "model=openai,prompt-file=$CODEX_PROMPT"

# Subsequent calls — continue each per-run thread
consult-llm \
  --run "model=gemini,thread=$GEMINI_THREAD,prompt-file=$GEMINI_PROMPT" \
  --run "model=openai,thread=$CODEX_THREAD,prompt-file=$CODEX_PROMPT"

# Duplicate resolved models are allowed; use distinct prompt files and distinct per-run threads.
consult-llm \
  --run "model=openai,prompt-file=$PROMPT_A" \
  --run "model=openai,prompt-file=$PROMPT_B"

Each --run value accepts model=<selector-or-id>, prompt-file=<path>, and optionally thread=<id>. Use mktemp for temporary prompt files and always use __CONSULT_LLM_END__ as the heredoc terminator. Use >| to overwrite temp files in zsh (avoids noclobber errors).

Constraints: max 5 total runs, cannot combine with -m/-t/--prompt-file/--web, duplicate resolved models are allowed, duplicate explicit thread=<id> values are rejected, thread=group_* is rejected because --run uses per-run thread IDs, shared -f and --diff-* context applies to every run, prompt-file paths with commas are unsupported.

Output is the same group format as multi-model -m calls. Extract per-run thread IDs from each section header for subsequent --run thread=... turns.

Reference for invoking the consult-llm CLI. Workflow skills delegate here for mechanics; they focus on orchestration.

Invocation

Run consult-llm with the prompt on stdin, using a quoted heredoc.

cat <<'__CONSULT_LLM_END__' | consult-llm -m <selector> -f src/foo.rs -f src/bar.rs
<prompt body>
__CONSULT_LLM_END__

Rules:

Run Bash in the foreground (synchronous, no run_in_background). Only background the call when the caller explicitly passes --background. Always set timeout: 600000 (10 minutes) — LLM calls routinely exceed the 2-minute default.
ALWAYS use <<'__CONSULT_LLM_END__' (quoted, with this exact terminator). The single quotes prevent shell expansion of $var, backticks, and escapes. The specific terminator __CONSULT_LLM_END__ is chosen because it won't appear in model responses — never use EOF or PROMPT which commonly appear in code samples and would silently truncate the prompt.
Fallback to --prompt-file <path> if the prompt contains __CONSULT_LLM_END__, or on Windows/PowerShell. Write the prompt to a temp file with $(mktemp), then pass it via consult-llm --prompt-file "$f" ….
Stdout layout. First line is [model:<id>] [thread_id:<id>], then a blank line, then the response body. In --web mode the prefix is just [model:<id>] (no thread).
Multi-turn. Read [thread_id:xxx] from line 1 and pass it back with -t <id> on the next call. Thread IDs are opaque strings — don't modify them. Not portable across backends.
Stderr carries progress/spinner output. Ignore it.
Exit codes. 0 success, 1 backend/network error (includes thread-not-found), 2 usage error, 3 configuration error (missing API key, unsupported backend).

Models

Selectors and allowed models resolvable in this environment (availability depends on which API keys are configured):

!`consult-llm models`

Task modes

Pick a --task mode based on the kind of question. Omit for neutral general-purpose.

Web mode

Prompt authoring

Ask neutral, open-ended questions. Do not suggest specific solutions in the prompt body — that biases the analysis. Let the LLM form its own view.

Flags

Run consult-llm --help for the authoritative flag list.

File context (`-f`) best practices

The consulted LLM has no access to your conversation history. Anything it needs - source files, logs, command output, traces, timelines, error messages - must be attached with -f.

Include conversation artifacts. If the current session already produced diagnostic output relevant to the question (log excerpts, traces, reproduction steps, command output), attach it as a temp file. Prefer raw evidence over prose summaries when both exist.
Re-run the original command piping to a temp file (cmd > /tmp/artifact.txt) instead of writing output from memory. This is cheaper, faster, and preserves the exact output.
Source files and diagnostic artifacts are both first-class -f inputs. Do not limit context gathering to source code.

Per-model runs

GEMINI_PROMPT=$(mktemp)
CODEX_PROMPT=$(mktemp)

cat <<'__CONSULT_LLM_END__' >| "$GEMINI_PROMPT"
[prompt for Gemini]
__CONSULT_LLM_END__

cat <<'__CONSULT_LLM_END__' >| "$CODEX_PROMPT"
[prompt for Codex]
__CONSULT_LLM_END__

# First call — no existing threads yet
consult-llm \
  --run "model=gemini,prompt-file=$GEMINI_PROMPT" \
  --run "model=openai,prompt-file=$CODEX_PROMPT"

# Subsequent calls — continue each per-run thread
consult-llm \
  --run "model=gemini,thread=$GEMINI_THREAD,prompt-file=$GEMINI_PROMPT" \
  --run "model=openai,thread=$CODEX_THREAD,prompt-file=$CODEX_PROMPT"

# Duplicate resolved models are allowed; use distinct prompt files and distinct per-run threads.
consult-llm \
  --run "model=openai,prompt-file=$PROMPT_A" \
  --run "model=openai,prompt-file=$PROMPT_B"

Output is the same group format as multi-model -m calls. Extract per-run thread IDs from each section header for subsequent --run thread=... turns.

Adoption

raine/consult-llm

$ install --global

Security Scan Results

SKILL.md

Invocation

Models

Task modes

Web mode

Prompt authoring

Flags

File context (`-f`) best practices

Per-model runs

Related Skills

raine/phased-implement

raine/implement

raine/review-panel

raine/panel

raine/consult-llm

$ install --global

Security Scan Results

SKILL.md

Invocation

Models

Task modes

Web mode

Prompt authoring

Flags

File context (`-f`) best practices

Per-model runs

Related Skills

raine/phased-implement

raine/implement

raine/review-panel

raine/panel

Adoption

raine/consult-llm

$ install --global

Security Scan Results

SKILL.md

Invocation

Models

Task modes

Web mode

Prompt authoring

Flags

File context (-f) best practices

Per-model runs

Related Skills

raine/phased-implement

raine/implement

raine/review-panel

raine/panel

raine/consult-llm

$ install --global

Security Scan Results

SKILL.md

Invocation

Models

Task modes

Web mode

Prompt authoring

Flags

File context (-f) best practices

Per-model runs

Related Skills

raine/phased-implement

raine/implement

raine/review-panel

raine/panel

File context (`-f`) best practices

File context (`-f`) best practices