skills/consult-llm/SKILL.md
How to invoke the consult-llm CLI. Canonical reference for the invocation contract, flags, stdin/stdout format, and multi-turn. Load this before calling consult-llm from any workflow skill (/consult, /collab, /debate, /collab-vs, /debate-vs).
Reference for invoking the consult-llm CLI. Workflow skills delegate here for mechanics; they focus on orchestration.
Run consult-llm with the prompt on stdin, using a quoted heredoc.
cat <<'__CONSULT_LLM_END__' | consult-llm -m <selector> -f src/foo.rs -f src/bar.rs
<prompt body>
__CONSULT_LLM_END__
Rules:
- Run in the foreground (do not set run_in_background). Only background the call when the caller explicitly passes --background. Always set timeout: 600000 (10 minutes); LLM calls routinely exceed the 2-minute default.
- Use the heredoc form <<'__CONSULT_LLM_END__' (quoted, with this exact terminator). The single quotes prevent shell expansion of $var, backticks, and escapes. The specific terminator __CONSULT_LLM_END__ is chosen because it won't appear in model responses. Never use EOF or PROMPT, which commonly appear in code samples and would silently truncate the prompt.
- Fall back to --prompt-file <path> if the prompt contains __CONSULT_LLM_END__, or on Windows/PowerShell. Write the prompt to a temp file with $(mktemp), then pass it via consult-llm --prompt-file "$f" ….
- Output format: the first line is [model:<id>] [thread_id:<id>], then a blank line, then the response body. In --web mode the prefix is just [model:<id>] (no thread).
- Multi-turn: extract [thread_id:xxx] from line 1 and pass it back with -t <id> on the next call. Thread IDs are opaque strings; don't modify them. They are not portable across backends.
- Exit codes: 0 success, 1 backend/network error (includes thread-not-found), 2 usage error, 3 configuration error (missing API key, unsupported backend).
Selectors and allowed models resolvable in this environment (availability depends on which API keys are configured):
!`consult-llm models`
Pass a selector or an exact model ID to -m. Only enabled selectors are listed — anything not shown has no available model. For workflow skills that fan out to multiple models, use the ordered Default models list from consult-llm models when the user did not pass explicit model flags; duplicates in that list are intentional and must be preserved. For same-prompt calls, translate that list to repeated -m <model> args (the Default -m args: line is a convenience). For --run, create one --run model=<model>,prompt-file=<path> entry per default model instead; do not paste -m args into a --run invocation. For ordinary single-response CLI use, omit -m to use default_model/fallback. -m is ignored when --web is used.
Multi-model: repeat -m to consult multiple model positions in parallel (e.g. -m gemini -m openai, max 5 total runs). You may repeat the same selector/model (e.g. -m openai -m openai) to get independent calls with the same prompt. The response is a group format: first line is [thread_id:group_xxx], each model's answer under a ## Model: <id> header preceded by [model:<id>] [thread_id:<per-model-id>]. When the same resolved model appears more than once, only those duplicate sections use ## Model: <id>#K and [model:<id>#K] labels. Pass -t group_xxx to resume all group positions together on the next turn; pass an individual per-model thread ID with a single -m <model> to resume just that model outside the group context.
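The output-format, multi-turn, and exit-code rules for a single-model call can be sketched as a few small shell helpers. This is a minimal sketch, not part of consult-llm: the function names are hypothetical, and it assumes the header is exactly line 1 and the body starts on line 3, as described in the rules.

```shell
# Hypothetical helpers; the names are illustrative and not provided by consult-llm.

# Print the thread ID found on line 1 of a single-model response (stdin).
# Prints nothing in --web mode, where line 1 has no [thread_id:...] tag.
extract_thread_id() {
  sed -n '1s/.*\[thread_id:\([^]]*\)].*/\1/p'
}

# Print the response body: drop the header line and the blank line after it.
response_body() {
  sed '1,2d'
}

# Translate a consult-llm exit status into the meaning listed in the rules.
describe_exit() {
  case "$1" in
    0) echo "success" ;;
    1) echo "backend/network error (includes thread-not-found)" ;;
    2) echo "usage error" ;;
    3) echo "configuration error (missing API key, unsupported backend)" ;;
    *) echo "unknown exit status $1" ;;
  esac
}
```

A caller would capture stdout in a variable, check `$?` with `describe_exit`, and pass the extracted ID back unchanged with -t on the next turn.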
Pick a --task mode based on the kind of question. Omit for neutral general-purpose.
| Mode | When to use |
| ------------------- | ------------------------------------------------------------------------------------------------- |
| general (default) | Neutral prompt. Defers to instructions in the prompt body. Use for open questions. |
| review | Critical code reviewer — bugs, security issues, quality problems. |
| debug | Root-cause troubleshooter from errors/logs/stack traces. Ignores style. |
| plan | Constructive architect — explore trade-offs, design solutions. Always ends with a recommendation. |
| create | Generative writer for docs, content, or design output. |
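
One way a workflow skill might turn a chosen mode into arguments is sketched below. The helper name `task_args` and its return code of 2 for an unknown mode are assumptions made for illustration; only the five mode names come from the table.

```shell
# Hypothetical helper: print the --task arguments for a mode from the table,
# one per line, or nothing for the default "general" mode.
task_args() {
  case "${1:-general}" in
    general) ;;  # default mode: omit the flag entirely
    review|debug|plan|create) printf '%s\n' --task "$1" ;;
    *) echo "unknown task mode: $1" >&2; return 2 ;;
  esac
}
```

Because none of the emitted words contain spaces, the result can be spliced into a command line with an unquoted `$(task_args review)`.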
--web copies the formatted prompt (system prompt + user prompt + file context) to the clipboard and exits 0 instead of calling an LLM. Only use when the user specifically asks for browser/web mode. After invoking, wait for the user to paste the external LLM's response back — do not continue implementation on your own. -m is ignored in this mode.
Ask neutral, open-ended questions. Do not suggest specific solutions in the prompt body — that biases the analysis. Let the LLM form its own view.
| Flag | Purpose |
| ---------------------------- | --------------------------------------------------------------- |
| -m, --model <selector\|id> | See "Models" above. Usually omit. |
| -f, --file <path> | Repeatable. File context — path + code block. |
| -t, --thread-id <id> | Resume a multi-turn conversation. See "Multi-turn". |
| --task <mode> | Persona. See "Task modes" above. |
| --web | Clipboard mode. See "Web mode" above. |
| --prompt-file <path> | Read prompt from file instead of stdin. |
| --diff-files <path> | Repeatable. Include git diff for this file as context. |
| --diff-base <ref> | Base ref for diff (default HEAD — shows uncommitted changes). |
| --diff-repo <path> | Repo path (default cwd). |
| --run <spec> | Per-model run. See "Per-model runs" below. |
Run consult-llm --help for the authoritative flag list.
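The choice between the stdin heredoc and --prompt-file can be sketched as follows. This is a minimal sketch under stated assumptions: `needs_prompt_file` and `prompt_to_tempfile` are hypothetical names, and the only condition checked is whether the prompt text contains the terminator.

```shell
# Hypothetical helpers; the names are illustrative and not provided by consult-llm.
TERMINATOR='__CONSULT_LLM_END__'

# Succeed when the prompt text contains the heredoc terminator, in which
# case the heredoc form would truncate it and --prompt-file is the safe route.
needs_prompt_file() {
  case "$1" in
    *"$TERMINATOR"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Write the prompt verbatim to a fresh temp file and print the file's path.
prompt_to_tempfile() {
  f=$(mktemp) || return 1
  # >| overwrites the file mktemp just created even when noclobber is set.
  printf '%s\n' "$1" >| "$f"
  printf '%s\n' "$f"
}
```

When `needs_prompt_file "$prompt"` succeeds, the caller would run `f=$(prompt_to_tempfile "$prompt")` and pass `--prompt-file "$f"` instead of piping a heredoc.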
File context (-f) best practices
The consulted LLM has no access to your conversation history. Anything it needs (source files, logs, command output, traces, timelines, error messages) must be attached with -f.
- Save command output to a file (cmd > /tmp/artifact.txt) instead of writing output from memory. This is cheaper, faster, and preserves the exact output.
- Logs, command output, traces, and timelines are all valid -f inputs. Do not limit context gathering to source code.
Use --run when a workflow needs to query multiple models in parallel with different prompt bodies. Do not use it for ordinary multi-model calls where the same prompt goes to every model; repeat -m for that.
GEMINI_PROMPT=$(mktemp)
CODEX_PROMPT=$(mktemp)
cat <<'__CONSULT_LLM_END__' >| "$GEMINI_PROMPT"
[prompt for Gemini]
__CONSULT_LLM_END__
cat <<'__CONSULT_LLM_END__' >| "$CODEX_PROMPT"
[prompt for Codex]
__CONSULT_LLM_END__
# First call — no existing threads yet
consult-llm \
--run "model=gemini,prompt-file=$GEMINI_PROMPT" \
--run "model=openai,prompt-file=$CODEX_PROMPT"
# Subsequent calls — continue each per-run thread
consult-llm \
--run "model=gemini,thread=$GEMINI_THREAD,prompt-file=$GEMINI_PROMPT" \
--run "model=openai,thread=$CODEX_THREAD,prompt-file=$CODEX_PROMPT"
# Duplicate resolved models are allowed; use distinct prompt files and distinct per-run threads.
consult-llm \
--run "model=openai,prompt-file=$PROMPT_A" \
--run "model=openai,prompt-file=$PROMPT_B"
Each --run value accepts model=<selector-or-id>, prompt-file=<path>, and optionally thread=<id>. Use mktemp for temporary prompt files and always use __CONSULT_LLM_END__ as the heredoc terminator. Use >| to overwrite temp files in zsh (avoids noclobber errors).
Constraints: max 5 total runs, cannot combine with -m/-t/--prompt-file/--web, duplicate resolved models are allowed, duplicate explicit thread=<id> values are rejected, thread=group_* is rejected because --run uses per-run thread IDs, shared -f and --diff-* context applies to every run, prompt-file paths with commas are unsupported.
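Several of these constraints can be checked before the call is made. The sketch below is a hypothetical pre-flight check, not something consult-llm provides. It assumes thread IDs contain no spaces or commas, and it detects a comma in a prompt-file path only indirectly, as a field that lacks a known key.

```shell
# Hypothetical pre-flight check for --run specs, passed one per argument.
# consult-llm enforces these rules itself; this only fails earlier.
check_run_specs() {
  [ "$#" -le 5 ] || { echo "too many runs: $# (max 5)" >&2; return 1; }
  seen_threads=' '
  for spec in "$@"; do
    # thread=group_* is rejected because --run uses per-run thread IDs.
    case "$spec" in
      *thread=group_*) echo "group thread in --run: $spec" >&2; return 1 ;;
    esac
    # Every comma-separated field must start with a known key; a stray
    # field usually means the prompt-file path contained a comma.
    if printf '%s\n' "$spec" | tr ',' '\n' |
        grep -Eqv '^(model|prompt-file|thread)=.'; then
      echo "unrecognized field (comma in a path?): $spec" >&2; return 1
    fi
    # Duplicate explicit thread=<id> values are rejected.
    thread=$(printf '%s\n' "$spec" | tr ',' '\n' | sed -n 's/^thread=//p')
    if [ -n "$thread" ]; then
      case "$seen_threads" in
        *" $thread "*) echo "duplicate thread: $thread" >&2; return 1 ;;
      esac
      seen_threads="$seen_threads$thread "
    fi
  done
  return 0
}
```

The check takes the same strings that would follow each --run flag, so a workflow can validate them once and then build the real command line from the same list.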
Output is the same group format as multi-model -m calls. Extract per-run thread IDs from each section header for subsequent --run thread=... turns.
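Extracting IDs from the group format might look like the sketch below. The function names are hypothetical. It assumes each [model:<id>] [thread_id:<id>] tag sits on its own line starting at column 1, and it would also match a response body line that happens to begin with the same pattern.

```shell
# Hypothetical helpers; the names are illustrative and not provided by consult-llm.

# Print the group thread ID from line 1 of a group response (stdin).
group_thread_id() {
  sed -n '1s/^\[thread_id:\([^]]*\)].*/\1/p'
}

# Print one "<model-label> <thread-id>" pair per section, in output order.
# Duplicate models keep their #K suffix in the label, e.g. "openai#2".
per_run_threads() {
  sed -n 's/^\[model:\([^]]*\)] \[thread_id:\([^]]*\)].*/\1 \2/p'
}
```

Both helpers read the whole response from stdin, so the caller should save the output once and feed it to each helper separately.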