plugins/c-review/skills/c-review/SKILL.md
Performs comprehensive C/C++ security review for memory corruption, integer overflows, race conditions, and platform-specific vulnerabilities. Use when auditing native C/C++ applications, reviewing daemons or services for memory safety, or hunting integer overflow / use-after-free / race conditions in userspace code.
npx skillsauth add trailofbits/skills c-reviewInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Runs in the main conversation (invoke via /c-review:c-review). Orchestrator owns the Task* ledger as bookkeeping for retries; workers and judges have no Task tools. Workers and judges are named plugin subagents (c-review:c-review-worker, c-review:c-review-dedup-judge, c-review:c-review-fp-judge); tool sets are declared in plugins/c-review/agents/*.md. Findings are exchanged via markdown-with-YAML files in a shared output directory.
Native C/C++ application security review: memory safety, integer overflow, races, type confusion, Linux/macOS daemons, Windows userspace services.
| Subagent type | Purpose | Tool set |
|---|---|---|
| c-review:c-review-worker | Run assigned cluster, write findings | Read, Write, Edit, Grep, Glob, Bash |
| c-review:c-review-dedup-judge | Merge duplicates (runs first) | Read, Write, Edit, Glob |
| c-review:c-review-fp-judge | FP + severity + final reports (runs second) | Read, Write, Edit, Grep, Glob, Bash |
Tools come from each agent's frontmatter at spawn time. The orchestrator's Task*/Agent/Bash/etc. come from this skill's allowed-tools.
coordinator: write context.md → build_run_plan.py → TaskCreate × M
→ spawn primer (foreground) → spawn M workers (parallel)
→ classify Phase-7 outcomes + write findings-index.txt
→ dedup-judge → fp-judge → SARIF safety net → return REPORT.md
Output directory contains: context.md, plan.json, worker-prompts/, findings/, findings-index.d/ (per-worker shards), findings-index.txt, run-summary.md, dedup-summary.md, fp-summary.md, REPORT.md, REPORT.sarif.
Path convention: set ${C_REVIEW_PLUGIN_ROOT}=${CLAUDE_PLUGIN_ROOT} if that resolves (Bash: ls "${CLAUDE_PLUGIN_ROOT}/prompts/clusters/buffer-write-sinks.md"), otherwise Bash: find ~/.claude -path '*/plugins/c-review/prompts/clusters/buffer-write-sinks.md' -print -quit.
Scope convention: keep two scopes separate throughout the run:
finding_scope_root — the user-requested audit subtree. Workers may only file findings whose vulnerable location is inside this subtree.context_roots — read-only repo roots/files workers and judges may inspect to verify reachability, callers, wrappers, build flags, mitigations, and threat-model details. Default to . unless the user explicitly forbids broader context. Reading context outside finding_scope_root is allowed; filing findings there is not.Agent calls in a single assistant message already run concurrently. run_in_background=true defeats the Phase 6a primer cache, so every worker pays full cache-creation (cache_read_input_tokens=0) and the ~15 K-token primer is wasted M times. This is the single most common defect — multiple recent runs spawned 7-of-8 (or all) workers with bg=true. Default: omit run_in_background from worker spawns.build_run_plan.py." The script is the only authority for selection and rendering. Paraphrasing it drops fields that the worker self-check requires, producing worker-N abort: spawn prompt malformed. Always run the script and Read plan.json.REPORT.md from what completed." Hiding partial runs behind a successful report is a correctness bug. If any Phase-5 cluster task is not completed, surface it prominently in run-summary.md and the final response.dedup-summary.md on an empty index, fp-judge writes empty REPORT.md/REPORT.sarif, and Phase 8b's SARIF generator emits results: [] for the empty case. SARIF consumers depend on a stable artifact set.Bash: ls README* is fine for the preflight." Under zsh, an unmatched glob aborts the whole compound command before 2>/dev/null runs. Use Glob (preferred) or find (never fails on no-match).Run these phases in the main conversation.
Entry: skill invoked. Exit: threat_model, worker_model, severity_filter resolved; scope_subpath resolved or set to "."; finding_scope_root=scope_subpath; context_roots resolved.
The skill is invoked directly (no command wrapper). Parse any free-text arguments the user passed on the /c-review:c-review line (e.g. flamenco only, high severity only, use haiku) and pre-fill the answers they imply — then ask for any missing required parameters with one AskUserQuestion call. Never silently default the required parameters.
Required parameters:
| Parameter | Values | How to infer from args |
|---|---|---|
| threat_model | REMOTE / LOCAL_UNPRIVILEGED / BOTH | Words like "remote", "network", "attacker" → REMOTE; "local", "unprivileged" → LOCAL_UNPRIVILEGED; otherwise ask. |
| worker_model | haiku / sonnet / opus | Explicit model name in args. Otherwise ask (no silent default). |
| severity_filter | all / medium / high | "all", "every", "noisy" → all; "medium and above" → medium; "high only", "criticals only" → high. Otherwise ask — no silent default. |
| scope_subpath | repo-relative directory (optional) | Phrases like "X only", "just audit X/", "review subdirectory X" → src/X/ or the matching subdir. Apply fuzzy matching against top-level subdirectories of the repo. If absent, set "."; if ambiguous, ask. |
Call AskUserQuestion exactly once with only unresolved required parameters (threat_model, worker_model, severity_filter) plus scope_subpath only when the user explicitly requested a narrowed scope but it is ambiguous. If the required parameters were all pre-filled and scope is absent or resolved, skip the question.
After resolving scope_subpath, set finding_scope_root="${scope_subpath:-.}". Set context_roots="." by default so workers can verify callers/build settings outside a narrowed subtree without filing out-of-scope findings. If the user explicitly asks to forbid broader context, set context_roots="${finding_scope_root}" and note that reachability confidence may be lower.
Entry: Phase 0 complete. Exit: is_cpp, is_posix, is_windows flags determined.
Probe within ${finding_scope_root:-.}. Prefer Glob/Grep when available in the orchestrator's tool set; some sessions only expose Bash, so fall back to the equivalents below — both forms produce identical signals (non-empty output ⇒ flag true):
# is_cpp
find "${finding_scope_root:-.}" -type f \( -name '*.cpp' -o -name '*.cxx' -o -name '*.cc' -o -name '*.hpp' -o -name '*.hh' \) -print -quit
# is_posix
grep -rlE '#include[[:space:]]*<(pthread|signal|sys/(socket|stat|types|wait)|unistd|errno)\.h>' \
--include='*.c' --include='*.h' \
--include='*.cpp' --include='*.cxx' --include='*.cc' --include='*.hpp' --include='*.hh' \
"${finding_scope_root:-.}" | head -1
# is_windows
grep -rlE '#include[[:space:]]*<(windows|winbase|winnt|winuser|winsock|ntdef|ntstatus)\.h>' \
--include='*.c' --include='*.h' \
--include='*.cpp' --include='*.cxx' --include='*.cc' --include='*.hpp' --include='*.hh' \
"${finding_scope_root:-.}" | head -1
compile_commands.json is informational (no agent currently uses LSP), but the probe is mandatory so the run summary records whether richer local tooling is available. Probe via Glob: **/compile_commands.json under ${context_roots}. If Glob is unavailable, use:
printf '%s\n' "${context_roots:-.}" | tr ',' '\n' | while IFS= read -r root; do
[ -n "$root" ] && find "$root" -name compile_commands.json -print -quit
done | head -1
# `find "$root"` is quoted intentionally so a context root containing spaces
# (e.g. "/Users/me/My Repo") survives word-splitting. Do not unquote it.
If absent, suggest CMake -DCMAKE_EXPORT_COMPILE_COMMANDS=ON/Bear/compiledb to the user but continue.
Entry: Phase 1 flags set. Exit: absolute output_dir resolved; ${output_dir}/findings/ exists.
Resolve an absolute path for output_dir (default: $(pwd)/.c-review-results/$(date -u +%Y%m%dT%H%M%SZ)/):
mkdir -p "${output_dir}/findings"
Entry: ${output_dir} exists. Exit: ${output_dir}/context.md written.
Skim README.{md,rst,txt} and any build file (Makefile, CMakeLists.txt, meson.build, configure.ac) — preflight with the Glob tool before any Read (a Read on a missing file aborts the turn). Do not use Bash: ls README* for the preflight: under zsh, an unmatched glob aborts the whole compound command before 2>/dev/null runs (observed: a Phase-3 ls src/X/README* call failed with no matches found and dropped the entire preflight). If you must use Bash, use find . -maxdepth 2 -name 'README*' -o -name 'Makefile' -o -name 'CMakeLists.txt' -o -name 'meson.build', which never fails on no-match.
Write ${output_dir}/context.md with: YAML frontmatter (threat_model, severity_filter, scope_subpath, finding_scope_root, context_roots, is_cpp, is_posix, is_windows, output_dir, compile_commands as present/absent plus path when present), then a short markdown body with five sections — Purpose (1-3 sentences), Scope (what's in finding_scope_root, and that findings outside it are out of scope), Entry points (where untrusted data enters: network, files, CLI, IPC), Trust boundaries (sandboxed vs trusted peers vs arbitrary remote), Existing hardening (fuzzing corpora, sanitizers, privilege separation).
Entry: language flags + threat_model known; ${output_dir}/findings/ exists. Exit: ${output_dir}/plan.json and ${output_dir}/worker-prompts/*.txt written; M = worker_count known.
Selection, filtering, path resolution, and spawn-prompt rendering are delegated to the script to prevent the "orchestrator paraphrases the spawn template and drops fields" failure mode:
python3 "${C_REVIEW_PLUGIN_ROOT}/scripts/build_run_plan.py" \
--plugin-root "${C_REVIEW_PLUGIN_ROOT}" --output-dir "${output_dir}" \
--threat-model "${threat_model}" --severity-filter "${severity_filter}" \
--scope-subpath "${finding_scope_root:-.}" --context-roots "${context_roots:-.}" \
--is-cpp "${is_cpp}" --is-posix "${is_posix}" --is-windows "${is_windows}"
The script writes plan.json + worker-prompts/worker-N.txt + (if --cache-primer=true, the default) worker-prompts/cache-primer.txt, and prints a JSON summary on stdout. Exits non-zero on any missing prompt — surface the message and stop. Typical M: 7 (C POSIX), 8 (C++ POSIX), 10 (C POSIX + Windows), 11 (C++ POSIX + Windows). After it returns, Read plan.json for the structured selection — never re-derive filtering or paths.
Entry: ${output_dir}/plan.json exists; M = plan.workers.length. Exit: cluster_task_ids[] created (1:1 with plan.workers), all pending.
The task ledger is orchestrator bookkeeping only (TUI visibility + Phase-7 retry tracking) — workers never read or write it. One TaskCreate per worker, populating metadata with kind="cluster", worker_n, cluster_id, spawn_prompt_path, pass_prefixes, attempt=1 — all values copied verbatim from plan.workers[i]. Track cluster_task_ids[] in plan.workers order.
Entry: cluster_task_ids[] populated; per-worker spawn prompt files exist at ${output_dir}/worker-prompts/worker-N.txt. Exit: all M Agent calls have returned (the parallel spawn block completed).
plan.run.cache_primer)A parallel batch from cold start cannot share cache (all M requests dispatch simultaneously, none has finished writing). To warm the prefix, spawn a tiny primer first — foreground (background spawns don't share cache with subsequent foreground spawns).
If plan.run.cache_primer == true, build_run_plan.py has written ${output_dir}/worker-prompts/cache-primer.txt. Spawn it in its own assistant message: Read the file, pass verbatim as Agent prompt with subagent_type=c-review:c-review-worker, model=${worker_model}, description="C review cache primer", no run_in_background. The script wrote the prefix byte-identical to worker-1.txt through the <context> block — that byte-identity is what gives the parallel workers their cache hit. The primer trailer contains Cache primer: true, which the worker system prompt treats as a first-class mode and returns exactly worker-PRIMER abort: cache primer (no analysis performed) in one text response with zero tool calls. Discard the abort line — Phase 7 ignores it (no worker-N id).
Foreground spawn already serializes — no sleep needed before Phase 6b. Skip Phase 6a entirely if plan.run.cache_primer == false.
STOP — read this before composing the spawn message.
Workers MUST be spawned foreground (no
run_in_backgroundfield, orrun_in_background=false). "Parallel" here means one assistant message containing MAgentcalls — that already runs them concurrently. Background spawns are NOT how you parallelize this skill.Background spawns defeat Phase 6a's primer cache: every worker pays full cache-creation on its first turn (
cache_read_input_tokens=0), and the primer's ~15 K tokens are wasted M times over. Two real runs (audit logs available) had exactly this symptom — every worker started withfirst_cr=0.Before sending the spawn message, audit your draft: every
Agentcall must have norun_in_backgroundkey. If you wroterun_in_background=true, delete it.
Required spawn shape: emit a single assistant message containing M Agent tool invocations. Sequential spawning serializes the review and is also wrong, but that failure is loud (timing); the background-spawn failure is silent (cost).
For each worker N ∈ [1..M]:
Read: ${output_dir}/worker-prompts/worker-N.txtAgent tool's prompt argument:| Parameter | Value |
|-----------|-------|
| subagent_type | c-review:c-review-worker |
| model | ${worker_model} (haiku / sonnet / opus) |
| description | C review worker N |
| prompt | the full text of worker-N.txt (no edits) |
| run_in_background | field MUST be omitted, OR set to false. Never true. See the foreground-spawn warning above. |
The spawn prompt is the single authority. Pass it verbatim — every field is required by the worker's self-check; any deviation triggers worker-N abort: spawn prompt malformed.
Anti-patterns to reject:
run_in_background=true (the dominant historical defect — see warning above).worker-N.txt.Entry: all M Phase-6 Agent calls have returned. Exit: every cluster has either succeeded or been retried up to the cap; ${output_dir}/findings-index.txt written.
The Phase-6 Agent invocations block until each worker returns. Inspect each worker's return text and apply this classifier in order — first match wins:
| # | Match (in return text) | Outcome | Action |
|---|---|---|---|
| 1 | worker-N complete: | success | TaskUpdate to completed. |
| 2 | abort: spawn prompt malformed, abort: pre-work budget exceeded, or abort: TaskList unavailable (legacy) | non-retryable orchestrator bug | Stop the run, surface the abort + spawn-prompt path. Re-running the same prompt repeats the failure — pre-work-budget exhaustion always means the worker couldn't pass its self-check, which a retry won't fix. |
| 3 | other worker-N abort: | retryable | Mark pending, set metadata.abort_reason, needs_respawn=true, increment attempt. |
| 4 | Agent errored or no complete:/abort: token | retryable | Same as #3 (transient worker crash). |
If any non-retryable, stop. Otherwise re-spawn each pending retryable with attempt < 2 in one parallel block (cap = 2 attempts per cluster). Replacement workers can safely overwrite partial files — finding IDs are deterministic per prefix.
For every complete: cluster, list ${output_dir}/findings/${prefix}-*.md for each pass_prefix (from plan.json). A worker that says "wrote N finding files" with N>0 but zero files on disk is suspicious — treat as retryable (classifier row #4). Zero claimed + zero on disk is fine.
Then build the index — workers wrote per-worker shards under ${output_dir}/findings-index.d/, prefer those:
# Use `find` rather than a `worker-*.txt` glob: zsh aborts the compound command on no-match
# even with `2>/dev/null`, so an empty findings-index.d would otherwise drop the index file.
# `awk 1` (vs `cat`) normalizes a missing trailing newline on any shard, so a future
# worker that writes shards via Write/printf instead of `ls -1 | sort` can't silently glue
# the last path of one shard onto the first of the next when sort -u dedupes.
if [ -d "${output_dir}/findings-index.d" ]; then
find "${output_dir}/findings-index.d" -maxdepth 1 -type f -name 'worker-*.txt' -exec awk 1 {} + 2>/dev/null \
| sort -u > "${output_dir}/findings-index.txt"
else
find "${output_dir}/findings" -maxdepth 1 -type f -name '*.md' 2>/dev/null | sort > "${output_dir}/findings-index.txt"
fi
sort -u collapses duplicates from Phase-7 retries. Empty file is the unambiguous "zero findings" signal. Cross-check the line count against the sum of wrote N worker claims; log mismatches but don't abort.
After task updates and index creation, run TaskList and write ${output_dir}/run-summary.md with:
threat_model, severity_filter, finding_scope_root, context_roots, language/platform flags, compile-commands status)worker_n, cluster_id, claimed finding count, shard line count, task status, retry/abort state)findings-index.txt line count and any mismatch against worker claimsIf any Phase-5 cluster task is not completed, include it prominently in run-summary.md and the final response. Do not hide a partial run behind a successful report.
Always run Phase 8 even on zero findings — both judges short-circuit on an empty index: dedup-judge writes a minimal no-op dedup-summary.md, and fp-judge writes empty REPORT.md/REPORT.sarif so SARIF consumers get a stable artifact set.
Entry: findings-index.txt exists. Exit: dedup-judge and fp-judge have returned; dedup-summary.md, fp-summary.md, REPORT.md, and ideally REPORT.sarif are written.
Each judge's full protocol is its system prompt (agents/c-review-{dedup,fp}-judge.md); spawn prompts pass only per-run variables. Do not reference prompts/internal/judges/ — those files don't exist.
Spawn sequentially (dedup first, fp-judge sees only merged primaries):
Agent(subagent_type="c-review:c-review-dedup-judge", description="Dedup judge", prompt=f"output_dir: {output_dir}")Agent(subagent_type="c-review:c-review-fp-judge", description="FP + severity judge", prompt=f"output_dir: {output_dir}\nsarif_generator_path: {sarif_generator_path}") — resolve sarif_generator_path to ${C_REVIEW_PLUGIN_ROOT}/scripts/generate_sarif.py.Judge failure handling. Same shape as Phase 7's classifier, applied to judge return text:
… complete: → success.… abort: → non-retryable. Surface the abort line plus ls -l ${output_dir}/findings-index.txt; stop.complete: (help message / error / question) → retryable once. SendMessage(to=<agentId>, …) rather than a fresh spawn (the agent already paid the protocol-parse cost). Include the explicit finding paths from findings-index.txt. If the second try still fails, surface the transcript and continue to Phase 8b.Entry: fp-judge returned, or the run aborted early. Exit: ${output_dir}/REPORT.sarif exists.
test -d "${output_dir}/findings" && python3 "${C_REVIEW_PLUGIN_ROOT}/scripts/generate_sarif.py" "${output_dir}"
Run unconditionally whenever findings/ exists — generator is idempotent (full overwrite), emits results: [] for zero-survivor runs, and handles partial runs (findings without fp_verdict are emitted as LIKELY_TP rather than being silently dropped). Always overwriting protects against the case where fp-judge crashed mid-write and left a corrupt REPORT.sarif on disk. Skip only if ${output_dir}/findings/ doesn't exist (Phase 2 failed). After this phase, update ${output_dir}/run-summary.md with judge/SARIF status.
Entry: Phase 8b complete. Exit: every item in Success Criteria verified true; REPORT.md returned to the caller.
Before composing the response, walk the Success Criteria checklist below and confirm each bullet against on-disk artifacts (TaskList for cluster tasks, ls/Read for the files). If any criterion fails, surface the failure prominently in the response — do not hide a partial run behind a successful report.
Then Read ${output_dir}/REPORT.md and return its content to the caller. Append an Artifacts list pointing at findings/, findings-index.txt, run-summary.md, dedup-summary.md, fp-summary.md, REPORT.md, REPORT.sarif.
Authoritative schema: agents/c-review-worker.md ("Finding File Format"). Three-stage write:
id, bug_class, title, location, function, confidence, worker) + seven body sections.merged_into on duplicates, or also_known_as + locations on primaries that absorbed.fp_verdict + fp_rationale on every primary; on survivors (TRUE_POSITIVE/LIKELY_TP) also adds severity, attack_vector, exploitability, severity_rationale.Authoritative: prompts/clusters/manifest.json. 47 always-on bug classes, up to 64 with all conditional clusters enabled. buffer-write-sinks is fully consolidated (its sub-prompts are not re-read at runtime).
The phase exits already cover most of this; the orchestrator-visible end-state is:
completed (verify via TaskList).${output_dir}/run-summary.md exists and records resolved scope/context, compile-commands probe result, worker claims vs index count, task status, and judge/SARIF status.merged_into) has fp_verdict + fp_rationale; every survivor (TRUE_POSITIVE/LIKELY_TP) also has severity, attack_vector, exploitability, severity_rationale.REPORT.md exists, severity-filtered per severity_filter.REPORT.sarif exists (Phase 8b safety net guarantees this).testing
Draws the 12 Houses of the Zodiac Tarot spread to inject entropy into planning when prompts are vague, ambiguous, or casually delegated. Interprets the spread to guide next steps. Use when the user says 'let fate decide', 'YOLO', 'whatever', 'idk', or other nonchalant phrases, makes Yu-Gi-Oh references, or when you are about to arbitrarily pick between multiple reasonable approaches. Prefer over ask-questions-if-underspecified when the user's tone is casual or playful rather than precision-seeking.
testing
Scans Solana programs for 6 critical vulnerabilities including arbitrary CPI, improper PDA validation, missing signer/ownership checks, and sysvar spoofing. Use when auditing Solana/Anchor programs.
development
Detects missing zeroization of sensitive data in source code and identifies zeroization removed by compiler optimizations, with assembly-level analysis, and control-flow verification. Use for auditing C/C++/Rust code handling secrets, keys, passwords, or other sensitive data.
testing
Guides authoring of high-quality YARA-X detection rules for malware identification. Use when writing, reviewing, or optimizing YARA rules. Covers naming conventions, string selection, performance optimization, migration from legacy YARA, and false positive reduction. Triggers on: YARA, YARA-X, malware detection, threat hunting, IOC, signature, crx module, dex module.