skills/code-review/SKILL.md
Use when reviewing code changes for quality, correctness, and production readiness before merge
npx skillsauth add toongri/oh-my-toong-playground code-reviewInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Orchestrates chunk-reviewer agents against diffs. Handles input parsing, context gathering, chunking, and result synthesis.
These two premises are non-negotiable. They are forwarded to every chunk-reviewer dispatch and they govern every decision in this skill.
Post-change state — The working directory reflects the post-change state of the target ref. PR mode achieves this by checking out the PR head into a dedicated linked worktree (see Step 1). Non-PR modes (branch comparison, auto-detect) achieve this by verifying HEAD-match + clean-tree on the current working directory (also Step 1). Either way: read code freely from the working directory — the diff is the delta, the working directory is the result. Do not pretend the file system is read-only or stuck at base.
No diff-only review — A diff is a delta. The unit of review is the system the diff produces. Always trace dependencies, callers, callees, interfaces, configurations, and runtime context across files. If you cannot explain how the changed code behaves end-to-end against the surrounding system, you have not reviewed it.
These premises must be reflected in the chunk-reviewer dispatch prompt — see Step 5.
# PR (number or URL)
/code-review pr 123
/code-review pr https://github.com/org/repo/pull/123
# Branch comparison
/code-review main feature/auth
# Auto-detect (current branch vs origin/main or origin/master)
/code-review
| Action | YOU Do | DELEGATE | |--------|--------|----------| | Requirements 3-question gate | Yes | - | | Diff range determination & git | Yes | - | | Evidence Verification (build/test/lint) | Yes | - | | Chunking decision | Yes | - | | Walkthrough/critique synthesis | Yes | - | | Walkthrough context enrichment | Yes (read code directly) | explore (fallback for broad architectural scope) | | Individual candidate verification (Phase 2) | NEVER | verifier subagent (one per candidate) | | Individual chunk review | NEVER | chunk-reviewer | | Code modification | NEVER | (forbidden entirely) |
RULE: Orchestration, synthesis, and decisions = Do directly. Per-candidate verification = DELEGATE to verifier subagents (one per candidate). Multi-angle finding = DELEGATE to chunk-reviewer. Code modification = FORBIDDEN. You still read code directly for walkthrough enrichment (Phase 1a).
digraph role_separation {
rankdir=LR;
"Orchestrator" [shape=box];
"chunk-reviewer" [shape=box];
"verifier" [shape=box];
"Codebase" [shape=cylinder];
"Orchestrator" -> "chunk-reviewer" [label="dispatch with {DIFF_COMMAND}"];
"chunk-reviewer" -> "Orchestrator" [label="candidate findings"];
"Orchestrator" -> "verifier" [label="dispatch one per candidate"];
"verifier" -> "Codebase" [label="Read/Grep to verify"];
"verifier" -> "Orchestrator" [label="verdict + enriched finding"];
"Orchestrator" -> "Codebase" [label="Read for walkthrough enrichment"];
}
Your role as orchestrator:
NOT your role:
git diff command (chunk-reviewers and verifiers execute the diff)Allowed in orchestrator context:
git diff {range} --stat outputgit diff {range} --name-only outputgit log {range} --oneline outputForbidden in orchestrator context:
git diff command (chunk-reviewers and verifiers execute the diff, not you)Context management for code reading:
RULE: You read code to enrich the walkthrough (Phase 1a). Per-candidate verification is delegated to verifier subagents. You CANNOT run the raw diff command or modify files.
Intent acquisition is non-negotiable. Either intent is confirmed (from artifacts, interview, or both), or the user explicitly defers to a code-quality-only review. There is no third option — proceeding without intent and without explicit deferral is forbidden. Reviewing without intent produces wrong severities, missed scope creep, and false positives born from misunderstanding the author's goal.
PR descriptions, commits, and comments routinely link to richer context (issue trackers, design docs, chat threads). Follow every link recursively — a linked ticket may itself link to a doc which links to a discussion thread; keep following until the trail ends.
Do not name specific tools. Use whatever fetch capability the environment provides for each link type. If a link cannot be fetched directly (no credential, no MCP for that platform, network unreachable), do not skip it — mark it for the user interview step.
Sources to consult per input mode:
| Input mode | Sources |
|------------|---------|
| PR | gh pr view --json title,body,labels,comments,reviews, gh pr view --comments, linked issues, gh issue view <n>, every external link found in the chain, commit messages on the PR branch |
| Branch comparison | Commit messages, branch name conventions, any linked tickets discovered in commits, related issues |
| Auto-detect | Recent commit messages on HEAD, any linked tickets found there |
After exhausting fetchable sources, ask the user about:
DO NOT interview the user about codebase facts (file locations, patterns, architecture, who calls what). Use Read/Grep/Glob and the explore agent for those — they are reachable from the working directory.
Before exiting Step 0, the state must be one of:
| State | Action | |-------|--------| | Intent confirmed — author's goal, approach, and constraints are understood from artifacts and/or interview | Proceed to Step 1 | | User explicit deferral — user says "skip", "그냥 리뷰해줘", "없어", "code quality only", or unambiguous equivalent | Set {REQUIREMENTS} = "N/A — code-quality-only review (user deferred)" and proceed | | Neither — artifacts thin and user not yet asked, OR user gave vague answers without explicit deferral | BLOCK. Do not proceed. Continue interview until one of the two states above is reached. |
There is no "I tried hard enough, just review" path. The block IS the safety mechanism.
When the user gives a vague answer that is not an explicit deferral, refine ONCE with a specific follow-up:
각 follow-up 메시지는 deferral 옵션을 함께 안내하여 사용자가 "skip / 그냥 리뷰해줘 / 없어" 어휘를 몰라도 escape 가능하도록 한다.
| User says | Follow-up | |-----------|-----------| | "대충 있어" / "뭐 좀 있긴 한데" | "어디서 찾을 수 있나요? 링크나 문서 위치를 알려주세요. (답하기 어려우면 'skip'으로 코드 품질만 리뷰 가능)" | | "그냥 성능 개선이야" | "어떤 지표를 개선하려 했나요? (latency, throughput, memory 등 — 답하기 어려우면 'skip'으로 코드 품질만 리뷰 가능)" | | "여러 가지 고쳤어" | "가장 중요한 1-2개만 알려주세요. 나머지는 코드에서 식별하겠습니다. (답하기 어려우면 'skip'으로 코드 품질만 리뷰 가능)" |
If refinement still yields a vague answer, surface the block explicitly to the user:
"의도를 명확히 잡기 어렵습니다. 둘 중 하나를 선택해주세요: (1) [구체적 질문]에 답하여 의도 확정, 또는 (2) 'skip / 코드 품질만 리뷰' 명시적 deferral. 둘 중 하나를 명시하기 전까지 리뷰는 시작하지 않습니다."
This is not adversarial — it is refusing to silently produce a worse review.
| Situation | Method | |-----------|--------| | 2-4 structured choices (review scope, focus areas) | AskUserQuestion tool | | Free-form / subjective (intent, alternatives, constraints, concerns) | Plain text question |
One question per message. Never bundle. Wait for the answer before the next question.
Question quality — every question must include either a specific anchor (a summary the user can correct) or a default action in parentheses (so progress is possible without an answer):
| BAD | GOOD | |-----|------| | "요구사항이 있나요?" | "PR 본문과 연결된 이슈에서 [요약]을 추출했습니다. 보완할 부분이 있나요?" | | "어떤 부분을 볼까요?" | "23개 파일이 변경됐습니다. 집중할 영역이 있나요? (없으면 전체 리뷰)" |
Include project context when interpolating the chunk-reviewer prompt template in Step 5. Describe what kind of software this is, who uses it, how it runs, and what depends on it — based on CLAUDE.md, README.md, and the artifacts gathered above.
If available context is insufficient to characterize the project, ask the user once: "What kind of software is this? (e.g., personal CLI tool, internal team service, public-facing API, shared library, etc.)"
Proceed to Step 1 only when the Intent Block Gate state is Intent confirmed or User explicit deferral. Any other state → continue at Step 0.
Determine range and setup for subsequent steps:
| Input | Setup | Range |
|-------|-------|-------|
| pr <number or URL> | Fetch and check out PR ref into the worktree (see below) | origin/<baseRefName>...pr-<number> |
| <base> <target> | Verify HEAD is <target> via git rev-parse --abbrev-ref HEAD; verify clean tree via git status --porcelain -uno. Abort if mismatch or dirty. | <base>...<target> |
| (none) | Detect default branch (origin/main or origin/master). Verify HEAD is the target branch via git rev-parse --abbrev-ref HEAD; verify clean tree via git status --porcelain -uno. Abort if mismatch or dirty. | <default>...HEAD |
This skill assumes the orchestrator is already running inside a worktree dedicated to this review (the caller is responsible for creating the worktree). Therefore: fetch the PR ref AND check it out. The working directory must reflect the post-change state of the PR so that all subsequent code reading (Phase 1a, Phase 2 verification, chunk-reviewer Step 2) sees the actual code under review.
PR ID 추출 규칙: 사용자가 URL(https://github.com/<org>/<repo>/pull/<N>) 형식으로 호출하면, 아래 bash로 진입하기 전에 trailing path segment에서 numeric <N>을 추출해 <number> 자리에 substitute하라. URL을 그대로 substitute하면 git fetch origin pull/<URL>/head가 invalid refspec으로 실패하고 git checkout -B pr-<URL>이 invalid 브랜치명으로 실패한다.
set -euo pipefail
# 0. Safety guards (Premise 1 enforcement) — abort BEFORE any state change
# -uno: untracked files are preserved by checkout; only check tracked modifications
if [ -n "$(git status --porcelain -uno)" ]; then
echo "Error: working directory has uncommitted changes — refusing to checkout over the user's work" >&2
exit 1
fi
# Distinguish primary repo from linked worktree.
# In a linked worktree, --git-dir points inside .git/worktrees/<wt>,
# while --git-common-dir points to the shared .git directory; they differ.
# In a primary clone they are equal — refuse so we never checkout over the user's main work tree.
if [ "$(git rev-parse --git-dir 2>/dev/null)" = "$(git rev-parse --git-common-dir 2>/dev/null)" ]; then
echo "Error: refusing to run in primary repo — create a dedicated linked worktree first (Premise 1). Hint: 'git worktree add ../review-pr-<N> -b review/pr-<N>'" >&2
exit 1
fi
# 1. Get base branch name (abort if gh fails or returns empty)
BASE_REF=$(gh pr view <number> --json baseRefName --jq '.baseRefName')
if [ -z "$BASE_REF" ]; then
echo "Error: gh pr view returned empty baseRefName — aborting before any fetch" >&2
exit 1
fi
# 2. Fetch base branch first, then PR ref last so FETCH_HEAD points to PR head
# (rerun-safe — force-push on the PR is picked up on re-review)
git fetch origin "${BASE_REF}"
git fetch origin pull/<number>/head
# 3. Reset local pr-<N> to the freshly fetched PR head (FETCH_HEAD) and check it out
# so the working directory matches the PR state
git checkout -B pr-<number> FETCH_HEAD
If the working directory is dirty (uncommitted changes) or the caller is not in a worktree, abort and report — do not silently checkout over the user's work. The worktree premise is the safety net; without it, the safety net is gone.
All range formats use three-dot syntax (A...B), which is equivalent to git diff $(git merge-base A B)..B. This shows only changes introduced by the target since the common ancestor — not changes on the base branch. This prevents false positives when origin/main has moved ahead after branching.
All subsequent steps use {range} from this table. All diff commands use git diff {range} -- <files> for path-filtered output. After checkout, code reading via Read/Grep/Glob reflects the post-change state, which is the intended behavior — diff shows the delta, the working directory shows the result.
After Input Parsing, before proceeding to Step 2:
git diff {range} --stat (using the range determined in Step 1)Collect in parallel (using {range} from Step 1):
git diff {range} --stat (change scale)git diff {range} --name-only (file list)git log {range} --oneline (commit history)Run build, test, and lint checks BEFORE dispatching any chunk-reviewer agents. This is a fail-fast gate — a failing check aborts the review immediately.
Do NOT assume commands. Discover per-project commands in this order:
~/.omt/{project}/project-commands.md — derive {project} with: basename -s .git $(git remote get-url origin 2>/dev/null), fallback: basename $(git rev-parse --show-toplevel 2>/dev/null || pwd)CLAUDE.md, AGENTS.md, README.md, CONTRIBUTING.mdpackage.json scripts, build.gradle / build.gradle.kts tasks, Makefile targets, pyproject.toml, Cargo.tomlDiscovery is per-command and independent — if build is found but lint is not, run build and test; skip only the undiscovered command. If no commands are discovered at all, skip this step with the unavailable message (see Output Format below).
Default timeout per command: 120 seconds.
Run in sequence — stop immediately on first failure:
Any failure → do NOT dispatch chunk-reviewer agents. Report failure and exit.
Produce a two-part structured table after all checks complete.
Part 1 — Automated Checks
| Check | Status | Details | |-------|--------|---------| | Build | PASS / FAIL | Success: one-line summary. Failure: last 30 lines of output | | Tests | PASS (N passed) / FAIL (N/M failed) | Success: pass count. Failure: last 30 lines of output | | Lint | PASS / FAIL | Success: one-line summary or "No errors". Failure: last 30 lines of output |
Part 2 — Test Coverage Mapping
Identify production source files from the Step 2 file list: source code files (exclude non-code files such as config, documentation, build scripts, migrations, Markdown, YAML, JSON, images, etc.) that do NOT match test glob patterns (*Test*, *Spec*, *_test*, test_*, *.test.*, *.spec.*, *_spec*).
For each production source file, find its corresponding test file:
| Production Source File | Test File | Coverage Status |
|------------------------|-----------|-----------------|
| path/to/File.kt | path/to/FileTest.kt | In diff / Exists, not in diff / No test found |
Coverage Status values:
If more than 30 production source files are in the diff, group by directory instead of listing per file.
Unavailable message (no commands discovered): "Evidence verification unavailable — no build/test/lint commands discovered"
If any check fails:
Determine scale from --stat summary line (N files changed, X insertions(+), Y deletions(-)):
| Condition | Strategy | |-----------|----------| | Total changed lines (insertions + deletions) < 1500 AND changed files < 30 | Single review | | Total changed lines >= 1500 OR changed files >= 30 | Group into chunks by directory/module affinity |
Chunking heuristic: group files sharing a directory prefix or import relationships.
Per-chunk size guide:
For each chunk, construct the diff command string using git's native path filtering:
git diff {range} -- <file1> <file2> ... <fileN>
The orchestrator constructs this command string but does NOT execute it. The command is passed to the chunk-reviewer via {DIFF_COMMAND}, and each reviewer CLI executes it independently.
${CLAUDE_SKILL_DIR}/../orchestrate-review/scripts/chunk-reviewer-prompt.mdgit diff {range} (single chunk) or git diff {range} -- <chunk-files> (multi-chunk). Orchestrator constructs this string but does NOT execute it.chunk-reviewer agent(s) via Task tool (subagent_type: "chunk-reviewer") with interpolated promptDispatch rules:
| Scale | Action | |-------|--------| | Single chunk | 1 agent call | | Multiple chunks | Parallel dispatch -- all chunks in ONE response. Each chunk gets its own interpolated template with chunk-specific {DIFF_COMMAND} and {FILE_LIST} |
Each chunk-reviewer scans its whole assigned diff through every angle and reports coverage per angle (the Angle Coverage block), not per file. A file that produced no candidates is clean, not omitted — never treat an unmentioned file as a coverage gap.
Re-dispatch a chunk-reviewer for its chunk only when its response signals an infrastructure failure: a "Partial review"/"Limited review" degradation notice, an Angle Coverage entry marked Unavailable, or a reported diff-command failure.
Cap: maximum 1 re-dispatch per original chunk; if the re-dispatch also fails, accept partial coverage.
After all re-dispatches complete, merge all chunk results (original + re-dispatched) before proceeding to Step 6.
After all chunk-reviewers return, produce the final output in three phases: the walkthrough (Phase 1), per-candidate verification via a verifier fan-out (Phase 2), and findings synthesis (Phase 3).
Orchestrator directly produces the Walkthrough from:
git diff {range} --stat / --name-only and git log {range} (Step 2) — the shape, scope, and stated intent of the changeExecution order: First evaluate Phase 1a (context enrichment). Then generate the sections below.
After reviewing the diff shape (Step 2 stat/name-only/log) and the merged candidates, assess whether the available information is sufficient to write the Walkthrough. If gaps exist, read the relevant code directly before writing Phase 1 output.
When to read code for enrichment:
| Gap | What to Read | Example | |-----|-------------|---------| | Cross-module relationships unclear | Interfaces, base classes, caller/callee code | "Chunk A shows OrderService calling PaymentGateway — read the gateway interface and its implementations" | | Architecture hierarchy unknown | Class hierarchy, module structure | "New class extends BaseRepository — read the base class to understand the contract" | | Inconsistent patterns across chunks | Both implementations + project conventions | "Chunk A uses Result<T>, Chunk B uses exceptions — read both to determine project convention" |
When NOT to enrich:
| Condition | Reason | |-----------|--------| | Simple changes (test-only, doc-only, config-only, single-function logic) | The diff stat and commit history are self-sufficient | | The diff shape and a quick read of changed files give a complete picture | No gaps to fill | | Trivial diff (< 5 files, < 100 lines) | Sparse analysis is expected, not a gap |
Fallback: For very broad architectural exploration (10+ files across many modules), dispatch an explore agent instead of reading everything directly. This is the exception, not the default.
Data flow:
Phase 1: Read the diff shape (stat/name-only/log) + the merged candidates
→ Assess: sufficient for Walkthrough?
→ If gaps: Read the relevant changed files directly (Read/Grep/Glob)
→ Write Walkthrough using: diff shape + Step 2 metadata + code reading results
Finders surface candidates; they do not judge them — and you do not judge them in your own context either. You dispatch one verifier subagent per candidate. Each verifier reads the actual code in isolation and returns exactly one verdict. Per-candidate isolation is deliberate: it keeps each judgment free of the other candidates' framing (no anchoring) and gives each verifier a clean context to read deeply. This is the precision gate — finders ran wide for recall; verification narrows to what is real. There is no severity score and no P-level; a candidate is real or it is not, at a stated confidence.
Dispatch:
found by angles and keeping the most concrete failure scenario). Verifying duplicates wastes verifier slots.references/verifier-prompt.md — the per-candidate verifier contract.general-purpose subagent per candidate via the Task tool (subagent_type: "general-purpose") with the interpolated prompt. All candidates in ONE response — parallel, foreground.Cap & batching: verify at most 25 candidates per batch. If more survive dedup, batch by file proximity, correctness candidates first, and state how many were deferred — never silently drop.
The verdict ladder, the verification method, and the read-only constraint all live in references/verifier-prompt.md — that is the contract each verifier applies. Keep it as the single source; do not restate the ladder here.
Enrichment arrives with the verdict. Each kept verifier returns the enriched finding (current code, what's wrong, failure scenario, fix, blast radius) because it had to read the code to decide. Phase 3 assembles these directly — there is no separate enrichment pass.
This is a report. You surface verified findings, ranked by what matters most. You do NOT decide whether to merge and you do NOT decide whether to fix — that is the reader's call. Removing the merge verdict is deliberate: a reviewer that labels findings and shows the failure path is more trusted than one that issues a pass/fail an author then argues with.
[Pre-existing] and listed under Out of Scope — unless the change aggravates it (increases blast radius or frequency), in which case it stays in the main list.| Situation | Handling |
|-----------|----------|
| Finding references a deleted file | Read the file at base branch (git show {base}:{file}). Note "(deleted file)" in Context. |
| Finding spans multiple files | Primary file gets the code snippet. Other files listed in Blast Radius with brief context. |
| Fix cannot be expressed as simple diff | State design direction + "Concrete diff not possible — structural change required". |
| Zero findings after verification | Skip enrichment. Report a clean review (Walkthrough + "No findings survived verification."). |
| 50+ candidates requiring verification | Dispatch verifiers in batches per Phase 2 (≤25), correctness candidates first. |
## Walkthrough
### Change Summary
[Generated in Phase 1]
### Core Logic Analysis
[Generated in Phase 1]
### Architecture Diagram
[Generated in Phase 1 — Mermaid or "No structural changes — existing architecture preserved"]
### Sequence Diagram
[Generated in Phase 1 — Mermaid or "No call flow changes"]
---
## Findings
[Ranked: correctness CONFIRMED → correctness PLAUSIBLE → cleanup CONFIRMED → cleanup PLAUSIBLE.
If nothing survived verification: "No findings survived verification." and stop here.]
### Correctness
[Findings where the change behaves wrong. Omit the section if none.]
### Cleanup
[Findings where behavior is correct but quality is low. Omit the section if none.]
Per-finding format (enriched during Phase 2 verification):
**[{CONFIRMED|PLAUSIBLE}] {finding title}**
- **Location**: `{file}:{line}` — {section/function name}
- **Current Code**:
```{lang}
[5-15 lines centered on the issue]
[concrete diff or, if structural, a design direction]
Fallback (verifier failed): If a verifier subagent errored or timed out for a candidate, do NOT assign a verdict — an unverified candidate is not a finding. List it under the ## Unverified (verifier unavailable) section using the finder's text, and state how many candidates went unverified. These are coverage gaps the reader must check manually, not confirmed or plausible findings.
[Findings on unchanged context lines, verdict-labeled. These do not gate anything — they are noted, not blocking.]
[Only if one or more verifiers errored or timed out. Candidates listed with the finder's text and a count, explicitly NOT verdict-labeled — coverage gaps, not findings. Omit the section entirely when every candidate was verified.]
[Optional, non-finding suggestions for the author. Omit if none.]
There is no Assessment / "Ready to merge" section. This review reports; it does not gate.
### Example Final Output
See `references/output-example.md` for a complete example synthesized from 3 chunks (Order API + domain, Payment integration, Inventory + messaging). Read it when producing your first review output to understand the expected format, level of detail, and enrichment style.
tools
Use at the end of a work session to review the WHOLE session and record entities worth pinning. This is the manual, deliberate complete-sweep review — NOT an automated nudge. Triggers on "wrap up", "wrap-up", "session wrap", "end of session", "what should I pin".
documentation
Use when initializing the pins knowledge graph for the first time in a project. Guides the user through creating pins.yaml (the storage manifest). Triggers on "setup pins", "initialize pins", "create pins.yaml", "first-run pins".
testing
Use when you need to record a single pin entity to the knowledge graph. Invokes lib/pins record() to validate and write a canonical .md file. Triggers on "record pin", "pin this", "save this as a pin".
databases
Use when looking up pins by type, tags, or source. Drives lib/pins/query.ts to retrieve matching pin entries from the knowledge graph. Supersedes the legacy manual ls+frontmatter procedure.