skills/review-pr-copilot/SKILL.md
Address GitHub Copilot review comments on the active PR by triaging into confidence tiers, fixing in atomic commits, resolving threads, and re-requesting review.
Output "Read Review PR Copilot skill." to chat to acknowledge you read this file.
Use whenever Copilot has left review comments on a pull request and the user wants to address them. Trigger phrases: "address the review comments", "address review", "address copilot review", "fix the PR comments", "clean up review feedback", "address PR feedback", "re-request review". Also trigger proactively when the active PR has unresolved Copilot review threads and the user asks to commit or ship.
Never ask the user to paste comment bodies into chat. Fetch them yourself per step 1 — that's the whole point of the skill. The only thing you may ask the user for is owner/repo#number if no active PR is detected.
This skill is a thin orchestrator. It does not reimplement comment fetching or commit logic — those live in:
- atomic-commits skill: branch hygiene, conventional commits, ship mode
- mcp_github_pull_request_read, mcp_github_add_reply_to_pull_request_comment, mcp_github_add_issue_comment, mcp_github_pull_request_review_write (method resolve_thread), mcp_github_request_copilot_review, mcp_github_issue_write (method create_issue, used by the HITL-deferrable flow in step 5b). Any of these may be deferred and require tool_search to load; see the tool-discovery contract in step 5b before calling.
- github-pull-request_currentActivePullRequest (required for thread node IDs)

These are hard dependencies, with one nuance:
- resolve_thread). If unavailable, the skill may run in degraded mode via raw MCP after the user supplies owner/repo#number, but it must warn the user that thread resolution will not happen and that only acknowledgment replies will be posted.

Do not attempt to substitute raw gh CLI commands (e.g. gh pr view) or git plumbing for thread resolution: the resolve-thread and request-review flows use GraphQL node IDs that require gh api graphql or the VS Code PR extension. When the PR extension is unavailable or stale, gh api graphql is the sanctioned fallback for fetching thread node IDs (see the compliance-audit Phase 2b query). Plain gh pr subcommands do not expose these IDs.
All checks below are PR-scoped, so begin step 0 by fetching the PR context: call github-pull-request_currentActivePullRequest (or, in degraded mode, prompt the user for owner/repo#number and use mcp_github_pull_request_read). Once you have a PR number, verify the run is worth starting:
- (cap=<cap> overridden by user on round <R_override>) (where <cap> is the current cap value, e.g. 3, and <R_override> is the round number when the user authorized the override). This makes it auditable from the PR comment thread that the cap was exceeded by consent, not by drift. PR #50 dogfooded this: round 5 posted only a bare round count with no override marker, so a reviewer landing on the PR could not tell whether the cap had been breached or whether the cap simply didn't exist.
- mcp_github_pull_request_read (method get_status_checks) on the PR, or read currentActivePullRequest.statusCheckRollup. If checks are failing, ask the user before proceeding; fixing review nits while CI is red wastes a re-review cycle.
- PENDING state (not yet submitted), stop. Re-running this skill will produce no comments and waste a request_copilot_review call.

If any check fails, surface the reason and ask before continuing. Step 1 then completes the rest of PR identification (filtering reviews to Copilot, building the commentId → threadId map).
Use github-pull-request_currentActivePullRequest (PR extension, not raw MCP) to detect the active PR — this tool returns thread node IDs in reviewThreads[].id which are required for resolving threads in step 5. The raw MCP pull_request_read with get_review_comments returns comment metadata but not GraphQL thread IDs.
If currentActivePullRequest is unavailable or returns no PR, ask the user for owner/repo#number, then fall back to mcp_github_pull_request_read (method get_review_comments) — but warn the user that step 5 will only post acknowledgments and cannot programmatically resolve threads in this mode.
Filter reviews to Copilot:
- user.type == "Bot": canonical signal
- user.login matches one of copilot-pull-request-reviewer[bot], copilot-swe-agent, github-copilot[bot], or contains copilot
- user.type == "Bot" first and the login pattern only to disambiguate from other bots

Skip threads where isResolved == true or isOutdated == true. Build a map of commentId → threadId from the response so step 5 can resolve the right thread per fix.
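The filter and the map can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: the dictionary shape (a `comments` list on each thread, a numeric comment `id`) is an assumption about the tool response, and the sketch assumes `user.type` must be `"Bot"` before the login is consulted.

```python
from typing import Any

# Logins listed above; the substring check acts as the catch-all.
COPILOT_LOGINS = {
    "copilot-pull-request-reviewer[bot]",
    "copilot-swe-agent",
    "github-copilot[bot]",
}


def is_copilot(user: dict[str, Any]) -> bool:
    """Check the bot type first, then use the login to rule out other bots."""
    if user.get("type") != "Bot":
        return False
    login = user.get("login", "").lower()
    return login in COPILOT_LOGINS or "copilot" in login


def open_copilot_threads(threads: list[dict[str, Any]]) -> dict[int, str]:
    """Return a commentId -> threadId map for unresolved, non-outdated Copilot threads."""
    comment_to_thread: dict[int, str] = {}
    for thread in threads:
        # Resolved or outdated threads are skipped entirely.
        if thread.get("isResolved") or thread.get("isOutdated"):
            continue
        for comment in thread.get("comments", []):  # "comments" is an assumed field name
            if is_copilot(comment.get("user", {})):
                comment_to_thread[comment["id"]] = thread["id"]
    return comment_to_thread
```

The returned map is what a later step would consult to find the thread to resolve for a given fixed comment.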
Score every comment 0–100 using observable signals, not vibes:
Positive signals (add):
- +20 Comment is specific: cites exact line and exact change
- +25 Fix is mechanical: rename, add guard, add type, fix typo, add missing await
- +15 Touches ≤1 file and ≤10 lines
- +15 Touched code has test coverage
- +10 No public API / exported type signature change
- +15 Copilot quoted the existing code or proposed a concrete replacement

Negative signals (subtract):

- -20 Touches a shared util, type, schema, or hook used in 3+ places
- -25 Vague language: "consider", "might want to", "could", "perhaps", "in some cases"
- -15 Cross-file or cross-module change
- -20 Modifies test assertions or fixtures (risk: masking the bug)
- -15 Changes error-handling semantics (swallow ↔ throw, sync ↔ async)
- -10 File changed since the comment was posted (stale context)

Start at 50, apply signals, clamp 0–100.
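The arithmetic above is simple enough to sketch directly. The weights are taken from the two signal lists; the short dictionary keys are hypothetical labels chosen for this example only.

```python
BASE_SCORE = 50

# Weights from the signal lists above; the key names are illustrative labels.
SIGNAL_WEIGHTS = {
    "specific": 20,
    "mechanical": 25,
    "small_change": 15,
    "test_coverage": 15,
    "no_public_api_change": 10,
    "quoted_or_concrete": 15,
    "shared_code": -20,
    "vague_language": -25,
    "cross_file": -15,
    "modifies_tests": -20,
    "error_semantics": -15,
    "stale_context": -10,
}


def score_comment(applied_signals: list[str]) -> int:
    """Start at 50, add each applied signal's weight, then clamp to 0..100."""
    raw = BASE_SCORE + sum(SIGNAL_WEIGHTS[name] for name in applied_signals)
    return max(0, min(100, raw))
```

For instance, a mechanical, specific, small change gives a raw total of 110, which clamps to 100, while a vague comment about shared code gives 50 - 25 - 20 = 5.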
Tiers + policy:
| Tier | Score | Action |
|---|---|---|
| Auto | ≥ 75 | Fix, commit, resolve thread. No prompt. Reported in final summary. |
| Confirm | 40–74 | Show diff preview + one-line approval prompt per comment before commit. |
| HITL | < 40 | Do not fix in this PR. Tier into HITL-deferrable (file an issue, resolve thread) or HITL-blocking (leave open) per step 5b. |
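The thresholds in the table map to a small lookup. This is a sketch of the default policy only; it does not model session overrides such as "be conservative".

```python
def tier_for(score: int) -> str:
    """Map a clamped 0..100 score to its tier using the default thresholds."""
    if score >= 75:
        return "Auto"
    if score >= 40:
        return "Confirm"
    return "HITL"
```

The boundaries are inclusive on the lower edge of each band, so 75 is Auto and 40 is Confirm.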
HITL is for subjective fixes, not large ones. The tier is decided by ambiguity, not by effort:
- foo to bar across 5 files": that's Confirm-tier, large but mechanical

If a comment has a clear approach but you don't want to do it now, the answer is the HITL-deferrable flow in step 5b (file an issue). Do not push it into HITL-blocking just to skip the work.
Forced-Confirm keywords. If the comment uses any of the following, the floor is Confirm tier regardless of arithmetic — these signal a behavior or contract change that needs explicit approval before committing, even if the change looks mechanical:
- refactor: / "refactor this"

PR #50 commit c6c4bed autofixed an "align error semantics" ask without a Confirm prompt; it was a behavior change masked as a refactor. Auto tier was wrong; the keyword should have forced Confirm.
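One way to read the forced-Confirm rule is as a demotion applied after scoring. The sketch below rests on two assumptions: "floor" means Auto drops to Confirm while Confirm and HITL stay as they are, and matching is a case-insensitive substring test. The keyword tuple contains only the two entries visible in the list above and is passed as a parameter so the full list can be supplied.

```python
# Only the keywords visible in the list above; extend with the full list as needed.
DEFAULT_KEYWORDS = ("refactor:", "refactor this")


def apply_forced_confirm(
    tier: str,
    comment_body: str,
    keywords: tuple[str, ...] = DEFAULT_KEYWORDS,
) -> str:
    """Demote Auto to Confirm when the comment contains a forced-Confirm keyword."""
    body = comment_body.lower()
    has_keyword = any(keyword.lower() in body for keyword in keywords)
    if tier == "Auto" and has_keyword:
        return "Confirm"
    # Confirm and HITL already require a human decision, so they are unchanged.
    return tier
```

Under this reading the PR #50 case would have been caught only if its wording matched a listed keyword, which is why the keyword list matters more than the arithmetic here.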
Show your work. For every comment, print the signal arithmetic before the score — never just declare a number. List every applicable signal you considered; if no signals apply on one side, say so explicitly (e.g. no negative signals applied). If you cannot explain the arithmetic at all, you are vibing; stop and re-read the comment. Do not invent signals just to show one on each side.
Print the triage table before any action, with the math visible:
PR #<N> — <X> open Copilot comments
Auto (≥75):
1. src/auth/token.ts:42 [50 +25 mechanical +20 specific +15 ≤10 lines −10 stale = 100] add null guard on user
2. src/api/fetch.ts:17 [50 +25 mechanical +20 specific +15 ≤10 lines = 110 → clamped to 100] fix typo in error message
Confirm (40–74):
3. src/utils/parse.ts:103 [50 +20 specific −10 stale +15 ≤10 lines −15 cross-file = 60] extract repeated regex
HITL (<40):
4. src/store/index.ts:1 [50 −25 vague −20 shared util = 5] "consider refactoring this module"
Failure mode caught in dogfooding (PR #68): every comment reported as "100" with no arithmetic. If your output looks like that, the scoring step was skipped — restart from this section.
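Generating the bracketed arithmetic from the applied signals, rather than typing it by hand, makes the "100 with no arithmetic" failure harder to produce. The helper below is a hypothetical sketch; it prints the raw total and, when clamping applies, the clamped value as well.

```python
MINUS = "\u2212"  # the minus sign used in the triage table


def format_arithmetic(signals: list[tuple[str, int]], base: int = 50) -> str:
    """Render '[50 +25 mechanical ... = total]' from (label, weight) pairs."""
    parts = [str(base)]
    for label, weight in signals:
        sign = "+" if weight >= 0 else MINUS
        parts.append(f"{sign}{abs(weight)} {label}")
    raw = base + sum(weight for _, weight in signals)
    clamped = max(0, min(100, raw))
    # Show the raw total so a clamped score is still auditable.
    total = str(raw) if raw == clamped else f"{raw} → clamped to {clamped}"
    return "[" + " ".join(parts) + f" = {total}]"
```

Because the total is computed from the same pairs that are printed, the displayed math and the score cannot drift apart.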
User can override the policy for the session: "auto everything", "confirm everything", "be conservative" (raise thresholds), or list specific comment numbers to re-tier.
Group Auto + Confirm comments into atomic slices. HITL comments are excluded from slicing — they get thread replies instead.
Slicing rule (resolves the atomic-vs-bundle question dogfood surfaced):
- src/auth/*). One commit per slice.
- SKILL.md and all correct factual errors → one commit. If they touch different concerns (one fixes a typo, one rewrites a section) → split.

Surface the plan as a table:
| Slice | Files | Commit message |
|---|---|---|
| 1 | src/auth/token.ts | fix(auth): handle null user in token refresh |
| 2 | src/api/*.ts | fix(api): propagate errors instead of swallowing |
Approval gates per tier (no global gate — the triage table in step 2 is the only session-wide checkpoint):
For each slice:
Commit body format:
<type>(<scope>): <description>
Addresses Copilot review on PR #<N>:
- <file>:<line> — "<short quote of the comment>"
As you commit each slice, track which threads were fully addressed so Step 6 can reply on and resolve them in bulk. Do not reply or resolve threads during this step — that happens in Step 6 after all fixes are pushed.
If a fix only partially addresses a thread, note it separately — Step 6 will post the acknowledgment but leave the thread open.
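The commit body format can be assembled mechanically from the slice's comments. This is a sketch with hypothetical parameter names; the blank line between the subject and the body follows the usual git convention and is an assumption, since the format above does not show it.

```python
DASH = "\u2014"  # the dash that separates location and quote in the format above


def build_commit_message(
    commit_type: str,
    scope: str,
    description: str,
    pr_number: int,
    comments: list[tuple[str, int, str]],
) -> str:
    """Build the subject line plus one bullet per addressed comment (path, line, quote)."""
    lines = [
        f"{commit_type}({scope}): {description}",
        "",  # assumed blank line between subject and body
        f"Addresses Copilot review on PR #{pr_number}:",
    ]
    for path, line_number, quote in comments:
        lines.append(f'- {path}:{line_number} {DASH} "{quote}"')
    return "\n".join(lines)
```

A slice that fixes two comments in the same area would pass both tuples and produce a single commit with two bullets.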
HITL has two sub-tiers. Pick one before replying:
If you're tempted to call something HITL-deferrable just because it would be a lot of work, re-read step 2 — that's a Confirm-tier ask, not HITL. HITL exists for ambiguity, not effort.
Show your work. Print the signal arithmetic for HITL just like Auto/Confirm — never just declare a confidence number.
Create a GitHub issue via mcp_github_issue_write (method create_issue). If the tool is not loaded, run tool_search for "create github issue" first; if no issue-creation tool is available, stop and surface to the user — the HITL-deferrable flow requires a durable issue, so falling back to a bare PR-thread reply would re-introduce the PR #50 round-5 stranded-thread failure mode. Do not silently downgrade to HITL-blocking.
Title: <scope>: <one-line summary> (from PR #<N> Copilot review)
Labels: best-effort — first use tool_search to find an available GitHub label-lookup tool (e.g. get_label / list_labels); if one is loaded, look up copilot-review and hitl-deferred and include only labels that already exist. If no label tool is available or the labels are missing, omit labels entirely — do not create labels without authorization, and do not let label resolution block issue creation.
Body:
**Parent PR:** #<N>
**Source comment:** <html_url of the PR comment>
**File:** `<path>:<line>`
**Confidence:** <score> = <signal arithmetic>
## Interpretation
<one sentence>
## Proposed approach
- <bullet>
- <bullet>
## Blockers / questions
- <what makes this non-trivial>
## Context for shft
Files to read:
- `<path>` — <why>
Acceptance criteria:
- [ ] <testable outcome>
Feedback loops:
- `<command>`
Post the thread reply via mcp_github_add_reply_to_pull_request_comment:
Filed as #<issue-number> for follow-up — not blocking this PR.
Confidence: <score> = <arithmetic>
Interpretation: <one sentence>
Proposed approach:
- <bullet>
- <bullet>
Resolve the thread via mcp_github_pull_request_review_write (method resolve_thread). The work is now tracked in the issue — the reviewer can either accept the deferral or comment on the issue to challenge it. Degraded mode — if thread IDs are not available (e.g. the VS Code PR extension is not loaded and gh api graphql is not installed), skip resolution, note "thread not auto-resolved (degraded mode)" in the reply and the summary, and continue. The issue is the durable artifact; resolution is best-effort.
Post a reply on the thread with this shape and do not resolve, do not file an issue, do not commit a speculative fix:
Flagging for human review.
Confidence: <score> = <arithmetic>
My interpretation: <one sentence>
Why this is HITL-blocking (not deferrable): <what makes the approach itself ambiguous>
Reply with guidance and I'll address in a follow-up commit.
Failure modes caught in dogfooding:
- (confidence: 10) with no arithmetic, then deferred a normal test-coverage ask into HITL using effort-based reasoning ("bundling risks another re-review round"). Both wrong: the score must show signals, and "this is a lot of work" is not a HITL signal. File an issue and move on.

After the final slice, complete all four of the following. None are optional. Do not consider the round complete until every item is verified.
Hand off to atomic-commits (Ship mode) for rebase + push.
For every Copilot thread addressed by an Auto or Confirm fix, post an acknowledgment reply via mcp_github_add_reply_to_pull_request_comment:
Fixed in <sha[:7]>: <one-line summary of the change>.
This is the paper trail — reviewers can see what was done without diffing commit-by-commit.
For every thread that received a "Fixed in ..." reply, resolve it via mcp_github_pull_request_review_write (method resolve_thread, threadId from step 1's map).
Verify resolution — after resolving, re-fetch review threads and confirm isResolved == true for every thread you replied to. If any thread failed to resolve (e.g. missing thread ID in degraded mode), note it in the summary.
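The verification pass reduces to a set comparison between the threads that were replied to and the threads the re-fetch reports as resolved. A minimal sketch, assuming each re-fetched thread is a dictionary with `id` and `isResolved` keys:

```python
from typing import Any


def unresolved_after_reply(
    refetched_threads: list[dict[str, Any]],
    replied_thread_ids: list[str],
) -> list[str]:
    """Return the replied-to thread IDs that are not resolved in the re-fetched data."""
    resolved_ids = {
        thread["id"] for thread in refetched_threads if thread.get("isResolved")
    }
    # A thread missing from the re-fetch is treated as unresolved.
    return [tid for tid in replied_thread_ids if tid not in resolved_ids]
```

An empty result means every replied-to thread is confirmed resolved; anything else belongs in the summary.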
- mcp_github_request_copilot_review on the PR. If it fails, surface the PR URL so the user can re-request manually.
- requested_reviewers (via gh api repos/<owner>/<repo>/pulls/<N> --jq '.requested_reviewers[].login' or equivalent MCP call) and confirm Copilot appears. If it does not, retry once; if still missing, surface to the user.

The round is not done until all four sub-steps are verified:
- isResolved == true)
- requested_reviewers

If any item is missing, complete it before moving to Step 7. Skipping any item leaves the PR in a dead state: threads without replies lose the paper trail, unresolved threads clutter the next review, and a missing re-request means Copilot never re-runs.
Failure mode caught in dogfooding (PR #75): The agent pushed fixes across 34+ commits but never posted thread replies, resolved threads, or re-requested review — all three closing actions were skipped. A follow-up session had to catch the gap and complete them manually. This happened because steps 5 and 6 were treated as "nice to have" rather than mandatory, and context compaction across a large PR caused the agent to lose track of the closing sequence.
Post the summary in two places: chat (for the user) and as a top-level PR comment via mcp_github_add_issue_comment (for the next reviewer — human or bot — who lacks chat history). Post on every round, not just the last — each round's summary gives the next reviewer the per-round paper trail. Use the same block in both:
PR #<N> — Copilot review addressed (round <R>)
Pre-flight: round <R>/<cap> | CI <green|red|pending> | pending review <yes|no>[ | (cap=<cap> overridden by user on round <R_override>)]
Triage: Auto <X> | Confirm <Y> | HITL-deferrable <Zd> | HITL-blocking <Zb>
Comments fixed: <X+Y> / <total>
Issues filed: <Zd> (HITL-deferrable)
Threads resolved: <X+Y+Zd−Zd_degraded>
Threads not auto-resolved: <Zd_degraded> (HITL-deferrable, degraded mode — issue filed but thread ID unavailable)
Threads left open: <Zb> (HITL-blocking)
Commits: <C>
Review re-requested: yes | manual | no (cap reached)
Commits:
<sha[:7]> <message>
...
Awaiting human (HITL):
- #<comment-id> <file>:<line> — <one-line summary> [confidence: <score>]
...
Skipped / deferred:
- <comment summary> — <reason>
Omit the Threads not auto-resolved line entirely when Zd_degraded == 0 (i.e. all HITL-deferrable threads were resolved normally). Only include it when degraded mode prevented thread resolution.
Skip the PR comment only if X+Y+Zd+Zb == 0 AND no pre-flight check fired — i.e. the round was a true no-op. Otherwise post, even on rounds where you only filed issues or only triaged.
The bracketed (cap=<cap> overridden by user on round <R_override>) segment in the pre-flight line is mandatory on every round after the user authorizes continuing past the cap (per §0). Omit the bracketed segment on rounds 1 through <cap>. This is the only sanctioned place to record the override — do not bury it in chat.
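The conditional parts of the summary (the override segment, the degraded-mode line, and the no-op skip rule) can be sketched as below. The class and function names are hypothetical; the formulas and line text follow the summary block above.

```python
from dataclasses import dataclass
from typing import Optional

DASH = "\u2014"  # the dash used in the degraded-mode summary line


@dataclass
class RoundCounts:
    auto: int                     # X
    confirm: int                  # Y
    hitl_deferrable: int          # Zd
    hitl_blocking: int            # Zb
    deferrable_degraded: int = 0  # Zd_degraded


def preflight_line(round_no: int, cap: int, ci: str, pending_review: bool,
                   override_round: Optional[int] = None) -> str:
    """Build the pre-flight line; the override segment appears only after an override."""
    pending = "yes" if pending_review else "no"
    line = f"Pre-flight: round {round_no}/{cap} | CI {ci} | pending review {pending}"
    if override_round is not None:
        line += f" | (cap={cap} overridden by user on round {override_round})"
    return line


def thread_lines(c: RoundCounts) -> list[str]:
    """Threads resolved = X + Y + Zd - Zd_degraded; the degraded line is omitted at zero."""
    resolved = c.auto + c.confirm + c.hitl_deferrable - c.deferrable_degraded
    lines = [f"Threads resolved: {resolved}"]
    if c.deferrable_degraded > 0:
        lines.append(
            f"Threads not auto-resolved: {c.deferrable_degraded} "
            f"(HITL-deferrable, degraded mode {DASH} issue filed but thread ID unavailable)"
        )
    lines.append(f"Threads left open: {c.hitl_blocking} (HITL-blocking)")
    return lines


def should_post_pr_comment(c: RoundCounts, preflight_fired: bool) -> bool:
    """Skip the PR comment only on a true no-op round."""
    total = c.auto + c.confirm + c.hitl_deferrable + c.hitl_blocking
    return total > 0 or preflight_fired
```

Keeping the override segment behind a single optional argument makes it hard to forget on later rounds: once the override round is recorded, every subsequent pre-flight line carries it.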
Failure mode caught in dogfooding (PR #50): ran 5 rounds, only round 5 posted a PR-comment summary. The intermediate rounds left no paper trail — a reviewer landing on the PR mid-flow couldn't tell what had been triaged, fixed, or deferred without scrolling commit-by-commit.
- address-pr-comments separately)
- -25 vague-language signal will normally drop these into HITL; reply on the thread per step 5b. Only fix if the user explicitly re-tiers it to Auto/Confirm with a concrete approach.
- tool_search to find the actual GitHub MCP tools available; common variants: get_pull_request, pull_request_read, mcp_github_pull_request_read
- Outdated in the conversation tab and collapses it. This is independent of resolution. A thread can be both Outdated and Resolved; the UI only surfaces "Outdated". Verify resolution via currentActivePullRequest reviewThreads[].isResolved, or expand the thread in the Files changed tab to see the green ✅ Resolved badge. If a user reports "the threads aren't resolved", check the data, not the conversation tab.
- requested_reviewers includes Copilot. If it doesn't, re-request before closing.

If you notice this skill itself is wrong while running it (e.g. a tool name is stale, a step is contradictory, an MCP tool returns an unexpected shape):
- fix(skills): scope and references the dogfood failure.

Failure mode caught in dogfooding (PR #68): mid-round, the agent identified two skill bugs (wrong tool name, missing acknowledgment step) and shipped them as slices 3 and 4 of a Copilot-comment series, polluting the commit history and bypassing the guardrail.