claude/skills/evaluate-findings/SKILL.md
Critically assess external feedback (code reviews, AI reviewers, PR comments) and decide which suggestions to apply using adversarial verification. Use when the user asks to "evaluate findings", "assess review comments", "triage review feedback", "evaluate review output", or "filter false positives".
Assess external feedback (code reviews, AI suggestions, PR comments) with adversarial verification. Triage findings into actionable verdicts. Do not apply fixes.
For each finding:
Read the referenced code at the mentioned location. Include the full function or logical block, not just the flagged line.
Check whether the code has diverged — if the finding references code that no longer exists or has since changed, skip it and note the divergence.
Determine scope — clarify whether the issue was introduced by the PR/changeset or is pre-existing.
Verify the claim against the actual code — does the issue genuinely exist?
Assess severity:
| Severity | Meaning |
|----------|---------|
| Critical | Drop everything. Blocking release or operations. |
| High | Urgent. Should be addressed in the next cycle. |
| Medium | Normal. To be fixed eventually. |
| Low | Nice to have. Minor improvement. |
If the upstream reviewer already assigned a priority (P0-P3), map it: P0→Critical, P1→High, P2→Medium, P3→Low. Then re-assess based on what the actual code reveals. The upstream level is a starting point, not a binding constraint. When the re-assessed severity differs from the upstream level, note the change and the reason.
If the finding has no upstream priority, assess severity from scratch.
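The priority mapping can be sketched as a simple lookup. This is only an illustration: the names `PRIORITY_TO_SEVERITY` and `starting_severity` are hypothetical and not part of the skill, and the result is a starting point that still has to be re-assessed against the actual code.

```python
from typing import Optional

# Mapping stated in the text: P0→Critical, P1→High, P2→Medium, P3→Low.
PRIORITY_TO_SEVERITY = {
    "P0": "Critical",
    "P1": "High",
    "P2": "Medium",
    "P3": "Low",
}


def starting_severity(upstream_priority: Optional[str]) -> Optional[str]:
    """Return the starting severity for an upstream priority.

    Returns None when there is no (or an unrecognized) upstream priority,
    meaning severity has to be assessed from scratch.
    """
    if upstream_priority is None:
        return None
    return PRIORITY_TO_SEVERITY.get(upstream_priority.strip().upper())
```

A `None` result corresponds to the "assess severity from scratch" case; any other result is only the initial level before re-assessment.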
Assign a verdict and confidence:
| Verdict | Criteria |
|---------|----------|
| Apply | The finding is real and in scope: clear bug, missing check, genuine improvement, style violation matching project conventions |
| Skip | False positive, subjective preference, reviewer is wrong, or the change's cost wildly dwarfs its benefit |
| Escalate | Needs the user's judgment: behavior might be intentional, involves product intent, requires domain knowledge the agent lacks, the finding is out of scope, or two findings present a genuine trade-off |
Also assign an internal confidence level — High, Medium, or Low — reflecting how certain you are about the verdict. Confidence is used solely to route findings to the Devil's Advocate in Step 2. It does not appear in the output.
Escalate guidance: When a finding questions whether behavior is intentional and neither docs, specs, nor code comments clarify the intent, assign Escalate. Do not autonomously accept or reject findings that hinge on product intent. If a counterpart implementation exists elsewhere, suggest checking it for consistency.
Conflict guidance: When two findings contradict each other (they suggest opposite changes to the same code), treat the conflict as input, not a reason to skip. Verify each against the code and judge each on its merits as usual. If both are defensible and the choice is a genuine trade-off, assign Escalate to both, naming the opposing options so the user can decide.
Verdict guidance:
After the initial assessment, challenge uncertain findings from a different angle.
Spawn when any finding has Medium or Low confidence. Send only those findings to the subagent. High-confidence findings pass through unchallenged. Skip this step entirely if all findings are High confidence.
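The routing rule above can be sketched as a filter over the assessed findings. The `Finding` shape and the function name are assumptions made for illustration; the skill does not prescribe a data structure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Finding:
    # Hypothetical record; field names are illustrative only.
    file: str
    issue: str
    verdict: str      # "Apply", "Skip", or "Escalate"
    confidence: str   # "High", "Medium", or "Low" (internal, never shown in output)


def findings_to_challenge(findings: List[Finding]) -> List[Finding]:
    """Select only the Medium/Low-confidence findings for the subagent."""
    return [f for f in findings if f.confidence in ("Medium", "Low")]
```

An empty result means every finding is High confidence, so the step is skipped and no subagent is launched.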
Launch a single subagent (model: "opus", do not set run_in_background). Provide the Medium/Low-confidence findings with their file locations, claims, and initial verdicts. Instruct the subagent to challenge each finding: try to prove it wrong, or confirm it with evidence.
The subagent picks research tools based on claim type:
| Claim Type | Tool |
|------------|------|
| API deprecated/removed/changed | Documentation MCP tools or WebSearch |
| Method doesn't exist / wrong signature | Documentation MCP tools, WebSearch fallback |
| Code causes specific bug or behavior | Bash (isolated read-only test snippet) |
| Best practice or ecosystem claim | WebSearch |
| Migration or changelog lookup | WebSearch → WebFetch |
Use whatever documentation tools are available. The specific tools vary by project setup.
Budget: max 2 research actions per finding. If the first action is conclusive, skip the second.
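One way to picture the budget is a loop that stops at the first conclusive result or after two actions, whichever comes first. This is a rough sketch under assumptions: research actions are modeled as plain callables that return evidence or `None` when inconclusive, and `investigate` is a hypothetical name.

```python
from typing import Callable, Iterable, Optional, Tuple

MAX_RESEARCH_ACTIONS = 2  # budget per finding, as stated in the text


def investigate(
    actions: Iterable[Callable[[], Optional[str]]],
) -> Tuple[Optional[str], int]:
    """Run research actions in order within the budget.

    Each action returns evidence (a string) when conclusive, or None.
    Returns (evidence or None, number of actions actually used).
    """
    used = 0
    for action in actions:
        if used >= MAX_RESEARCH_ACTIONS:
            break  # budget exhausted; remaining actions are never run
        used += 1
        evidence = action()
        if evidence is not None:
            return evidence, used  # conclusive: skip any remaining action
    return None, used
```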
The subagent returns per finding:
Merge subagent results with the initial assessment:
Findings not investigated by the subagent keep their original verdict.
For Apply findings, document the issue and location. For Escalate findings, note what information would resolve the ambiguity. For Skip findings, document why.
Summarize the evaluated findings in a table:
| File | Issue | Source | Severity | Verdict |
|------|-------|--------|----------|---------|
When Step 2 ran (any finding was investigated by the Devil's Advocate subagent), add an Investigated column:
| File | Issue | Source | Severity | Verdict | Investigated |
|------|-------|--------|----------|---------|--------------|
Where Investigated shows:
For findings whose severity was re-assessed from the upstream level, append the change in the Severity cell (e.g., "High (was Medium)").
For disputed findings, add a callout below the table showing both perspectives. For each finding, indicate scope in the Issue column (e.g., "Pre-existing:" prefix).
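The formatting rules for the table cells can be sketched as small helpers. All function names here are hypothetical, rows are modeled as plain dictionaries keyed by column name, and the Investigated value is passed through unchanged because its exact wording is up to the evaluation.

```python
from typing import Dict, List, Optional


def severity_cell(final: str, upstream: Optional[str] = None) -> str:
    """Append the upstream level only when the severity was re-assessed."""
    if upstream is not None and upstream != final:
        return f"{final} (was {upstream})"
    return final


def issue_cell(issue: str, pre_existing: bool = False) -> str:
    """Mark scope in the Issue column with a prefix."""
    return f"Pre-existing: {issue}" if pre_existing else issue


def render_table(rows: List[Dict[str, str]]) -> str:
    """Render the findings table; add Investigated only if any row has it."""
    columns = ["File", "Issue", "Source", "Severity", "Verdict"]
    if any(row.get("Investigated") for row in rows):
        columns.append("Investigated")
    lines = [
        "| " + " | ".join(columns) + " |",
        "|" + "|".join("---" for _ in columns) + "|",
    ]
    for row in rows:
        lines.append("| " + " | ".join(row.get(c, "") for c in columns) + " |")
    return "\n".join(lines)
```

Keeping the extra column conditional means the five-column table is produced unchanged whenever no finding was investigated.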
Then use the TaskList tool and proceed to any remaining task. The next pending skill — /resolve-findings or /apply-findings — reads the findings table directly, including Escalate verdicts, which /apply-findings surfaces to the user via AskUserQuestion.