Evidence Review

One careful reviewer. Fast when scope is small, deep when risk is high, decision-grade when architecture is the target.

Boundaries

This skill may read code, tests, diffs, docs, logs, configs, and local validation output.
This skill may run non-destructive inspection commands, linters, tests, and diff commands.
This skill may write a review document only when the user explicitly asks for a file artifact or accepts an offered one.
This skill must not edit implementation, rewrite tests, commit, push, publish, deploy, or post external comments.

This is a review, not a fix. Findings first; remediation starts only after the user asks.

Modes

Quick — Sanity check for small diffs. One pass, strongest 1-5 findings, no broad scan.
Focused — Default mode for diffs, PRs, files, documents, plans, specs, and modules.
Architecture — Platform, directory, scaling readiness, migration, CTO handoff, or production readiness. Use the 90/10 architecture lens.
High-stakes — Security, money, auth, data loss, privacy, migrations, or irreversible decisions.

If the mode is ambiguous and no safe default exists, ask the user to choose Quick, Focused, or Architecture.

Phase 1: Scope

Entry: The user asked for a review or provided a target.

Determine target and mode:

No target means current branch diff.
PR URL or number means PR diff.
File path means that file plus call sites and tests.
Directory path means architecture or module review.
Markdown plan, spec, brainstorm, or ADR means document review.
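
The target rules above can be sketched as a small dispatcher. This is an illustrative sketch only: the function name `resolve_target`, the returned labels, the PR-URL pattern, and the shortcut of treating any `.md` path as a document are assumptions, not part of the skill.

```python
import os
import re
from typing import Callable, Optional


def resolve_target(target: Optional[str],
                   is_dir: Callable[[str], bool] = os.path.isdir) -> str:
    """Map a user-supplied target to the kind of review it implies."""
    if not target:
        return "branch-diff"  # no target means current branch diff
    # Assumed PR forms: a bare number, "#123", or a URL containing /pull/<n>.
    if re.fullmatch(r"#?\d+", target) or re.match(r"https?://\S+/pull/\d+", target):
        return "pr-diff"
    if is_dir(target):
        return "architecture-or-module"
    # Simplification: any Markdown file is treated as a plan, spec, or ADR.
    if target.lower().endswith(".md"):
        return "document"
    return "file-with-call-sites-and-tests"
```

Passing `is_dir` as a parameter keeps the sketch testable without touching the filesystem.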

Gather minimum context:

Read project instructions such as AGENTS.md, CLAUDE.md, CODEX.md, or GEMINI.md when present.
Inspect git status and diff when reviewing current work.
Read implementation and tests together when code is in scope.
Read related docs, plans, schemas, routes, configs, and call sites when they affect correctness.
Identify risk areas such as auth, secrets, validation, money, migrations, concurrency, privacy, and external APIs.
For architecture mode, identify team constraints, target scale, deployment shape, and critical flows.

Exit: Scope, mode, conventions, and risk areas are known.

Phase 2: Inspect

Entry: Scope is known.

Review from evidence, not impressions.

For code, inspect:

Correctness, edge cases, invariants, state transitions, and error handling.
Security, auth, authorization, input validation, secrets, injection, CSRF, SSRF, and rate limits.
Tests, meaningful assertions, negative paths, integration seams, fixtures, and false confidence.
Simplicity, YAGNI, one-use abstractions, dead branches, duplicated logic, and misleading names.
Maintainability, dependency direction, coupling, ownership, public contracts, and migration safety.
Performance, N+1 query behavior, blocking work, memory growth, query shape, and hot paths.

For documents, inspect:

Why the decision exists, not only what will be done.
Scope, non-goals, constraints, acceptance criteria, and rejection criteria.
Risks, trade-offs, migration plan, rollback plan, and open questions.
Whether an autonomous implementer could start safely from the document.
Whether subjective work has references, anti-references, and preview gates.

For architecture, inspect:

System map: entrypoints, runtimes, data stores, queues, external services, auth boundaries, deployment, and ownership.
Claims versus code: verify README, docs, issue claims, and audit claims against implementation.
Critical flows: trace the top user, money, auth, data, and background-job flows end-to-end.
Failure modes: what is persisted, what is lost, how retry works, and how manual recovery works.
Production readiness: observability, alerts, logs, backups, rollback, admin path, CI, and environment separation.
Scaling bottlenecks: connection pools, queues, synchronous external calls, hot queries, CPU work, and single points of failure.
Team fit: a correct recommendation that the team cannot operate is not a good recommendation.

Exit: Candidate findings exist with supporting evidence.

Architecture 90/10 Lens

Use this when Architecture mode is active. Do enough to make decisions without turning the review into a full consulting engagement.

Map the current system from code, configs, and docs.
Verify claims against implementation. Docs are leads, not truth.
Trace the top three critical flows end-to-end.
For each critical flow, identify persisted state, lost state, retry path, and manual recovery.
Find bottlenecks and single points of failure.
Check minimum production readiness: errors, alerts, logs, backups, rollback, admin path, CI, and environment separation.
Convert major choices into ADR candidates with 2-4 options and one recommended minimum.
Sequence work into the first safe slice, next slices, and external gates.
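
One way to keep ADR candidates honest is to check their shape before reporting them. The sketch below is an assumption-laden illustration: the `AdrCandidate` class and its fields are hypothetical, and it encodes only the "2-4 options and one recommended" rule stated above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AdrCandidate:
    """Hypothetical record for one major architectural choice."""
    title: str
    options: List[str]
    recommended: str

    def problems(self) -> List[str]:
        """Return shape problems; an empty list means the candidate is well-formed."""
        issues = []
        if not 2 <= len(self.options) <= 4:
            issues.append("expected 2-4 options")
        if self.recommended not in self.options:
            issues.append("recommended option is not among the options")
        return issues
```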

Artifact Mode

Use artifact mode when the user asks for a file, the review is a handoff, or findings exceed what fits cleanly in chat. Do not write artifacts by default. If useful but not requested, offer it.

Default paths:

Focused review: docs/reviews/YYYY-MM-DD-<slug>.md
Architecture review: docs/reviews/YYYY-MM-DD-architecture-<slug>.md
Existing project convention wins over defaults.
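
A minimal sketch of how the default paths could be built, assuming no project convention overrides them. The `slugify` rule (lowercase, non-alphanumerics collapsed to hyphens) and both function names are assumptions; the skill does not define how a slug is derived. Modes other than Architecture fall back to the Focused pattern here, which is also an assumption.

```python
import re
from datetime import date


def slugify(scope: str) -> str:
    """Assumed slug rule: lowercase, runs of non-alphanumerics become one hyphen."""
    slug = re.sub(r"[^a-z0-9]+", "-", scope.lower()).strip("-")
    return slug or "review"


def default_artifact_path(scope: str, mode: str, today: date) -> str:
    """Build docs/reviews/YYYY-MM-DD-<slug>.md, with an architecture- infix when applicable."""
    prefix = "architecture-" if mode == "Architecture" else ""
    return f"docs/reviews/{today.isoformat()}-{prefix}{slugify(scope)}.md"
```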

Artifact structure:

# Review: <scope>

Date: <YYYY-MM-DD>
Mode: <Quick | Focused | Architecture | High-stakes>
Verdict: <verdict>

## Executive Summary

...

## Evidence Map

...

## Findings

...

## Architecture Addendum

...

## ADR Candidates

...

## Execution Order

...

## Appendix

Checked files and commands.

When writing an artifact, include checked files, commands, assumptions, uncertainty, and evidence type for major claims.

Phase 3: Challenge

Entry: Candidate findings exist.

Challenge every finding:

Is this a real failure, risk, or decision gap rather than a preference?
What concrete scenario triggers the problem?
What evidence proves it?
What would disprove it?
Is severity honest?

Drop findings that cannot survive this challenge. Mark uncertainty explicitly instead of pretending confidence. For high-stakes reviews, perform an independent second pass or subagent pass when available, then reconcile disagreements against evidence.

Exit: Findings are verified, deduplicated, and severity-ranked.
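
The challenge step can be pictured as a filter followed by a sort. The sketch below is hypothetical: the `Finding` fields, the rule that a survivor needs both a scenario and evidence, and deduplication by summary text are simplifying assumptions rather than the skill's definition.

```python
from dataclasses import dataclass
from typing import List

# Order taken from the three severity levels used in this document.
SEVERITY_ORDER = {"Critical": 0, "Suggestion": 1, "Observation": 2}


@dataclass
class Finding:
    summary: str
    severity: str                # "Critical", "Suggestion", or "Observation"
    scenario: str = ""           # concrete trigger for the problem
    evidence: str = ""           # what proves it
    is_preference: bool = False  # style preference rather than a real gap


def challenge(findings: List[Finding]) -> List[Finding]:
    """Drop unsupported findings, deduplicate, and rank by severity."""
    survivors = [f for f in findings
                 if not f.is_preference and f.scenario and f.evidence]
    seen = set()
    unique = []
    for f in survivors:
        if f.summary not in seen:  # assumed dedup key: identical summary
            seen.add(f.summary)
            unique.append(f)
    # sorted() is stable, so findings of equal severity keep their original order.
    return sorted(unique, key=lambda f: SEVERITY_ORDER[f.severity])
```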

Phase 4: Report

Entry: Findings are verified.

Report only useful signal. Put the most important issues first.

## Review: <scope>

### Verdict

APPROVE | APPROVE WITH NOTES | REQUEST CHANGES | READY | NEEDS REFINEMENT | FIT WITH GAPS | NOT READY

### Critical Issues

- **[CRIT-1]** path#line — Finding. Evidence. Failure scenario. Suggested direction.

### Suggestions

- **[SUG-1]** path#line — Improvement. Trade-off if ignored.

### Observations

- **[OBS-1]** Useful context that does not block shipping.

### Document Gaps

- **[GAP-1]** Missing contract or unclear criterion. Why it matters.

### Architecture Addendum

- System map, ADR candidates, failure modes, production readiness, risks, and execution order when architecture is in scope.

Omit empty sections. If no issues are found, say so clearly and name what was checked. Do not invent problems to look useful.
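
A rough sketch of the "omit empty sections" rule applied to the report template. The function `render_report`, its argument shapes, and the exact wording of the no-issues line are assumptions for illustration; the Architecture Addendum is left out to keep the sketch short.

```python
# Section titles and ID prefixes as they appear in the report template.
SECTIONS = [
    ("Critical Issues", "CRIT"),
    ("Suggestions", "SUG"),
    ("Observations", "OBS"),
    ("Document Gaps", "GAP"),
]


def render_report(scope, verdict, findings, checked):
    """Render the report; `findings` maps a section title to a list of finding texts."""
    lines = [f"## Review: {scope}", "", "### Verdict", "", verdict]
    for title, prefix in SECTIONS:
        items = findings.get(title, [])
        if not items:
            continue  # omit empty sections entirely
        lines += ["", f"### {title}", ""]
        lines += [f"- **[{prefix}-{i}]** {text}" for i, text in enumerate(items, 1)]
    if not any(findings.get(title) for title, _ in SECTIONS):
        # Nothing found: say so and name what was checked.
        lines += ["", "No issues found. Checked: " + ", ".join(checked) + "."]
    return "\n".join(lines)
```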

Exit: Review delivered with a clear verdict.

Phase 5: Handoff

Entry: Review has been delivered.

Offer the next step without doing it automatically:

Address findings.
Discuss or challenge a finding.
Create a fix plan.
Capture a recurring pattern in project context.
Stop.

If the user asks to fix issues, exit review mode and switch to the normal coding contract.

Example

Strong finding:

- **[CRIT-1]** `src/auth/session.ts#42` — Expired sessions are accepted because `expiresAt` is parsed but never compared. A replayed cookie remains valid until signing key rotation. Check expiry before returning the session.

Weak finding:

- Auth looks risky.
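
To show the kind of evidence that backs the strong finding, here is the same class of bug sketched in Python (the file named in the finding is TypeScript and is not reproduced here). The cookie shape and both function names are hypothetical.

```python
from datetime import datetime


def load_session_buggy(cookie: dict, now: datetime):
    # expiresAt is parsed but never compared, so expired sessions are accepted.
    expires_at = datetime.fromisoformat(cookie["expiresAt"])
    return {"user": cookie["user"]}


def load_session_fixed(cookie: dict, now: datetime):
    expires_at = datetime.fromisoformat(cookie["expiresAt"])
    if expires_at <= now:
        return None  # reject expired sessions before returning anything
    return {"user": cookie["user"]}
```

A finding that points at the missing comparison and names the replayed-cookie scenario gives the author something to verify; "Auth looks risky" does not.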

Severity Rules

Critical — Likely bug, vulnerability, data loss, broken invariant, failed migration, or release blocker.
Suggestion — Real improvement with a trade-off, but not a blocker.
Observation — Useful context, pattern, or small hygiene note.

Nits belong only in Observations and only when they prevent confusion. Do not mix style preferences with release blockers.

Anti-Traps

Do not trust docs over code.
Do not trust passing tests without checking assertions.
Do not recommend best practices that the team cannot operate.
Do not skim high-risk paths.
Do not hide uncertainty.

Validation Checklist

Before final answer, verify:

Read implementation and tests when code is in scope.
Checked call sites and integration boundaries when relevant.
Checked project conventions before judging style.
Checked security and data handling for sensitive paths.
Checked test integrity, not just coverage presence.
Verified every Critical finding has evidence and a failure scenario.
In Architecture mode, covered the system map, critical flows, failure modes, ADRs, ops readiness, and execution order.
Wrote artifact only when explicitly requested or offered and accepted.
