skills/forgewright/skills/code-reviewer/SKILL.md
[production-grade internal] Reviews code for quality — architecture conformance, anti-patterns, performance issues, maintainability. Read-only analysis, never modifies code. Routed via the production-grade orchestrator.
npx skillsauth add ouakar/ubinarys-dental code-reviewerInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
!cat skills/_shared/protocols/ux-protocol.md 2>/dev/null || true
!cat skills/_shared/protocols/input-validation.md 2>/dev/null || true
!cat skills/_shared/protocols/tool-efficiency.md 2>/dev/null || true
!cat skills/_shared/protocols/code-intelligence.md 2>/dev/null || true
!cat .production-grade.yaml 2>/dev/null || echo "No config — using defaults"
Fallback (if protocols not loaded): Use notify_user with options (never open-ended), "Chat about this" last, recommended first. Work continuously. Print progress constantly. Validate inputs before starting — classify missing as Critical (stop), Degraded (warn, continue partial), or Optional (skip silently). Use parallel tool calls for independent reads. Use view_file_outline before full Read.
!cat .forgewright/settings.md 2>/dev/null || echo "No settings — using Standard"
| Mode | Behavior | |------|----------| | Express | Full review, report findings. No interaction during review. Present final report. | | Standard | Surface critical architecture drift or anti-patterns immediately. Present final report with severity distribution. | | Thorough | Show review scope and checklist before starting. Present findings per category. Ask about which quality standards matter most (performance vs maintainability vs consistency). | | Meticulous | Walk through review categories one by one. Show specific code examples for each finding. Discuss trade-offs for each recommendation. User prioritizes which findings to remediate. |
Read .production-grade.yaml at startup. Use path overrides if defined for paths.services, paths.frontend, paths.tests, paths.architecture_docs, paths.api_contracts.
Produces findings and patch suggestions only. Does NOT modify source code — remediation is handled by the orchestrator as a separate task. All output is written exclusively to .forgewright/code-reviewer/.
Inspired by Superpowers two-stage review methodology
Before reviewing code quality, verify spec compliance first. This prevents wasting review effort on code that doesn't match the requirements.
Only after spec compliance passes, proceed with the full code quality review pipeline.
Why this order matters:
Security analysis: see security-engineer findings. Code reviewer does NOT perform OWASP or security review.
This skill runs as a quality gate AFTER implementation (services/, libs/), frontend (frontend/), and testing (tests/) are complete. It is the final validation step before code is considered ready for deployment pipeline configuration.
Inputs:
docs/architecture/, api/ — ADRs, API contracts (OpenAPI/AsyncAPI), data models, sequence diagrams, architectural decisions, technology choicesservices/, libs/ — Backend services, handlers, repositories, domain models, middleware, infrastructure codefrontend/ — UI components, pages, hooks, state management, API clients, routingtests/, .forgewright/qa-engineer/test-plan.md — Test suites, coverage thresholds, test plan, fixturesAll artifacts are written to .forgewright/code-reviewer/ in the project root.
.forgewright/code-reviewer/
├── review-report.md # Full review report — executive summary + all findings
├── architecture-conformance.md # ADR compliance check — decision-by-decision audit
├── findings/
│ ├── critical.md # Findings that block deployment (data loss risks, correctness bugs)
│ ├── high.md # Findings that must be fixed before production (arch violations, major bugs)
│ ├── medium.md # Findings that should be fixed soon (code quality, maintainability)
│ └── low.md # Findings that are advisory (style, minor optimizations)
├── metrics/
│ ├── complexity.json # Cyclomatic complexity per function/module
│ ├── coverage-gaps.json # Untested code paths, missing edge case coverage
│ └── dependency-analysis.json # Dependency graph, coupling metrics, circular dependencies
└── auto-fixes/ # Suggested code patches organized by service
└── <service>/
└── <file>.patch.md # Markdown with before/after code blocks and explanation
Every finding should be assigned exactly one severity level — consistent severity grading lets the developer prioritize fixes efficiently and prevents "alert fatigue" from mis-categorized issues. Use these definitions:
| Severity | Definition | Action Required | Examples | |----------|-----------|----------------|---------| | Critical | Data loss risk or correctness bug that will cause production incidents | Must fix before any deployment | Race condition causing double charges, unencrypted PII storage, missing auth check on admin endpoint | | High | Architectural violation, significant design flaw, or reliability risk that will cause problems at scale | Must fix before production release | Violates ADR decision, synchronous call in async pipeline, missing circuit breaker on external dependency, N+1 query on high-traffic endpoint | | Medium | Code quality issue that increases maintenance cost, makes debugging harder, or indicates emerging tech debt | Should fix within current sprint | SOLID violation, duplicated business logic across services, poor error messages, missing structured logging | | Low | Style issue, minor optimization, or improvement that would make code marginally better | Fix when convenient; consider adding to backlog | Inconsistent naming convention, unused import, suboptimal but correct algorithm, missing JSDoc on public API |
Execute each phase sequentially. Every phase produces specific output files. Do NOT skip phases.
Phases 1-4 can run in parallel — each reviews a different dimension of the same codebase:
Execute sequentially: Review architecture conformance following Phase 1 checklist. Compare implementation against ADRs. Write to code-reviewer/architecture-conformance.md.
Execute sequentially: Review code quality following Phase 2 checklist (SOLID, DRY, complexity). Write findings to code-reviewer/findings/.
Execute sequentially: Review performance following Phase 3 checklist (N+1, caching, bundle size). Write findings to code-reviewer/findings/.
Execute sequentially: Review test quality following Phase 4 checklist. Cross-reference test plan. Write to code-reviewer/metrics/.
Wait for all 4 agents, then run Phase 5 (Review Report) sequentially — it compiles all findings.
Execution order:
Goal: Verify that the implementation faithfully follows the architectural decisions documented in docs/architecture/. Flag every deviation.
Inputs to read:
docs/architecture/ ADRs (every Architecture Decision Record)docs/architecture/ system architecture diagrams, service boundaries, communication patternsapi/ API contracts (OpenAPI/AsyncAPI)schemas/ data models and database designservices/, libs/ full backend source treefrontend/ full frontend source treeReview checklist:
Output: Write .forgewright/code-reviewer/architecture-conformance.md with:
docs/architecture/ and its conformance status (Conformant / Partial / Violated)Goal: Evaluate code against software engineering best practices. Identify structural issues that static analysis tools typically miss.
Inputs to read:
services/, libs/ all backend source filesfrontend/ all frontend source filesReview checklist:
SOLID Principles:
Code Structure:
6. DRY violations — Identify duplicated logic (not just duplicated strings). Business rules implemented in multiple places are high-severity findings.
7. Cyclomatic complexity — Flag functions with complexity > 10. Calculate and record in metrics/complexity.json.
8. Naming conventions — Are names consistent, intention-revealing, and following language idioms? Flag abbreviations, single-letter variables (outside loops), and misleading names.
9. Error handling — Are errors caught at the right level? Flag swallowed exceptions (empty catch blocks), generic catches (catch (e: any)), and errors that lose stack traces.
10. Logging — Is logging structured (JSON)? Are appropriate levels used (error for errors, warn for degraded, info for business events, debug for troubleshooting)? Are sensitive fields redacted?
Frontend-Specific: 11. Component size — Flag components > 200 lines. Identify components that mix data fetching, business logic, and presentation. 12. State management — Is state lifted to the appropriate level? Flag prop drilling > 3 levels. Flag global state used for local concerns. 13. Effect management — Flag useEffect with missing dependencies, effects that should be event handlers, and effects without cleanup for subscriptions/timers. 14. Accessibility — Flag interactive elements without ARIA labels, images without alt text, forms without labels, and missing keyboard navigation.
Output: Write findings to .forgewright/code-reviewer/findings/ by severity. Write complexity metrics to .forgewright/code-reviewer/metrics/complexity.json.
Goal: Identify performance bottlenecks, inefficient patterns, and missing optimizations in the codebase.
Inputs to read:
services/, libs/ all backend source files (especially data access, API handlers, middleware)frontend/ all frontend source files (especially data fetching, rendering, bundle composition)docs/architecture/ NFRs (latency targets, throughput requirements)Review checklist:
Backend:
Frontend:
9. Bundle size — Flag large third-party dependencies imported wholesale (import _ from 'lodash' instead of import get from 'lodash/get').
10. Render performance — Flag components that re-render on every parent render without memoization. Flag expensive computations in render path without useMemo.
11. Network waterfall — Flag sequential API calls that could be parallelized. Flag missing data prefetching for predictable navigation.
12. Image optimization — Flag unoptimized images, missing lazy loading, missing responsive srcsets.
13. Missing code splitting — Flag routes that bundle all pages together instead of using lazy loading.
Output: Write performance findings to .forgewright/code-reviewer/findings/ by severity. Write dependency analysis to .forgewright/code-reviewer/metrics/dependency-analysis.json.
Goal: Evaluate the test suites in tests/ for coverage quality, assertion strength, and test design.
Inputs to read:
tests/ all test files.forgewright/qa-engineer/test-plan.md traceability matrix.forgewright/qa-engineer/coverage/thresholds.jsonservices/, libs/ source files (to identify untested paths)Review checklist:
true/false instead of specific values.Output: Write test quality findings to .forgewright/code-reviewer/findings/ by severity. Write coverage gap analysis to .forgewright/code-reviewer/metrics/coverage-gaps.json.
Goal: Compile all findings into a structured, actionable review report. Generate auto-fix suggestions for issues where the fix is unambiguous.
Inputs:
Actions:
Write .forgewright/code-reviewer/review-report.md with the following sections:
Write individual findings files to .forgewright/code-reviewer/findings/:
critical.md — Findings that block deploymenthigh.md — Findings that must be fixed before productionmedium.md — Findings that should be fixed soonlow.md — Advisory findingsEach finding in these files uses the following format:
### [FINDING-ID] Short description
**Severity:** Critical | High | Medium | Low
**Category:** Architecture | Code Quality | Performance | Test Quality
**Location:** `path/to/file.ts:42`
**Description:**
What the issue is and why it matters.
**Impact:**
What happens if this is not fixed.
**Evidence:**
```code
// The problematic code
Recommendation: How to fix it, with a code example if applicable.
Generate auto-fix suggestions for findings where the fix is mechanical and unambiguous:
Write each auto-fix to .forgewright/code-reviewer/auto-fixes/<service>/<file>.patch.md with:
Compile metrics:
.forgewright/code-reviewer/metrics/complexity.json — Cyclomatic complexity per function, flagged functions with complexity > 10.forgewright/code-reviewer/metrics/coverage-gaps.json — List of untested files, untested functions, untested branches.forgewright/code-reviewer/metrics/dependency-analysis.json — Service dependency graph, coupling score per service, circular dependency detectionOutput: Write all report files, findings, metrics, and auto-fixes to .forgewright/code-reviewer/.
| # | Mistake | Why It Fails | What to Do Instead |
|---|---------|-------------|-------------------|
| 1 | Reporting linter-level issues (missing semicolons, trailing whitespace) as review findings | Wastes reviewer credibility on noise; these should be caught by automated linting in CI | Focus on structural, architectural, and logical issues that linters and formatters cannot catch |
| 2 | Flagging code without reading the ADR that justified it | The "violation" may be an intentional, documented trade-off | Always cross-reference docs/architecture/ ADRs before flagging an architectural concern |
| 3 | Marking every finding as Critical | Severity inflation makes the report useless — developers ignore it entirely | Use Critical only for data loss risks and correctness bugs. Most issues are Medium |
| 4 | Writing vague findings like "code quality could be improved" | Not actionable; developers do not know what to fix or where | Every finding must have a specific file location, a concrete description, and a recommended fix |
| 5 | Suggesting auto-fixes without verifying they compile/type-check | Broken auto-fix suggestions destroy trust in the review process | Only suggest fixes for mechanical changes where the correct fix is unambiguous. Include enough context for the fix to be applied directly |
| 6 | Reviewing generated code (migrations, protobuf stubs, OpenAPI clients) as handwritten code | Generated code has different quality standards; flagging it creates noise | Identify generated files by convention (file headers, directory names) and skip them or apply relaxed rules |
| 7 | Ignoring frontend/ entirely or applying only backend review criteria | Frontend has its own class of issues (render performance, accessibility, bundle size) that backend checklists miss | Apply frontend-specific review criteria from Phase 2 and Phase 3 to all frontend/ code |
| 8 | Not reading the test files before reviewing test quality | Cannot identify coverage gaps, assertion quality issues, or missing edge cases without reading the actual tests | Read both the source file and its corresponding test file together to identify gaps |
| 9 | Producing a review report longer than 50 pages | No one reads it. Critical findings get lost in the noise | Keep the executive summary to 1 page. Use the findings files for detail. Prioritize ruthlessly |
| 10 | Modifying files in services/, frontend/, or tests/ | The reviewer must not change source code — only document findings and suggest fixes | Write all output exclusively to .forgewright/code-reviewer/. Suggested code changes go in auto-fixes/ as patch files |
| 11 | Reporting the same root-cause issue multiple times as separate findings | Inflates finding count; developers fix the pattern once, not N times | Group related symptoms under one finding. Reference all affected locations but assign one severity and one fix |
| 12 | Skipping performance review for "simple CRUD apps" | Even simple apps have N+1 queries, missing pagination, and unbounded selects that cause outages at scale | Every project gets a performance review. Adjust depth based on traffic expectations, but never skip it |
| 13 | Not providing impact statements for findings | Developers cannot prioritize fixes without understanding consequences | Every finding must explain what happens if the issue is not fixed: data loss, outage, slow degradation |
| 14 | Reviewing code in isolation without understanding the business context | Flags technically correct code as problematic because the business rule was not understood | Read the BRD/PRD acceptance criteria before starting the review to understand why the code exists |
| 15 | Performing OWASP or security vulnerability analysis | Security review is the sole responsibility of the security-engineer skill | Defer all security findings to the security-engineer. Focus on architecture, code quality, performance, and test quality |
Goal: Evaluate git workflow practices — branching strategy, commit quality, PR hygiene, and CI/CD integration.
Review checklist:
feat:, fix:, chore:, docs:).Output: Include git workflow findings in review-report.md under a dedicated "Git Workflow" category.
Before marking the skill as complete, verify:
architecture-conformance.md audits every ADR in docs/architecture/ with a conformance status.forgewright/qa-engineer/test-plan.md traceability matrix for coverage gapsreview-report.md has an executive summary with total finding counts and overall assessmentcritical.md, high.md, medium.md, and low.mdmetrics/complexity.json has per-function cyclomatic complexity scoresmetrics/coverage-gaps.json identifies untested files, functions, and branchesmetrics/dependency-analysis.json maps service dependencies and flags circular dependenciesdevelopment
[production-grade internal] Builds AR/VR/MR applications — spatial UI/UX, hand tracking, gaze input, controller interaction, comfort optimization, and cross-platform XR (Quest, Vision Pro, WebXR, PCVR). Routed via the production-grade orchestrator (Game Build mode).
development
[production-grade internal] Creates, edits, analyzes, and validates Excel spreadsheet files (.xlsx, .csv, .tsv). Trigger when the primary deliverable is a spreadsheet — creating financial models, data reports, dashboards, cleaning messy tabular data, adding formulas/formatting, or converting between tabular formats. Also trigger when user references a spreadsheet file by name or path and wants it modified or analyzed. DO NOT trigger when the deliverable is a web page, database pipeline, Google Sheets API integration, or standalone Python script — even if tabular data is involved. Routed via the production-grade orchestrator (Feature/Custom mode).
development
[production-grade internal] Security-first web scraping and data extraction — crawl4ai integration with URL validation, output sanitization, SSRF defense, CSS-first extraction, and browser isolation. Library-only mode (no Docker API). Routed via the production-grade orchestrator (AI Build/Research/Feature mode).
testing
[production-grade internal] Conducts user research — usability testing, user interviews, persona creation, journey mapping, heuristic evaluation, and data-driven design recommendations. Routed via the production-grade orchestrator (Design mode).