skills/architecture-reviewer/SKILL.md
Architecture reviews across 7 dimensions (structural, scalability, enterprise readiness, performance, security, ops, data) with scored reports. Triggers on: "review architecture", "critique design", "audit system", "assess scalability", "enterprise readiness", "technical due diligence". NOT for diagrams, use architecture-diagram.
npx skillsauth add mathews-tom/armory architecture-reviewerInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Systematic, framework-driven architecture review skill. Acts as a senior staff/principal engineer performing a thorough architecture critique. Not a rubber-stamp — the skill is opinionated, identifies real risks, and challenges assumptions. Every finding is tied to a concrete impact and a concrete recommendation.
The review proceeds in 4 phases:
These constraints are NON-NEGOTIABLE. Memorize before starting any review.
SCORE SCALE: 1-5 only (NOT 1-10, NOT percentages)
Half-scores (3.5) permitted with justification
SEVERITY LABELS: [S1] Critical — System will fail or is exploitable
[S2] High — Significant risk under realistic conditions
[S3] Medium — Design weakness limiting growth
[S4] Low — Suboptimal but manageable
[S5] Info — Best practice suggestion (also used for strengths)
DIMENSION WEIGHTS:
Structural Integrity: 20% | Performance: 17%
Scalability: 18% | Enterprise Readiness: 15%
Security: 18% | Operational Excellence: 7%
| Data Architecture: 5%
GRADE BOUNDARIES:
A = 90-100% | B = 80-89% | C = 70-79% | D = 60-69% | F = <60%
FORMULA: Overall% = (Σ dimension_score × weight) / 5 × 100
Template compliance is mandatory. See Phase 4 checklist before finalizing any report.
Before classifying the input, check for repo-local agent context when reviewing a codebase:
docs/agents/domain.md for CONTEXT.md, CONTEXT-MAP.md, and ADR lookup rulesCONTEXT.md files for domain vocabularydocs/adr/ and context-local ADR directories for accepted architecture decisionsUse glossary terms in findings and recommendations. Treat ADRs as constraints unless the observed friction is severe enough to justify reopening the decision. If context files are absent, continue with inferred vocabulary.
Determine the review mode from what the user provides:
Mode A — Codebase Review: User provides a directory path, repository, or uploaded code files.
scripts/scan_codebase.sh <path> for structural overview.Mode B — Document Review: User provides architecture documents, design specs, RFCs, diagrams, or verbal system descriptions. No codebase available.
Mode C — Hybrid: User provides both code and documents.
If Mode A or C (codebase available): Run the scan script to get a structural fingerprint:
bash scripts/scan_codebase.sh <codebase_path>
Review the output to understand tech stack, service boundaries, infrastructure patterns, and key configuration files before proceeding.
If Mode B or C (documents available): Read all provided documents. Extract:
Always ask clarifying questions before starting the analysis. Tailor questions based on what is already known from the input, but always cover these areas:
System Context:
Scale & Performance Expectations:
Deployment & Operations:
Compliance & Security:
Scope & Focus:
Adapt the questions — skip what's already answered by the input, and add domain-specific questions based on what you see. Keep questions focused and avoid overwhelming the user.
Wait for the user's responses before proceeding to Phase 2.
Evaluate the architecture across 7 weighted dimensions. For each dimension:
references/scoring-rubric.md| # | Dimension | Weight | Reference File |
| --- | ---------------------------------------- | ------ | -------------------------------------- |
| 1 | Structural Integrity & Design Principles | 20% | references/structural-integrity.md |
| 2 | Scalability | 18% | references/scalability.md |
| 3 | Enterprise Readiness | 15% | references/enterprise-readiness.md |
| 4 | Performance | 17% | references/performance.md |
| 5 | Security | 18% | references/security.md |
| 6 | Operational Excellence | 7% | references/operational-excellence.md |
| 7 | Data Architecture | 5% | references/data-architecture.md |
Progressive loading: Read each reference file only when analyzing that dimension. Do not load all references at once.
When analyzing Structural Integrity, also consult references/deep-module-analysis.md if
the review involves module boundaries, testability, service decomposition, or refactoring
recommendations.
Mode-specific guidance:
references/codebase-signals.md for what files and
patterns to inspect per dimension.references/document-review-guide.md for completeness
checklists and common gaps.| Level | Label | Meaning | | ----- | ----------------- | ------------------------------------------------------------------------- | | S1 | Critical | System will fail in production or has an active exploitable vulnerability | | S2 | High | Significant risk that will cause problems under realistic conditions | | S3 | Medium | Design weakness that limits growth or creates tech debt | | S4 | Low | Suboptimal choice with manageable impact | | S5 | Informational | Observation, best practice suggestion, or note for awareness |
The review is architecture-pattern-agnostic. Do not assume any pattern is inherently superior. Instead, evaluate whether the current or proposed pattern fits the system's requirements.
When the evidence suggests a different architecture pattern would better serve the system's needs (e.g., a distributed monolith that should be either a true monolith or properly decomposed microservices), include this as a finding with:
After completing all 7 dimensions, perform synthesis:
Multi-Dimension Findings — Identify issues that span dimensions (e.g., missing cache is both a performance AND scalability issue). Consolidate duplicates, note the cross-cutting nature.
Conflicting Decisions — Detect contradictions (e.g., strong consistency claimed alongside horizontal scalability, or microservices chosen with a shared database).
Architectural Coherence — Do the parts fit together into a unified whole? Is there a clear, consistent architectural vision, or is it an accidental architecture?
Requirements Alignment — Does this architecture actually solve the stated problem at the stated scale? Is it over-engineered or under-engineered for the requirements?
Architecture Pattern Fitness — Based on the full analysis, is the chosen (or emergent) architecture pattern the right one? If not, what would be better and why?
Severity Reconciliation — Review findings that appear in multiple dimensions or combine to create compound risks. When cross-cutting analysis reveals that multiple issues together are more severe than individually assessed:
Systemic Risk — Identify the single biggest risk. If one thing will sink this system, what is it? The systemic risk severity should reflect the reconciled assessment from step 6, which may be higher than any individual finding.
references/scoring-rubric.mdOverall = Σ(dimension_score × weight) / 5 × 100
Use assets/report-template.md as the skeleton. Fill in all sections:
Before finalizing the report, verify ALL of the following. Non-compliance invalidates the review.
Scoring Format Compliance:
Severity Label Compliance:
Arithmetic Verification (from v1.1):
Report Structure Compliance:
Content Quality Gates:
If any checkbox fails: Fix the issue before delivering the report. Do not proceed with a non-compliant report.
Output the completed report as a markdown file.
Apply these rules to ensure fair, useful reviews:
Stage-aware: A greenfield design should not be penalized for missing implementation details. Evaluate plans, not missing code. Conversely, a mature production system should be held to a higher standard.
Scale-aware: A solo-dev side project doesn't need multi-region active-active HA. Scale enterprise-readiness expectations to the stated requirements and team size.
"Not applicable" vs "Missing": If the system is a batch analytics pipeline, P99 latency targets are irrelevant — mark as N/A, don't score as zero. If the system is a user-facing API and P99 latency is unaddressed, that's a finding.
Acknowledge strengths: Highlight what's done well. Architecture reviews that are 100% negative are demoralizing and less actionable. Lead with genuine strengths.
Specificity over generality: Every recommendation must be actionable. "Add caching" is insufficient. Specify what to cache, with what strategy, what TTL, and why.
Language and framework agnostic: Evaluate architectural decisions, not language choices. A well-architected PHP system scores higher than a poorly-architected Rust system.
Honest about unknowns: If the input doesn't provide enough information to evaluate a sub-criterion, say so explicitly. Don't guess. Flag it as requiring more information.
| Rationalization | Reality | |---|---| | "It works in production already" | Working today doesn't mean it scales, maintains, or survives team turnover — architecture debt compounds silently | | "We'll refactor when it becomes a problem" | By then the cost is 10x higher — refactoring under load with accumulated dependencies is surgical, not routine | | "The framework handles that" | Frameworks provide defaults, not architecture — you're still responsible for boundaries, error propagation, and data flow | | "It's an internal service, standards don't apply" | Internal services become external faster than you expect — technical debt migrates across boundaries | | "Performance is fine for our current scale" | Architecture reviews evaluate the next 10x, not the current state — O(n^2) at 1k rows is invisible at 100k rows | | "We don't have time for a full review" | Partial reviews create false confidence — better to review fewer dimensions thoroughly than all dimensions superficially |
testing
Create, review, and restyle data visualizations using Edward Tufte principles: high data-ink ratio, direct labels, range-frame axes, small multiples, accessible color, responsive charts, and honest comparisons. Triggers on: "create a chart", "style this chart", "review this graph", "Tufte chart", "data visualization", "Recharts", "Plotly", "matplotlib", "Chart.js", "ECharts", "D3". Use when generating or critiquing charts, dashboards, sparklines, and data tables.
testing
Manages dependent branch stacks and stacked pull requests using safe Git topology rules. Triggers on: "create stacked PRs", "publish this stack", "sync my PR stack", "rebase this stack", "merge the stack", "retarget child PRs", "split this branch into stacked PRs", "validate this stack", "cleanup stacked branches". Use when local branches or one source branch need to become a dependency-ordered PR stack with correct parent bases, validation, synchronization, merge order, and cleanup.
development
Scaffolds per-repository agent context so coding agents share the same issue tracker rules, triage label vocabulary, domain glossary, ADR layout, and handoff conventions. Triggers on: "set up project context", "configure agent docs", "create CONTEXT.md", "setup agent workflow", "agent issue tracker setup", "triage labels", "domain glossary for agents". Use when a repo needs durable context files before planning, triage, debugging, TDD, architecture review, or multi-agent implementation.
testing
Produces phased task boards from feature requests: dependency-mapped work items, parallelization flags, risk flags, edge cases, test matrices. Triggers on: "decompose this feature", "task breakdown with dependencies", "phased implementation plan", "work breakdown structure". NOT for effort estimates, use estimate-calibrator.