skills/forgewright/skills/solution-architect/SKILL.md
[production-grade internal] Designs system architecture when you need to decide tech stack, API contracts, data models, or infrastructure shape. Routed via the production-grade orchestrator.
!cat skills/_shared/protocols/ux-protocol.md 2>/dev/null || true
!cat skills/_shared/protocols/input-validation.md 2>/dev/null || true
!cat skills/_shared/protocols/tool-efficiency.md 2>/dev/null || true
!cat skills/_shared/protocols/code-intelligence.md 2>/dev/null || true
!cat .production-grade.yaml 2>/dev/null || echo "No config — using defaults"
!cat .forgewright/codebase-context.md 2>/dev/null || true
Fallback (if protocols not loaded): Use notify_user with options (never open-ended), "Chat about this" last, recommended first. Work continuously. Print progress constantly. Validate inputs before starting — classify missing as Critical (stop), Degraded (warn, continue partial), or Optional (skip silently). Use parallel tool calls for independent reads. Use view_file_outline before full Read.
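The Critical / Degraded / Optional triage in the fallback rule can be sketched as below. This is a minimal illustration under assumptions, not the protocol's own implementation: the `Severity` enum, the `triage_missing` function, and the example severity assigned to each input are all hypothetical.

```python
from enum import Enum
from typing import Dict, List, Set


class Severity(Enum):
    CRITICAL = "critical"  # stop: the skill cannot proceed without this input
    DEGRADED = "degraded"  # warn, then continue with partial results
    OPTIONAL = "optional"  # skip silently


def triage_missing(required: Dict[str, Severity], provided: Set[str]) -> Dict[str, List[str]]:
    """Group every missing input under the action the fallback rule prescribes."""
    actions: Dict[str, List[str]] = {"stop": [], "warn": [], "skip": []}
    for name, severity in required.items():
        if name in provided:
            continue  # input is present, nothing to triage
        if severity is Severity.CRITICAL:
            actions["stop"].append(name)
        elif severity is Severity.DEGRADED:
            actions["warn"].append(name)
        else:
            actions["skip"].append(name)
    return actions


# Hypothetical severities for this skill's inputs (an assumption for illustration).
REQUIRED_INPUTS = {
    "brd": Severity.CRITICAL,
    "codebase_context": Severity.DEGRADED,
    "settings": Severity.OPTIONAL,
}

triage = triage_missing(REQUIRED_INPUTS, provided={"settings"})
```

With only `settings` provided, the missing BRD lands under `stop` and the missing codebase context under `warn`, so the run would halt before asking any questions.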
If .forgewright/codebase-context.md exists and mode is brownfield:
!cat .forgewright/settings.md 2>/dev/null || echo "No settings — using Standard"
Read .forgewright/settings.md at startup. Adapt discovery depth:
| Mode | Discovery Approach |
|------|-------------------|
| Express | Auto-derive from BRD. Ask only if critical info missing. Conservative defaults. |
| Standard | 5-7 questions across 2 rounds. Scale sizing + constraints. Fitness-derived architecture. |
| Thorough | 12-15 questions across 4 structured rounds. Full capacity planning. Trade-off analysis. Architecture alternatives. |
| Meticulous | Everything in Thorough + individual ADR approval, tech stack walkthrough, capacity modeling with cost estimates. |
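One way to encode the mode table is a lookup from mode name to a discovery plan. The sketch below is an assumption-laden illustration: `DiscoveryPlan`, `PLANS`, and `plan_for` are hypothetical names, the question caps use the upper bound of each range in the table, Meticulous reuses the Thorough counts because the table gives no separate number, and falling back to Standard for an unrecognized mode is an assumption that extends the "No settings" default.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DiscoveryPlan:
    rounds: int                 # structured interview rounds
    max_questions: int          # upper bound on questions asked
    present_alternatives: bool  # whether alternative architectures are shown


PLANS = {
    # Express skips the interview; at most one clarifying question is allowed.
    "express": DiscoveryPlan(rounds=0, max_questions=1, present_alternatives=False),
    "standard": DiscoveryPlan(rounds=2, max_questions=7, present_alternatives=False),
    "thorough": DiscoveryPlan(rounds=4, max_questions=15, present_alternatives=True),
    # Meticulous adds approval steps on top of Thorough; those are not modeled here.
    "meticulous": DiscoveryPlan(rounds=4, max_questions=15, present_alternatives=True),
}


def plan_for(mode: Optional[str]) -> DiscoveryPlan:
    """Return the plan for a mode, defaulting to Standard when none is set."""
    key = (mode or "standard").strip().lower()
    return PLANS.get(key, PLANS["standard"])
```

For example, `plan_for("Thorough")` yields four rounds with alternatives enabled, while `plan_for(None)` behaves like Standard.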
Full architecture pipeline: from business requirements to a scaffolded, production-ready codebase. The architecture is DERIVED from project constraints (scale, team, budget, compliance) — not picked from a template. There is no one-size-fits-all architecture.
Generates architecture deliverables at the project root (api/, schemas/, docs/architecture/, project scaffold) with workspace artifacts in .forgewright/solution-architect/.
Read .production-grade.yaml at startup. Use these overrides if defined:
- paths.api_contracts (default: api/)
- paths.adrs (default: docs/architecture/architecture-decision-records/)
- paths.architecture_docs (default: docs/architecture/)
- paths.erd (default: schemas/erd.md)
- paths.migrations (default: schemas/migrations/)
- paths.tech_stack (default: docs/architecture/tech-stack.md)

Deliverables go to the project root (api/, schemas/, docs/architecture/). Workspace artifacts go to .forgewright/solution-architect/.
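The override rule can be sketched as an overlay of the config's `paths` section on the documented defaults. This assumes `.production-grade.yaml` has already been parsed into a plain dictionary by some YAML loader, and that the dotted keys above map to a nested `paths` mapping; `DEFAULT_PATHS` and `resolve_paths` are hypothetical names.

```python
from typing import Dict, Optional

# Defaults copied from the list above.
DEFAULT_PATHS: Dict[str, str] = {
    "api_contracts": "api/",
    "adrs": "docs/architecture/architecture-decision-records/",
    "architecture_docs": "docs/architecture/",
    "erd": "schemas/erd.md",
    "migrations": "schemas/migrations/",
    "tech_stack": "docs/architecture/tech-stack.md",
}


def resolve_paths(config: Optional[dict]) -> Dict[str, str]:
    """Overlay config['paths'] on the defaults; missing or empty values keep the default."""
    overrides = (config or {}).get("paths") or {}
    # Only known keys are read, so unrelated entries in the config are ignored.
    return {key: overrides.get(key) or default for key, default in DEFAULT_PATHS.items()}


resolved = resolve_paths({"paths": {"erd": "docs/data/erd.md"}})
```

Here only the ERD location changes; every other deliverable keeps its default path.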
digraph sa {
rankdir=TB;
"Triggered" [shape=doublecircle];
"Phase 1: Discovery" [shape=box];
"Phase 2: Architecture Design" [shape=box];
"Phase 3: Tech Stack" [shape=box];
"Phase 4: API Contracts" [shape=box];
"Phase 5: Data Models" [shape=box];
"Phase 6: Scaffold" [shape=box];
"User Review" [shape=diamond];
"Suite Complete" [shape=doublecircle];
"Triggered" -> "Phase 1: Discovery";
"Phase 1: Discovery" -> "Phase 2: Architecture Design";
"Phase 2: Architecture Design" -> "User Review";
"User Review" -> "Phase 2: Architecture Design" [label="revise"];
"User Review" -> "Phase 3: Tech Stack" [label="approved"];
"Phase 3: Tech Stack" -> "Phase 4: API Contracts";
"Phase 4: API Contracts" -> "Phase 5: Data Models";
"Phase 5: Data Models" -> "Phase 6: Scaffold";
"Phase 6: Scaffold" -> "Suite Complete";
}
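The graph above is a linear pipeline with a single review loop after Phase 2. A minimal sketch of the same transitions as a lookup table follows; the `EDGES`, `advance`, and `run` names are hypothetical, and the edge labels are taken from the graph.

```python
from typing import Iterable, List, Optional

# (state, decision) -> next state; decision is None for unlabeled edges.
EDGES = {
    ("Triggered", None): "Phase 1: Discovery",
    ("Phase 1: Discovery", None): "Phase 2: Architecture Design",
    ("Phase 2: Architecture Design", None): "User Review",
    ("User Review", "revise"): "Phase 2: Architecture Design",
    ("User Review", "approved"): "Phase 3: Tech Stack",
    ("Phase 3: Tech Stack", None): "Phase 4: API Contracts",
    ("Phase 4: API Contracts", None): "Phase 5: Data Models",
    ("Phase 5: Data Models", None): "Phase 6: Scaffold",
    ("Phase 6: Scaffold", None): "Suite Complete",
}


def advance(state: str, decision: Optional[str] = None) -> str:
    """Follow one edge; an unknown state or decision is an error."""
    try:
        return EDGES[(state, decision)]
    except KeyError:
        raise ValueError(f"no transition from {state!r} with decision {decision!r}") from None


def run(decisions: Iterable[str]) -> List[str]:
    """Walk from Triggered to Suite Complete, consuming one decision per review."""
    pending = iter(decisions)
    state = "Triggered"
    path = [state]
    while state != "Suite Complete":
        decision = next(pending, None) if state == "User Review" else None
        state = advance(state, decision)
        path.append(state)
    return path


path = run(["revise", "approved"])
```

A single "revise" sends the walk through Phase 2 twice before the tech stack phase starts; running out of decisions at a review raises `ValueError` rather than silently proceeding.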
The architecture must fit the project's actual constraints. This phase gathers those constraints — at a depth matching the engagement mode.
Before asking ANY questions, read in parallel:
- .forgewright/polymath/handoff/context-package.md: may contain scale, constraints, decisions
- .forgewright/product-manager/BRD/brd.md: user stories, acceptance criteria, business rules
- .forgewright/codebase-context.md: brownfield context

Reduce questions to cover ONLY gaps not addressed in existing context. If polymath or PM already established scale targets, do not re-ask.
Adapt depth to engagement mode. Use notify_user with structured options (never open-ended).
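The structured-options rule can be checked mechanically before a question is sent. The sketch below assumes each option is a dictionary with `label` and `description` keys, as in the question payloads in this document; `check_options` is a hypothetical helper, and it only verifies the parts of the rule that are checkable from the payload (it cannot tell which option is the recommended one).

```python
from typing import Dict, List

CHAT_LABEL = "Chat about this"


def check_options(options: List[Dict[str, str]]) -> List[str]:
    """Return a list of rule violations; an empty list means the options pass."""
    problems: List[str] = []
    labels = [opt.get("label", "") for opt in options]
    if len(labels) < 2:
        problems.append("offer at least one concrete option before the free-form one")
    if not labels or labels[-1] != CHAT_LABEL:
        problems.append(f"'{CHAT_LABEL}' must be the last option")
    if labels.count(CHAT_LABEL) > 1:
        problems.append(f"'{CHAT_LABEL}' must appear only once")
    for index, opt in enumerate(options):
        if not opt.get("label") or not opt.get("description"):
            problems.append(f"option {index} needs both a label and a description")
    return problems


good_options = [
    {"label": "Balanced — typical CRUD SaaS", "description": "Standard request/response, relational DB"},
    {"label": CHAT_LABEL, "description": "Free-form input"},
]
```

`check_options(good_options)` returns an empty list, while the same options in reverse order produce a single violation about the position of the free-form entry.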
Skip interview entirely. Auto-derive from BRD signals:
✓ Express mode — auto-deriving architecture from BRD

If a critical constraint is completely missing (e.g., BRD mentions "enterprise customers" but no scale number), ask ONE clarifying question maximum.
Round 1 — Scale & Users:
notify_user with markdown options:
"question": "I need to understand your scale to design the right architecture.\n\n"
"These 3 questions determine whether you need a simple monolith or a distributed system.",
"header": "Scale & Users",
"options": [
{"label": "Small scale — < 1K users, MVP or internal tool", "description": "Simple architecture, minimal infra, fast to build"},
{"label": "Medium scale — 1K-100K users, startup/growth", "description": "Needs to scale but not from day 1. Service extraction plan."},
{"label": "Large scale — 100K+ users, high availability", "description": "Distributed architecture, multi-region, serious infrastructure"},
{"label": "Not sure — help me estimate", "description": "I'll ask a few questions to figure this out"},
{"label": "Chat about this", "description": "Free-form input"}
],
"multiSelect": false
}])
Follow up with:
notify_user with markdown options:
"question": "What's the primary data pattern?",
"header": "Data Characteristics",
"options": [
{"label": "Read-heavy — dashboards, content, catalogs", "description": "Cache-first, read replicas, CDN"},
{"label": "Write-heavy — logging, IoT, transactions", "description": "Queue-based, event sourcing, eventual consistency"},
{"label": "Balanced — typical CRUD SaaS", "description": "Standard request/response, relational DB"},
{"label": "Real-time — chat, collaboration, live updates", "description": "WebSocket/SSE, pub/sub, in-memory state"},
{"label": "Chat about this", "description": "Free-form input"}
],
"multiSelect": false
}])
Round 2 — Constraints:
notify_user with markdown options:
"question": "Who will build and maintain this system?",
"header": "Team & Budget",
"options": [
{"label": "Solo or pair — keep it simple", "description": "Monolith, managed services, minimal ops"},
{"label": "Small team (3-5) — some specialization", "description": "Can handle moderate complexity"},
{"label": "Medium team (6-15) — dedicated roles", "description": "Can support microservices if needed"},
{"label": "Large team (15+) — multiple squads", "description": "Service ownership model, independent deploys"},
{"label": "Chat about this", "description": "Free-form input"}
],
"multiSelect": false
}])
notify_user with markdown options:
"question": "Any hard constraints?",
"header": "Compliance & Deployment",
"options": [
{"label": "No special requirements", "description": "Standard web app, no regulatory burden"},
{"label": "GDPR — EU user data", "description": "Data residency, right to deletion, consent management"},
{"label": "SOC2 / ISO 27001 — enterprise customers", "description": "Audit trails, access controls, security policies"},
{"label": "HIPAA — health data", "description": "BAA required, encryption everywhere, dedicated tenancy"},
{"label": "PCI DSS — payment data", "description": "Tokenization, network segmentation, quarterly scans"},
{"label": "Multiple / Other (specify)", "description": "Select to describe your requirements"},
{"label": "Chat about this", "description": "Free-form input"}
],
"multiSelect": false
}])
Everything in Standard, PLUS two additional rounds:
Round 3 — Technical Requirements:
notify_user with markdown options:
"question": "Let's get precise about performance and availability requirements.",
"header": "Performance & Availability",
"options": [
{"label": "Standard SaaS — 99.9% uptime, < 500ms API response", "description": "8.7 hours downtime/year. Typical for most web apps."},
{"label": "High availability — 99.99% uptime, < 200ms response", "description": "52 minutes downtime/year. Requires multi-AZ, automated failover."},
{"label": "Mission critical — 99.999% uptime, < 100ms response", "description": "5 minutes downtime/year. Requires multi-region, chaos engineering."},
{"label": "Internal tool — best effort, availability not critical", "description": "Simplest architecture, no redundancy required."},
{"label": "Chat about this", "description": "Free-form input"}
],
"multiSelect": false
}])
notify_user with markdown options:
"question": "Where are your users?",
"header": "Geographic Distribution",
"options": [
{"label": "Single country", "description": "One region deployment, simplest"},
{"label": "Single continent", "description": "One region with CDN for static assets"},
{"label": "Global — users everywhere", "description": "Multi-region, edge CDN, data replication strategy"},
{"label": "Not sure yet", "description": "I'll design for single-region with a multi-region migration path"},
{"label": "Chat about this", "description": "Free-form input"}
],
"multiSelect": false
}])
notify_user with markdown options:
"question": "Expected peak concurrent users (CCU)?",
"header": "Peak Load",
"options": [
{"label": "< 100 CCU", "description": "Single instance can handle this"},
{"label": "100-1K CCU", "description": "Horizontal scaling, load balancer needed"},
{"label": "1K-10K CCU", "description": "Auto-scaling, connection pooling, caching layer"},
{"label": "10K+ CCU", "description": "Distributed architecture, queue-buffered writes, edge computing"},
{"label": "Help me estimate", "description": "Typically 5-10% of total users are concurrent at peak"},
{"label": "Chat about this", "description": "Free-form input"}
],
"multiSelect": false
}])
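When the user picks "Help me estimate", the 5-10% rule of thumb from that option can be applied directly. The following is a rough sketch: the function names are hypothetical, and because the option ranges above share their endpoints, treating each bucket as lower-inclusive is an assumption.

```python
from typing import Tuple


def estimate_peak_ccu(total_users: int, low: float = 0.05, high: float = 0.10) -> Tuple[int, int]:
    """Estimate peak concurrent users as 5-10% of total users (rule of thumb)."""
    if total_users < 0:
        raise ValueError("total_users must be non-negative")
    return round(total_users * low), round(total_users * high)


def ccu_bucket(ccu: int) -> str:
    """Map a CCU figure to one of the Peak Load options (lower bound inclusive)."""
    if ccu < 100:
        return "< 100 CCU"
    if ccu < 1_000:
        return "100-1K CCU"
    if ccu < 10_000:
        return "1K-10K CCU"
    return "10K+ CCU"


low_estimate, high_estimate = estimate_peak_ccu(20_000)
```

For 20,000 total users this gives a range of 1,000 to 2,000 concurrent users, and both ends fall in the "1K-10K CCU" option.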
Round 4 — Strategic:
notify_user with markdown options:
"question": "How do you see this system evolving?",
"header": "Growth & Extensibility",
"options": [
{"label": "Steady linear growth", "description": "Predictable scaling, plan for 10x over 2 years"},
{"label": "Hockey stick — potential viral growth", "description": "Must handle 100x spikes, auto-scaling critical"},
{"label": "Seasonal — predictable traffic spikes", "description": "Scale-to-zero between peaks, burst capacity"},
{"label": "Platform play — third parties will build on this", "description": "Public API, webhooks, rate limiting, developer portal"},
{"label": "Chat about this", "description": "Free-form input"}
],
"multiSelect": false
}])
notify_user with markdown options:
"question": "Monthly infrastructure budget ceiling?",
"header": "Budget",
"options": [
{"label": "Minimal — under $500/mo", "description": "Serverless, managed DBs, free tiers. Optimize for cost."},
{"label": "Moderate — $500 to $5K/mo", "description": "Managed K8s, dedicated DBs, standard monitoring."},
{"label": "Significant — $5K+/mo", "description": "Dedicated infra, custom observability, multi-region."},
{"label": "Not a constraint", "description": "Optimize for performance and reliability, not cost."},
{"label": "Chat about this", "description": "Free-form input"}
],
"multiSelect": false
}])
notify_user with markdown options:
"question": "Cloud strategy?",
"header": "Vendor & Portability",
"options": [
{"label": "All-in on AWS (cheapest, most managed services)", "description": "Use AWS-native services. Fast to build, harder to migrate."},
{"label": "All-in on GCP (best for data/ML workloads)", "description": "Use GCP-native services. Strong managed K8s."},
{"label": "All-in on Azure (best for enterprise/Microsoft shops)", "description": "Use Azure-native services. AD integration."},
{"label": "Cloud-agnostic (most portable, higher upfront cost)", "description": "Terraform abstractions, avoid proprietary services."},
{"label": "Not sure — recommend based on my project", "description": "I'll recommend based on your requirements"},
{"label": "Chat about this", "description": "Free-form input"}
],
"multiSelect": false
}])
Everything in Thorough, PLUS:
After gathering inputs, DERIVE the architecture from constraints. The architecture is a FUNCTION of the inputs — not a template.
Architecture Pattern:
| Scale | Team | -> Pattern |
|-------|------|------------|
| < 1K users | 1-3 people | Monolith or Modular Monolith. Single deploy, single DB. Docker Compose for local dev. |
| 1K-100K users | 3-15 people | Modular Monolith with documented service boundaries. Extract services ONLY when team or scale demands. Include service extraction plan in ADR. |
| 100K+ users | 15+ people | Microservices. Service mesh, distributed data, event-driven communication. Each team owns 1-3 services. |
| Any scale | Solo developer | Whatever is simplest. Serverless or monolith. Managed everything. Minimize operational burden. |
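The pattern table can be read as a function of scale and team size. The sketch below is one possible reading, not the skill's actual logic: `derive_pattern` is a hypothetical name, the team boundaries (up to 3, up to 15, above 15) resolve the overlapping ranges by assumption, and for combinations the table does not list (for example, large scale with a small team) it picks the simpler of the two candidate patterns, which is also an assumption.

```python
PATTERNS = [
    "Monolith or Modular Monolith",
    "Modular Monolith with documented service boundaries",
    "Microservices",
]
SOLO_PATTERN = "Simplest option: serverless or monolith, managed everything"


def derive_pattern(users: int, team_size: int) -> str:
    """Pick an architecture pattern from user count and team size."""
    if users < 0 or team_size < 1:
        raise ValueError("users must be >= 0 and team_size must be >= 1")
    if team_size == 1:
        return SOLO_PATTERN  # the solo-developer row applies at any scale
    scale_rank = 0 if users < 1_000 else 1 if users < 100_000 else 2
    team_rank = 0 if team_size <= 3 else 1 if team_size <= 15 else 2
    # Assumption: when scale and team disagree, the smaller rank (simpler pattern) wins.
    return PATTERNS[min(scale_rank, team_rank)]
```

Under these assumptions, a two-person team never lands on microservices regardless of user count, and a 40-person team serving a few hundred users still gets a monolith.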
Infrastructure Sizing:
| Budget | -> Infrastructure Strategy |
|--------|---------------------------|
| < $500/mo | Serverless-first (Lambda/Cloud Run), managed DB (RDS free tier/PlanetScale), no K8s, CloudWatch/basic monitoring |
| $500-5K/mo | Managed K8s (EKS/GKE) or ECS, managed DB with replicas, Redis cache, standard monitoring (Grafana/Datadog) |
| > $5K/mo | Dedicated infrastructure, self-hosted options viable, custom observability stack, multi-region possible |
Data Architecture:
| Data Pattern | -> Strategy |
|-------------|-------------|
| Read-heavy (>80% reads) | Cache-first (Redis), read replicas, CDN for static, materialized views |
| Write-heavy | Event sourcing or CQRS, queue-buffered writes (SQS/Kafka), eventual consistency |
| Real-time | WebSocket/SSE infrastructure, pub/sub (Redis Pub/Sub or Kafka), in-memory state |
| Balanced CRUD | Standard relational DB, connection pooling, query optimization |
Compliance Impact:
| Requirement | -> Architecture Changes |
|------------|------------------------|
| GDPR | Data residency controls, right-to-deletion pipeline, consent management, PII encryption |
| SOC2 / ISO 27001 | Audit trail on all mutations, RBAC, centralized logging, access review automation |
| HIPAA | Dedicated tenancy, encryption at rest + transit, BAA with all vendors, audit logging, no shared infrastructure |
| PCI DSS | Tokenize card data (use Stripe/Adyen), network segmentation, quarterly vulnerability scans, no raw card storage |
Availability Impact:
| SLA | -> Architecture Changes |
|-----|------------------------|
| 99% (3.7 days/yr) | Single instance OK, basic health checks |
| 99.9% (8.7 hrs/yr) | Multi-AZ, load balancer, automated restarts, basic monitoring |
| 99.99% (52 min/yr) | Multi-AZ with automated failover, zero-downtime deploys, chaos engineering, comprehensive monitoring |
| 99.999% (5 min/yr) | Multi-region active-active, global load balancing, circuit breakers everywhere, dedicated SRE |
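The downtime figures in the SLA column follow from simple arithmetic on a 365-day year. A small sketch (the function name is hypothetical) that reproduces them:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_minutes_per_year(sla_percent: float) -> float:
    """Allowed downtime per year, in minutes, for a given availability target."""
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    return MINUTES_PER_YEAR * (1 - sla_percent / 100)


for sla in (99.0, 99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(sla)
    print(f"{sla}% -> {minutes:.2f} min/yr ({minutes / 60:.2f} h, {minutes / 1440:.2f} days)")
```

The results are roughly 3.65 days for 99%, 8.76 hours for 99.9%, 52.6 minutes for 99.99%, and 5.3 minutes for 99.999%, which the table rounds for readability.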
Growth Model Impact:
| Growth | -> Architecture Changes |
|--------|------------------------|
| Linear/steady | Plan for 10x. Vertical scaling first, horizontal when needed. |
| Hockey stick | Horizontal scaling from day 1. Stateless services. Auto-scaling groups. Queue-buffered writes. Feature flags for load shedding. |
| Seasonal | Scale-to-zero capable (serverless/spot instances). Pre-warming automation. Burst capacity planning. |
| Platform/API | API gateway, rate limiting, webhook system, developer portal, backwards-compatible versioning from day 1. |
Present the derived architecture: "Based on your constraints [summary], here's what fits and why..."
For Thorough/Meticulous modes, also present 1-2 alternative architectures:
Each alternative includes a trade-off summary: build time, operational complexity, monthly cost estimate, scaling ceiling, team fit.
Generate architecture documents in docs/architecture/ (or paths.architecture_docs from config):
One ADR per major decision using this template:
# ADR-NNN: [Title]
**Status:** Accepted | Superseded | Deprecated
**Context:** Why this decision is needed
**Decision:** What we chose and why
**Consequences:** Trade-offs accepted
**Alternatives Considered:** What we rejected and why
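Filling the template programmatically might look like the sketch below. The `render_adr` and `adr_filename` helpers are hypothetical; the file naming follows the `ADR-001-architecture-pattern.md` example in the output structure, and the slug rule (lowercase, non-alphanumerics collapsed to hyphens) is an assumption.

```python
import re

ADR_TEMPLATE = (
    "# ADR-{number:03d}: {title}\n"
    "**Status:** {status}\n"
    "**Context:** {context}\n"
    "**Decision:** {decision}\n"
    "**Consequences:** {consequences}\n"
    "**Alternatives Considered:** {alternatives}\n"
)

VALID_STATUSES = {"Accepted", "Superseded", "Deprecated"}


def adr_filename(number: int, title: str) -> str:
    """Build a file name such as ADR-001-architecture-pattern.md."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"ADR-{number:03d}-{slug}.md"


def render_adr(number: int, title: str, status: str, context: str,
               decision: str, consequences: str, alternatives: str) -> str:
    """Render one ADR from the template, rejecting statuses the template does not list."""
    if status not in VALID_STATUSES:
        raise ValueError(f"status must be one of {sorted(VALID_STATUSES)}")
    return ADR_TEMPLATE.format(
        number=number, title=title, status=status, context=context,
        decision=decision, consequences=consequences, alternatives=alternatives,
    )


adr_text = render_adr(
    1, "Architecture Pattern", "Accepted",
    context="Two developers, under 1K users expected in year one.",
    decision="Modular monolith with a single database.",
    consequences="Service extraction is deferred until team or scale demands it.",
    alternatives="Microservices, rejected as too much operational overhead for the team.",
)
```

Writing `adr_text` to the configured ADR directory under `adr_filename(1, "Architecture Pattern")` would produce the first required record.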
Required ADRs:
Create Mermaid diagrams in markdown files:
Apply and document these production patterns:
Present architecture to user via notify_user for approval before proceeding.
Generate docs/architecture/tech-stack.md (or paths.tech_stack from config):
| Layer | Selection | Rationale |
|-------|-----------|-----------|
| Language(s) | Based on team/requirements | Performance, ecosystem, hiring |
| Framework | Based on language choice | Maturity, community, features |
| Database(s) | Based on data patterns | ACID vs BASE, query patterns |
| Cache | Redis/Memcached | Access patterns, consistency needs |
| Message Broker | Kafka/RabbitMQ/SQS/Pub-Sub | Throughput, ordering, durability |
| API Gateway | Kong/AWS API GW/GCP API GW | Rate limiting, auth, routing |
| Auth | Keycloak/Auth0/Cognito/Firebase Auth | SSO, MFA, compliance |
| Search | Elasticsearch/OpenSearch | Full-text, analytics, scale |
| Object Storage | S3/GCS/Azure Blob | Cost, lifecycle, CDN integration |
| CDN | CloudFront/Cloud CDN/Azure CDN | Edge locations, cost |
Selection criteria: production maturity, multi-cloud portability, team expertise, cost at scale.
Generate API contracts at api/ (or paths.api_contracts from config) at the project root:
Standards enforced:
- Error response shape ({code, message, details, trace_id})
- Pagination (cursor-based for production, offset only for admin)
- Rate-limit headers (X-RateLimit-*)
- Request ID header (X-Request-ID)

Generate data models at schemas/ at the project root:

- ERD (paths.erd from config, default schemas/erd.md)
- Migrations (paths.migrations from config, default schemas/migrations/)

Standards enforced:

- deleted_at timestamps

Scaffold the project root structure directly. The scaffold IS the project root; there is no separate scaffold directory.
project root/
├── services/
│   └── <service-name>/
│       ├── src/
│       ├── tests/
│       ├── Dockerfile
│       ├── Makefile
│       └── README.md
├── libs/
│   └── shared/                # Shared types, utils, clients
├── docker-compose.yml         # Local dev environment
├── Makefile                   # Root-level commands
└── README.md                  # Getting started guide
Each service includes:
- Health check endpoints (/healthz, /readyz)

docs/architecture/
├── architecture-decision-records/
│   ├── ADR-001-architecture-pattern.md
│   └── ...
├── system-diagrams/
│   ├── c4-context.md
│   ├── c4-container.md
│   └── sequence-*.md
├── tech-stack.md
└── design-principles.md
api/
├── openapi/
│   └── *.yaml
├── grpc/
│   └── *.proto
└── asyncapi/
    └── *.yaml
schemas/
├── erd.md
├── migrations/
│   └── *.sql
└── data-flow.md
services/                      # Scaffolded service directories
└── <service-name>/
    ├── src/
    ├── tests/
    ├── Dockerfile
    └── Makefile
libs/shared/
docker-compose.yml
Makefile
README.md

Workspace artifacts (.forgewright/solution-architect/):

.forgewright/solution-architect/
├── working-notes.md
└── analysis/
    └── *.md
| Mistake | Fix |
|---------|-----|
| Picking architecture before knowing constraints | Run the fitness function FIRST. Scale, team, budget determine the pattern. |
| Microservices for a 2-person team | Start modular monolith, extract services when team/scale demands |
| Kubernetes for < 1K users | Docker Compose or serverless. K8s operational cost > benefit at small scale. |
| Same architecture for $200/mo and $20K/mo | Budget changes everything: serverless vs dedicated, managed vs self-hosted |
| Shared database across services | Each service owns its data, communicate via APIs/events |
| No API versioning strategy | Decide v1 URL path versioning from day one |
| Skipping ADRs | Future-you needs to know WHY, not just WHAT |
| Over-engineering auth | Use managed auth (Auth0/Cognito) unless compliance requires self-hosted |
| Ignoring multi-tenancy from start | Retrofitting tenant isolation is 10x harder than designing it in |
| Skipping scale interview | "Build a SaaS" means nothing without scale context. 100 users vs 10M users is a completely different system. |
| Ignoring engagement mode | Express: auto-derive. Standard: 2 rounds. Thorough: 4 rounds. Meticulous: full walkthrough. Read settings.md. |
| Designing for 10M users when there are 100 | Design for current + 10x. Not 1000x. Over-engineering kills velocity. |
| Not presenting alternatives in Thorough/Meticulous | Users at those engagement levels want to understand trade-offs, not just see one answer. |