skills/forgewright/skills/devops/SKILL.md
[production-grade internal] Sets up deployment and infrastructure — Docker, CI/CD pipelines, cloud provisioning, environment configuration. Routed via the production-grade orchestrator.
Install this skill globally with one command: npx skillsauth add ouakar/ubinarys-dental devops
Works with Claude Code, Cursor, and Windsurf.
!cat skills/_shared/protocols/ux-protocol.md 2>/dev/null || true
!cat skills/_shared/protocols/input-validation.md 2>/dev/null || true
!cat skills/_shared/protocols/tool-efficiency.md 2>/dev/null || true
!cat .production-grade.yaml 2>/dev/null || echo "No config — using defaults"
!cat .forgewright/codebase-context.md 2>/dev/null || true
Fallback (if protocols not loaded): Use notify_user with options (never open-ended), "Chat about this" last, recommended first. Work continuously. Print progress constantly. Validate inputs before starting — classify missing as Critical (stop), Degraded (warn, continue partial), or Optional (skip silently). Use parallel tool calls for independent reads. Use view_file_outline before full Read.
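The Critical / Degraded / Optional classification in the fallback can be sketched as below. This is a minimal illustration, not the protocol's actual implementation: the input names in EXPECTED_INPUTS and the severity assigned to each are hypothetical.

```python
from enum import Enum


class Severity(Enum):
    CRITICAL = "critical"  # stop
    DEGRADED = "degraded"  # warn, continue with partial output
    OPTIONAL = "optional"  # skip silently


# Hypothetical inputs and severities, for illustration only.
EXPECTED_INPUTS = {
    "architecture_doc": Severity.CRITICAL,
    "ci_provider": Severity.DEGRADED,
    "codebase_context": Severity.OPTIONAL,
}


def validate_inputs(provided):
    """Return (can_proceed, messages) for a set of provided input names."""
    warnings = []
    for name, severity in EXPECTED_INPUTS.items():
        if name in provided:
            continue
        if severity is Severity.CRITICAL:
            # A missing critical input stops the run immediately.
            return False, [f"missing critical input: {name}"]
        if severity is Severity.DEGRADED:
            warnings.append(f"missing {name}; continuing with partial output")
        # Missing OPTIONAL inputs are skipped silently.
    return True, warnings
```

Calling validate_inputs before Phase 1 keeps the stop/warn/skip decision in one place instead of scattering it across phases.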
!cat .forgewright/settings.md 2>/dev/null || echo "No settings — using Standard"
| Mode | Behavior |
|------|----------|
| Express | NON-TECHNICAL USER (Autonomous): Zero-config. Default to Vercel (Frontend) and Railway (Backend/DB) for instant PaaS deployment. Auto-generate vercel.json or railway.toml. DO NOT ask for infra choices. |
| Standard | Surface 1-2 critical decisions — container registry choice, CI provider (if not specified in architecture), monitoring stack. |
| Thorough | Surface all major decisions. Show Dockerfile strategy, CI pipeline design, monitoring architecture before implementing. Ask about deployment strategy (blue-green, canary, rolling). |
| Meticulous | Surface every decision. Walk through each Terraform module. Review CI pipeline stages. User approves monitoring alert thresholds. |
If .forgewright/codebase-context.md exists and mode is brownfield:
Full DevOps pipeline generator: from infrastructure design to production-ready deployment with monitoring and security. Generates infrastructure and deployment artifacts at the project root (infrastructure/, .github/workflows/, Dockerfiles) with planning notes in .forgewright/devops/.
Zero-Touch Deployments (Non-Tech Mode): If running for a non-technical user (Express Mode), bypass heavy infrastructure (Terraform/K8s) immediately. Generate direct Vercel/Railway configurations and GitHub Actions auto-deploy workflows. Let the PaaS handle the heavy lifting.
Read .production-grade.yaml at startup. Use these overrides if defined:
- paths.terraform (default: infrastructure/terraform/)
- paths.kubernetes (default: infrastructure/kubernetes/)
- paths.ci_cd (default: .github/workflows/)
- paths.monitoring (default: infrastructure/monitoring/)

After Phase 1 (Assessment), Phases 2-4 and Phases 5-6 run as two groups, with Group 2 starting once Group 1 has finished:
Group 1 (infrastructure artifacts — independent):
Execute sequentially: Generate Terraform IaC following Phase 2. Write to infrastructure/terraform/.
Execute sequentially: Generate CI/CD pipelines following Phase 3. Write to .github/workflows/ and scripts/.
Execute sequentially: Generate container orchestration following Phase 4. Write Dockerfiles and K8s manifests.
Group 2 (after Group 1 — needs infrastructure context):
Execute sequentially: Generate monitoring + observability following Phase 5. Write to infrastructure/monitoring/.
Execute sequentially: Generate security infrastructure following Phase 6. Write to infrastructure/security/.
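The path overrides and the two groups above can be sketched together as follows. This is a rough sketch under stated assumptions: the config is taken to be already parsed into a dictionary (for example by a YAML loader), and resolve_paths and plan_groups are hypothetical helper names, not part of the skill.

```python
# Defaults as listed for the paths.* overrides.
DEFAULT_PATHS = {
    "terraform": "infrastructure/terraform/",
    "kubernetes": "infrastructure/kubernetes/",
    "ci_cd": ".github/workflows/",
    "monitoring": "infrastructure/monitoring/",
}


def resolve_paths(config):
    """Merge the optional `paths` section of a parsed config over the defaults."""
    overrides = (config or {}).get("paths") or {}
    return {key: overrides.get(key, default) for key, default in DEFAULT_PATHS.items()}


def plan_groups(config):
    """Return the two groups as lists of (phase, primary output directory)."""
    paths = resolve_paths(config)
    group_1 = [
        ("Phase 2: IaC", paths["terraform"]),
        ("Phase 3: CI/CD", paths["ci_cd"]),  # also writes to scripts/
        ("Phase 4: Containers", paths["kubernetes"]),  # plus per-service Dockerfiles
    ]
    group_2 = [
        ("Phase 5: Monitoring", paths["monitoring"]),
        ("Phase 6: Security", "infrastructure/security/"),  # no override listed
    ]
    return [group_1, group_2]
```

For example, plan_groups({"paths": {"terraform": "infra/tf/"}}) redirects only the Terraform output and leaves the other defaults untouched.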
Execution order:
digraph devops {
rankdir=TB;
"Triggered" [shape=doublecircle];
"Phase 1: Assessment" [shape=box];
"Phase 2: IaC" [shape=box];
"Phase 3: CI/CD" [shape=box];
"Phase 4: Containers" [shape=box];
"Phase 5: Monitoring" [shape=box];
"Phase 6: Security" [shape=box];
"User Review" [shape=diamond];
"Suite Complete" [shape=doublecircle];
"Triggered" -> "Phase 1: Assessment";
"Phase 1: Assessment" -> "Phase 2: IaC";
"Phase 2: IaC" -> "User Review";
"User Review" -> "Phase 2: IaC" [label="revise"];
"User Review" -> "Phase 3: CI/CD" [label="approved"];
"Phase 3: CI/CD" -> "Phase 4: Containers";
"Phase 4: Containers" -> "Phase 5: Monitoring";
"Phase 5: Monitoring" -> "Phase 6: Security";
"Phase 6: Security" -> "Suite Complete";
}
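The graph above is a linear sequence with one review gate after Phase 2. A minimal sketch of that control flow, assuming hypothetical run_phase and review callables supplied by the caller (review is expected to return "approved" or "revise"):

```python
PHASES = [
    "Phase 1: Assessment",
    "Phase 2: IaC",
    "Phase 3: CI/CD",
    "Phase 4: Containers",
    "Phase 5: Monitoring",
    "Phase 6: Security",
]
REVIEW_AFTER = "Phase 2: IaC"


def run_suite(run_phase, review):
    """Run phases in graph order; repeat Phase 2 until review() returns 'approved'."""
    history = []
    for phase in PHASES:
        while True:
            run_phase(phase)
            history.append(phase)
            # Only the phase followed by "User Review" can loop back.
            if phase != REVIEW_AFTER or review() == "approved":
                break
    return history
```

The returned history records every phase execution, including repeated IaC passes, which makes the revise loop visible in logs.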
Use notify_user to gather (batch into 2-3 calls max):
Generate infrastructure/terraform/ (or paths.terraform from config):
terraform/
├── modules/
│ ├── networking/ # VPC, subnets, security groups, NAT
│ ├── compute/ # ECS/EKS/GKE/AKS clusters
│ ├── database/ # RDS/Cloud SQL/Azure SQL, Redis
│ ├── messaging/ # SQS/Pub-Sub/Service Bus
│ ├── storage/ # S3/GCS/Blob, CDN
│ ├── monitoring/ # CloudWatch/Cloud Monitoring/Azure Monitor
│ ├── security/ # IAM, KMS, WAF, secrets
│ └── dns/ # Route53/Cloud DNS/Azure DNS
├── environments/
│ ├── dev/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ ├── terraform.tfvars
│ │ └── backend.tf
│ ├── staging/
│ └── prod/
├── global/ # Shared resources (IAM, DNS zones)
└── README.md
- validation blocks on all input variables
- Resource tags: environment, service, team, cost-center, managed-by=terraform

Generate provider blocks and modules for each target cloud:
| Resource | AWS | GCP | Azure |
|----------|-----|-----|-------|
| Compute | ECS Fargate / EKS | Cloud Run / GKE | Container Apps / AKS |
| Database | RDS Aurora | Cloud SQL | Azure SQL |
| Cache | ElastiCache Redis | Memorystore | Azure Cache Redis |
| Queue | SQS + SNS | Pub/Sub | Service Bus |
| Storage | S3 + CloudFront | GCS + Cloud CDN | Blob + Front Door |
| Secrets | Secrets Manager | Secret Manager | Key Vault |
| DNS | Route 53 | Cloud DNS | Azure DNS |
| WAF | AWS WAF | Cloud Armor | Azure WAF |
Present IaC design to user for approval before proceeding.
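Before presenting the design, the keys listed in this phase (environment, service, team, cost-center, managed-by=terraform) can be checked mechanically. The sketch below assumes they are resource tags and that each resource's tags are available as a plain dictionary; tag_problems is a hypothetical helper, not something the skill defines.

```python
REQUIRED_TAGS = ("environment", "service", "team", "cost-center", "managed-by")


def tag_problems(tags):
    """Return a list of problems with one resource's tag dictionary."""
    problems = [f"missing tag: {key}" for key in REQUIRED_TAGS if not tags.get(key)]
    # managed-by is expected to carry the fixed value "terraform".
    if tags.get("managed-by") not in (None, "", "terraform"):
        problems.append("managed-by must be 'terraform'")
    return problems
```

An empty list means the resource carries every required tag.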
Generate CI/CD pipelines at .github/workflows/ (or paths.ci_cd from config) and scripts/:
.github/workflows/
├── ci.yml # Build, test, lint, security scan
├── cd-staging.yml # Deploy to staging on merge to main
├── cd-production.yml # Deploy to prod on release tag
├── pr-checks.yml # PR validation (tests, lint, preview)
└── scheduled.yml # Nightly builds, dependency updates
.gitlab-ci.yml # (if requested, at project root)
scripts/
├── build.sh
├── deploy.sh
├── rollback.sh
└── smoke-test.sh
Generate configs for the selected strategy:
Generate git workflow configuration and documentation to docs/contributing/ and .github/:
Choose based on team size and release cadence:
| Strategy | Best For | How It Works |
|----------|----------|-------------|
| Trunk-Based (Recommended) | Teams with CI/CD, continuous delivery | Short-lived feature branches (< 1 day), merge to main, deploy from main |
| GitHub Flow | Small teams, simple releases | Feature branches from main, PR review, merge to main, auto-deploy |
| Gitflow | Scheduled releases, multiple version support | develop → release/* → main, hotfix branches, version tags |
Generate .github/branch-protection.md and recommend settings:
- main: Require PR review (1+ approvals), require CI pass, require up-to-date branch, no force push, no deletion
- develop (if Gitflow): Require CI pass, allow merge only via PR
- release/*: Require 2+ approvals, require all CI stages (including performance tests)

Generate .github/workflows/commit-lint.yml:
# Enforce Conventional Commits format: type(scope): description
# Types: feat, fix, docs, chore, refactor, test, perf, ci, build, style
# Example: feat(auth): add OAuth2 login flow
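The format in these comments can be checked with a single regular expression. This is a minimal sketch rather than the workflow's actual linter; it assumes the scope is optional and allows the "!" breaking-change marker from the Conventional Commits specification, neither of which is stated above.

```python
import re

TYPES = ("feat", "fix", "docs", "chore", "refactor", "test", "perf", "ci", "build", "style")

# type(scope): description
# Assumptions: scope is optional, and "!" may follow the type or scope.
COMMIT_RE = re.compile(
    r"^(?P<type>" + "|".join(TYPES) + r")"
    r"(\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>\S.*)$"
)


def is_conventional(subject):
    """Return True if a commit subject line matches the expected format."""
    return COMMIT_RE.match(subject) is not None
```

Only the subject line is checked; body and footer rules are out of scope for this sketch.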
- Release tags in the form vMAJOR.MINOR.PATCH
- scripts/release.sh for manual release process

Generate container artifacts at project root and infrastructure/:
services/<service-name>/
└── Dockerfile # Per-service, multi-stage (co-located with service code)
docker-compose.yml # Local development (project root)
docker-compose.test.yml # Integration test environment (project root)
.dockerignore # (project root)
Dockerfile standards:
- Run as a non-root user (USER appuser)
- Define a health check (HEALTHCHECK)
- .dockerignore excluding .git, node_modules, __pycache__, etc.

Generate Kubernetes manifests at infrastructure/kubernetes/ (or paths.kubernetes from config):
infrastructure/kubernetes/
├── base/
│ ├── namespace.yaml
│ ├── deployment.yaml
│ ├── service.yaml
│ ├── ingress.yaml
│ ├── hpa.yaml
│ ├── pdb.yaml
│ └── networkpolicy.yaml
├── overlays/
│ ├── dev/
│ ├── staging/
│ └── prod/
└── kustomization.yaml
infrastructure/helm/ # (if requested)
└── <service>/
  ├── Chart.yaml
  ├── values.yaml
  ├── values-prod.yaml
  └── templates/
K8s standards:
- PodDisruptionBudget with minAvailable: 1 minimum

Generate infrastructure/monitoring/ (or paths.monitoring from config):
monitoring/
├── prometheus/
│ ├── prometheus.yml
│ ├── alerts/
│ │ ├── availability.yml
│ │ ├── latency.yml
│ │ ├── saturation.yml
│ │ └── errors.yml
│ └── recording-rules.yml
├── grafana/
│ ├── dashboards/
│ │ ├── overview.json
│ │ ├── per-service.json
│ │ ├── infrastructure.json
│ │ └── business-metrics.json
│ └── datasources.yml
├── logging/
│ ├── fluentbit.conf # Log collection and forwarding
│ └── log-format.md # Structured logging standard
├── tracing/
│ └── otel-collector.yaml # OpenTelemetry Collector config
└── alerting/
  ├── pagerduty.yml
  ├── slack.yml
  └── escalation-policy.md
Note: SLO thresholds (SLI/SLO/SLA definitions) are defined by SRE (see sre skill output). DevOps provides the monitoring infrastructure; SRE defines the service level objectives.
Note: Operational runbooks are written by SRE. See SRE output at docs/runbooks/. DevOps ensures alerting configs link to the appropriate runbook paths.
- Structured log fields: timestamp, level, service, trace_id, message
- Alerting configs link to runbook paths (docs/runbooks/)

Generate infrastructure/security/:
security/
├── scanning/
│ ├── sast-config.yml # Semgrep/CodeQL rules
│ ├── dependency-scan.yml # Snyk/Trivy config
│ ├── container-scan.yml # Image vulnerability scanning
│ └── iac-scan.yml # tfsec/checkov config
├── secrets/
│ ├── secrets-policy.md # Secrets management standard
│ └── external-secrets.yaml # External Secrets Operator config
├── network/
│ ├── waf-rules.tf # WAF rule sets
│ ├── security-groups.tf # Network access control
│ └── tls-config.md # TLS 1.3 minimum, cert management
├── iam/
│ ├── service-roles.tf # Per-service IAM roles
│ ├── ci-cd-roles.tf # Pipeline execution roles
│ └── break-glass.md # Emergency access procedures
├── compliance/
│ ├── checklist.md # SOC2/HIPAA/GDPR checklist
│ └── data-classification.md # PII/PHI data handling
└── incident-response/
  ├── playbook.md # Incident response process
  └── post-mortem-template.md # Blameless post-mortem format
infrastructure/
├── terraform/
│ ├── modules/
│ │ ├── networking/
│ │ ├── compute/
│ │ ├── database/
│ │ ├── messaging/
│ │ ├── storage/
│ │ ├── monitoring/
│ │ ├── security/
│ │ └── dns/
│ ├── environments/
│ │ ├── dev/
│ │ ├── staging/
│ │ └── prod/
│ └── global/
├── kubernetes/
│ ├── base/
│ └── overlays/
├── helm/ # (optional)
├── monitoring/
│ ├── prometheus/
│ ├── grafana/
│ ├── logging/
│ ├── tracing/
│ └── alerting/
└── security/
  ├── scanning/
  ├── secrets/
  ├── network/
  ├── iam/
  ├── compliance/
  └── incident-response/
.github/workflows/
├── ci.yml
├── cd-staging.yml
├── cd-production.yml
├── pr-checks.yml
└── scheduled.yml
scripts/
├── build.sh
├── deploy.sh
├── rollback.sh
└── smoke-test.sh
services/<service-name>/
└── Dockerfile # Per-service Dockerfiles co-located with service code
docker-compose.yml # Project root
docker-compose.test.yml # Project root
.forgewright/devops/
├── deployment-plan.md # Deployment planning notes
├── infrastructure-assessment.md # Infrastructure assessment documents
└── decisions.md # DevOps decision log
| Mistake | Fix |
|---------|-----|
| Same Terraform state for all envs | Separate state per environment, shared modules |
| Secrets in environment variables | Use cloud Secrets Manager + External Secrets Operator |
| No rollback strategy | Blue-green or canary with automated rollback triggers |
| Monitoring without alerting | Every dashboard metric needs an alert threshold and runbook link |
| Over-permissive IAM | Start with zero permissions, add as needed, review quarterly |
| Skipping staging | Staging must mirror prod topology, use same IaC modules |
| Docker images as root | Always USER nonroot, read-only filesystem where possible |
| Alert fatigue | SLO-based alerting (SLOs from SRE), aggregate similar alerts, escalation tiers |
| Generating SLO definitions | SLOs are the SRE's responsibility — DevOps provides monitoring infra only |
| Writing operational runbooks | Runbooks belong to SRE at docs/runbooks/ — DevOps links alerts to runbook paths |
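The "Docker images as root" row, together with the HEALTHCHECK expectation from the Dockerfile standards, lends itself to an automated check. The sketch below is a simplified line-based scan, not a full Dockerfile parser: it ignores line continuations and only looks at explicit instructions in the final build stage. dockerfile_problems is a hypothetical helper name.

```python
def dockerfile_problems(text):
    """Return a list of problems found in the final stage of a Dockerfile."""
    user = None
    has_healthcheck = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        instruction = parts[0].upper()
        argument = parts[1] if len(parts) > 1 else ""
        if instruction == "FROM":
            # A new stage starts; only the final stage is reported on.
            user, has_healthcheck = None, False
        elif instruction == "USER":
            # USER may be given as user:group; keep the user part.
            user = argument.split(":", 1)[0].strip()
        elif instruction == "HEALTHCHECK":
            # "HEALTHCHECK NONE" disables health checking.
            has_healthcheck = not argument.upper().startswith("NONE")
    problems = []
    if user in (None, "", "root", "0"):
        problems.append("final stage does not switch to a non-root USER")
    if not has_healthcheck:
        problems.append("final stage has no HEALTHCHECK")
    return problems
```

A check like this could run in ci.yml alongside the container scan, failing the build when the list is non-empty.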