nWave/skills/nw-devops/SKILL.md
Designs CI/CD pipelines, infrastructure, observability, and deployment strategy. Use when preparing platform readiness for a feature.
Wave: DEVOPS (wave 4 of 6) | Agent: Apex (nw-platform-architect) | Command: /nw-devops
Execute DEVOPS wave: platform readiness|CI/CD pipeline setup|observability design|infrastructure preparation. Positioned between DESIGN and DISTILL (DISCOVER > DISCUSS > SPIKE > DESIGN > DEVOPS > DISTILL > DELIVER), ensures infrastructure is ready before acceptance tests and code.
Apex translates DESIGN architecture decisions into operational infrastructure: CI/CD pipelines|logging|monitoring|alerting|observability.
Provenance: feature lean-wave-documentation — D2 (schema-typed sections), D10 (one-line expansion descriptions). Tier-1 [REF] sections (always emitted) + Tier-2 EXPANSION CATALOG items (lazy, on-demand) are the two output bands. Full contract: nWave/skills/nw-density-resolution-contract/SKILL.md.
Under ## Wave: DEVOPS / [REF] <Section> headings:
Rendered under ## Wave: DEVOPS / [WHY|HOW] <Section> only when requested via --expand <id> (DDD-2), the wave-end menu (expansion_prompt = "ask"), mode = "full" auto-expansion, or an ad-hoc user request mid-session.
| Expansion ID | Tier label | One-line description |
|---|---|---|
| infra-cost-analysis | [WHY] | Per-environment monthly cost estimate with vendor pricing assumptions |
| alternative-deploy-targets | [WHY] | Cloud/on-prem/hybrid options weighed and rejected with one-paragraph reason |
| observability-deep-dive | [HOW] | Detailed metric/log/trace schemas, alert thresholds, dashboard layouts |
| runbook-drafts | [HOW] | Incident response runbooks for the top failure modes |
| kpi-instrumentation-recipes | [HOW] | Per-KPI data collection recipe (event names, log fields, metric labels) |
| ci-pipeline-yaml | [HOW] | Full CI/CD pipeline YAML with comments per stage |
| disaster-recovery-plan | [HOW] | Backup, restore, and DR procedures with RPO/RTO targets |
| expansion-catalog-rationale | [WHY] | Why this set of expansions, why these defaults, why D10 enforces one-line descriptions |
Call resolve_density(global_config) from scripts/shared/density_config.py after reading ~/.nwave/global-config.json (missing/malformed = empty dict). Returns mode ("lean" | "full") + expansion_prompt ("ask" | "always-skip" | "always-expand" | "smart") per the D12 cascade (resolver-internal, DDD-5 — do NOT replicate locally). Branch on density.mode for what to emit; branch on density.expansion_prompt at wave end for menu behaviour. Full cascade detail, branch semantics, ad-hoc override workflow: nWave/skills/nw-density-resolution-contract/SKILL.md.
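The config-loading step above could be sketched as follows. It relies only on what the text states: the file lives at ~/.nwave/global-config.json, and a missing or malformed file counts as an empty dict. The helper name `load_global_config` is hypothetical. The real `resolve_density` call appears only in a comment, since the resolver must not be replicated locally.

```python
import json
from pathlib import Path

# Location given in the text above.
GLOBAL_CONFIG_PATH = Path.home() / ".nwave" / "global-config.json"


def load_global_config(path: Path = GLOBAL_CONFIG_PATH) -> dict:
    """Return the parsed config, or {} when the file is missing or malformed.

    Hypothetical helper: only the fallback rule comes from the text above.
    """
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # A missing or unreadable file and invalid JSON both fall back to {}.
        return {}
    # Assumption: a JSON value that is not an object also counts as malformed.
    return data if isinstance(data, dict) else {}


# Intended use (not executed here; resolve_density lives in
# scripts/shared/density_config.py):
#   density = resolve_density(load_global_config())
#   branch on density.mode, then on density.expansion_prompt at wave end
```

Keeping the fallback in one place means the caller never has to distinguish "no config" from "broken config" before handing the dict to the resolver.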
Every expansion choice emits a DocumentationDensityEvent (dataclass at src/des/domain/telemetry/documentation_density_event.py) via event.to_audit_event() → JsonlAuditLogWriter().log_event(...). Schema fields per D4: feature_id, wave, expansion_id, choice, timestamp. For this wave the schema declares "wave": "DEVOPS". Use helper scripts/shared/telemetry.py:write_density_event(...) — do NOT write JSONL directly.
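To make the D4 field list concrete, here is an illustrative stand-in for the event shape. It is not the real `DocumentationDensityEvent`: the class name `DensityEventSketch`, the field types, the ISO 8601 timestamp format, and the example `choice` value are all assumptions. The sketch writes nothing to disk; real emission goes through `write_density_event(...)` as stated above.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class DensityEventSketch:
    """Illustrative stand-in only; field names follow the D4 list above."""

    feature_id: str
    wave: str
    expansion_id: str
    choice: str
    timestamp: str  # assumption: ISO 8601 string in UTC


def devops_event(
    feature_id: str,
    expansion_id: str,
    choice: str,
    now: Optional[datetime] = None,
) -> DensityEventSketch:
    """Build an event for this wave; the wave field is always "DEVOPS"."""
    moment = now or datetime.now(timezone.utc)
    return DensityEventSketch(
        feature_id=feature_id,
        wave="DEVOPS",
        expansion_id=expansion_id,
        choice=choice,
        timestamp=moment.isoformat(),
    )


# "expand" is a hypothetical choice value used only for illustration.
example = devops_event("payment-gateway", "runbook-drafts", "expand")
print(sorted(asdict(example)))
```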
Wave-specific signal: DISTILL consuming a lean DEVOPS environment matrix — downstream --expand requests for runbook drafts or alternative deploy targets indicate the [REF] baseline was insufficient. Full emission rules: nWave/skills/nw-density-resolution-contract/SKILL.md.
Before proceeding, the orchestrator asks:
Question: What is the deployment target? Options:
Question: Container orchestration approach? Options:
Question: CI/CD platform preference? Options:
Question: Is there existing infrastructure or CI/CD to integrate with? Options:
Question: What observability and logging approach? Options:
Question: What deployment strategy? Options:
Question: Is there existing monitoring/alerting infrastructure in place? Options:
If Yes to Decision 7: Follow-up: Which continuous learning capabilities to include? Options:
Question: What Git branching strategy should the project follow? Options:
This directly influences CI/CD pipeline design: trigger rules|branch protection|environment promotion|release automation.
Question: When should mutation testing run? Options:
After selection, Apex asks permission to write to project CLAUDE.md under ## Mutation Testing Strategy:
per-feature: This project uses **per-feature** mutation testing. Runs after refactoring during each delivery, scoped to modified files. Kill rate gate: >= 80%.
nightly-delta: This project uses **nightly-delta** mutation testing. CI runs on files modified each day. NOT run during feature delivery.
pre-release: This project uses **pre-release** mutation testing. Runs on entire solution before each release. Delivery not blocked.
disabled: Mutation testing is **disabled**. Test quality validated through code review and CI coverage.
Default if not chosen: per-feature.
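The strategy-to-text mapping above can be sketched as a small lookup. The note texts are copied from the list above; the function name `mutation_section` and the decision to raise on an unknown strategy are assumptions for illustration.

```python
from typing import Optional

DEFAULT_STRATEGY = "per-feature"

# Note texts copied from the strategy list above.
STRATEGY_NOTES = {
    "per-feature": (
        "This project uses **per-feature** mutation testing. Runs after "
        "refactoring during each delivery, scoped to modified files. "
        "Kill rate gate: >= 80%."
    ),
    "nightly-delta": (
        "This project uses **nightly-delta** mutation testing. CI runs on "
        "files modified each day. NOT run during feature delivery."
    ),
    "pre-release": (
        "This project uses **pre-release** mutation testing. Runs on entire "
        "solution before each release. Delivery not blocked."
    ),
    "disabled": (
        "Mutation testing is **disabled**. Test quality validated through "
        "code review and CI coverage."
    ),
}


def mutation_section(strategy: Optional[str] = None) -> str:
    """Render the CLAUDE.md section; falls back to the default strategy."""
    key = strategy or DEFAULT_STRATEGY
    if key not in STRATEGY_NOTES:
        # Assumption: an unknown strategy is an error, not a silent default.
        raise ValueError(f"unknown mutation testing strategy: {key}")
    return f"## Mutation Testing Strategy\n\n{STRATEGY_NOTES[key]}\n"


print(mutation_section("nightly-delta"))
```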
Before beginning DEVOPS work, read targeted prior wave artifacts:
- docs/feature/{feature-id}/discuss/outcome-kpis.md — drives observability and instrumentation design.
- docs/feature/{feature-id}/design/ — architecture drives infrastructure decisions.

READING ENFORCEMENT: Read every file listed above using the Read tool before proceeding. After reading, output a confirmation checklist (✓ {file} for each read, ⊘ {file} (not found) for missing). Do NOT skip files that exist — skipping causes infrastructure decisions disconnected from architecture.
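The confirmation checklist format could be produced by a sketch like the one below. The function name `reading_checklist` is hypothetical, and the sketch only checks whether each path exists. It does not prove the file was read, which remains the agent's responsibility.

```python
from pathlib import Path
from typing import Iterable, List, Union


def reading_checklist(paths: Iterable[Union[str, Path]]) -> List[str]:
    """Format one checklist line per path, using the markers from the text.

    Existence stands in for "was read" here; that is a simplification.
    """
    lines = []
    for path in paths:
        if Path(path).exists():
            lines.append(f"✓ {path}")
        else:
            lines.append(f"⊘ {path} (not found)")
    return lines


# Example with a hypothetical feature id.
for line in reading_checklist(
    [
        "docs/feature/payment-gateway/discuss/outcome-kpis.md",
        "docs/feature/payment-gateway/design/",
    ]
):
    print(line)
```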
After reading, check whether any DEVOPS decisions would contradict DESIGN architecture. Flag contradictions and resolve with user before proceeding. Example: DESIGN specifies "single-region deployment" but DEVOPS discovers latency requirements from outcome-kpis.md that demand multi-region — this must be resolved.
When DEVOPS decisions change assumptions from prior waves:
- Add a `## Changed Assumptions` section at the end of the affected DEVOPS artifact. Gate: section present in artifact.
- Create docs/feature/{feature-id}/devops/upstream-changes.md for the architect to review. Gate: file created if architecture impact exists.

Invoke @nw-platform-architect with the feature-id and configuration below. Gate: agent accepts invocation.

If outcome-kpis.md exists in the feature's discuss directory, Apex MUST read it and design instrumentation to collect the defined KPIs. Each KPI's "Measured By" and "Measurement Plan" sections drive:

- data collection infrastructure (events, logs, analytics)
- dashboard design (which metrics to visualize)
- alerting rules (guardrail metric thresholds)

Gate: all KPIs have corresponding instrumentation design.

BEFORE completing the DEVOPS wave, produce the environment inventory:
Write docs/feature/{feature-id}/devops/environments.yaml with the structure below. Gate: file written.

```yaml
# environments.yaml — consumed by DISTILL for Mandate 4 (Environmental Realism)
target_environments:
  - name: clean
    description: "Fresh install, no prior state"
    platform: [linux, macos, wsl]
    preconditions: []
  - name: with-pre-commit
    description: "Pre-commit hooks installed and active"
    platform: [linux, macos, wsl]
    preconditions: ["pre-commit installed", "core.hooksPath set to .git/hooks"]
  - name: with-stale-config
    description: "Outdated configuration from prior version"
    platform: [linux, macos]
    preconditions: ["legacy config present", "version mismatch"]
coexistence_matrix:
  - tool: pre-commit
    must_not_break: true
  - tool: husky
    must_not_break: true
platform_coverage:
  macOS: [12.x, 13.x, 14.x]
  Linux: [Ubuntu 22.04, Ubuntu 24.04]
  WSL: [WSL2]
  CI: [GitHub Actions ubuntu-latest]
deployment_assumptions:
  - "Installation MUST be idempotent (safe to run twice)"
  - "Uninstall MUST remove only nWave artifacts"
  - "Hooks MUST survive alongside existing hook managers"
```
For features that do NOT install into systems (pure business logic), the environment inventory contains only target_environments: [{name: clean, platform: [linux, macos]}].
DISTILL reads this file to parametrize acceptance scenarios over target environments. If this file is missing, DISTILL uses defaults (clean, with-pre-commit, with-stale-config) — but coverage gaps are the PA's responsibility.
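The fallback and parametrization described above might look like this sketch. It assumes the YAML has already been parsed into a dict by whatever YAML library the project uses, with `None` standing for a missing file. The function names are hypothetical, and how DISTILL actually expands the matrix is not specified here.

```python
from typing import List, Optional, Tuple

# Defaults named in the text above, used only when the file is missing.
DEFAULT_ENVIRONMENTS = ["clean", "with-pre-commit", "with-stale-config"]


def target_environment_names(inventory: Optional[dict]) -> List[str]:
    """Return environment names, falling back to defaults for a missing file."""
    if inventory is None:
        return list(DEFAULT_ENVIRONMENTS)
    return [env["name"] for env in inventory.get("target_environments", [])]


def scenario_matrix(inventory: dict) -> List[Tuple[str, str]]:
    """Expand each environment into (name, platform) pairs.

    Assumption: one acceptance run per environment/platform combination.
    """
    return [
        (env["name"], platform)
        for env in inventory.get("target_environments", [])
        for platform in env.get("platform", [])
    ]


# The pure-business-logic inventory from the text above.
minimal = {"target_environments": [{"name": "clean", "platform": ["linux", "macos"]}]}
print(scenario_matrix(minimal))
```

An empty `target_environments` list yields no scenarios rather than the defaults, because the text ties the fallback to a missing file only.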
Per-wave Forge review is opt-in. Default: skip and proceed to DISTILL. The mandatory consolidated review covering DISCUSS+DESIGN+DEVOPS+DISTILL fires at end of DISTILL where Eclipse + Architect + Forge + Sentinel run in parallel against the full feature-delta.md (all 4 waves visible — catches cross-wave inconsistencies that per-wave review misses).
Structural-correctness reviewer never skips: rigor.reviewer_model: "skip" applies to scale-sensitive cost-driven reviewers (Eclipse / Architect / Forge) only; the structural-correctness reviewer at the end of DISTILL (Sentinel / @nw-acceptance-designer-reviewer) ALWAYS dispatches — silent skip masks the bug class issue #52 fixed.
Invoke per-wave Forge review explicitly via /nw-review nw-platform-architect-reviewer only if:
When triggered, the reviewer covers: CI/CD pipeline correctness and completeness, environment inventory coverage, observability design alignment with outcome KPIs, infrastructure security and deployment strategy soundness. On REJECTION: revise artifacts per findings and re-submit (max 2 revision cycles before escalation). Gate: optional unless triggered.
environments.yaml with target environments and coexistence matrix)

Handoff To: nw-acceptance-designer (DISTILL wave)
Deliverables: Infrastructure design documents + environments.yaml (mandatory for DISTILL Mandate 4)
/nw-devops payment-gateway
User selects: cloud-native, Kubernetes, GitHub Actions, no existing infra, OpenTelemetry, blue-green, trunk-based development. Apex designs full infrastructure from scratch with robust CI gates on every commit to main.
/nw-devops auth-upgrade
User selects: hybrid, Docker Compose, GitLab CI (existing), existing CI/CD only, Datadog, rolling, GitFlow. Apex extends existing pipelines with branch-specific stages for develop, release, and hotfix branches.
Before completing DEVOPS, produce docs/feature/{feature-id}/devops/wave-decisions.md:
```markdown
# DEVOPS Decisions — {feature-id}

## Key Decisions
- [D1] {decision}: {rationale} (see: {source-file})

## Infrastructure Summary
- Deployment: {target + strategy}
- CI/CD: {platform + branching strategy}
- Observability: {stack}
- Mutation testing: {strategy}

## Constraints Established
- {infrastructure constraint}

## Upstream Changes
- {any DESIGN assumptions changed, with rationale}
```
Single narrative file: docs/feature/{feature-id}/feature-delta.md — environment matrix, CI/CD outline, monitoring contracts, deployment strategy, mutation strategy, observability stack, branching strategy, coexistence matrix all become ## Wave: DEVOPS / [REF|WHY|HOW] <Section> headings.
Machine artifacts (declared, parseable by downstream):
- docs/feature/{feature-id}/environments.yaml — target environments + coexistence matrix + platform coverage + deployment assumptions. DISTILL parses this to parametrize acceptance scenarios over environments (Mandate 4 / Environmental Realism).

SSOT updates (per Recommendation 3 / back-propagation contract):

- docs/product/kpi-contracts.yaml — instrumentation deltas: per-KPI data collection (event names, log fields, metric labels), dashboard mapping, alerting thresholds. Created if absent; extended otherwise.
- docs/product/architecture/brief.md — append/update deployment topology subsection if the chosen platform changes the system-context diagram (e.g. new managed services, new region).

Legacy multi-file outputs (platform-architecture.md, ci-cd-pipeline.md, observability-design.md, monitoring-alerting.md, infrastructure-integration.md, branching-strategy.md, continuous-learning.md, kpi-instrumentation.md, wave-decisions.md as separate files) are NOT produced — that content lives in feature-delta.md. Only environments.yaml survives as a separate machine artifact because it has a parseable downstream consumer. Validator: scripts/validation/validate_feature_layout.py.
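The legacy-file rule above can be illustrated with a small check. This is not the contents of scripts/validation/validate_feature_layout.py: the function name `find_legacy_outputs` and the recursive search are assumptions, and only the file names come from the list above.

```python
from pathlib import Path
from typing import List, Union

# File names taken from the legacy list above.
LEGACY_FILES = frozenset(
    {
        "platform-architecture.md",
        "ci-cd-pipeline.md",
        "observability-design.md",
        "monitoring-alerting.md",
        "infrastructure-integration.md",
        "branching-strategy.md",
        "continuous-learning.md",
        "kpi-instrumentation.md",
        "wave-decisions.md",
    }
)


def find_legacy_outputs(feature_dir: Union[str, Path]) -> List[str]:
    """Return sorted relative paths of legacy files found under feature_dir."""
    root = Path(feature_dir)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*.md")
        if path.is_file() and path.name in LEGACY_FILES
    )


# Example with a hypothetical feature id; an empty list means the layout is clean.
print(find_legacy_outputs("docs/feature/payment-gateway"))
```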