plugin/skills/deploy-production/SKILL.md
Use this skill when shipping to production after staging is green, when a deploy needs a documented rollback plan, or when the APPROVE-gate workflow is required — the deploy-to-production workflow (final checks, approval gate, deploy, verify, rollback plan) that uses the `deployment-procedures` skill and requires explicit APPROVE before any production mutation.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add avav25/ai-assets deploy-production
Production deployment with mandatory approval gates, verification, and rollback plan. Every step that mutates production requires explicit user APPROVE.
⚠️ SAFETY: No production mutation runs without explicit user APPROVE. This is a stricter gate than /deploy-staging — production deploys require BOTH the standard "APPROVE" confirmation AND a separate explicit acknowledgment of the rollback plan from Step 5.
Read CLAUDE.md (or AGENTS.md) at the project root to identify:
- cloud-platforms skill for platform-specific commands

This determines which deployment commands and health checks apply.
Before running helm upgrade / kubectl apply directly against production, check whether a controller owns the apply step.
| Marker | System | What this means for production deploys |
|---|---|---|
| Application / ApplicationSet CRDs in Git | Argo CD | Production manifest changes flow through git PR → Argo CD reconciles. Manual helm upgrade will be reverted on next sync. Use argocd app sync <name> only for intentional out-of-band promotion. |
| Flux HelmRelease / Kustomization CRDs | Flux | Same model as Argo CD. Manual override via flux reconcile helmrelease <name>. |
| Rollout CRDs (argoproj.io/v1alpha1) | Argo Rollouts | Progressive delivery — promotion via kubectl argo rollouts promote <name> or automatic per AnalysisTemplate. Replaces manual canary monitoring (Step 3c). |
| Canary CRDs (flagger.app/v1beta1) | Flagger | Same — promotion driven by Prometheus / Datadog metrics analyzer. |
If a controller is detected, skip the imperative helm upgrade / kubectl apply in Step 3 and promote via the controller. The controller's analyzer-based promotion then replaces the 5–10 minute monitoring window in Step 4: automated SLO-based gates decide whether the rollout proceeds, instead of a static wall-clock window.
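The marker table above can be checked mechanically against the manifests in the repository. The following is a minimal Python sketch, not part of the skill itself: `detect_controllers` and `MARKERS` are hypothetical names, and the API group prefixes for the Argo CD and Flux rows are assumptions added here, since the table names only the kinds for those two systems.

```python
import re

# (apiVersion prefix, kind) -> controller, following the marker table.
# The Argo CD and Flux group prefixes are assumptions; the table lists only kinds.
MARKERS = {
    ("argoproj.io/", "Application"): "Argo CD",
    ("argoproj.io/", "ApplicationSet"): "Argo CD",
    ("helm.toolkit.fluxcd.io/", "HelmRelease"): "Flux",
    ("kustomize.toolkit.fluxcd.io/", "Kustomization"): "Flux",
    ("argoproj.io/", "Rollout"): "Argo Rollouts",
    ("flagger.app/", "Canary"): "Flagger",
}

# Only top-level (unindented) keys are matched, so nested fields are ignored.
_API_VERSION = re.compile(r"^apiVersion:[ \t]*(\S+)", re.MULTILINE)
_KIND = re.compile(r"^kind:[ \t]*(\S+)", re.MULTILINE)


def detect_controllers(manifest_text: str) -> set[str]:
    """Return the controllers whose CRD markers appear in a multi-document YAML string."""
    found: set[str] = set()
    for doc in re.split(r"^---[ \t]*$", manifest_text, flags=re.MULTILINE):
        api = _API_VERSION.search(doc)
        kind = _KIND.search(doc)
        if api is None or kind is None:
            continue
        api_version = api.group(1).strip("\"'")
        kind_name = kind.group(1).strip("\"'")
        for (prefix, marker_kind), controller in MARKERS.items():
            if kind_name == marker_kind and api_version.startswith(prefix):
                found.add(controller)
    return found
```

A non-empty result suggests the deploy should go through that controller. An empty result only means no marker was found in the text that was scanned, not that the cluster has no controller.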
If the codebase imports a feature-flag SDK, prefer decoupled deploy: ship code dark, then flip the flag separately, ramp, observe.
| Marker | Platform | Approach |
|---|---|---|
| import { Client } from '@launchdarkly/...' / LaunchDarkly SDK | LaunchDarkly | Deploy with flag default OFF. Flip flag in LD UI after deploy passes smoke. Rollback = flip flag, no redeploy needed for the new code path. |
| unleash-client / @unleash/... | Unleash | Same pattern via Unleash UI. |
| @openfeature/... | OpenFeature (vendor-neutral) | Same pattern via the underlying provider (LD/Unleash/Flagsmith/Split). |
| flagsmith / splitsoftware SDKs | Flagsmith / Split.io | Same pattern. |
Decoupled deploy fundamentally changes Step 3: the deploy itself is low-risk because the new code path is gated. The risk shifts to the flag flip, which is reversible in seconds.
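To illustrate why the flag flip, not the deploy, carries the risk, here is a small Python sketch of code shipped dark behind a flag. `FlagStore` is a hypothetical in-memory stand-in, not the API of LaunchDarkly, Unleash, or any other SDK, and the flag key `new-checkout` is invented for the example.

```python
class FlagStore:
    """Hypothetical in-memory stand-in for a flag provider (not a real SDK API)."""

    def __init__(self) -> None:
        self._flags: dict[str, bool] = {}

    def set(self, key: str, enabled: bool) -> None:
        self._flags[key] = enabled

    def is_enabled(self, key: str, default: bool = False) -> bool:
        # Unknown flags fall back to the default (OFF), so new code ships dark.
        return self._flags.get(key, default)


def handle_checkout(flags: FlagStore) -> str:
    # The new path is deployed but unreachable until the flag is flipped.
    if flags.is_enabled("new-checkout"):
        return "new checkout path"
    return "legacy checkout path"
```

Flipping the flag back to OFF restores the legacy path without a redeploy, which is the rollback described in the table.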
If the project has a documented change-freeze policy (e.g., end-of-quarter freeze, major-event blackout), verify the current date is OUTSIDE the freeze window. Hard stop if inside; require documented exception.
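The freeze check reduces to a date-range test. A minimal sketch, assuming the freeze policy can be expressed as inclusive (start, end) date pairs; the function names and the `exception_ref` parameter are hypothetical, since the skill does not say how windows or exceptions are recorded.

```python
from datetime import date
from typing import Optional


def in_freeze_window(today: date, windows: list[tuple[date, date]]) -> bool:
    """True if `today` falls inside any inclusive (start, end) freeze window."""
    return any(start <= today <= end for start, end in windows)


def assert_deploy_allowed(
    today: date,
    windows: list[tuple[date, date]],
    exception_ref: Optional[str] = None,
) -> None:
    # Hard stop inside a freeze unless a documented exception is supplied.
    if in_freeze_window(today, windows) and not exception_ref:
        raise RuntimeError("change freeze in effect: documented exception required")
```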
Confirm that staging deployment was successful:
- /deploy-staging completed successfully

If staging was not verified, STOP. Run /deploy-staging first.
/release completed)

| Factor | Assessment |
|---|---|
| Breaking changes | Yes/No — migration guide ready? |
| Database migrations | Yes/No — reversible? |
| Infrastructure changes | Yes/No — /infra-change completed? |
| Third-party dependencies | Yes/No — API compatibility verified? |
| Traffic impact | Low/Medium/High |
| Rollback complexity | Simple (revert image) / Complex (DB migration) |
If the overall risk is HIGH, invoke Agent(sre-engineer) for an SLO impact assessment.
Same as /deploy-staging Step 1c — but with production configuration.
# Record current deployment state for rollback
kubectl get deployment -n production -o yaml > pre-deploy-state.yaml
Or for Helm:
helm get values <release> -n production > pre-deploy-values.yaml
helm history <release> -n production
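The revision recorded at this point is what later fills in `<previous-revision>` for `helm rollback`. As a rough sketch, assuming the history is captured as JSON (for example with `helm history <release> -n production -o json`) and that each entry carries `revision` and `status` fields, the currently deployed revision can be extracted before the upgrade. The function name is hypothetical.

```python
import json


def current_revision(history_json: str) -> int:
    """Return the revision marked 'deployed' in JSON-formatted helm history output."""
    entries = json.loads(history_json)
    deployed = [e["revision"] for e in entries if e.get("status") == "deployed"]
    if not deployed:
        # Nothing is marked deployed; do not guess a rollback target.
        raise ValueError("no deployed revision found; inspect the release manually")
    return max(deployed)
```

Recording this number in the deployment plan means the rollback step does not depend on reading the history again under pressure.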
Present the deployment plan:
## Production Deployment Plan
- **Version**: [current] → [new]
- **Method**: [K8s/Helm/Docker/Platform]
- **Migrations**: [list if any]
- **Config changes**: [list if any]
- **Expected downtime**: [none / X minutes]
- **Rollback plan**: [documented in Step 5]
- **Monitoring**: [dashboards to watch]
⚠️ STOP. Request APPROVE before proceeding to Step 3. Production deploys require explicit acknowledgment of the rollback plan in addition to the deployment plan — confirm both with the user.
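The double gate can be expressed as a single predicate. This is a sketch under assumptions: the skill requires the literal APPROVE confirmation plus a separate acknowledgment of the rollback plan, but it does not specify how the acknowledgment is captured, so a boolean is used here and the function name is hypothetical.

```python
def may_mutate_production(deploy_reply: str, rollback_ack: bool) -> bool:
    """Both confirmations are required; anything but an exact APPROVE is a refusal."""
    # Lowercase or partial replies such as "approve" or "ok" do not pass the gate.
    return deploy_reply.strip() == "APPROVE" and rollback_ack is True
```

Treating every ambiguous reply as a refusal keeps the failure mode safe: the worst case is asking the user again.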
Only after the user explicitly approves:
# Database migration
<migration-command>
Verify migration completed successfully before proceeding.
Kubernetes / Helm:
helm upgrade --install <release> <chart> \
  -n production \
  -f values-production.yaml \
  --set image.tag=<tag>
Rolling update strategy — monitor pod rollout:
kubectl rollout status deployment/<name> -n production --timeout=300s
If using canary deployment:
// turbo
kubectl get pods -n production -o wide
curl -s <production-url>/health
Run the project's existing smoke suite — do NOT improvise from generic checklists. Detect and execute (in priority order):
| Marker | Run command |
|---|---|
| tests/smoke/ directory + pytest/vitest/etc. | Project test runner against that path |
| e2e/ or tests/e2e/ with Playwright config (playwright.config.ts) | npx playwright test --grep @smoke (filter by @smoke annotation) |
| cypress/e2e/smoke/* | npx cypress run --spec 'cypress/e2e/smoke/**' |
| Postman / Newman collection | npx newman run smoke.postman_collection.json |
| K6 / Artillery script | run against the prod URL with a low-rate profile |
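The priority order in the table can be sketched as a lookup over the repository's file list. This is an illustrative Python sketch: `pick_smoke_command` is a hypothetical helper, the input is assumed to be repo-relative paths (for example from `git ls-files`), and the K6 / Artillery row is left out because the table gives no concrete command for it.

```python
from typing import Iterable, Optional


def pick_smoke_command(paths: Iterable[str]) -> Optional[list[str]]:
    """Return the first matching smoke command in table order, or None."""
    files = set(paths)

    def has_prefix(prefix: str) -> bool:
        return any(p.startswith(prefix) for p in files)

    if has_prefix("tests/smoke/"):
        # The table leaves the runner to the project, so a placeholder is returned.
        return ["<project-test-runner>", "tests/smoke/"]
    if "playwright.config.ts" in files and (has_prefix("e2e/") or has_prefix("tests/e2e/")):
        return ["npx", "playwright", "test", "--grep", "@smoke"]
    if has_prefix("cypress/e2e/smoke/"):
        return ["npx", "cypress", "run", "--spec", "cypress/e2e/smoke/**"]
    if "smoke.postman_collection.json" in files:
        return ["npx", "newman", "run", "smoke.postman_collection.json"]
    return None
```

A `None` result corresponds to the fallback case, where no smoke suite exists and the gap should be written down for follow-up.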
Fallback when no smoke suite exists (write down for follow-up — every project should ship one):
Watch for:
// turbo
kubectl logs -n production -l app=<app-name> --tail=100 --since=5m
// turbo
kubectl get events -n production --sort-by='.lastTimestamp' --field-selector type!=Normal
If issues detected — execute rollback immediately:
Helm:
helm rollback <release> <previous-revision> -n production
Kubernetes:
kubectl rollout undo deployment/<name> -n production
Database migration rollback:
<rollback-migration-command>
After rollback:
Append a deploy event to .ai-skills-memory/runs.jsonl per memory-discipline.md retention rules. Production deploys are long-retention events:
{"ts": "<ISO8601>", "event": "deploy", "env": "production", "service": "<name>", "version_from": "<old>", "version_to": "<new>", "method": "k8s|helm|docker|platform", "status": "success|fail|rolled_back", "duration_ms": N, "rollback": <bool>, "approved_by": "<user>"}
## Production Deployment Summary
- **Version**: [old] → [new]
- **Deployed at**: [timestamp]
- **Method**: [K8s/Helm/Docker/Platform]
- **Migrations**: [applied/N/A]
- **Health check**: [pass/fail]
- **Smoke tests**: [pass/fail]
- **Monitoring**: [stable/issues detected]
- **Rollback**: [not needed / executed at timestamp]
- **Production URL**: [url]
- **Next steps**: [monitoring period, team notification, release announcement]
- /deploy-staging (staging verification), /release (version tag)
- Agent(devops-engineer), Agent(sre-engineer), Agent(devops-architect) (deployment strategy design)
- deployment-procedures skill
- runs.jsonl (deploy event per Step 6)