skills/blue-green-deployment-orchestrator/SKILL.md
Blue-green and canary deployment orchestrator with traffic shifting and automated rollback. Activate on: blue-green deployment, canary release, rolling deployment, traffic shifting, rollback automation, progressive delivery, Argo Rollouts, Flagger. NOT for: K8s manifest generation (use kubernetes-manifest-generator), CI/CD pipeline setup (use github-actions-pipeline-builder), monitoring (use monitoring-stack-deployer).
Expert in progressive delivery strategies — blue-green, canary, and rolling deployments with automated traffic shifting and rollback.
Activate on: "blue-green deployment", "canary release", "rolling deployment", "traffic shifting", "rollback automation", "progressive delivery", "Argo Rollouts", "Flagger", "deployment strategy"
NOT for: K8s manifests → kubernetes-manifest-generator | CI/CD pipelines → github-actions-pipeline-builder | Monitoring → monitoring-stack-deployer
| Domain | Technologies |
|--------|-------------|
| Controllers | Argo Rollouts 1.7, Flagger 1.38, Spinnaker |
| Traffic Splitting | Istio VirtualService, Linkerd TrafficSplit, Gateway API HTTPRoute |
| Analysis | Prometheus queries, Datadog metrics, CloudWatch, custom webhooks |
| Strategies | Blue-green, canary (linear/exponential), A/B testing, rolling |
| Platforms | Kubernetes, AWS ECS (CodeDeploy), Cloudflare Workers (gradual) |
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: api-server
spec:
  replicas: 5
  # selector and pod template omitted for brevity
  strategy:
    canary:
      canaryService: api-canary-svc
      stableService: api-stable-svc
      trafficRouting:
        istio:
          virtualService:
            name: api-vsvc
      steps:
        - setWeight: 5          # 5% traffic to canary
        - pause: { duration: 5m }
        - analysis:
            templates:
              - templateName: error-rate-check
        - setWeight: 25         # 25% if analysis passes
        - pause: { duration: 10m }
        - analysis:
            templates:
              - templateName: latency-check
        - setWeight: 50         # 50%
        - pause: { duration: 10m }
        - setWeight: 100        # Full promotion
  rollbackWindow:               # set at spec level, not under the strategy
    revisions: 2
```
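To make the timing of these canary steps easier to reason about, here is a minimal Python sketch that tabulates when each traffic weight is reached. The `Stage` class and both helper functions are hypothetical names introduced for illustration. Only the fixed `pause` durations are counted, so the time spent running each analysis comes on top of these figures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Stage:
    weight: int              # percent of traffic sent to the canary
    pause_minutes: int       # fixed pause after the weight is set
    analysis: Optional[str]  # template run before the next weight, if any


# Mirrors the steps of the Rollout manifest (illustrative model only).
STAGES = [
    Stage(5, 5, "error-rate-check"),
    Stage(25, 10, "latency-check"),
    Stage(50, 10, None),
    Stage(100, 0, None),
]


def schedule(stages):
    """Return (weight, pause minutes elapsed when that weight is set)."""
    elapsed = 0
    timeline = []
    for stage in stages:
        timeline.append((stage.weight, elapsed))
        elapsed += stage.pause_minutes
    return timeline


def min_pause_minutes(stages):
    """Lower bound on rollout duration: the sum of the fixed pauses."""
    return sum(stage.pause_minutes for stage in stages)


if __name__ == "__main__":
    for weight, minute in schedule(STAGES):
        print(f"{weight:>3}% canary traffic after {minute} min of pauses")
    print("minimum pause time:", min_pause_minutes(STAGES), "min")
```

A table like this is a quick way to compare a planned step list against any deployment-duration target before applying the manifest.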
```
              ┌──────────────┐
              │    Router    │
              │  (Ingress/   │
              │   Gateway)   │
              └──────┬───────┘
                     │
        ┌────────────┼────────────┐
        ▼                         ▼
┌────────────────┐        ┌────────────────┐
│ BLUE (active)  │        │ GREEN (preview)│
│ v1.2.0         │        │ v1.3.0         │
│ 3 replicas     │        │ 3 replicas     │
└────────────────┘        └────────────────┘
```
Workflow:
1. Deploy v1.3.0 to GREEN (preview, no traffic)
2. Run smoke tests against GREEN preview URL
3. Switch router: 100% traffic BLUE → GREEN
4. Monitor for 15 minutes
5. If healthy: scale down BLUE (now standby)
6. If unhealthy: instant rollback — switch back to BLUE
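The workflow above can be sketched as a small control function. This is an illustration under stated assumptions, not a controller implementation: all five callables are hypothetical hooks that the caller would wire to a real router and orchestrator, and the early return on a failed smoke test is an assumption, since the workflow does not say what happens in that case.

```python
from typing import Callable


def blue_green_cutover(
    deploy_preview: Callable[[], None],
    smoke_test: Callable[[], bool],
    switch_traffic: Callable[[str], None],
    is_healthy: Callable[[], bool],
    scale_down: Callable[[str], None],
) -> str:
    """Run one cutover and return the color that ends up serving traffic."""
    deploy_preview()             # 1. GREEN comes up with no traffic
    if not smoke_test():         # 2. test against the GREEN preview URL
        return "blue"            #    assumption: abort, BLUE keeps serving
    switch_traffic("green")      # 3. 100% of traffic moves BLUE -> GREEN
    if is_healthy():             # 4. verdict after the monitoring window
        scale_down("blue")       # 5. BLUE is scaled down to standby
        return "green"
    switch_traffic("blue")       # 6. rollback: point the router back
    return "blue"


if __name__ == "__main__":
    log = []
    active = blue_green_cutover(
        deploy_preview=lambda: log.append("deploy green"),
        smoke_test=lambda: True,
        switch_traffic=lambda color: log.append(f"route -> {color}"),
        is_healthy=lambda: False,  # simulate a failed monitoring window
        scale_down=lambda color: log.append(f"scale down {color}"),
    )
    print(active, log)
```

Passing the side effects in as callables keeps the decision logic testable without a cluster, which also makes it practical to exercise the rollback path on its own.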
```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: error-rate-check
spec:
  args:                         # values must be passed by the referencing Rollout step
    - name: service
    - name: canary-hash
  metrics:
    - name: error-rate
      interval: 1m
      count: 5
      successCondition: result[0] < 0.01   # < 1% error rate
      failureLimit: 2
      provider:
        prometheus:
          address: http://prometheus:9090
          query: |
            sum(rate(http_requests_total{status=~"5..",
              app="{{args.service}}",
              rollout_hash="{{args.canary-hash}}"}[2m]))
            /
            sum(rate(http_requests_total{
              app="{{args.service}}",
              rollout_hash="{{args.canary-hash}}"}[2m]))
```
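As a rough illustration of how `count`, `successCondition`, and `failureLimit` combine, the Python sketch below evaluates a series of measurements. It assumes that the analysis fails once the number of failed measurements exceeds `failureLimit`. The function names are hypothetical, and treating an undefined ratio as a failed measurement is a simplification made by this sketch, not a statement about how the controller handles it.

```python
import math


def error_rate(errors_5xx: float, total: float) -> float:
    """Ratio computed by the query: 5xx request rate over total request rate."""
    if total == 0:
        return math.nan  # no traffic, so the ratio is undefined
    return errors_5xx / total


def evaluate(measurements, threshold=0.01, failure_limit=2):
    """Return 'Failed' once failed measurements exceed failure_limit."""
    failures = 0
    for value in measurements:
        # NaN compares as False, so this sketch counts it as a failure.
        if not (value < threshold):
            failures += 1
            if failures > failure_limit:
                return "Failed"
    return "Successful"


if __name__ == "__main__":
    # Five measurements, matching count: 5 at a 1m interval.
    samples = [0.001, 0.020, 0.030, 0.002, 0.001]
    print(evaluate(samples))  # two failures stay within the limit of 2
```

Under this reading, `failureLimit: 2` tolerates two noisy measurements out of five, and a third one aborts the rollout.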
- [ ] Deployment strategy documented (blue-green, canary, or rolling)
- [ ] Progressive delivery controller deployed (Argo Rollouts or Flagger)
- [ ] Traffic splitting configured via service mesh or Gateway API
- [ ] Canary analysis templates defined with Prometheus queries
- [ ] Automated rollback triggers on error rate or latency degradation
- [ ] Preview/canary service accessible for pre-promotion testing
- [ ] Rollback tested independently (not just on failure)
- [ ] Deployment takes less than 15 minutes end-to-end
- [ ] Resource budget accounts for blue-green double capacity
- [ ] Deployment status visible in Grafana or ArgoCD dashboard
- [ ] Notification sent on promotion and rollback events
- [ ] Runbook documents manual override procedures