skills/service-mesh-microservices-expert/SKILL.md
Istio, Envoy, circuit breakers, and service discovery for microservices. Activate on: service mesh, Istio, Envoy, sidecar, circuit breaker, service discovery, mTLS, traffic management. NOT for: API gateway edge routing (use api-gateway-reverse-proxy-expert), application-level observability (use observability-apm-expert).
npx skillsauth add curiositech/windags-skills service-mesh-microservices-expertInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Design and operate service meshes for secure, observable, and resilient microservice communication using Istio, Envoy, and Linkerd.
Activate on: "service mesh", "Istio", "Envoy", "sidecar proxy", "circuit breaker", "service discovery", "mTLS", "traffic management", "canary deployment", "Linkerd"
NOT for: Edge/API gateway → api-gateway-reverse-proxy-expert | Application instrumentation → observability-apm-expert | Container orchestration basics → relevant DevOps skill
| Domain | Technologies | |--------|-------------| | Meshes | Istio 1.24+, Linkerd 2.16+, Cilium Service Mesh | | Data Plane | Envoy Proxy, Linkerd2-proxy, eBPF (Cilium) | | Security | mTLS, SPIFFE/SPIRE, AuthorizationPolicy | | Traffic | VirtualService, DestinationRule, traffic splitting | | Observability | Kiali, automatic Prometheus metrics, distributed tracing |
┌────────────────── Pod ──────────────────┐
│ ┌──────────┐ ┌──────────────────┐ │
│ │ App │────→│ Envoy Sidecar │──┼──→ Other services
│ │ Container│←────│ (injected auto) │←─┼── (via their sidecars)
│ └──────────┘ └──────────────────┘ │
└─────────────────────────────────────────┘
Envoy intercepts all inbound/outbound traffic:
- mTLS encryption/decryption
- Retry, timeout, circuit breaking
- Metrics collection (RED)
- Access logging
- Traffic routing rules
# Istio DestinationRule: circuit breaker for payment-service
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
name: payment-service
spec:
host: payment-service
trafficPolicy:
connectionPool:
tcp:
maxConnections: 100
http:
h2UpgradePolicy: DEFAULT
maxRequestsPerConnection: 10
outlierDetection:
consecutive5xxErrors: 5 # trip after 5 errors
interval: 10s # check every 10s
baseEjectionTime: 30s # eject for 30s minimum
maxEjectionPercent: 50 # never eject >50% of hosts
---
# Istio VirtualService: retry policy
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
name: payment-service
spec:
hosts: [payment-service]
http:
- route:
- destination:
host: payment-service
retries:
attempts: 3
perTryTimeout: 2s
retryOn: 5xx,reset,connect-failure
timeout: 10s
# Route 90% to v1, 10% to v2 (canary)
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
name: order-service
spec:
hosts: [order-service]
http:
- route:
- destination:
host: order-service
subset: v1
weight: 90
- destination:
host: order-service
subset: v2
weight: 10
# Progressive: 90/10 → 70/30 → 50/50 → 0/100
# Roll back instantly by setting v1 weight to 100
tools
Building resilient distributed systems with circuit breakers, retries with full-jitter exponential backoff, retry budgets (per-request 3-attempt + per-client 10% ratio per Google SRE), deadline propagation, and the cascading-failure math (4 layers × 3 retries = 64x amplification). Grounded in Resilience4j, Microsoft Cloud Patterns, AWS Architecture Blog (Marc Brooker), and Google SRE Book.
testing
Designing HTTP cache headers that work correctly across browsers, CDNs, and shared proxies — `Cache-Control` directives per RFC 9111, `stale-while-revalidate` and `stale-if-error` per RFC 5861, the Vary header for varying responses, and surrogate keys for tag-based purging. Grounded in IETF RFCs and Cloudflare/Fastly docs.
development
Use when designing or fixing a Content Security Policy on a real site, choosing between nonce-based and hash-based CSP, adding strict-dynamic, debugging "Refused to execute inline script" errors, deploying CSP in report-only mode first, configuring report-to / report-uri, or auditing an existing policy for unsafe-inline / unsafe-eval / wildcards. Triggers: "CSP blocks legitimate inline script", strict-dynamic, nonce-{RANDOM}, sha256-{HASH}, object-src none, base-uri none, frame-ancestors, Trusted Types, X-Content-Security-Policy obsolete, report-only vs enforced. NOT for general HTTP security headers (HSTS, COOP/COEP), Trusted Types deep dive, CORS configuration, or building a WAF.
tools
Choosing and operating an HTTP API versioning strategy that doesn't break clients — Stripe's date-based pinned versions, the Deprecation/Sunset header pair (RFC 9745 + RFC 8594), URI vs header vs media-type approaches, and the version-transformer pattern. Grounded in Stripe's published architecture and IETF RFCs.