skills/istio-troubleshooting/SKILL.md
Diagnoses and resolves common Istio service mesh problems across traffic management, security, observability, and upgrades. Use when debugging Istio networking issues (503 errors, route rules not working, TLS mismatches, gateway 404s), security problems (authorization policies, mTLS, JWT authentication), observability gaps (missing traces, Grafana output issues), EnvoyFilter breakage, or when upgrading Istio and migrating from EnvoyFilter to first-class APIs.
kubectl logs PODNAME -c istio-proxy -n NAMESPACE
- NR: No route configured. Check DestinationRule or VirtualService.
- UO: Upstream overflow (circuit breaking). Check circuit breaker in DestinationRule.
- UF: Failed to connect upstream. Check for mTLS configuration conflict.

If requests return HTTP 503 immediately after applying a DestinationRule (and stop when you remove it), the DestinationRule is causing a TLS conflict. When mTLS is enabled globally, every DestinationRule must include:
trafficPolicy:
  tls:
    mode: ISTIO_MUTUAL
Otherwise mode defaults to DISABLE, causing plaintext requests that conflict with the server expecting encrypted traffic.
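For context, here is a minimal sketch of a complete DestinationRule carrying that setting. The resource name, namespace, and the `version: v1` subset label are assumptions chosen for illustration; the host is borrowed from the helloworld example used elsewhere in this document.

```yaml
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: helloworld            # hypothetical name
  namespace: default          # assumed namespace
spec:
  host: helloworld.default.svc.cluster.local
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL      # use Istio-issued certificates for mTLS to the upstream
  subsets:
  - name: v1
    labels:
      version: v1             # assumed pod label that selects the v1 workloads
```

If the 503s stop after adding the `tls` block, the conflict was most likely the plaintext default described above.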
When a gateway VirtualService routes to a service, a separate VirtualService for that service's subsets won't apply to ingress traffic. The gateway uses its own host matching.
Fix: Include the subset directly in the gateway's VirtualService:
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
  - "myapp.com"
  gateways:
  - myapp-gateway
  http:
  - match:
    - uri:
        prefix: /hello
    route:
    - destination:
        host: helloworld.default.svc.cluster.local
        subset: v1
Or combine both VirtualServices into one with explicit gateways matching (mesh for internal, gateway name for external).
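The combined variant could look roughly like the sketch below. It reuses the names from the example above and assumes the in-mesh host is `helloworld.default.svc.cluster.local`; adjust hosts and match rules to your own services.

```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
  - "myapp.com"                             # external host served through the gateway
  - helloworld.default.svc.cluster.local    # in-mesh host (assumed)
  gateways:
  - myapp-gateway     # routes apply to ingress traffic
  - mesh              # reserved name: routes also apply to sidecars in the mesh
  http:
  - match:
    - uri:
        prefix: /hello
      gateways:
      - myapp-gateway   # restrict this rule to ingress traffic
    route:
    - destination:
        host: helloworld.default.svc.cluster.local
        subset: v1
  - match:
    - gateways:
      - mesh            # restrict this rule to in-mesh traffic
    route:
    - destination:
        host: helloworld.default.svc.cluster.local
        subset: v1
```

Note that a wildcard host (`"*"`) is not a good fit here, because the same resource also has to match the in-mesh service host.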
Check file descriptor limits: ulimit -a. Default 1024 is too low. Fix: ulimit -n 16384.
Envoy requires HTTP/1.1 or HTTP/2. For NGINX backends, set proxy_http_version 1.1; in config.
When accessing a headless service by Pod IP with an http- prefixed port name, the sidecar can't find the route. Fixes:
- Set the correct Host header (e.g., Host: nginx.default).
- Rename the port to tcp or tcp-web to bypass HTTP routing.
- Use the domain name (e.g., web-0.nginx.default) instead of Pod IP.

Istio doesn't support fault injection and retry/timeout on the same VirtualService. The retry config is ignored. Workaround: Remove fault from VirtualService; inject faults via EnvoyFilter on the upstream proxy instead.
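The fault-injection workaround above might be sketched as the following EnvoyFilter. Everything specific here is an assumption for illustration: the resource name, the `app: helloworld` selector, the 503 status, and the 100% abort rate. It places Envoy's HTTP fault filter ahead of the router on the upstream workload's inbound listener, so retries configured on the calling side still apply.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: helloworld-fault        # hypothetical name
  namespace: default            # assumed namespace
spec:
  workloadSelector:
    labels:
      app: helloworld           # assumed label on the upstream workload
  configPatches:
  - applyTo: HTTP_FILTER
    match:
      context: SIDECAR_INBOUND  # patch the upstream (server-side) proxy
      listener:
        filterChain:
          filter:
            name: envoy.filters.network.http_connection_manager
            subFilter:
              name: envoy.filters.http.router
    patch:
      operation: INSERT_BEFORE  # the fault filter must run before the router
      value:
        name: envoy.filters.http.fault
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.http.fault.v3.HTTPFault
          abort:
            http_status: 503    # illustrative status code
            percentage:
              numerator: 100    # abort every request; lower this for partial faults
              denominator: HUNDRED
```

Anchoring on the router filter is a deliberate choice in this sketch: the router is always present in the HTTP filter chain, so the relative insert is less likely to break than one anchored on a version-specific filter.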
INSERT_BEFORE operations depend on the referenced filter existing with an older creation time. After Istio upgrades, version-specific filters may be replaced. Fix: Use INSERT_FIRST or set explicit priority (e.g., priority: 10).
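As a rough illustration of that fix, the sketch below uses `INSERT_FIRST` together with an explicit `priority`. The filter name, port number, and Lua body are hypothetical placeholders; only the `priority` and `operation` fields are the point of the example.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: add-header-lua          # hypothetical name
  namespace: default            # assumed namespace
spec:
  priority: 10                  # explicit ordering instead of relying on creation time
  configPatches:
  - applyTo: HTTP_FILTER
    match:
      context: SIDECAR_INBOUND
      listener:
        portNumber: 8080        # assumed application port
        filterChain:
          filter:
            name: envoy.filters.network.http_connection_manager
    patch:
      operation: INSERT_FIRST   # does not depend on another filter being present
      value:
        name: envoy.filters.http.lua
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua
          inlineCode: |
            -- placeholder logic: tag each request with a header
            function envoy_on_request(request_handle)
              request_handle:headers():add("x-example", "true")
            end
```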
For detailed TLS troubleshooting scenarios, see tls-issues.md.
Key issues:
- 400 DPE or SSL errors
- … (http routing after termination)
- … (tls routing)
- … (--resolve or set hosts: "*")

Docker for Mac time skew causes traces to appear days early. Fix: restart Docker, then reinstall Istio.
Client and server time must match. Ensure NTP/Chrony is configured correctly on both the cluster and the browser machine.
kubectl -n kube-system get pod -l k8s-app=istio-cni-node
If using PodSecurityPolicy, ensure istio-cni service account can use a PSP allowing NET_ADMIN and NET_RAW.
# Check proxy config
istioctl proxy-config listener POD -n NAMESPACE --port 80 --type HTTP -o json
istioctl proxy-config routes POD -n NAMESPACE
istioctl proxy-config secret POD
# Check authorization
istioctl x authz check POD.NAMESPACE
# Enable debug logging
istioctl admin log --level authorization:debug
istioctl proxy-config log deploy/DEPLOYMENT --level "rbac:debug"
# Get proxy config dump
kubectl exec POD -c istio-proxy -- pilot-agent request GET config_dump
# View Istiod logs
kubectl logs $(kubectl -n istio-system get pods -l app=istiod -o jsonpath='{.items[0].metadata.name}') -c discovery -n istio-system