skills/core/dns-debug/SKILL.md
Diagnose DNS resolution failures in the cluster (NXDOMAIN, timeouts, SERVFAIL). Checks CoreDNS health, service endpoints, and DNS configuration.
When pods report DNS resolution failures (service discovery not working, NXDOMAIN errors, DNS timeouts), follow this flow to identify the root cause.
Scope: This skill is for diagnosis only. Once you identify the root cause, report it to the user and stop. Do NOT attempt to modify CoreDNS configuration or network policies — that should be left to the user or cluster administrator.
If a specific pod is having DNS issues, test DNS resolution from within that pod:
pod_exec: pod=<pod>, namespace=<ns>, command="nslookup <service-name>"
For cross-namespace service resolution:
pod_exec: pod=<pod>, namespace=<ns>, command="nslookup <service-name>.<target-namespace>.svc.cluster.local"
If nslookup is not available in the container, try:
pod_exec: pod=<pod>, namespace=<ns>, command="cat /etc/resolv.conf"
This shows the DNS server the pod is configured to use and the search domains.
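To make the relevant fields easier to read, the resolv.conf output can be reduced to just the nameserver, search domains, and ndots value. The following is a minimal sketch, not part of the skill itself; the function name `summarize_resolv_conf` is hypothetical, and it assumes the usual `nameserver`, `search`, and `options` line layout.

```shell
# Hypothetical helper: reads resolv.conf text on stdin and prints the
# nameserver, the search domains, and the ndots option (if present).
summarize_resolv_conf() {
  awk '
    $1 == "nameserver" { print "nameserver=" $2 }
    $1 == "search"     { $1 = ""; sub(/^ /, ""); print "search=" $0 }
    $1 == "options"    {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^ndots:/) { sub(/^ndots:/, "", $i); print "ndots=" $i }
    }
  '
}

# Usage (assumes the pod's resolv.conf was saved to a local file first):
#   summarize_resolv_conf < resolv.conf
```

In a pod using cluster DNS, the nameserver is expected to match the ClusterIP of the kube-dns service.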
kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide
All CoreDNS pods should be Running and Ready. Note which nodes they are running on.
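The Running and Ready check can be sketched as a small filter over the pod listing. This is an illustrative helper under stated assumptions: the name `list_unhealthy_pods` is hypothetical, and it assumes the default column order of `kubectl get pods` (NAME, READY, STATUS, ...) with headers suppressed.

```shell
# Hypothetical helper: reads `kubectl get pods --no-headers` output on stdin
# and prints the name of every pod that is not Running or whose READY column
# is not n/n (for example 0/1).
list_unhealthy_pods() {
  awk '{
    split($2, ready, "/")
    if ($3 != "Running" || ready[1] != ready[2]) print $1
  }'
}

# Usage:
#   kubectl get pods -n kube-system -l k8s-app=kube-dns --no-headers | list_unhealthy_pods
```

An empty result means every listed CoreDNS pod is Running and fully Ready.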
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=100
Look for error messages, SERVFAIL responses, or upstream DNS failures.
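When the log output is long, a simple filter can narrow it to likely problem lines. The sketch below is only a starting point: the function name `filter_dns_errors` is hypothetical, and the patterns (`[ERROR]`, `SERVFAIL`, `i/o timeout`) are assumptions about what typically appears in CoreDNS logs, so adjust them to what the cluster actually emits.

```shell
# Hypothetical helper: reads CoreDNS log lines on stdin and keeps only lines
# that match a few assumed failure patterns. Returns success even when
# nothing matches, so it is safe to use in scripts run with `set -e`.
filter_dns_errors() {
  grep -E '\[ERROR\]|SERVFAIL|i/o timeout' || true
}

# Usage:
#   kubectl logs -n kube-system -l k8s-app=kube-dns --tail=100 | filter_dns_errors
```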
kubectl get svc -n kube-system kube-dns
kubectl get endpoints -n kube-system kube-dns
The kube-dns service should have a ClusterIP, and the endpoints should list the CoreDNS pod IPs. If endpoints are empty, CoreDNS pods are not ready.
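The empty-endpoints check can also be expressed as a small function over the endpoint IPs. This is a sketch, assuming the IPs are extracted with a jsonpath query as shown in the usage comment; the function name `check_dns_endpoints` is hypothetical.

```shell
# Hypothetical helper: takes the space-separated endpoint IPs as one argument.
# Prints a summary and returns non-zero when the list is empty.
check_dns_endpoints() {
  ips="$1"
  if [ -z "$ips" ]; then
    echo "no ready CoreDNS backends"
    return 1
  fi
  # Word-split the list to count the addresses.
  set -- $ips
  echo "$# ready CoreDNS backend(s): $ips"
}

# Usage:
#   check_dns_endpoints "$(kubectl get endpoints -n kube-system kube-dns \
#     -o jsonpath='{.subsets[*].addresses[*].ip}')"
```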
NXDOMAIN / server can't find: Name does not exist
The DNS name cannot be resolved. Common causes:
The service does not exist in the target namespace. Check with:
kubectl get svc -n <target-namespace> <service-name>
The short service name is used from a different namespace. Use <service>.<namespace>.svc.cluster.local from other namespaces.
Advise the user to verify the service name, namespace, and that the target service exists with ready endpoints.
connection timed out / no servers could be reached: DNS timeout
DNS queries are not reaching CoreDNS or CoreDNS is not responding.
Check CoreDNS pod health (step 2). If CoreDNS pods are healthy, possible causes:
kubectl top pods -n kube-system -l k8s-app=kube-dns
SERVFAIL: Server failure
CoreDNS received the query but failed to resolve it. Common causes:
kubectl get configmap -n kube-system coredns -o yaml
Look for the forward directive — it defines where CoreDNS forwards external queries.
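To pull just the forward directive out of the Corefile, a short awk filter is usually enough. The sketch below is illustrative: the function name `show_forward_targets` is hypothetical, and it assumes the Corefile is stored under the `Corefile` key of the ConfigMap and that each forward directive sits on one line.

```shell
# Hypothetical helper: reads a Corefile on stdin and prints the zone and the
# upstream targets of every forward directive, dropping a trailing "{".
show_forward_targets() {
  awk '$1 == "forward" {
    zone = $2
    $1 = ""; $2 = ""
    sub(/^ +/, "")
    sub(/ *[{]$/, "")
    print "zone=" zone " upstreams=" $0
  }'
}

# Usage:
#   kubectl get configmap -n kube-system coredns -o jsonpath='{.data.Corefile}' \
#     | show_forward_targets
```

If the upstream is `/etc/resolv.conf`, CoreDNS forwards to the nameservers listed in the resolv.conf seen by the CoreDNS pod, so reachability of those servers is the next thing to report.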
ndots / slow external DNS resolution: Search domain misconfiguration
By default, Kubernetes sets ndots:5 in pods' resolv.conf, causing external domains (e.g., api.example.com) to be tried with cluster search domains first, leading to unnecessary NXDOMAIN queries before the real resolution.
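The effect of ndots can be illustrated by listing the names a resolver would try for a given lookup. This is a simplified model of glibc-style search-list behavior, not an exact reproduction of any particular resolver; the function name `candidate_queries` and the sample search domains are assumptions for illustration.

```shell
# Hypothetical helper: prints, in order, the names a resolver would try.
#   $1 = name to look up, $2 = ndots value, remaining args = search domains.
candidate_queries() {
  name="$1"; ndots="$2"; shift 2
  case "$name" in
    *.) echo "$name"; return ;;   # trailing dot: fully qualified, no search
  esac
  # Count the dots in the name.
  dots_only="$(printf '%s' "$name" | tr -cd '.')"
  dots=${#dots_only}
  # Enough dots: the name itself is tried first, then the search list.
  if [ "$dots" -ge "$ndots" ]; then echo "$name."; fi
  for domain in "$@"; do echo "$name.$domain."; done
  # Too few dots: the name itself is tried only after the search list.
  if [ "$dots" -lt "$ndots" ]; then echo "$name."; fi
}

# Example: api.example.com has 2 dots, fewer than ndots=5.
#   candidate_queries api.example.com 5 \
#     default.svc.cluster.local svc.cluster.local cluster.local
# prints:
#   api.example.com.default.svc.cluster.local.
#   api.example.com.svc.cluster.local.
#   api.example.com.cluster.local.
#   api.example.com.
```

In this model the first three lookups return NXDOMAIN before the real name is tried, which is the extra latency described above.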
Check the pod's DNS configuration:
pod_exec: pod=<pod>, namespace=<ns>, command="cat /etc/resolv.conf"
If ndots:5 is set and the pod frequently resolves external domains, advise the user to set dnsConfig.options in the pod spec to lower ndots or add specific searches entries.
CrashLoopBackOff / not ready: CoreDNS failure
CoreDNS itself is failing. Check CoreDNS logs (step 3) for the specific error.
Common causes:
forward target is not reachable
kube-dns: No CoreDNS backends
The DNS service has no endpoints, meaning no CoreDNS pods are ready. All DNS queries in the cluster will fail.
Check CoreDNS pod status and events for why they are not ready.
The kube-dns service name is used even when CoreDNS is the DNS provider, for backwards compatibility.
For pods with hostNetwork: true, DNS resolution uses the node's /etc/resolv.conf instead of the cluster DNS. These pods cannot resolve cluster-internal service names by default.
*.svc.cluster.local names are only resolvable from within the cluster.
If a NetworkPolicy may be blocking DNS traffic, use networkpolicy-debug to check before investigating CoreDNS.