skills/core/pod-pending-debug/SKILL.md
Diagnose pod scheduling failures (Pending, Unschedulable). Checks events, node resources, taints, affinity, and PVC bindings to identify why a pod cannot be scheduled.
npx skillsauth add scitix/siclaw pod-pending-debug
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
When a pod is stuck in Pending state, follow this flow to identify why the scheduler cannot place it on a node.
Scope: This skill is for diagnosis only. Once you identify the root cause, report it to the user and stop. Do NOT attempt to modify node taints, labels, or pod specs — that should be left to the user.
kubectl describe pod <pod> -n <ns>
Focus on the Events section. The scheduler's FailedScheduling event contains the reason. Note the full event message — it lists how many nodes were evaluated and why each was rejected.
Match the FailedScheduling message against the patterns below.
Insufficient cpu / Insufficient memory: Not enough resources
No node has enough allocatable CPU or memory to satisfy the pod's resource requests.
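Check node resource usage. The scheduler compares the pod's requests against each node's allocatable capacity minus the requests of pods already placed there, not against live usage, so also read the Allocated resources section of kubectl describe node <node>: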
kubectl top nodes
Check what the pod is requesting:
kubectl get pod <pod> -n <ns> -o jsonpath='{.spec.containers[*].resources.requests}'
Advise the user to either reduce the pod's resource requests, scale up existing nodes, or add new nodes to the cluster.
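CPU requests and node capacity are often written in different units (for example 500m versus 0.5), which makes them awkward to compare by eye. The following is a minimal sketch, assuming POSIX sh and awk are available; the helper names to_millicores and cpu_fits are hypothetical and are not part of kubectl or of this skill.

```shell
# to_millicores QUANTITY
# Converts a Kubernetes CPU quantity ("500m", "2", "0.5") to whole millicores.
to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;                                          # already millicores
    *)  awk -v v="$1" 'BEGIN { printf "%d\n", v * 1000 + 0.5 }' ;; # cores -> millicores
  esac
}

# cpu_fits REQUEST AVAILABLE
# Succeeds (exit 0) when REQUEST is no larger than AVAILABLE.
# AVAILABLE is whatever headroom you computed for a node; this helper
# only does the unit conversion and the comparison.
cpu_fits() {
  [ "$(to_millicores "$1")" -le "$(to_millicores "$2")" ]
}
```

For example, `cpu_fits 500m 1 && echo "fits"` compares a 500m request against one free core.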
didn't match Pod's node affinity/selector: Node affinity/selector mismatch
The pod has a nodeSelector or nodeAffinity that no available node satisfies.
Check the pod's node selection criteria:
kubectl get pod <pod> -n <ns> -o jsonpath='{.spec.nodeSelector}'
kubectl get pod <pod> -n <ns> -o jsonpath='{.spec.affinity}'
Check available node labels:
kubectl get nodes --show-labels
Advise the user to either update the pod's selector/affinity or add the required labels to appropriate nodes.
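For the simple nodeSelector case, the comparison between a node's labels and the pod's selector can be sketched as below. This is an illustration only: labels_match_selector is a hypothetical helper, it assumes both arguments are comma-separated key=value lists (the layout of the LABELS column printed by --show-labels), and it does not evaluate nodeAffinity expressions such as In, NotIn, or Exists.

```shell
# labels_match_selector LABELS SELECTOR
# LABELS:   comma-separated key=value list for one node
# SELECTOR: comma-separated key=value pairs that must all be present
# Succeeds (exit 0) only when every selector pair appears in LABELS.
# The body runs in a subshell, so IFS and set -f do not leak to the caller.
labels_match_selector() (
  set -f                      # no filename expansion while splitting
  labels=",$1,"
  IFS=,
  for pair in $2; do
    case "$labels" in
      *",$pair,"*) ;;         # this pair is present on the node
      *) exit 1 ;;            # missing pair: node does not match
    esac
  done
  exit 0
)
```

For example, `labels_match_selector "kubernetes.io/os=linux,disktype=ssd" "disktype=ssd"` succeeds, while the same selector against a node without the disktype label fails.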
had taint ... that the pod didn't tolerate: Taint/toleration mismatch
Nodes have taints that the pod does not tolerate.
Check node taints:
kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'
Check the pod's tolerations:
kubectl get pod <pod> -n <ns> -o jsonpath='{.spec.tolerations}'
Advise the user to either add the appropriate toleration to the pod or remove the taint from a node.
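When the event message is long, it can help to pull out just the taints it names before comparing them with the pod's tolerations. A small sketch, assuming the message prints each taint inside braces (as in the had taint {key: value} wording) and that grep supports -o; extract_taints is a hypothetical helper name.

```shell
# extract_taints MESSAGE
# Prints each distinct {key: value} taint mentioned in a FailedScheduling
# message, one per line. Prints nothing if the message names no taints.
extract_taints() {
  printf '%s\n' "$1" | grep -oE '[{][^}]*[}]' | LC_ALL=C sort -u
}
```

The message text itself would be copied from the Events section of kubectl describe pod.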
persistentvolumeclaim ... not found / not bound: PVC issue
The pod references a PVC that does not exist or is not bound to a PV.
Check PVC status:
kubectl get pvc -n <ns>
If the PVC exists but is Pending, check its events:
kubectl describe pvc <pvc-name> -n <ns>
Common causes: no matching PV, StorageClass not found, or provisioner failed.
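In a namespace with many claims, a filter over the PVC listing can narrow attention to the ones that are not bound. A minimal sketch, assuming the default column layout where NAME is the first column and STATUS the second; unbound_pvcs is a hypothetical helper that reads the listing on standard input.

```shell
# unbound_pvcs
# Reads `kubectl get pvc --no-headers` output on stdin and prints
# "NAME STATUS" for every claim whose status is not Bound.
unbound_pvcs() {
  awk '$2 != "Bound" { print $1 " " $2 }'
}
```

Typical use would be `kubectl get pvc -n <ns> --no-headers | unbound_pvcs`, followed by kubectl describe pvc on each name it prints.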
0/N nodes are available (all filtered): No nodes available
Every node in the cluster was rejected. The message usually lists multiple reasons. Address each reason individually; the most impactful one is typically resource insufficiency or taints.
didn't find available persistent volumes: No matching PV
The PVC exists but no PV matches its requirements (size, access mode, storage class).
kubectl get pv
kubectl get pvc <pvc-name> -n <ns> -o yaml
pod has unbound immediate PersistentVolumeClaims: PVC not yet bound
The PVC is waiting for a PV to be provisioned. Check if the StorageClass provisioner is working:
kubectl get storageclass
kubectl get events -n <ns> --field-selector involvedObject.name=<pvc-name>
Preempting: Scheduler is preempting lower-priority pods
The scheduler is attempting to evict lower-priority pods to make room. This is normal behavior for priority-based scheduling. If the pod remains Pending after preemption, there may be additional constraints.
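The patterns above can be sketched as a small classifier that reports every category a message touches, since one FailedScheduling message often lists several reasons. This is an illustration under assumptions: classify_failed_scheduling and the category labels it prints are hypothetical, and the substrings are taken from the patterns listed in this document, so wording from other Kubernetes versions may fall through to unknown.

```shell
# classify_failed_scheduling MESSAGE
# Prints one hypothetical category label per line for each pattern found
# in MESSAGE, or "unknown" when none match. Runs in a subshell so its
# variables do not leak into the caller.
classify_failed_scheduling() (
  msg=$1
  found=0
  case "$msg" in *"Insufficient cpu"*|*"Insufficient memory"*)
    echo "insufficient-resources"; found=1 ;; esac
  case "$msg" in *"node affinity"*|*"node selector"*)
    echo "affinity-selector"; found=1 ;; esac
  case "$msg" in *"taint"*)
    echo "taint-toleration"; found=1 ;; esac
  case "$msg" in *[Pp]ersistent[Vv]olume[Cc]laim*|*"persistent volumes"*)
    echo "pvc"; found=1 ;; esac
  case "$msg" in *"Preempting"*)
    echo "preemption"; found=1 ;; esac
  [ "$found" -eq 1 ] || echo "unknown"
)
```

Each label maps back to one of the pattern sections, which is where the follow-up commands for that cause are listed.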
If no FailedScheduling event exists, the pod may not have been processed by the scheduler yet. Check whether the scheduler pod itself is healthy:
kubectl get pods -n kube-system -l component=kube-scheduler
If the pod has a scheduling.volcano.sh/pod-group annotation, it is managed by the Volcano scheduler. Use the volcano-diagnose-pod skill instead for Volcano-specific issues (PodGroup, Queue, Gang scheduling).
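The Volcano check in the notes above can be sketched as a lookup of the pod-group annotation followed by a routing decision. This is a sketch only: pod_group_annotation and suggest_skill are hypothetical helper names, and the jsonpath expression assumes that dots in the annotation key are escaped with backslashes, as kubectl's jsonpath output expects.

```shell
# pod_group_annotation POD NAMESPACE
# Prints the value of the scheduling.volcano.sh/pod-group annotation,
# or nothing if the pod does not carry it.
pod_group_annotation() {
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{.metadata.annotations.scheduling\.volcano\.sh/pod-group}'
}

# suggest_skill POD NAMESPACE
# Prints which skill to continue with, based on the annotation.
suggest_skill() {
  if [ -n "$(pod_group_annotation "$1" "$2")" ]; then
    echo "volcano-diagnose-pod"
  else
    echo "pod-pending-debug"
  fi
}
```

Running `suggest_skill <pod> <ns>` before the rest of the flow avoids diagnosing a Volcano-managed pod with default-scheduler assumptions.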