dot_config/opencode/skills/debugging-k8s-scheduling/SKILL.md
Debugs Kubernetes pod scheduling issues including pods stuck in Pending, node affinity/anti-affinity problems, taints and tolerations, insufficient node resources, and PodDisruptionBudget blocking. Use when pods won't schedule, are stuck in Pending, or land on the wrong nodes.
Investigates why pods are not being scheduled to nodes.
| Symptom | Likely Cause | First Check |
|---------|-------------|-------------|
| Pending (no nodes available) | Resource shortage | Node capacity |
| Pending (taints) | Missing toleration | Node taints |
| Pending (affinity) | No matching node | Affinity rules |
| Pending (PDB) | Disruption budget | PDB status |
| Unschedulable | Node cordoned | Node status |
# Describe pod - look at Events section
kubectl describe pod <pod> -n <ns>
# Events will show scheduling failure reason:
# "0/3 nodes are available: 1 Insufficient cpu, 2 node(s) had taint..."
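The event message packs several failure reasons into one line. As a minimal sketch, the helper below splits such a message into one reason per line so each can be checked in turn. The function name `split_sched_reasons` is hypothetical, and the sketch assumes the reasons are separated by plain commas as in the example above; real messages may carry extra trailing text.

```shell
# Split the reason list of a scheduling failure message into one reason per line.
# Assumed input shape: "<fit>/<total> nodes are available: <n> <reason>, <n> <reason>"
split_sched_reasons() {
  msg=$1
  # Drop everything up to and including "nodes are available: ".
  reasons=${msg#*nodes are available: }
  # Put each comma-separated reason on its own line and trim leading spaces.
  printf '%s\n' "$reasons" | tr ',' '\n' | sed 's/^ *//'
}

split_sched_reasons "0/3 nodes are available: 1 Insufficient cpu, 2 node(s) had taint"
```

Each output line starts with the number of nodes rejected for that reason, which shows which check to run first.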
# List nodes with status
kubectl get nodes
# Node details
kubectl describe node <node>
# Check for cordoned nodes (UNSCHEDULABLE shows true for a cordoned node, <none> otherwise)
kubectl get nodes -o custom-columns=NAME:.metadata.name,UNSCHEDULABLE:.spec.unschedulable
# Resource availability per node
kubectl describe node <node> | grep -A10 "Allocated resources:"
# Quick capacity check
kubectl top nodes
# Node taints
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
# Detailed taints
kubectl describe node <node> | grep -A5 "Taints:"
# Pod tolerations
kubectl get pod <pod> -n <ns> -o jsonpath='{.spec.tolerations}'
# Check what pod needs
kubectl get pod <pod> -n <ns> -o jsonpath='{.spec.containers[*].resources.requests}'
# Check what nodes have available
kubectl describe nodes | grep -A10 "Allocated resources:"
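Comparing the two outputs by eye is error-prone because CPU quantities appear both as cores ("2", "0.5") and as millicores ("500m"). The sketch below normalizes them and checks whether a request fits. The helper names `to_millicores` and `cpu_fits` are hypothetical, and the values passed in are assumed to be copied by hand from the commands above.

```shell
# Convert a Kubernetes CPU quantity ("500m", "2", "0.5") to millicores.
to_millicores() {
  case $1 in
    *m) printf '%s\n' "${1%m}" ;;                                   # already millicores
    *)  awk -v c="$1" 'BEGIN { printf "%d\n", c * 1000 + 0.5 }' ;;  # cores to millicores
  esac
}

# Succeed if the request fits into allocatable CPU minus CPU already requested.
# Args: pod_request node_allocatable node_requested
cpu_fits() {
  request=$(to_millicores "$1")
  allocatable=$(to_millicores "$2")
  allocated=$(to_millicores "$3")
  [ "$request" -le $((allocatable - allocated)) ]
}

# Example: a 500m request against a 4-core node with 3800m already requested.
if cpu_fits 500m 4 3800m; then echo "fits"; else echo "Insufficient cpu"; fi
```

The scheduler compares requests against allocatable capacity, not against live usage, so a node can look idle in kubectl top nodes and still be full.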
Error: 0/3 nodes are available: 3 Insufficient cpu
Options:
# Common taints
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
# Example: node.kubernetes.io/not-ready:NoSchedule
Pods must have matching tolerations to schedule on tainted nodes.
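What "matching" means can be sketched as a small predicate. This is a simplified illustration of the documented rules for the Exists and Equal operators only, not the scheduler's actual code; the function name `toleration_matches` and its argument order are hypothetical.

```shell
# Succeed if one toleration matches one taint.
# Args: taint_key taint_value taint_effect tol_key tol_operator tol_value tol_effect
toleration_matches() {
  taint_key=$1 taint_value=$2 taint_effect=$3
  tol_key=$4 tol_operator=$5 tol_value=$6 tol_effect=$7

  # An empty toleration effect matches every effect.
  [ -z "$tol_effect" ] || [ "$tol_effect" = "$taint_effect" ] || return 1

  # An empty toleration key with operator Exists matches every taint key.
  if [ -z "$tol_key" ]; then
    [ "$tol_operator" = "Exists" ]
    return
  fi

  [ "$tol_key" = "$taint_key" ] || return 1

  case $tol_operator in
    Exists)   return 0 ;;
    Equal|"") [ "$tol_value" = "$taint_value" ] ;;  # Equal is the default operator
    *)        return 1 ;;
  esac
}

# Example: a pod tolerating dedicated=batch does not tolerate dedicated=gpu.
toleration_matches dedicated gpu NoSchedule dedicated Equal batch NoSchedule \
  || echo "taint not tolerated"
```

A pod schedules onto a node only if every NoSchedule taint on that node is matched by at least one of its tolerations.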
Common taints:
- node.kubernetes.io/not-ready - Node not ready
- node.kubernetes.io/unreachable - Node unreachable
- node.kubernetes.io/disk-pressure - Disk pressure
- node.kubernetes.io/memory-pressure - Memory pressure

# Check pod affinity rules
kubectl get pod <pod> -n <ns> -o jsonpath='{.spec.affinity}'
# Check node labels
kubectl get nodes --show-labels
# Check specific label
kubectl get nodes -l <label-key>=<label-value>
Error: 0/3 nodes are available: 3 node(s) didn't match Pod's node affinity/selector
# Check podAntiAffinity
kubectl get pod <pod> -n <ns> -o yaml | grep -A20 "podAntiAffinity"
Anti-affinity might prevent scheduling if:
# List PDBs
kubectl get pdb -n <ns>
# PDB details
kubectl describe pdb <pdb> -n <ns>
# Check allowed disruptions
kubectl get pdb -n <ns> -o custom-columns=NAME:.metadata.name,MIN-AVAILABLE:.spec.minAvailable,ALLOWED-DISRUPTIONS:.status.disruptionsAllowed
If ALLOWED-DISRUPTIONS is 0, voluntary evictions (for example, during a node drain) are blocked, so the affected pods cannot be evicted and rescheduled.
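The allowed-disruptions figure is simple arithmetic: the number of currently healthy pods minus the number the budget requires, never below zero. A minimal sketch, assuming an integer minAvailable (percentages and maxUnavailable are not handled) and a hypothetical helper name `allowed_disruptions`:

```shell
# Disruptions allowed for a PDB with an integer minAvailable.
# Args: current_healthy min_available
allowed_disruptions() {
  allowed=$(( $1 - $2 ))
  # The value is clamped at zero: a budget cannot go negative.
  [ "$allowed" -lt 0 ] && allowed=0
  printf '%s\n' "$allowed"
}

# Example: 3 healthy replicas with minAvailable 3 leave no room for eviction.
allowed_disruptions 3 3
```

With these numbers, evictions stay blocked until a replica is added or minAvailable is lowered.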
# Pod's nodeSelector
kubectl get pod <pod> -n <ns> -o jsonpath='{.spec.nodeSelector}'
# Find nodes matching selector
kubectl get nodes -l <key>=<value>
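A nodeSelector matches only when every one of its key=value pairs is present on the node. The sketch below applies that rule to a label string in the comma-separated form printed by --show-labels. The function name `labels_match` is hypothetical, and the label strings are assumed to be pasted in by hand.

```shell
# Succeed if every key=value pair in the selector appears in the node's labels.
# Both arguments are comma-separated key=value lists.
labels_match() {
  node_labels=$1
  selector=$2
  old_ifs=$IFS
  IFS=,
  for pair in $selector; do
    # Wrap in commas so "zone=eu-1" cannot match inside "zone=eu-10".
    case ",$node_labels," in
      *",$pair,"*) ;;
      *) IFS=$old_ifs; return 1 ;;
    esac
  done
  IFS=$old_ifs
  return 0
}

# Example: the node has an hdd, the pod asks for an ssd.
labels_match "disktype=hdd,zone=eu-1" "disktype=ssd" || echo "no match"
```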
# All pending pods cluster-wide
kubectl get pods -A --field-selector=status.phase=Pending
# Scheduling events
kubectl get events -A --field-selector reason=FailedScheduling
# Node conditions summary
kubectl get nodes -o custom-columns='NAME:.metadata.name,READY:.status.conditions[?(@.type=="Ready")].status,UNSCHEDULABLE:.spec.unschedulable'
Filter nodes: Remove nodes that don't meet requirements
Score nodes: Rank remaining nodes
Bind: Assign pod to highest-scoring node
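The three phases above can be sketched as a toy pipeline. This is an illustration only: the real scheduler runs many filter and score plugins, whereas this sketch filters on free CPU alone and uses free CPU as the score. The function name `pick_node` and the input format of one "node free_millicores" pair per line are assumptions.

```shell
# Toy scheduler. Reads "<node> <free_millicores>" lines on stdin.
# Arg: the pod's CPU request in millicores.
# Prints the chosen node, or nothing if no node passes the filter.
pick_node() {
  request=$1
  # Filter: keep nodes with enough free CPU, emitting "<score> <node>".
  awk -v req="$request" '$2 + 0 >= req + 0 { print $2, $1 }' |
    # Score: rank by free CPU, highest first (an assumption for illustration).
    sort -k1,1nr |
    # Bind: take the highest-scoring node.
    head -n 1 | awk '{ print $2 }'
}

printf 'node-a 200\nnode-b 1500\nnode-c 800\n' | pick_node 500
```

An empty result corresponds to a pod staying Pending: every node was removed in the filter phase, so there is nothing left to score.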
| Constraint | Hard/Soft | Effect |
|------------|-----------|--------|
| nodeSelector | Hard | Must match node label |
| requiredDuringScheduling | Hard | Must satisfy affinity |
| preferredDuringScheduling | Soft | Prefer if possible |
| Taint (NoSchedule) | Hard | Must have toleration |
| Taint (PreferNoSchedule) | Soft | Avoid if possible |
| PDB | Hard | Blocks eviction |
- debugging-k8s-resources for resource-related scheduling issues
- analyzing-k8s-events for scheduling event history