dot_config/opencode/skills/debugging-k8s-pods/SKILL.md
Debugs Kubernetes pod failures including CrashLoopBackOff, OOMKilled, ImagePullBackOff, init container failures, and CreateContainerConfigError. Use when pods crash, restart repeatedly, fail to start, or show container errors.
npx skillsauth add rio/dotfiles debugging-k8s-pods
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Investigates pod lifecycle issues and container failures.
| Status | Likely Cause | First Check |
|--------|-------------|-------------|
| CrashLoopBackOff | App crash or misconfiguration | Logs + exit code |
| ImagePullBackOff | Wrong image, missing tag, auth failure | Image name + pull secret |
| OOMKilled | Memory limit exceeded | Resource limits vs actual usage |
| CreateContainerConfigError | Missing ConfigMap/Secret | Referenced configs exist |
| Init:Error | Init container failed | Init container logs |
| Pending | Scheduling issue | Load debugging-k8s-scheduling |
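The table above can be expressed as a small lookup for use in scripts. This is only a sketch: `first_check` is a hypothetical helper name, not part of kubectl or of this skill, and it covers only the statuses listed in the table.

```shell
# first_check STATUS
# Prints the "First Check" column of the quick-reference table for STATUS.
# Hypothetical helper; returns 1 for any status the table does not list.
first_check() {
  case "$1" in
    CrashLoopBackOff)           echo "Logs + exit code" ;;
    ImagePullBackOff)           echo "Image name + pull secret" ;;
    OOMKilled)                  echo "Resource limits vs actual usage" ;;
    CreateContainerConfigError) echo "Referenced configs exist" ;;
    Init:Error)                 echo "Init container logs" ;;
    Pending)                    echo "Load debugging-k8s-scheduling" ;;
    *)
      echo "Status not in table: $1 (start with kubectl describe pod)" >&2
      return 1
      ;;
  esac
}

# Possible usage against a live cluster (STATUS is the third column of the
# default output; replace the placeholders first):
#   status=$(kubectl get pod <pod> -n <ns> --no-headers | awk '{print $3}')
#   first_check "$status"
```

Keeping the mapping in one `case` statement makes it easy to extend when a new status shows up in practice.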
kubectl get pod <pod> -n <ns> -o wide
kubectl describe pod <pod> -n <ns>
Look for:
# Current container logs
kubectl logs <pod> -n <ns>
# Previous crashed container logs
kubectl logs <pod> -n <ns> --previous
# Specific container in multi-container pod
kubectl logs <pod> -n <ns> -c <container>
# Init container logs
kubectl logs <pod> -n <ns> -c <init-container-name>
| Exit Code | Meaning |
|-----------|---------|
| 0 | Success (check why it exited) |
| 1 | Application error |
| 137 | SIGKILL (OOMKilled or external kill) |
| 139 | SIGSEGV (segmentation fault) |
| 143 | SIGTERM (graceful shutdown requested) |
Get exit code:
kubectl get pod <pod> -n <ns> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
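The codes 137, 139, and 143 in the table follow the usual convention that a process killed by signal N is reported as 128 + N (9 for SIGKILL, 11 for SIGSEGV, 15 for SIGTERM). A minimal sketch of a decoder, assuming that convention holds for the container in question; `describe_exit_code` is a hypothetical helper name, and an application can also choose to exit with a code above 128 on its own.

```shell
# describe_exit_code CODE
# Prints the table's meaning for known codes. For other codes above 128 it
# assumes the 128 + signal-number convention and prints the signal number.
describe_exit_code() {
  code=$1
  case "$code" in
    0)   echo "Success (check why it exited)" ;;
    1)   echo "Application error" ;;
    137) echo "SIGKILL (OOMKilled or external kill)" ;;
    139) echo "SIGSEGV (segmentation fault)" ;;
    143) echo "SIGTERM (graceful shutdown requested)" ;;
    *)
      # A non-numeric argument makes the test fail and falls through to else.
      if [ "$code" -gt 128 ] 2>/dev/null; then
        echo "Terminated by signal $((code - 128))"
      else
        echo "Application-defined exit code $code"
      fi
      ;;
  esac
}

# Possible usage against a live cluster (replace the placeholders first):
#   describe_exit_code "$(kubectl get pod <pod> -n <ns> \
#     -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}')"
```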
# Check logs from crashed container
kubectl logs <pod> -n <ns> --previous
# Check restart count and last state
kubectl get pod <pod> -n <ns> -o jsonpath='{.status.containerStatuses[0].restartCount}'
Common causes: an application crash or a misconfiguration.
# Check image name in events
kubectl describe pod <pod> -n <ns> | grep -A5 "Events:"
# Check if pull secret exists
kubectl get secrets -n <ns>
Common causes: a wrong image name, a missing tag (for example, `latest` removed), or a registry authentication failure.
# Check memory limits
kubectl get pod <pod> -n <ns> -o jsonpath='{.spec.containers[*].resources}'
# Check if OOMKilled
kubectl get pod <pod> -n <ns> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
If OOMKilled: either increase memory limits or investigate memory leak.
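Deciding between raising the limit and hunting a leak usually starts with comparing the configured limit to observed usage. The sketch below is one way to do that arithmetic in plain shell. The helper names `mem_to_bytes` and `mem_usage_percent` are hypothetical, and the parser is deliberately narrow: it handles only whole-number quantities with the suffixes `Ki`, `Mi`, `Gi`, `k`, `M`, `G`, or no suffix.

```shell
# mem_to_bytes QUANTITY
# Converts a whole-number Kubernetes memory quantity (e.g. 512Mi) to bytes.
# Fractional values and other suffixes are rejected with exit status 1.
mem_to_bytes() {
  q=$1
  case "$q" in
    *Ki) n=${q%Ki}; mult=1024 ;;
    *Mi) n=${q%Mi}; mult=$((1024 * 1024)) ;;
    *Gi) n=${q%Gi}; mult=$((1024 * 1024 * 1024)) ;;
    *k)  n=${q%k};  mult=1000 ;;
    *M)  n=${q%M};  mult=1000000 ;;
    *G)  n=${q%G};  mult=1000000000 ;;
    *)   n=$q;      mult=1 ;;
  esac
  case "$n" in
    ''|*[!0-9]*) echo "unsupported quantity: $q" >&2; return 1 ;;
  esac
  echo $((n * mult))
}

# mem_usage_percent USED LIMIT
# Prints USED as an integer percentage of LIMIT (rounded down).
mem_usage_percent() {
  used=$(mem_to_bytes "$1") || return 1
  limit=$(mem_to_bytes "$2") || return 1
  [ "$limit" -gt 0 ] || { echo "limit must be greater than zero" >&2; return 1; }
  echo $((used * 100 / limit))
}

# Example with made-up numbers: 450Mi used against a 512Mi limit.
#   mem_usage_percent 450Mi 512Mi
```

Current usage can come from `kubectl top pod <pod> -n <ns> --containers`, which needs metrics-server installed in the cluster. Usage that sits close to the limit under normal load suggests the limit is too low, while usage that keeps climbing over time points toward a leak.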
# Check what ConfigMap/Secret is referenced
kubectl get pod <pod> -n <ns> -o yaml | grep -A10 "env:\|envFrom:\|volumes:"
# Verify ConfigMap exists
kubectl get configmap -n <ns>
# Verify Secret exists
kubectl get secrets -n <ns>
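The checks above amount to a set difference: names the pod references minus names that exist in the namespace. A minimal sketch of that comparison follows. `missing_refs` is a hypothetical helper, and the commented usage covers only ConfigMaps pulled in through `envFrom`; references made through `env[].valueFrom` or `volumes` would need their own jsonpath queries.

```shell
# missing_refs "REFERENCED" "EXISTING"
# Both arguments are whitespace-separated lists of names. Prints every name
# from REFERENCED that does not appear in EXISTING, one per line.
missing_refs() {
  for ref in $1; do
    found=0
    for name in $2; do
      if [ "$ref" = "$name" ]; then
        found=1
        break
      fi
    done
    if [ "$found" -eq 0 ]; then
      echo "$ref"
    fi
  done
}

# Possible usage against a live cluster (replace the placeholders first):
#   referenced=$(kubectl get pod <pod> -n <ns> \
#     -o jsonpath='{.spec.containers[*].envFrom[*].configMapRef.name}')
#   existing=$(kubectl get configmap -n <ns> \
#     -o jsonpath='{.items[*].metadata.name}')
#   missing_refs "$referenced" "$existing"
```

Any name it prints is a candidate cause of the CreateContainerConfigError; the events from `kubectl describe pod` normally name the missing object as well.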
# List init containers
kubectl get pod <pod> -n <ns> -o jsonpath='{.spec.initContainers[*].name}'
# Check init container logs
kubectl logs <pod> -n <ns> -c <init-container-name>
# Full pod YAML for deep inspection
kubectl get pod <pod> -n <ns> -o yaml
# Events for this pod only
kubectl get events -n <ns> --field-selector involvedObject.name=<pod>
# Check all containers status
kubectl get pod <pod> -n <ns> -o jsonpath='{range .status.containerStatuses[*]}{.name}: {.state}{"\n"}{end}'
- retrieving-k8s-logs for advanced log patterns
- debugging-k8s-resources if OOMKilled due to limits
- debugging-k8s-scheduling if stuck in Pending