Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

notque/kubernetes-debugging

Name: kubernetes-debugging
Author: notque

skills/infrastructure/kubernetes-debugging/SKILL.md

npx skillsauth add notque/claude-code-toolkit kubernetes-debugging

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

Kubernetes Debugging Skill

Systematic diagnosis of pod failures, networking issues, and resource problems using a structured triage flow: describe, logs, events, exec.

Reference Loading Table

| Signal | Reference | Size | |--------|-----------|------| | CrashLoopBackOff, OOMKilled, config error, health check, liveness probe, ImagePullBackOff, image pull, registry auth, Pending, FailedScheduling, node affinity, taint, PVC | references/crash-diagnosis.md | ~140 lines | | service resolution, DNS, nslookup, CoreDNS, port-forward, NetworkPolicy, ingress, egress | references/network-debugging.md | ~50 lines | | CPU throttling, memory limit, OOMKill, ephemeral storage, DiskPressure, debug container, distroless, kubectl reference, rollout, exec | references/resource-debugging.md | ~100 lines |

Load greedily. If the user's question touches any signal keyword, load the matching reference before responding. Multiple signals matching = load all matching references.

Instructions

Triage Flow

Follow this sequence for every pod or workload issue. Do not skip steps -- many failures (scheduling, image pull, volume mount) are only visible in events and describe output, not in logs, so jumping straight to logs misses them.

Always specify -n <namespace> explicitly in every command; never rely on the default context namespace, because the wrong namespace silently returns empty or misleading results.

# 1. Get an overview of the resource state
kubectl get pods -n <namespace> -o wide

# 2. Describe the resource for events, conditions, and status
kubectl describe pod <pod-name> -n <namespace>

# 3. Check current container logs
kubectl logs <pod-name> -n <namespace> -c <container-name>

# 4. Check previous container logs (critical for CrashLoopBackOff)
# Always check --previous before current logs for crashed containers,
# because deleting or restarting the pod destroys these logs permanently.
kubectl logs <pod-name> -n <namespace> -c <container-name> --previous

# 5. Check namespace events sorted by time
kubectl get events -n <namespace> --sort-by='.lastTimestamp'

# 6. If the container is running, exec in for live inspection
kubectl exec -it <pod-name> -n <namespace> -c <container-name> -- /bin/sh

Use read-only commands (describe, logs, get) to gather evidence before proposing any modifications. Never suggest changes based on assumptions -- gather diagnostic output first.

Diagnosis Routing

Based on triage output, load the appropriate reference and follow its diagnosis flow:

| Symptom | Reference | |---------|-----------| | Pod status CrashLoopBackOff, ImagePullBackOff, or Pending | references/crash-diagnosis.md | | Service unreachable, DNS failure, connection refused | references/network-debugging.md | | CPU throttling, OOMKill, disk pressure, need debug container | references/resource-debugging.md |

Error: "no endpoints available for service"

Cause: The Service selector does not match any running pod labels. Solution: Compare kubectl get svc <name> -o yaml selector with kubectl get pods --show-labels. Fix the label mismatch.

References

kubernetes-security skill -- NetworkPolicy patterns and RBAC debugging

notque/kubernetes-debugging

skills/infrastructure/kubernetes-debugging/SKILL.md

Kubernetes debugging for pod failures and networking.

404 stars

development

Updated Jul 6, 2026

$ install --global

skillsauth

npx skillsauth add notque/claude-code-toolkit kubernetes-debugging

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Jul 6, 2026, 5:09 AM279.7s4 files scanned

SKILL.md

name:: kubernetes-debugging
promoted_to:: kubernetes
description:: Kubernetes debugging for pod failures and networking.
user-invocable:: false
context:: fork
agent:: kubernetes-helm-engineer
category:: kubernetes

Kubernetes Debugging Skill

Systematic diagnosis of pod failures, networking issues, and resource problems using a structured triage flow: describe, logs, events, exec.

Reference Loading Table

Load greedily. If the user's question touches any signal keyword, load the matching reference before responding. Multiple signals matching = load all matching references.

Instructions

Triage Flow

Always specify -n <namespace> explicitly in every command; never rely on the default context namespace, because the wrong namespace silently returns empty or misleading results.

# 1. Get an overview of the resource state
kubectl get pods -n <namespace> -o wide

# 2. Describe the resource for events, conditions, and status
kubectl describe pod <pod-name> -n <namespace>

# 3. Check current container logs
kubectl logs <pod-name> -n <namespace> -c <container-name>

# 4. Check previous container logs (critical for CrashLoopBackOff)
# Always check --previous before current logs for crashed containers,
# because deleting or restarting the pod destroys these logs permanently.
kubectl logs <pod-name> -n <namespace> -c <container-name> --previous

# 5. Check namespace events sorted by time
kubectl get events -n <namespace> --sort-by='.lastTimestamp'

# 6. If the container is running, exec in for live inspection
kubectl exec -it <pod-name> -n <namespace> -c <container-name> -- /bin/sh

Use read-only commands (describe, logs, get) to gather evidence before proposing any modifications. Never suggest changes based on assumptions -- gather diagnostic output first.

Diagnosis Routing

Based on triage output, load the appropriate reference and follow its diagnosis flow:

Error: "no endpoints available for service"

Cause: The Service selector does not match any running pod labels. Solution: Compare kubectl get svc <name> -o yaml selector with kubectl get pods --show-labels. Fix the label mismatch.

References

kubernetes-security skill -- NetworkPolicy patterns and RBAC debugging

Related Skills

notque/shell-config

tools

VerifiedTrustedCommunity

Shell configuration: Fish and Zsh setup, PATH, completions, plugins.

412SKILL.mdUpdated Jul 6, 2026