rohitg00/k8s-troubleshoot

Name: k8s-troubleshoot
Author: rohitg00

kubernetes-skills/claude/k8s-troubleshoot/SKILL.md

npx skillsauth add rohitg00/kubectl-mcp-server k8s-troubleshoot

Kubernetes Troubleshooting

Expert debugging and diagnostics for Kubernetes clusters using kubectl-mcp-server tools.

When to Apply

Use this skill when:

- User mentions: "debug", "troubleshoot", "diagnose", "failing", "crash", "not starting", "broken"
- Pod states: Pending, CrashLoopBackOff, ImagePullBackOff, OOMKilled, Error, Unknown
- Node issues: NotReady, MemoryPressure, DiskPressure, NetworkUnavailable, PIDPressure
- Keywords: "logs", "events", "describe", "why isn't working", "stuck", "not responding"

Priority Rules

| Priority | Rule | Impact | Tools |
|----------|------|--------|-------|
| 1 | Check pod status first | CRITICAL | get_pods, describe_pod |
| 2 | View recent events | CRITICAL | get_events |
| 3 | Inspect logs (including previous) | HIGH | get_pod_logs |
| 4 | Check resource metrics | HIGH | get_pod_metrics |
| 5 | Verify endpoints | MEDIUM | get_endpoints |
| 6 | Review network policies | MEDIUM | get_network_policies |
| 7 | Examine node status | LOW | get_nodes, describe_node |

Quick Reference

| Symptom | First Tool | Next Steps |
|---------|------------|------------|
| Pod Pending | describe_pod | Check events, node capacity, resource requests |
| CrashLoopBackOff | get_pod_logs(previous=True) | Check exit code, resources, liveness probes |
| ImagePullBackOff | describe_pod | Verify image name, registry auth, network |
| OOMKilled | get_pod_metrics | Increase memory limits, check for memory leaks |
| ContainerCreating | describe_pod | Check PVC binding, secrets, configmaps |
| Terminating (stuck) | describe_pod | Check finalizers, PDBs, preStop hooks |
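
The Quick Reference rows can be read as a simple lookup from symptom to first tool. The sketch below is illustrative only: `FIRST_TOOL` and `first_tool_for` are hypothetical names, the values are tool names copied from the table as plain strings (nothing is executed), and the fallback to `get_pods` is an assumption based on the "check pod status first" priority rule.

```python
# Hypothetical lookup mirroring the Quick Reference table.
# Values are tool names as written in the table, not live calls.
FIRST_TOOL = {
    "Pending": "describe_pod",
    "CrashLoopBackOff": "get_pod_logs(previous=True)",
    "ImagePullBackOff": "describe_pod",
    "OOMKilled": "get_pod_metrics",
    "ContainerCreating": "describe_pod",
    "Terminating": "describe_pod",
}


def first_tool_for(symptom: str) -> str:
    """Return the first tool for a symptom, defaulting to a pod status check."""
    # Assumption: unknown symptoms fall back to priority rule 1 (check pod status).
    return FIRST_TOOL.get(symptom, "get_pods")
```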

Diagnostic Workflows

Pod Not Starting

1. get_pods(namespace, label_selector) - Get pod status
2. describe_pod(name, namespace) - See events and conditions
3. get_events(namespace, field_selector="involvedObject.name=<pod>") - Check events
4. get_pod_logs(name, namespace, previous=True) - For crash loops
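
The four steps above can be sketched as an ordered sequence of calls. In practice an agent invokes these tools through kubectl-mcp-server, not as Python functions, so this example assumes the tools are handed in as plain callables in a dictionary. The function name `pod_not_starting` and the `tools` dictionary are hypothetical; the keyword arguments follow the signatures shown in the list.

```python
from typing import Any, Callable, Dict, List, Tuple

Tool = Callable[..., Any]


def pod_not_starting(
    tools: Dict[str, Tool], name: str, namespace: str
) -> List[Tuple[str, Any]]:
    """Run the four diagnostic steps in order and collect (step, result) pairs."""
    results: List[Tuple[str, Any]] = []
    # Step 1: overall pod status in the namespace.
    results.append(("get_pods", tools["get_pods"](namespace=namespace)))
    # Step 2: events and conditions for the specific pod.
    results.append(
        ("describe_pod", tools["describe_pod"](name=name, namespace=namespace))
    )
    # Step 3: events filtered to this pod via a field selector.
    selector = f"involvedObject.name={name}"
    results.append(
        ("get_events", tools["get_events"](namespace=namespace, field_selector=selector))
    )
    # Step 4: logs from the previous container instance, useful for crash loops.
    results.append(
        ("get_pod_logs", tools["get_pod_logs"](name=name, namespace=namespace, previous=True))
    )
    return results
```

Passing the tools in as arguments keeps the sketch independent of any particular client library and makes it easy to exercise with stubs.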

Common Pod States

| State | Likely Cause | Tools to Use |
|-------|-------------|--------------|
| Pending | Scheduling issues | describe_pod, get_nodes, get_events |
| ImagePullBackOff | Registry/auth | describe_pod, check image name |
| CrashLoopBackOff | App crash | get_pod_logs(previous=True) |
| OOMKilled | Memory limit | get_pod_metrics, adjust limits |
| ContainerCreating | Volume/network | describe_pod, get_pvc |

Node Issues

1. get_nodes() - List nodes and status
2. describe_node(name) - See conditions and capacity
3. Check: Ready, MemoryPressure, DiskPressure, PIDPressure
4. node_logs_tool(name, "kubelet") - Kubelet logs
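
Step 3 amounts to reading the node's condition list and flagging anything abnormal. The sketch below assumes the conditions arrive as a list of dictionaries with `type` and `status` keys, as in the Kubernetes Node status. The shape of what describe_node actually returns is not specified here, and `unhealthy_conditions` is a hypothetical helper.

```python
# Conditions where a status of "True" signals a problem.
PRESSURE_TYPES = {"MemoryPressure", "DiskPressure", "PIDPressure", "NetworkUnavailable"}


def unhealthy_conditions(conditions):
    """Return human-readable flags for node conditions that look unhealthy."""
    problems = []
    for cond in conditions:
        ctype = cond.get("type")
        status = cond.get("status")
        if ctype == "Ready" and status != "True":
            # Ready should be "True"; "False" or "Unknown" means NotReady.
            problems.append(f"Ready={status}")
        elif ctype in PRESSURE_TYPES and status == "True":
            problems.append(f"{ctype}=True")
    return problems
```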

Deep Debugging Workflows

CrashLoopBackOff Investigation

1. get_pod_logs(name, namespace, previous=True) - See why it crashed
2. describe_pod(name, namespace) - Check resource limits, probes
3. get_pod_metrics(name, namespace) - Memory/CPU at crash time
4. If OOM: compare requests/limits to actual usage
5. If app error: check logs for stack trace
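
For step 4, comparing a limit with actual usage means converting Kubernetes memory quantities such as `512Mi` into bytes first. The sketch below handles only a subset of suffixes (binary Ki/Mi/Gi/Ti and decimal k/M/G/T); `parse_memory` and `memory_headroom` are hypothetical helpers, and the format in which get_pod_metrics reports usage is an assumption.

```python
# Multipliers for a subset of Kubernetes quantity suffixes.
_UNITS = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
}


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as '512Mi' to bytes."""
    for suffix, factor in _UNITS.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    # No recognised suffix: treat the value as plain bytes.
    return int(quantity)


def memory_headroom(usage: str, limit: str) -> float:
    """Fraction of the limit still unused; values near 0 suggest OOM risk."""
    return 1 - parse_memory(usage) / parse_memory(limit)
```

A container using 256Mi against a 512Mi limit has a headroom of 0.5; a value close to zero shortly before a restart points toward the memory limit as the cause.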

Networking Issues

1. get_services(namespace) - Verify service exists
2. get_endpoints(namespace) - Check endpoint backends
3. If endpoints are empty: no ready pods match the service selector
4. get_network_policies(namespace) - Check traffic rules
5. For Cilium: cilium_endpoints_list_tool(), hubble_flows_query_tool()
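
The empty-endpoints case in step 3 usually comes down to label matching. A minimal sketch, assuming equality-based selectors only and plain dictionaries for labels; `selector_matches` and `matching_pods` are hypothetical helpers, not part of kubectl-mcp-server.

```python
def selector_matches(selector: dict, pod_labels: dict) -> bool:
    """True when every selector key/value pair is present in the pod's labels."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


def matching_pods(selector: dict, pods: dict) -> list:
    """Return names of pods whose labels satisfy the service selector.

    `pods` maps pod name -> labels dict.
    """
    if not selector:
        # A Service without a selector gets no automatically managed endpoints.
        return []
    return [name for name, labels in pods.items() if selector_matches(selector, labels)]
```

An empty result for a service's selector is consistent with an empty endpoints list; a non-empty result with empty endpoints suggests the matching pods are not ready.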

Storage Problems

1. get_pvc(namespace) - Check PVC status
2. describe_pvc(name, namespace) - See binding issues
3. get_storage_classes() - Verify provisioner exists
4. If Pending: check storage class, access modes

DNS Resolution

1. kubectl_exec(pod, namespace, "nslookup kubernetes.default") - Test DNS
2. If it fails: check the CoreDNS pods in kube-system
3. get_pods(namespace="kube-system", label_selector="k8s-app=kube-dns")
4. get_pod_logs(name="coredns-*", namespace="kube-system")

Multi-Cluster Debugging

All tools accept a context parameter for targeting different clusters:

get_pods(namespace="kube-system", context="production-cluster")
get_events(namespace="default", context="staging-cluster")
describe_pod(name="myapp-xyz", namespace="prod", context="prod-east")
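
When the same check is needed on several clusters, the context parameter lends itself to a small fan-out helper. This is a sketch under the assumption that a tool is available as a Python callable accepting `context` as a keyword argument; `across_contexts` is a hypothetical name.

```python
def across_contexts(tool, contexts, **kwargs):
    """Call one tool once per context and key the results by context name."""
    return {ctx: tool(context=ctx, **kwargs) for ctx in contexts}
```

For example, `across_contexts(get_pods, ["staging-cluster", "production-cluster"], namespace="kube-system")` would return one result per cluster, which makes side-by-side comparison straightforward.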

Diagnostic Scripts

For comprehensive diagnostics, run the bundled scripts:

See scripts/diagnose-pod.py for automated pod analysis
See scripts/health-check.sh for cluster health checks

Decision Tree

See references/DECISION-TREE.md for visual troubleshooting flowcharts.

Common Errors Reference

See references/COMMON-ERRORS.md for error message explanations and fixes.

Related Tools

Core Diagnostics

get_pods, describe_pod, get_pod_logs, get_pod_metrics
get_events, get_nodes, describe_node
get_resource_usage, compare_namespaces

Advanced (Ecosystem)

Cilium: cilium_endpoints_list_tool, hubble_flows_query_tool
Istio: istio_proxy_status_tool, istio_analyze_tool

Related Skills

k8s-diagnostics - Metrics and health checks
k8s-incident - Emergency runbooks
k8s-networking - Network troubleshooting

Debug Kubernetes pods, nodes, and workloads. Use when pods are failing, containers crash, nodes are unhealthy, or users mention debugging, troubleshooting, or diagnosing Kubernetes issues.

865 stars

development

Updated Apr 11, 2026

npx skillsauth add rohitg00/kubectl-mcp-server k8s-troubleshoot

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Percentage |
|---------|-------------|--------|------------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Apr 11, 2026, 10:33 PM · 15.1s · 5 files scanned

SKILL.md

name: k8s-troubleshoot
description: Debug Kubernetes pods, nodes, and workloads. Use when pods are failing, containers crash, nodes are unhealthy, or users mention debugging, troubleshooting, or diagnosing Kubernetes issues.
license: Apache-2.0
author: rohitg00
version: 1.0.0
tools: 15
category: observability

Related Skills

rohitg00/k8s-vind

development

Verified · Trusted · Community

Manage vCluster (virtual Kubernetes clusters) instances using vind. Use when creating, managing, or operating lightweight virtual clusters for development, testing, or multi-tenancy.

865 stars · SKILL.md · Updated Apr 11, 2026

rohitg00/k8s-storage

devops

Verified · Trusted · Community

Kubernetes storage management for PVCs, storage classes, and persistent volumes. Use when provisioning storage, managing volumes, or troubleshooting storage issues.

865 stars · SKILL.md · Updated Apr 11, 2026

rohitg00/k8s-service-mesh

testing

Verified · Trusted · Community

Manage Istio service mesh for traffic management, security, and observability. Use for traffic shifting, canary releases, mTLS, and service mesh troubleshooting.

865 stars · SKILL.md · Updated Apr 11, 2026

rohitg00/k8s-security

testing

Verified · Trusted · Community

Audit Kubernetes RBAC, enforce policies, and manage secrets. Use for security reviews, permission audits, policy enforcement with Kyverno/Gatekeeper, and secret management.

865 stars · SKILL.md · Updated Apr 11, 2026

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Install manually

# Clone the repo
git clone https://github.com/rohitg00/kubectl-mcp-server.git

# Copy into Claude Code skills folder (global), creating it first if needed
mkdir -p ~/.claude/skills
cp -r kubectl-mcp-server/kubernetes-skills/claude/k8s-troubleshoot ~/.claude/skills/

Repository

rohitg00/kubectl-mcp-server

865 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT