rohitg00/k8s-incident

Name: k8s-incident
Author: rohitg00

kubernetes-skills/claude/k8s-incident/SKILL.md

npx skillsauth add rohitg00/kubectl-mcp-server k8s-incident

Kubernetes Incident Response

Runbooks and diagnostic workflows for common Kubernetes incidents.

When to Apply

Use this skill when:

User mentions: "incident", "outage", "emergency", "down", "not working"
Operations: emergency response, production issues, service degradation
Keywords: "urgent", "broken", "fix", "restore", "recover"
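As a rough illustration of the trigger conditions above, the sketch below checks a user message against the listed phrases. The function name `should_apply_incident_skill` and the plain substring matching are assumptions made for this example; the skill does not specify how an agent should match these terms.

```python
# Hypothetical trigger check; the phrases are copied from the lists above.
TRIGGER_PHRASES = (
    "incident", "outage", "emergency", "down", "not working",
    "urgent", "broken", "fix", "restore", "recover",
)


def should_apply_incident_skill(message: str) -> bool:
    """Return True if the message contains any trigger phrase (case-insensitive)."""
    text = message.lower()
    return any(phrase in text for phrase in TRIGGER_PHRASES)
```

Substring matching is deliberately crude: "down" also matches "download", so a real agent would probably want word-boundary or intent-based matching.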

Priority Rules

| Priority | Rule | Impact | Tools |
|----------|------|--------|-------|
| 1 | Check control plane first | CRITICAL | get_pods(namespace="kube-system") |
| 2 | Assess node health | CRITICAL | get_nodes |
| 3 | Gather events before changes | HIGH | get_events |
| 4 | Document timeline | HIGH | Manual notes |
| 5 | Rollback if safe | MEDIUM | rollback_deployment |

Quick Reference

| Incident | First Tool | Next Steps |
|----------|------------|------------|
| Pod failure | get_pod_logs(previous=True) | describe_pod, get_events |
| Node down | describe_node | Check kubelet logs |
| Service unreachable | get_endpoints | get_network_policies |
| Control plane | get_pods(namespace="kube-system") | Check API server logs |

Incident Triage

Quick Health Check

get_nodes()
get_pods(namespace="kube-system")
get_events(namespace)

Severity Assessment

| Indicator | Severity | Action |
|-----------|----------|--------|
| Multiple nodes NotReady | Critical | Escalate immediately |
| kube-system pods failing | Critical | Control plane issue |
| Single pod CrashLoop | Medium | Debug pod |
| High latency | Medium | Check resources |
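The severity table can be read as a small decision function. The sketch below is one possible encoding; the function name, its parameters, and the choice to return only the first matching row are assumptions for illustration and are not part of the skill.

```python
from typing import Optional, Tuple


def assess_severity(
    not_ready_nodes: int = 0,
    failing_kube_system_pods: int = 0,
    crashlooping_pods: int = 0,
    high_latency: bool = False,
) -> Optional[Tuple[str, str]]:
    """Return (severity, action) for the first matching table row, or None."""
    if not_ready_nodes > 1:
        return ("Critical", "Escalate immediately")
    if failing_kube_system_pods > 0:
        return ("Critical", "Control plane issue")
    if crashlooping_pods == 1:
        return ("Medium", "Debug pod")
    if high_latency:
        return ("Medium", "Check resources")
    # Not covered by the table (for example a single NotReady node);
    # left to human judgment rather than guessed here.
    return None
```

Critical rows are checked first so that a control plane problem is never masked by a less severe symptom seen at the same time.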

Runbook: Pod Failures

CrashLoopBackOff

get_pod_logs(name, namespace, previous=True)
describe_pod(name, namespace)
get_events(namespace, field_selector="involvedObject.name=<pod>")
get_pod_metrics(name, namespace)

Common Causes:

OOMKilled → Increase memory limits
Exit code 1 → Application error in logs
Exit code 137 → Killed by OOM or SIGKILL
Exit code 143 → Graceful SIGTERM
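Exit codes 137 and 143 follow the common convention that a process killed by a signal reports 128 plus the signal number (9 for SIGKILL, 15 for SIGTERM). The helper below is a minimal sketch of that arithmetic, assuming a POSIX system; `describe_exit_code` is a hypothetical name, not one of the skill's tools.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain a container exit code using the 128 + signal-number convention."""
    if code == 0:
        return "exited normally"
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # No signal with this number on the current platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return "application error; check the container logs"
```

For example, `describe_exit_code(137)` reports SIGKILL, which is consistent with an OOM kill but does not prove one; the pod's last state reason (OOMKilled) is the more reliable indicator.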

ImagePullBackOff

describe_pod(name, namespace)
get_secrets(namespace)

Pending Pod

describe_pod(name, namespace)
get_nodes()
get_events(namespace)

Runbook: Node Issues

Node NotReady

describe_node(name)
get_events(namespace="", field_selector="involvedObject.name=<node>")
node_logs_tool(name, "kubelet")
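The `<node>` and `<pod>` placeholders in the field selectors must be replaced with real object names. A small helper, sketched below, can build the selector string; `event_field_selector` is a hypothetical name, and whether the `get_events` tool accepts a combined selector is an assumption. Kubernetes itself joins comma-separated field selector terms with a logical AND.

```python
def event_field_selector(name: str, kind: str = "") -> str:
    """Build a field selector matching events for one named object."""
    parts = [f"involvedObject.name={name}"]
    if kind:
        # Narrow by kind so a pod and a node sharing a name are not mixed up.
        parts.append(f"involvedObject.kind={kind}")
    return ",".join(parts)
```

The result can then be passed as the `field_selector` argument, for example `event_field_selector("worker-1", "Node")`.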

Node DiskPressure

describe_node(name)
get_pods(field_selector="spec.nodeName=<node>")

Runbook: Network Issues

Service Not Accessible

get_services(namespace)
get_endpoints(namespace)
get_pods(namespace, label_selector="<service-selector>")
get_network_policies(namespace)

DNS Resolution Failures

get_pods(namespace="kube-system", label_selector="k8s-app=kube-dns")
get_pod_logs("coredns-xxx", "kube-system")

With Cilium

cilium_status_tool()
cilium_endpoints_list_tool(namespace)
hubble_flows_query_tool(namespace)

With Istio

istio_analyze_tool(namespace)
istio_proxy_status_tool()

Runbook: Storage Issues

PVC Pending

describe_pvc(name, namespace)
get_storage_classes()
get_events(namespace)

Pod Stuck in ContainerCreating

describe_pod(name, namespace)
get_pvc(namespace)
get_events(namespace)

Runbook: Control Plane Issues

API Server Unavailable

get_pods(namespace="kube-system", label_selector="component=kube-apiserver")
get_events(namespace="kube-system")

etcd Issues

get_pods(namespace="kube-system", label_selector="component=etcd")
get_pod_logs("etcd-xxx", "kube-system")

Emergency Actions

Force Delete Pod

delete_pod(name, namespace, grace_period=0, force=True)

Rollback Deployment

rollback_deployment(name, namespace, revision=0)

Helm Rollback

rollback_helm_release(name, namespace, revision=1)

Diagnostic Collection Script

For comprehensive incident diagnostics, see scripts/collect-diagnostics.py.

Multi-Cluster Incident Response

Check all clusters:

for context in ["prod-1", "prod-2", "staging"]:
    get_nodes(context=context)
    get_pods(namespace="kube-system", context=context)
    get_events(namespace="kube-system", context=context)
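During an incident one of the clusters may itself be unreachable, in which case a plain loop like the one above stops at the first failure. The sketch below shows one way to keep going and record the error instead. `check_clusters` and the `checks` mapping are hypothetical; in practice each callable would wrap a tool such as `get_nodes` and accept the context name.

```python
from typing import Any, Callable, Dict, Iterable


def check_clusters(
    contexts: Iterable[str],
    checks: Dict[str, Callable[[str], Any]],
) -> Dict[str, Dict[str, Any]]:
    """Run every check against every context, recording errors instead of raising."""
    report: Dict[str, Dict[str, Any]] = {}
    for context in contexts:
        report[context] = {}
        for check_name, check in checks.items():
            try:
                report[context][check_name] = check(context)
            except Exception as exc:
                # An unreachable cluster is itself a finding worth recording.
                report[context][check_name] = f"ERROR: {exc}"
    return report
```

The returned dictionary is keyed by context and then by check name, which makes it easy to compare the same check across clusters.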

Post-Incident

Document Timeline

When did the incident start?
What was the impact?
What was the root cause?
What fixed it?

Prevent Recurrence

Add monitoring/alerting
Improve resource limits
Add readiness probes
Document runbook

Related Skills

k8s-troubleshoot - Detailed debugging
k8s-security - Security incidents

Respond to Kubernetes incidents with runbooks and diagnostics. Use for outages, pod failures, node issues, network problems, and emergency response.

865 stars

testing

Updated Apr 11, 2026

Install this skill globally with one command:

npx skillsauth add rohitg00/kubectl-mcp-server k8s-incident

Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Status | Scanner | Description | Percentage |
|--------|---------|-------------|------------|
| Clean | Trivy | Container and dependency vulnerability scanner | 95% |
| Clean | Semgrep | Static code analysis for vulnerabilities | 95% |
| Clean | mcp-scan (Snyk) | Model Context Protocol security validation | 95% |
| Skipped | Snyk (dep) | Open source security scanning | 50% |
| Skipped | Socket.dev | Supply chain security analysis | 50% |
| Skipped | VirusTotal | Multi-engine malware detection | 50% |
| Skipped | CrowdStrike | Advanced threat intelligence | 50% |
| Skipped | OSV-Scanner | Open Source Vulnerability database check | 50% |
| Skipped | OWASP Dep-Check | | 50% |

Last scanned: Apr 11, 2026, 10:32 PM (212.1s, 2 files scanned)

SKILL.md

name: k8s-incident
description: Respond to Kubernetes incidents with runbooks and diagnostics. Use for outages, pod failures, node issues, network problems, and emergency response.
license: Apache-2.0
author: rohitg00
version: 1.0.0
tools: 15
category: observability

Related Skills

rohitg00/k8s-vind

Category: development

Manage vCluster (virtual Kubernetes clusters) instances using vind. Use when creating, managing, or operating lightweight virtual clusters for development, testing, or multi-tenancy.

865 stars, SKILL.md, updated Apr 11, 2026

rohitg00/k8s-troubleshoot

Category: development

Debug Kubernetes pods, nodes, and workloads. Use when pods are failing, containers crash, nodes are unhealthy, or users mention debugging, troubleshooting, or diagnosing Kubernetes issues.

865 stars, SKILL.md, updated Apr 11, 2026

rohitg00/k8s-storage

Category: devops

Kubernetes storage management for PVCs, storage classes, and persistent volumes. Use when provisioning storage, managing volumes, or troubleshooting storage issues.

865 stars, SKILL.md, updated Apr 11, 2026

rohitg00/k8s-service-mesh

Category: testing

Manage Istio service mesh for traffic management, security, and observability. Use for traffic shifting, canary releases, mTLS, and service mesh troubleshooting.

865 stars, SKILL.md, updated Apr 11, 2026

Install manually

# Clone the repo
git clone https://github.com/rohitg00/kubectl-mcp-server.git

# Make sure the global Claude Code skills folder exists
mkdir -p ~/.claude/skills

# Copy the skill into the Claude Code skills folder (global)
cp -r kubectl-mcp-server/kubernetes-skills/claude/k8s-incident ~/.claude/skills/

See the official Claude Code Skills documentation for the skills path.

Repository

rohitg00/kubectl-mcp-server

865 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT