skills/core/cluster-events/SKILL.md
Analyze cluster-wide Kubernetes events to identify issues and patterns. Aggregates Warning events, detects high-frequency patterns, and correlates related events.
Use this flow to analyze cluster-wide events and identify issues, patterns, and correlations across resources.
Scope: This skill is for analysis and diagnosis only. It helps you understand what is happening across the cluster by examining events. Do NOT attempt to fix issues directly — identify root causes and either use a specific diagnostic skill or report findings to the user.
Get all events sorted by time, focusing on Warning events:
kubectl get events -A --sort-by='.lastTimestamp' --field-selector type=Warning
If you need all event types for context:
kubectl get events -A --sort-by='.lastTimestamp'
For events in a specific namespace:
kubectl get events -n <ns> --sort-by='.lastTimestamp'
Look for events with high COUNT values — these indicate repeated occurrences and often point to persistent issues.
For a structured view:
kubectl get events -A --field-selector type=Warning -o custom-columns='LAST SEEN:.lastTimestamp,COUNT:.count,KIND:.involvedObject.kind,NAME:.involvedObject.name,NAMESPACE:.involvedObject.namespace,REASON:.reason,MESSAGE:.message'
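To make the high-COUNT check less manual, the counts can be summed per reason. The sketch below is one possible way to do that, assuming a POSIX shell with awk and sort available. The function name sum_counts_by_reason is hypothetical, and the kubectl line in the trailing comment is an illustrative invocation, not part of this skill.

```shell
# Sum event counts per reason (hypothetical helper).
# Reads "COUNT REASON" pairs on stdin, one event per line, and prints
# "TOTAL REASON" lines sorted by total, highest first.
sum_counts_by_reason() {
  awk '
    # kubectl prints "<none>" when an event has no count; treat that as 1.
    { n = ($1 ~ /^[0-9]+$/) ? $1 : 1; total[$2] += n }
    END { for (r in total) print total[r], r }
  ' | sort -k1,1nr -k2,2
}

# Illustrative use against a live cluster:
# kubectl get events -A --field-selector type=Warning --no-headers \
#   -o custom-columns='COUNT:.count,REASON:.reason' | sum_counts_by_reason
```

Reasons that rise to the top of this summary are the first candidates for correlation with related events.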
When you find Warning events, check if the same resource has related events that tell a more complete story:
kubectl get events -n <ns> --field-selector involvedObject.name=<resource-name>
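Different resource kinds can share a name within one namespace (for example a Deployment and a Service), so a name-only selector may mix unrelated events. A minimal sketch of a selector builder that also pins the kind follows. The helper name event_selector is hypothetical, while involvedObject.kind and involvedObject.name are field selectors that the Event resource supports.

```shell
# Build a field selector that pins both the kind and the name of the
# involved object (hypothetical helper).
# $1 = kind (e.g. Pod), $2 = resource name
event_selector() {
  printf 'involvedObject.kind=%s,involvedObject.name=%s' "$1" "$2"
}

# Illustrative use against a live cluster:
# kubectl get events -n <ns> --sort-by='.lastTimestamp' \
#   --field-selector "$(event_selector Pod <resource-name>)"
```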
Match the Warning events against the patterns below. For each matched pattern, recommend the appropriate diagnostic skill or action.
FailedScheduling: Pod cannot be scheduled
The scheduler cannot place a pod on any node.
Next step: Use the pod-pending-debug skill to diagnose the specific pod. If the pod has a scheduling.volcano.sh/pod-group annotation (managed by Volcano scheduler), use volcano-diagnose-pod skill instead for Volcano-specific issues (PodGroup, Queue, Gang scheduling).
BackOff / Back-off restarting failed container: Container crash loop
A container is repeatedly crashing and restarting.
Next step: Use the pod-crash-debug skill to diagnose the specific pod.
Failed / ErrImagePull / ImagePullBackOff: Image pull failure
The container image cannot be pulled.
Next step: Use the image-pull-debug skill to diagnose the specific pod.
FailedMount / FailedAttachVolume: Volume mount failure
A volume (PVC, ConfigMap, Secret, or other) cannot be mounted.
Check the specific error message:
- not found: the referenced ConfigMap/Secret/PVC does not exist
- already attached: the volume is stuck on another node (common with RWO PVs)
- timed out waiting: the storage provisioner is slow or failing
Unhealthy: Probe failure
A liveness or readiness probe is failing.
Check which probe is failing from the event message, which names the probe type (for example "Liveness probe failed" or "Readiness probe failed").
Advise the user to check probe configuration (endpoint, port, timing parameters).
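Reading the probe type off the event message can be sketched as a small classifier. This assumes the kubelet message prefix "<Type> probe failed", which is how Unhealthy events are normally worded. The function name probe_type_from_message is hypothetical.

```shell
# Report which probe an Unhealthy event message refers to
# (hypothetical helper). $1 = the event MESSAGE text.
probe_type_from_message() {
  case "$1" in
    "Liveness probe failed"*)  echo "liveness" ;;
    "Readiness probe failed"*) echo "readiness" ;;
    "Startup probe failed"*)   echo "startup" ;;
    *)                         echo "unknown" ;;
  esac
}
```

A failing liveness probe causes the kubelet to restart the container, while a failing readiness probe only removes the pod from Service endpoints, so the distinction changes what to advise.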
NodeNotReady: Node became unhealthy
A node transitioned to NotReady state, which may affect all pods on that node.
Next step: Use the node-health-check skill to diagnose the specific node.
Evicted: Pod was evicted
A pod was evicted from a node, typically due to resource pressure (DiskPressure, MemoryPressure).
Check which node evicted the pod and investigate node health:
kubectl get pod <pod> -n <ns> -o jsonpath='{.spec.nodeName} {.status.reason} {.status.message}'
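When many pods were evicted, it can help to see whether the evictions cluster on a few nodes. The following is a rough sketch under that assumption. The function name evictions_per_node is hypothetical, and the input is expected as "NAMESPACE NAME REASON NODE" columns, as the illustrative kubectl line in the comment would produce.

```shell
# Count evicted pods per node (hypothetical helper).
# Reads "NAMESPACE NAME REASON NODE" lines on stdin and prints
# "COUNT NODE" lines, most evictions first.
evictions_per_node() {
  awk '$3 == "Evicted" { n[$4]++ } END { for (k in n) print n[k], k }' |
    sort -k1,1nr -k2,2
}

# Illustrative use against a live cluster:
# kubectl get pods -A --no-headers \
#   -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,REASON:.status.reason,NODE:.spec.nodeName' \
#   | evictions_per_node
```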
FailedCreate: Controller cannot create pods
A ReplicaSet, Job, or other controller cannot create pods. Common causes: resource quota exceeded, admission webhook rejection.
Check the controller's events:
kubectl describe rs <replicaset> -n <ns>
OOMKilling: Kernel OOM killer invoked
The kernel killed a process due to memory exhaustion. This may affect containers on the node.
Next step: Use the pod-crash-debug skill for the affected pod, or node-health-check for the node.
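The pattern-to-skill mapping above can be summarized as a lookup keyed on the event REASON. This is only a sketch: the function name next_step_for_reason is hypothetical, the skill names are the ones listed above, and the Volcano exception for FailedScheduling is not modeled because it depends on a pod annotation, not on the reason.

```shell
# Map an event reason to the follow-up named in the patterns above
# (hypothetical helper). $1 = the event REASON.
next_step_for_reason() {
  case "$1" in
    FailedScheduling)                     echo "pod-pending-debug" ;;
    BackOff)                              echo "pod-crash-debug" ;;
    # "Failed" is grouped with image pulls above; confirm from the message.
    Failed|ErrImagePull|ImagePullBackOff) echo "image-pull-debug" ;;
    NodeNotReady)                         echo "node-health-check" ;;
    OOMKilling)                           echo "pod-crash-debug or node-health-check" ;;
    # These patterns have manual checks instead of a dedicated skill.
    FailedMount|FailedAttachVolume|Unhealthy|Evicted|FailedCreate)
                                          echo "manual check" ;;
    *)                                    echo "unmatched" ;;
  esac
}
```

Reasons that come back as "unmatched" are not covered by the patterns and should be reported to the user as findings.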
Events with count > 1 show the first and last timestamp; the actual frequency may be higher than it appears.