
scitix/pod-crash-debug

Name: pod-crash-debug
Author: scitix

skills/core/pod-crash-debug/SKILL.md

npx skillsauth add scitix/siclaw pod-crash-debug


Pod Crash Failure Diagnosis

When a pod is stuck in CrashLoopBackOff, Error, OOMKilled, or RunContainerError, follow this flow to identify the root cause.

Scope: This skill is for diagnosis only. Once you identify the root cause, report it to the user and stop. Do NOT attempt to fix the application code or modify resource limits — that should be left to the user.

Diagnostic Flow

1. Get pod status

kubectl get pod <pod> -n <ns> -o wide

Note the STATUS, RESTARTS count, and NODE. A RESTARTS count that is high and still climbing confirms the pod is crash-looping.
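
For scripted use, the same fields can be read from the pod's JSON rather than from the table output. The sketch below assumes the pod object was obtained with `kubectl get pod <pod> -n <ns> -o json`; the helper name `pod_overview` and the sample data are hypothetical and not part of the skill. The STATUS column is derived by kubectl from container states, so `status.phase` only approximates it.

```python
import json


def pod_overview(pod):
    """Summarize node, phase, and total restarts from a parsed Pod object."""
    status = pod.get("status", {})
    restarts = sum(
        cs.get("restartCount", 0) for cs in status.get("containerStatuses", [])
    )
    return {
        "node": pod.get("spec", {}).get("nodeName"),
        "phase": status.get("phase"),
        "restarts": restarts,
    }


# Made-up sample standing in for real kubectl JSON output.
sample = json.loads("""
{
  "spec": {"nodeName": "node-1"},
  "status": {
    "phase": "Running",
    "containerStatuses": [
      {"name": "app", "restartCount": 7},
      {"name": "sidecar", "restartCount": 0}
    ]
  }
}
""")
print(pod_overview(sample))
```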

2. Describe the pod

kubectl describe pod <pod> -n <ns>

Focus on:

State and Last State under each container — note the reason, exit code, and signal
Events section at the bottom — look for BackOff, Failed, Unhealthy, or OOMKilling events
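
The State and Last State details shown by `kubectl describe` are also available in the pod's JSON under `status.containerStatuses` (and `status.initContainerStatuses`). A minimal sketch for pulling out the reason, exit code, and signal per container follows; the function name and the sample pod are illustrative assumptions.

```python
def summarize_container_states(pod):
    """Return one summary dict per container from a parsed Pod object."""
    status = pod.get("status", {})
    statuses = (status.get("initContainerStatuses") or []) + (
        status.get("containerStatuses") or []
    )
    summaries = []
    for cs in statuses:
        # "waiting" describes the current state (e.g. CrashLoopBackOff);
        # "lastState.terminated" describes the previous crash.
        waiting = cs.get("state", {}).get("waiting") or {}
        last = cs.get("lastState", {}).get("terminated") or {}
        summaries.append({
            "container": cs.get("name"),
            "restarts": cs.get("restartCount", 0),
            "waiting_reason": waiting.get("reason"),
            "last_reason": last.get("reason"),
            "exit_code": last.get("exitCode"),
            "signal": last.get("signal"),
        })
    return summaries


# Hypothetical pod status, trimmed to the fields used above.
example_pod = {
    "status": {
        "containerStatuses": [{
            "name": "app",
            "restartCount": 12,
            "state": {"waiting": {"reason": "CrashLoopBackOff"}},
            "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}},
        }]
    }
}
for row in summarize_container_states(example_pod):
    print(row)
```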

3. Get previous container logs

kubectl logs <pod> -n <ns> --previous --tail=200

If the pod has multiple containers, specify the crashing container:

kubectl logs <pod> -n <ns> -c <container> --previous --tail=200

If --previous fails with "previous terminated container not found", try current logs:

kubectl logs <pod> -n <ns> --tail=200
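
The fallback order in this step (previous logs first, then current logs) can be sketched as a small wrapper. This is an illustration under assumptions: `logs_commands` and `fetch_crash_logs` are hypothetical helper names, and `kubectl` is assumed to be on the PATH and to exit non-zero when `--previous` finds no terminated container.

```python
import subprocess


def logs_commands(pod, namespace, container=None, tail=200):
    """Build the kubectl argv lists to try, in order of preference."""
    base = ["kubectl", "logs", pod, "-n", namespace]
    if container:
        base += ["-c", container]
    tail_flag = f"--tail={tail}"
    # First the crashed instance's logs, then the current instance's logs.
    return [base + ["--previous", tail_flag], base + [tail_flag]]


def fetch_crash_logs(pod, namespace, container=None, tail=200, run=subprocess.run):
    """Return output of the first command that succeeds, or '' if none do."""
    for argv in logs_commands(pod, namespace, container, tail):
        result = run(argv, capture_output=True, text=True)
        if result.returncode == 0:
            return result.stdout
    return ""


# Example call (requires cluster access, so it is left commented out):
# print(fetch_crash_logs("<pod>", "<ns>", container="<container>"))
```

The `run` parameter exists only so the command runner can be swapped out, for example in a dry run.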

4. Match error and conclude

Match the information from steps 2-3 against the patterns below. Once a pattern matches, report the root cause to the user and stop.

`OOMKilled` / exit code 137 (from OOM) — Out of Memory

The container exceeded its memory limit and was killed by the kernel OOM killer.

Check the container's resource limits:

kubectl get pod <pod> -n <ns> -o jsonpath='{.spec.containers[*].resources}'

Advise the user to either increase the memory limit or investigate the application's memory usage (possible memory leak).
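
When reporting an OOM finding, it helps to state the configured limit for each container. The sketch below reads `resources.limits.memory` from a parsed pod object (as returned by `kubectl get pod <pod> -n <ns> -o json`); the helper name `memory_limits` and the sample spec are assumptions made for illustration.

```python
def memory_limits(pod):
    """Map each container name to its memory limit string, or None if unset."""
    limits = {}
    for container in pod.get("spec", {}).get("containers", []):
        resources = container.get("resources") or {}
        container_limits = resources.get("limits") or {}
        limits[container.get("name")] = container_limits.get("memory")
    return limits


# Hypothetical pod spec, trimmed to the fields used above.
example_spec = {
    "spec": {
        "containers": [
            {"name": "app", "resources": {"limits": {"memory": "512Mi"}}},
            {"name": "sidecar", "resources": {}},
        ]
    }
}
print(memory_limits(example_spec))
```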

Exit code 137 (no OOMKilled) — SIGKILL

Exit code 137 is 128 + 9, meaning the container was terminated by SIGKILL (signal 9), but not because of an OOM condition. Common causes:

Liveness probe failure — check kubectl describe pod for Unhealthy events with Liveness probe failed
Manual kill or preemption

If liveness probe failures are present, advise the user to adjust probe timing (initialDelaySeconds, timeoutSeconds, periodSeconds) or fix the health endpoint.
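
The liveness check can also be run against event data in JSON form. This sketch assumes the events were fetched with something like `kubectl get events -n <ns> -o json` and that probe failures carry the reason `Unhealthy` with a message starting `Liveness probe failed`; the function name and the sample events are hypothetical.

```python
def liveness_failures(event_list, pod_name):
    """Return the Unhealthy events for pod_name caused by liveness probes."""
    hits = []
    for event in event_list.get("items", []):
        if event.get("involvedObject", {}).get("name") != pod_name:
            continue
        message = event.get("message") or ""
        if event.get("reason") == "Unhealthy" and message.startswith(
            "Liveness probe failed"
        ):
            hits.append(event)
    return hits


# Made-up events, trimmed to the fields used above.
example_events = {
    "items": [
        {"involvedObject": {"name": "web-0"}, "reason": "Unhealthy",
         "message": "Liveness probe failed: HTTP probe failed with statuscode: 500"},
        {"involvedObject": {"name": "web-0"}, "reason": "Unhealthy",
         "message": "Readiness probe failed: connection refused"},
        {"involvedObject": {"name": "web-1"}, "reason": "BackOff",
         "message": "Back-off restarting failed container"},
    ]
}
print(len(liveness_failures(example_events, "web-0")))
```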

Exit code 1 — Application error

The application exited with a generic error code. The root cause is in the container logs from step 3. Report the relevant error lines to the user.

Exit code 2 — Shell/binary misuse

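Often indicates incorrect command syntax or arguments, or a shell script error. A missing file or binary can also surface this way, although a missing entrypoint binary normally produces exit code 127. Check the container's command and args: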

kubectl get pod <pod> -n <ns> -o jsonpath='{.spec.containers[0].command} {.spec.containers[0].args}'

Exit code 126 — Permission denied

The entrypoint binary exists but is not executable. Advise the user to check file permissions in the container image.

Exit code 127 — Command not found

The entrypoint binary does not exist in the container image. Advise the user to verify the image contains the expected binary and the command field is correct.

`RunContainerError` — Container failed to start

The container runtime failed to start the container. Common causes:

Volume mount errors (invalid path, read-only filesystem)
ConfigMap/Secret not found
Invalid security context

Check events from step 2 for the specific error message and report it to the user.

`CreateContainerConfigError` — Invalid container configuration

A referenced ConfigMap, Secret, or other resource does not exist or is misconfigured. Check the events for the specific missing resource name and report it to the user.

`PostStartHookError` — Lifecycle hook failed

The container's postStart hook failed, causing the container to be killed. Check events and logs for the hook's error output.
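
The patterns above can be condensed into a lookup that checks the reported reasons first and falls back to the exit code. This is a sketch rather than the skill's own implementation: `match_pattern` is a hypothetical name, and the labels are taken from the headings in this step.

```python
# Reasons reported in the container state, mapped to the conclusions above.
REASON_PATTERNS = {
    "OOMKilled": "Out of memory",
    "RunContainerError": "Container failed to start",
    "CreateContainerConfigError": "Invalid container configuration",
    "PostStartHookError": "Lifecycle hook failed",
}

# Exit codes, consulted only when no reason matched.
EXIT_CODE_PATTERNS = {
    137: "SIGKILL (not OOM)",
    1: "Application error",
    2: "Shell/binary misuse",
    126: "Permission denied",
    127: "Command not found",
}


def match_pattern(reasons, exit_code):
    """Return the first matching diagnosis, or None if nothing matches."""
    for reason in reasons:
        if reason in REASON_PATTERNS:
            return REASON_PATTERNS[reason]
    return EXIT_CODE_PATTERNS.get(exit_code)


print(match_pattern(["CrashLoopBackOff", "OOMKilled"], 137))
print(match_pattern(["CrashLoopBackOff", "Error"], 137))
```

Checking reasons before exit codes keeps the two meanings of exit code 137 apart: it maps to out of memory only when `OOMKilled` is reported.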

Notes

If --previous logs are empty and the container exits immediately, the issue is likely with the entrypoint command — check the image's ENTRYPOINT/CMD and the pod's command/args override.
For init container crashes, use -c <init-container-name> to get the specific init container's logs.
If the pod has been restarted many times, logs from earlier crashes may be lost. The most recent crash's --previous logs are usually sufficient.


Diagnose pod crash failures (CrashLoopBackOff, OOMKilled, Error, RunContainerError). Checks pod status, events, and previous logs to identify root cause.

88 stars

development

Updated Apr 12, 2026


npx skillsauth add scitix/siclaw pod-crash-debug

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy (container and dependency vulnerability scanner): Clean, 95%

Semgrep (static code analysis for vulnerabilities): Clean, 95%

mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%

Snyk (dep) (open source security scanning): Skipped, 50%

Socket.dev (supply chain security analysis): Skipped, 50%

VirusTotal (multi-engine malware detection): Skipped, 50%

CrowdStrike (advanced threat intelligence): Skipped, 50%

OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%

OWASP Dep-Check: Skipped, 50%

Last scanned: Apr 12, 2026, 11:19 PM · 45.8s · 1 file scanned


Related Skills

scitix/gateway-diagnostics (testing)

Show and ping the gateway of a network interface, on a Kubernetes node or inside a pod's network namespace. Auto-detects the gateway from the routing table (ip -j route), reports interface type (RoCE / Ethernet / IB), and tests reachability with ping. Use for default-route / gateway questions, network reachability checks, RoCE/RDMA data-path validation, and "can this node/pod reach its gateway" investigations.

209 · SKILL.md · Updated Jun 7, 2026

scitix/skill-authoring (development)

Guide for writing and improving Siclaw skills. Read this when creating or modifying a skill. Covers skill directory layout, SKILL.md format, script execution modes, and best practices.

209 · SKILL.md · Updated Apr 23, 2026

scitix/node-logs (devops)

Retrieve logs from a Kubernetes node. Supports journalctl (systemd units) and file-based logs. Use when you need to inspect node-level logs (containerd, kubelet, etc.). Run via host_script (preferred) or node_script.

209 · SKILL.md · Updated Apr 12, 2026

scitix/manage-skill (development)

Guides the user to the Siclaw Web page to manage Skills. Use this guide when the user requests to create, edit, or view a Skill in a Channel conversation.

88 · SKILL.md · Updated Apr 12, 2026


Install manually


# Clone the repo
git clone https://github.com/scitix/siclaw.git

# Copy into Claude Code skills folder (global)
cp -r siclaw/skills/core/pod-crash-debug ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

scitix/siclaw

88 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT
