scitix/node-health-check

Name: node-health-check
Author: scitix

skills/core/node-health-check/SKILL.md

npx skillsauth add scitix/siclaw node-health-check

Node Health Check

When nodes are NotReady, experiencing resource pressure, or suspected of causing pod failures, follow this flow to diagnose node-level issues.

Scope: This skill is for diagnosis only. Once you identify the root cause, report it to the user and stop. Do NOT attempt to drain, cordon, or restart nodes — that should be left to the user or cluster administrator.

Diagnostic Flow

1. Get node overview

kubectl get nodes -o wide

Note the STATUS of each node. Healthy nodes show Ready. Look for NotReady, or for a SchedulingDisabled suffix as in Ready,SchedulingDisabled, which marks a cordoned node.
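On clusters with many nodes, the STATUS check can be narrowed with a small filter. The sketch below is an illustration rather than part of the skill: `unhealthy_nodes` is a hypothetical helper name, and it assumes the default `kubectl get nodes` column layout (NAME first, STATUS second).

```shell
# Print every node whose STATUS column is anything other than plain "Ready".
# Reads `kubectl get nodes` output on stdin and skips the header row.
# unhealthy_nodes is a hypothetical helper name, not part of the skill.
unhealthy_nodes() {
  awk 'NR > 1 && $2 != "Ready" { print $1 "\t" $2 }'
}

# Usage (requires cluster access):
#   kubectl get nodes -o wide | unhealthy_nodes
```

Cordoned nodes (Ready,SchedulingDisabled) are reported too, since their STATUS is not exactly Ready.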

2. Inspect specific node conditions

For any node showing issues:

kubectl describe node <node>

Focus on the Conditions section. Key conditions:

| Condition | Healthy Value | Problem Value | Meaning |
|-----------|--------------|---------------|---------|
| Ready | True | False/Unknown | Kubelet is healthy and can accept pods |
| MemoryPressure | False | True | Node memory usage is critically high |
| DiskPressure | False | True | Node disk usage exceeds eviction threshold |
| PIDPressure | False | True | Too many processes running on node |
| NetworkUnavailable | False | True | Node network is not configured correctly |

Also check:

Capacity vs. Allocatable: Capacity is the node's total resources; Allocatable is the portion available to pods
Allocated resources section: shows the total requests and limits of the pods on this node
Events: look for recent warnings
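The Conditions section can also be read in machine-friendly form. The following is a minimal sketch, assuming the node's conditions are printed as tab-separated type, status, and lastTransitionTime; `flag_conditions` is a hypothetical helper name. It treats True as healthy for Ready and False as healthy for every other condition type, which matches the pressure and network conditions described here but is an assumption for any custom conditions.

```shell
# Flag node conditions whose status differs from the healthy value.
# Reads "TYPE<TAB>STATUS<TAB>LAST_TRANSITION_TIME" lines on stdin.
# flag_conditions is a hypothetical helper name, not part of the skill.
flag_conditions() {
  awk -F '\t' '
    {
      # Ready is healthy when True; all other types are assumed healthy when False.
      healthy = ($1 == "Ready") ? "True" : "False"
      if ($2 != healthy) print "PROBLEM " $1 "=" $2 " since " $3
    }'
}

# Usage (requires cluster access):
#   kubectl get node <node> -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.lastTransitionTime}{"\n"}{end}' | flag_conditions
```

Printing the transition time next to each flagged condition makes it easier to line the condition up with the node's recent Events.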

3. Check real-time resource usage

kubectl top node <node>

Compare actual CPU and memory usage against the node's allocatable resources from step 2.
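The comparison can be sketched as a filter over the percentage columns of `kubectl top node`. This is an illustrative helper, not part of the skill: `high_usage` is a hypothetical name, the threshold is an arbitrary argument, and the script assumes the usual column order (NAME, CPU(cores), CPU%, MEMORY(bytes), MEMORY%).

```shell
# high_usage THRESHOLD
# Print nodes whose CPU% or MEMORY% is at or above THRESHOLD.
# Reads `kubectl top node` output on stdin and skips the header row.
# high_usage is a hypothetical helper name, not part of the skill.
high_usage() {
  awk -v limit="$1" 'NR > 1 {
    cpu = $3; mem = $5
    sub(/%/, "", cpu); sub(/%/, "", mem)   # strip the percent sign
    if (cpu + 0 >= limit || mem + 0 >= limit)
      print $1 " cpu=" cpu "% mem=" mem "%"
  }'
}

# Usage (requires cluster access and the Metrics Server):
#   kubectl top node | high_usage 85
```

Nodes whose metrics are reported as unknown evaluate to 0 here and are not flagged, so an empty result does not by itself prove the node is healthy.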

4. Match condition and conclude

`NotReady` — Kubelet not responding

The kubelet on the node is not communicating with the API server. Common causes:

Kubelet service crashed or stopped
Node is powered off or unreachable
Network partition between node and control plane

If node-level logs are available, use the node-logs skill to check kubelet logs:

bash skills/core/node-logs/scripts/get-node-logs.sh \
  --node <node> --unit kubelet --since "30m ago" --tail 100

Report the node's NotReady status and any kubelet errors to the user.

`DiskPressure` — Disk usage exceeds threshold

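The node's available disk space has fallen below the kubelet's eviction threshold. With default settings this happens when less than 10% of the node filesystem or less than 15% of the image filesystem is free, though clusters often override these values. The kubelet will start evicting pods to reclaim space.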

List the pods on the node as candidates for heavy ephemeral storage use (the listing itself does not show per-pod storage usage):

kubectl get pods --field-selector spec.nodeName=<node> -A -o wide

Advise the user to clean up unused images/containers, increase disk size, or move workloads to other nodes.

`MemoryPressure` — Memory usage critically high

The node's memory usage is critically high. The kubelet may evict pods based on their QoS class (BestEffort first, then Burstable).

Check pod memory usage on the node:

kubectl top pods --field-selector spec.nodeName=<node> -A --sort-by=memory

If the above doesn't work (field-selector may not be supported for top), list pods on the node and check their usage:

kubectl get pods --field-selector spec.nodeName=<node> -A -o wide
kubectl top pods -A --sort-by=memory | head -20
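The fallback above shows the top consumers cluster-wide, not only those on the affected node. One way to combine the two listings is sketched below; `top_pods_on_node` is a hypothetical helper name, and the script assumes both commands print NAMESPACE and NAME as their first two columns, as they do with `-A`.

```shell
# top_pods_on_node PODS_FILE
# Filter `kubectl top pods -A` output (stdin) down to the pods listed in
# PODS_FILE, which holds `kubectl get pods -A` output for a single node.
# top_pods_on_node is a hypothetical helper name, not part of the skill.
top_pods_on_node() {
  awk '
    # First input (PODS_FILE): remember namespace/name of each pod, skip header.
    NR == FNR { if (FNR > 1) on_node[$1 "/" $2] = 1; next }
    # Second input (stdin): keep the header row and any matching pod.
    FNR == 1 || (($1 "/" $2) in on_node)
  ' "$1" -
}

# Usage (requires cluster access and the Metrics Server):
#   kubectl get pods -A -o wide --field-selector spec.nodeName=<node> > /tmp/node-pods.txt
#   kubectl top pods -A --sort-by=memory | top_pods_on_node /tmp/node-pods.txt
```

Because the sort order of the `kubectl top` output is preserved, the first rows of the result are the heaviest memory consumers on that node.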

`PIDPressure` — Too many processes

The node is running too many processes. This can prevent new containers from starting.

Advise the user to investigate which pods are creating excessive processes, and consider setting PID limits in the container runtime or kubelet configuration.

`NetworkUnavailable` — Node network not configured

The node's network plugin (CNI) has not configured networking. The CNI plugin may not be installed, may have crashed, or may have failed to initialize.

Check CNI pod status on the node:

kubectl get pods -A --field-selector spec.nodeName=<node> | grep -E 'cni|calico|cilium|flannel|weave'

`SchedulingDisabled` — Node is cordoned

The node has been cordoned (kubectl cordon) and will not accept new pods. Existing pods continue running.

This is usually intentional (maintenance). Report to the user that the node is cordoned.

5. Check allocated resources (optional)

If resource overcommitment is suspected:

kubectl describe node <node> | grep -A 20 "Allocated resources"

Compare the total requests against allocatable resources. If CPU or memory requests exceed 90% of allocatable, new pods may fail to schedule on this node.
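The 90% check can be automated by reading the percentages that `kubectl describe node` prints next to each request total. The sketch below assumes the usual layout of the "Allocated resources" section, with rows such as `cpu 1850m (46%) 2 (50%)`; `requests_over` is a hypothetical helper name and the layout may differ between kubectl versions.

```shell
# requests_over THRESHOLD
# Print cpu/memory rows whose total requests are at or above THRESHOLD
# percent of allocatable. Reads `kubectl describe node` output on stdin.
# requests_over is a hypothetical helper name, not part of the skill.
requests_over() {
  awk -v limit="$1" '
    /^Allocated resources:/ { in_section = 1; next }
    # The next unindented line (for example "Events:") ends the section.
    in_section && /^[A-Za-z]/ { in_section = 0 }
    in_section && ($1 == "cpu" || $1 == "memory") {
      pct = $3                      # third field looks like "(46%)"
      gsub(/[()%]/, "", pct)
      if (pct + 0 >= limit) print $1 " requests at " pct "%"
    }'
}

# Usage (requires cluster access):
#   kubectl describe node <node> | requests_over 90
```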

6. Node hardware / resource diagnostics

After general health checks, search your skill list for node-level hardware or resource diagnostic skills (e.g., RDMA/RoCE config checks, GPU diagnostics, storage checks). Run any that match this node's characteristics. Each skill auto-detects whether it applies to the node — no pre-check needed on your part.

Notes

kubectl top requires the Metrics Server to be installed in the cluster. If it returns an error, the metrics server may not be available.
Node conditions have a lastTransitionTime — this tells you when the condition last changed, which helps correlate with events or changes.
For multi-node issues, check if there's a common pattern (same zone, same instance type, same kernel version) that might indicate an infrastructure-level problem.
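To look for a shared pattern across several unhealthy nodes, the node list can be grouped by a label column. This is a minimal sketch, assuming nodes carry the well-known `topology.kubernetes.io/zone` label and that `kubectl get nodes -L` appends it as the sixth column (which holds without `-o wide`); `count_by_zone` is a hypothetical helper name.

```shell
# Count nodes that are not plain "Ready", grouped by the ZONE column.
# Reads `kubectl get nodes -L topology.kubernetes.io/zone` output on stdin.
# count_by_zone is a hypothetical helper name, not part of the skill.
count_by_zone() {
  awk 'NR > 1 && $2 != "Ready" { n[$6]++ }
       END { for (z in n) print z, n[z] }' | sort
}

# Usage (requires cluster access):
#   kubectl get nodes -L topology.kubernetes.io/zone | count_by_zone
```

If most of the affected nodes fall into one zone, an infrastructure-level cause becomes more likely than a per-node fault. The same approach applies to `node.kubernetes.io/instance-type`, provided that label is set.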

Description: Check node health and diagnose node-level issues (NotReady, DiskPressure, MemoryPressure, PIDPressure). Inspects node conditions, resource allocation, and real-time usage.
Stars: 88
Category: testing
Updated: Apr 12, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add scitix/siclaw node-health-check

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Reported value |
|---------|-------------|--------|----------------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Apr 20, 2026, 11:13 AM (5.8s, 1 file scanned)

Related Skills

scitix/skill-authoring (development)
Guide for writing and improving Siclaw skills. Read this when creating or modifying a skill. Covers skill directory layout, SKILL.md format, script execution modes, and best practices.
207 | SKILL.md | Updated Apr 23, 2026

scitix/manage-skill (development)
Guides the user to the Siclaw Web page to manage Skills. Use this guide when the user requests to create, edit, or view a Skill in a Channel conversation.
88 | SKILL.md | Updated Apr 12, 2026

scitix/volcano-scheduler-logs (development)
Retrieve and analyze Volcano scheduler logs. Filter by keyword, time range, or pod name to debug scheduling decisions.
88 | SKILL.md | Updated Apr 12, 2026

scitix/volcano-scheduler-config (tools)
View Volcano scheduler configuration. Check scheduler ConfigMap, actions, plugins, and tier settings.
88 | SKILL.md | Updated Apr 12, 2026

Install manually

# Clone the repo
git clone https://github.com/scitix/siclaw.git

# Copy into Claude Code skills folder (global)
cp -r siclaw/skills/core/node-health-check ~/.claude/skills/

Repository: scitix/siclaw (88 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT
