cnoe-io/cluster-resource-health

Name: cluster-resource-health
Author: cnoe-io

ui/src/skills/cluster-resource-health/SKILL.md

npx skillsauth add cnoe-io/ai-platform-engineering cluster-resource-health

Cluster Resource Health

Query AWS EKS clusters for node health, pod status, resource utilization, and alerts to produce a cluster health dashboard.

Instructions

Phase 1: Cluster Overview (AWS Agent)

List EKS clusters and their status:
- Cluster name, version, and status
- Node group configurations (instance types, desired/min/max counts)
- Current node count and readiness
Check Kubernetes version:
- Current version vs. latest available
- End-of-support date for current version
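The end-of-support check can be sketched as simple date arithmetic. This is a minimal illustration, not part of the skill: `eol_status` and its labels are hypothetical names, the 90-day window is taken from the Guidelines section, and the end-of-support date itself is assumed to be supplied by the caller (for example from the EKS release calendar).

```python
from datetime import date

# Warning window from the Guidelines section: flag at least 90 days before EOL.
EOL_WARNING_DAYS = 90


def eol_status(end_of_support: date, today: date) -> str:
    """Classify a cluster version by the days remaining until end of support."""
    days_left = (end_of_support - today).days
    if days_left < 0:
        return "EXPIRED"
    if days_left <= EOL_WARNING_DAYS:
        return "WARNING"
    return "OK"


# Hypothetical dates: 60 days between the report date and end of support.
print(eol_status(date(2026, 4, 10), date(2026, 2, 9)))  # WARNING
```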

Phase 2: Node Health

Inspect node conditions using kubectl via the AWS agent:
- Ready, MemoryPressure, DiskPressure, PIDPressure, NetworkUnavailable
- Node allocatable vs. requested resources
- Unschedulable nodes (cordoned/drained)
Resource utilization per node:
- CPU requested vs. allocatable (%)
- Memory requested vs. allocatable (%)
- Pod count vs. pod limit
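As a rough sketch of the condition checks above, the function below inspects one node object as returned by `kubectl get nodes -o json` (fields `status.conditions` and `spec.unschedulable` from the Kubernetes Node API). The function name and the returned labels are illustrative assumptions, not something the skill defines.

```python
# Conditions that signal a problem when their status is "True".
PRESSURE_CONDITIONS = {
    "MemoryPressure",
    "DiskPressure",
    "PIDPressure",
    "NetworkUnavailable",
}


def node_problems(node: dict) -> list[str]:
    """Return problem labels for one item of `kubectl get nodes -o json`."""
    problems = []
    for cond in node.get("status", {}).get("conditions", []):
        ctype, cstatus = cond.get("type"), cond.get("status")
        if ctype == "Ready" and cstatus != "True":
            # Ready is the only condition where "True" is the healthy value.
            problems.append("NotReady")
        elif ctype in PRESSURE_CONDITIONS and cstatus == "True":
            problems.append(ctype)
    if node.get("spec", {}).get("unschedulable"):
        # Set when a node has been cordoned (or cordoned as part of a drain).
        problems.append("Unschedulable")
    return problems


sample_node = {
    "spec": {},
    "status": {
        "conditions": [
            {"type": "MemoryPressure", "status": "True"},
            {"type": "Ready", "status": "True"},
        ]
    },
}
print(node_problems(sample_node))  # ['MemoryPressure']
```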

Phase 3: Pod Health

Identify problematic pods:
- CrashLoopBackOff, ImagePullBackOff, OOMKilled
- Pending pods (unable to schedule)
- Pods with high restart counts (>5)
- Evicted pods
Namespace-level summary:
- Pods running, pending, failed per namespace
- Resource quotas and limit ranges
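The pod checks above could be expressed along these lines. This is a sketch under assumptions: it reads one pod object from `kubectl get pods -A -o json`, uses the standard `status.containerStatuses` fields, and the function name and the labels `Unschedulable` and `HighRestarts` are made up for illustration.

```python
RESTART_THRESHOLD = 5  # "high restart counts (>5)" from the list above
WAITING_REASONS = {"CrashLoopBackOff", "ImagePullBackOff"}


def pod_issues(pod: dict) -> list[str]:
    """Return sorted issue labels for one item of `kubectl get pods -A -o json`."""
    status = pod.get("status", {})
    issues = set()

    # Pending and not yet scheduled: the PodScheduled condition is "False".
    unscheduled = any(
        c.get("type") == "PodScheduled" and c.get("status") == "False"
        for c in status.get("conditions", [])
    )
    if status.get("phase") == "Pending" and unscheduled:
        issues.add("Unschedulable")

    if status.get("reason") == "Evicted":
        issues.add("Evicted")

    restarts = 0
    for cs in status.get("containerStatuses", []):
        restarts += cs.get("restartCount", 0)
        waiting = cs.get("state", {}).get("waiting", {}).get("reason")
        if waiting in WAITING_REASONS:
            issues.add(waiting)
        # OOMKilled shows up as the reason of a current or previous termination.
        current = cs.get("state", {}).get("terminated", {}).get("reason")
        previous = cs.get("lastState", {}).get("terminated", {}).get("reason")
        if "OOMKilled" in (current, previous):
            issues.add("OOMKilled")

    if restarts > RESTART_THRESHOLD:
        issues.add("HighRestarts")
    return sorted(issues)
```

Returning a sorted list of labels makes it straightforward to group pods by issue type for triage.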

Phase 4: Resource Capacity Analysis

Cluster-wide utilization:
- Total CPU requested vs. total allocatable
- Total memory requested vs. total allocatable
- Headroom for new workloads
Capacity risks:
- Nodes at >85% resource utilization
- Namespaces exceeding resource quotas
- PersistentVolume claims pending or near capacity
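Comparing requested and allocatable resources first requires converting Kubernetes quantity strings to numbers. The helpers below are a minimal sketch with hypothetical names; they assume only the common forms (`m` for millicores, binary suffixes such as `Gi`, decimal suffixes such as `M`) and do not cover every notation the Kubernetes quantity format allows.

```python
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
_DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity such as "500m" or "2" to cores."""
    if quantity.endswith("m"):
        return float(quantity[:-1]) / 1000
    return float(quantity)


def parse_memory(quantity: str) -> float:
    """Convert a memory quantity such as "2Gi" or "512M" to bytes."""
    # Binary suffixes are checked first so "Gi" is not mistaken for "G".
    for suffix, factor in {**_BINARY, **_DECIMAL}.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor
    return float(quantity)  # plain byte count


def utilization_pct(requested: float, allocatable: float) -> float:
    """Requested as a percentage of allocatable (0 when nothing is allocatable)."""
    return 100.0 * requested / allocatable if allocatable else 0.0


print(parse_cpu("500m"))                 # 0.5
print(round(utilization_pct(38, 48)))    # 79
```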

Output Format

## Cluster Resource Health Report
**Generated**: February 9, 2026

### Cluster Summary
| Cluster | Version | Nodes | Status | Overall Health |
|---------|---------|-------|--------|----------------|
| prod-us-west-2 | 1.29 | 12/12 Ready | Active | HEALTHY |
| staging-us-west-2 | 1.28 | 4/4 Ready | Active | WARNING |

### Resource Utilization (prod-us-west-2)
| Resource | Requested | Allocatable | Utilization |
|----------|-----------|-------------|-------------|
| CPU | 38 cores | 48 cores | 79% |
| Memory | 96 Gi | 128 Gi | 75% |
| Pods | 187 | 440 | 43% |

**Headroom**: Can schedule ~10 more standard pods (1 CPU, 2Gi each)

### Problematic Pods
| Pod | Namespace | Status | Restarts | Node |
|-----|-----------|--------|----------|------|
| payment-api-7d4b8c | production | CrashLoopBackOff | 23 | node-3 |
| data-pipeline-abc | batch | OOMKilled | 5 | node-7 |
| image-proc-xyz | processing | ImagePullBackOff | 0 | node-2 |

### Node Health
| Node | Status | CPU Req% | Mem Req% | Pods | Conditions |
|------|--------|----------|----------|------|------------|
| node-1 | Ready | 82% | 71% | 18 | OK |
| node-7 | Ready | 91% | 88% | 22 | MemoryPressure |

### Capacity Risks
1. **HIGH**: node-7 at 91% CPU / 88% memory - consider scaling node group
2. **MEDIUM**: staging cluster on EKS 1.28 - EOL in 60 days, plan upgrade
3. **LOW**: 3 PVCs at >80% capacity in `data` namespace

### Recommendations
1. **Immediate**: Investigate payment-api CrashLoopBackOff (23 restarts)
2. **Short-term**: Scale prod node group from 12 to 14 nodes (CPU utilization at 79%)
3. **Planned**: Upgrade staging cluster from EKS 1.28 to 1.29
4. **Optimization**: Right-size data-pipeline pods (OOMKilled - increase memory limit)
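The headroom figure in the sample report can be reproduced with a small calculation. The sketch below is an assumption about how such a number might be derived, not the skill's defined method: `pod_headroom` is a hypothetical helper that takes cluster-wide totals and treats the scarcest resource as the limit. Because it ignores how free capacity is spread across nodes, the result is an upper bound.

```python
import math


def pod_headroom(cpu_alloc, cpu_req, mem_alloc_gi, mem_req_gi,
                 pods_alloc, pods_used, pod_cpu=1.0, pod_mem_gi=2.0) -> int:
    """Estimate how many more pods of a given size fit, cluster-wide."""
    by_cpu = math.floor((cpu_alloc - cpu_req) / pod_cpu)
    by_mem = math.floor((mem_alloc_gi - mem_req_gi) / pod_mem_gi)
    by_slots = pods_alloc - pods_used
    # The tightest of the three limits decides; never report a negative count.
    return max(0, min(by_cpu, by_mem, by_slots))


# Numbers from the sample report: 10 free cores, 32 Gi free memory, 253 free pod slots.
print(pod_headroom(48, 38, 128, 96, 440, 187))  # 10
```

With the sample values, CPU is the binding constraint (10 pods), while memory would allow 16.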

Examples

- "Check the health of our EKS clusters"
- "Are there any failing pods in production?"
- "Show me cluster resource utilization"
- "Which nodes are under memory pressure?"
- "Do we have enough capacity for a new deployment?"

Guidelines

- Check all clusters unless a specific cluster is requested
- Flag any node above 85% resource utilization as a capacity risk
- For CrashLoopBackOff pods, suggest checking logs as the immediate action
- EKS version end-of-support should be flagged at least 90 days before EOL
- Group pods by issue type (crash, OOM, image pull) for easier triage
- Include pod restart counts - high restarts indicate chronic issues even if currently running
- When capacity is tight, recommend specific scaling actions (node count, instance type)
- Use kubectl read-only commands only (never modify cluster state during health checks)
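One way to enforce the read-only guideline is to check each command against an allowlist before it runs. The sketch below is illustrative only: the function name and the chosen set of subcommands are assumptions, and the check is deliberately conservative (for example, it rejects `kubectl -n kube-system get pods` because the flag value precedes the verb).

```python
import shlex

# Assumed allowlist of kubectl subcommands that only read cluster state.
READ_ONLY_VERBS = {
    "get", "describe", "top", "logs", "version", "api-resources", "cluster-info",
}


def is_read_only(command: str) -> bool:
    """Return True only for a single kubectl command with an allowlisted verb."""
    # Reject shell operators so a second command cannot be chained on.
    if any(ch in command for ch in ";|&`$<>"):
        return False
    try:
        tokens = shlex.split(command)
    except ValueError:  # unbalanced quotes
        return False
    if len(tokens) < 2 or tokens[0] != "kubectl":
        return False
    # The verb is taken to be the first token that is not a flag.
    verb = next((t for t in tokens[1:] if not t.startswith("-")), None)
    return verb in READ_ONLY_VERBS


print(is_read_only("kubectl get nodes -o json"))   # True
print(is_read_only("kubectl delete pod web-1"))    # False
```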

Check Kubernetes cluster health including pod status, node conditions, resource utilization, and pending alerts across EKS clusters. Use when monitoring infrastructure health, investigating capacity issues, or performing cluster audits.

338 stars

testing

Updated Apr 4, 2026

npx skillsauth add cnoe-io/ai-platform-engineering cluster-resource-health

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each entry below.

- Trivy (container and dependency vulnerability scanner): Clean, 95%
- Semgrep (static code analysis for vulnerabilities): Clean, 95%
- mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
- Snyk (dep) (open source security scanning): Skipped, 50%
- Socket.dev (supply chain security analysis): Skipped, 50%
- VirusTotal (multi-engine malware detection): Skipped, 50%
- CrowdStrike (advanced threat intelligence): Skipped, 50%
- OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
- OWASP Dep-Check: Skipped, 50%

Last scanned: Apr 4, 2026, 3:19 PM (13.0s, 1 file scanned)

Install manually

# Clone the repo
git clone https://github.com/cnoe-io/ai-platform-engineering.git

# Create the Claude Code skills folder (global) if it does not exist yet,
# then copy the skill into it
mkdir -p ~/.claude/skills
cp -r ai-platform-engineering/ui/src/skills/cluster-resource-health ~/.claude/skills/

See the official Claude Code Skills documentation for the skills path.

Repository: cnoe-io/ai-platform-engineering

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT