skills/core/volcano-node-resources/SKILL.md
Query cluster node resources for Volcano scheduling. Check allocatable CPU, memory, GPU, and current usage.
Query cluster node resources to understand capacity and availability for Volcano scheduling. This skill helps identify resource bottlenecks at the node level.
Scope: This skill is for diagnosis only. It retrieves resource information but does not modify any cluster state.
bash skills/core/volcano-node-resources/scripts/get-node-resources.sh [options]
| Parameter | Required | Description |
|-----------|----------|-------------|
| --node NODE | no | Query specific node only |
| --label LABEL | no | Filter nodes by label (e.g., gpu=true) |
| --show-usage | no | Show current resource usage (requires metrics-server) |
| --show-pods | no | Show pods running on each node |
| --format FORMAT | no | Output format: table (default), json, wide |
Get overview of all nodes:
bash skills/core/volcano-node-resources/scripts/get-node-resources.sh
Check specific node:
bash skills/core/volcano-node-resources/scripts/get-node-resources.sh --node worker-1
Check GPU nodes:
bash skills/core/volcano-node-resources/scripts/get-node-resources.sh --label nvidia.com/gpu.present=true
Show resource usage:
bash skills/core/volcano-node-resources/scripts/get-node-resources.sh --show-usage
Show with pod information:
bash skills/core/volcano-node-resources/scripts/get-node-resources.sh --show-pods
JSON output for parsing:
bash skills/core/volcano-node-resources/scripts/get-node-resources.sh --format json
| Resource | Description | Scheduling Impact |
|----------|-------------|-------------------|
| cpu | CPU cores (millicores) | Primary scheduling constraint |
| memory | RAM (bytes) | Primary scheduling constraint |
| nvidia.com/gpu | GPU devices | Hardware-specific scheduling |
| pods | Max pods per node | Density limit |
| ephemeral-storage | Disk space | Secondary constraint |
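Memory and storage values are typically reported as Kubernetes quantities such as `64Gi`, so comparing them numerically means normalizing units first. The following is a minimal sketch, not part of the skill's script: `to_bytes` is a hypothetical helper that handles only the binary suffixes Ki, Mi, Gi, and Ti, and passes plain byte counts through unchanged.

```bash
#!/usr/bin/env bash
# to_bytes (hypothetical helper): convert a Kubernetes binary quantity to bytes.
# Supports Ki, Mi, Gi, Ti; any other input is echoed back unchanged.
to_bytes() {
  local q="$1"
  case "$q" in
    *Ki) echo $(( ${q%Ki} * 1024 )) ;;
    *Mi) echo $(( ${q%Mi} * 1024 * 1024 )) ;;
    *Gi) echo $(( ${q%Gi} * 1024 * 1024 * 1024 )) ;;
    *Ti) echo $(( ${q%Ti} * 1024 * 1024 * 1024 * 1024 )) ;;
    *)   echo "$q" ;;
  esac
}

to_bytes 64Gi   # prints 68719476736
```

Decimal suffixes (K, M, G) and fractional values are deliberately left out to keep the sketch short.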
Reservations include:
When --show-usage is enabled and metrics-server is available:
Key insights:
Available = Allocatable - Allocated (sum of all requests)
Nodes with zero or negative available resources cannot accept new pods.
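The Available formula can be sketched for CPU as follows. This is an illustration only, assuming CPU quantities arrive either as cores (`32`, `1.5`) or millicores (`500m`); `to_millicores` and `available_millicores` are hypothetical helper names, not functions of the skill's script.

```bash
#!/usr/bin/env bash
# to_millicores (hypothetical helper): "500m" -> 500, "2" -> 2000, "1.5" -> 1500
to_millicores() {
  local q="$1"
  case "$q" in
    *m) echo "${q%m}" ;;
    *)  awk -v c="$q" 'BEGIN { printf "%d\n", c * 1000 + 0.5 }' ;;
  esac
}

# available_millicores ALLOCATABLE ALLOCATED
# Implements Available = Allocatable - Allocated, in millicores.
available_millicores() {
  local alloc req
  alloc=$(to_millicores "$1")
  req=$(to_millicores "$2")
  echo $(( alloc - req ))
}

available_millicores 32 24   # prints 8000
```

A zero or negative result corresponds to a node that cannot accept further pods requesting CPU.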
For gang scheduling, enough nodes must each have room for one pod of the job:
(Number of nodes with Available >= Pod Request) >= minMember
Example:
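As a minimal sketch of this check (not part of the skill's script), the hypothetical function `gang_fits` below reads lines of the form `NODE AVAILABLE_MILLICORES` on stdin, counts the nodes whose available CPU covers one pod's request, and compares that count with minMember. The input format is an assumption made for illustration, and the count is conservative because it assumes at most one pod of the job per node.

```bash
#!/usr/bin/env bash
# gang_fits REQUEST_MILLICORES MIN_MEMBER
# stdin: one "NODE AVAILABLE_MILLICORES" pair per line (assumed format).
# Exit status 0 if enough nodes can each host one pod, 1 otherwise.
gang_fits() {
  local request="$1" min_member="$2"
  awk -v req="$request" -v min="$min_member" '
    $2 + 0 >= req + 0 { n++ }
    END {
      printf "%d eligible node(s), need %d\n", n, min
      exit (n + 0 >= min + 0 ? 0 : 1)
    }
  '
}

# One pod requesting 4000m with minMember=1: only node-1 qualifies, which is enough.
printf 'node-1 8000\nnode-2 2000\n' | gang_fits 4000 1
```

With minMember set to 2 the same input fails the check, since only one node has 4000m available.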
bash skills/core/volcano-node-resources/scripts/get-node-resources.sh
Look for nodes with positive Available CPU and memory. Nodes with zero or near-zero availability cannot accept new pods.
bash skills/core/volcano-node-resources/scripts/get-node-resources.sh --label nvidia.com/gpu.present=true --show-usage
Check:
bash skills/core/volcano-node-resources/scripts/get-node-resources.sh --show-usage
Fragmentation indicators:
If pods require specific labels:
# Check nodes with required labels
bash skills/core/volcano-node-resources/scripts/get-node-resources.sh --label <required-label>
# Verify sufficient resources
kubectl describe node <node-name> | grep -A 10 "Allocated resources"
Monitor trends over time:
# Current capacity
bash skills/core/volcano-node-resources/scripts/get-node-resources.sh --format json
# Check usage trends (if metrics available)
for node in $(kubectl get nodes -o jsonpath='{.items[*].metadata.name}'); do
  echo "=== $node ==="
  kubectl top node "$node" 2>/dev/null || echo "Metrics not available"
done
| Status | Meaning | Action |
|--------|---------|--------|
| Ready | Node healthy and schedulable | Normal |
| NotReady | Node unhealthy | Check node conditions |
| SchedulingDisabled | Node cordoned | May need uncordon |
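The status table can be turned into a small triage helper. The sketch below is illustrative only: `node_action` is a hypothetical function, and it assumes the status string comes from the STATUS column of `kubectl get nodes`, where a cordoned node appears as `Ready,SchedulingDisabled`.

```bash
#!/usr/bin/env bash
# node_action STATUS (hypothetical helper)
# Maps a node STATUS string to the suggested action from the table above.
# NotReady is tested first so that an unhealthy, cordoned node is still flagged.
node_action() {
  local status="$1"
  case "$status" in
    *NotReady*)           echo "check node conditions" ;;
    *SchedulingDisabled*) echo "cordoned: may need uncordon" ;;
    Ready*)               echo "normal" ;;
    *)                    echo "unknown status: $status" ;;
  esac
}

node_action "Ready,SchedulingDisabled"   # prints: cordoned: may need uncordon

# Possible use against a live cluster (requires kubectl, shown as a comment):
#   kubectl get nodes --no-headers | while read -r name status _; do
#     echo "$name: $(node_action "$status")"
#   done
```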
Check node conditions:
kubectl get nodes -o json | jq '.items[].status.conditions'
Taints prevent pod scheduling:
kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'
Common taints:
node.kubernetes.io/not-ready
node.kubernetes.io/unreachable
node.kubernetes.io/disk-pressure
node.kubernetes.io/memory-pressure
node.kubernetes.io/pid-pressure
Symptom: Available resources near zero, new pods stuck pending
Check:
kubectl describe node <node-name> | grep -A 5 "Allocated resources"
Solution:
Symptom: Node has GPUs but not showing as allocatable
Check:
kubectl get node <node> -o jsonpath='{.status.allocatable.nvidia\.com/gpu}'
Solution:
Symptom: Node has node.kubernetes.io/memory-pressure taint
Check:
kubectl describe node <node-name> | grep -A 3 "MemoryPressure"
Solution:
Symptom: Node has node.kubernetes.io/disk-pressure taint
Check:
kubectl describe node <node-name> | grep -A 3 "DiskPressure"
Solution:
Human-readable table:
NAME     CPU_ALLOC   MEM_ALLOC   GPU_ALLOC   CPU_AVAIL   MEM_AVAIL
node-1   32          64Gi        4           8           16Gi
node-2   16          32Gi        0           2           4Gi
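If the table output needs a quick filter, the columns can be processed with awk. The sketch below assumes the whitespace-separated layout shown in the sample above, with a header row containing `CPU_AVAIL` and values in whole cores; `nodes_with_cpu` is a hypothetical helper, and the script's real output may differ.

```bash
#!/usr/bin/env bash
# nodes_with_cpu MIN_CORES (hypothetical helper)
# stdin: table output with a header row; prints names of nodes whose
# CPU_AVAIL column is at least MIN_CORES. The column is located by
# header name rather than by fixed position.
nodes_with_cpu() {
  local min_cpu="$1"
  awk -v min="$min_cpu" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "CPU_AVAIL") col = i; next }
    col && $col + 0 >= min + 0 { print $1 }
  '
}

# Using the sample table from this section:
nodes_with_cpu 4 <<'EOF'
NAME     CPU_ALLOC   MEM_ALLOC   GPU_ALLOC   CPU_AVAIL   MEM_AVAIL
node-1   32          64Gi        4           8           16Gi
node-2   16          32Gi        0           2           4Gi
EOF
# prints: node-1
```

For anything beyond a quick filter, the JSON format is likely the more robust input.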
Additional columns:
NAME CPU MEM GPU CPU_AVAIL MEM_AVAIL STATUS AGE
Machine-parseable output:
{
  "nodes": [
    {
      "name": "node-1",
      "allocatable": {
        "cpu": "32",
        "memory": "64Gi",
        "nvidia.com/gpu": "4"
      },
      "available": {
        "cpu": "8",
        "memory": "16Gi"
      }
    }
  ]
}
| Variable | Default | Description |
|----------|---------|-------------|
| NODE_LABEL | "" | Default label selector for nodes |
Combine with other skills for comprehensive analysis:
# 1. Check node resources
bash skills/core/volcano-node-resources/scripts/get-node-resources.sh
# 2. Check queue resources
bash skills/core/volcano-queue-diagnose/scripts/diagnose-queue.sh
# 3. If insufficient resources, refer to volcano-resource-insufficient skill guide
# (This is a documentation skill - follow the diagnostic steps in the SKILL.md)
volcano-resource-insufficient - Resource shortage diagnosis
volcano-diagnose-pod - Pod-specific scheduling issues
volcano-gang-scheduling - Gang constraint analysis
volcano-queue-diagnose - Queue resource distribution