skills/core/volcano-scheduler-logs/SKILL.md
Retrieve and analyze Volcano scheduler logs. Filter by keyword, time range, or pod name to debug scheduling decisions.
Retrieve and analyze Volcano scheduler logs to understand scheduling decisions, failures, and performance issues.
Scope: This skill is for diagnosis only. It retrieves logs for analysis but does not modify any cluster state.
```bash
bash skills/core/volcano-scheduler-logs/scripts/get-scheduler-logs.sh [options]
```
| Parameter | Required | Description |
|-----------|----------|-------------|
| --keyword KEYWORD | no | Filter logs by keyword (case-insensitive) |
| --pod POD | no | Filter logs related to a specific pod name |
| --since TIME | no | Only show logs newer than a relative duration (e.g., 10m, 1h) |
| --lines N | no | Number of lines to show (default: 100) |
| --follow | no | Stream logs in real-time (Ctrl+C to stop) |
| --previous | no | Show logs from previous container instance (after restart) |
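The script itself is not shown here, so the following is only a hypothetical sketch of how the options above could map onto standard `kubectl logs` flags (`-n`, `-l`, `--tail`, `--since`, `--follow`, `--previous`). The function name `build_logs_cmd` is invented for illustration. It prints the command instead of running it, and it leaves out `--keyword` and `--pod` on the assumption that those are applied as a text filter after the logs are fetched.

```bash
# Hypothetical sketch: print a kubectl command roughly equivalent to the
# skill's options. This is NOT the contents of get-scheduler-logs.sh.
build_logs_cmd() {
  # Defaults mirror the documented environment variables.
  local ns="${VOLCANO_SCHEDULER_NS:-volcano-system}"
  local label="${VOLCANO_SCHEDULER_LABEL:-app=volcano-scheduler}"
  local lines=100 since="" follow=0 previous=0
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --lines|--since)
        # Both options need a value; bail out instead of looping forever.
        if [ "$#" -lt 2 ]; then echo "missing value for $1" >&2; return 1; fi
        if [ "$1" = "--lines" ]; then lines="$2"; else since="$2"; fi
        shift 2 ;;
      --follow)   follow=1; shift ;;
      --previous) previous=1; shift ;;
      *) echo "unsupported option in sketch: $1" >&2; return 1 ;;
    esac
  done
  # -l selects the scheduler pods by label; --tail limits lines per container.
  local cmd="kubectl logs -n $ns -l $label --tail=$lines"
  if [ -n "$since" ]; then cmd="$cmd --since=$since"; fi
  if [ "$follow" -eq 1 ]; then cmd="$cmd --follow"; fi
  if [ "$previous" -eq 1 ]; then cmd="$cmd --previous"; fi
  printf '%s\n' "$cmd"
}
```

With neither environment variable set, `build_logs_cmd --since 1h --lines 500` prints `kubectl logs -n volcano-system -l app=volcano-scheduler --tail=500 --since=1h`. Printing rather than executing keeps the sketch read-only, in line with the skill's diagnosis-only scope.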
Get recent scheduler logs:

```bash
bash skills/core/volcano-scheduler-logs/scripts/get-scheduler-logs.sh
```

Search for error messages:

```bash
bash skills/core/volcano-scheduler-logs/scripts/get-scheduler-logs.sh --keyword error
```

Get logs for a specific pod:

```bash
bash skills/core/volcano-scheduler-logs/scripts/get-scheduler-logs.sh --pod my-job-0
```

Get the last 500 lines from the past hour:

```bash
bash skills/core/volcano-scheduler-logs/scripts/get-scheduler-logs.sh --since 1h --lines 500
```

Stream logs for gang scheduling issues:

```bash
bash skills/core/volcano-scheduler-logs/scripts/get-scheduler-logs.sh --keyword gang --follow
```

Check logs from the previous scheduler instance (after a crash or restart):

```bash
bash skills/core/volcano-scheduler-logs/scripts/get-scheduler-logs.sh --previous --lines 200
```
| Keyword | Use Case |
|---------|----------|
| error | Find error messages and failures |
| FailedScheduling | Scheduling failures |
| allocate | Resource allocation attempts |
| gang | Gang scheduling decisions |
| minMember | MinMember constraint issues |
| preempt | Preemption events |
| reclaim | Resource reclamation |
| enqueue | Queue admission decisions |
| bind | Pod binding attempts |
| queue | Queue-related decisions |
| proportion | Proportion plugin decisions |
| priority | Priority-related decisions |
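To see at a glance which of the keywords in the table above appear in a saved log excerpt, a small filter can count case-insensitive matches per keyword in a single pass. This is a hypothetical helper (`count_keyword_hits` is not part of the skill) that reads log text on stdin, for example output saved from the script.

```bash
# Hypothetical helper (not part of the skill): count how many log lines
# contain each keyword, ignoring case. Keywords must not contain spaces.
# Usage: count_keyword_hits error gang preempt < saved-logs.txt
count_keyword_hits() {
  awk -v kws="$*" '
    BEGIN {
      n = split(kws, orig, " ")
      for (i = 1; i <= n; i++) kw[i] = tolower(orig[i])
    }
    {
      line = tolower($0)
      # index() is a plain substring search, so no regex escaping is needed
      for (i = 1; i <= n; i++) if (index(line, kw[i]) > 0) hits[i]++
    }
    END { for (i = 1; i <= n; i++) printf "%s %d\n", orig[i], hits[i] }'
}
```

Substring matching is deliberately loose: `queue` also matches lines containing `enqueue`, and `gang` matches the `gang.go` source file name in log headers, so treat the counts as a rough first look rather than a precise tally.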
Volcano scheduler logs typically follow this format:
```text
I0102 15:04:05.123456 1 scheduler.go:123] Starting scheduling session
I0102 15:04:05.234567 1 allocate.go:456] Try to allocate resources for Job <namespace>/<job-name>
E0102 15:04:05.345678 1 gang.go:789] Failed to schedule pod <pod-name>: minMember not satisfied
```
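These lines use the standard klog header: a severity letter fused with an `MMDD` date, the time with microseconds, a thread ID, `file:line]`, and then the message. A minimal sketch, assuming that layout, splits each line into pipe-separated fields so they can be filtered with standard tools. `parse_klog` is a hypothetical name, and lines without a klog header are skipped.

```bash
# Hypothetical helper: split klog-style lines into
# LEVEL|MMDD|TIME|FILE:LINE|MESSAGE. Lines without a klog header are dropped.
parse_klog() {
  awk '
    $1 ~ /^[IWEF][0-9][0-9][0-9][0-9]$/ && substr($4, length($4), 1) == "]" {
      level = substr($1, 1, 1)                 # I, W, E or F
      date  = substr($1, 2, 4)                 # MMDD
      src   = substr($4, 1, length($4) - 1)    # file:line without the bracket
      i     = index($0, "] ")                  # end of the header
      msg   = (i > 0) ? substr($0, i + 2) : ""
      printf "%s|%s|%s|%s|%s\n", level, date, $2, src, msg
    }'
}
```

Assuming the script prints raw log lines, `bash skills/core/volcano-scheduler-logs/scripts/get-scheduler-logs.sh --lines 500 | parse_klog | awk -F'|' '$1 == "E"'` would keep only error-level entries. A `|` inside a message would shift the fields, so this is meant for quick inspection rather than strict parsing.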
Log levels:

- `I` (Info): normal operation information
- `W` (Warning): unusual but non-fatal conditions
- `E` (Error): failures and errors
- `F` (Fatal): critical errors causing shutdown

Common log messages, grouped by scheduling stage:

- Session: `Starting scheduling session`, `Starting scheduling loop`
- Enqueue: `Try to enqueue pod group`, `PodGroup <name> is enqueued`, `PodGroup <name> is pending`
- Allocation: `Try to allocate resources for Job`, `Try to allocate for task`
- Gang: `minMember not satisfied`, `gang member not ready`, `Waiting for gang members`
- Resources: `Insufficient cpu`, `Insufficient memory`, `0 nodes are available`
- Preemption: `Preempting pods`, `Found victim pods`
- Reclaim: `Try to reclaim resources`, `Reclaiming resources from queue`
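To group a log excerpt by stage, the message fragments listed above can be matched in order, first match wins. This is a hypothetical sketch: `tag_stage` and the stage labels it prints are invented for illustration, and the exact message text may differ between Volcano versions, so adjust the patterns to what your scheduler actually logs.

```bash
# Hypothetical helper: prefix each line with a stage label chosen from the
# message fragments listed above. The first matching rule wins; lines that
# match none are tagged "other". Output is "STAGE<TAB>original line".
tag_stage() {
  awk '
    {
      s = "other"
      if ($0 ~ /Starting scheduling (session|loop)/) s = "session"
      else if ($0 ~ /enqueue pod group|is enqueued|is pending/) s = "enqueue"
      else if ($0 ~ /Try to allocate/) s = "allocate"
      else if ($0 ~ /minMember not satisfied|gang member not ready|Waiting for gang members/) s = "gang"
      else if ($0 ~ /Insufficient (cpu|memory)|0 nodes are available/) s = "resources"
      else if ($0 ~ /Preempting pods|Found victim pods/) s = "preempt"
      else if ($0 ~ /reclaim resources|Reclaiming resources/) s = "reclaim"
      print s "\t" $0
    }'
}
```

Piping the result through `cut -f1 | sort | uniq -c` gives a per-stage count, which can show at a glance whether a job is stuck at enqueue, allocation, or gang admission.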
To investigate why a specific pod is not being scheduled, find the scheduler decisions that mention it:

```bash
bash skills/core/volcano-scheduler-logs/scripts/get-scheduler-logs.sh --pod <pod-name> --since 30m
```

Look for:

- `FailedScheduling` events
- `minMember not satisfied`
- `Insufficient` resource messages
- `enqueue` decisions (is the PodGroup being admitted?)

Check Gang plugin behavior:
```bash
bash skills/core/volcano-scheduler-logs/scripts/get-scheduler-logs.sh --keyword gang --since 1h
```

Look for:

- `minMember`-related messages

Check proportion and reclaim decisions:
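```bash
bash skills/core/volcano-scheduler-logs/scripts/get-scheduler-logs.sh --keyword "reclaim\|proportion" --since 30m
```

Look for:

- `Try to reclaim resources` and `Reclaiming resources from queue` messages
- Proportion plugin decisions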
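Check for scheduling delays:

```bash
bash skills/core/volcano-scheduler-logs/scripts/get-scheduler-logs.sh --lines 500 | grep -E "(Starting|Finished) scheduling"
```

Look for:

- Large gaps between a `Starting scheduling` line and the matching `Finished scheduling` line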
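To turn those start and finish lines into per-cycle durations, a sketch like the following pairs each start line with the next finish line and prints the elapsed time in milliseconds. It assumes klog timestamps in the second field and cycles that do not cross midnight. The default patterns are taken from the `grep` above, but the exact wording (and whether such lines are logged at all) depends on the Volcano version and log verbosity, so pass your own patterns if needed. `session_durations` is a hypothetical name.

```bash
# Hypothetical helper: print the duration in milliseconds of each scheduling
# cycle by pairing a start line with the next end line. Assumes klog headers
# (HH:MM:SS.micro in field 2) and cycles that do not cross midnight.
# Usage: session_durations [START_REGEX] [END_REGEX] < logs.txt
session_durations() {
  local start_re="${1:-Starting scheduling}"
  local end_re="${2:-Finished scheduling}"
  awk -v s="$start_re" -v e="$end_re" '
    # Convert "HH:MM:SS.micro" to seconds since midnight.
    function secs(t, p) { split(t, p, ":"); return p[1] * 3600 + p[2] * 60 + p[3] }
    $0 ~ s { start = secs($2); inflight = 1; next }
    $0 ~ e && inflight { printf "%.3f\n", (secs($2) - start) * 1000; inflight = 0 }'
}
```

For example, `bash skills/core/volcano-scheduler-logs/scripts/get-scheduler-logs.sh --lines 500 | session_durations | sort -n | tail -5` would list the five slowest cycles in the window, assuming the script prints raw log lines.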
Check preemption decisions:

```bash
bash skills/core/volcano-scheduler-logs/scripts/get-scheduler-logs.sh --keyword preempt --since 1h
```

Look for:

- `Preempting pods` and `Found victim pods` messages
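To see whether preemption happens steadily or in bursts, matching lines can be bucketed by minute using the klog date and time. This is a hypothetical helper (`count_per_minute` is an invented name); it matches case-insensitively, like the script's `--keyword` option, and assumes klog headers.

```bash
# Hypothetical helper: count lines matching a regex per minute, using the
# klog date (MMDD) and HH:MM from each line header.
# Usage: count_per_minute 'preempting pods|found victim pods' < logs.txt
count_per_minute() {
  awk -v re="$1" '
    BEGIN { re = tolower(re) }                     # match case-insensitively
    tolower($0) ~ re && $1 ~ /^[IWEF][0-9][0-9][0-9][0-9]$/ {
      key = substr($1, 2, 4) " " substr($2, 1, 5)  # "MMDD HH:MM" bucket
      if (!(key in count)) order[++n] = key        # keep first-seen order
      count[key]++
    }
    END { for (i = 1; i <= n; i++) printf "%s %d\n", order[i], count[order[i]] }'
}
```

Buckets are printed in the order they first appear, which follows log order, so a sudden jump in one minute stands out without further sorting.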
| Variable | Default | Description |
|----------|---------|-------------|
| VOLCANO_SCHEDULER_NS | volcano-system | Scheduler namespace |
| VOLCANO_SCHEDULER_LABEL | app=volcano-scheduler | Label selector for scheduler pods |
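These variables presumably follow the usual shell default pattern, where an unset or empty variable falls back to the documented default. A minimal sketch of that resolution; the function name `scheduler_target` is hypothetical, and the script's actual handling is not shown in this skill.

```bash
# Hypothetical sketch: resolve the scheduler namespace and label selector,
# falling back to the defaults documented in the table above.
scheduler_target() {
  local ns="${VOLCANO_SCHEDULER_NS:-volcano-system}"
  local label="${VOLCANO_SCHEDULER_LABEL:-app=volcano-scheduler}"
  printf 'namespace=%s selector=%s\n' "$ns" "$label"
}
```

For a non-default installation, the variables can be set inline, e.g. `VOLCANO_SCHEDULER_NS=<namespace> VOLCANO_SCHEDULER_LABEL=<selector> bash skills/core/volcano-scheduler-logs/scripts/get-scheduler-logs.sh --keyword error`. Running `kubectl get pods -n <namespace> -l <selector>` first is a quick way to confirm that the selector actually matches the scheduler pods.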
Tips:

- `--previous` only works if the container has restarted.
- Use `--since` to focus on recent issues.
- Search for `error\|Failed\|failed` to catch all failures.
- Use `--pod` when investigating specific pods.
- Cross-reference log timestamps with `kubectl get events` timestamps.

Related skills:

- `volcano-diagnose-pod` - Diagnose individual pod issues
- `volcano-gang-scheduling` - Gang-scheduling-specific diagnosis
- `volcano-queue-diagnose` - Queue resource analysis
- `volcano-resource-insufficient` - Resource shortage diagnosis