skills/core/volcano-scheduler-config/SKILL.md
View Volcano scheduler configuration. Check scheduler ConfigMap, actions, plugins, and tier settings.
View Volcano scheduler configuration to understand scheduling policies, enabled plugins, and actions. This skill helps diagnose configuration-related scheduling behaviors.
Scope: This skill is for diagnosis only. It retrieves configuration for analysis but does not modify any cluster state.
bash skills/core/volcano-scheduler-config/scripts/get-scheduler-config.sh [options]
| Parameter | Required | Description |
|-----------|----------|-------------|
| --section SECTION | no | Show specific section: actions, plugins, tiers, all (default: all) |
| --format FORMAT | no | Output format: yaml, json, summary (default: summary) |
| --raw | no | Show raw ConfigMap data without parsing |
Get summary of scheduler configuration:
bash skills/core/volcano-scheduler-config/scripts/get-scheduler-config.sh
View actions configuration only:
bash skills/core/volcano-scheduler-config/scripts/get-scheduler-config.sh --section actions
View plugins configuration:
bash skills/core/volcano-scheduler-config/scripts/get-scheduler-config.sh --section plugins
Show full YAML configuration:
bash skills/core/volcano-scheduler-config/scripts/get-scheduler-config.sh --format yaml
Get raw ConfigMap:
bash skills/core/volcano-scheduler-config/scripts/get-scheduler-config.sh --raw
The Volcano scheduler configuration is stored in:
- Namespace: volcano-system
- ConfigMap: volcano-scheduler-configmap
- Key: volcano-scheduler.conf

actions: "enqueue, allocate, backfill" # Pipeline order
tiers:
- plugins:
  - name: priority
  - name: gang
  - name: conformance
- plugins:
  - name: overcommit
  - name: drf
  - name: predicates
  - name: proportion
  - name: nodeorder
  - name: binpack
Actions define the two-phase scheduling pipeline: enqueue admits a PodGroup to its queue, and allocate then places the group's pods on nodes.
A job can pass enqueue (its PodGroup moves to Inqueue) but still fail allocation (its pods stay Pending) if node-level constraints block placement. This two-phase model explains the common scenario: "queue has capacity but pods won't schedule."
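The two-phase model can be turned into a quick triage rule. The sketch below uses a hypothetical helper, `classify_stage` (not part of this skill's scripts), that takes a PodGroup phase and a count of Pending pods and reports which phase is likely blocking. The phase names Pending and Inqueue come from the description above; how the two inputs are collected is left as an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a PodGroup phase and a Pending-pod count
# to the pipeline phase that is most likely blocking the job.
classify_stage() {
  local phase="$1" pending_pods="$2"
  case "$phase" in
    Pending)
      echo "enqueue: PodGroup not yet admitted" ;;
    Inqueue)
      if [ "$pending_pods" -gt 0 ]; then
        echo "allocate: admitted, but pods cannot be placed"
      else
        echo "none: no Pending pods reported"
      fi ;;
    *)
      echo "unknown: phase not covered by this sketch" ;;
  esac
}

classify_stage Inqueue 3
```

An "allocate" result points toward node-level constraints rather than queue capacity.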
Actions in order:
| Action | Required | Purpose |
|--------|----------|---------|
| enqueue | Yes | Admit pod groups to queue |
| allocate | Yes | Allocate resources to pods |
| backfill | No | Fill idle resources with best-effort pods |
| preempt | No | Evict low-priority pods to make room for high-priority pods |
| reclaim | No | Reclaim resources from over-allocated queues |
| elect | No | Select target workload (removed in v1.6+) |
Tiers divide plugins into priority levels:
Common tier organization:
| Plugin | Function | Critical For |
|--------|----------|--------------|
| gang | Gang scheduling | Batch jobs, ML training |
| priority | Priority sorting | Workload prioritization |
| conformance | Protect critical pods | System stability |
| Plugin | Function | Use Case |
|--------|----------|----------|
| proportion | Fair share allocation | Multi-tenant clusters |
| drf | Dominant Resource Fairness | Fair GPU/CPU sharing |
| overcommit | Allow overcommit | Resource efficiency |
| Plugin | Function | Use Case |
|--------|----------|----------|
| predicates | Node filtering | Resource/affinity matching |
| nodeorder | Node ranking | Node selection optimization |
| binpack | Dense packing | Reduce fragmentation |
| numaaware | NUMA topology | HPC workloads |
| Plugin | Function | Use Case |
|--------|----------|----------|
| priority | Queue ordering | Queue prioritization |
| proportion | Proportional share | Fair queue allocation |
bash skills/core/volcano-scheduler-config/scripts/get-scheduler-config.sh --section plugins
Look for name: gang in the plugin list. If missing, Gang scheduling will not work.
bash skills/core/volcano-scheduler-config/scripts/get-scheduler-config.sh --section actions
Look for reclaim in the actions list. If missing, queues cannot reclaim resources.
bash skills/core/volcano-scheduler-config/scripts/get-scheduler-config.sh --section plugins
Look for name: proportion. If missing, queue fair-share allocation is disabled.
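The plugin checks above can be scripted instead of read by eye. The following is a minimal sketch with a hypothetical helper, `has_plugin`, that scans YAML text on stdin for a `- name: <plugin>` entry. It assumes the YAML output lists plugins one per line in that form, as in the examples in this document; the sample text stands in for real script output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a plugin named $1 appears as a
# "- name: <plugin>" line in the YAML text read from stdin.
has_plugin() {
  local name="$1"
  grep -Eq "^[[:space:]]*-[[:space:]]*name:[[:space:]]*${name}[[:space:]]*\$"
}

# Sample config text; in practice, pipe in the --format yaml output instead.
sample_config='actions: "enqueue, allocate, backfill"
tiers:
- plugins:
  - name: priority
  - name: gang
  - name: conformance'

for plugin in gang proportion; do
  if printf '%s\n' "$sample_config" | has_plugin "$plugin"; then
    echo "$plugin: enabled"
  else
    echo "$plugin: MISSING"
  fi
done
```

Matching the whole line avoids false positives from names that merely contain the plugin name as a substring, which a plain `grep gang` would accept.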
Default configuration:
actions: "enqueue, allocate, backfill"
tiers:
- plugins:
  - name: priority
  - name: gang
    enablePreemptable: false
  - name: conformance
- plugins:
  - name: overcommit
  - name: drf
    enablePreemptable: false
  - name: predicates
  - name: proportion
  - name: nodeorder
  - name: binpack
If your configuration differs significantly, it may explain scheduling behaviors.
Some plugins accept arguments:
- name: overcommit
  arguments:
    overcommit-factor: "1.2"
Use --format yaml or --raw to see full plugin configurations including arguments.
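To make the overcommit-factor argument concrete, here is a small arithmetic sketch. The helper `overcommit_capacity` is hypothetical and only multiplies an allocatable amount by the factor; exactly where and how the plugin applies the factor inside the scheduler is not modeled here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: scale an allocatable amount by an overcommit factor.
# Illustrates only the multiplication, not the scheduler's internal logic.
overcommit_capacity() {
  local allocatable="$1" factor="$2"
  awk -v a="$allocatable" -v f="$factor" 'BEGIN { printf "%.1f\n", a * f }'
}

# 100 allocatable CPUs with a factor of 1.2
overcommit_capacity 100 1.2
```

With a factor of 1.2, 100 allocatable units are treated as 120, which is 20% above the real capacity.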
Symptom: PodGroups stay Pending, minMember never satisfied
Check:
bash skills/core/volcano-scheduler-config/scripts/get-scheduler-config.sh --section plugins | grep gang
Solution: Add gang plugin to configuration
Symptom: Queues cannot reclaim resources from over-allocated queues
Check:
bash skills/core/volcano-scheduler-config/scripts/get-scheduler-config.sh --section actions | grep reclaim
Solution: Add reclaim to actions list
Symptom: Unexpected scheduling behavior
Check:
bash skills/core/volcano-scheduler-config/scripts/get-scheduler-config.sh --section actions
Common orders:
- enqueue, allocate, backfill
- enqueue, allocate, backfill, preempt
- enqueue, allocate, backfill, reclaim
- enqueue, allocate, backfill, preempt, reclaim

Important: enqueue must come before allocate. allocate must be present.
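The two ordering rules in the note above can be checked mechanically. This is a minimal sketch with a hypothetical helper, `validate_actions`, that takes the actions string as its argument. It checks only those two rules and nothing else about the configuration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that allocate is present and that enqueue,
# when listed, comes before allocate.
validate_actions() {
  local actions="$1" idx=0 enqueue_idx=-1 allocate_idx=-1 action
  local -a list
  # Split the string on commas.
  IFS=',' read -r -a list <<< "$actions"
  for action in "${list[@]}"; do
    # Strip all whitespace from the entry.
    action="${action//[[:space:]]/}"
    case "$action" in
      enqueue)  enqueue_idx=$idx ;;
      allocate) allocate_idx=$idx ;;
    esac
    idx=$((idx + 1))
  done
  if [ "$allocate_idx" -lt 0 ]; then
    echo "invalid: allocate is missing"
    return 1
  fi
  if [ "$enqueue_idx" -ge 0 ] && [ "$enqueue_idx" -gt "$allocate_idx" ]; then
    echo "invalid: enqueue comes after allocate"
    return 1
  fi
  echo "ok"
}

validate_actions "enqueue, allocate, backfill" || true
validate_actions "allocate, enqueue" || true
```

The function returns a non-zero status on failure, so it can gate further checks in a larger diagnostic script.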
Symptom: Priority/preemption not working as expected
Check:
bash skills/core/volcano-scheduler-config/scripts/get-scheduler-config.sh --section tiers
Guideline:
| Plugin | Argument | Default | Description |
|--------|----------|---------|-------------|
| gang | enablePreemptable | true | Allow Gang pods to be preempted |
| overcommit | overcommit-factor | 1.2 | Multiplier for allocatable resources |
| drf | enablePreemptable | true | Allow DRF pods to be preempted |
| nodeorder | various weights | 0 | Node scoring weights |
| proportion | (none) | - | No arguments |
| predicates | GPUSharingEnable | false | Enable GPU sharing |
- jobEnqueueableFn from plugins
- overcommit-factor of 1.2 means 20% overcommit
- allocate action for each tier
- priority plugin for comparison
- preemptableFn from plugins
- proportion plugin
- reclaimableFn from plugins

| Variable | Default | Description |
|----------|---------|-------------|
| VOLCANO_SCHEDULER_NS | volcano-system | Scheduler namespace |
| VOLCANO_SCHEDULER_CONFIG | volcano-scheduler-configmap | ConfigMap name |
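The sketch below shows how the two variables and their defaults could resolve to a lookup. The helper `scheduler_config_cmd` is hypothetical, and the assumption that the script reads the ConfigMap with `kubectl get configmap` is mine, not stated by the skill. It only prints the command so it can run without a cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: resolve the two environment variables (falling back
# to the documented defaults) and print the kubectl command they imply.
scheduler_config_cmd() {
  local ns="${VOLCANO_SCHEDULER_NS:-volcano-system}"
  local cm="${VOLCANO_SCHEDULER_CONFIG:-volcano-scheduler-configmap}"
  echo "kubectl get configmap ${cm} -n ${ns} -o yaml"
}

# Uses whatever is set in the environment, else the defaults.
scheduler_config_cmd
# Override the namespace for a single call (batch-system is an example value).
VOLCANO_SCHEDULER_NS=batch-system scheduler_config_cmd
```

The same prefix form should work when invoking the skill's script, for example for clusters where Volcano is installed in a non-default namespace.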
Human-readable summary:
Scheduler Configuration
=======================
Actions: enqueue, allocate, backfill
Tier 1 Plugins:
- priority
- gang
- conformance
Tier 2 Plugins:
- overcommit
- drf
- predicates
- proportion
- nodeorder
- binpack
Parsed YAML structure:
actions: "enqueue, allocate, backfill"
tiers:
- plugins:
  - name: priority
  - name: gang
Machine-parseable:
{
  "actions": "enqueue, allocate, backfill",
  "tiers": [
    {
      "plugins": [
        {"name": "priority"},
        {"name": "gang"}
      ]
    }
  ]
}
ConfigMap raw output:
volcano-scheduler.conf:
---
actions: "enqueue, allocate, backfill"
tiers:
...
- volcano-queue-diagnose - Queue resource analysis
- volcano-gang-scheduling - Gang scheduling issues
- volcano-diagnose-pod - Pod scheduling diagnosis
- volcano-scheduler-logs - Scheduler decision logs