.claude/skills/gpu-status/SKILL.md
Check GPU allocation and utilization across all nodes. Use when asked about GPUs, VRAM, or model capacity.
npx skillsauth add Dirty13itch/kaizen gpu-status
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Check GPU state across the cluster:
GPU Allocation per Node:
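kubectl describe nodes | node -e '
// Print the nvidia.com/gpu capacity and allocatable counts for each node.
let d="";process.stdin.on("data",c=>d+=c);process.stdin.on("end",()=>{
// Split on "Name:" only at the start of a line, so other fields are not mistaken for node boundaries.
const nodes=d.split(/^Name:/m);
nodes.slice(1).forEach(n=>{
const name=n.split("\n")[0].trim();
// Capacity is listed first, then Allocatable; the dots are escaped so they match literally.
const gpus=(n.match(/nvidia\.com\/gpu:\s+\d+/g)||[]).map(s=>s.replace(/\s+/g," "));
console.log(name+":",gpus.join(", ")||"no GPUs");
});
})'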
Pods Using GPUs:
kubectl get pods -A -o json | node -e '
// List every pod with at least one container that has a non-zero nvidia.com/gpu limit.
let d="";process.stdin.on("data",c=>d+=c);process.stdin.on("end",()=>{
const pods=JSON.parse(d).items;
pods.forEach(p=>{
// Per-container GPU limits; containers without a limit count as 0.
const limits=p.spec.containers.map(c=>Number(c.resources?.limits?.["nvidia.com/gpu"]||0));
if(limits.some(n=>n>0)) console.log(p.metadata.namespace+"/"+p.metadata.name,"- GPU:",limits.join(","));
});
})'
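The two listings above can be combined to estimate how many GPUs each node still has unallocated. The following is a minimal sketch rather than part of the skill itself. It assumes the JSON shapes returned by `kubectl get nodes -o json` and `kubectl get pods -A -o json`. The function name `remainingGpus` and the sample node and pod data are hypothetical. It sums container limits only and ignores init containers, so it approximates the scheduler's own accounting.

```javascript
// Sketch: GPUs still unallocated per node.
// Inputs are the parsed outputs of `kubectl get nodes -o json` and
// `kubectl get pods -A -o json`. All names below are illustrative.
const GPU = "nvidia.com/gpu";

function remainingGpus(nodeList, podList) {
  const free = new Map();
  // Start from each node's allocatable GPU count (0 if the node has none).
  for (const n of nodeList.items) {
    free.set(n.metadata.name, Number(n.status?.allocatable?.[GPU] || 0));
  }
  for (const p of podList.items) {
    const node = p.spec.nodeName;
    // Skip unscheduled pods and pods that have already finished.
    if (!free.has(node) || ["Succeeded", "Failed"].includes(p.status?.phase)) continue;
    const used = p.spec.containers.reduce(
      (sum, c) => sum + Number(c.resources?.limits?.[GPU] || 0), 0);
    free.set(node, free.get(node) - used);
  }
  return free;
}

// Hypothetical sample data, for illustration only.
const sampleNodes = { items: [
  { metadata: { name: "node-a" }, status: { allocatable: { [GPU]: "4" } } },
  { metadata: { name: "node-b" }, status: { allocatable: {} } },
] };
const samplePods = { items: [
  { spec: { nodeName: "node-a", containers: [{ resources: { limits: { [GPU]: "2" } } }] },
    status: { phase: "Running" } },
] };

for (const [name, count] of remainingGpus(sampleNodes, samplePods)) {
  console.log(`${name}: ${count} GPU(s) free`);
}
```

To run it against a live cluster, one option is to save both kubectl outputs to files and pass the parsed JSON to the function in place of the sample objects.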
Live nvidia-smi (if inference pods are running):
kubectl exec -n inference deploy/sglang-reasoning -- nvidia-smi --query-gpu=name,memory.used,memory.total,utilization.gpu --format=csv,noheader 2>/dev/null || echo "Cannot exec into reasoning pod"
Report VRAM used vs available, which models are loaded, and remaining capacity.
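As one way to turn the nvidia-smi output into the used-versus-available figures asked for above, the sketch below parses the CSV lines into per-GPU numbers. It assumes the field order from the query shown earlier (name, memory.used, memory.total, utilization.gpu), that values carry their default units (`MiB`, `%`), and that GPU names contain no commas. The function name `summarizeGpus` and the sample lines are hypothetical.

```javascript
// Sketch: summarize `nvidia-smi --format=csv,noheader` output.
// Assumes fields in the order name, memory.used, memory.total, utilization.gpu.
function summarizeGpus(csv) {
  return csv
    .split("\n")
    .map((line) => line.trim())
    .filter(Boolean) // drop blank lines
    .map((line, index) => {
      const [name, used, total, util] = line.split(",").map((f) => f.trim());
      // parseInt reads the leading number and ignores the " MiB" / " %" suffix.
      const usedMiB = parseInt(used, 10);
      const totalMiB = parseInt(total, 10);
      return {
        index,
        name,
        usedMiB,
        totalMiB,
        freeMiB: totalMiB - usedMiB,
        utilPct: parseInt(util, 10),
      };
    });
}

// Hypothetical sample output, for illustration only.
const sample = [
  "NVIDIA A100-SXM4-40GB, 30000 MiB, 40960 MiB, 85 %",
  "NVIDIA A100-SXM4-40GB, 1024 MiB, 40960 MiB, 0 %",
].join("\n");

for (const g of summarizeGpus(sample)) {
  console.log(
    `GPU ${g.index} (${g.name}): ${g.usedMiB}/${g.totalMiB} MiB used, ` +
    `${g.freeMiB} MiB free, ${g.utilPct}% util`);
}
```

Which models are loaded cannot be read from these figures alone, so that part of the report still has to come from the inference deployment itself.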