.claude/skills/cluster-audit/SKILL.md
Full cluster state audit. Use when asked for cluster status, a health check, or an infrastructure audit. Produces a comprehensive report of all nodes, pods, services, PVCs, GPUs, and issues.
Install this skill globally with one command: `npx skillsauth add Dirty13itch/kaizen cluster-audit`. Works with Claude Code, Cursor, and Windsurf.
Run a comprehensive audit of the Kaizen K8s cluster. Execute ALL of these in sequence:
1. `kubectl get nodes -o wide`
2. `kubectl get pods -A -o wide`
3. `kubectl get pods -A --field-selector=status.phase!=Running,status.phase!=Succeeded`
4. `kubectl get svc -A`
5. `kubectl get pvc -A`
6. `kubectl describe nodes | grep -A5 "nvidia.com/gpu"`
7. `kubectl get events -A --sort-by=.lastTimestamp --no-headers | tail -20`
8. `curl -s --max-time 3 http://10.10.10.10:30000/v1/models` and `curl -s --max-time 3 http://10.10.10.10:30001/v1/models`
9. `curl -s --max-time 3 http://10.10.10.10:30800/health` and `curl -s --max-time 3 http://10.10.10.10:30810/health`

Produce a summary table:

| Component | Status | Details |
|-----------|--------|---------|
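One way to turn the endpoint checks into rows of the summary table is sketched below. The helper names (`table_header`, `table_row`, `probe_row`) are hypothetical and not part of the skill, and the `-f` flag is an added assumption so that HTTP error responses count as failures; the original commands use only `-s --max-time 3`.

```shell
# Hypothetical helpers for building the summary table; names are illustrative.

# Print the Markdown table header used by the audit report.
table_header() {
  printf '| Component | Status | Details |\n'
  printf '|-----------|--------|---------|\n'
}

# Print one table row: table_row <component> <status> <details>
table_row() {
  printf '| %s | %s | %s |\n' "$1" "$2" "$3"
}

# Probe an HTTP endpoint with the same 3-second limit as the audit steps.
# Usage: probe_row <component> <url>
# Emits an OK row if curl succeeds and an UNREACHABLE row otherwise.
# Assumption: -f is added so HTTP 4xx/5xx responses are treated as failures.
probe_row() {
  if curl -s -f --max-time 3 -o /dev/null "$2"; then
    table_row "$1" "OK" "$2"
  else
    table_row "$1" "UNREACHABLE" "$2"
  fi
}
```

For example, `table_header; probe_row "models :30000" http://10.10.10.10:30000/v1/models` would print the header followed by one row for the first endpoint. The component label is free text chosen by the caller.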
Flag any CrashLoopBackOff, OOMKilled, or Pending pods. Note GPU utilization vs allocation.
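The pod flagging step could be approximated with a small filter like the one below. It is a sketch under stated assumptions: `flag_pods` is a hypothetical name, and it assumes input from `kubectl get pods -A --no-headers`, where the fourth whitespace-separated column is STATUS. It does not cover the GPU comparison.

```shell
# Hypothetical filter: reads `kubectl get pods -A --no-headers` output on stdin
# and prints "namespace/name status" for pods whose STATUS column (field 4)
# is CrashLoopBackOff, OOMKilled, or Pending.
flag_pods() {
  awk '$4 == "CrashLoopBackOff" || $4 == "OOMKilled" || $4 == "Pending" { print $1 "/" $2, $4 }'
}
```

Typical use would be `kubectl get pods -A --no-headers | flag_pods`. Matching on the exact status string keeps the filter simple, at the cost of missing variants such as `Init:CrashLoopBackOff`.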