config/skills/infra/k8s-operator/SKILL.md
Query and diagnose the home Kubernetes cluster. Use when checking cluster health, troubleshooting pods/services/routes, inspecting storage, or understanding what's deployed. Covers Talos node management, Ceph storage, Cilium networking.
Query before acting. Understand scope, then drill down.
- Use talosctl for node management.
- Deploy and change apps through GitOps (see the home-ops-deployer skill).
- Do not use kubectl apply or kubectl edit: Flux reverts manual changes.
- Do not use kubectl delete pods to "fix" things; find the root cause.

| Node | IP | Roles | OS |
|------|-----|-------|----|
| your-cluster-01 | 10.90.3.101 | control-plane | Talos v1.11.5 |
| your-cluster-02 | 10.90.3.102 | control-plane | Talos v1.11.5 |
| your-cluster-03 | 10.90.3.103 | control-plane | Talos v1.11.5 |

3x NVMe OSDs, ~5.2 TiB total capacity.
| Storage Class | Provisioner | Use |
|---------------|-------------|-----|
| ceph-block (default) | rook-ceph.rbd.csi.ceph.com | Single-instance apps, databases |
| ceph-filesystem | rook-ceph.cephfs.csi.ceph.com | Shared/multi-instance (RWX) |
| ceph-bucket | rook-ceph.ceph.rook.io/bucket | S3-compatible object storage |
| openebs-hostpath | openebs.io/local | Node-local storage (monitoring, caches) |
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph status
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd status
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph df
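
The three queries above share the same toolbox prefix, so a small wrapper can keep them short. The following is a minimal sketch, not part of the skill itself: the function names `ceph_tools` and `ceph_health_rc` and the `KUBECTL` override variable are hypothetical. It assumes `ceph health` prints a line starting with `HEALTH_OK`, `HEALTH_WARN`, or `HEALTH_ERR`.

```sh
# Hypothetical helper: wraps the toolbox exec so each Ceph query is one short,
# read-only call. KUBECTL may be overridden with a stub function for dry runs.
ceph_tools() {
  "${KUBECTL:-kubectl}" -n rook-ceph exec deploy/rook-ceph-tools -- ceph "$@"
}

# Map the first word of `ceph health` output (read from stdin) to an exit code:
# 0 = HEALTH_OK, 1 = HEALTH_WARN, 2 = anything else (including HEALTH_ERR).
ceph_health_rc() {
  read -r state _rest
  case "$state" in
    HEALTH_OK)   return 0 ;;
    HEALTH_WARN) return 1 ;;
    *)           return 2 ;;
  esac
}

# Usage against a live cluster:
#   ceph_tools health | ceph_health_rc; echo "rc=$?"
```

Both functions only read state, which fits the query-before-acting rule of this skill.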
| IP | Service |
|----|---------|
| 10.99.8.201 | External gateway (public via Cloudflare tunnel) |
| 10.99.8.202 | Internal gateway (LAN direct) |

| IP | Device |
|----|--------|
| 10.90.254.1 | UDM Pro (router/DNS/firewall) |
| 10.96.0.10 | CoreDNS (cluster DNS) |

| Service | Type | Namespace | Notes |
|---------|------|-----------|-------|
| postgres18-cluster | PostgreSQL 18 | database | 3 replicas, CloudNative-PG, pgBackRest backups |
| postgres18-immich | PostgreSQL 18 | database | 3 replicas, dedicated to Immich |
| mariadb | MariaDB | database | Single instance |
| dragonfly | Redis-compatible | database | 6 replicas, Dragonfly operator |
| mosquitto | MQTT | database | Single instance |

See references/service-inventory.md for the full list.
Key namespaces for game development context:
Talos is immutable. No SSH, no package manager, no shell access on nodes.
# Node health
talosctl --nodes 10.90.3.101 health
talosctl --nodes 10.90.3.101 get members
# System logs
talosctl --nodes 10.90.3.101 logs kubelet
talosctl --nodes 10.90.3.101 dmesg
# Node config (read-only)
talosctl --nodes 10.90.3.101 get machineconfig
# Apply config changes (use with caution)
talosctl --nodes 10.90.3.101 apply-config --file <config.yaml>
# Upgrade Talos
task talos:upgrade node=your-cluster-01
# Upgrade Kubernetes
task talos:upgrade-k8s
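
The read-only commands above target a single node. To run the same query against all three control-plane nodes, a loop along these lines may help. This is a sketch under assumptions: the `each_node` function and the `TALOSCTL` override variable are hypothetical names, and the IPs are copied from the node table in this document.

```sh
# Control-plane IPs copied from the node table in this document.
NODES="10.90.3.101 10.90.3.102 10.90.3.103"

# Hypothetical helper: run one talosctl subcommand against each node in turn.
# TALOSCTL may be overridden with a stub function for dry runs.
each_node() {
  for ip in $NODES; do
    echo "== $ip =="
    "${TALOSCTL:-talosctl}" --nodes "$ip" "$@" || echo "command failed on $ip" >&2
  done
}

# Usage (read-only queries):
#   each_node version
#   each_node get members
```

Running the nodes one at a time keeps each node's output under its own header, so a failure on one node is easy to spot without stopping the loop.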
See references/troubleshooting.md for detailed workflows.
# Cluster overview
kubectl get nodes -o wide
kubectl top nodes
# All pods not Running
kubectl get pods -A --field-selector status.phase!=Running
# Recent events
kubectl get events -A --sort-by=.lastTimestamp | tail -20
# Flux status
flux get kustomizations
flux get helmreleases -A | grep -v True
# Specific app
flux get hr <name> -n <namespace>
kubectl logs -n <namespace> -l app.kubernetes.io/name=<app>
kubectl describe pod -n <namespace> -l app.kubernetes.io/name=<app>
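
One caveat on the "All pods not Running" query: it filters on pod phase, so it also lists completed Job pods (phase Succeeded) and can miss pods stuck in CrashLoopBackOff, whose phase typically stays Running. A possible complement is to filter on the STATUS column instead. The sketch below assumes the default `kubectl get pods -A --no-headers` column order (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE); the function name `unhealthy_by_namespace` is hypothetical.

```sh
# Hypothetical helper: count unhealthy pods per namespace.
# Reads `kubectl get pods -A --no-headers` output on stdin and treats any
# STATUS other than Running or Completed as unhealthy.
unhealthy_by_namespace() {
  awk '$4 != "Running" && $4 != "Completed" { count[$1]++ }
       END { for (ns in count) print ns, count[ns] }' | sort
}

# Usage against a live cluster:
#   kubectl get pods -A --no-headers | unhealthy_by_namespace
```

The per-namespace count gives the scope first; the describe and logs commands above can then drill into the namespaces that show up.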
| Command | Purpose |
|---------|---------|
| task kubernetes:kubeconform | Validate YAML schemas |
| task kubernetes:resources | List all cluster resources |
| task kubernetes:sync-secrets | Force ExternalSecret refresh |
| task kubernetes:network ns=<ns> | Debug pod networking (spawns netshoot) |
| task flux:reconcile | Force full Flux reconciliation |
| task flux:hr-restart | Restart failed HelmReleases |
| task volsync:snapshot app=<app> ns=<ns> | Snapshot an app PVC |
| task volsync:restore app=<app> ns=<ns> | Restore a PVC from snapshot |
| task volsync:list ns=<ns> | List snapshots |
- references/service-inventory.md: Full list of deployed services
- references/troubleshooting.md: Step-by-step diagnostic workflows
- ~/home-ops/docs/ai-context/NETWORKING.md: Full networking architecture
- ~/home-ops/docs/ai-context/ARCHITECTURE.md: Backup strategy, operational limits

Understand first. Act through GitOps.