.claude/skills/kaizen-infrastructure/SKILL.md
---
name: kaizen-infrastructure
description: Kaizen infrastructure management. Use when working on Kubernetes manifests, Talos configuration, Flux GitOps, or cluster operations. Triggers on: k8s, kubernetes, flux, talos, cluster, node, deployment, helm, kustomize.
allowed-tools: Read, Write, Bash, Grep, Glob
---

# Kaizen Infrastructure Skill

## When to Use

- Creating or modifying Kubernetes manifests
- Configuring Talos Linux nodes
- Setting up Flux GitOps resources
- Deploying applications to
- `k8s/`: All Kubernetes manifests
- `talos/`: Talos Linux configuration
- `specs/architecture.md`: System design decisions
- `specs/hardware-inventory.md`: Available hardware

Follow the Flux monorepo pattern:

```
k8s/
├── clusters/<cluster>/          # Bootstrap per cluster
├── infrastructure/base/         # Shared infrastructure
├── infrastructure/overlays/<env>/
├── apps/base/                   # Application definitions
└── apps/overlays/<env>/
```
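The layout above can be scaffolded in one step. The following is a minimal sketch, not part of the original skill: the function name `scaffold_repo` and its three arguments (repository root, cluster name, environment name) are hypothetical.

```shell
# Hypothetical helper: create the Flux monorepo skeleton shown above.
# $1 = repository root, $2 = cluster name, $3 = environment name (all assumptions).
scaffold_repo() {
  local root="$1" cluster="$2" env="$3"
  mkdir -p \
    "$root/k8s/clusters/$cluster" \
    "$root/k8s/infrastructure/base" \
    "$root/k8s/infrastructure/overlays/$env" \
    "$root/k8s/apps/base" \
    "$root/k8s/apps/overlays/$env"
}
```

For example, `scaffold_repo . my-cluster production` would create the five directories under the current checkout. Because `mkdir -p` leaves existing directories alone, running it again is harmless.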
- (e.g., `inference`, `mcp-gateway`)
- `<app>-<component>` (e.g., `sglang-server`, `qdrant-cluster`)
- `<app>-config`
- `<app>-secret`

Apply the standard labels:

```yaml
metadata:
  labels:
    app.kubernetes.io/name: <app>
    app.kubernetes.io/component: <component>
    app.kubernetes.io/part-of: kaizen
```

Schedule GPU workloads on GPU nodes and request a GPU:

```yaml
nodeSelector:
  nvidia.com/gpu.present: "true"
resources:
  limits:
    nvidia.com/gpu: 1
```
```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: <app>
  namespace: <namespace>
spec:
  interval: 5m
  chart:
    spec:
      chart: <chart>
      version: "<version>"
      sourceRef:
        kind: HelmRepository
        name: <repo>
  values:
    # Inline values or reference valuesFrom
```
```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: <name>
  namespace: flux-system
spec:
  interval: 10m
  path: ./k8s/apps/<path>
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
```
Store per-node patches in `talos/patches/`:

```yaml
# talos/patches/interface.yaml
machine:
  install:
    disk: /dev/nvme0n1
  network:
    hostname: interface
```

```shell
talosctl apply-config -n <node-ip> -f controlplane.yaml --config-patch @patches/interface.yaml
```
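Before applying a patch to a live node, it can help to preview the change. The sketch below wraps the same command with `talosctl apply-config`'s `--dry-run` flag. The function name `talos_patch_dry_run` and its arguments are hypothetical, and it assumes `controlplane.yaml` is in the current directory, as in the command above.

```shell
# Hypothetical helper: preview a per-node patch without changing the node.
# $1 = node IP, $2 = path to the patch file (both assumptions).
talos_patch_dry_run() {
  local node="$1" patch="$2"
  talosctl apply-config -n "$node" -f controlplane.yaml \
    --config-patch "@${patch}" --dry-run
}
```

If the preview looks right, rerun the original command without `--dry-run` to apply it.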
- `k8s/apps/base/<app>/`
- `k8s/apps/overlays/<env>/<app>/`

```shell
kubectl get nodes
kubectl get pods -A | grep -v Running
flux get all
flux reconcile source git flux-system
flux reconcile kustomization flux-system
```
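The `grep -v Running` filter above also prints the header row and any `Completed` job pods. A slightly stricter variant, offered as a sketch rather than as the skill's own command, filters on the STATUS column instead. The function name `unhealthy_pods` is hypothetical, and it assumes the default `kubectl get pods -A` column order, in which STATUS is the fourth column.

```shell
# Hypothetical helper: list pods whose STATUS is neither Running nor Completed.
# Assumes the default column order: NAMESPACE NAME READY STATUS RESTARTS AGE.
unhealthy_pods() {
  kubectl get pods -A --no-headers |
    awk '$4 != "Running" && $4 != "Completed"'
}
```

Pods in states such as `CrashLoopBackOff` or `Pending` still appear in the output, while finished jobs do not.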
Before committing:

- `yamllint` on all YAML files
- `kubectl diff` if cluster is available
- `flux diff kustomization`

See `k8s/README.md` for detailed conventions and `specs/architecture.md` for design decisions.
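The pre-commit checks listed above could be wrapped in a single function. This is a minimal sketch under several assumptions: the name `precommit_checks` and its arguments are hypothetical, the path is assumed to contain a `kustomization.yaml` so that `kubectl diff -k` can build it, and `flux diff` is assumed to follow the same exit-code convention as `kubectl diff` (1 means differences were found, greater than 1 means an error).

```shell
# Hypothetical helper: run the pre-commit checks in order.
# $1 = Flux Kustomization name, $2 = path to its manifests (both assumptions).
precommit_checks() {
  local name="$1" path="$2" rc

  # Lint every YAML file under the two config roots.
  yamllint k8s/ talos/ || return 1

  # Skip the diff steps when no cluster is reachable.
  if ! kubectl cluster-info >/dev/null 2>&1; then
    echo "cluster unavailable, skipping diffs"
    return 0
  fi

  # kubectl diff exits 1 when differences exist and >1 on a real error.
  rc=0
  kubectl diff -k "$path" || rc=$?
  [ "$rc" -le 1 ] || return "$rc"

  # Assumed to use the same convention: only codes above 1 are failures.
  rc=0
  flux diff kustomization "$name" --path "$path" || rc=$?
  [ "$rc" -le 1 ] || return "$rc"
  return 0
}
```

A call such as `precommit_checks flux-system ./k8s/apps/<path>` would lint first, then show both diffs only when a cluster responds.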