.claude/skills/k8-expert/SKILL.md
Kubernetes expert specialized in easy-db-lab's K3s cluster, Fabric8 manifest builders, and K8s-based deployments. Use for K8s architecture questions, manifest building, pod debugging, or understanding how easy-db-lab uses Kubernetes. This is a Q&A expert, not an executor.
I am a Kubernetes expert specializing in easy-db-lab's K3s cluster and how the project uses Kubernetes for database deployments and observability infrastructure.
I have deep knowledge of:
K3s cluster deployed on: Control node (K3s server) + Database nodes (K3s agents)
Key K8s-related files: !find /Users/jhaddad/dev/easy-db-lab/src/main/kotlin -path "*/configuration/*" -o -path "*/kubernetes/*" | grep -E "(ManifestBuilder|KubernetesClient)" | head -20
Example questions:
I will:
InitCommand
Example questions:
I will:
GrafanaManifestBuilder, ClickhouseManifestBuilder, etc.
Example questions:
I will:
Example questions:
I will:
configuration/
Example questions:
cassandra stress run work on K8s?"
I will:
Example questions:
I will:
Example questions:
I will:
Example questions:
I will:
Control Node (K3s Server)
├── K3s server running
├── kubectl access
└── Hosts:
    ├── Grafana
    ├── VictoriaMetrics
    ├── VictoriaLogs
    ├── Tempo
    └── Pyroscope
Database Nodes (K3s Agents)
├── Join K3s cluster
├── Run workload pods
└── Hosts:
    ├── ClickHouse pods
    ├── OTel collectors
    ├── Fluent Bit
    └── Stress jobs
Located in: packer/base/install/install_k3s.sh
Control node:
curl -sfL https://get.k3s.io | sh -s - server \
  --disable traefik \
  --write-kubeconfig-mode 644
Database nodes:
curl -sfL https://get.k3s.io | sh -s - agent \
  --server https://<control-ip>:6443 \
  --token <token>
All K8s resources in easy-db-lab are built using Fabric8 in Kotlin.
import io.fabric8.kubernetes.api.model.ContainerPortBuilder
import io.fabric8.kubernetes.api.model.apps.Deployment
import io.fabric8.kubernetes.api.model.apps.DeploymentBuilder

class MyServiceManifestBuilder(
    private val namespace: String,
    private val image: String
) {
    fun buildDeployment(): Deployment {
        return DeploymentBuilder()
            .withNewMetadata()
                .withName("my-service")
                .withNamespace(namespace)
                .withLabels(mapOf("app" to "my-service"))
            .endMetadata()
            .withNewSpec()
                .withReplicas(1)
                // The selector must match the pod template labels below
                .withNewSelector()
                    .withMatchLabels(mapOf("app" to "my-service"))
                .endSelector()
                .withNewTemplate()
                    .withNewMetadata()
                        .withLabels(mapOf("app" to "my-service"))
                    .endMetadata()
                    .withNewSpec()
                        .addNewContainer()
                            .withName("my-service")
                            .withImage(image)
                            .withPorts(
                                ContainerPortBuilder()
                                    .withContainerPort(8080)
                                    .withName("http")
                                    .build()
                            )
                        .endContainer()
                    .endSpec()
                .endTemplate()
            .endSpec()
            .build()
    }
}
Located in src/main/kotlin/com/rustyrazorblade/easydblab/configuration/:
- clickhouse/ClickhouseManifestBuilder - ClickHouse StatefulSet
- grafana/GrafanaManifestBuilder - Grafana Deployment
- victoriametrics/VictoriaMetricsManifestBuilder - VictoriaMetrics
- victorialogs/VictoriaLogsManifestBuilder - VictoriaLogs
- tempo/TempoManifestBuilder - Tempo for traces
- pyroscope/PyroscopeManifestBuilder - Pyroscope profiling
- fluentbit/FluentBitManifestBuilder - Fluent Bit DaemonSet
- otelcollector/OtelCollectorManifestBuilder - OTel Collector
See: src/main/kotlin/com/rustyrazorblade/easydblab/configuration/CLAUDE.md
Easy-db-lab uses a wrapper around the Fabric8 client:
Location: src/main/kotlin/com/rustyrazorblade/easydblab/kubernetes/KubernetesClient.kt
Key methods:
// Apply a resource
fun apply(resource: HasMetadata)
// Get a resource
fun <T : HasMetadata> get(type: Class<T>, namespace: String, name: String): T?
// Delete a resource
fun delete(resource: HasMetadata)
// Wait for condition
fun waitForPodReady(namespace: String, labelSelector: String, timeout: Duration)
// Get logs
fun getPodLogs(namespace: String, podName: String, tailLines: Int? = null): String
Usage example:
val builder = GrafanaManifestBuilder(namespace = "grafana")
val deployment = builder.buildDeployment()
kubernetesClient.apply(deployment)
kubernetesClient.waitForPodReady("grafana", "app=grafana", Duration.ofMinutes(5))
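The internals of waitForPodReady are not shown here. As a rough illustration of what a timeout-bounded wait involves, the sketch below polls a condition until it holds or a deadline passes. The name waitUntil, the poll interval, and the lambda standing in for a pod readiness check are all hypothetical and are not taken from easy-db-lab or Fabric8.

```kotlin
import java.time.Duration

// Hypothetical helper: polls `condition` until it returns true or `timeout` elapses.
// Returns true if the condition was met in time, false on timeout.
fun waitUntil(
    timeout: Duration,
    pollInterval: Duration = Duration.ofMillis(100),
    condition: () -> Boolean,
): Boolean {
    val deadline = System.nanoTime() + timeout.toNanos()
    while (true) {
        if (condition()) return true
        if (System.nanoTime() >= deadline) return false
        Thread.sleep(pollInterval.toMillis())
    }
}

fun main() {
    // Stand-in for "is the pod Ready?": becomes true on the third check.
    var checks = 0
    val ready = waitUntil(Duration.ofSeconds(2)) { ++checks >= 3 }
    println("ready=$ready after $checks checks")
}
```

Returning a Boolean rather than throwing is a design choice made for this sketch only; the real wrapper may signal a timeout differently.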
Used for: ClickHouse
StatefulSetBuilder()
    .withNewSpec()
        .withServiceName("clickhouse") // Headless service for stable network IDs
        .withReplicas(3)
        // volumeClaimTemplates is a list, so Fabric8 exposes addNew...() rather than withNew...()
        .addNewVolumeClaimTemplate() // PVC for each replica
            .withNewMetadata()
                .withName("data")
            .endMetadata()
            .withNewSpec()
                .withAccessModes("ReadWriteOnce")
                .withNewResources()
                    .withRequests(mapOf("storage" to Quantity("100Gi")))
                .endResources()
            .endSpec()
        .endVolumeClaimTemplate()
    .endSpec()
    .build()
Used for: Fluent Bit, Beyla
DaemonSetBuilder()
    .withNewSpec()
        .withNewSelector()
            .withMatchLabels(mapOf("app" to "fluent-bit"))
        .endSelector()
        .withNewTemplate()
            // Pod spec - runs on EVERY node
        .endTemplate()
    .endSpec()
    .build()
Used for: Cassandra stress tests
JobBuilder()
    .withNewMetadata()
        .withName("stress-test-${UUID.randomUUID()}")
        .withLabels(mapOf("app" to "cassandra-easy-stress"))
    .endMetadata()
    .withNewSpec()
        .withBackoffLimit(0) // Don't retry on failure
        .withNewTemplate()
            .withNewSpec()
                .withRestartPolicy("Never")
                .addNewContainer()
                    .withName("stress")
                    .withImage("cassandra-easy-stress:latest")
                    .withArgs("KeyValue", "-d", "10s")
                .endContainer()
            .endSpec()
        .endTemplate()
    .endSpec()
    .build()
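The Job above gets a unique name by appending a UUID. Kubernetes object names must follow DNS naming rules, and because the Job name is also copied into a label value on the Job's pods, staying within the stricter DNS-1123 label form (at most 63 characters) is a safe target. The sketch below checks a generated name against that rule; isValidDns1123Label is a hypothetical helper and not part of easy-db-lab.

```kotlin
import java.util.UUID

// DNS-1123 label: lowercase alphanumerics or '-', starting and ending
// with an alphanumeric character, at most 63 characters long.
private val DNS_1123_LABEL = Regex("^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")

// Hypothetical helper, not part of easy-db-lab.
fun isValidDns1123Label(name: String): Boolean =
    name.length <= 63 && DNS_1123_LABEL.matches(name)

fun main() {
    // UUID.toString() yields 36 lowercase hex-and-hyphen characters.
    val jobName = "stress-test-${UUID.randomUUID()}"
    println(jobName.length)                      // 48
    println(isValidDns1123Label(jobName))        // true
    println(isValidDns1123Label("Stress_Test"))  // false
}
```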
Used for: All observability components
ConfigMapBuilder()
    .withNewMetadata()
        .withName("grafana-config")
        .withNamespace("grafana")
    .endMetadata()
    .withData(mapOf(
        "grafana.ini" to grafanaIniContent,
        "datasources.yaml" to datasourcesYaml
    ))
    .build()
# 1. Check pod status
kubectl get pods -A
# 2. Describe the pod for events
kubectl describe pod <pod-name> -n <namespace>
# 3. Check logs (current)
kubectl logs <pod-name> -n <namespace>
# 4. Check logs (previous if crashed)
kubectl logs <pod-name> -n <namespace> --previous
# 5. Check events
kubectl get events -n <namespace> --sort-by='.lastTimestamp'
Common issues:
- ImagePullBackOff - Image doesn't exist or can't be pulled
- CrashLoopBackOff - Container starts but crashes immediately
- Pending - Insufficient resources or volume issues
- CreateContainerConfigError - ConfigMap/Secret missing
# 1. Check service exists
kubectl get svc -n <namespace>
# 2. Check endpoints
kubectl get endpoints <service-name> -n <namespace>
# 3. Verify pod labels match service selector
kubectl get pods -n <namespace> --show-labels
# 4. Test from within cluster
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- \
  curl http://<service-name>.<namespace>:8080
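Step 3 above matters because a Service only routes to pods whose labels contain every key/value pair in its selector. The rule can be sketched with plain maps instead of Fabric8 types; selectorMatches is a hypothetical name used for illustration and is not part of easy-db-lab.

```kotlin
// Hypothetical helper illustrating equality-based label selection.
// A Service without a selector gets no automatically managed endpoints,
// so an empty selector is treated as "matches nothing" here.
fun selectorMatches(selector: Map<String, String>, podLabels: Map<String, String>): Boolean =
    selector.isNotEmpty() && selector.all { (key, value) -> podLabels[key] == value }

fun main() {
    val selector = mapOf("app" to "grafana")

    // Extra pod labels are fine; every selector pair just has to be present.
    println(selectorMatches(selector, mapOf("app" to "grafana", "tier" to "observability"))) // true

    // A single mismatched value means the pod is not selected.
    println(selectorMatches(selector, mapOf("app" to "tempo"))) // false
}
```

If `kubectl get endpoints` shows no addresses, a mismatch of this kind between the selector and the pod labels is one of the first things to rule out.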
# 1. Check node resources
kubectl top nodes
# 2. Check pod resources
kubectl top pods -A
# 3. Describe nodes for pressure conditions
kubectl describe nodes
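# 4. Check for OOMKilled pods (STATUS shows OOMKilled only until the container restarts)
kubectl get pods -A | grep OOMKilled
# 5. Check each container's last termination reason, which survives restarts
kubectl get pods -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{" "}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}' | grep OOMKilled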
The tool provides convenient wrappers:
# List resources
easy-db-lab k8s list pods
easy-db-lab k8s list deployments
# Get logs
easy-db-lab k8s logs <pod-name> -n <namespace>
# Describe resource
easy-db-lab k8s describe pod <pod-name>
# Execute in pod
easy-db-lab k8s exec <pod-name> -- /bin/bash
These wrap kubectl but integrate with easy-db-lab's event system.
Uses K8s PersistentVolumeClaims:
--ebs flag used (persistent)
Storage policies defined in ClickHouse config:
- s3_main - Primary S3 storage
- s3_tier - Automatic tiering (local → S3)
VictoriaMetrics/VictoriaLogs use:
❌ Don't:
val yaml = """
apiVersion: apps/v1
kind: Deployment
...
"""
kubernetesClient.applyYaml(yaml)
✅ Do:
val deployment = DeploymentBuilder()
    .withMetadata(...)
    .withSpec(...)
    .build()
kubernetesClient.apply(deployment)
Always specify requests and limits:
.withNewResources()
    .withRequests(mapOf(
        "cpu" to Quantity("500m"),
        "memory" to Quantity("512Mi")
    ))
    .withLimits(mapOf(
        "cpu" to Quantity("1000m"),
        "memory" to Quantity("1Gi")
    ))
.endResources()
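The CPU values above use the millicore suffix: "500m" is half a core and "1000m" is one full core. A minimal sketch of that conversion follows, assuming only plain decimal numbers and the "m" suffix appear; cpuMillicores is a hypothetical helper, not a Fabric8 or easy-db-lab function.

```kotlin
import java.math.BigDecimal

// Hypothetical helper: converts a CPU quantity string to millicores.
// Handles only plain numbers ("1", "0.5") and the "m" suffix ("500m");
// any other quantity form is out of scope for this sketch.
fun cpuMillicores(quantity: String): Long =
    if (quantity.endsWith("m")) {
        quantity.removeSuffix("m").toLong()
    } else {
        BigDecimal(quantity).multiply(BigDecimal(1000)).toLong()
    }

fun main() {
    println(cpuMillicores("500m"))   // 500
    println(cpuMillicores("1000m"))  // 1000
    println(cpuMillicores("1"))      // 1000
    println(cpuMillicores("0.5"))    // 500
}
```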
Define readiness and liveness probes:
.withReadinessProbe(
    ProbeBuilder()
        .withHttpGet(HTTPGetActionBuilder()
            .withPath("/health")
            .withPort(IntOrString(8080))
            .build())
        .withInitialDelaySeconds(10)
        .withPeriodSeconds(5)
        .build()
)
Consistent labeling for management:
val labels = mapOf(
    "app.kubernetes.io/name" to "my-app",
    "app.kubernetes.io/component" to "backend",
    "app.kubernetes.io/part-of" to "easy-db-lab"
)
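One way to keep these labels consistent is to build them in a single place and derive the selector from the same source. The sketch below assumes two hypothetical helpers, standardLabels and selectorLabels, neither of which is confirmed to exist in easy-db-lab.

```kotlin
// Hypothetical helper: full label set applied to metadata and pod templates.
fun standardLabels(name: String, component: String): Map<String, String> = mapOf(
    "app.kubernetes.io/name" to name,
    "app.kubernetes.io/component" to component,
    "app.kubernetes.io/part-of" to "easy-db-lab",
)

// Hypothetical helper: the smaller, stable subset used in selectors.
fun selectorLabels(name: String): Map<String, String> =
    mapOf("app.kubernetes.io/name" to name)

fun main() {
    val labels = standardLabels("my-app", "backend")
    val selector = selectorLabels("my-app")

    // A workload's selector must match its pod template labels,
    // so every selector pair has to appear in the full label set.
    println(selector.all { (key, value) -> labels[key] == value }) // true
}
```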
- /easy-db-lab-expert - General easy-db-lab expertise
- /debug-environment - Active cluster debugging
- /e2e-test-expert - End-to-end testing expertise
Invoke me:
/k8-expert
Example questions:
I'll provide:
I'm here to help you understand and work effectively with Kubernetes in easy-db-lab. Ask me anything about K8s!