.claude/skills/debug-environment/SKILL.md
Comprehensive debugging for easy-db-lab environments. Diagnose cluster issues, service failures, connectivity problems, K8s pod failures, SSH issues, and configuration problems. Use when troubleshooting any easy-db-lab deployment or runtime issue.
You are debugging an easy-db-lab environment. This skill helps diagnose and fix issues with cluster deployments, service health, connectivity, and configuration.
IMPORTANT: These clusters are NEVER long running. Issues are:
Current branch: !git branch --show-current
First, check that the current directory has the required environment files:
Required files:
- sshConfig - SSH configuration for connecting to cluster nodes
- kubeconfig - Kubernetes configuration for K8s API access
- state.json - Cluster state and metadata
Check for files: !ls -lh sshConfig kubeconfig state.json 2>&1
If any files are missing: The environment is not properly initialized. Check if you're in the correct directory or if the cluster needs to be created/started.
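The file check above can also be wrapped in a small function that names exactly which file is absent. This is a minimal sketch; `check_env_files` is a hypothetical helper name, not part of easy-db-lab.

```shell
# Hypothetical helper: report which required environment files are missing.
# Prints one "missing: <name>" line per absent file and returns non-zero
# if at least one of them is absent.
check_env_files() {
    dir="${1:-.}"
    rc=0
    for f in sshConfig kubeconfig state.json; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $f"
            rc=1
        fi
    done
    return "$rc"
}
```

Run `check_env_files` in the environment directory, or pass a path such as `check_env_files /path/to/env`, before moving on to the connectivity checks.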
Cluster state information: !cat state.json 2>/dev/null | head -50
Extract key information from state.json:
Test SSH connectivity to cluster nodes using the sshConfig:
# List all SSH hosts configured
grep "^Host " sshConfig
# Test connection to control node (usually named "control")
ssh -F sshConfig control "echo 'SSH connectivity OK'" 2>&1
# Test connection to first database node (usually named "db-0" or "cassandra-0")
ssh -F sshConfig db-0 "echo 'SSH connectivity OK'" 2>&1
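Because node names vary between environments, it may help to test every host listed in sshConfig rather than guessing at `db-0` or `cassandra-0`. The sketch below assumes hosts are declared with ordinary `Host` lines; `list_ssh_hosts` and `check_ssh_hosts` are hypothetical helper names.

```shell
# Print every concrete host alias declared in an SSH config file, one per line.
# Wildcard patterns such as "Host *" are skipped because they are not
# connectable names.
list_ssh_hosts() {
    awk '$1 == "Host" { for (i = 2; i <= NF; i++) if ($i !~ /[*?!]/) print $i }' "${1:-sshConfig}"
}

# Attempt a non-interactive connection to each host and print OK or FAIL.
# BatchMode avoids password prompts; ConnectTimeout bounds the wait per host.
check_ssh_hosts() {
    cfg="${1:-sshConfig}"
    for host in $(list_ssh_hosts "$cfg"); do
        if ssh -n -F "$cfg" -o BatchMode=yes -o ConnectTimeout=5 "$host" true >/dev/null 2>&1; then
            echo "OK   $host"
        else
            echo "FAIL $host"
        fi
    done
}
```

Running `check_ssh_hosts` from the environment directory gives a one-line verdict per node; for any host that fails, repeat the connection with `ssh -v` to see where it stops.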
Test K8s connectivity and check pod status:
# Use the kubeconfig from the environment
export KUBECONFIG=$(pwd)/kubeconfig
# Check K8s connectivity
kubectl cluster-info 2>&1
# List all pods across all namespaces
kubectl get pods -A 2>&1
# Check for failed or pending pods
kubectl get pods -A 2>&1 | grep -Ev "Running|Completed"
# Check recent events
kubectl get events -A --sort-by='.lastTimestamp' 2>&1 | tail -20
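The grep-based filter above matches "Running" or "Completed" anywhere on the line, including in a pod name. A stricter variant compares only the STATUS column. This is a sketch that assumes the default column layout of `kubectl get pods -A --no-headers` (namespace, name, ready, status, restarts, age); `unhealthy_pods` is a hypothetical helper name.

```shell
# Read `kubectl get pods -A --no-headers` output on stdin and print
# "<namespace>/<pod> <status>" for pods whose STATUS column (4th field)
# is neither Running nor Completed.
unhealthy_pods() {
    awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2 " " $4 }'
}

# Usage (requires KUBECONFIG to be exported as shown earlier):
#   kubectl get pods -A --no-headers | unhealthy_pods
```

A pod can report Running while its containers are not ready, so an empty result here does not by itself prove that every service is healthy.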
You have access to all easy-db-lab commands. Build the project first if needed:
# Build the project (if not already built)
./gradlew shadowJar
# Common debugging commands:
java -jar build/libs/easy-db-lab-*-all.jar <command>
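Typing the full jar path for every command gets tedious during a debugging session. One possible convenience is a wrapper that resolves the newest shadow jar first. This is a sketch only: `find_edl_jar` and `edl` are hypothetical names, and the jar location is assumed to be the `build/libs` path used above.

```shell
# Print the most recently modified shadow jar in the given directory
# (default: build/libs). Returns non-zero if no matching jar exists.
find_edl_jar() {
    jar=$(ls -t "${1:-build/libs}"/easy-db-lab-*-all.jar 2>/dev/null | head -n 1)
    [ -n "$jar" ] && echo "$jar"
}

# Hypothetical wrapper: run an easy-db-lab command through the newest jar.
edl() {
    jar=$(find_edl_jar) || {
        echo "no jar found; run ./gradlew shadowJar first" >&2
        return 1
    }
    java -jar "$jar" "$@"
}
```

With these defined in the current shell, `edl status` is shorthand for the `java -jar` form above.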
Key commands for debugging:
- list - List all environments
- info - Show cluster information
- status - Check cluster status
- ssh <node> - SSH into a specific node
- logs query - Query logs from VictoriaLogs
- logs ls - List available log streams
- metrics backup - Backup metrics
- grafana update-config - Update Grafana configuration
- start - Start cluster nodes
- stop - Stop cluster nodes
- restart - Restart cluster nodes
- k8s list - List K8s resources
- k8s logs <pod> - Get logs from a pod
Check pod status and events:
export KUBECONFIG=$(pwd)/kubeconfig
kubectl get pods -A
kubectl describe pod <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace>
Common causes:
Check sshConfig and test connectivity:
cat sshConfig
ssh -F sshConfig -v control
Common causes:
Check service health on nodes:
# SSH to the node
ssh -F sshConfig <node-name>
# Check systemd services
systemctl status <service-name>
journalctl -u <service-name> -n 100
# Check ports
ss -tlnp | grep <port>
Check observability services (VictoriaMetrics, VictoriaLogs, Grafana, etc.):
export KUBECONFIG=$(pwd)/kubeconfig
kubectl get pods -n monitoring
kubectl get pods -n grafana
kubectl logs <pod-name> -n <namespace>
If the issue may be related to recent config changes:
# Check what changed in the current branch
git diff main -- src/main/resources/
git diff main -- src/main/kotlin/com/rustyrazorblade/easydblab/configuration/
# Check K8s manifests
ls -la src/main/resources/com/rustyrazorblade/easydblab/commands/k8s/
Use SSH to connect to nodes and investigate:
# SSH to control node
ssh -F sshConfig control
# SSH to database node
ssh -F sshConfig db-0
# Common investigation commands on nodes:
docker ps # Check running containers
docker logs <container> # Check container logs
systemctl list-units --failed # Check failed services
journalctl -n 100 # Check system logs
df -h # Check disk space
free -h # Check memory
top # Check CPU/processes
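Disk space is easier to scan when only the nearly full filesystems are shown. The sketch below filters the POSIX-format output of `df -P`, whose fifth column is the usage percentage and sixth the mount point; `full_disks` is a hypothetical helper name, and mount points containing spaces are not handled.

```shell
# Read `df -P` output on stdin and print "<mount point> <usage>" for every
# filesystem at or above the given usage threshold (default 90 percent).
full_disks() {
    awk -v limit="${1:-90}" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 >= limit + 0) print $6 " " $5
    }'
}

# Usage from the environment directory:
#   ssh -F sshConfig db-0 df -P | full_disks 85
```

Running the filter locally on the remote `df -P` output keeps the node itself untouched, which is convenient when the disk is already full.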
Use kubeconfig to investigate K8s resources:
export KUBECONFIG=$(pwd)/kubeconfig
# Check all resources
kubectl get all -A
# Check specific resources
kubectl get deployments -A
kubectl get statefulsets -A
kubectl get daemonsets -A
kubectl get services -A
kubectl get configmaps -A
kubectl get secrets -A
# Check resource details
kubectl describe <resource-type> <resource-name> -n <namespace>
# Check logs
kubectl logs <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace> --previous # Previous container logs
# Check events
kubectl get events -n <namespace> --sort-by='.lastTimestamp'
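The `--previous` flag is only useful for pods whose containers have restarted, so it can help to list those first. This sketch assumes the default `kubectl get pods -A --no-headers` columns, where the fifth field is the restart count; `restarted_pods` is a hypothetical helper name.

```shell
# Read `kubectl get pods -A --no-headers` output on stdin and print
# "<namespace> <pod> <restarts>" for every pod with a restart count above zero.
restarted_pods() {
    awk '$5 + 0 > 0 { print $1, $2, $5 }'
}

# Usage: fetch the last lines of the previous container's log for each
# restarted pod (requires KUBECONFIG to be exported):
#   kubectl get pods -A --no-headers | restarted_pods |
#   while read -r ns pod restarts; do
#       kubectl logs "$pod" -n "$ns" --previous --tail=50
#   done
```

Recent kubectl versions append text such as "(2m ago)" after the restart count; the filter reads only the numeric field, so that suffix does not affect it.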
Based on the information gathered:
Identify the problem area: Cluster startup? Service failure? Network issue? Configuration?
Gather evidence:
Form hypothesis:
Test hypothesis:
Propose solution:
Verify fix:
- docs/ directory for user documentation
- src/main/kotlin/com/rustyrazorblade/easydblab/configuration/ for K8s manifest builders
- src/main/resources/com/rustyrazorblade/easydblab/ for templates and configs
Provide a structured debugging report: