skills/rancher-troubleshooter/SKILL.md
Diagnose and troubleshoot Rancher Desktop on WSL2, focusing on Kubernetes/K3s issues including slow API operations, etcd health problems, cluster component failures, and pod networking issues. Use when encountering Rancher Desktop errors, timeouts, or performance degradation.
This skill provides systematic diagnostic workflows and solutions for troubleshooting Rancher Desktop on WSL2. It focuses on common Kubernetes cluster issues including control plane failures, etcd health problems, slow API operations, and resource constraints.
Use this skill when:
kubectl commands take longer than expected or fail
Follow this systematic approach to troubleshoot Rancher Desktop issues:
Start by gathering comprehensive diagnostic information to understand the current state:
Run the diagnostic script:
bash /path/to/scripts/diagnose-rancher.sh
This script collects:
Manual quick check (if script unavailable):
# Component health (most important; componentstatuses has been deprecated since Kubernetes v1.19, so expect a warning)
kubectl get componentstatuses
# Node and resource status
kubectl get nodes -o wide
kubectl top nodes
# Unhealthy pods
kubectl get pods -A --field-selector=status.phase!=Running,status.phase!=Succeeded
Analyze the diagnostic output to identify the primary issue category:
Indicators:
kubectl get componentstatuses shows etcd-0 as Unhealthy
context deadline exceeded
Action: Proceed to "Resolving ETCD Issues" section below.
Indicators:
Pods in ImagePullBackOff or ErrImagePull state
failed to pull and unpack image
pull access denied, repository does not exist
Action: Proceed to "Resolving Image Issues" section below.
Indicators:
Action: Proceed to "Resolving Performance Issues" section below.
Indicators:
wsl.exe -d rancher-desktop shows distribution stopped
Action: Proceed to "Resolving Startup Issues" section below.
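The indicator strings listed above can be matched mechanically against saved diagnostic output. The following is a minimal sketch, not part of the skill's own scripts: the function name classify_issue and the category labels it prints (etcd, image, startup, unknown) are assumptions made for illustration. Performance problems have no distinctive error string, so they fall into unknown here.

```shell
# classify_issue: read diagnostic text on stdin and print one category label.
# The labels are hypothetical names chosen for this sketch.
classify_issue() {
  input=$(cat)
  if printf '%s\n' "$input" | grep -E 'etcd-0.*Unhealthy|context deadline exceeded' >/dev/null; then
    echo etcd
  elif printf '%s\n' "$input" | grep -E 'ImagePullBackOff|ErrImagePull|failed to pull and unpack image|pull access denied' >/dev/null; then
    echo image
  elif printf '%s\n' "$input" | grep -E 'rancher-desktop[[:space:]]+Stopped' >/dev/null; then
    echo startup
  else
    # No known indicator matched (includes "healthy but slow" cases).
    echo unknown
  fi
}
```

One possible use is classify_issue < rancher-diagnostics.txt, where the file is the report produced by the diagnostic script.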
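ETCD health issues are the most common cause of Rancher Desktop problems. K3s runs its datastore inside the k3s server process (not as a separate pod). On a default single-node install this is typically SQLite behind the etcd-compatible kine shim, which component status still reports as etcd-0.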
Solution 1: Restart Rancher Desktop (fixes 80% of cases)
# From Windows: Right-click Rancher Desktop tray icon → Quit
# Wait 10-15 seconds
# Start Rancher Desktop again
# Wait 2-3 minutes for full initialization
Verification:
kubectl get componentstatuses
# All components should show "Healthy"
# Test API operation speed
time kubectl create service clusterip test --tcp=80:80 -n default --dry-run=client
# Should complete in < 2 seconds
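The 2-second expectation above can be enforced rather than read off the output of time. This is a rough sketch that assumes GNU coreutils timeout is available inside the WSL2 distribution. The helper name check_fast and its OK/SLOW/FAILED labels are made up for this example.

```shell
# check_fast LIMIT CMD...: run CMD with a time limit in seconds and report the outcome.
# GNU timeout exits with status 124 when the limit is reached.
check_fast() {
  limit=$1
  shift
  rc=0
  timeout "$limit" "$@" >/dev/null 2>&1 || rc=$?
  case $rc in
    0)   echo OK ;;      # finished in time and succeeded
    124) echo SLOW ;;    # killed by timeout: slower than LIMIT
    *)   echo FAILED ;;  # finished in time but returned an error
  esac
}
```

For example, check_fast 2 kubectl get nodes prints SLOW when the call exceeds two seconds, which separates a slow API from one that is failing outright.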
Solution 2: Reset Kubernetes (if restart doesn't work)
kubectl get componentstatuses
Solution 3: Check WSL2 Resources (if issue persists)
Insufficient resources can cause etcd slowness:
# Check current memory usage
free -h
# Check if .wslconfig exists and review limits
cat /mnt/c/Users/<username>/.wslconfig
If memory is constrained, increase WSL2 resources:
C:\Users\<username>\.wslconfig (create if missing)
[wsl2]
memory=8GB
processors=4
swap=2GB
wsl.exe --shutdown (from PowerShell)
For detailed solutions: Load references/common-issues.md section "ETCD Unhealthy"
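To confirm which limits are in effect without opening the file, the [wsl2] section of .wslconfig can be read from inside WSL. The sketch below uses a hypothetical helper, wslconfig_get, that assumes simple key=value lines and strips the carriage returns Windows editors usually add.

```shell
# wslconfig_get KEY: print the value of KEY from the [wsl2] section.
# Reads .wslconfig text on stdin; prints nothing if the key is absent.
wslconfig_get() {
  tr -d '\r' | awk -F'[ \t]*=[ \t]*' -v key="$1" '
    { sub(/^[ \t]+/, ""); sub(/[ \t]+$/, "") }        # trim the line
    /^\[/ { in_wsl2 = ($0 == "[wsl2]"); next }        # track the current section
    in_wsl2 && $1 == key && !found { print $2; found = 1 }
  '
}
```

For example, wslconfig_get memory < /mnt/c/Users/<username>/.wslconfig would print 8GB for the configuration shown above.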
A local image stuck in ImagePullBackOff typically means the image wasn't built or isn't accessible to Kubernetes.
Diagnosis:
# Get detailed pod information
kubectl describe pod <pod-name> -n <namespace>
# Look for the image name and error message
# Example: Failed to pull image "dev-main:latest"
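When several pods are failing, the image names can be pulled out of the describe output instead of read by eye. This is a small sketch built around the error wording in the example above. The helper name failed_images is an assumption, and the pattern only matches lines containing that exact Failed to pull image "..." phrasing.

```shell
# failed_images: read `kubectl describe pod` text on stdin and
# print each image named in a 'Failed to pull image "..."' message once.
failed_images() {
  sed -n 's/.*Failed to pull image "\([^"]*\)".*/\1/p' | sort -u
}
```

A possible invocation is kubectl describe pod <pod-name> -n <namespace> | failed_images, whose output can then be compared against the locally built images.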
Solution 1: Build with DevSpace (if project uses DevSpace)
# DevSpace handles image building and registry setup
devspace build
# Or full deployment
devspace dev
Solution 2: Build with nerdctl (Rancher Desktop's CLI)
# Check if image exists
nerdctl images | grep <image-name>
# Build if missing
nerdctl build -t <image-name>:<tag> .
# Verify
nerdctl images | grep <image-name>
Solution 3: Set imagePullPolicy (for testing)
# In pod/deployment spec
spec:
  containers:
  - name: container
    image: imagename:tag
    imagePullPolicy: Never # Forces use of local images only
For detailed solutions: Load references/common-issues.md section "ImagePullBackOff for Local Images"
If all components are healthy but operations are slow:
Check resource utilization:
kubectl top nodes
free -h
df -h
If high resource usage:
kubectl top pods -A --sort-by=memory
If disk I/O is slow:
Test API responsiveness:
time kubectl get nodes
time kubectl create deployment test --image=nginx --dry-run=client
# Both should complete in < 2 seconds
For detailed solutions: Load references/common-issues.md section "Slow Kubernetes API Operations"
If Rancher Desktop won't start or K3s service fails:
Check WSL status:
wsl.exe -l -v
# Look for: rancher-desktop Stopped
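Piping wsl.exe -l -v into grep from a Linux shell often fails to match, because wsl.exe typically emits UTF-16 text with NUL bytes between characters. The sketch below strips those bytes before reading the STATE column. The helper name distro_state is hypothetical, and the three-column layout (name, state, version) is assumed from the usual output of that command.

```shell
# distro_state NAME: read `wsl.exe -l -v` output on stdin and print NAME's state.
# NUL bytes and carriage returns are removed so UTF-16 output parses as plain text.
distro_state() {
  tr -d '\000\r' | awk -v name="$1" '
    { sub(/^[ \t]*\*/, "") }      # drop the "*" marking the default distribution
    $1 == name { print $2 }       # column 2 is the STATE field
  '
}
```

For example, wsl.exe -l -v | distro_state rancher-desktop should print Stopped in the situation described above.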
Solution 1: Restart WSL
# Run from PowerShell
wsl.exe --shutdown
# Wait 10 seconds
# Start Rancher Desktop
Solution 2: Check port conflicts
# Check if port 6443 is in use
netstat -ano | findstr ":6443"
# If in use by another process, stop that process or change K3s port
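Solution 3: Verify Hyper-V
# Run in elevated PowerShell
Get-WindowsOptionalFeature -Online -FeatureName Microsoft-Hyper-V
# Should show: State: Enabled
# WSL2 itself depends on the Virtual Machine Platform feature, so check it as well
Get-WindowsOptionalFeature -Online -FeatureName VirtualMachinePlatform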
For detailed solutions: Load references/common-issues.md section "Rancher Desktop Service Not Starting"
Location: scripts/diagnose-rancher.sh
Run comprehensive diagnostics:
bash scripts/diagnose-rancher.sh > rancher-diagnostics.txt
The script automates data collection for all major health indicators and creates a report suitable for sharing or analysis.
Location: references/common-issues.md
Load this reference when encountering issues not covered in the main workflow or when detailed solution steps are needed:
# Example: For deep dive into ETCD issues
# Read: references/common-issues.md section "ETCD Unhealthy"
The reference includes:
kubectl get componentstatuses # Control plane health
kubectl get nodes -o wide # Node status
kubectl top nodes # Resource usage
kubectl get events -A --sort-by='.lastTimestamp' | tail -20
kubectl describe pod <pod-name> -n <namespace>
kubectl logs -n kube-system <pod-name>
wsl.exe -l -v # WSL distributions
wsl.exe -d rancher-desktop rc-status # Service status
wsl.exe -d rancher-desktop ps aux | grep k3s # Process check
time kubectl create service clusterip test --tcp=80:80 --dry-run=client
time kubectl get nodes
kubectl get componentstatuses reveals most issues
kubectl get events shows what happened recently
kubectl top nodes and free -h
wsl.exe -d rancher-desktop prefix
Consider escalating beyond this skill when:
For GitHub issues or community support, include output from scripts/diagnose-rancher.sh.