skills/kubeblocks-troubleshoot/SKILL.md
Diagnostic guide for KubeBlocks-managed database clusters. Use when the user asks to troubleshoot, debug, or diagnose a database cluster, or reports that it is not working, shows an error, has failed, is stuck, is in CrashLoopBackOff, raises a cluster exception, or has a similar problem. This skill guides the agent through diagnostic steps; it does NOT perform actions.
This skill helps diagnose and fix common issues with KubeBlocks-managed database clusters. Follow the diagnostic flowchart and sections below to systematically identify root causes. This skill guides the agent through diagnostic steps; it does not perform actions.
User reports cluster problem
│
├─ Is the KubeBlocks operator healthy?
│   └─ kubectl -n kb-system get pods
│       ├─ Pods not Running → Operator Issues (Section 5)
│       └─ Pods Running → continue
│
├─ Is the cluster in an abnormal state?
│   └─ kubectl get cluster <name> -n <ns>
│       ├─ Phase: Creating/Updating/Abnormal/Deleting → Cluster Status Issues (Section 1)
│       └─ Phase: Running → continue
│
├─ Is there a failed OpsRequest?
│   └─ kubectl get opsrequest -n <ns>
│       ├─ Failed/Running (stuck) → OpsRequest Failures (Section 3)
│       └─ Succeeded → continue
│
└─ Are pods crashing?
    └─ kubectl get pods -n <ns> -l app.kubernetes.io/instance=<cluster>
        ├─ CrashLoopBackOff/ImagePullBackOff/Pending → Pod Issues (Section 2)
        └─ All Running → check pod logs and events for application errors
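The triage order in the flowchart can be sketched as a single shell function that stops at the first check that fails. This is a minimal illustration, not part of KubeBlocks: the function name `kb_triage` is hypothetical, it assumes the default `kubectl get pods` column layout (STATUS in the third column), it treats pods of finished Jobs (`Completed`) in `kb-system` as healthy, and it cannot tell a long-running OpsRequest from a stuck one, so it reports every `Running` OpsRequest for manual review. It only reads state and changes nothing.

```shell
# Read-only triage sketch. Usage: kb_triage <cluster> <namespace>
kb_triage() {
  local cluster="$1" ns="$2" bad phase ops

  # 1. Operator health (assumption: Completed Job pods are fine).
  bad=$(kubectl -n kb-system get pods --no-headers |
    awk '$3 != "Running" && $3 != "Completed" {print $1 " (" $3 ")"}')
  if [ -n "$bad" ]; then
    printf '%s\n' "Operator pods not Running -> Operator Issues (Section 5)" "$bad"
    return 0
  fi

  # 2. Cluster phase.
  phase=$(kubectl get cluster "$cluster" -n "$ns" -o jsonpath='{.status.phase}')
  if [ "$phase" != "Running" ]; then
    printf '%s\n' "Cluster phase is '${phase:-unknown}' -> Cluster Status Issues (Section 1)"
    return 0
  fi

  # 3. OpsRequests that failed or are still running (possibly stuck).
  ops=$(kubectl get opsrequest -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.phase}{"\n"}{end}' |
    awk '$2 == "Failed" || $2 == "Running" {print $1 " (" $2 ")"}')
  if [ -n "$ops" ]; then
    printf '%s\n' "OpsRequest needs attention -> OpsRequest Failures (Section 3)" "$ops"
    return 0
  fi

  # 4. Pods of this cluster that are not Running.
  bad=$(kubectl get pods -n "$ns" -l "app.kubernetes.io/instance=$cluster" --no-headers |
    awk '$3 != "Running" {print $1 " (" $3 ")"}')
  if [ -n "$bad" ]; then
    printf '%s\n' "Pods not Running -> Pod Issues (Section 2)" "$bad"
    return 0
  fi

  printf '%s\n' "All checks passed -> check pod logs and events for application errors"
}
```

Calling `kb_triage mycluster demo` prints one routing line, followed by the names of the resources that triggered it.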
Cluster Status Issues

Creating

| Cause | How to verify | Action |
|-------|---------------|--------|
| Addon not installed | kubectl get addon | Install the required addon (e.g. kbcli addon enable mysql) |
| Insufficient resources | kubectl describe pod <pending-pod> → Events | Increase node resources or reduce cluster requests |
| Image pull errors | kubectl describe pod → ImagePullBackOff | Fix registry access, image name, or imagePullSecrets |
Updating

1. Check that all pods are Running: kubectl get pods -n <ns> -l app.kubernetes.io/instance=<cluster>
2. Check the pod logs: kubectl logs <pod> -c <container>
3. Check the pod role label: kubectl get po <pod> -L kubeblocks.io/role
4. Check related OpsRequests: kubectl get opsrequest -n <ns> and kubectl describe opsrequest <name>
5. Check that the image in status.containerStatuses matches spec.containers.image; mismatches can block updates

Abnormal

Usually caused by pod failures or storage issues. Check pod status and events (see Pod Issues below), then run kubectl describe cluster <name> -n <ns> and review the conditions.
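The image comparison mentioned for the Updating phase (status.containerStatuses against spec.containers.image) can be sketched with two JSONPath queries. The function name `kb_image_mismatch` is hypothetical. One caveat to keep in mind: some container runtimes report a normalized image reference in the status (for example with a registry prefix added), so a plain string comparison like this one can flag differences that are harmless. Treat a reported mismatch as something to inspect, not as proof.

```shell
# Compare declared and running images per container.
# Usage: kb_image_mismatch <pod> <namespace>
kb_image_mismatch() {
  local pod="$1" ns="$2" want have
  # Images declared in the pod spec, one "name image" pair per line.
  want=$(kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{range .spec.containers[*]}{.name}{" "}{.image}{"\n"}{end}' | sort)
  # Images reported by the kubelet for the running containers.
  have=$(kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{" "}{.image}{"\n"}{end}' | sort)

  if [ "$want" = "$have" ]; then
    printf '%s\n' "images match"
    return 0
  fi
  printf '%s\n' "possible image mismatch" "spec:" "$want" "status:" "$have"
  return 1
}
```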
Deleting

If the KubeBlocks logs show "has no pods to running the pre-terminate action", the cluster cannot run the pre-terminate lifecycle action. To skip it:
kubectl annotate component <COMPONENT_NAME> -n <ns> apps.kubeblocks.io/skip-pre-terminate-action=true
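To find the value for `<COMPONENT_NAME>`, the components of a cluster can be listed by the same instance label used elsewhere in this guide. The sketch below only prints the annotate command for each component so that it can be reviewed before anyone runs it; the function name `kb_skip_pre_terminate_cmds` is hypothetical, and it assumes the `cmp` short name for the Component resource is available in the cluster.

```shell
# Print (do not run) the skip-pre-terminate command for each component.
# Usage: kb_skip_pre_terminate_cmds <cluster> <namespace>
kb_skip_pre_terminate_cmds() {
  local cluster="$1" ns="$2" comp
  # -o name yields one "<resource>/<name>" reference per line.
  kubectl get cmp -n "$ns" -l "app.kubernetes.io/instance=$cluster" -o name |
    while read -r comp; do
      printf '%s\n' "kubectl annotate $comp -n $ns apps.kubeblocks.io/skip-pre-terminate-action=true"
    done
}
```

Printing instead of executing keeps the step consistent with the skill's diagnostic-only scope.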
Pod Issues

CrashLoopBackOff

- kubectl logs <pod> -c <container> --previous
- kubectl describe pod <pod> -n <ns> (check Events)

ImagePullBackOff

- kubectl describe pod <pod> → Events show pull error
- Typical causes: imagePullSecrets, wrong image name/tag, network/registry unreachable

Pending

- kubectl describe pod <pod> → Events

OpsRequest Failures

kubectl get opsrequest -n <namespace>
kubectl describe opsrequest <ops-name> -n <namespace>
Check status.conditions and Events for the failure reason.
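When the describe output is long, the conditions and events can be pulled out directly. The following is a small sketch, assuming the OpsRequest conditions carry the standard type, status, reason, and message fields; the function name `kb_ops_failure_reason` is hypothetical.

```shell
# Show conditions and events of one OpsRequest.
# Usage: kb_ops_failure_reason <ops-name> <namespace>
kb_ops_failure_reason() {
  local name="$1" ns="$2"
  printf '%s\n' "Conditions (type, status, reason, message):"
  kubectl get opsrequest "$name" -n "$ns" \
    -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.reason}{"\t"}{.message}{"\n"}{end}'
  printf '%s\n' "Events:"
  # Field selectors restrict the event list to this OpsRequest.
  kubectl get events -n "$ns" \
    --field-selector "involvedObject.kind=OpsRequest,involvedObject.name=$name"
}
```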
Only VerticalScaling and HorizontalScaling OpsRequests in Running state can be cancelled:
kubectl patch opsrequest <OPSREQUEST_NAME> -n <ns> -p '{"spec":{"cancel":true}}' --type=merge
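Before suggesting the cancel patch, it can help to confirm that the OpsRequest meets both conditions stated above. The sketch below reads `spec.type` and `status.phase` and prints the patch command only when the request is cancellable; it never applies the patch itself. The function name `kb_cancel_cmd` is hypothetical.

```shell
# Print the cancel command only if the OpsRequest can be cancelled.
# Usage: kb_cancel_cmd <ops-name> <namespace>
kb_cancel_cmd() {
  local name="$1" ns="$2" info type phase
  # Fetch type and phase in one call, separated by a space.
  info=$(kubectl get opsrequest "$name" -n "$ns" \
    -o jsonpath='{.spec.type} {.status.phase}')
  type=${info%% *}
  phase=${info##* }

  case "$type" in
    VerticalScaling|HorizontalScaling) ;;
    *) printf '%s\n' "not cancellable: type is '$type'"; return 1 ;;
  esac
  if [ "$phase" != "Running" ]; then
    printf '%s\n' "not cancellable: phase is '$phase'"
    return 1
  fi
  printf '%s\n' "kubectl patch opsrequest $name -n $ns -p '{\"spec\":{\"cancel\":true}}' --type=merge"
}
```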
Operator Issues

kubectl -n kb-system get pods
kubectl -n kb-system describe pod <kubeblocks-pod>
If installing on K8s ≤ 1.23, you may see unknown field "x-kubernetes-validations". Apply CRDs with --validate=false (see official docs).
kubectl -n kb-system logs -l app.kubernetes.io/name=kubeblocks --tail=100 -f
# or
kubectl -n kb-system logs deployments/kubeblocks -f
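Operator logs are often noisy, so a filter can narrow them to lines relevant to one cluster. This sketch assumes that error lines contain the word "error" or "failed" and mention the cluster name, which may not hold for every log format; the function name `kb_operator_errors` and the default of 500 lines are illustrative choices.

```shell
# Show recent operator log lines that look like errors for one cluster.
# Usage: kb_operator_errors <cluster> [tail-lines]
kb_operator_errors() {
  local cluster="$1" tail="${2:-500}"
  kubectl -n kb-system logs deployments/kubeblocks --tail="$tail" |
    grep -iE 'error|failed' |
    grep -F -- "$cluster"
}
```

An empty result means no matching lines were found in the sampled window, not that the operator is healthy.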
If kubectl get componentdefinition shows Unavailable (e.g. "immutable fields can't be updated"):
kubectl annotate componentdefinition <NAME> apps.kubeblocks.io/skip-immutable-check=true
| Purpose | Command |
|---------|---------|
| Operator health | kubectl -n kb-system get pods |
| Cluster status | kubectl get cluster -A |
| Cluster details | kubectl describe cluster <name> -n <ns> |
| OpsRequest status | kubectl get opsrequest -n <ns> |
| Pod status | kubectl get pods -n <ns> -l app.kubernetes.io/instance=<cluster> |
| Pod logs | kubectl logs <pod> -c <container> -n <ns> |
| Pod events | kubectl describe pod <pod> -n <ns> |
| Operator logs | kubectl -n kb-system logs -l app.kubernetes.io/name=kubeblocks --tail=100 |
| Cluster resources | kubectl get cmp,its,po -l app.kubernetes.io/instance=<cluster> -n <ns> |
| Generate report | kbcli report cluster <name> --with-logs --mask |
| Resource | URL |
|----------|-----|
| FAQs (cluster exception) | https://kubeblocks.io/docs/preview/user_docs/troubleshooting/handle-a-cluster-exception |
| Known Issues | https://kubeblocks.io/docs/preview/user_docs/troubleshooting/known-issues |
| Full doc index | https://kubeblocks.io/llms-full.txt |
symbolCharacters in passwordConfig to avoid YAML parsing errors.