apecloud/kubeblocks-troubleshoot

Name: kubeblocks-troubleshoot
Author: apecloud

skills/kubeblocks-troubleshoot/SKILL.md

npx skillsauth add apecloud/kubeblocks-skills kubeblocks-troubleshoot

Troubleshoot KubeBlocks Clusters

Overview

This skill helps diagnose and fix common issues with KubeBlocks-managed database clusters. Follow the diagnostic flowchart and sections below to systematically identify root causes. This skill guides the agent through diagnostic steps; it does not perform actions.

Quick Diagnostic Flowchart

User reports cluster problem
│
├─ Is the KubeBlocks operator healthy?
│  └─ kubectl -n kb-system get pods
│     ├─ Pods not Running → Operator Issues (Section 4)
│     └─ Pods Running → continue
│
├─ Is the cluster in an abnormal state?
│  └─ kubectl get cluster <name> -n <ns>
│     ├─ Phase: Creating/Updating/Abnormal/Deleting → Cluster Status Issues (Section 1)
│     └─ Phase: Running → continue
│
├─ Is there a failed OpsRequest?
│  └─ kubectl get opsrequest -n <ns>
│     ├─ Failed/Running (stuck) → OpsRequest Failures (Section 3)
│     └─ Succeeded → continue
│
└─ Are pods crashing?
   └─ kubectl get pods -n <ns> -l app.kubernetes.io/instance=<cluster>
      ├─ CrashLoopBackOff/ImagePullBackOff/Pending → Pod Issues (Section 2)
      └─ All Running → check pod logs and events for application errors
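
The cluster-phase and pod-status branches of the flowchart can be sketched as two small shell helpers. This is only an illustration, not part of the skill: the function names `route_cluster_phase` and `route_pod_status` are hypothetical, and the mapping simply mirrors the tree above.

```shell
# Hypothetical helpers mirroring the flowchart; they only print where to look next.

# Map a cluster phase (as shown by `kubectl get cluster <name> -n <ns>`) to a section.
route_cluster_phase() {
  case "$1" in
    Creating|Updating|Abnormal|Deleting) echo "Section 1: Cluster Status Issues" ;;
    Running) echo "continue" ;;
    *) echo "unknown phase: $1" ;;
  esac
}

# Map a pod status (the STATUS column of `kubectl get pods`) to a section.
route_pod_status() {
  case "$1" in
    CrashLoopBackOff|ImagePullBackOff|Pending) echo "Section 2: Pod Issues" ;;
    Running) echo "check pod logs and events" ;;
    *) echo "unknown status: $1" ;;
  esac
}
```

A possible use is `route_cluster_phase "$(kubectl get cluster <name> -n <ns> -o jsonpath='{.status.phase}')"`, assuming the Cluster resource exposes its phase at `.status.phase`.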

1. Cluster Status Issues

Cluster stuck in `Creating`

| Cause | How to verify | Action |
|-------|---------------|--------|
| Addon not installed | kubectl get addon | Install the required addon (e.g. kbcli addon enable mysql) |
| Insufficient resources | kubectl describe pod <pending-pod> → Events | Increase node resources or reduce cluster requests |
| Image pull errors | kubectl describe pod → ImagePullBackOff | Fix registry access, image name, or imagePullSecrets |

Cluster stuck in `Updating`

Check if all pods are Running: kubectl get pods -n <ns> -l app.kubernetes.io/instance=<cluster>
Check pod logs for errors: kubectl logs <pod> -c <container> -n <ns>
Verify pod roles: kubectl get po <pod> -n <ns> -L kubeblocks.io/role
Check for failed OpsRequest: kubectl get opsrequest -n <ns> and kubectl describe opsrequest <name> -n <ns>
Ensure the container image reported in the pod's status.containerStatuses matches spec.containers[].image; a mismatch can block the update
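
The image comparison in the last check can be approximated with kubectl's jsonpath output. The sketch below is a rough aid under stated assumptions: the helper names `spec_images`, `status_images`, and `image_mismatches` are hypothetical, and the comparison is purely textual. The container runtime may report a normalized image name in status (for example with a registry prefix), so a reported difference is a hint to investigate rather than proof of a blocked update.

```shell
# Print "<container><TAB><image>" for each container declared in the pod spec.
# Arguments: <pod> <ns>
spec_images() {
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.image}{"\n"}{end}'
}

# Print "<container><TAB><image>" for each container as reported in the pod status.
# Arguments: <pod> <ns>
status_images() {
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.image}{"\n"}{end}'
}

# Compare two such lists (spec first, status second) and print each container
# whose image string differs. Prints nothing when every image matches.
image_mismatches() {
  printf '%s\n---\n%s\n' "$1" "$2" | awk -F'\t' '
    $0 == "---" { in_status = 1; next }
    !in_status  { spec[$1] = $2; next }
    ($1 in spec) && spec[$1] != $2 {
      printf "%s: spec=%s status=%s\n", $1, spec[$1], $2
    }
  '
}
```

Typical use would be `image_mismatches "$(spec_images <pod> <ns>)" "$(status_images <pod> <ns>)"`.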

Cluster in `Abnormal`

Usually caused by pod failures or storage issues. Check pod status and events (see Pod Issues below), then kubectl describe cluster <name> -n <ns> for conditions.

Cluster stuck in `Deleting`

If the KubeBlocks operator logs show the message "has no pods to running the pre-terminate action", the cluster cannot run the pre-terminate lifecycle action and deletion stalls. To skip the action, annotate the component:

kubectl annotate component <COMPONENT_NAME> -n <ns> apps.kubeblocks.io/skip-pre-terminate-action=true

2. Pod Issues

CrashLoopBackOff

Check logs: kubectl logs <pod> -c <container> -n <ns> --previous
Describe pod: kubectl describe pod <pod> -n <ns> (check Events)
Common causes: Wrong config, bad credentials, storage mount failure, OOM, database init failure

ImagePullBackOff

Describe pod: kubectl describe pod <pod> -n <ns> → Events show the pull error
Common causes: Private registry without imagePullSecrets, wrong image name/tag, network/registry unreachable

Pending

Describe pod: kubectl describe pod <pod> -n <ns> → Events
Common causes: Insufficient CPU/memory, no nodes matching nodeSelector/affinity, PVC pending (StorageClass or capacity)

3. OpsRequest Failures

How to check failed OpsRequest

kubectl get opsrequest -n <namespace>
kubectl describe opsrequest <ops-name> -n <namespace>

Check status.conditions and Events for the failure reason.

Common OpsRequest failure reasons

Resource constraints (scaling beyond available capacity)
Preconditions not met (e.g. cluster not Running)
Timeout or step failure during the operation

How to cancel a stuck OpsRequest

Only VerticalScaling and HorizontalScaling OpsRequests in Running state can be cancelled:

kubectl patch opsrequest <OPSREQUEST_NAME> -n <ns> -p '{"spec":{"cancel":true}}' --type=merge
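
The restriction above can be checked before patching. The following is a minimal sketch, not the skill's own tooling: the helper names `can_cancel_ops` and `cancel_ops_if_allowed` are hypothetical, and it assumes the OpsRequest exposes its type at `.spec.type` and its phase at `.status.phase`.

```shell
# Hypothetical guard: succeed only for OpsRequests that this guide says can be cancelled.
# Arguments: <type> <phase>
can_cancel_ops() {
  case "$1" in
    VerticalScaling|HorizontalScaling) [ "$2" = "Running" ] ;;
    *) return 1 ;;
  esac
}

# Cancel an OpsRequest only if the guard passes.
# Arguments: <OPSREQUEST_NAME> <ns>
cancel_ops_if_allowed() {
  # Assumed field paths; verify them against your KubeBlocks version.
  ops_type=$(kubectl get opsrequest "$1" -n "$2" -o jsonpath='{.spec.type}')
  ops_phase=$(kubectl get opsrequest "$1" -n "$2" -o jsonpath='{.status.phase}')
  if can_cancel_ops "$ops_type" "$ops_phase"; then
    kubectl patch opsrequest "$1" -n "$2" -p '{"spec":{"cancel":true}}' --type=merge
  else
    echo "not cancellable: type=$ops_type phase=$ops_phase" >&2
    return 1
  fi
}
```

Keeping the guard separate from the kubectl calls makes the rule easy to check on its own, without touching a cluster.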

4. Operator Issues

KubeBlocks operator pod not running

kubectl -n kb-system get pods
kubectl -n kb-system describe pod <kubeblocks-pod>

CRD version mismatch

If installing on K8s ≤ 1.23, you may see the error unknown field "x-kubernetes-validations". In that case, apply the CRDs with --validate=false (see official docs).

Check operator logs

kubectl -n kb-system logs -l app.kubernetes.io/name=kubeblocks --tail=100 -f
# or
kubectl -n kb-system logs deployments/kubeblocks -f

ComponentDefinition status Unavailable

If kubectl get componentdefinition shows Unavailable (e.g. "immutable fields can't be updated"):

kubectl annotate componentdefinition <NAME> apps.kubeblocks.io/skip-immutable-check=true

5. Useful Commands Reference

| Purpose | Command |
|---------|---------|
| Operator health | kubectl -n kb-system get pods |
| Cluster status | kubectl get cluster -A |
| Cluster details | kubectl describe cluster <name> -n <ns> |
| OpsRequest status | kubectl get opsrequest -n <ns> |
| Pod status | kubectl get pods -n <ns> -l app.kubernetes.io/instance=<cluster> |
| Pod logs | kubectl logs <pod> -c <container> -n <ns> |
| Pod events | kubectl describe pod <pod> -n <ns> |
| Operator logs | kubectl -n kb-system logs -l app.kubernetes.io/name=kubeblocks --tail=100 |
| Cluster resources | kubectl get cmp,its,po -l app.kubernetes.io/instance=<cluster> -n <ns> |
| Generate report | kbcli report cluster <name> --with-logs --mask |

6. Official Troubleshooting Docs

| Resource | URL |
|----------|-----|
| FAQs (cluster exception) | https://kubeblocks.io/docs/preview/user_docs/troubleshooting/handle-a-cluster-exception |
| Known Issues | https://kubeblocks.io/docs/preview/user_docs/troubleshooting/known-issues |
| Full doc index | https://kubeblocks.io/llms-full.txt |

Known issues to check

PostgreSQL password with special characters (v0.9.4 and earlier, and v1.0.0): Upgrade to v1.0.1-beta.6+ or set symbolCharacters in passwordConfig to avoid YAML parsing errors.
Excessive secrets (KubeBlocks v1.0.0 on K8s ≤ 1.24): Upgrade to v1.0.1-beta.3+.

apecloud/kubeblocks-troubleshoot

skills/kubeblocks-troubleshoot/SKILL.md

Diagnostic guide for KubeBlocks-managed database clusters. Use when the user reports troubleshoot, debug, diagnose, not working, error, failed, stuck, CrashLoopBackOff, cluster exception, or similar problems with their database cluster. This skill guides the agent through diagnostic steps — it does NOT perform actions.

2 stars

development

Updated Apr 18, 2026

npx skillsauth add apecloud/kubeblocks-skills kubeblocks-troubleshoot

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Percentage |
|---------|-------------|--------|------------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Apr 18, 2026, 8:00 AM (11.2s, 1 file scanned)

SKILL.md

name: kubeblocks-troubleshoot
version: 0.1.0
description: Diagnostic guide for KubeBlocks-managed database clusters. Use when the user reports troubleshoot, debug, diagnose, not working, error, failed, stuck, CrashLoopBackOff, cluster exception, or similar problems with their database cluster. This skill guides the agent through diagnostic steps; it does NOT perform actions.

Related Skills

apecloud/kubeblocks-volume-expansion

devops

Verified · Trusted · Community

Expand persistent volume storage for KubeBlocks database clusters via OpsRequest. Requires the StorageClass to support volume expansion (allowVolumeExpansion=true). Use when the user needs more disk space, wants to increase storage, expand volumes, or resize PVCs. NOT for changing CPU/memory (see vertical-scaling) or adding more replicas (see horizontal-scaling). Note that volume shrinking is not supported by Kubernetes.

2 · SKILL.md · Updated Apr 18, 2026

apecloud/kubeblocks-vertical-scaling

data-ai

Verified · Trusted · Community

Scale CPU and memory resources for KubeBlocks database clusters via OpsRequest (vertical scaling). Supports in-place updates when the feature gate is enabled. Use when the user wants to change, increase, decrease, resize, or adjust CPU or memory resources of a database cluster. NOT for adding/removing replicas or shards (see horizontal-scaling) or expanding disk storage (see volume-expansion).

2 · SKILL.md · Updated Apr 18, 2026

apecloud/kubeblocks-upgrade

data-ai

Verified · Trusted · Community

Upgrade the KubeBlocks operator itself via Helm. Covers update operator, upgrade to v1.0, update kubeblocks version, and CRD updates. Use when the user wants to upgrade KubeBlocks, update the operator, or upgrade to a new KubeBlocks release. NOT for upgrading database engine versions (see minor-version-upgrade).

2 · SKILL.md · Updated Apr 18, 2026

apecloud/kubeblocks-switchover

data-ai

Verified · Trusted · Community

Perform planned primary-secondary switchover for KubeBlocks database clusters via OpsRequest. Promotes a replica to primary with minimal downtime. Use when the user wants to promote a replica, switch primary, change leader, perform a planned failover, or do maintenance on the current primary node. NOT for unplanned failover recovery (handled automatically by HA middleware like Patroni, Orchestrator, or Sentinel) or restarting all pods (see kubeblocks-cluster-lifecycle).

2 · SKILL.md · Updated Apr 18, 2026

Install manually

# Clone the repo
git clone https://github.com/apecloud/kubeblocks-skills.git

# Copy into Claude Code skills folder (global)
cp -r kubeblocks-skills/skills/kubeblocks-troubleshoot ~/.claude/skills/

Repository: apecloud/kubeblocks-skills (2 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT
