skills/kubeblocks-rebuild-replica/SKILL.md
Rebuild a failed replica in MySQL or PostgreSQL clusters managed by KubeBlocks. Use when a replica's data is corrupted, the pod is in CrashLoopBackOff, replication is broken, or you need to recover or repair a secondary instance. NOT for planned switchover (see switchover) or full cluster restore (see restore).
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
npx skillsauth add apecloud/kubeblocks-skills kubeblocks-rebuild-replica
Rebuild replica recovers a failed secondary instance by recreating its data from the primary or from a backup. Use this when a replica's data is corrupted, its pod is in CrashLoopBackOff, or replication is broken.
Supported engines: MySQL (ApeCloud MySQL) and PostgreSQL only — engines with primary-secondary replication.
Official docs: MySQL | PostgreSQL
- [ ] Step 1: Identify the failed replica
- [ ] Step 2: Choose rebuild source (from primary vs from backup)
- [ ] Step 3: Apply RebuildInstance OpsRequest (dry-run then apply)
- [ ] Step 4: Monitor and verify
Check pod status and roles:
kubectl get pods -n <ns> -l app.kubernetes.io/instance=<cluster> \
-o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\t"}{.metadata.labels.kubeblocks\.io/role}{"\n"}{end}'
Identify the pod that is in CrashLoopBackOff or Error, or that has the secondary role but is unhealthy. Note the component name (e.g. mysql, postgresql) from the Cluster spec.
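The identification step above can be partly automated. The following is a minimal sketch, not part of the original skill: `find_unhealthy_pods` is a hypothetical helper that reads the tab-separated name/phase/role output of the command above on stdin and prints pods whose phase is not Running or whose role label is empty. It is only a heuristic: CrashLoopBackOff is a container state shown in the STATUS column of a plain `kubectl get pods`, and such a pod can still report the phase Running, so an empty role label is often the more useful signal.

```shell
# Hypothetical helper (an assumption for illustration, not a KubeBlocks tool).
# Input: lines of "<pod>\t<phase>\t<role>" as printed by the jsonpath command above.
# Output: names of pods that look like rebuild candidates.
find_unhealthy_pods() {
  # Flag a pod when its phase is not Running or its role label is empty.
  awk -F'\t' '$2 != "Running" || $3 == "" { print $1 }'
}

# Usage (requires cluster access; <ns> and <cluster> are placeholders):
#   kubectl get pods -n <ns> -l app.kubernetes.io/instance=<cluster> \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\t"}{.metadata.labels.kubeblocks\.io/role}{"\n"}{end}' \
#     | find_unhealthy_pods
```

A pod flagged here still needs a manual look (events, logs) before it is rebuilt.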
| Source | When to use |
|--------|-------------|
| From primary | Primary is healthy; fastest option. Omit backupName. |
| From backup | Primary unavailable or you need a specific point-in-time. Set backupName. |
List backups (if rebuilding from backup):
kubectl get backup -n <ns> -l app.kubernetes.io/instance=<cluster>
# Rebuild from primary (backupName omitted)
apiVersion: operations.kubeblocks.io/v1alpha1
kind: OpsRequest
metadata:
  name: rebuild-<cluster>-<pod>
  namespace: <ns>
spec:
  clusterName: <cluster>
  type: RebuildInstance
  rebuildFrom:
  - componentName: <component>
    instances:
    - name: <failed-pod-name>

# Rebuild from backup (backupName set)
apiVersion: operations.kubeblocks.io/v1alpha1
kind: OpsRequest
metadata:
  name: rebuild-<cluster>-<pod>
  namespace: <ns>
spec:
  clusterName: <cluster>
  type: RebuildInstance
  rebuildFrom:
  - componentName: <component>
    backupName: <backup-name>
    instances:
    - name: <failed-pod-name>
Optional: inPlace: true keeps the same pod name and recreates the PVC; omit it or set it to false for a non-in-place rebuild (a new pod is created, then the old one is removed). Add force: true if preconditions block the operation.
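To avoid hand-editing placeholders, the two manifests above can be generated from one template. This is a sketch under assumptions: `render_rebuild_ops` is a hypothetical helper name, the arguments are cluster, namespace, component, failed pod, and an optional backup name, and the optional inPlace and force fields are left out.

```shell
# Hypothetical helper (illustration only): prints a RebuildInstance OpsRequest.
# Args: <cluster> <ns> <component> <failed-pod-name> [backup-name]
# The body runs in a subshell so its variables do not leak into the caller.
render_rebuild_ops() (
  cluster="$1"; ns="$2"; component="$3"; pod="$4"; backup="${5:-}"
  cat <<EOF
apiVersion: operations.kubeblocks.io/v1alpha1
kind: OpsRequest
metadata:
  name: rebuild-${cluster}-${pod}
  namespace: ${ns}
spec:
  clusterName: ${cluster}
  type: RebuildInstance
  rebuildFrom:
  - componentName: ${component}
EOF
  # backupName is emitted only when rebuilding from a backup.
  if [ -n "$backup" ]; then
    printf '    backupName: %s\n' "$backup"
  fi
  cat <<EOF
    instances:
    - name: ${pod}
EOF
)

# Usage (writes the file used by the dry-run and apply steps):
#   render_rebuild_ops <cluster> <ns> <component> <failed-pod-name> > rebuild-ops.yaml
```

Passing a fifth argument switches the output from the rebuild-from-primary form to the rebuild-from-backup form; everything else stays identical.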
Dry-run first:
kubectl apply -f rebuild-ops.yaml --dry-run=server
If dry-run succeeds, apply:
kubectl apply -f rebuild-ops.yaml
kubectl get ops rebuild-<cluster>-<pod> -n <ns> -w
Success condition: .status.phase=Succeed | Typical: 5–15 min | If stuck >20 min: kubectl describe ops <name> -n <ns>
Status progresses: Pending → Running → Succeed
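The monitoring guidance above (wait while Pending or Running, verify on Succeed, investigate after 20 minutes) can be written down as a small decision function. This is a sketch: `ops_next_action` is a hypothetical helper, and treating any phase other than Pending, Running, or Succeed as a reason to investigate is an assumption, since the text only lists those three.

```shell
# Hypothetical helper (illustration only).
# Args: <phase> <elapsed-minutes>
# Prints one of: verify | wait | describe
ops_next_action() (
  phase="$1"; elapsed_min="$2"
  case "$phase" in
    Succeed)
      echo "verify" ;;                      # rebuild finished; go check the replica
    Pending|Running)
      if [ "$elapsed_min" -gt 20 ]; then
        echo "describe"                     # stuck longer than 20 min
      else
        echo "wait"
      fi ;;
    *)
      echo "describe" ;;                    # assumption: any other phase needs a look
  esac
)

# Usage (requires cluster access; the phase is read with a standard jsonpath query):
#   phase="$(kubectl get ops <name> -n <ns> -o jsonpath='{.status.phase}')"
#   ops_next_action "$phase" 12
```

When the result is describe, run kubectl describe ops <name> -n <ns> and read the events.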
Confirm the replica pod is Running and has the secondary role:
kubectl get pods -n <ns> -l app.kubernetes.io/instance=<cluster> \
-o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\t"}{.metadata.labels.kubeblocks\.io/role}{"\n"}{end}'
Verify replication:
# MySQL
kubectl exec -it <replica-pod> -n <ns> -- mysql -u root -p<password> -e "SHOW REPLICA STATUS\G"
# PostgreSQL
kubectl exec -it <primary-pod> -n <ns> -- psql -U postgres -c "SELECT * FROM pg_stat_replication;"
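For MySQL, the replication check above can be reduced to a pass/fail result. The sketch below assumes the output of SHOW REPLICA STATUS\G contains the standard Replica_IO_Running and Replica_SQL_Running fields; `replica_threads_ok` is a hypothetical helper name, and whether a given engine build reports these fields is not confirmed by this document.

```shell
# Hypothetical helper (illustration only).
# Reads SHOW REPLICA STATUS\G output on stdin.
# Exit status 0 only when both replication threads report "Yes".
replica_threads_ok() {
  awk -F': *' '
    /^ *Replica_IO_Running:/  { io  = $2 }
    /^ *Replica_SQL_Running:/ { sql = $2 }
    END { exit !(io == "Yes" && sql == "Yes") }
  '
}

# Usage (requires cluster access; drop -t because the output is piped):
#   kubectl exec -i <replica-pod> -n <ns> -- \
#     mysql -u root -p<password> -e "SHOW REPLICA STATUS\G" \
#     | replica_threads_ok && echo "replication healthy"
```

Empty input, for example when the statement returns no rows, is treated as a failure, so it does not pass silently.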
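OpsRequest fails or stays Pending:
- Confirm the cluster is Running and no other OpsRequest is in progress
- If rebuilding from a backup, confirm the backup named in backupName exists and is Completed
- Run kubectl describe ops <name> -n <ns> and check the events

Replica still unhealthy after rebuild:
- Check the pod logs: kubectl logs <pod> -n <ns> --tail=100

Non-in-place: pod name changed:
- A non-in-place rebuild creates a new pod and removes the old one, so the pod name changes (e.g. mysql-0 → mysql-2). The cluster keeps the same replica count.

For general agent safety conventions (dry-run, status confirmation, production protection), see safety-patterns.md.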