skills/kubeblocks-restore/SKILL.md
Restore KubeBlocks database clusters from backups. Supports full restore (create new cluster from backup) and Point-in-Time Recovery (PITR) to a specific timestamp. Use when the user wants to restore, recover, rebuild, or roll back a database cluster from a backup. Requires an existing backup created by the backup skill. NOT for creating backups (see kubeblocks-backup skill) or for creating a brand new cluster without backup data (see kubeblocks-create-cluster).
KubeBlocks supports restoring database clusters from backups. A restore always creates a new cluster from an existing backup — it does not modify the original cluster in-place. This design is intentional: by creating a new cluster, you can verify the restored data before switching traffic, and the original cluster remains available as a fallback.
Two restore modes are available:
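- Full restore: creates a new cluster from a completed full backup.
- Point-in-Time Recovery (PITR): restores to a specific timestamp. Useful after an accidental data change (for example `DELETE` or `DROP TABLE`) where you need to rewind to the moment just before the mistake.

Official docs: https://kubeblocks.io/docs/preview/user_docs/handle-an-exception/recovery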
Before proceeding, verify the cluster is healthy and no other operation is running:
# Cluster must be Running (if restoring to supplement an existing cluster)
kubectl get cluster <cluster-name> -n <namespace> -o jsonpath='{.status.phase}'
# No pending OpsRequests
kubectl get opsrequest -n <namespace> -l app.kubernetes.io/instance=<cluster-name> --field-selector=status.phase!=Succeed
If the cluster is not Running or has a pending OpsRequest, wait for it to complete before proceeding.
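The two checks above can be combined into one helper. The sketch below is illustrative only: `restore_preflight` is a hypothetical function name, and the OpsRequest filter uses a JSONPath expression rather than `--field-selector`, because field selectors on custom resources are only guaranteed to work for `metadata.name` and `metadata.namespace` unless the CRD declares additional selectable fields.

```shell
#!/bin/sh
# Hypothetical helper (not part of KubeBlocks): returns 0 only when the
# cluster is Running and no OpsRequest for it is in a non-Succeed phase.
restore_preflight() {
  cluster="$1"
  ns="$2"

  phase=$(kubectl get cluster "$cluster" -n "$ns" \
    -o jsonpath='{.status.phase}') || return 2
  if [ "$phase" != "Running" ]; then
    echo "cluster $cluster is in phase '$phase', expected Running" >&2
    return 1
  fi

  # JSONPath filter: names of OpsRequests whose phase is not Succeed.
  pending=$(kubectl get opsrequest -n "$ns" \
    -l "app.kubernetes.io/instance=$cluster" \
    -o jsonpath='{range .items[?(@.status.phase!="Succeed")]}{.metadata.name}{"\n"}{end}') || return 2
  if [ -n "$pending" ]; then
    echo "unfinished OpsRequests: $pending" >&2
    return 1
  fi

  echo "preflight ok"
}
```

Usage: `restore_preflight <cluster-name> <namespace> && echo "safe to proceed"`. Note that OpsRequests in a terminal failure phase also count as unfinished here, mirroring the `!=Succeed` filter above.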
Verify backups are available for restore:
kubectl get backup -n <namespace>
- [ ] Step 1: List available backups
- [ ] Step 2: Create a new cluster with restore annotation or OpsRequest
- [ ] Step 3: Verify restored cluster
kubectl get backup -n <ns>
Example output:
NAME                   POLICY                          METHOD           STATUS      AGE
mycluster-full         mycluster-mysql-backup-policy   xtrabackup       Completed   2d
mycluster-continuous   mycluster-mysql-backup-policy   archive-binlog   Running     2d
Check backup details for restore information:
kubectl describe backup <backup-name> -n <ns>
For PITR, note the time range available from the continuous backup's status.
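Reading the time range out of `kubectl describe` output by eye is error-prone, so a small helper can print just the two boundaries. This is a sketch: `backup_time_range` is a hypothetical name, and the field paths `.status.timeRange.start` and `.status.timeRange.end` are assumptions about the Backup resource that should be confirmed against `kubectl describe backup` on your cluster.

```shell
#!/bin/sh
# Hypothetical helper: prints "<start> <end>" for a continuous backup.
# Assumes the Backup status exposes timeRange.start and timeRange.end.
backup_time_range() {
  backup="$1"
  ns="$2"
  kubectl get backup "$backup" -n "$ns" \
    -o jsonpath='{.status.timeRange.start}{" "}{.status.timeRange.end}{"\n"}'
}
```

Usage: `backup_time_range mycluster-continuous <ns>`. An empty result suggests the continuous backup has not recorded a recoverable window yet.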
Create a new Cluster CR with the restore annotation. The new cluster spec should match the original cluster's configuration (same component types, resource requests, etc.):
apiVersion: apps.kubeblocks.io/v1
kind: Cluster
metadata:
  name: <new-cluster>
  namespace: <ns>
  annotations:
    kubeblocks.io/restore-from-backup: '{"<component>":{"name":"<backup-name>","namespace":"<ns>","volumeRestorePolicy":"Parallel"}}'
spec:
  # ... same spec as original cluster ...
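The annotation value is a JSON string embedded in YAML, which is easy to mistype by hand. The sketch below generates it from plain arguments. `restore_annotation` is a hypothetical helper name, and the function does no JSON escaping, so it assumes names without quotes or backslashes (which Kubernetes resource names do not contain).

```shell
#!/bin/sh
# Hypothetical helper: prints the JSON value for the
# kubeblocks.io/restore-from-backup annotation.
# Args: component, backup name, namespace, [volumeRestorePolicy], [restoreTime]
restore_annotation() {
  component="$1"
  backup="$2"
  ns="$3"
  policy="${4:-Parallel}"
  restore_time="${5:-}"   # only set for Point-in-Time Recovery

  case "$policy" in
    Parallel|Serial) ;;
    *) echo "volumeRestorePolicy must be Parallel or Serial" >&2; return 1 ;;
  esac

  if [ -n "$restore_time" ]; then
    printf '{"%s":{"name":"%s","namespace":"%s","volumeRestorePolicy":"%s","restoreTime":"%s"}}\n' \
      "$component" "$backup" "$ns" "$policy" "$restore_time"
  else
    printf '{"%s":{"name":"%s","namespace":"%s","volumeRestorePolicy":"%s"}}\n' \
      "$component" "$backup" "$ns" "$policy"
  fi
}
```

Usage: `restore_annotation mysql mycluster-full <ns>` prints a value that can be pasted between the single quotes of the annotation.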
The volumeRestorePolicy options:
- `Parallel`: restore all volumes simultaneously (faster)
- `Serial`: restore volumes one at a time

Before applying, validate with dry-run:
kubectl apply -f restored-cluster.yaml --dry-run=server
If dry-run reports errors, fix the YAML before proceeding.
Apply it:
kubectl apply -f restored-cluster.yaml
kubectl get cluster <new-cluster> -n <ns> -w
Success condition: `.status.phase` = `Running` | Typical: 2-5 min | If stuck >10 min: `kubectl describe cluster <new-cluster> -n <ns>`
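Instead of watching with `-w` and timing the wait by hand, the success condition and the 10-minute fallback can be scripted. The sketch below assumes kubectl 1.23 or newer, where `kubectl wait --for=jsonpath=...` is available; `wait_for_phase` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: block until <kind>/<name> reports the wanted
# .status.phase, or print diagnostics once the timeout expires.
# Args: kind, name, namespace, phase, [timeout, default 10m]
wait_for_phase() {
  kind="$1"
  name="$2"
  ns="$3"
  phase="$4"
  timeout="${5:-10m}"

  if kubectl wait "$kind/$name" -n "$ns" \
       --for=jsonpath='{.status.phase}'="$phase" --timeout="$timeout"; then
    echo "$kind/$name reached $phase"
  else
    echo "$kind/$name did not reach $phase within $timeout" >&2
    kubectl describe "$kind" "$name" -n "$ns" >&2
    return 1
  fi
}
```

Usage: `wait_for_phase cluster <new-cluster> <ns> Running`. The same helper should fit an OpsRequest by passing `opsrequest`, the request name, and `Succeed`.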
Alternatively, submit a Restore OpsRequest, where clusterName is the name of the new cluster to create:
apiVersion: operations.kubeblocks.io/v1alpha1
kind: OpsRequest
metadata:
  name: <new-cluster>-restore-ops
  namespace: <ns>
spec:
  clusterName: <new-cluster>
  type: Restore
  restore:
    backupName: <backup-name>
    backupNamespace: <ns>
Before applying, validate with dry-run:
kubectl apply -f restore-ops.yaml --dry-run=server
If dry-run reports errors, fix the YAML before proceeding.
Apply it:
kubectl apply -f restore-ops.yaml
kubectl get ops <new-cluster>-restore-ops -n <ns> -w
Success condition: `.status.phase` = `Succeed` | Typical: 2-5 min | If stuck >10 min: `kubectl describe ops <new-cluster>-restore-ops -n <ns>`
PITR requires both a completed full backup and a running continuous backup (archive-binlog for MySQL, wal-archive for PostgreSQL).
Use the annotation method with an additional restoreTime field:
apiVersion: apps.kubeblocks.io/v1
kind: Cluster
metadata:
  name: <new-cluster>
  namespace: <ns>
  annotations:
    kubeblocks.io/restore-from-backup: '{"<component>":{"name":"<continuous-backup>","namespace":"<ns>","volumeRestorePolicy":"Parallel","restoreTime":"2025-01-01T12:00:00Z"}}'
spec:
  # ... same spec as original cluster ...
Key points for PITR:
- `name` should reference the continuous backup (not the full backup)
- `restoreTime` must be in RFC 3339 format (UTC): `YYYY-MM-DDTHH:MM:SSZ`

To find the recoverable window, describe the continuous backup:
kubectl describe backup <continuous-backup-name> -n <ns>
Look for status.timeRange which shows the recoverable time window.
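Both PITR constraints (timestamp format and recoverable window) can be checked before the cluster is created. The sketch below is a minimal illustration: `restore_time_ok` is a hypothetical helper, and it assumes the window boundaries are supplied in the same `YYYY-MM-DDTHH:MM:SSZ` form, so that plain byte-order sorting matches chronological order.

```shell
#!/bin/sh
# Hypothetical helper: checks that restoreTime has the form
# YYYY-MM-DDTHH:MM:SSZ and lies inside [start, end].
# Args: restoreTime, window start, window end
restore_time_ok() {
  t="$1"
  start="$2"
  end="$3"

  # Shape check only; it does not reject impossible dates such as month 13.
  case "$t" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T[0-9][0-9]:[0-9][0-9]:[0-9][0-9]Z) ;;
    *) echo "restoreTime '$t' is not in YYYY-MM-DDTHH:MM:SSZ form" >&2; return 1 ;;
  esac

  # Fixed-width UTC timestamps sort in time order, so start <= t <= end
  # holds exactly when the three lines are already sorted.
  if printf '%s\n' "$start" "$t" "$end" | LC_ALL=C sort -c 2>/dev/null; then
    echo "restoreTime $t is within the recoverable window"
  else
    echo "restoreTime $t is outside $start .. $end" >&2
    return 1
  fi
}
```

Usage: `restore_time_ok 2025-01-01T12:00:00Z <start> <end>`, with the two boundaries copied from the backup's `status.timeRange`.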
# Watch cluster status
kubectl get cluster <new-cluster> -n <ns> -w
Success condition: `.status.phase` = `Running` | Typical: 2-5 min | If stuck >10 min: `kubectl describe cluster <new-cluster> -n <ns>`
# Check pods are running
kubectl get pods -n <ns> -l app.kubernetes.io/instance=<new-cluster>
The cluster status should transition to Running. Verify data integrity by connecting to the database:
# Get connection credentials
kubectl get secrets -n <ns> <new-cluster>-<component>-account-root -o jsonpath='{.data.password}' | base64 -d
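When more than one field of the account secret is needed, a small wrapper avoids repeating the long command. This is a sketch: `root_account_field` is a hypothetical helper, the secret name pattern is taken from the command above, and any key other than `password` (such as `username`) is an assumption that may differ per addon.

```shell
#!/bin/sh
# Hypothetical helper: prints one decoded field of the root account secret.
# Args: cluster, component, namespace, field (for example: password)
root_account_field() {
  cluster="$1"
  component="$2"
  ns="$3"
  field="$4"
  kubectl get secret "${cluster}-${component}-account-root" -n "$ns" \
    -o jsonpath="{.data.${field}}" | base64 -d
}
```

Usage: `root_account_field <new-cluster> <component> <ns> password`.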
Restore stuck in Creating:
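- Check that the backup status is `Completed` (for full) or `Running` (for continuous)
- Check that the backup repository is available: `kubectl get backuprepo`
- Inspect the restore pods: `kubectl get pods -n <ns> -l app.kubernetes.io/name=restore`

PITR restore fails:
- Verify that `restoreTime` is within the valid time range

New cluster spec mismatch:
- Make sure the new cluster spec matches the original cluster's configuration (same component types, resource requests, etc.)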
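The checks for a restore stuck in Creating can be gathered in one pass. The sketch below only wraps the commands already listed in this section; `restore_diagnose` is a hypothetical helper name and the section headers it prints are purely illustrative.

```shell
#!/bin/sh
# Hypothetical helper: runs the "stuck in Creating" checks together.
# Args: backup name, namespace
restore_diagnose() {
  backup="$1"
  ns="$2"

  echo "== backup phase =="
  kubectl get backup "$backup" -n "$ns" -o jsonpath='{.status.phase}{"\n"}'

  echo "== backup repositories =="
  kubectl get backuprepo

  echo "== restore pods =="
  kubectl get pods -n "$ns" -l app.kubernetes.io/name=restore
}
```

Usage: `restore_diagnose <backup-name> <ns>`, then compare the reported phase with the expected `Completed` or `Running`.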
For addon-specific restore behaviors (MySQL/PostgreSQL/Redis/MongoDB), PITR time range calculation details, the full restore annotation schema, and volume restore policy comparison, see reference.md.
For general agent safety conventions (dry-run, status confirmation, production protection), see safety-patterns.md.