skills/gke-workload-scaling/SKILL.md
Specific workflows for scaling GKE workloads using HPA and VPA, as well as best practices for autoscaling configuration.
npx skillsauth add googlecloudplatform/gke-mcp gke-workload-scaling
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
This skill provides workflows and best practices for scaling applications on Google Kubernetes Engine (GKE). It covers manual scaling, Horizontal Pod Autoscaling (HPA), and Vertical Pod Autoscaling (VPA).
Manual Scaling: Quickly scale a Deployment to a fixed number of replicas. Useful for immediate manual intervention or testing.
Command:
kubectl scale deployment <deployment-name> --replicas=<number> -n <namespace>
Horizontal Pod Autoscaling (HPA): Automatically scale the number of pods based on observed CPU utilization, memory utilization, or custom metrics.
Prerequisites:
Quick Command:
kubectl autoscale deployment <deployment-name> --cpu-percent=50 --min=1 --max=10
Manifest Approach (Recommended): Use a YAML manifest for version-controlled configuration. See assets/hpa-example.yaml for a template.
kubectl apply -f assets/hpa-example.yaml
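For orientation, here is a minimal sketch of what an HPA manifest of this kind might look like. It mirrors the quick command above (50% CPU target, 1 to 10 replicas) using the `autoscaling/v2` API. The names `example-hpa`, `example-deployment`, and the `default` namespace are placeholders, and the actual contents of assets/hpa-example.yaml may differ.

```yaml
# Hypothetical example; resource names and namespace are placeholders.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-hpa
  namespace: default
spec:
  # The workload whose replica count the HPA manages.
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
    # Target an average CPU utilization of 50% of each pod's CPU request.
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
```

Utilization targets are computed relative to the containers' resource requests, so the target Deployment needs CPU requests set for this metric to work.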
Custom Metrics & External Metrics: For GKE, the modern and recommended approach for scaling based on Cloud Monitoring metrics (e.g., Pub/Sub queue length) is to use the External metric type, which is natively supported by the GKE control plane without requiring the Custom Metrics Adapter. For application-specific metrics exposed via Prometheus, you can use Google Cloud Managed Service for Prometheus or the Prometheus Adapter.
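As an illustration of the External metric type, the following sketch scales a Deployment on the number of undelivered messages in a Pub/Sub subscription. The metric name and label selector follow the naming convention used in GKE's Pub/Sub autoscaling tutorials and are an assumption here; verify them against your cluster version. `example-pubsub-hpa`, `example-worker`, and `example-subscription` are placeholder names.

```yaml
# Hypothetical example; metric naming is assumed from GKE Pub/Sub tutorials.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-pubsub-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-worker
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: External
      external:
        metric:
          # Cloud Monitoring metric for the subscription backlog (assumed name format).
          name: pubsub.googleapis.com|subscription|num_undelivered_messages
          selector:
            matchLabels:
              resource.labels.subscription_id: example-subscription
        target:
          # Aim for roughly 2 undelivered messages per replica.
          type: AverageValue
          averageValue: "2"
```

With an `AverageValue` target, the HPA divides the total metric value by the current replica count, so the backlog each replica is expected to handle stays roughly constant as the workload scales.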
Vertical Pod Autoscaling (VPA): Automatically adjust the CPU and memory requests for your pods to match actual usage. This is critical for right-sizing workloads.
Prerequisites:
Enable VPA on Standard Cluster:
gcloud container clusters update <cluster-name> --enable-vertical-pod-autoscaling --zone <zone>
Update Modes:
Off: Calculates recommendations but does not apply them. Good for "dry run" analysis.
Initial: Assigns resources only at pod creation time.
Auto: Updates running pods by restarting them if recommendations differ significantly from requests.
InPlaceOrRecreate: Attempts to update Pod resources without recreating the Pod. If in-place update is not possible, it reverts to Auto mode (requires GKE 1.34+).
Example: See assets/vpa-example.yaml for a configuration template.
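A minimal sketch of a VPA manifest using the update modes listed above might look like the following. It starts in `Off` mode so recommendations can be reviewed before anything is applied. The names `example-vpa` and `example-deployment` and the resource bounds are placeholders, and the actual contents of assets/vpa-example.yaml may differ.

```yaml
# Hypothetical example; names and resource bounds are placeholders.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: example-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-deployment
  updatePolicy:
    # Quote "Off" so YAML parsers do not read it as a boolean.
    updateMode: "Off"
    # Only relevant in modes that evict pods: the minimum number of live
    # replicas required before the updater will evict a pod.
    minReplicas: 2
  resourcePolicy:
    containerPolicies:
      # Apply the same bounds to every container in the pod.
      - containerName: "*"
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: "1"
          memory: 1Gi
```

Once the VPA has collected usage data, `kubectl describe vpa example-vpa` shows the current recommendations, which can inform a later switch to a mode that applies them.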
While not a workload-level scaler, the Cluster Autoscaler is essential for ensuring your cluster has enough nodes to run the scaled pods.
Enable on a Node Pool:
gcloud container clusters update <cluster-name> \
--enable-autoscaling \
--node-pool <node-pool-name> \
--min-nodes <min> \
--max-nodes <max> \
--zone <zone>
minReplicas in PodUpdatePolicy.