googlecloudplatform/gke-workload-scaling

Name: gke-workload-scaling
Author: googlecloudplatform

skills/gke-workload-scaling/SKILL.md

npx skillsauth add googlecloudplatform/gke-mcp gke-workload-scaling

GKE Workload Scaling

This skill provides workflows and best practices for scaling applications on Google Kubernetes Engine (GKE). It covers manual scaling, Horizontal Pod Autoscaling (HPA), Vertical Pod Autoscaling (VPA), and node-level scaling with the Cluster Autoscaler.

Workflows

1. Manual Scaling

Quickly scale a deployment to a fixed number of replicas. Useful for immediate manual intervention or testing.

Command:

kubectl scale deployment <deployment-name> --replicas=<number> -n <namespace>
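
A manual scale is usually followed by a check that the new replicas actually became ready. The sketch below wraps both steps in one function. The helper name scale_and_wait, the default namespace, and the 120-second timeout are assumptions chosen for illustration; kubectl scale and kubectl rollout status are standard kubectl subcommands.

```shell
# Hypothetical helper: scale a Deployment, then wait for the rollout to settle.
# Usage: scale_and_wait <deployment-name> <replicas> [namespace]
scale_and_wait() {
  local deployment="$1"
  local replicas="$2"
  local namespace="${3:-default}"

  # Reject empty or non-numeric replica counts before touching the cluster.
  case "$replicas" in
    ''|*[!0-9]*)
      echo "replicas must be a non-negative integer" >&2
      return 1
      ;;
  esac

  kubectl scale deployment "$deployment" --replicas="$replicas" -n "$namespace" &&
    kubectl rollout status deployment "$deployment" -n "$namespace" --timeout=120s
}
```

Calling scale_and_wait web 3 prod would scale the (hypothetical) web Deployment in the prod namespace to three replicas and block until they are available or the timeout expires.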

2. Horizontal Pod Autoscaling (HPA)

Automatically scale the number of pods based on observed CPU utilization, memory utilization, or custom metrics.

Prerequisites:

Metrics Server must be running (enabled by default on GKE).
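Containers must define resource requests. Utilization targets are calculated as a percentage of the request, so HPA cannot compute CPU or memory utilization without one.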

Quick Command:

kubectl autoscale deployment <deployment-name> --cpu-percent=50 --min=1 --max=10
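
To see what a target such as --cpu-percent=50 implies, it helps to work through the standard HPA calculation: desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization), clamped to the configured minimum and maximum. The sketch below reproduces that arithmetic in integer shell math. The function name is hypothetical, and the sketch deliberately ignores the tolerance band (10% by default) and the stabilization behavior that the real controller applies.

```shell
# Hypothetical helper: estimate the replica count HPA would aim for.
# Usage: hpa_desired_replicas <current-replicas> <observed-util-%> <target-util-%> <min> <max>
hpa_desired_replicas() {
  local current="$1"
  local observed="$2"
  local target="$3"
  local min="$4"
  local max="$5"
  local desired

  # ceil(current * observed / target) using integer arithmetic
  desired=$(( (current * observed + target - 1) / target ))

  # Clamp to the configured bounds (--min / --max)
  if [ "$desired" -lt "$min" ]; then desired="$min"; fi
  if [ "$desired" -gt "$max" ]; then desired="$max"; fi

  echo "$desired"
}
```

For example, hpa_desired_replicas 4 80 50 1 10 prints 7: four pods averaging 80% CPU against a 50% target need ceil(6.4) = 7 replicas.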

Manifest Approach (Recommended): Use a YAML manifest for version-controlled configuration. See assets/hpa-example.yaml for a template.

kubectl apply -f assets/hpa-example.yaml
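
The contents of assets/hpa-example.yaml are not reproduced on this page. As a stand-in, the sketch below writes a minimal autoscaling/v2 manifest that mirrors the quick command (50% CPU target, 1 to 10 replicas). The file name hpa-cpu-sketch.yaml, the HPA name web-hpa, and the Deployment name web are hypothetical and not taken from the repository.

```shell
# Write a minimal HPA manifest (a sketch, not the repository's hpa-example.yaml).
cat > hpa-cpu-sketch.yaml <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa            # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web              # hypothetical Deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # percent of the container CPU request
EOF
```

The file can then be applied with kubectl apply -f hpa-cpu-sketch.yaml and inspected with kubectl get hpa web-hpa.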

Custom Metrics & External Metrics: For GKE, the modern and recommended approach for scaling based on Cloud Monitoring metrics (e.g., Pub/Sub queue length) is to use the External metric type, which is natively supported by the GKE control plane without requiring the Custom Metrics Adapter. For application-specific metrics exposed via Prometheus, you can use Google Cloud Managed Service for Prometheus or the Prometheus Adapter.
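
An HPA that uses the External metric type for a Pub/Sub backlog might look like the sketch below. Several details are assumptions: the metric name uses the pipe-separated form commonly shown in GKE's Pub/Sub autoscaling examples, and the subscription ID my-subscription, the Deployment name worker, and the target of 10 undelivered messages per replica are hypothetical. It also assumes the cluster can serve this external metric as described above.

```shell
# Write an HPA manifest that scales on a Cloud Monitoring (external) metric.
cat > hpa-pubsub-sketch.yaml <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: worker-hpa                 # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker                   # hypothetical Deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: External
      external:
        metric:
          # Assumed metric name format for Pub/Sub backlog size
          name: pubsub.googleapis.com|subscription|num_undelivered_messages
          selector:
            matchLabels:
              resource.labels.subscription_id: my-subscription   # hypothetical
        target:
          type: AverageValue
          averageValue: "10"       # undelivered messages per replica (illustrative)
EOF
```

With an AverageValue target, HPA divides the metric by the current replica count, so the backlog each pod is expected to absorb stays roughly constant as the Deployment grows.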

3. Vertical Pod Autoscaling (VPA)

Automatically adjust the CPU and memory requests of your pods to match actual usage. This is critical for right-sizing workloads.

Prerequisites:

VPA must be enabled on the cluster.
- Autopilot: Enabled by default.
- Standard: Must be enabled manually.

Enable VPA on Standard Cluster:

gcloud container clusters update <cluster-name> --enable-vertical-pod-autoscaling --zone <zone>
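
Before or after running the update, it can be useful to confirm the cluster's current VPA setting. The sketch below reads it with gcloud container clusters describe. The helper name vpa_enabled is hypothetical, and the field path verticalPodAutoscaling.enabled is assumed from the GKE cluster resource.

```shell
# Hypothetical helper: print the cluster's VPA setting.
# Usage: vpa_enabled <cluster-name> <zone>
vpa_enabled() {
  gcloud container clusters describe "$1" --zone "$2" \
    --format="value(verticalPodAutoscaling.enabled)"
}
```

The command is expected to print True when VPA is enabled; empty output most likely means it has not been enabled on the cluster.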

Update Modes:

Off: Calculates recommendations but does not apply them. Good for "dry run" analysis.
Initial: Assigns resources only at pod creation time and never changes running pods.
Auto: Evicts running pods so they are recreated with updated requests when recommendations differ significantly from the current requests.
InPlaceOrRecreate: Attempts to update Pod resources in place, without recreating the Pod. If an in-place update is not possible, it falls back to evicting and recreating the Pod, as in Auto mode (requires GKE 1.34+).

Example: See assets/vpa-example.yaml for a configuration template.
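
The contents of assets/vpa-example.yaml are not reproduced on this page. As a stand-in, the sketch below writes a minimal VerticalPodAutoscaler manifest in Off mode, which is a low-risk way to collect recommendations first. The file name vpa-off-sketch.yaml, the VPA name web-vpa, and the Deployment name web are hypothetical.

```shell
# Write a recommendation-only VPA manifest (a sketch, not the repository's vpa-example.yaml).
cat > vpa-off-sketch.yaml <<'EOF'
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa            # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web              # hypothetical Deployment
  updatePolicy:
    updateMode: "Off"      # compute recommendations, never evict pods
EOF
```

After kubectl apply -f vpa-off-sketch.yaml, running kubectl describe vpa web-vpa shows the recommended requests once VPA has observed the workload for a while. Switching updateMode to one of the other modes listed above is a one-line change.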

4. Cluster Autoscaler

While not a workload-level scaler, the Cluster Autoscaler is essential for ensuring your cluster has enough nodes to run the scaled pods.

Enable on a Node Pool:

gcloud container clusters update <cluster-name> \
    --enable-autoscaling \
    --node-pool <node-pool-name> \
    --min-nodes <min> \
    --max-nodes <max> \
    --zone <zone>
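
After enabling autoscaling, the node pool's limits can be read back to confirm the change took effect. The sketch below uses gcloud container node-pools describe. The helper name nodepool_autoscaling is hypothetical, and the field names are assumed from the GKE node pool resource.

```shell
# Hypothetical helper: print the autoscaling settings of a node pool
# (enabled flag, minimum node count, maximum node count).
# Usage: nodepool_autoscaling <cluster-name> <node-pool-name> <zone>
nodepool_autoscaling() {
  gcloud container node-pools describe "$2" --cluster "$1" --zone "$3" \
    --format="value(autoscaling.enabled,autoscaling.minNodeCount,autoscaling.maxNodeCount)"
}
```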

Best Practices

Define Resource Requests: HPA and VPA rely on accurate resource requests. Always define them in your container specs.
Avoid Metric Conflicts: Do not configure HPA and VPA to act on the same resource metric (e.g., both on CPU). The two autoscalers then react to the same signal and can work against each other, which causes thrashing.
- Typical Pattern: HPA on CPU, VPA on Memory.
Pod Disruption Budgets (PDBs): Define PDBs to ensure application availability during scaling events or node upgrades.
HPA Lag: HPA applies a stabilization window to scale-down decisions (default 5 minutes) to prevent rapid fluctuation. Scale-up has no stabilization window by default.
VPA "Auto" Mode Risks: In "Auto" mode, VPA evicts pods so that they are recreated with new resource requests. Ensure your application handles restarts gracefully (e.g., handles SIGTERM).
- Note: By default, VPA requires at least 2 replicas to perform evictions. In GKE 1.22+, you can override this by setting minReplicas in PodUpdatePolicy.
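
Two of the practices above can be expressed directly in manifests: restricting VPA to memory so that CPU is left to HPA, and protecting availability with a PDB. The sketch below writes both into one file. All names (web-vpa-memory, web-pdb, the Deployment web, the label app: web) and the minAvailable value are hypothetical. The controlledResources field is part of the VPA container policy, and VPA evictions go through the eviction API, so they should respect the PDB.

```shell
# Write a memory-only VPA and a PodDisruptionBudget (illustrative sketch).
cat > scaling-guardrails-sketch.yaml <<'EOF'
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa-memory     # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web              # hypothetical Deployment
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        controlledResources: ["memory"]   # leave CPU to HPA
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb            # hypothetical name
spec:
  minAvailable: 1          # illustrative; size to your availability needs
  selector:
    matchLabels:
      app: web             # hypothetical pod label
EOF
```

Applying the file with kubectl apply -f scaling-guardrails-sketch.yaml alongside a CPU-based HPA gives the typical pattern described above: HPA on CPU, VPA on memory.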

Specific workflows for scaling GKE workloads using HPA and VPA, as well as best practices for autoscaling configuration.

141 stars

testing

Updated Apr 18, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add googlecloudplatform/gke-mcp gke-workload-scaling

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy: Container and dependency vulnerability scanner. Clean, 95%.
Semgrep: Static code analysis for vulnerabilities. Clean, 95%.
mcp-scan (Snyk): Model Context Protocol security validation. Clean, 95%.
Snyk (dep): Open source security scanning. Skipped, 50%.
Socket.dev: Supply chain security analysis. Skipped, 50%.
VirusTotal: Multi-engine malware detection. Skipped, 50%.
CrowdStrike: Advanced threat intelligence. Skipped, 50%.
OSV-Scanner: Open Source Vulnerability database check. Skipped, 50%.
OWASP Dep-Check: Skipped, 50%.

Last scanned: Apr 18, 2026, 7:36 AM · 7.8s · 3 files scanned

Install manually

# Clone the repo
git clone https://github.com/googlecloudplatform/gke-mcp.git

# Create the Claude Code skills folder (global) if it does not exist yet
mkdir -p ~/.claude/skills

# Copy the skill into it
cp -r gke-mcp/skills/gke-workload-scaling ~/.claude/skills/

Repository: googlecloudplatform/gke-mcp (141 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT