skills/gke-cluster-creator/SKILL.md
Guides the user through creating GKE clusters using pre-defined templates (Standard, Autopilot, GPU/AI).
This skill helps users create Google Kubernetes Engine (GKE) clusters by providing a set of best-practice templates and guiding them through the customization process.
Required parameters: project_id, location, cluster_name.
Customizable fields (machineType, nodeCount, network).
Ensure project_id, location, and cluster_name are set.
Check the configuration against the create_cluster MCP tool schema.
Call the create_cluster MCP tool with the final configuration.

When guiding the user or generating configurations, adhere to the following GKE cluster creation best practices:
Best for: Development, testing, non-critical workloads.
{
"name": "projects/{PROJECT_ID}/locations/{ZONE}/clusters/{CLUSTER_NAME}",
"initialNodeCount": 1,
"nodeConfig": {
"machineType": "e2-medium",
"diskSizeGb": 50,
"oauthScopes": [
"https://www.googleapis.com/auth/devstorage.read_only",
"https://www.googleapis.com/auth/logging.write",
"https://www.googleapis.com/auth/monitoring",
"https://www.googleapis.com/auth/service.management.readonly",
"https://www.googleapis.com/auth/servicecontrol",
"https://www.googleapis.com/auth/trace.append"
]
}
}
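Customizing a template such as the one above (for example machineType or the node count) amounts to merging a small set of user overrides into the template. The following is a minimal sketch, not the skill's own implementation: the `apply_overrides` helper and the override values are hypothetical, and the template is abbreviated to the fields needed for illustration.

```python
import copy

# Abbreviated copy of the template above (name and oauthScopes omitted for brevity).
ZONAL_TEMPLATE = {
    "initialNodeCount": 1,
    "nodeConfig": {
        "machineType": "e2-medium",
        "diskSizeGb": 50,
    },
}


def apply_overrides(template, overrides):
    """Return a copy of template with overrides merged in recursively.

    Hypothetical helper: nested dicts are merged key by key, any other
    value replaces the template value. The template itself is not modified.
    """
    merged = copy.deepcopy(template)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_overrides(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Example overrides a user might ask for (illustrative values only).
custom = apply_overrides(
    ZONAL_TEMPLATE,
    {"initialNodeCount": 3, "nodeConfig": {"machineType": "e2-standard-2"}},
)
```

Merging recursively keeps template fields the user did not mention (here diskSizeGb) instead of replacing the whole nodeConfig object.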
Best for: Production workloads requiring high availability.
Note: Creates 3 nodes (one per zone in the region) by default.
{
"name": "projects/{PROJECT_ID}/locations/{REGION}/clusters/{CLUSTER_NAME}",
"initialNodeCount": 1,
"nodeConfig": {
"machineType": "e2-standard-4",
"diskSizeGb": 100,
"oauthScopes": ["https://www.googleapis.com/auth/cloud-platform"]
}
}
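The three-node figure in the note follows from initialNodeCount being applied to each zone of the region. A small sketch of that arithmetic, assuming the region has three zones (the zone count is an assumption here and depends on the region and on any node location settings); the function name is hypothetical:

```python
def total_initial_nodes(initial_node_count, zone_count):
    """Nodes created at cluster creation when the count applies to each zone."""
    return initial_node_count * zone_count


# Regional template above: initialNodeCount of 1, assumed three zones.
nodes = total_initial_nodes(1, 3)
```

Raising initialNodeCount in a regional cluster therefore multiplies across zones, which is worth confirming with the user before creation.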
Best for: Most workloads where you don't want to manage nodes.
{
"name": "projects/{PROJECT_ID}/locations/{REGION}/clusters/{CLUSTER_NAME}",
"autopilot": {
"enabled": true
}
}
Best for: AI/ML Inference, small model serving.
Note: Requires g2-standard-4 quota.
{
"name": "projects/{PROJECT_ID}/locations/{REGION}/clusters/{CLUSTER_NAME}",
"initialNodeCount": 1,
"nodeConfig": {
"machineType": "g2-standard-4",
"accelerators": [
{
"acceleratorCount": "1",
"acceleratorType": "nvidia-l4"
}
],
"diskSizeGb": 100,
"oauthScopes": ["https://www.googleapis.com/auth/cloud-platform"]
}
}
Best for: Large Model Training/Inference.
Note: High cost and strict quota requirements.
{
"name": "projects/{PROJECT_ID}/locations/{REGION}/clusters/{CLUSTER_NAME}",
"initialNodeCount": 1,
"nodeConfig": {
"machineType": "a3-highgpu-8g",
"accelerators": [
{
"acceleratorCount": "8",
"acceleratorType": "nvidia-h100-80gb-hbm3"
}
],
"diskSizeGb": 200,
"oauthScopes": ["https://www.googleapis.com/auth/cloud-platform"]
}
}
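When a user customizes one of the GPU templates, it may help to check that the machine type and accelerator entry still agree before calling the tool. The sketch below is an illustration under stated assumptions: the lookup table contains only the two pairings that appear in the templates above and is not an exhaustive list, and `check_gpu_node_config` is a hypothetical helper.

```python
# Pairings copied from the two GPU templates above; not an exhaustive list.
EXPECTED_ACCELERATORS = {
    "g2-standard-4": ("nvidia-l4", 1),
    "a3-highgpu-8g": ("nvidia-h100-80gb-hbm3", 8),
}


def check_gpu_node_config(node_config):
    """Return a list of problems; an empty list means no mismatch was found."""
    machine_type = node_config.get("machineType")
    expected = EXPECTED_ACCELERATORS.get(machine_type)
    if expected is None:
        # Machine type not in the table: nothing to compare against.
        return []

    accelerators = node_config.get("accelerators", [])
    if len(accelerators) != 1:
        return [f"{machine_type}: expected exactly one accelerator entry"]

    entry = accelerators[0]
    # acceleratorCount is written as a string in the templates, so convert it.
    actual = (entry.get("acceleratorType"), int(entry.get("acceleratorCount", 0)))
    if actual != expected:
        return [f"{machine_type}: expected {expected}, got {actual}"]
    return []


ok = check_gpu_node_config({
    "machineType": "g2-standard-4",
    "accelerators": [{"acceleratorCount": "1", "acceleratorType": "nvidia-l4"}],
})
bad = check_gpu_node_config({
    "machineType": "a3-highgpu-8g",
    "accelerators": [{"acceleratorCount": "1", "acceleratorType": "nvidia-l4"}],
})
```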
Ask for project_id if it is not in the context.
Ask for location (Region or Zone).
Ask for cluster_name.
Access to Google Cloud APIs (default cloud-platform scope is usually best for modern GKE).
Use the create_cluster MCP tool to create the cluster. The parent argument is projects/{PROJECT_ID}/locations/{LOCATION} and the cluster argument is the JSON object. The cluster.name is just the short name (e.g. "my-cluster").
When calling create_cluster, the cluster.name should be the short name (e.g., my-cluster), NOT the full resource path, because the parent argument defines the scope.

User: "I want to create a GKE cluster."
Model: "I can help with that. What kind of cluster do you need?
User: "Standard Zonal, please."
Model: "Great. I'll need a few details:
User: "Project my-proj, zone us-west1-b, name dev-cluster."
Model: "Here is the configuration I will use:
[JSON view]
Do you want to proceed?"
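The templates carry a full resource path in name, while the create_cluster call expects a parent argument plus a cluster object whose name is the short name. A minimal sketch of that conversion, using the values from the example conversation; `build_create_cluster_args` is a hypothetical helper, and the returned dictionary only mirrors the parent and cluster arguments described above, not the tool's full schema.

```python
import copy


def build_create_cluster_args(template, project_id, location, cluster_name):
    """Return the parent and cluster arguments for a create_cluster call."""
    required = {
        "project_id": project_id,
        "location": location,
        "cluster_name": cluster_name,
    }
    missing = [key for key, value in required.items() if not value]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))

    cluster = copy.deepcopy(template)
    # The parent carries project and location, so name is the short name only.
    cluster["name"] = cluster_name
    return {
        "parent": f"projects/{project_id}/locations/{location}",
        "cluster": cluster,
    }


# Abbreviated Standard Zonal template (oauthScopes omitted for brevity).
template = {
    "name": "projects/{PROJECT_ID}/locations/{ZONE}/clusters/{CLUSTER_NAME}",
    "initialNodeCount": 1,
    "nodeConfig": {"machineType": "e2-medium", "diskSizeGb": 50},
}
args = build_create_cluster_args(template, "my-proj", "us-west1-b", "dev-cluster")
```

Validating the three required values in the same place keeps the call from being attempted with an incomplete parent path.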