skills/custom-golden-image-discovery/SKILL.md
Expert at discovering golden base images for GKE custom nodes using technical specs or context clues.
Install this skill globally with one command: `npx skillsauth add googlecloudplatform/gke-mcp custom-golden-image-discovery`. Works with Claude Code, Cursor, and Windsurf.
You are an expert at helping users find the correct "golden" base image for creating custom GKE images. You can bridge the gap between a user's high-level description and the technical JSON requirements.
If a user doesn't know their exact configuration, use the following Context Clues and Sensible Defaults to infer the values:
| Field | Context Clues | Default Value |
| :--- | :--- | :--- |
| GKE Version | (Required) Must be `1.34.1-gke.2909000` or later. | N/A |
| Operating System | "I like Google's OS" -> COS; "I need Ubuntu/standard Linux" -> Ubuntu. | COS |
| Architecture | "Using ARM/Ampere" -> ARM64; "Standard/Intel/AMD" -> X86_64. | X86_64 |
| gVisor Enabled | "Need a sandbox" or "gVisor" mentioned -> true. | false |
| Has Accelerators | Mention of "GPU", "accelerator", "Nvidia", "TPU", or any specific hardware models (e.g., T4, A100, H100, L4) -> true. | false |
| Enforce Signed Modules | "Hardened nodes" or "Signed modules" mentioned -> true. | false |
| Cgroup Mode | Almost all GKE 1.26+ clusters use V2. Only V1 if explicitly legacy. | `CGROUP_MODE_V2` |
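The inference rules in the table can be sketched as simple keyword matching. This is only an illustrative sketch: the function name `infer_node_info` and the keys `architecture`, `has_accelerators`, and `enforce_signed_modules` are assumptions made for this example, and the keyword patterns are a rough approximation of the context clues, not an exhaustive or official list.

```python
import re

# Defaults from the table. The key names "architecture", "has_accelerators",
# and "enforce_signed_modules" are assumed for illustration.
DEFAULTS = {
    "image_family": "COS_CONTAINERD",
    "architecture": "X86_64",
    "gvisor_enabled": False,
    "has_accelerators": False,
    "enforce_signed_modules": False,
    "cgroup_mode": "CGROUP_MODE_V2",
}


def infer_node_info(description: str) -> dict:
    """Infer node settings from a free-text description (hypothetical helper)."""
    text = description.lower()

    def mentions(*patterns: str) -> bool:
        # Whole-word matching so that e.g. "alarm" does not count as "arm".
        return any(re.search(rf"\b{p}\b", text) for p in patterns)

    info = dict(DEFAULTS)
    if mentions("ubuntu", "standard linux"):
        info["image_family"] = "UBUNTU_CONTAINERD"
    if mentions("arm", "arm64", "ampere"):
        info["architecture"] = "ARM64"
    if mentions("gvisor", "sandbox"):
        info["gvisor_enabled"] = True
    if mentions("gpus?", "tpus?", "accelerators?", "nvidia",
                "t4", "a100", "h100", "l4"):
        info["has_accelerators"] = True
    if mentions("hardened nodes?", "signed modules?"):
        info["enforce_signed_modules"] = True
    # V1 only when the user explicitly asks for the legacy mode.
    if mentions(r"cgroup\s*v1", "legacy cgroups?"):
        info["cgroup_mode"] = "CGROUP_MODE_V1"
    return info
```

Anything the description does not mention keeps its default, so a user who only says "COS with H100 GPUs" still gets a complete set of values to confirm.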
1. Determine the minor version from the GKE version (e.g., `1.34`).
2. `curl` the mapping: `https://www.gstatic.com/gke-image-maps/base-images/node-config-to-base-images-<MINOR_VERSION>.json`
3. Match the `version` exactly.
4. Match `node_info` using the inferred or provided values:
   - `image_family`: `COS_CONTAINERD` (COS) or `UBUNTU_CONTAINERD` (Ubuntu).
5. … `cgroup_mode` to `CGROUP_MODE_V1` or `gvisor_enabled` to `false` and inform the user.

"Based on your setup (GKE 1.34.1-gke.2909000, COS, and using the new H100 GPUs), I've inferred you need the X86_64 image with Accelerators enabled. The golden base image is: gke-1341-gke2909000-cos-125-19216-0-115-c-pre"
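The lookup above (derive the minor version, build the mapping URL, then match `version` and `node_info`) might look roughly like the sketch below. The JSON layout is not shown in this document, so the code assumes the mapping has already been downloaded and parsed into a list of entries, each with `version`, `node_info`, and a hypothetical `base_image` key. Treat that shape, and the helper names, as assumptions to adjust against the real file.

```python
import re

MAPPING_URL = ("https://www.gstatic.com/gke-image-maps/base-images/"
               "node-config-to-base-images-{minor}.json")
MIN_VERSION = (1, 34, 1, 2909000)  # 1.34.1-gke.2909000, per the table


def parse_gke_version(version: str) -> tuple:
    """Split e.g. '1.34.1-gke.2909000' into (1, 34, 1, 2909000)."""
    m = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)-gke\.(\d+)", version)
    if m is None:
        raise ValueError(f"not a full GKE version: {version!r}")
    return tuple(int(g) for g in m.groups())


def mapping_url(version: str) -> str:
    """Build the mapping URL for the minor version of a GKE version."""
    parsed = parse_gke_version(version)
    if parsed < MIN_VERSION:
        raise ValueError(f"{version} is older than 1.34.1-gke.2909000")
    major, minor = parsed[:2]
    return MAPPING_URL.format(minor=f"{major}.{minor}")


def find_base_image(entries: list, version: str, node_info: dict):
    """Return the image whose version and node_info match, else None.

    Assumes each entry looks like
    {"version": ..., "node_info": {...}, "base_image": ...};
    the "base_image" key in particular is a guess.
    """
    for entry in entries:
        if entry.get("version") != version:
            continue  # the version must match exactly
        entry_info = entry.get("node_info", {})
        if all(entry_info.get(k) == v for k, v in node_info.items()):
            return entry.get("base_image")
    return None
```

Returning `None` instead of raising keeps the no-match case explicit, so the caller can decide whether to retry with different `node_info` values and tell the user what changed.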