plugins/exec-remote/skills/deploy-cluster/SKILL.md
Deploys a SkyPilot-managed TPU cluster on GKE. Automatically ensures the required node pool exists for the requested TPU type, creating one if necessary. Supports running multiple TPU types in parallel on the same GKE cluster.
This skill deploys a SkyPilot-managed TPU cluster on an existing GKE cluster. It builds on the apply-resource skill which handles GKE cluster creation via xpk.
Key Feature: Each TPU type gets its own SkyPilot cluster (named <cluster>-<username>-<tpu_type>), allowing multiple topologies to run in parallel on the same GKE cluster. Node pools are automatically managed per TPU type.
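The naming scheme above can be sketched as a small helper. This is an illustration only: the function name is hypothetical, and deriving the username from the local login via `getpass.getuser()` is an assumption, since the source does not say how the deploy script obtains it.

```python
import getpass
from typing import Optional


def skypilot_cluster_name(cluster: str, tpu_type: str,
                          username: Optional[str] = None) -> str:
    """Build a name of the form <cluster>-<username>-<tpu_type>.

    Hypothetical helper: falling back to the local login name is an
    assumption, not something the skill documents.
    """
    user = username or getpass.getuser()
    return f"{cluster}-{user}-{tpu_type}"


# Example using the default GKE cluster name and an explicit username.
print(skypilot_cluster_name("sglang-jax-agent-tests", "v6e-4", username="hongmao"))
```

Because the TPU type is part of the name, two deployments against the same GKE cluster never collide as long as they target different TPU types.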
- Install SkyPilot: pip install skypilot
- Verify the installation: sky --help
- Run gcloud auth login to authenticate

The following defaults apply unless the user explicitly overrides them:
| Parameter | Default |
|----------------|----------------------------|
| PROJECT_ID | tpu-service-473302 |
| CLUSTER_NAME | sglang-jax-agent-tests |
| ZONE | asia-northeast1-b |
Use these values directly — do NOT ask the user to confirm or re-enter them unless they specify otherwise.
- PROJECT_ID (default: tpu-service-473302)
- TPU_TYPE (e.g., v6e-1, v6e-4, v6e-16), must be specified
- ZONE (default: asia-northeast1-b)
- CLUSTER_NAME (default: sglang-jax-agent-tests)

If all parameters are already known from an upstream caller (e.g., exec-remote), use them directly -- do NOT re-ask. Only prompt interactively when this skill is invoked standalone and the user wants to override defaults.
Each GKE node exposes 4 TPU chips (google.com/tpu: 4), except v6e-1 which exposes 1 chip.
Therefore: num_nodes = total_chips / 4, and every pod always requests 4 chips (1 for v6e-1).
| Type | Topology | Chips/Host | Nodes | Machine Type |
|------|----------|------------|-------|--------------|
| v6e-1 | 1x1 | 1 | 1 | ct6e-standard-1t |
| v6e-4 | 2x2 | 4 | 1 | ct6e-standard-4t |
| v6e-8 | 2x4 | 4 | 2 | ct6e-standard-4t |
| v6e-16 | 4x4 | 4 | 4 | ct6e-standard-4t |
| v6e-32 | 4x8 | 4 | 8 | ct6e-standard-4t |
| v6e-64 | 8x8 | 4 | 16 | ct6e-standard-4t |
| v6e-128 | 8x16 | 4 | 32 | ct6e-standard-4t |
| v6e-256 | 16x16 | 4 | 64 | ct6e-standard-4t |
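The node-count rule can be written as a short function. This is a minimal sketch, assuming the TPU type always has the form `<family>-<total_chips>`; the function name is hypothetical and is not taken from the deploy script.

```python
from typing import Tuple


def tpu_layout(tpu_type: str) -> Tuple[int, int]:
    """Return (num_nodes, chips_per_pod) for a type such as 'v6e-16'.

    Assumes the suffix after the last '-' is the total chip count.
    """
    family, _, chips_text = tpu_type.rpartition("-")
    if not family or not chips_text.isdigit():
        raise ValueError(f"unrecognized TPU type: {tpu_type!r}")
    total_chips = int(chips_text)
    if total_chips == 1:
        # v6e-1 is the one exception: a single node exposing a single chip.
        return 1, 1
    if total_chips % 4 != 0:
        raise ValueError(f"chip count must be 1 or a multiple of 4: {tpu_type!r}")
    # Every other node exposes 4 chips, so num_nodes = total_chips / 4.
    return total_chips // 4, 4


for name in ("v6e-1", "v6e-4", "v6e-16", "v6e-256"):
    print(name, tpu_layout(name))
```

The results agree with the Nodes and Chips/Host columns of the table, for example 4 nodes of 4 chips each for v6e-16.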
Zone vs Region: xpk always creates GKE clusters at the region level (e.g., asia-northeast1), even when given a zone like asia-northeast1-b. The deploy script handles this automatically -- you may pass either a zone or a region.
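One way such zone-or-region handling could work is to strip a trailing zone letter when one is present. The sketch below is an assumption about the approach, not the deploy script's actual code, and the helper name is hypothetical.

```python
import re

# A zone is a region name followed by "-<letter>", e.g. asia-northeast1-b.
_ZONE_RE = re.compile(r"^(?P<region>[a-z]+(?:-[a-z]+)*\d+)-[a-z]$")


def to_region(location: str) -> str:
    """Return the region for a zone; return a region unchanged."""
    match = _ZONE_RE.match(location)
    return match.group("region") if match else location


print(to_region("asia-northeast1-b"))
print(to_region("asia-northeast1"))
```

Both calls print `asia-northeast1`, so callers can pass either form without checking which one they have.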
Use the apply-resource skill to create the GKE cluster (or confirm it already exists). This only needs to be done once:
/apply-resource create
Carry forward the resulting CLUSTER_NAME, TPU_TYPE, and ZONE for Step 2.
Before deploying SkyPilot, ensure the GKE cluster status is RUNNING:
gcloud container clusters list --project=$PROJECT_ID \
--filter="name=<CLUSTER_NAME>" --format="table(name,location,status)"
If status is RECONCILING or PROVISIONING, wait until it becomes RUNNING.
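The wait can be automated with a small polling loop. This is a sketch under stated assumptions: the function names, polling interval, and timeout are hypothetical, and the status lookup shells out to the same `gcloud container clusters list` command shown above with a `value(status)` format.

```python
import subprocess
import time
from typing import Callable


def get_cluster_status(project_id: str, cluster_name: str) -> str:
    """Query the GKE cluster status via gcloud (requires gcloud on PATH)."""
    result = subprocess.run(
        ["gcloud", "container", "clusters", "list",
         f"--project={project_id}",
         f"--filter=name={cluster_name}",
         "--format=value(status)"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def wait_until_running(fetch_status: Callable[[], str],
                       interval_s: float = 30.0,
                       timeout_s: float = 1800.0,
                       sleep: Callable[[float], None] = time.sleep) -> None:
    """Poll until the status is RUNNING; fail on any unexpected status."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status == "RUNNING":
            return
        if status not in ("RECONCILING", "PROVISIONING"):
            raise RuntimeError(f"unexpected cluster status: {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError("cluster did not reach RUNNING in time")
        sleep(interval_s)


# Usage (not executed here because it needs gcloud credentials):
# wait_until_running(
#     lambda: get_cluster_status("tpu-service-473302", "sglang-jax-agent-tests"))
```

Passing the status lookup as a callable keeps the loop testable without a real cluster.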
Run the deploy script (located in the scripts/ directory alongside this skill definition):
python scripts/deploy.py <TPU_TYPE> [CLUSTER_NAME] [ZONE]
Only TPU_TYPE is required. CLUSTER_NAME defaults to sglang-jax-agent-tests, ZONE defaults to asia-northeast1-b.
This script will:
1. Fetch cluster credentials via gcloud
2. Ensure the TPU node pool exists (named tpu-<TPU_TYPE>, e.g., tpu-v6e-1)
3. Generate ~/.sky/config.yaml from the template with correct TPU parameters
4. Generate setup.yaml with the correct num_nodes
5. Run sky launch -c <CLUSTER_NAME>-<USERNAME>-<TPU_TYPE> -r <setup.yaml>
6. Write .cluster_name_tpu in the plugin root (for exec-remote integration)

sky status # Check cluster status
sky exec <CLUSTER_NAME>-<USERNAME>-<TPU_TYPE> 'echo hello' # Test remote execution
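The template-filling part of the script can be illustrated with plain string substitution. This is a sketch, not the script's actual code: the helper names and the one-line template are hypothetical, and `<NUM_NODES>` is used as the placeholder token.

```python
import tempfile
from pathlib import Path
from typing import Dict


def render_template(template_text: str, values: Dict[str, object]) -> str:
    """Replace each placeholder token with its value."""
    rendered = template_text
    for placeholder, value in values.items():
        rendered = rendered.replace(placeholder, str(value))
    return rendered


def write_setup_yaml(template_text: str, num_nodes: int) -> Path:
    """Render the template and write it to a temporary .yaml file."""
    rendered = render_template(template_text, {"<NUM_NODES>": num_nodes})
    with tempfile.NamedTemporaryFile("w", suffix=".yaml", delete=False) as handle:
        handle.write(rendered)
    return Path(handle.name)


# Hypothetical template content; the real setup.yaml template is not shown here.
template = "num_nodes: <NUM_NODES>\n"
path = write_setup_yaml(template, 4)
print(path.read_text())
```

Writing the rendered file to a temporary location leaves the template itself untouched between runs.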
The deploy script intelligently manages GKE node pools:
- Detection: existing pools are matched by machineType and tpuTopology. This detects pools created by xpk, manually, or by previous runs.
- Naming: pools are named tpu-<type> (e.g., tpu-v6e-1, tpu-v6e-4). Single-host TPUs (v6e-1, v6e-4) omit --tpu-topology, as GKE infers it from the machine type.
- Scheduling: a nodeSelector ensures pods land on the correct pool.
- Pool options: --spot and autoscaling (--min-nodes=0).

# First time: create cluster via apply-resource (uses defaults)
/apply-resource create
# Deploy both TPU types (sequentially — config.yaml is global)
python scripts/deploy.py v6e-1
# Creates SkyPilot cluster: sglang-jax-agent-tests-hongmao-v6e-1
python scripts/deploy.py v6e-4
# Creates SkyPilot cluster: sglang-jax-agent-tests-hongmao-v6e-4
# Run tests in parallel on both clusters
sky exec sglang-jax-agent-tests-hongmao-v6e-1 'python test/srt/run_suite.py --suite unit-test-tpu-v6e-1' &
sky exec sglang-jax-agent-tests-hongmao-v6e-4 'python test/srt/run_suite.py --suite e2e-test-tpu-v6e-4' &
wait
Note: deploy.py calls must be sequential because ~/.sky/config.yaml is a global file shared by all SkyPilot operations. However, once both clusters are launched, sky exec commands can run fully in parallel since pods already have the correct node affinity baked in.
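The same ordering constraint can be expressed in a small driver: deploy one TPU type at a time, then fan out the `sky exec` calls. This is a sketch with hypothetical function names; the runner is injectable so the flow can be exercised without SkyPilot installed.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Command = List[str]


def deploy_command(tpu_type: str) -> Command:
    return ["python", "scripts/deploy.py", tpu_type]


def exec_command(cluster: str, remote_cmd: str) -> Command:
    return ["sky", "exec", cluster, remote_cmd]


def run_command(cmd: Command) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(cmd, check=False).returncode


def deploy_then_exec(tpu_types: List[str], jobs: Dict[str, str],
                     run: Callable[[Command], int] = run_command) -> Dict[str, int]:
    # Deploys stay strictly sequential: each one rewrites ~/.sky/config.yaml.
    for tpu_type in tpu_types:
        if run(deploy_command(tpu_type)) != 0:
            raise RuntimeError(f"deploy failed for {tpu_type}")
    # Once every cluster is up, the sky exec calls can run concurrently.
    with ThreadPoolExecutor(max_workers=max(1, len(jobs))) as pool:
        futures = {cluster: pool.submit(run, exec_command(cluster, cmd))
                   for cluster, cmd in jobs.items()}
        return {cluster: future.result() for cluster, future in futures.items()}
```

Calling `deploy_then_exec(["v6e-1", "v6e-4"], {...})` with a mapping from SkyPilot cluster name to test command mirrors the shell example, with the returned exit codes keyed by cluster.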
The deploy script (scripts/deploy.py) automates:
1. gcloud container clusters get-credentials
2. gcloud beta container node-pools create with correct TPU params
3. config.yaml template -> replaces placeholders -> writes to ~/.sky/config.yaml
4. setup.yaml template -> replaces <NUM_NODES> -> writes to temp file
5. sky launch -c <cluster>-<user>-<tpu_type> -r <setup.yaml>

To tear down SkyPilot clusters:
sky down <CLUSTER_NAME>-<USERNAME>-v6e-1
sky down <CLUSTER_NAME>-<USERNAME>-v6e-4
To also remove the GKE cluster:
/apply-resource delete