primatrix/gke-tpu

Name: gke-tpu
Author: primatrix

plugins/gke-tpu/skills/gke-tpu/SKILL.md

npx skillsauth add primatrix/skills gke-tpu

GKE TPU Skill

Manage GKE-based TPU workloads via kubectl. Config-driven via gke.toml in the current working directory (CWD).

Commands

| Command | Description | Reference |
|---|---|---|
| create | Create TPU pod (single-host) or job (multi-host) | references/create.md |
| sync | Sync code + install deps to all containers | references/sync.md |
| run | Execute script on multi-process TPU | references/run.md |
| status | Check pod/workload status | references/status.md |

Read the relevant reference file for the user's command before executing.

Configuration

Read gke.toml from the current working directory at the start of every command. This keeps configs isolated per worktree/session. Never hardcode project/cluster/zone/bucket. If gke.toml does not exist in CWD, prompt the user to create one.

[gke]
project = "<your-gcp-project>"
cluster = "<your-cluster-name>"
zone = "<your-zone>"

[tpu]
accelerator = "tpu-v6e-slice"   # nodeSelector accelerator label
topology = "4x4"                # TPU topology (determines chip count)
chips_per_node = 4              # google.com/tpu resource per container
machine_type = "ct6e-standard-4t"  # GKE machine type
max_nodes = 4                   # autoscaling max for node pool
reservation = ""                # optional: reservation name for reserved capacity

[workload]
name = "my-workload"
docker_image = "us-docker.pkg.dev/cloud-tpu-images/jax-ai-image/tpu:jax0.8.1-rev1"
service_account = "gcs-account"

[storage]
type = "gcsfuse"                # "gcsfuse" or "pvc"
mount_path = "/inference-models"

# --- gcsfuse-specific (only when type = "gcsfuse") ---
bucket = "inference-model-storage-poc-tpu"
mount_options = "implicit-dirs,file-cache:max-parallel-downloads:256,file-cache:enable-parallel-downloads:true,file-cache:download-chunk-size-mb:128,file-cache:max-size-mb:81920,file-cache:parallel-downloads-per-file:512,metadata-cache:ttl-secs:-1,metadata-cache:stat-cache-max-size-mb:-1,metadata-cache:type-cache-max-size-mb:-1,file-cache:cache-file-for-range-read:true,file-system:kernel-list-cache-ttl-secs:-1,read_ahead_kb=1024"

# --- pvc-specific (only when type = "pvc") ---
# pvc_name = "my-model-pvc"       # name of existing PersistentVolumeClaim
# read_only = false                # mount as read-only (default: false)
# gcsfuse_backed = false           # true if PVC's StorageClass uses GCS Fuse CSI driver
                                   # when true: adds gke-gcsfuse/volumes annotation + gke-gcsfuse-cache volume
                                   # when false: plain PVC mount, no sidecar needed

[repo]
git_url = "https://github.com/sgl-project/sglang-jax.git"
remote_path = "/tmp/sglang-jax"
install_cmd = "pip install -e ."            # run in repo root
# requirements_file = "requirements-tpu.txt"  # optional: extra deps file (relative to repo root)

TPU Topology Reference

See references/tpu-topologies.md for supported topologies (v6e and v7x), machine types, and chips-per-node mappings.

Single-host (1 VM): use Pod. Multi-host (>1 VM): use Indexed Job + headless Service.
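
The Pod-versus-Job decision can be sketched in a few lines of Python. This is a minimal illustration, assuming the chip count is the product of the topology dimensions (so "4x4" means 16 chips); the helper names `chip_count`, `host_count`, and `workload_kind` are hypothetical and do not come from the skill.

```python
import math


def chip_count(topology: str) -> int:
    """Multiply the topology dimensions, e.g. "4x4" -> 16, "2x2x4" -> 16."""
    return math.prod(int(dim) for dim in topology.lower().split("x"))


def host_count(topology: str, chips_per_node: int) -> int:
    """hosts = chips / chips_per_node, rejecting uneven splits."""
    chips = chip_count(topology)
    if chips_per_node <= 0 or chips % chips_per_node != 0:
        raise ValueError(
            f"{chips} chips cannot be split evenly across nodes of {chips_per_node}"
        )
    return chips // chips_per_node


def workload_kind(topology: str, chips_per_node: int) -> str:
    """Single host -> Pod; more than one host -> Indexed Job + headless Service."""
    if host_count(topology, chips_per_node) == 1:
        return "Pod"
    return "Indexed Job + headless Service"
```

With the sample config (topology = "4x4", chips_per_node = 4), this gives 4 hosts, so the workload is created as an Indexed Job with a headless Service.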

Critical Rules

1. Single vs multi-host: Determine from topology. hosts = chips / chips_per_node, where chips is the product of the topology dimensions (for example, 4x4 = 16 chips; with chips_per_node = 4 that is 4 hosts). If hosts > 1, you must use a Job + headless Service.
2. Storage: Check storage.type:
   - gcsfuse: mount with the gke-gcsfuse/volumes: "true" annotation and a gke-gcsfuse-cache emptyDir volume.
   - pvc: mount the existing PVC directly. The PVC must already exist in the namespace.
     - If storage.gcsfuse_backed = true: the PVC's StorageClass uses the GCS Fuse CSI driver under the hood, so the pod still needs the gke-gcsfuse/volumes: "true" annotation and the gke-gcsfuse-cache emptyDir volume. Otherwise the mount fails with "failed to find the sidecar container".
     - If storage.gcsfuse_backed = false (default): plain PVC mount; no gcsfuse annotation or cache volume is needed.
3. Simultaneous launch: For multi-host, jax.distributed.initialize() must run in all pods at the same time.
4. Same code path: ALL processes must execute the SAME jitted computations.
5. Docker image: The image must match the JAX version in pyproject.toml.
6. Reservations: If tpu.reservation is set, use --reservation-affinity=specific with a fixed node count (no autoscaling).
7. Multi-host verification: import jax blocks on multi-host TPU, so do not use it as a per-pod check. Check /dev/vfio/ for a per-pod hardware check, and use the run command for full JAX cluster verification.
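
The storage rule above reduces to one question: does the pod need the GCS Fuse sidecar? The Python sketch below shows one way to express that, under the assumption that the [storage] table has already been parsed into a dict. The function names and the returned dict shape are hypothetical; only the annotation key and the cache volume name come from the rules above. The data volume itself (bucket or PVC mount) is deliberately left out.

```python
def needs_gcsfuse_sidecar(storage: dict) -> bool:
    """True when the pod needs the gcsfuse annotation and cache volume."""
    storage_type = storage.get("type")
    if storage_type == "gcsfuse":
        return True
    if storage_type == "pvc":
        # gcsfuse_backed defaults to false: a plain PVC needs no sidecar.
        return bool(storage.get("gcsfuse_backed", False))
    raise ValueError(f"unsupported storage.type: {storage_type!r}")


def storage_pod_fragments(storage: dict) -> dict:
    """Build the sidecar-related annotation and volume entries for a pod spec."""
    annotations = {}
    volumes = []
    if needs_gcsfuse_sidecar(storage):
        annotations["gke-gcsfuse/volumes"] = "true"
        volumes.append({"name": "gke-gcsfuse-cache", "emptyDir": {}})
    return {"annotations": annotations, "volumes": volumes}
```

For example, `storage_pod_fragments({"type": "pvc", "gcsfuse_backed": True})` yields the same annotation and cache volume as the gcsfuse case, which is the situation that otherwise fails with "failed to find the sidecar container".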

Prerequisites

See references/prerequisites.md for gcloud/kubectl install steps.

Troubleshooting

See references/troubleshooting.md for common issues.

Manage GKE-based TPU workloads — create pods/jobs via kubectl, sync code, and run multi-process benchmarks. Use when the user wants to create/manage/run TPU workloads on GKE. Reads config from gke.toml in the current working directory.

development

Updated Apr 16, 2026

npx skillsauth add primatrix/skills gke-tpu

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Percentage |
|---|---|---|---|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Apr 16, 2026, 4:50 AM (4.4s, 8 files scanned)

Related Skills

primatrix/memory-profile

development

Verified · Trusted · Community

Use when analyzing TPU pretraining HBM occupancy from a profile directory: locates the static HBM peak (the same number TensorBoard's Memory Viewer shows), enumerates every buffer alive at the peak schedule moment with size / HLO instruction / opcode / op_name, and rolls the alive set up by opcode and op_name. Reads compile-time `*.hlo_proto.pb` (BufferAssignmentProto) as the primary source; runtime `*.xplane.pb` allocator events are a secondary, often-truncated signal.

SKILL.md · Updated May 27, 2026

primatrix/compute-breakdown

testing

Verified · Trusted · Community

Use when analyzing TPU pretraining compute efficiency from xplane.pb: produces source-line-aggregated HLO duration tables, layer-scoped breakdowns, non-compute (padding/cast/copy) audits, and v7x roofline shortfall vs theoretical peak. Reads schema documented by profile-anatomy.

SKILL.md · Updated May 25, 2026

primatrix/plugins/tpu-perf/skills/comm-analysis

tools

Verified · Trusted · Community

Use when analyzing communication on a TPU pretraining profile: extracts every comm primitive (async + sync, TC + SparseCore), attributes axes via HLO replica_groups, computes per-row NCCL bus BW vs per-axis peak ICI BW (peak_link × k_torus_dims × directions_per_dim; TPUv7x: 200 GB/s bidir per link on a 3D torus; util% requires `--mesh-spec` with topology), and reports per-step compute/comm overlap. Builds on profile-anatomy.

SKILL.md · Updated May 25, 2026

primatrix/profile-anatomy

documentation

Verified · Trusted · Community

Use when reading TPU pretraining profiles (xplane.pb, trace.json.gz): describes the on-disk layout, the XSpace/XPlane/XLine/XEvent/XStat hierarchy, and provides reference scripts that future tpu-perf skills can read as schema documentation.

SKILL.md · Updated May 24, 2026

Install manually

# Clone the repo
git clone https://github.com/primatrix/skills.git

# Copy into Claude Code skills folder (global)
cp -r skills/plugins/gke-tpu/skills/gke-tpu ~/.claude/skills/

Repository: primatrix/skills
Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT