skills/gke/SKILL.md
Use when working with GKE, kubectl, Kubernetes manifests, k8s directories, Helm charts, node pools, workload identity, cluster scaling, GPU nodes, database sidecars, or GKE troubleshooting.
npx skillsauth add cofin/flow gke
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
GKE is Google Cloud's managed Kubernetes service, handling cluster management, upgrades, scaling, GPU workloads, and production database connectivity via Auth Proxy sidecars.
resources:
  limits:
    nvidia.com/gpu: "1"  # GPU in limits ONLY; never in requests
Add a toleration for tainted GPU nodes:
tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
# 1. Annotate the KSA with the GCP SA email
kubectl annotate serviceaccount KSA_NAME \
  --namespace=NAMESPACE \
  iam.gke.io/gcp-service-account=GSA_NAME@PROJECT_ID.iam.gserviceaccount.com
# 2. Bind GCP SA to allow KSA impersonation
gcloud iam service-accounts add-iam-policy-binding \
  GSA_NAME@PROJECT_ID.iam.gserviceaccount.com \
  --role="roles/iam.workloadIdentityUser" \
  --member="serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME]"
- name: alloydb-auth-proxy
  image: gcr.io/alloydb-connectors/alloydb-auth-proxy:latest
  args:
    - "projects/PROJECT_ID/locations/REGION/clusters/CLUSTER/instances/INSTANCE"
    - "--port=5432"
  securityContext:
    allowPrivilegeEscalation: false
    runAsNonRoot: true
    runAsUser: 65532
    capabilities:
      drop: [ALL]
See alloydb-on-gke.md for the full production pattern.
# Cluster access
gcloud container clusters get-credentials CLUSTER --region=REGION
kubectl config use-context CONTEXT_NAME
# Core operations
kubectl get nodes
kubectl get pods -A
kubectl logs -f POD_NAME -n NAMESPACE
kubectl exec -it POD_NAME -n NAMESPACE -- /bin/sh
kubectl apply -f manifest.yaml
Deploy with kubectl apply or a Helm chart with per-component values (web, workers).
Debug with kubectl logs, kubectl describe, and kubectl top.
chart/
  Chart.yaml
  values.yaml
  templates/
    _helpers.tpl
    web-deployment.yaml
    web-service.yaml
    worker-deployment.yaml
    migration-job.yaml
Structure values.yaml with separate sections per component (web, workers), each specifying replicaCount, image, command, resources, and port.
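As a rough sketch of that layout, a values.yaml might look like the following. The image repository and tag reuse the ones from the web-app deployment example in this document; the commands, the worker port, the worker replica count, and the worker resource figures are illustrative assumptions, not values from a real chart.

```yaml
# values.yaml (sketch; concrete values are placeholders)
web:
  replicaCount: 3
  image:
    repository: us-central1-docker.pkg.dev/my-project/repo/web-app
    tag: v1.2.0
  command: ["/app/server"]   # hypothetical entrypoint
  port: 8080
  resources:
    requests:
      cpu: 250m
      memory: 256Mi
    limits:
      cpu: "1"
      memory: 1Gi

workers:
  replicaCount: 2            # assumed value
  image:
    repository: us-central1-docker.pkg.dev/my-project/repo/web-app
    tag: v1.2.0
  command: ["/app/worker"]   # hypothetical entrypoint
  port: 9090                 # hypothetical health/metrics port
  resources:
    requests:
      cpu: 500m
      memory: 512Mi
    limits:
      cpu: "1"
      memory: 1Gi
```

With this shape, a template such as web-deployment.yaml can read its settings from .Values.web and worker-deployment.yaml from .Values.workers, so each component is tuned independently.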
Connect to AlloyDB via the Auth Proxy sidecar and Workload Identity. The proxy runs as a sidecar and listens on localhost:5432, so the application connects to postgresql://user:password@localhost:5432/dbname.
Key roles for GSA: roles/alloydb.client, roles/secretmanager.secretAccessor, roles/storage.objectAdmin, roles/logging.logWriter.
See alloydb-on-gke.md for full deployment, HPA with queue-depth metrics, CronJob queue monitor, and Job patterns.
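To make the sidecar arrangement concrete, the pod template fragment below pairs an application container with the Auth Proxy sidecar shown earlier in this document. It is only a sketch: the Secret name app-db-credentials, its key, and the DATABASE_URL variable are hypothetical, and the application image is borrowed from the web-app example.

```yaml
# Pod template fragment (sketch): app container plus AlloyDB Auth Proxy sidecar
spec:
  serviceAccountName: KSA_NAME             # KSA annotated for Workload Identity
  containers:
    - name: web
      image: us-central1-docker.pkg.dev/my-project/repo/web-app:v1.2.0
      env:
        - name: DATABASE_URL               # hypothetical variable name
          valueFrom:
            secretKeyRef:
              name: app-db-credentials     # hypothetical Secret
              key: database-url            # holds postgresql://user:password@localhost:5432/dbname
    - name: alloydb-auth-proxy
      image: gcr.io/alloydb-connectors/alloydb-auth-proxy:latest
      args:
        - "projects/PROJECT_ID/locations/REGION/clusters/CLUSTER/instances/INSTANCE"
        - "--port=5432"
      securityContext:
        allowPrivilegeEscalation: false
        runAsNonRoot: true
        runAsUser: 65532
        capabilities:
          drop: [ALL]
```

Containers in the same pod share a network namespace, which is why the application can reach the proxy on localhost:5432 without a Service in between.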
Connect to Cloud SQL via the cloud-sql-proxy sidecar. Same Workload Identity pattern; GSA needs roles/cloudsql.client.
See cloudsql-on-gke.md for pod spec and connection string format.
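For orientation only, a Cloud SQL sidecar could follow the same shape as the AlloyDB one. This sketch assumes the v2 Cloud SQL Auth Proxy image and its positional instance connection name; PROXY_VERSION is a placeholder to be replaced with a pinned release, and cloudsql-on-gke.md remains the authoritative pod spec.

```yaml
# Sidecar container fragment (sketch; image path and arguments are assumptions
# based on Cloud SQL Auth Proxy v2, not taken from cloudsql-on-gke.md)
- name: cloud-sql-proxy
  image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:PROXY_VERSION
  args:
    - "--port=5432"
    - "PROJECT_ID:REGION:INSTANCE"   # instance connection name
  securityContext:
    allowPrivilegeEscalation: false
    runAsNonRoot: true
    runAsUser: 65532
    capabilities:
      drop: [ALL]
```

As with AlloyDB, the pod's KSA must be bound to a GSA that holds roles/cloudsql.client for the proxy to authenticate.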
| GPU Type | Machine Series | Notes |
|---|---|---|
| NVIDIA T4 | N1 | Cost-effective inference |
| NVIDIA L4 | G2 | Efficient inference/fine-tuning |
| NVIDIA A100 (40/80GB) | A2 | Large-scale training, MIG support |
| NVIDIA H100 (80GB) | A3 | Highest throughput, MIG support |
Autopilot GPU: automatic driver install, pay-per-pod billing, MIG enabled by default (v1.29.3+). Simpler operations.
Standard GPU: manual driver install via DaemonSet or GPU Operator (helm install gpu-operator nvidia/gpu-operator). Full node control.
# Minimal GPU pod spec
spec:
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
  containers:
    - name: trainer
      image: nvcr.io/nvidia/pytorch:24.01-py3
      resources:
        limits:
          nvidia.com/gpu: "1"  # GPU in limits only; limits == requests for GPU
See gpu.md for time-sharing, MIG, NAP, Spot GPU, and TPU patterns.
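When a cluster has more than one GPU type from the table above, a pod can be steered to a specific one with a node selector. The sketch below extends the minimal pod spec and assumes the GKE node label cloud.google.com/gke-accelerator with the value nvidia-l4; confirm the label values on your own nodes (for example with kubectl get nodes --show-labels) before relying on them.

```yaml
# GPU pod spec pinned to a GPU type (sketch; the label value is an assumption
# to verify against your node pools)
spec:
  nodeSelector:
    cloud.google.com/gke-accelerator: nvidia-l4
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
  containers:
    - name: trainer
      image: nvcr.io/nvidia/pytorch:24.01-py3
      resources:
        limits:
          nvidia.com/gpu: "1"   # GPU in limits only
```

The toleration only permits scheduling onto tainted GPU nodes; the node selector is what narrows the choice to one accelerator type.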
Choose Autopilot (Google-managed nodes, pay-per-pod) or Standard (full node control). Use regional clusters for production HA. Enable Workload Identity at cluster creation.
# Create GSA + grant permissions
gcloud iam service-accounts create GSA_NAME
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:GSA_NAME@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/storage.admin"
# Create KSA + bind to GSA
kubectl create serviceaccount KSA_NAME --namespace NAMESPACE
gcloud iam service-accounts add-iam-policy-binding \
  GSA_NAME@PROJECT_ID.iam.gserviceaccount.com \
  --role="roles/iam.workloadIdentityUser" \
  --member="serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME]"
# Annotate KSA
kubectl annotate serviceaccount KSA_NAME \
  --namespace=NAMESPACE \
  iam.gke.io/gcp-service-account=GSA_NAME@PROJECT_ID.iam.gserviceaccount.com
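If the KSA is managed alongside other manifests, the create and annotate steps can be expressed declaratively instead. This is a minimal sketch using the same placeholders as the commands above; the IAM binding on the GSA still has to be made with gcloud.

```yaml
# Declarative equivalent of "kubectl create serviceaccount" + "kubectl annotate" (sketch)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: KSA_NAME
  namespace: NAMESPACE
  annotations:
    iam.gke.io/gcp-service-account: GSA_NAME@PROJECT_ID.iam.gserviceaccount.com
```

Pods opt in by setting serviceAccountName to this KSA, as the web-app Deployment example does with web-app-ksa.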
Apply manifests or install Helm chart. Set resource requests/limits on every container. Add PodDisruptionBudgets for availability during upgrades.
Run kubectl get pods -n NAMESPACE to confirm healthy rollout. Check logs and events for errors.
- Specify GPUs in limits only; never set nvidia.com/gpu in requests, because limits implicitly equal requests for GPU resources.
- Taint GPU nodes with nvidia.com/gpu=present:NoSchedule to prevent non-GPU pods from landing on expensive GPU nodes.
- Set a restrictive security context: runAsNonRoot: true, runAsUser: 65532, runAsGroup: 65532, fsGroup: 65532, allowPrivilegeEscalation: false, capabilities.drop: [ALL].
Before delivering GKE configurations, verify:
- GPU resources are set in limits only (not requests)
- GPU nodes carry the nvidia.com/gpu=present:NoSchedule taint
- The security context sets runAsNonRoot: true, runAsUser: 65532, capabilities.drop: [ALL]
<example>
Task: Deploy a web application with a Service on GKE.
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      serviceAccountName: web-app-ksa  # Workload Identity KSA
      containers:
        - name: web
          image: us-central1-docker.pkg.dev/my-project/repo/web-app:v1.2.0
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 250m
              memory: 256Mi
            limits:
              cpu: "1"
              memory: 1Gi
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 20
---
# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: web-app
  namespace: production
spec:
  selector:
    app: web-app
  ports:
    - port: 80
      targetPort: 8080
  type: ClusterIP
---
# pdb.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb
  namespace: production
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web-app
</example>
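The security-context settings listed in the guardrails above are split between pod level and container level in a manifest. The fragment below sketches how they might be added to the web-app Deployment; it assumes the application image can run as UID 65532 and does not need to write to a root-owned filesystem.

```yaml
# Fragment of the web-app Deployment pod template (sketch)
spec:
  template:
    spec:
      securityContext:             # pod-level settings
        runAsNonRoot: true
        runAsUser: 65532
        runAsGroup: 65532
        fsGroup: 65532
      containers:
        - name: web
          securityContext:         # container-level settings
            allowPrivilegeEscalation: false
            capabilities:
              drop: [ALL]
```

fsGroup exists only at pod level, while allowPrivilegeEscalation and capabilities exist only at container level, so they need repeating for every container, including sidecars.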
No Gemini CLI extension exists for GKE; this skill provides unique value for GKE cluster management, GPU workloads, and production database connectivity patterns.
For detailed guides and configuration examples, refer to the documents in references/.