skills/gke-multi-tenancy/SKILL.md
Guidance on implementing multi-tenancy and governance in Google Kubernetes Engine (GKE) clusters.
This skill provides guidance on implementing multi-tenancy and governance in Google Kubernetes Engine (GKE) clusters.
Multi-tenancy allows you to share a single GKE cluster among multiple teams or applications securely. Governance ensures that policies and resource limits are enforced.
Namespaces provide a scope for names and are the primary unit of isolation in Kubernetes.
Steps:
Example Namespace Manifest:
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: tenant-a
  labels:
    team: alpha
```
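As a sketch of what "a scope for names" means in practice, the two ConfigMaps below share the name `app-config` without conflict because they live in different namespaces. The second namespace, `tenant-b`, and the ConfigMap contents are hypothetical and serve only as illustration; both namespaces are assumed to already exist.

```yaml
# Hypothetical example: the same object name used in two namespaces.
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  namespace: tenant-a
data:
  LOG_LEVEL: "info"
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config     # same name as above, no conflict
  namespace: tenant-b  # hypothetical second tenant namespace
data:
  LOG_LEVEL: "debug"
```

Cluster-scoped objects, such as Namespaces themselves, do not get this separation, so their names must be unique across the cluster.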
Role-Based Access Control (RBAC) allows you to control who has access to what resources within a namespace.
Steps:
1. Create a Role with specific permissions.
2. Bind the Role to a user or group using a RoleBinding.

Example Role and RoleBinding Manifest:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: tenant-a
  name: pod-reader
rules:
- apiGroups: [""] # "" indicates the core API group
  resources: ["pods"]
  verbs: ["get", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: tenant-a
subjects:
- kind: User
  name: [email protected] # Name is case sensitive
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```
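The example above binds the Role to a single user. A minimal sketch of the group variant follows, assuming a group called `tenant-a-developers`; that group name and the binding name `read-pods-group` are hypothetical and not part of the original skill.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods-group        # hypothetical binding name
  namespace: tenant-a
subjects:
- kind: Group
  name: tenant-a-developers    # hypothetical group name
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader             # the Role defined in the example above
  apiGroup: rbac.authorization.k8s.io
```

How group membership is resolved depends on how the cluster authenticates users, so the exact form of the group name may differ in your environment.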
Resource quotas prevent a single tenant from consuming all resources in the cluster.
Example ResourceQuota Manifest:
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-a-quota
  namespace: tenant-a
spec:
  hard:
    requests.cpu: "2"
    requests.memory: 4Gi
    limits.cpu: "4"
    limits.memory: 8Gi
```
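To show how a workload counts against this quota, here is a sketch of a Pod that declares its own requests and limits. The Pod name, container name, and image are illustrative assumptions. Once a namespace has a quota on CPU and memory requests and limits, Pods that omit those values are typically rejected at creation, so declaring them explicitly matters.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: quota-demo      # hypothetical Pod name
  namespace: tenant-a
spec:
  containers:
  - name: app
    image: nginx        # illustrative image
    resources:
      requests:
        cpu: 500m       # counts toward requests.cpu
        memory: 1Gi     # counts toward requests.memory
      limits:
        cpu: "1"        # counts toward limits.cpu
        memory: 2Gi     # counts toward limits.memory
```

With these values, four such Pods consume the quota exactly (4 × 500m = 2 CPU and 4 × 1Gi = 4Gi of requests; 4 × 1 = 4 CPU and 4 × 2Gi = 8Gi of limits), and a fifth would be refused.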