skills/gke-workload-security/SKILL.md
Workflows for auditing and hardening the security of GKE workloads.
npx skillsauth add googlecloudplatform/gke-mcp gke-workload-security
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
This skill provides workflows and best practices for securing GKE workloads. It covers security auditing, Identity and Access Management (Workload Identity), Network Security (Network Policies), and Node Security.
Assess the current security posture of your cluster using the provided audit script.
Capabilities:
Command:
./scripts/audit_cluster.sh <cluster-name> <region> <project-id>
Workload Identity allows Kubernetes Service Accounts (KSAs) to impersonate Google Service Accounts (GSAs). This is the recommended method for workloads to access Google Cloud APIs.
Steps:
Create Namespace and KSA:
kubectl create namespace workload-identity-test-ns
kubectl create serviceaccount <ksa-name> \
  --namespace workload-identity-test-ns
Bind KSA to GSA:
gcloud iam service-accounts add-iam-policy-binding <gsa-name>@<project-id>.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:<project-id>.svc.id.goog[workload-identity-test-ns/<ksa-name>]"
Annotate KSA:
kubectl annotate serviceaccount <ksa-name> \
  --namespace workload-identity-test-ns \
  iam.gke.io/gcp-service-account=<gsa-name>@<project-id>.iam.gserviceaccount.com
Verify Example Pod:
Use the existing asset assets/workload-identity-pod.yaml to test the configuration. Replace <ksa-name> in the file first.
kubectl apply -f ./assets/workload-identity-pod.yaml -n workload-identity-test-ns
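The member string and the annotation value in the steps above follow fixed formats, so a typo in either one can leave the binding ineffective. As a minimal sketch, the formats can be generated from variables and the result checked from inside a running Pod. The function names (wi_member, gsa_email, ksa_annotation, whoami_in_pod) are hypothetical helpers, not part of this skill, and the last one assumes the Pod image ships curl.

```shell
# Hypothetical helpers; the names are illustrative and not part of this skill.

# Build the IAM member string used in the add-iam-policy-binding step.
wi_member() {
  local project="$1" namespace="$2" ksa="$3"
  printf 'serviceAccount:%s.svc.id.goog[%s/%s]\n' "$project" "$namespace" "$ksa"
}

# Build the GSA email used in both the binding and the annotation.
gsa_email() {
  local gsa="$1" project="$2"
  printf '%s@%s.iam.gserviceaccount.com\n' "$gsa" "$project"
}

# Print the GSA currently annotated on a KSA (empty output means no annotation).
ksa_annotation() {
  local namespace="$1" ksa="$2"
  kubectl get serviceaccount "$ksa" --namespace "$namespace" \
    -o jsonpath='{.metadata.annotations.iam\.gke\.io/gcp-service-account}'
}

# Ask the metadata server which identity a Pod sees.
# Assumption: the container image includes curl.
whoami_in_pod() {
  local namespace="$1" pod="$2"
  kubectl exec --namespace "$namespace" "$pod" -- \
    curl -s -H 'Metadata-Flavor: Google' \
    'http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/email'
}
```

If the binding and annotation are in place, whoami_in_pod is expected to print the same address that gsa_email produces for the bound GSA.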
Control traffic flow between Pods using Network Policies. By default, all traffic is allowed.
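Enable Network Policy Enforcement: Enable the add-on first, then enable enforcement on the nodes. The second command recreates the cluster's node pools.
gcloud container clusters update <cluster-name> \
  --update-addons=NetworkPolicy=ENABLED \
  --region <region>
gcloud container clusters update <cluster-name> \
  --enable-network-policy \
  --region <region>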
[!NOTE] If your cluster uses Dataplane V2 (--enable-dataplane-v2), Network Policy enforcement is built in, so this step is not required (and may fail).
Apply Default Deny Policy: Isolate namespaces by denying all ingress and egress traffic by default.
Replace <target-namespace> with the namespace you want to isolate.
kubectl apply -f ./assets/default-deny-netpol.yaml -n <target-namespace>
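The contents of assets/default-deny-netpol.yaml are not shown here. As a rough sketch of the two steps above, the helpers below check whether the cluster already runs Dataplane V2 and print a conventional default-deny policy. The function names and the policy name default-deny-all are assumptions made for illustration; the real asset may differ.

```shell
# Hypothetical helpers; the names are illustrative and not part of this skill.

# Print the cluster's datapath provider.
# ADVANCED_DATAPATH indicates Dataplane V2.
datapath_provider() {
  local cluster="$1" region="$2"
  gcloud container clusters describe "$cluster" --region "$region" \
    --format='value(networkConfig.datapathProvider)'
}

# Succeed (exit 0) when the NetworkPolicy add-on step is still needed,
# that is, when the cluster is not on Dataplane V2.
needs_netpol_addon() {
  [ "$1" != "ADVANCED_DATAPATH" ]
}

# Print a policy that selects every Pod in the namespace and allows
# no ingress or egress. The policy name is an assumption.
default_deny_manifest() {
  cat <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
EOF
}
```

A possible usage is `default_deny_manifest | kubectl apply -n <target-namespace> -f -`. Because egress is denied too, DNS lookups from the namespace are blocked until a follow-up policy allows them.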
Ensure nodes are running with verifiable integrity.
Command:
gcloud container clusters update <cluster-name> \
  --enable-shielded-nodes \
  --region <region>
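After the update, it can be useful to confirm the setting took effect. The sketch below reads the shieldedNodes.enabled field from the cluster description and normalizes the result. Both function names are hypothetical and not part of this skill.

```shell
# Hypothetical helpers; the names are illustrative and not part of this skill.

# Print the raw value of the Shielded GKE Nodes setting for a cluster.
shielded_nodes_status() {
  local cluster="$1" region="$2"
  gcloud container clusters describe "$cluster" --region "$region" \
    --format='value(shieldedNodes.enabled)'
}

# Normalize a boolean-like value to "enabled" or "disabled".
# An empty value (field unset) is treated as disabled.
interpret_flag() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    true) echo "enabled" ;;
    *) echo "disabled" ;;
  esac
}
```

For example, `interpret_flag "$(shielded_nodes_status <cluster-name> <region>)"` prints a single word that a script can branch on.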
Run untrusted workloads in a sandbox for extra isolation.
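Enable GKE Sandbox: On Standard clusters, GKE Sandbox is enabled per node pool, so add a sandboxed node pool to the cluster.
gcloud container node-pools create <node-pool-name> \
  --cluster <cluster-name> \
  --image-type cos_containerd \
  --sandbox type=gvisor \
  --region <region>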
Run a Sandboxed Pod:
Add runtimeClassName: gvisor to your Pod spec.
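To make the Pod spec change concrete, here is a minimal sketch that prints a Pod manifest requesting the gVisor runtime. The function name, the default Pod name sandboxed-pod, and the nginx image are placeholders chosen for illustration, not values from this skill.

```shell
# Hypothetical helper; the name and defaults are illustrative only.

# Print a minimal Pod manifest that requests the gVisor runtime class.
# $1: Pod name (default: sandboxed-pod), $2: image (default: nginx)
sandboxed_pod_manifest() {
  local name="${1:-sandboxed-pod}" image="${2:-nginx}"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${name}
spec:
  runtimeClassName: gvisor
  containers:
    - name: app
      image: ${image}
EOF
}
```

One way to use it is `sandboxed_pod_manifest | kubectl apply -f -`, followed by `kubectl get pod sandboxed-pod -o jsonpath='{.spec.runtimeClassName}'` to confirm the runtime class was accepted.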
Enforce security policies on namespaces using labels.
Enforce Restricted Profile:
kubectl label --overwrite ns <namespace> \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/enforce-version=latest
[!NOTE] Using latest ensures you use the policies corresponding to the cluster's current version. You can pin it to a specific version (e.g., v1.30) to lock down the namespace to the policies of a specific release.
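Enforcing restricted on a namespace that already runs workloads can reject new Pods, so it may help to preview the effect first. The sketch below builds the label pair for any Pod Security Admission mode (enforce, audit, or warn) and runs a server-side dry run. The function names pss_labels and pss_preview are hypothetical, not part of this skill.

```shell
# Hypothetical helpers; the names are illustrative and not part of this skill.

# Print the label pair for one Pod Security Admission mode.
# $1: mode (enforce|audit|warn), $2: level, $3: version (default: latest)
pss_labels() {
  local mode="$1" level="$2" version="${3:-latest}"
  case "$mode" in
    enforce|audit|warn) ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
  case "$level" in
    privileged|baseline|restricted) ;;
    *) echo "unknown level: $level" >&2; return 1 ;;
  esac
  printf 'pod-security.kubernetes.io/%s=%s pod-security.kubernetes.io/%s-version=%s\n' \
    "$mode" "$level" "$mode" "$version"
}

# Preview the effect of enforcing a level without changing the namespace.
# The API server returns warnings for existing Pods that would violate it.
pss_preview() {
  local namespace="$1" level="$2"
  kubectl label --dry-run=server --overwrite ns "$namespace" \
    "pod-security.kubernetes.io/enforce=$level"
}
```

A gradual rollout could start with `kubectl label --overwrite ns <namespace> $(pss_labels warn restricted)` (left unquoted on purpose so the two labels split into separate arguments) and switch to enforce once the warnings stop.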
Mount secrets from Google Cloud Secret Manager directly as volumes in your pods.
Prerequisites: Secret Manager CSI driver must be enabled on the cluster.
Example SecretProviderClass:
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: my-secret-provider
spec:
  provider: gcp
  parameters:
    secrets: |
      - resourceName: "projects/<project-id>/secrets/my-secret/versions/latest"
        fileName: "my-secret-file"
Example Pod Spec excerpt:
spec:
  containers:
    - name: my-app
      volumeMounts:
        - name: secrets-store-inline
          mountPath: "/mnt/secrets"
          readOnly: true
  volumes:
    - name: secrets-store-inline
      csi:
        driver: secrets-store.csi.k8s.io
        readOnly: true
        volumeAttributes:
          secretProviderClass: "my-secret-provider"
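With the two manifests above, the secret should appear as a file under the mount path. The sketch below builds the resourceName string from variables and reads the mounted file from a running Pod. The function names are hypothetical; the /mnt/secrets path and my-secret-file default mirror the examples above, and the read helper assumes the container image includes cat.

```shell
# Hypothetical helpers; the names are illustrative and not part of this skill.

# Build the Secret Manager resource name used in the SecretProviderClass.
# $1: project ID, $2: secret name, $3: version (default: latest)
secret_resource() {
  local project="$1" secret="$2" version="${3:-latest}"
  printf 'projects/%s/secrets/%s/versions/%s\n' "$project" "$secret" "$version"
}

# Print the mounted secret from a running Pod.
# Assumptions: mountPath is /mnt/secrets and the image includes cat.
read_mounted_secret() {
  local namespace="$1" pod="$2" file="${3:-my-secret-file}"
  kubectl exec --namespace "$namespace" "$pod" -- cat "/mnt/secrets/$file"
}
```

Passing a numeric version to secret_resource instead of relying on latest is one way to keep a deployment tied to a known secret value.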
If using GKE Dataplane V2, you can log allowed and denied connections.
Steps:
Configure the NetworkLogging custom resource.
Example NetworkLogging Manifest:
apiVersion: networking.gke.io/v1alpha1
kind: NetworkLogging
metadata:
  name: default
spec:
  cluster:
    allow:
      log: true
      delegate: true
    deny:
      log: true
      delegate: true
This will log connection details to Cloud Logging.
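Once logging is on, the entries can be inspected from the command line. The sketch below shows the current NetworkLogging object and reads recent entries with gcloud logging read. The function names are hypothetical, and the filter assumes the entries are written to the policy-action log under the k8s_node resource type; adjust it if your project's logs are named differently.

```shell
# Hypothetical helpers; the names are illustrative and not part of this skill.

# Show the current NetworkLogging configuration.
show_network_logging() {
  kubectl get networklogging default -o yaml
}

# Build a Cloud Logging filter for network policy log entries.
# Assumption: entries land in the "policy-action" log on k8s_node resources.
policy_log_filter() {
  local project="$1"
  printf 'resource.type="k8s_node" AND logName="projects/%s/logs/policy-action"\n' "$project"
}

# Read the most recent matching entries as JSON.
# $1: project ID, $2: maximum number of entries (default: 10)
read_policy_logs() {
  local project="$1" limit="${2:-10}"
  gcloud logging read "$(policy_log_filter "$project")" \
    --project "$project" --limit "$limit" --format json
}
```

For example, `read_policy_logs <project-id> 20` would print up to 20 recent allowed or denied connection records, if any exist.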
Enforce the baseline or restricted Pod Security Standards on all non-system namespaces.