.claude/skills/deploying-cloud-k8s/SKILL.md
Deploys applications to cloud Kubernetes (AKS/GKE/DOKS) with CI/CD pipelines. Use when deploying to production, setting up GitHub Actions, or troubleshooting deployments. Covers build-time vs runtime variables, architecture matching, and battle-tested debugging.
Install this skill globally with one command:
npx skillsauth add Asmayaseen/hackathon-2 deploying-cloud-k8s
Works with Claude Code, Cursor, and Windsurf.
Next.js NEXT_PUBLIC_* variables are embedded at build time, not runtime:
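# WRONG: ENV set after the build step (or injected at container runtime) does nothing for NEXT_PUBLIC_*
ENV NEXT_PUBLIC_API_URL=https://api.example.com
# RIGHT: declare a build ARG before the build step so CI can override it with --build-arg
ARG NEXT_PUBLIC_API_URL=https://api.example.com
ENV NEXT_PUBLIC_API_URL=$NEXT_PUBLIC_API_URL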
Build-time variables (passed as build args):

| Variable | Purpose |
|----------|---------|
| NEXT_PUBLIC_SSO_URL | SSO endpoint for browser OAuth |
| NEXT_PUBLIC_API_URL | API endpoint for browser fetch |
| NEXT_PUBLIC_APP_URL | App URL for redirects |
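
Because every NEXT_PUBLIC_* value has to reach the image build, it can help to generate the flags rather than list them by hand. The following is a minimal sketch, not part of the original skill: `next_public_build_args` is a hypothetical helper name, and it assumes the values contain no spaces or newlines.

```shell
# Hypothetical helper: print one "--build-arg NAME=value" line for every
# NEXT_PUBLIC_* variable currently exported in the environment.
# Assumption: values contain no whitespace (URLs normally do not).
next_public_build_args() {
  env | grep '^NEXT_PUBLIC_' | sort | sed 's/^/--build-arg /'
}

# Possible usage (relies on word splitting, hence the no-whitespace assumption):
#   docker build $(next_public_build_args) -t web .
```

Each variable still needs a matching ARG line in the Dockerfile, otherwise the build ignores the flag.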

Runtime variables (injected via Secret or ConfigMap):

| Variable | Source |
|----------|--------|
| DATABASE_URL | Secret (Neon/managed DB) |
| SSO_URL | ConfigMap (internal K8s: http://sso:3001) |
| BETTER_AUTH_SECRET | Secret |
BEFORE ANY DEPLOYMENT, check architecture:
kubectl get nodes -o jsonpath='{.items[*].status.nodeInfo.architecture}'
# Output: arm64 arm64 OR amd64 amd64
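
The space-separated output of that command can be turned into a value for the `platforms:` input mechanically. This is a sketch under assumptions: `arch_to_platforms` is a hypothetical helper, and it assumes every node runs Linux, so each architecture maps to `linux/<arch>`.

```shell
# Hypothetical helper: convert "arm64 arm64" style output into a
# comma-separated buildx platform list such as "linux/arm64".
arch_to_platforms() {
  # $1 is intentionally unquoted so the shell splits it on whitespace.
  # sort -u removes duplicates; paste joins the lines with commas.
  printf '%s\n' $1 | sort -u | sed 's|^|linux/|' | paste -sd, -
}

# Possible usage against a live cluster (requires kubectl access):
#   arch_to_platforms "$(kubectl get nodes -o jsonpath='{.items[*].status.nodeInfo.architecture}')"
```

A mixed cluster yields more than one platform, which is a signal that a multi-arch image is needed.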
- uses: docker/build-push-action@v5
  with:
    platforms: linux/arm64   # MATCH YOUR CLUSTER!
    provenance: false        # Avoid manifest issues
    no-cache: true           # When debugging
Why provenance: false? Buildx attestation creates complex manifest lists that cause "no match for platform" errors.
jobs:
  changes:
    runs-on: ubuntu-latest
    outputs:
      api: ${{ steps.filter.outputs.api }}
      web: ${{ steps.filter.outputs.web }}
    steps:
      - uses: dorny/paths-filter@v3
        id: filter
        with:
          filters: |
            api:
              - 'apps/api/**'
            web:
              - 'apps/web/**'
  build-api:
    needs: changes
    if: needs.changes.outputs.api == 'true'
- name: Build and push (web)
  uses: docker/build-push-action@v5
  with:
    build-args: |
      NEXT_PUBLIC_SSO_URL=https://sso.${{ vars.DOMAIN }}
      NEXT_PUBLIC_API_URL=https://api.${{ vars.DOMAIN }}
- name: Deploy
  run: |
    helm upgrade --install myapp ./helm/myapp \
      --set global.imageTag=${{ github.sha }} \
      --set "secrets.databaseUrl=${{ secrets.DATABASE_URL }}" \
      --set "secrets.authSecret=${{ secrets.BETTER_AUTH_SECRET }}"
Pod not running?
│
├─► ImagePullBackOff
│ ├─► "not found" ──► Wrong tag or registry
│ ├─► "unauthorized" ──► Auth/imagePullSecrets
│ └─► "no match for platform" ──► Architecture mismatch
│
├─► CrashLoopBackOff
│ ├─► "exec format error" ──► Wrong CPU architecture
│ ├─► Exit code 1 ──► App startup failure
│ └─► OOMKilled ──► Memory limits too low
│
└─► Pending
├─► Insufficient resources ──► Scale cluster
└─► No matching node ──► Check nodeSelector
kubectl get pods -n <namespace>
kubectl describe pod <pod-name> -n <namespace> | grep -E "(Image:|Failed|Error)"
kubectl get events -n <namespace> --sort-by='.lastTimestamp' | tail -20
kubectl logs <pod-name> -n <namespace> --tail=50
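
The decision tree above can be approximated as a lookup on the error text these commands print. This is only a sketch: `diagnose` is a hypothetical helper, and the patterns are limited to the messages named in the tree.

```shell
# Hypothetical helper: map a known error message to the likely cause
# listed in the decision tree. Anything else falls through to "unknown".
diagnose() {
  case "$1" in
    *"no match for platform"*|*"exec format error"*)
      echo "architecture mismatch" ;;
    *"not found"*)
      echo "wrong tag or registry" ;;
    *unauthorized*)
      echo "auth/imagePullSecrets" ;;
    *OOMKilled*)
      echo "memory limits too low" ;;
    *)
      echo "unknown" ;;
  esac
}

# Possible usage (requires cluster access):
#   diagnose "$(kubectl describe pod <pod-name> -n <namespace>)"
```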
Cause: Wrong image tag or registry (the "not found" case in the tree above)
Fix: Verify that the image was pushed with the exact tag used in the deployment
Cause: Image built for wrong architecture OR buildx provenance issue
Fix:
platforms: linux/arm64 # Match cluster!
provenance: false # Simple manifest
no-cache: true # Force rebuild
Cause: Binary architecture doesn't match node
Fix: Rebuild for the correct platform and set no-cache: true so stale layers are not reused
failed parsing --set data: key "com" has no value
Cause: Helm's --set parser treats commas as separators between key=value pairs, so everything after the comma is read as a new key
Fix: Use heredoc values file:
- name: Deploy
  run: |
    cat > /tmp/overrides.yaml << EOF
    sso:
      env:
        ALLOWED_ORIGINS: "https://a.com,https://b.com"
    EOF
    helm upgrade --install app ./chart --values /tmp/overrides.yaml
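
When a values file is not practical, Helm also accepts a backslash-escaped comma inside a --set value. The sketch below shows that alternative; `escape_helm_commas` is a hypothetical helper, and the values file above remains the approach this skill recommends.

```shell
# Hypothetical helper: escape every comma with a backslash so Helm's
# --set parser keeps the value as one string.
escape_helm_commas() {
  printf '%s' "$1" | sed 's/,/\\,/g'
}

# Possible usage (the key path mirrors the overrides file above):
#   helm upgrade --install app ./chart \
#     --set "sso.env.ALLOWED_ORIGINS=$(escape_helm_commas 'https://a.com,https://b.com')"
```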
Cause: Password with special characters (base64 +/=)
Fix: Use hex passwords:
# Wrong
openssl rand -base64 16 # Output can contain +, / and =
# Right
openssl rand -hex 16 # Hex digits only (0-9, a-f)
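
A generated password can be checked before it is stored. The sketch below assumes the password ends up inside a URL such as DATABASE_URL, which the text above does not state explicitly; `is_url_safe` is a hypothetical helper that accepts only characters that never need percent-encoding.

```shell
# Hypothetical helper: succeed only if the string consists of URL
# "unreserved" characters (letters, digits, '.', '_', '~', '-').
is_url_safe() {
  case "$1" in
    *[!A-Za-z0-9._~-]*) return 1 ;;  # found a character that needs encoding
    *) return 0 ;;
  esac
}

# Possible usage:
#   pw="$(openssl rand -hex 16)"
#   is_url_safe "$pw" && echo "safe to embed in a connection string"
```

Hex output always passes this check, while base64 output fails whenever it contains +, / or =.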
Cause: request.url returns container bind address
Fix:
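import { NextResponse } from "next/server";
// Build redirects from the public app URL, not request.url (which reflects the container bind address).
const APP_URL = process.env.NEXT_PUBLIC_APP_URL || "http://localhost:3000";
const response = NextResponse.redirect(new URL("/", APP_URL));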
- provenance: false set
- platforms: linux/<arch> matches cluster
- NEXT_PUBLIC_* as build args
- Secrets passed via --set (not in values.yaml)
- No commas in --set values

# 1. Frontend logs
kubectl logs deploy/web -n myapp --tail=50
# 2. API logs
kubectl logs deploy/api -n myapp --tail=100 | grep -i error
# 3. Sidecar logs (Dapr, etc.)
kubectl logs deploy/api -n myapp -c daprd --tail=50
| Error | Likely Cause |
|-------|--------------|
| AttributeError: no attribute 'X' | Model/schema mismatch |
| 404 Not Found on internal call | Wrong endpoint URL |
| Times off by hours | Timezone handling bug |
| greenlet_spawn not called | Async SQLAlchemy pattern |
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
spec:
  source:
    repoURL: https://github.com/org/repo.git
    path: k8s/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp
  syncPolicy:
    automated:
      prune: true      # Delete resources not in Git
      selfHeal: true   # Fix drift automatically
# ServiceMonitor for Prometheus
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp
spec:
  selector:
    matchLabels:
      app: myapp
  endpoints:
    - port: metrics
      interval: 30s
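# Container securityContext (allowPrivilegeEscalation, readOnlyRootFilesystem, and capabilities are container-level fields)
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop: ["ALL"]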
# HPA + PDB
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
spec:
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
---
apiVersion: policy/v1
kind: PodDisruptionBudget
spec:
  minAvailable: 1
See references/production-patterns.md for full GitOps, observability, security, and resilience patterns.
Run: python scripts/verify.py
- containerizing-applications - Docker and Helm charts
- operating-k8s-local - Local Kubernetes with Minikube
- building-nextjs-apps - Next.js patterns