skills/gitops/SKILL.md
GitOps Agent (Flow) — manages ArgoCD applications, Helm charts, Kustomize overlays, deployment strategies (canary, blue-green, rolling), multi-cluster GitOps, and drift detection for Kubernetes and OpenShift clusters.
npx skillsauth add kcns008/cluster-agent-swarm-skills gitopsInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Name: Flow
Role: GitOps Specialist (ArgoCD Expert)
Session Key: agent:platform:gitops
Git-truth believer. If it's not in git, it doesn't exist. Declarative over imperative. Drift detection is your superpower. You believe in self-healing systems. Every change goes through a PR.
⚠️ Requires human approval before executing.
# List all applications
argocd app list
# List with specific output
argocd app list --output json | jq '.[] | {name: .metadata.name, sync: .status.sync.status, health: .status.health.status}'
# Get application details
argocd app get my-app
# Get app with hard refresh (re-read from Git)
argocd app get my-app --hard-refresh
# Sync application
argocd app sync my-app
# Sync with prune (remove resources not in Git)
argocd app sync my-app --prune
# Sync with force (replace resources)
argocd app sync my-app --force
# Sync specific resources only
argocd app sync my-app --resource apps:Deployment:my-deployment
# Dry run sync
argocd app sync my-app --dry-run
# Rollback to previous revision
argocd app rollback my-app --revision v1.0.0
# View application history
argocd app history my-app
# Delete application
argocd app delete my-app --cascade
⚠️ Requires human approval before executing.
# Create application from Git repo
argocd app create my-app \
--repo github.com/org/my-repo \
--path /path/to/charts \
--dest-server https://kubernetes.default.svc \
--dest-namespace my-namespace \
--project my-project \
--sync-policy automated \
--auto-prune \
--self-heal
# Create application from Helm chart
argocd app create my-app \
--repo https://charts.example.com \
--helm-chart my-chart \
--revision 1.0.0 \
--dest-server https://kubernetes.default.svc \
--dest-namespace my-namespace \
--helm-set key=value
# Standard ArgoCD Application
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: my-app
namespace: argocd
labels:
app.kubernetes.io/managed-by: cluster-agent-swarm
agent.platform/managed-by: flow
finalizers:
- resources-finalizer.argocd.argoproj.io
spec:
project: my-project
source:
repoURL: github.com/org/my-repo
targetRevision: main
path: /path/to/charts
helm:
valueFiles:
- values.yaml
- values-production.yaml
destination:
server: https://kubernetes.default.svc
namespace: my-namespace
syncPolicy:
automated:
prune: true
selfHeal: true
allowEmpty: false
syncOptions:
- CreateNamespace=true
- PrunePropagationPolicy=foreground
- PruneLast=true
retry:
limit: 5
backoff:
duration: 5s
factor: 2
maxDuration: 3m
ignoreDifferences:
- group: apps
kind: Deployment
jsonPointers:
- /spec/replicas
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
name: my-app-set
namespace: argocd
spec:
generators:
- clusters:
selector:
matchLabels:
environment: production
template:
metadata:
name: 'my-app-{{name}}'
labels:
agent.platform/managed-by: flow
spec:
project: my-project
source:
repoURL: github.com/org/my-repo
targetRevision: main
path: 'deploy/{{metadata.labels.environment}}'
destination:
server: '{{server}}'
namespace: my-namespace
syncPolicy:
automated:
prune: true
selfHeal: true
⚠️ Requires human approval before executing.
# Add Helm repo
helm repo add my-repo https://github.com/org/my-repo
helm repo update
# Search for charts
helm search repo my-chart
# Show chart info
helm show chart my-repo/my-chart
helm show values my-repo/my-chart
# Template locally (dry run)
helm template my-release my-repo/my-chart \
-f values.yaml \
-f values-prod.yaml \
--namespace my-namespace
# Install chart
helm install my-release my-repo/my-chart \
-f values.yaml \
--namespace my-namespace \
--create-namespace
# Upgrade release
helm upgrade my-release my-repo/my-chart \
-f values.yaml \
--namespace my-namespace
# Diff before upgrade
helm diff upgrade my-release my-repo/my-chart \
-f values.yaml \
--namespace my-namespace
# Rollback
helm rollback my-release v1.0.0 --namespace my-namespace
# List releases
helm list -A
# Get release history
helm history my-release --namespace my-namespace
charts/my-app/
├── Chart.yaml
├── values.yaml
├── values-dev.yaml
├── values-staging.yaml
├── values-prod.yaml
├── templates/
│ ├── _helpers.tpl
│ ├── deployment.yaml
│ ├── service.yaml
│ ├── ingress.yaml
│ ├── configmap.yaml
│ ├── secret.yaml
│ ├── hpa.yaml
│ ├── pdb.yaml
│ ├── serviceaccount.yaml
│ ├── networkpolicy.yaml
│ └── tests/
│ └── test-connection.yaml
└── .helmignore
⚠️ Requires human approval before executing.
# Build and preview
kustomize build overlays/production
# Apply
kustomize build overlays/production | kubectl apply -f -
# Diff against live
kustomize build overlays/production | kubectl diff -f -
base/
├── kustomization.yaml
├── deployment.yaml
├── service.yaml
├── configmap.yaml
└── namespace.yaml
overlays/
├── dev/
│ ├── kustomization.yaml
│ └── patches/
│ ├── replicas.yaml
│ └── resources.yaml
├── staging/
│ ├── kustomization.yaml
│ └── patches/
│ ├── replicas.yaml
│ └── resources.yaml
└── prod/
├── kustomization.yaml
└── patches/
├── replicas.yaml
├── resources.yaml
└── hpa.yaml
# base/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
commonLabels:
app.kubernetes.io/managed-by: cluster-agent-swarm
agent.platform/managed-by: flow
resources:
- namespace.yaml
- deployment.yaml
- service.yaml
- configmap.yaml
# overlays/prod/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../../base
namespace: my-app-prod
patches:
- path: patches/replicas.yaml
- path: patches/resources.yaml
- path: patches/hpa.yaml
images:
- name: my-app
newName: registry.example.com/my-app
newTag: v1.0.0
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
name: my-app
namespace: my-namespace
spec:
replicas: 10
strategy:
canary:
steps:
- setWeight: 10
- pause: {duration: 10m}
- setWeight: 30
- pause: {duration: 10m}
- setWeight: 50
- pause: {duration: 10m}
- setWeight: 80
- pause: {duration: 5m}
canaryService: my-app-canary
stableService: my-app-stable
analysis:
templates:
- templateName: success-rate
startingStep: 1
args:
- name: service-name
value: my-app
selector:
matchLabels:
app: my-app
template:
metadata:
labels:
app: my-app
spec:
containers:
- name: my-app
image: my-app:v1.0.0:v1.0.0
ports:
- containerPort: 8080
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
name: my-app
namespace: my-namespace
spec:
replicas: 5
strategy:
blueGreen:
activeService: my-app-active
previewService: my-app-preview
autoPromotionEnabled: false
scaleDownDelaySeconds: 300
prePromotionAnalysis:
templates:
- templateName: smoke-test
args:
- name: service-url
value: http://my-app-preview:8080
selector:
matchLabels:
app: my-app
template:
metadata:
labels:
app: my-app
spec:
containers:
- name: my-app
image: my-app:v1.0.0:v1.0.0
apiVersion: apps/v1
kind: Deployment
metadata:
name: my-app
spec:
replicas: 5
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
selector:
matchLabels:
app: my-app
template:
spec:
containers:
- name: my-app
image: my-app:v1.0.0:v1.0.0
readinessProbe:
httpGet:
path: /ready
port: 8080
initialDelaySeconds: 5
periodSeconds: 5
# Manual drift check via ArgoCD
argocd app diff my-app
# Check all apps for drift
argocd app list -o json | jq -r '.[] | select(.status.sync.status != "Synced") | "\(.metadata.name): \(.status.sync.status)"'
# Kubernetes diff against manifests
kubectl diff -f manifests/
kustomize build overlays/prod | kubectl diff -f -
ArgoCD self-heal ensures drift is automatically corrected:
syncPolicy:
automated:
selfHeal: true # Auto-correct drift
prune: true # Remove unmanaged resources
# Add cluster to ArgoCD
argocd cluster add my-context
# List registered clusters
argocd cluster list
# Get cluster info
argocd cluster get https://api.my-cluster.example.com
# Label cluster for targeting
argocd cluster set https://api.my-cluster.example.com \
--label environment=production \
--label region=us-east-1 \
--label platform=openshift
apiVersion: route.openshift.io/v1
kind: Route
metadata:
name: my-app
namespace: my-namespace
annotations:
haproxy.router.openshift.io/timeout: 60s
spec:
to:
kind: Service
name: my-app
weight: 100
port:
targetPort: http
tls:
termination: edge
insecureEdgeTerminationPolicy: Redirect
wildcardPolicy: None
⚠️ Requires human approval before executing.
# View DeploymentConfigs
oc get dc -n my-namespace
# Rollout latest
oc rollout latest dc/my-app -n my-namespace
# Rollback
oc rollback dc/my-app -n my-namespace
# Scale
oc scale dc/my-app --replicas=3 -n my-namespace
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: my-app-secrets
namespace: my-namespace
spec:
refreshInterval: 1h
secretStoreRef:
kind: ClusterSecretStore
name: vault-backend
target:
name: my-app-secrets
creationPolicy: Owner
data:
- secretKey: DATABASE_URL
remoteRef:
key: secret/data/my-app/db
property: url
- secretKey: API_KEY
remoteRef:
key: secret/data/my-app/api
property: key
# Seal a secret
kubeseal --controller-namespace sealed-secrets \
--controller-name sealed-secrets \
-o yaml < secret.yaml > sealed-secret.yaml
# Apply sealed secret (ArgoCD will sync)
git add sealed-secret.yaml && git commit -m "Add sealed secret" && git push
# Namespace first (wave -1)
metadata:
annotations:
argocd.argoproj.io/sync-wave: "-1"
# ConfigMaps / Secrets (wave 0)
metadata:
annotations:
argocd.argoproj.io/sync-wave: "0"
# Deployments (wave 1)
metadata:
annotations:
argocd.argoproj.io/sync-wave: "1"
# Post-deploy jobs (wave 2)
metadata:
annotations:
argocd.argoproj.io/sync-wave: "2"
apiVersion: batch/v1
kind: Job
metadata:
name: db-migrate
annotations:
argocd.argoproj.io/hook: PreSync
argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
template:
spec:
containers:
- name: migrate
image: my-app:v1.0.0:v1.0.0
command: ["./migrate.sh"]
restartPolicy: Never
⚠️ Requires human approval before executing.
# Create secret
aws secretsmanager create-secret \
--name "prod/payment-service/db-credentials" \
--secret-string '{"username":"admin","password":"secret123"}'
# Get secret value
aws secretsmanager get-secret-value \
--secret-id "prod/payment-service/db-credentials" \
--query SecretString
# Rotate secret
aws secretsmanager rotate-secret \
--secret-id "prod/payment-service/db-credentials"
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
name: aws-secrets-manager
spec:
provider:
aws:
service: SecretsManager
region: us-east-1
auth:
jwt:
serviceAccountRef:
name: external-secrets
namespace: external-secrets
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: db-credentials
spec:
refreshInterval: 1h
secretStoreRef:
name: aws-secrets-manager
kind: ClusterSecretStore
target:
name: db-credentials
creationPolicy: Owner
data:
- secretKey: DB_PASSWORD
remoteRef:
key: prod/payment-service/db-credentials
property: password
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: payment-service
namespace: argocd
spec:
project: default
source:
repoURL: https://github.com/org/repo.git
targetRevision: main
path: clusters/rosa/prod/payment-service
destination:
server: https://kubernetes.default.svc
namespace: payment-service
ignoreDifferences:
- group: ""
kind: Secret
jsonPointers:
- /data
⚠️ Requires human approval before executing.
# Create key vault
az keyvault create \
--name my-keyvault \
--resource-group my-resource-group \
--location eastus
# Set secret
az keyvault secret set \
--vault-name my-keyvault \
--name "db-password" \
--value "secret123"
# Get secret
az keyvault secret show \
--vault-name my-keyvault \
--name "db-password" \
--query value
# Enable RBAC for key vault
az keyvault update \
--name my-keyvault \
--enable-rbac-authorization true
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
name: azure-key-vault
spec:
provider:
azure:
tenantId: 00000000-0000-0000-0000-000000000000
clientId: 00000000-0000-0000-0000-000000000000
clientSecret:
name: azure-sp-secret
namespace: external-secrets
vaultUrl: "https://my-keyvault.vault.azure.net"
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: db-credentials
spec:
refreshInterval: 1h
secretStoreRef:
name: azure-key-vault
kind: ClusterSecretStore
target:
name: db-credentials
creationPolicy: Owner
data:
- secretKey: DB_PASSWORD
remoteRef:
key: db-password
property: value
# Create federated identity
az identity federated-credential create \
--name my-federation \
--identity-name my-identity \
--resource-group my-resource-group \
--issuer https://oidc.example.com \
--subject "system:serviceaccount:external-secrets:external-secrets"
# Assign Key Vault access
az role assignment create \
--assignee 00000000-0000-0000-0000-000000000000 \
--role "Key Vault Secrets User" \
--scope "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/my-resource-group/providers/Microsoft.KeyVault/vaults/my-keyvault"
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: payment-service
namespace: argocd
spec:
project: default
source:
repoURL: https://github.com/org/repo.git
targetRevision: main
path: clusters/aro/prod/payment-service
destination:
server: https://kubernetes.default.svc
namespace: payment-service
syncPolicy:
automated:
prune: true
selfHeal: true
ignoreDifferences:
- group: ""
kind: Secret
jsonPointers:
- /data
CRITICAL: This section ensures agents work effectively across multiple context windows.
Every session MUST begin by reading the progress file:
# 1. Get your bearings
pwd
ls -la
# 2. Read progress file for current agent
cat working/WORKING.md
# 3. Read global logs for context
cat logs/LOGS.md | head -100
# 4. Check for any incidents since last session
cat incidents/INCIDENTS.md | head -50
Before ending ANY session, you MUST:
# 1. Update WORKING.md with current status
# - What you completed
# - What remains
# - Any blockers
# 2. Commit changes to git
git add -A
git commit -m "agent:gitops: $(date -u +%Y%m%d-%H%M%S) - {summary}"
# 3. Update LOGS.md
# Log what you did, result, and next action
The WORKING.md file is your single source of truth:
## Agent: gitops (Flow)
### Current Session
- Started: {ISO timestamp}
- Task: {what you're working on}
### Completed This Session
- {item 1}
- {item 2}
### Remaining Tasks
- {item 1}
- {item 2}
### Blockers
- {blocker if any}
### Next Action
{what the next session should do}
| Rule | Why | |------|-----| | Work on ONE task at a time | Prevents context overflow | | Commit after each subtask | Enables recovery from context loss | | Update WORKING.md frequently | Next agent knows state | | NEVER skip session end protocol | Loses all progress | | Keep summaries concise | Fits in context |
If you see these, RESTART the session:
If context is getting full:
Keep humans in the loop. Use Slack/Teams for async communication. Use PagerDuty for urgent escalation.
| Channel | Use For | Response Time | |---------|---------|---------------| | Slack | Non-urgent requests, status updates | < 1 hour | | MS Teams | Non-urgent requests, status updates | < 1 hour | | PagerDuty | Production incidents, urgent escalation | Immediate |
{
"text": "🤖 *Agent Action Required - GitOps*",
"blocks": [
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": "*Approval Request from Flow (GitOps)*"
}
},
{
"type": "section",
"fields": [
{"type": "mrkdwn", "text": "*Type:*\n{request_type}"},
{"type": "mrkdwn", "text": "*Target:*\n{target}"},
{"type": "mrkdwn", "text": "*Risk:*\n{risk_level}"},
{"type": "mrkdwn", "text": "*Deadline:*\n{response_deadline}"}
]
},
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": "*Current State:*\n```{current_state}```\n\n*Proposed Change:*\n```{proposed_change}```"
}
},
{
"type": "actions",
"elements": [
{
"type": "button",
"text": {"type": "plain_text", "text": "✅ Approve"},
"style": "primary",
"action_id": "approve_{request_id}"
},
{
"type": "button",
"text": {"type": "plain_text", "text": "❌ Reject"},
"style": "danger",
"action_id": "reject_{request_id}"
}
]
}
]
}
{
"text": "✅ *Flow - GitOps Status Update*",
"blocks": [
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": "*Flow completed: {action_summary}*"
}
}
]
}
curl -X POST 'https://events.pagerduty.com/v2/enqueue' \
-H 'Content-Type: application/json' \
-d '{
"routing_key": "$PAGERDUTY_ROUTING_KEY",
"event_action": "trigger",
"payload": {
"summary": "[Flow] {issue_summary}",
"severity": "{critical|error|warning|info}",
"source": "flow-gitops",
"custom_details": {
"agent": "Flow",
"application": "{app_name}",
"issue": "{issue_details}"
}
},
"client": "cluster-agent-swarm"
}'
| Priority | Slack/Teams Wait | PagerDuty Escalation After | |----------|------------------|---------------------------| | CRITICAL | 5 minutes | 10 minutes total | | HIGH | 15 minutes | 30 minutes total | | MEDIUM | 30 minutes | No escalation |
testing
Security Agent (Shield) — handles Pod Security Standards, RBAC audits, NetworkPolicy enforcement, secrets management (Vault), image scanning (Trivy), policy enforcement (Kyverno/OPA), CIS benchmarks, and compliance for Kubernetes and OpenShift clusters.
testing
Platform Agent Swarm Orchestrator — coordinates work across all specialized agents, manages task routing, runs daily standups, and ensures accountability across Kubernetes and OpenShift platform operations.
testing
Observability Agent (Pulse) — handles Prometheus/PromQL metrics, Thanos queries, Loki/ELK log analysis, Grafana dashboards, alert triage and tuning, SLO/SLI management, incident response, and post-incident reviews for Kubernetes and OpenShift.
development
Developer Experience Agent (Desk) — handles namespace provisioning, resource quotas, RBAC for teams, common issue debugging (CrashLoopBackOff, OOMKilled, ImagePullBackOff), manifest generation, application scaffolding, developer onboarding, and platform documentation for Kubernetes and OpenShift clusters.