skills/gitops-principles/SKILL.md
Comprehensive GitOps methodology and principles skill for cloud-native operations. Use when (1) Designing GitOps architecture for Kubernetes deployments, (2) Implementing declarative infrastructure with Git as single source of truth, (3) Setting up continuous deployment pipelines with ArgoCD/Flux/Kargo, (4) Establishing branching strategies and repository structures, (5) Troubleshooting drift, sync failures, or reconciliation issues, (6) Evaluating GitOps tooling decisions, (7) Teaching or explaining GitOps concepts and best practices, (8) Deploying ArgoCD on Azure Arc-enabled Kubernetes or AKS with workload identity. Covers the 4 pillars of GitOps (OpenGitOps), patterns, anti-patterns, tooling ecosystem, Azure Arc integration, and operational guidance.
npx skillsauth add julianobarbosa/claude-code-skills gitops-principles
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Complete guide for implementing GitOps methodology in Kubernetes environments - the operational framework where Git is the single source of truth for declarative infrastructure and applications.
GitOps is a set of practices that uses Git repositories as the source of truth for defining the desired state of infrastructure and applications. An automated process ensures the production environment matches the state described in the repository.
GitOps is defined by four core principles established by the OpenGitOps project (part of CNCF):
| Principle | Description |
|-----------|-------------|
| 1. Declarative | The entire system must be described declaratively |
| 2. Versioned and Immutable | Desired state is stored in a way that enforces immutability, versioning, and retention |
| 3. Pulled Automatically | Software agents automatically pull desired state from the source |
| 4. Continuously Reconciled | Agents continuously observe and attempt to apply desired state |
┌─────────────────────────────────────────────────────────────────┐
│                         GIT REPOSITORY                          │
│           (Single Source of Truth for Desired State)            │
├─────────────────────────────────────────────────────────────────┤
│  manifests/                                                     │
│  ├── base/                  # Base configurations               │
│  │   ├── deployment.yaml                                        │
│  │   ├── service.yaml                                           │
│  │   └── kustomization.yaml                                     │
│  └── overlays/              # Environment-specific              │
│      ├── dev/                                                   │
│      ├── staging/                                               │
│      └── production/                                            │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼ Pull (not Push)
┌─────────────────────────────────────────────────────────────────┐
│                        GITOPS CONTROLLER                        │
│                     (ArgoCD / Flux / Kargo)                     │
│  - Continuously watches Git repository                          │
│  - Compares desired state vs actual state                       │
│  - Reconciles differences automatically                         │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼ Apply
┌─────────────────────────────────────────────────────────────────┐
│                       KUBERNETES CLUSTER                        │
│              (Actual State / Runtime Environment)               │
└─────────────────────────────────────────────────────────────────┘
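The compare-and-reconcile step in the diagram can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how ArgoCD or Flux are implemented: the `reconcile` function and the dict-based state model (resource name mapped to its spec) are hypothetical and exist only for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical model: a state maps a resource name to its spec.
State = Dict[str, dict]


def reconcile(desired: State, actual: State) -> List[Tuple[str, str]]:
    """Return the actions needed to make `actual` match `desired`."""
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name))   # in Git, missing from cluster
        elif actual[name] != spec:
            actions.append(("update", name))   # drifted from Git
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))   # in cluster, no longer in Git
    return actions


desired = {"deployment/web": {"replicas": 3}, "service/web": {"port": 80}}
actual = {"deployment/web": {"replicas": 5}, "configmap/old": {}}
print(reconcile(desired, actual))
```

A controller would run something like this on every polling interval or webhook event; when the two states already match, the action list is empty and nothing is applied.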
| Push Model (Traditional CI/CD) | Pull Model (GitOps) |
|--------------------------------|---------------------|
| CI system pushes changes to cluster | Agent pulls changes from Git |
| Requires cluster credentials in CI | Credentials stay within cluster |
| Point-in-time deployment | Continuous reconciliation |
| Drift goes undetected | Drift automatically corrected |
| Manual rollback process | Rollback = git revert |
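The "Rollback = git revert" row is worth a small illustration: a revert adds a new commit that restores the earlier desired state, and the history itself is never rewritten. The sketch below models this with a hypothetical `Commit` type that carries a single image tag as the desired state. It is a simplification (a real revert applies the inverse of a commit's diff), not a description of Git internals.

```python
from typing import List, NamedTuple


class Commit(NamedTuple):
    message: str
    image_tag: str  # simplified stand-in for the desired state in this commit


def revert_head(history: List[Commit]) -> List[Commit]:
    """Append a commit that restores the state before HEAD; never rewrite history."""
    if len(history) < 2:
        raise ValueError("need at least two commits to revert HEAD")
    head, parent = history[-1], history[-2]
    return history + [Commit(f'Revert "{head.message}"', parent.image_tag)]


history = [Commit("deploy v1.2.2", "v1.2.2"), Commit("deploy v1.2.3", "v1.2.3")]
rolled_back = revert_head(history)
print(rolled_back[-1])
```

Because the rollback is itself a commit, the controller picks it up like any other change and the audit trail shows both the bad deploy and its reversal.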
Monorepo (Single repository for all environments):
gitops-repo/
├── apps/
│   ├── app-a/
│   │   ├── base/
│   │   └── overlays/
│   │       ├── dev/
│   │       ├── staging/
│   │       └── prod/
│   └── app-b/
└── infrastructure/
    ├── monitoring/
    └── networking/
Polyrepo (Separate repositories):
# Repository per concern
app-a-config/ # App A manifests
app-b-config/ # App B manifests
infrastructure/ # Shared infrastructure
cluster-bootstrap/ # Cluster setup
Separates infrastructure from values for security boundaries:
infra-team/ # Base configurations, ApplicationSets
├── applications/ # ArgoCD Application definitions
└── helm-base-values/ # Default Helm values
argo-cd-helm-values/ # Environment-specific overrides
├── dev/ # Development values
├── stg/ # Staging values
└── prd/ # Production values
Benefits:
main ────────────────────────────────────► Production
  │
  └──► staging ──────────────────────────► Staging cluster
         │
         └──► develop ───────────────────► Development cluster

main ────────────────────────────────────► All environments
  │
  ├── overlays/dev/      → Dev cluster
  ├── overlays/staging/  → Staging cluster
  └── overlays/prod/     → Prod cluster

main
  │
  ├── release/v1.0 ──────► Production (v1.0)
  ├── release/v1.1 ──────► Production (v1.1)
  └── release/v2.0 ──────► Production (v2.0)
syncPolicy:
  automated:
    prune: true      # Delete resources not in Git
    selfHeal: true   # Revert manual changes

syncPolicy:
  automated: null    # Require explicit sync
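How the `automated`, `prune`, and `selfHeal` settings gate what the controller acts on can be sketched as a small decision function. This is a simplified model for illustration: the drift labels (`git-change`, `orphaned`, `manual-edit`) and the `plan_sync` helper are hypothetical, and real ArgoCD behaviour has more nuance than this.

```python
from typing import List, Optional


def plan_sync(drift: List[str], automated: Optional[dict]) -> List[str]:
    """Return the drift kinds an automated sync would act on.

    Hypothetical drift labels:
      "git-change"  - a new commit changed the desired state
      "orphaned"    - a live resource is no longer defined in Git
      "manual-edit" - someone changed a live resource directly
    """
    if automated is None:
        return []  # manual mode: nothing happens until a sync is triggered
    acted_on = []
    for kind in drift:
        if kind == "git-change":
            acted_on.append(kind)
        elif kind == "orphaned" and automated.get("prune", False):
            acted_on.append(kind)
        elif kind == "manual-edit" and automated.get("selfHeal", False):
            acted_on.append(kind)
    return acted_on


all_drift = ["git-change", "orphaned", "manual-edit"]
print(plan_sync(all_drift, {"prune": True, "selfHeal": True}))
print(plan_sync(all_drift, {}))
print(plan_sync(all_drift, None))
```

The middle call shows why both flags are usually set together for full automation: with an empty `automated` block, only Git changes are applied, while orphaned resources and manual edits are left alone.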
| Option | Use Case |
|--------|----------|
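| CreateNamespace=true | Auto-create the destination namespace if it is missing |
| PruneLast=true | Prune as the final step, after the other resources are synced and healthy |
| ServerSideApply=true | Use server-side apply, e.g. for large CRDs that exceed the client-side annotation size limit |
| ApplyOutOfSyncOnly=true | Sync only out-of-sync resources (performance optimization) |
| Replace=true | Force resource replacement instead of apply |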
# base/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
  - service.yaml

# overlays/prod/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
# patchesStrategicMerge is deprecated in Kustomize v5; patches replaces it
patches:
  - path: replica-patch.yaml
images:
  - name: myapp
    newTag: v1.2.3
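To make the `images` override in the production overlay concrete, here is a rough Python sketch of what it does to a Deployment: every container whose image name matches gets the new tag, and everything else is left untouched. The `set_image_tag` helper and the sample manifest are hypothetical, and the sketch ignores digests and other cases that Kustomize itself handles.

```python
import copy


def set_image_tag(deployment: dict, name: str, new_tag: str) -> dict:
    """Return a copy of `deployment` where containers using image `name` get `new_tag`."""
    result = copy.deepcopy(deployment)  # leave the base manifest untouched
    for container in result["spec"]["template"]["spec"]["containers"]:
        image = container["image"]
        # Only treat ":" as a tag separator if it appears after the last "/",
        # so a registry port such as registry:5000/myapp is not mistaken for a tag.
        has_tag = ":" in image.rsplit("/", 1)[-1]
        repo = image.rsplit(":", 1)[0] if has_tag else image
        if repo == name:
            container["image"] = f"{repo}:{new_tag}"
    return result


base = {
    "spec": {"template": {"spec": {"containers": [
        {"name": "app", "image": "myapp:v1.0.0"},
        {"name": "proxy", "image": "envoy:v1.29"},
    ]}}}
}
prod = set_image_tag(base, "myapp", "v1.2.3")
print([c["image"] for c in prod["spec"]["template"]["spec"]["containers"]])
```

The base stays unchanged, which mirrors the overlay model: environment-specific values are layered on top rather than edited into shared files.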
# Application pointing to Helm chart
spec:
  source:
    repoURL: https://charts.example.com
    chart: my-app
    targetRevision: 1.2.3
    helm:
      releaseName: my-app
      valueFiles:
        - values.yaml
        - values-prod.yaml
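When several `valueFiles` are listed, later files take precedence over earlier ones: nested maps are merged key by key, while scalars and lists are replaced. The sketch below imitates that layering with plain dictionaries. The `merge_values` helper and the sample values are hypothetical and only approximate what Helm does.

```python
def merge_values(base: dict, override: dict) -> dict:
    """Recursively merge `override` onto `base`; override wins on conflicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)  # merge nested maps
        else:
            merged[key] = value  # scalars and lists are replaced outright
    return merged


# Hypothetical contents of values.yaml and values-prod.yaml
values = {"replicaCount": 1, "image": {"repository": "my-app", "tag": "1.2.3"}}
values_prod = {"replicaCount": 3, "image": {"tag": "1.2.3-prod"}}

effective = merge_values(values, values_prod)
print(effective)
```

Keeping `values-prod.yaml` limited to the keys that differ from the defaults makes the environment delta easy to review in a pull request.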
spec:
  sources:
    - repoURL: https://charts.bitnami.com/bitnami
      chart: nginx
      targetRevision: 15.0.0
      helm:
        valueFiles:
          - $values/nginx/values-prod.yaml
    - repoURL: https://github.com/org/values.git
      targetRevision: main
      ref: values
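In the multi-source form, `$values` is a reference to the source that declares `ref: values`, so the value file is read from that Git repository rather than from the chart. A small sketch of that lookup follows; the `resolve_value_files` helper is hypothetical and only models the name resolution, not how ArgoCD fetches or renders anything.

```python
from typing import Dict, List, Tuple


def resolve_value_files(sources: List[dict]) -> List[Tuple[str, str, str]]:
    """Map each `$ref/path` value file to (repoURL, targetRevision, path)."""
    refs: Dict[str, dict] = {s["ref"]: s for s in sources if "ref" in s}
    resolved = []
    for source in sources:
        for value_file in source.get("helm", {}).get("valueFiles", []):
            if not value_file.startswith("$"):
                continue  # plain paths are relative to the chart itself
            ref_name, _, path = value_file[1:].partition("/")
            target = refs[ref_name]  # KeyError if no source declares this ref
            resolved.append((target["repoURL"], target["targetRevision"], path))
    return resolved


sources = [
    {
        "repoURL": "https://charts.bitnami.com/bitnami",
        "chart": "nginx",
        "targetRevision": "15.0.0",
        "helm": {"valueFiles": ["$values/nginx/values-prod.yaml"]},
    },
    {"repoURL": "https://github.com/org/values.git", "targetRevision": "main", "ref": "values"},
]
print(resolve_value_files(sources))
```

This split lets a chart come from a public registry while the environment values stay in a repository the team controls.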
GitOps enables progressive delivery patterns:
# Two applications, traffic shift via Ingress/Service
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: app-blue
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: app-green

apiVersion: argoproj.io/v1alpha1
kind: Rollout
spec:
  strategy:
    canary:
      steps:
        - setWeight: 10
        - pause: {duration: 5m}
        - setWeight: 50
        - pause: {duration: 10m}
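The canary steps above read as a timeline: shift 10% of traffic, wait five minutes, shift 50%, wait ten more. The sketch below turns such a step list into that timeline. It is an illustration only: `parse_duration` and `canary_schedule` are hypothetical helpers, and they handle just timed pauses with `s`, `m`, or `h` suffixes, not indefinite pauses or analysis steps.

```python
import re
from typing import List, Tuple

_UNITS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text: str) -> int:
    """Convert a duration such as '5m' into seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def canary_schedule(steps: List[dict]) -> Tuple[List[Tuple[int, int]], int]:
    """Return ([(seconds_since_start, weight_percent), ...], total_pause_seconds)."""
    elapsed = 0
    schedule = []
    for step in steps:
        if "setWeight" in step:
            schedule.append((elapsed, step["setWeight"]))
        elif "pause" in step:
            elapsed += parse_duration(step["pause"]["duration"])
    return schedule, elapsed


steps = [
    {"setWeight": 10},
    {"pause": {"duration": "5m"}},
    {"setWeight": 50},
    {"pause": {"duration": "10m"}},
]
schedule, total = canary_schedule(steps)
print(schedule, total)
```

Here the full rollout would follow the last pause, at least 15 minutes after the canary starts, which is the window available for metrics or alerts to catch a bad release.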
Warehouse → Dev Stage → Staging Stage → Production Stage
    │           │             │                   │
    └── Freight promotion through environments ───┘
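The promotion flow in the diagram can be read as a gate: Freight moves to a stage only after it has been verified in the stage before it. The sketch below encodes that rule. It is an assumption-laden toy, not Kargo's API: `PIPELINE`, `can_promote`, and the `verified` bookkeeping are all hypothetical names for this example.

```python
from typing import Dict, List, Set

# Stage order taken from the diagram; the names are illustrative.
PIPELINE: List[str] = ["dev", "staging", "production"]


def can_promote(freight: str, stage: str, verified: Dict[str, Set[str]]) -> bool:
    """Freight may enter `stage` only once it is verified in the preceding stage."""
    index = PIPELINE.index(stage)
    if index == 0:
        return True  # the first stage takes Freight straight from the Warehouse
    return freight in verified.get(PIPELINE[index - 1], set())


verified = {"dev": {"abc123"}}  # hypothetical Freight ID verified in dev only
print(can_promote("abc123", "staging", verified))
print(can_promote("abc123", "production", verified))
```

The same artifact travels through every stage, so what reaches production is exactly what was tested earlier rather than a rebuild.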
Azure provides a managed ArgoCD experience through the Microsoft.ArgoCD cluster extension:
# Simple installation (single node)
az k8s-extension create \
  --resource-group <rg> --cluster-name <cluster> \
  --cluster-type managedClusters \
  --name argocd \
  --extension-type Microsoft.ArgoCD \
  --release-train preview \
  --config deployWithHighAvailability=false

# Production with workload identity (recommended)
# Use Bicep template - see references/azure-arc-integration.md
Key Benefits:
| Feature | Description |
|---------|-------------|
| Managed Installation | Azure handles deployment and upgrades |
| Workload Identity | Azure AD authentication without secrets |
| Multi-Cluster | Consistent GitOps across hybrid environments |
| Azure Integration | Native ACR, Key Vault, Azure AD support |
Prerequisites:
- Microsoft.KubernetesConfiguration provider registered
- k8s-extension CLI extension installed

See references/azure-arc-integration.md for complete setup guide.
Never store plaintext secrets in Git! Use:

| Approach | Tool |
|----------|------|
| External Secrets | External Secrets Operator |
| Sealed Secrets | Bitnami Sealed Secrets |
| SOPS | Mozilla SOPS encryption |
| Vault | HashiCorp Vault + CSI |
| Cloud KMS | AWS/Azure/GCP Key Management |
# Limit ArgoCD to specific namespaces
apiVersion: argoproj.io/v1alpha1
kind: AppProject
spec:
  destinations:
    - namespace: 'team-a-*'
      server: https://kubernetes.default.svc
  sourceRepos:
    - 'https://github.com/org/team-a-*'
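The effect of the wildcard patterns in the AppProject can be checked with a short sketch: an Application is allowed only if both its source repository and its destination match. The example uses Python's `fnmatchcase` as a stand-in for ArgoCD's glob matching, which is an assumption (the two may differ in edge cases), and `is_allowed` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def is_allowed(project: dict, repo_url: str, namespace: str, server: str) -> bool:
    """Check a repo and destination against the project's glob patterns."""
    repo_ok = any(fnmatchcase(repo_url, pattern) for pattern in project["sourceRepos"])
    dest_ok = any(
        fnmatchcase(namespace, dest["namespace"]) and fnmatchcase(server, dest["server"])
        for dest in project["destinations"]
    )
    return repo_ok and dest_ok


project = {
    "destinations": [
        {"namespace": "team-a-*", "server": "https://kubernetes.default.svc"}
    ],
    "sourceRepos": ["https://github.com/org/team-a-*"],
}
cluster = "https://kubernetes.default.svc"
print(is_allowed(project, "https://github.com/org/team-a-api", "team-a-payments", cluster))
print(is_allowed(project, "https://github.com/org/team-a-api", "team-b-payments", cluster))
```

Scoping both sides matters: restricting only namespaces would still let a team deploy manifests from any repository into its own namespaces.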
| Status | Meaning | Action |
|--------|---------|--------|
| Healthy | All resources running | None |
| Progressing | Deployment in progress | Wait |
| Degraded | Health check failed | Investigate |
| Suspended | Manually paused | Resume when ready |
| Missing | Resource not found | Check manifests |
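An application usually contains many resources, so its overall status is typically the worst status among them. The sketch below pairs that aggregation with the actions from the table. The severity ordering in `SEVERITY` is an assumption made for this example (least to most severe), and `app_health` is a hypothetical helper, not an ArgoCD function.

```python
from typing import List

# Assumed ordering from least to most severe; verify against your controller's docs.
SEVERITY = ["Healthy", "Suspended", "Progressing", "Missing", "Degraded"]

# Actions copied from the status table.
ACTIONS = {
    "Healthy": "None",
    "Progressing": "Wait",
    "Degraded": "Investigate",
    "Suspended": "Resume when ready",
    "Missing": "Check manifests",
}


def app_health(resource_statuses: List[str]) -> str:
    """Return the worst status among the resources (Healthy if there are none)."""
    if not resource_statuses:
        return "Healthy"
    return max(resource_statuses, key=SEVERITY.index)


status = app_health(["Healthy", "Progressing", "Healthy"])
print(status, "->", ACTIONS[status])
```

A single Degraded resource is enough to mark the whole application Degraded, which is why the per-resource view is the first place to look when investigating.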
# Check application diff
argocd app diff myapp
# Force refresh from Git
argocd app get myapp --refresh
For detailed information, see:
- references/core-principles.md - Deep dive into the 4 pillars
- references/patterns-and-practices.md - Branching and repo patterns
- references/tooling-ecosystem.md - ArgoCD vs Flux vs Kargo
- references/anti-patterns.md - Common mistakes to avoid
- references/troubleshooting.md - Debugging guide
- references/azure-arc-integration.md - Azure Arc & AKS GitOps setup

Ready-to-use templates in templates/:

- application.yaml - ArgoCD Application example
- applicationset.yaml - Multi-cluster deployment
- kustomization.yaml - Kustomize overlay structure

Utility scripts in scripts/:

- gitops-health-check.sh - Validate GitOps setup

- selfHeal: true fights kubectl edit: Engineers patching live resources during incident response will see their changes reverted within seconds. Either pause auto-sync first or commit the fix to Git.
- … main branch can pin controller CPU. Split repos or use webhooks instead of polling.
- PruneLast=true and Helm hooks collide: Hooks run during sync; prune happens after. Resources from post-install hooks get deleted on next sync because they're not in the rendered manifest. Annotate hooks with Prune=false.
- … ignoreDifferences.
- git revert rollback assumes immutable image tags: Reverting a manifest reverts the tag string, but a mutable tag (e.g. :latest) now points at a different image. Pin to digests for true rollback safety.
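The note about mutable tags suggests a simple guard: flag any image reference that is not pinned to a digest. The sketch below does this with a regular expression. It is a minimal check under the assumption that a pinned reference ends in `@sha256:` followed by 64 hex characters; the helper names and sample images are hypothetical.

```python
import re
from typing import List

# Assumption: a digest-pinned reference ends with @sha256:<64 hex characters>.
_DIGEST = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_digest_pinned(image: str) -> bool:
    """True if the image reference is pinned to a sha256 digest."""
    return _DIGEST.search(image) is not None


def unpinned_images(images: List[str]) -> List[str]:
    """Return the references that a rollback could silently resolve differently."""
    return [image for image in images if not is_digest_pinned(image)]


fake_digest = "a" * 64  # placeholder digest for the example
images = ["myapp:latest", "myapp:v1.2.3", f"myapp@sha256:{fake_digest}"]
print(unpinned_images(images))
```

A check like this could run in CI against rendered manifests, so that a `git revert` restores the same image bytes and not just the same tag string.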