skills/argocd-helm-validation/SKILL.md
This skill should be used when validating an ArgoCD Helm chart deployment after committing changes to a git repository, checking if an ArgoCD app has synced and is healthy, debugging ArgoCD sync failures or stuck syncs, investigating Kubernetes Events or pod logs for application errors, monitoring a GitOps workflow deployment to completion, or fixing Helm chart values or templates to resolve deployment issues. Also covers known stuck-sync patterns: "waiting for completion of hook batch/Job" when the Job no longer exists (PreSync hook TTL race); ArgoCD controller OOMKilled during sync of large charts with ServerSideApply=true causing workloads to never be created despite ConfigMaps/SAs existing. Trigger phrases: "check if my deployment deployed", "validate the ArgoCD deployment", "debug ArgoCD sync", "check if helm chart is healthy", "monitor deployment progress", "argocd not syncing", "deployment stuck", "check argo", "helm chart errors", "waiting for completion of hook", "job not found argocd stuck", "argocd controller oomkilled", "deployments not created", "statefulsets not created after sync".
ArgoCD Helm Deployment Validation
Follow this procedure to validate a Helm-based ArgoCD GitOps deployment end-to-end: run pre-flight auth checks, monitor sync status, evaluate health per resource type, debug issues, apply approved fixes, and retry until the deployment is confirmed healthy.
Before doing anything else, verify CLI state and cluster context.
Check login state:
argocd account get-user-info
If authenticated, continue. If unauthenticated, follow this credential flow in order:
Tier 1 — SSO (preferred):
argocd login <server> --sso
This opens a browser for SSO authentication and stores a token in ~/.argocd/config. No credentials pass through this session.
Tier 2 — Stored token: If SSO is unavailable, check whether a valid token already exists:
argocd account get-user-info
If a stored token works, continue. Do not prompt for credentials.
Tier 3 — User runs login in terminal: If no valid session exists and SSO is unavailable, do NOT ask for a password. Instead, instruct the user:
"Please run
argocd login <server> in your terminal to authenticate, then return here. I'll wait."
After the user confirms they've logged in, re-run argocd account get-user-info to verify before continuing.
Never embed a password in a command. Credentials passed through the chat session may be stored in session history.
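The credential flow above can be sketched as a small shell check. This is a minimal sketch, not a definitive implementation: `check_argocd_session` is a hypothetical helper name, and it assumes that the plain-text output of `argocd account get-user-info` contains the line `Logged In: true` when a valid session exists. It never asks for or handles a password.

```shell
# Hypothetical helper: report whether a usable ArgoCD session exists.
check_argocd_session() {
  # Assumption: a valid session prints a line containing "Logged In: true".
  if argocd account get-user-info 2>/dev/null | grep -q 'Logged In: true'; then
    echo "authenticated"
    return 0
  fi
  # No password prompt here: hand the login step back to the user (Tier 3).
  echo "not authenticated: ask the user to run 'argocd login <server>' in their own terminal"
  return 1
}
```

Re-running the helper after the user confirms login mirrors the verification step described above.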
Self-signed certificates: If your ArgoCD server uses a self-signed certificate, add the CA to your system trust store rather than using --insecure. Using --insecure disables all TLS verification and exposes the session to MITM attacks on any network path.
Check current context:
kubectl config current-context
kubectl config get-contexts
If the context looks incorrect for this deployment (wrong cluster name, wrong environment), suggest the likely correct context and ask for confirmation before switching:
kubectx <context-name>
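A minimal sketch of the context check, assuming the expected context name for this deployment is already known. `confirm_context` is a hypothetical helper name; it only reports a mismatch and leaves the switch itself to a confirmed `kubectx` call.

```shell
# Hypothetical helper: compare the active kubectl context with the expected one.
confirm_context() {
  expected="$1"
  current="$(kubectl config current-context)"
  if [ "$current" = "$expected" ]; then
    echo "context ok: $current"
    return 0
  fi
  # Do not switch automatically; report and let the user confirm first.
  echo "context mismatch: current=$current expected=$expected"
  return 1
}
```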
Check the current branch:
git branch --show-current
- If the current branch is not main, proceed with commits on the current branch.
- If it is main, ask the user: "You're on main. Should I commit fixes directly to main, or create a new branch?"

If an app name is provided, target it directly:
argocd app get <app-name>
If no app name is provided, list all apps in the ArgoCD namespace and present them:
argocd app list -o json
Filter to apps on the current cluster context. If multiple apps exist, ask the user which to monitor, or monitor all if the user says so.
After identifying the target app(s), check sync and health status using the
argocd_read(group="app", command="get", app_name="<app-name>") MCP tool, or via Bash
if the tool is unavailable:
argocd app get <app-name> --output json
Evaluate the response:
| Sync Status | Health Status | Action |
|-------------|---------------|--------|
| Synced | Healthy | [OK] Validate k8s Events and logs (see below) |
| Synced | Progressing | [WAIT] Apply timing logic before escalating |
| Synced | Degraded | [DEBUG] Invoke argocd-debugger agent immediately |
| OutOfSync | — | [SYNC] Run argocd app sync <app-name> then re-check. If argocd_read(group="app", command="diff", app_name="<app-name>") indicates no diff (e.g. (no diff — desired and live state match)), invoke argocd-outofsync-empty-diff-operator-crd skill instead |
| Unknown | — | [INVESTIGATE] Check resources, look for sync errors |
If sync was recently triggered and status is Progressing, apply the Timing Logic below before escalating to debug.
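The status table can be read as a small decision function. The sketch below is illustrative only: `next_action` is a hypothetical helper name, and the fallback to INVESTIGATE for status pairs the table does not list is an assumption, not something the table states.

```shell
# Hypothetical helper: map a sync/health status pair to the action tag from the table.
next_action() {
  sync_status="$1"
  health_status="$2"
  case "$sync_status" in
    OutOfSync) echo "SYNC" ;;
    Unknown)   echo "INVESTIGATE" ;;
    Synced)
      case "$health_status" in
        Healthy)     echo "OK" ;;
        Progressing) echo "WAIT" ;;
        Degraded)    echo "DEBUG" ;;
        # Assumption: health values not listed in the table are investigated.
        *)           echo "INVESTIGATE" ;;
      esac
      ;;
    # Assumption: unrecognised sync values are treated like Unknown.
    *) echo "INVESTIGATE" ;;
  esac
}
```

With the JSON output of `argocd app get`, the two arguments would typically come from the `status.sync.status` and `status.health.status` fields of the Application.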
Do not immediately flag a progressing sync as failed. Use this logic:
argocd-debugger agent.
When ArgoCD reports Synced + Healthy, validate that the application is actually running correctly by checking k8s Events and logs in the app's namespace.
Use the k8s_namespace_health(<namespace>) MCP tool to get pod statuses and warning events
in a single call, or via Bash:
kubectl get events -n <namespace> --sort-by='.lastTimestamp' | tail -40
kubectl get pods -n <namespace>
Look for: Warning events, BackOff, Failed, OOMKilled, Evicted, FailedMount, Unhealthy.
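One way to scan for the keywords listed above is a simple filter over the events output. This is a rough sketch: `filter_problem_events` is a hypothetical helper name, and plain keyword matching can produce false positives, so treat matches as leads rather than conclusions.

```shell
# Hypothetical helper: keep only lines that mention one of the problem keywords.
# "Failed" also matches "FailedMount", so it is not listed separately.
filter_problem_events() {
  grep -E 'Warning|BackOff|Failed|OOMKilled|Evicted|Unhealthy' || true
}

# Example usage (requires cluster access):
# kubectl get events -n <namespace> --sort-by='.lastTimestamp' | filter_problem_events
```

The trailing `|| true` keeps the helper from returning a failure status when nothing matches, which is the healthy case.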
For pod logs, use get_pod_logs(<namespace>, <pod>, errors_only=True) or:
kubectl logs -n <namespace> <pod-name> --tail=100
# For previous crash:
kubectl logs -n <namespace> <pod-name> --previous --tail=100
If Events or logs contain errors, invoke argocd-debugger agent to investigate and propose fixes.
A deployment is confirmed healthy when:
When issues are detected, invoke the argocd-debugger agent with context:
The debugger will investigate, propose file changes, get approval, apply fixes, commit, push, and trigger a re-sync.
After the debugger commits and pushes:
argocd app sync <app-name>)
For known stuck-sync diagnoses and fixes (PreSync hook TTL race, ArgoCD controller OOMKilled
during large-chart SSA sync), read:
Read skills/argocd-helm-validation/references/stuck-sync-patterns.md
- references/sync-debugging.md: Detailed ArgoCD sync error investigation procedures, common error patterns, and resolution steps
- references/health-checks.md: Per-resource-type health check procedures (Deployments, StatefulSets, Jobs, Pods, static resources)
- references/fix-and-retry.md: How to propose, edit, commit, push, and re-sync fixes; git workflow; values.yaml and Chart fix patterns

If the debugger identifies that kubectl changes are reverting immediately, a git push isn't updating a deployed resource, or two tools appear to be contesting the same resource, invoke the argocd-helm-eso-setups skill. It covers the ownership model, selfHeal behavior, prune scope, fixed-name collision points, and how to transfer a resource between Helm and ArgoCD management.