scitix/deployment-rollout-debug

Name: deployment-rollout-debug
Author: scitix

skills/core/deployment-rollout-debug/SKILL.md

npx skillsauth add scitix/siclaw deployment-rollout-debug

Deployment Rollout Failure Diagnosis

When a Deployment rollout is stuck, not progressing, or shows replica mismatches, follow this flow to identify the root cause.

Scope: This skill is for diagnosis only. Once you identify the root cause, report it to the user and stop. Do NOT attempt to roll back, scale, or modify the Deployment — that should be left to the user.

Diagnostic Flow

1. Check rollout status

kubectl rollout status deployment/<name> -n <ns> --timeout=5s

This shows whether the rollout is progressing, complete, or stuck. A timeout indicates the rollout is not making progress.
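As a rough sketch, the exit status of the command can drive a script: `kubectl rollout status` exits non-zero when the timeout elapses before the rollout completes, and also on other errors such as a missing Deployment. The helper name `rollout_state` is hypothetical and not part of the skill.

```shell
# Hypothetical helper, not part of the skill: prints "complete" when the
# rollout finishes within the timeout and "not-complete" otherwise.
# Usage: rollout_state <deployment> <namespace> [timeout]
rollout_state() (
  name=$1
  ns=$2
  timeout=${3:-5s}
  if kubectl rollout status "deployment/$name" -n "$ns" --timeout="$timeout" >/dev/null 2>&1; then
    echo "complete"
  else
    # Non-zero exit: timed out, deadline exceeded, or kubectl itself failed.
    echo "not-complete"
  fi
)
```

A "not-complete" result only says the rollout did not finish in time; the following steps are still needed to find out why.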

2. Get Deployment details

kubectl get deployment <name> -n <ns> -o wide

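Compare the columns:

READY: ready replicas over desired replicas (for example 2/3)
UP-TO-DATE: pods running the new version
AVAILABLE: pods ready to serve traffic

Older kubectl releases print DESIRED (target replica count) and CURRENT (total pods, old + new) in place of READY.

If UP-TO-DATE or AVAILABLE is below the desired replica count, the rollout is incomplete.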
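The same comparison can be made against the Deployment object's fields rather than the table columns. The sketch below reads `spec.replicas`, `status.updatedReplicas`, and `status.availableReplicas` through `-o jsonpath`; the function name `rollout_incomplete` is hypothetical, and treating a missing status field as 0 is an assumption of this example.

```shell
# Hypothetical helper: succeeds (exit 0) when the rollout is incomplete,
# exits 1 when all replicas are updated and available, 2 if kubectl fails.
# Usage: rollout_incomplete <deployment> <namespace>
rollout_incomplete() (
  counts=$(kubectl get deployment "$1" -n "$2" \
    -o jsonpath='{.spec.replicas},{.status.updatedReplicas},{.status.availableReplicas}') || exit 2
  # Comma-separated so that an empty (missing) field keeps its position.
  IFS=, read -r desired updated available <<EOF
$counts
EOF
  desired=${desired:-1}      # spec.replicas defaults to 1
  updated=${updated:-0}      # assumption: missing status field means 0
  available=${available:-0}
  echo "desired=$desired updated=$updated available=$available"
  [ "$updated" -lt "$desired" ] || [ "$available" -lt "$desired" ]
)
```

For example, `rollout_incomplete web default && echo "still rolling out"` prints the counts and the message only while the rollout is unfinished.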

3. Describe the Deployment

kubectl describe deployment <name> -n <ns>

Focus on:

Conditions — look for Progressing (status and reason) and Available
Events — look for scaling events, errors, or warnings
Strategy — note maxSurge and maxUnavailable settings
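When only the conditions are of interest, they can be pulled out without reading the full describe output. This is a minimal sketch using a jsonpath `range` over `status.conditions`; the function names are hypothetical, and it only flags the Progressing and Available conditions when their status is False.

```shell
# Print one "type<TAB>status<TAB>reason" line per Deployment condition.
# Usage: deployment_conditions <deployment> <namespace>
deployment_conditions() (
  kubectl get deployment "$1" -n "$2" \
    -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.reason}{"\n"}{end}'
)

# Hypothetical helper: report Progressing/Available conditions that are False.
flag_failing_conditions() (
  deployment_conditions "$1" "$2" |
    while IFS="$(printf '\t')" read -r type status reason; do
      case "$type:$status" in
        Progressing:False|Available:False)
          echo "$type is False (reason: $reason)" ;;
      esac
    done
)
```

An empty result means neither condition is currently False; it does not by itself prove that the rollout has finished.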

4. Check ReplicaSets

kubectl get rs -n <ns> -l app=<name>

If the label app=<name> doesn't match, find ReplicaSets owned by the Deployment:

kubectl get rs -n <ns> --sort-by='.metadata.creationTimestamp' | grep <name>

You should see the old RS (with reduced replicas) and the new RS (scaling up). If the new RS has 0 ready replicas, its pods are failing.
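Matching by name with grep can also pick up ReplicaSets of other Deployments that share a prefix. As an alternative sketch, the owner recorded in `metadata.ownerReferences` can be compared exactly. The function name `owned_replicasets` is hypothetical, and the example assumes each ReplicaSet has at most one owner.

```shell
# Hypothetical helper: list ReplicaSets whose owner is exactly <deployment>,
# with desired and ready replica counts.
# Usage: owned_replicasets <deployment> <namespace>
owned_replicasets() (
  kubectl get rs -n "$2" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.ownerReferences[*].name}{"\t"}{.spec.replicas}{"\t"}{.status.readyReplicas}{"\n"}{end}' |
    # Field 2 is the owner name; an empty ready count is printed as 0.
    awk -F '\t' -v owner="$1" \
      '$2 == owner { printf "%s desired=%s ready=%s\n", $1, $3, ($4 == "" ? 0 : $4) }'
)
```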

5. Check new ReplicaSet's pods

Find pods from the new RS:

kubectl get pods -n <ns> -l app=<name> --sort-by='.metadata.creationTimestamp'

Check the status of the newest pods. Based on their status:

Pending → Use the pod-pending-debug skill
CrashLoopBackOff / Error → Use the pod-crash-debug skill
ImagePullBackOff / ErrImagePull → Use the image-pull-debug skill
Running but not Ready → Check readiness probe below
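The `app=<name>` selector returns pods from both the old and the new ReplicaSet. A sketch for narrowing the list to one ReplicaSet uses the `pod-template-hash` label that the Deployment controller sets on ReplicaSets and their pods. The function names are hypothetical; the status-to-skill mapping simply restates the list above.

```shell
# List only the pods of one ReplicaSet via its pod-template-hash label.
# Usage: pods_of_replicaset <replicaset> <namespace>
pods_of_replicaset() (
  hash=$(kubectl get rs "$1" -n "$2" \
    -o jsonpath='{.metadata.labels.pod-template-hash}') || exit 2
  kubectl get pods -n "$2" -l "pod-template-hash=$hash"
)

# Hypothetical helper: map a pod STATUS value to the follow-up named above.
next_step_for_status() {
  case "$1" in
    Pending)                       echo "pod-pending-debug" ;;
    CrashLoopBackOff|Error)        echo "pod-crash-debug" ;;
    ImagePullBackOff|ErrImagePull) echo "image-pull-debug" ;;
    Running)                       echo "check readiness probe" ;;
    *)                             echo "unknown" ;;
  esac
}
```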

6. Match deployment-level patterns

`ProgressDeadlineExceeded` — Rollout timed out

The Deployment's spec.progressDeadlineSeconds (default 600s) has been exceeded without progress.

This is a symptom, not a cause. The root cause is in the new pods — check step 5 for why new pods are not becoming ready.

`MinimumReplicasUnavailable` — Not enough replicas available

The Deployment cannot maintain the minimum number of available replicas during the rollout.

Check the new pods' status (step 5). The new version's pods are failing to start or pass readiness probes.

New pods `Running` but not `Ready` — Readiness probe failing

The new pods started successfully but are failing their readiness probe. The rollout will not progress because the new pods are not considered available.

Check the readiness probe configuration and failures:

kubectl describe pod <new-pod> -n <ns>

Look for Readiness probe failed events. Common causes:

Application not listening on the expected port
Health endpoint returning errors
Probe timing too aggressive (initialDelaySeconds too low)
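To look at the probe settings and the failures side by side, something like the following sketch may help. It prints each container's `readinessProbe` and then the pod's events with reason `Unhealthy`. The function name is hypothetical. Note that `Unhealthy` is also used for liveness and startup probe failures, so the event message still has to be read.

```shell
# Hypothetical helper: show readiness probe config and probe-failure events.
# Usage: readiness_probe_report <pod> <namespace>
readiness_probe_report() (
  echo "Configured readiness probes:"
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{range .spec.containers[*]}{.name}{": "}{.readinessProbe}{"\n"}{end}'
  echo "Unhealthy events:"
  # Probe failures are recorded as events with reason "Unhealthy".
  kubectl get events -n "$2" \
    --field-selector "involvedObject.kind=Pod,involvedObject.name=$1,reason=Unhealthy"
)
```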

Rollout stuck with `maxSurge: 0` and `maxUnavailable: 0` — Invalid strategy

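If both maxSurge and maxUnavailable are 0, the rollout cannot proceed because it can neither create extra pods nor remove existing ones. At least one must be greater than 0. The API server normally rejects such a spec at validation time, so in practice this tends to surface as a failed apply or update rather than as a stuck rollout.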
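A quick way to check the configured values is sketched below. The function names are hypothetical, and the check is deliberately literal: it only recognizes `0` and `0%`, and does not evaluate how non-zero percentages round against the replica count.

```shell
# True when a strategy value is literally zero ("0" or "0%").
is_zero() {
  case "$1" in
    0|0%) return 0 ;;
    *)    return 1 ;;
  esac
}

# Hypothetical helper: succeeds when both maxSurge and maxUnavailable are zero.
# Usage: strategy_blocks_rollout <deployment> <namespace>
strategy_blocks_rollout() (
  values=$(kubectl get deployment "$1" -n "$2" \
    -o jsonpath='{.spec.strategy.rollingUpdate.maxSurge},{.spec.strategy.rollingUpdate.maxUnavailable}') || exit 2
  IFS=, read -r surge unavailable <<EOF
$values
EOF
  echo "maxSurge=$surge maxUnavailable=$unavailable"
  is_zero "$surge" && is_zero "$unavailable"
)
```

For a Deployment using the Recreate strategy both fields are empty, and the helper reports that the strategy does not block the rollout.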

New RS created but not scaling up — Admission webhook or quota

The new ReplicaSet exists but has 0 replicas or cannot create pods:

kubectl describe rs <new-rs> -n <ns>

Check events for:

Admission webhook denied — a webhook is rejecting the new pod spec
Exceeded quota / LimitRange violation — Use quota-debug for native ResourceQuota/LimitRange diagnosis, or volcano-queue-diagnose for Volcano gang scheduling clusters
FailedCreate — other creation failures
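The ReplicaSet's warning events can also be fetched directly and sorted into the categories above. In this sketch the function names are hypothetical, and the matched substrings ("admission webhook", "exceeded quota") are assumptions about typical event message wording, so anything unmatched falls through to "other" and should be read by hand.

```shell
# List Warning events recorded against one ReplicaSet.
# Usage: replicaset_warnings <replicaset> <namespace>
replicaset_warnings() (
  kubectl get events -n "$2" \
    --field-selector "involvedObject.kind=ReplicaSet,involvedObject.name=$1,type=Warning"
)

# Hypothetical helper: classify a FailedCreate event message.
# The substrings are assumed wording, not a guaranteed format.
classify_create_failure() {
  case "$1" in
    *"admission webhook"*) echo "admission-webhook" ;;
    *"exceeded quota"*)    echo "quota" ;;
    *)                     echo "other" ;;
  esac
}
```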

Old RS not scaling down — Waiting for new pods

The old ReplicaSet keeps its replicas because the new RS pods are not ready yet. This is expected behavior — Kubernetes will not remove old pods until new pods are available. Fix the new pods first.

Notes

kubectl rollout status waits for the rollout to complete by default. Use --timeout to avoid blocking.
If the user wants to undo a failed rollout: kubectl rollout undo deployment/<name> -n <ns> (let the user decide; do not execute this yourself).
For Deployments managed by Helm or ArgoCD, the rollout may be triggered by those tools — check if the issue is in the Helm chart values or ArgoCD sync.
StatefulSet rollouts follow a different pattern (ordered, one-at-a-time by default). This skill is specific to Deployments.

Diagnose Deployment rollout failures (stuck rollouts, ProgressDeadlineExceeded, replica mismatch). Checks rollout status, ReplicaSets, and new pod health to identify why an update is failing.

88 stars

development

Updated Apr 12, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add scitix/siclaw deployment-rollout-debug

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy (container and dependency vulnerability scanner): Clean, 95%
Semgrep (static code analysis for vulnerabilities): Clean, 95%
mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
Snyk (dep) (open source security scanning): Skipped, 50%
Socket.dev (supply chain security analysis): Skipped, 50%
VirusTotal (multi-engine malware detection): Skipped, 50%
CrowdStrike (advanced threat intelligence): Skipped, 50%
OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: Apr 12, 2026, 11:43 PM · 32.0s · 1 file scanned

Related Skills

scitix/gateway-diagnostics (testing)

Verified · Trusted · Community

Show and ping the gateway of a network interface, on a Kubernetes node or inside a pod's network namespace. Auto-detects the gateway from the routing table (ip -j route), reports interface type (RoCE / Ethernet / IB), and tests reachability with ping. Use for default-route / gateway questions, network reachability checks, RoCE/RDMA data-path validation, and "can this node/pod reach its gateway" investigations.

209 · SKILL.md · Updated Jun 7, 2026

scitix/skill-authoring (development)

Verified · Trusted · Community

Guide for writing and improving Siclaw skills. Read this when creating or modifying a skill. Covers skill directory layout, SKILL.md format, script execution modes, and best practices.

209 · SKILL.md · Updated Apr 23, 2026

scitix/node-logs (devops)

Verified · Trusted · Community

Retrieve logs from a Kubernetes node. Supports journalctl (systemd units) and file-based logs. Use when you need to inspect node-level logs (containerd, kubelet, etc.). Run via host_script (preferred) or node_script.

209 · SKILL.md · Updated Apr 12, 2026

scitix/manage-skill (development)

Verified · Trusted · Community

Guides the user to the Siclaw Web page to manage Skills. Use this guide when the user requests to create, edit, or view a Skill in a Channel conversation.

88 · SKILL.md · Updated Apr 12, 2026

Install manually

# Clone the repo
git clone https://github.com/scitix/siclaw.git

# Copy into the Claude Code skills folder (global), creating it if needed
mkdir -p ~/.claude/skills
cp -r siclaw/skills/core/deployment-rollout-debug ~/.claude/skills/

Repository: scitix/siclaw (88 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT
