scitix/job-debug

Name: job-debug
Author: scitix

skills/core/job-debug/SKILL.md

npx skillsauth add scitix/siclaw job-debug

Job / CronJob Failure Diagnosis

When a Job has failed, is stuck, or a CronJob is not triggering as expected, follow this flow to identify the root cause.

Scope: This skill is for diagnosis only. Once you identify the root cause, report it to the user and stop. Do NOT attempt to delete, restart, or modify Jobs or CronJobs — that should be left to the user.

Diagnostic Flow

1. Check Job status

kubectl get jobs -n <ns>

Note the COMPLETIONS (succeeded/total) and DURATION columns. For a specific Job:

kubectl describe job <job-name> -n <ns>

Focus on:

Conditions — look for Complete (success) or Failed (failure) with the reason
Events — look for BackoffLimitExceeded, DeadlineExceeded, or pod creation errors
Pods Statuses — counts of Succeeded, Failed, and Active pods
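
The Conditions check above can also be scripted. The sketch below is a hypothetical helper (the name `job_verdict` and its output wording are illustrative, not part of this skill). It reads one condition per line as `<type> <status> <reason>` and reports whether the Job is complete, failed, or still in progress.

```shell
# Hypothetical helper: classify a Job from its conditions.
# Input (stdin): one condition per line as "<type> <status> <reason>".
job_verdict() {
  verdict="in-progress"
  while read -r type status reason; do
    # Only conditions whose status is "True" are currently in effect.
    [ "$status" = "True" ] || continue
    case "$type" in
      Complete) verdict="complete" ;;
      Failed)   verdict="failed: ${reason:-unknown}" ;;
    esac
  done
  printf '%s\n' "$verdict"
}
```

One way to feed it, assuming the Job still exists, is to pipe the conditions out of kubectl with a jsonpath range:

kubectl get job <job-name> -n <ns> -o jsonpath='{range .status.conditions[*]}{.type} {.status} {.reason}{"\n"}{end}' | job_verdict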

2. Check the Job's pods

kubectl get pods -n <ns> -l job-name=<job-name> --sort-by='.metadata.creationTimestamp'

Note pod statuses. For failed pods, check logs:

kubectl logs <pod-name> -n <ns>

If a container was terminated, check its exit code and reason:

kubectl get pod <pod-name> -n <ns> -o jsonpath='{.status.containerStatuses[*].state.terminated}'
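
Exit codes above 128 usually mean the process was killed by a signal (128 plus the signal number), which helps separate application errors from kills such as OOM. The helper below is a minimal sketch of that convention; the function name `explain_exit_code` and its messages are illustrative, not part of this skill.

```shell
# Illustrative helper: interpret a container exit code.
explain_exit_code() {
  code=$1
  if [ "$code" -eq 0 ]; then
    echo "exit 0: success"
  elif [ "$code" -gt 128 ]; then
    # By convention, 128+N means the process was killed by signal N.
    sig=$((code - 128))
    case "$sig" in
      9)  echo "exit $code: killed by signal 9 (SIGKILL)" ;;
      15) echo "exit $code: killed by signal 15 (SIGTERM)" ;;
      *)  echo "exit $code: killed by signal $sig" ;;
    esac
  else
    echo "exit $code: application error"
  fi
}
```

For example, `explain_exit_code 137` prints `exit 137: killed by signal 9 (SIGKILL)`. Pair that with the reason field of the terminated state, which reads OOMKilled when the memory limit was the cause.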

3. For CronJobs — check schedule and history

kubectl get cronjobs -n <ns>

Note the SCHEDULE, SUSPEND, ACTIVE, and LAST SCHEDULE columns.

For details:

kubectl describe cronjob <cronjob-name> -n <ns>

Focus on:

Schedule — the cron expression
Suspend — if True, the CronJob will not create new Jobs
Last Schedule Time — when the last Job was triggered
Active Jobs — currently running Jobs
Events — look for creation events or errors

List Jobs created by the CronJob:

kubectl get jobs -n <ns> -l job-name --sort-by='.metadata.creationTimestamp' | grep <cronjob-name>

4. Match patterns and conclude

`BackoffLimitExceeded` — Too many pod failures

The Job's pods have failed more times than the backoffLimit (default: 6). The Job is marked as Failed.

Check why the pods are failing — look at the logs of the most recent failed pod (step 2). Common causes:

Application error — the program exits with a non-zero code
OOMKilled — the container exceeded its memory limit (use pod-crash-debug for deeper analysis)
Configuration error — missing environment variables, wrong arguments, or missing config files

Advise the user to fix the underlying pod failure, then create a new Job.

`DeadlineExceeded` — Job took too long

The Job did not complete within its activeDeadlineSeconds limit. All running pods are terminated.

kubectl get job <job-name> -n <ns> -o jsonpath='{.spec.activeDeadlineSeconds}'

Check if the Job's pods are slow or stuck:

The workload may genuinely take longer than the deadline allows
Pods may be stuck waiting for resources, network, or external dependencies

Advise the user to either increase activeDeadlineSeconds or investigate why the workload is slow.

Pods in `Pending` — Scheduling issues

Job pods are created but cannot be scheduled.

Use the pod-pending-debug skill to diagnose the scheduling failure. Common causes for batch jobs:

Resource constraints — the Job requests more resources than available
Node affinity — the Job's pods can only run on specific nodes that are busy

Pods in `ImagePullBackOff` — Image issue

The Job pod cannot pull its container image. Use the image-pull-debug skill.

Job succeeded but results are wrong — Application-level issue

The Job completed (status Complete) but produced incorrect results. This is not a Kubernetes issue — check the application logs:

kubectl logs <succeeded-pod> -n <ns>

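If the pod has been garbage collected, check whether ttlSecondsAfterFinished caused the deletion. The TTL removes the finished Job together with its pods, so if the Job is gone as well, look for the field in the manifest or CronJob jobTemplate that created it. While the Job still exists, read the value directly: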

kubectl get job <job-name> -n <ns> -o jsonpath='{.spec.ttlSecondsAfterFinished}'

CronJob not triggering — Schedule or suspend issue

The CronJob exists but is not creating Jobs on schedule.

Suspended — spec.suspend is true. No new Jobs will be created until it is unsuspended.
Invalid schedule — the cron expression may be malformed. Kubernetes uses standard 5-field cron syntax (minute, hour, day-of-month, month, day-of-week).
Concurrency policy — if concurrencyPolicy is Forbid and a previous Job is still running, the new Job is skipped.

kubectl get cronjob <cronjob-name> -n <ns> -o jsonpath='suspend={.spec.suspend} concurrencyPolicy={.spec.concurrencyPolicy} schedule={.spec.schedule}'

Check if there are active Jobs blocking the next run:

kubectl get jobs -n <ns> | grep <cronjob-name>
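
A quick sanity check for the 5-field rule can be sketched as below. The function name `check_cron_fields` is hypothetical, and the check only counts fields; it does not validate the value ranges inside each field. Schedules that start with `@` (macros such as `@hourly`) are reported separately rather than counted.

```shell
# Hypothetical helper: count the fields in a cron schedule.
# The body runs in a subshell so that "set -f" does not leak to the caller.
check_cron_fields() (
  schedule=$1
  case "$schedule" in
    @*) echo "macro"; exit 0 ;;
  esac
  # Disable globbing so "*" fields are not expanded to file names.
  set -f
  # Intentionally unquoted: split the schedule on whitespace.
  set -- $schedule
  if [ "$#" -eq 5 ]; then
    echo "ok"
  else
    echo "invalid: expected 5 fields, got $#"
    exit 1
  fi
)
```

Pass it the schedule value printed by the jsonpath command above, quoted so the shell does not expand the asterisks, for example `check_cron_fields '*/5 * * * *'`.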

CronJob creates Jobs but they keep failing — Recurring failure

If every Job created by the CronJob fails, the root cause is in the Job template. Diagnose the most recent Job using steps 1-2 above.

Check the startingDeadlineSeconds — if set, Jobs that miss their window (e.g., because the previous Job ran too long) are skipped:

kubectl get cronjob <cronjob-name> -n <ns> -o jsonpath='{.spec.startingDeadlineSeconds}'

Also check successfulJobsHistoryLimit and failedJobsHistoryLimit (defaults: 3 and 1). If either is set to 0, the corresponding finished Jobs and their pods are deleted immediately, making diagnosis harder.

Notes

Jobs created by CronJobs are named <cronjob-name>-<timestamp>. Use this pattern to find related Jobs.
By default, failed Job pods are kept (not deleted) so you can inspect their logs. The backoffLimit controls how many retries happen before the Job is marked as Failed.
completions and parallelism control how many pods need to succeed and how many run concurrently. A Job with completions: 5, parallelism: 2 runs 2 pods at a time until 5 have succeeded.
For indexed Jobs (completionMode: Indexed), each pod gets a unique index. A failure in one index does not affect others — check which specific index is failing.
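
The naming pattern in the first note can be used for a stricter match than a plain grep, which also catches CronJobs whose names merely contain the search string. The filter below is a sketch that assumes the `<timestamp>` suffix is purely numeric; the function name `jobs_of_cronjob` is hypothetical.

```shell
# Hypothetical helper: keep only Job names of the form "<cronjob-name>-<digits>".
# Input (stdin): one Job name per line.
jobs_of_cronjob() {
  cronjob=$1
  while read -r job; do
    suffix=${job#"$cronjob"-}
    # Skip names that do not start with "<cronjob-name>-".
    [ "$suffix" != "$job" ] || continue
    # Keep the name only if the remainder is non-empty and all digits.
    case "$suffix" in
      ''|*[!0-9]*) ;;
      *) printf '%s\n' "$job" ;;
    esac
  done
}
```

It expects bare Job names on stdin, which kubectl can produce with a jsonpath range:

kubectl get jobs -n <ns> -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' | jobs_of_cronjob <cronjob-name>

Jobs created by hand from the CronJob under a custom name will not match this filter.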

Diagnose Job and CronJob failures (BackoffLimitExceeded, DeadlineExceeded, pods failing, CronJob not triggering). Checks Job status, pod logs, and CronJob schedule to identify why batch workloads are failing.

Stars: 88
Category: development
Updated: Apr 12, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add scitix/siclaw job-debug

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy: Container and dependency vulnerability scanner. Clean (95%)
Semgrep: Static code analysis for vulnerabilities. Clean (95%)
mcp-scan (Snyk): Model Context Protocol security validation. Clean (95%)
Snyk (dep): Open source security scanning. Skipped (50%)
Socket.dev: Supply chain security analysis. Skipped (50%)
VirusTotal: Multi-engine malware detection. Skipped (50%)
CrowdStrike: Advanced threat intelligence. Skipped (50%)
OSV-Scanner: Open Source Vulnerability database check. Skipped (50%)
OWASP Dep-Check: Skipped (50%)

Last scanned: Apr 12, 2026, 11:15 PM (209.1s, 1 file scanned)

Related Skills

scitix/gateway-diagnostics

testing

Verified, Trusted, Community

Show and ping the gateway of a network interface, on a Kubernetes node or inside a pod's network namespace. Auto-detects the gateway from the routing table (ip -j route), reports interface type (RoCE / Ethernet / IB), and tests reachability with ping. Use for default-route / gateway questions, network reachability checks, RoCE/RDMA data-path validation, and "can this node/pod reach its gateway" investigations.

209 | SKILL.md | Updated Jun 7, 2026

scitix/skill-authoring

development

Verified, Trusted, Community

Guide for writing and improving Siclaw skills. Read this when creating or modifying a skill. Covers skill directory layout, SKILL.md format, script execution modes, and best practices.

209 | SKILL.md | Updated Apr 23, 2026

scitix/node-logs

devops

Verified, Trusted, Community

Retrieve logs from a Kubernetes node. Supports journalctl (systemd units) and file-based logs. Use when you need to inspect node-level logs (containerd, kubelet, etc.). Run via host_script (preferred) or node_script.

209 | SKILL.md | Updated Apr 12, 2026

scitix/manage-skill

development

Verified, Trusted, Community

Guides the user to the Siclaw Web page to manage Skills. Use this guide when the user requests to create, edit, or view a Skill in a Channel conversation.

88 | SKILL.md | Updated Apr 12, 2026

Install manually

# Clone the repo
git clone https://github.com/scitix/siclaw.git

# Copy into Claude Code skills folder (global), creating it if needed
mkdir -p ~/.claude/skills
cp -r siclaw/skills/core/job-debug ~/.claude/skills/

Repository: scitix/siclaw (88 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT
