jaykim88/background-jobs

Name: background-jobs
Author: jaykim88

plugins/backend-toolkit/skills/background-jobs/SKILL.md

npx skillsauth add jaykim88/claude-ai-engineering background-jobs

Background Jobs

Purpose

Move slow, external, or deferrable work off the request path into a durable queue with proper retries, dead-lettering, and idempotent handlers — so failures are visible and recoverable, not silent.

Universal — queue design, retry/backoff, DLQ, concurrency control, and idempotent handlers are job-processing principles; BullMQ is the default implementation.

Procedure

1. Decide what belongs in a job
- Slow (> ~1s), external I/O, deferrable, or retryable work → job
- Keep the request fast; return early, do the work async
- Don't queue work that must be synchronous for the user's next action
2. Make every handler idempotent
- At-least-once execution means a job CAN run more than once (retry, redelivery)
- Dedupe on a job/business key; design the side effect to be safe to repeat (see resilience-patterns idempotency)
3. Configure retries: exponential backoff + jitter, capped
- Set attempts + backoff: { type: 'exponential' } (+ jitter)
- Distinguish retryable (timeout, 503) from non-retryable (validation, 4xx) failures; don't retry the un-retryable
4. Dead-letter queue for poison messages
- After max attempts, move to a DLQ instead of dropping or infinite-retrying
- Alert on DLQ growth; provide a replay path after the bug is fixed
5. Control concurrency and apply backpressure
- Set worker concurrency to protect downstream (DB pool, rate-limited API)
- Rate-limit jobs hitting a quota'd external service
- Producers outpacing consumers → queue depth climbs. Decide the policy: slow the producer, circuit-break the producing endpoint, or shed load (return 429 or drop low-priority work). Unbounded queue depth is a delayed outage, not a working system
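
The backpressure decision in step 5 can be sketched without any queue library. This is a minimal illustration, not the skill's own code: `admit`, `BackpressurePolicy`, and both thresholds are hypothetical, and the queue depth is passed in as a plain number where a real producer would read it from the queue.

```typescript
// Hypothetical admission policy: decide what a producer should do with new
// work based on the current queue depth. Names and limits are illustrative.
type Admission = "enqueue" | "shed" | "reject";

interface BackpressurePolicy {
  softLimit: number; // at or above this depth, drop low-priority work
  hardLimit: number; // at or above this depth, reject everything (e.g. HTTP 429)
}

function admit(
  queueDepth: number,
  priority: "high" | "low",
  policy: BackpressurePolicy,
): Admission {
  if (queueDepth >= policy.hardLimit) return "reject";
  if (queueDepth >= policy.softLimit && priority === "low") return "shed";
  return "enqueue";
}

const policy: BackpressurePolicy = { softLimit: 1000, hardLimit: 5000 };

console.log(admit(200, "low", policy)); // enqueue
console.log(admit(1500, "low", policy)); // shed
console.log(admit(1500, "high", policy)); // enqueue
console.log(admit(5000, "high", policy)); // reject
```

Two thresholds keep high-priority work flowing while the queue is merely busy and reserve outright rejection for the point where the backlog itself is the problem.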

5b. Set a visibility timeout / heartbeat for long jobs
- A worker that dies mid-job leaves the job "in progress". Without a visibility timeout it is stuck forever; with one, the job becomes eligible for redelivery once the timeout lapses, while healthy workers keep their claim by sending heartbeats
- Tune the timeout to slightly above the job's p99 duration; emit heartbeats for genuinely long jobs
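
The interaction between a visibility timeout and heartbeats can be modelled in a few lines. This sketch assumes an in-memory lease table; `LeaseTable` and its methods are hypothetical names, and timestamps are passed in explicitly so the behaviour is deterministic. In BullMQ the analogous mechanism is, as far as I know, the worker's job lock and stalled-job detection rather than anything you write yourself.

```typescript
// In-memory model of a visibility timeout with heartbeats (illustrative only).
class LeaseTable {
  // jobId -> epoch ms after which the job may be handed to another worker
  private expiresAt = new Map<string, number>();

  constructor(private readonly visibilityTimeoutMs: number) {}

  // A worker claims a job: it stays invisible to others until the lease expires.
  claim(jobId: string, now: number): void {
    this.expiresAt.set(jobId, now + this.visibilityTimeoutMs);
  }

  // A healthy worker extends its lease while it is still making progress.
  // Returns false if the lease is unknown or has already expired.
  heartbeat(jobId: string, now: number): boolean {
    const expiry = this.expiresAt.get(jobId);
    if (expiry === undefined || expiry <= now) return false;
    this.expiresAt.set(jobId, now + this.visibilityTimeoutMs);
    return true;
  }

  // Jobs whose lease has lapsed are eligible for redelivery.
  expired(now: number): string[] {
    return Array.from(this.expiresAt.entries())
      .filter(([, expiry]) => expiry <= now)
      .map(([jobId]) => jobId);
  }
}

const table = new LeaseTable(30000);
table.claim("job-a", 0); // worker A stays healthy
table.claim("job-b", 0); // worker B crashes and never heartbeats
const renewed = table.heartbeat("job-a", 20000);
const redeliverable = table.expired(35000);
console.log(renewed, redeliverable); // true [ 'job-b' ]
```

Because a lapsed lease leads to redelivery, a job that is merely slow can also run twice, which is one more reason the handler has to be idempotent.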

5c. Retain only what you need (job table cleanup)
- Completed and failed jobs accumulate: set removeOnComplete (or equivalent) with a retention count/age; archive critical job history elsewhere if needed
- An unbounded job store eats Redis / DB memory silently
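
A count-plus-age retention rule is easy to state precisely. The sketch below is an assumption-laden illustration of what such a rule does, not how any queue implements it: `prune`, `FinishedJob`, and `Retention` are hypothetical names, and the job store is a plain array.

```typescript
// Illustrative retention rule: keep finished jobs only if they are recent
// enough AND among the newest `maxCount`. Everything else is dropped.
interface FinishedJob {
  id: string;
  finishedAt: number; // epoch ms
}

interface Retention {
  maxAgeMs: number;
  maxCount: number;
}

function prune(jobs: FinishedJob[], now: number, retention: Retention): FinishedJob[] {
  return jobs
    .filter((job) => now - job.finishedAt <= retention.maxAgeMs) // age limit
    .sort((a, b) => b.finishedAt - a.finishedAt) // newest first
    .slice(0, retention.maxCount); // count limit
}

const finished: FinishedJob[] = [
  { id: "a", finishedAt: 1000 },
  { id: "b", finishedAt: 5000 },
  { id: "c", finishedAt: 9000 },
  { id: "d", finishedAt: 9500 },
];

const kept = prune(finished, 10000, { maxAgeMs: 6000, maxCount: 2 });
console.log(kept.map((job) => job.id)); // [ 'd', 'c' ]
```

Anything that must outlive this window (payment records, audit trails) belongs in a separate archive written before the job is pruned.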

6. Make jobs observable
- Log job start/end/failure with correlation id (see observability-setup)
- Track queue depth, processing latency, failure rate
7. Validate (validation loop)
- Force a job to fail → verify retry with backoff, then DLQ after cap (not infinite, not silent)
- Run a job twice → verify idempotent (no duplicate side effect)
- If a duplicate side effect occurs → handler not idempotent; fix and re-test
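
The double-run check in step 7 might look like the following. It is a sketch under stated assumptions: `ChargeHandler`, `ChargeJob`, and the key format are hypothetical, and the dedupe store is an in-memory `Set` standing in for a durable table with a unique key.

```typescript
// Illustrative idempotent handler: a repeated delivery of the same job is
// detected via its idempotency key and produces no second side effect.
interface ChargeJob {
  idempotencyKey: string;
  amountCents: number;
}

class ChargeHandler {
  private processed = new Set<string>(); // stand-in for a durable dedupe table
  totalChargedCents = 0; // stand-in for the real side effect

  handle(job: ChargeJob): "charged" | "duplicate" {
    if (this.processed.has(job.idempotencyKey)) return "duplicate";
    this.totalChargedCents += job.amountCents;
    this.processed.add(job.idempotencyKey);
    return "charged";
  }
}

// Validation loop: deliver the same job twice and confirm one side effect.
const handler = new ChargeHandler();
const job: ChargeJob = { idempotencyKey: "order-42:charge", amountCents: 1999 };

const first = handler.handle(job);
const second = handler.handle(job);
console.log(first, second, handler.totalChargedCents); // charged duplicate 1999
```

The check-then-act here is only safe because the example is single-threaded. With concurrent workers the dedupe record and the side effect need to be made atomic, for example by relying on a unique constraint rather than a prior lookup.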

Anti-patterns

| ❌ Anti-pattern | ✅ Correct |
|---|---|
| Non-idempotent handler (retry double-charges) | Idempotent handler (dedupe key) |
| Retry forever on any failure | Capped attempts → DLQ; retry only transient errors |
| Failed jobs silently dropped | DLQ + alert + replay path |
| Unbounded concurrency exhausting DB pool | Tuned concurrency + rate limits |
| Blocking the request on slow work | Enqueue, return fast |
| No visibility timeout / heartbeat → job stuck "in progress" forever on worker crash | Visibility timeout slightly above p99; heartbeat for long jobs |
| Completed jobs accumulating in Redis until OOM | removeOnComplete retention + archive critical history elsewhere |
| Producer faster than consumers; queue grows unbounded | Backpressure policy (slow producer / 429 / shed low-priority) |
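
The "capped attempts → DLQ; retry only transient errors" row can be made concrete with a small retry loop. Everything named here is hypothetical (`runWithRetries`, `isRetryable`, `JobFailure`, the array used as a dead-letter queue), and the backoff wait between attempts is deliberately left out to keep the sketch synchronous.

```typescript
// Illustrative retry loop: transient failures are retried up to a cap,
// permanent failures go straight to the dead-letter list.
interface JobFailure {
  message: string;
  status?: number; // HTTP-like status, if the failure has one
  timedOut?: boolean;
}

interface DeadLetter {
  jobId: string;
  reason: string;
  attempts: number;
}

// Timeouts and 5xx responses are treated as transient; everything else is not.
function isRetryable(failure: JobFailure): boolean {
  if (failure.timedOut) return true;
  return failure.status !== undefined && failure.status >= 500;
}

function runWithRetries(
  jobId: string,
  handler: (attempt: number) => JobFailure | null, // null means success
  maxAttempts: number,
  deadLetters: DeadLetter[],
): "done" | "dead-lettered" {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const failure = handler(attempt);
    if (failure === null) return "done";
    if (!isRetryable(failure) || attempt === maxAttempts) {
      deadLetters.push({ jobId, reason: failure.message, attempts: attempt });
      return "dead-lettered";
    }
    // A real worker would wait for the backoff delay before the next attempt.
  }
  return "dead-lettered"; // only reached if maxAttempts < 1
}

const dlq: DeadLetter[] = [];
const flaky = runWithRetries("a", (n) => (n < 3 ? { message: "upstream 503", status: 503 } : null), 5, dlq);
const invalid = runWithRetries("b", () => ({ message: "invalid payload", status: 422 }), 5, dlq);
const down = runWithRetries("c", () => ({ message: "timeout", timedOut: true }), 5, dlq);

console.log(flaky, invalid, down); // done dead-lettered dead-lettered
console.log(dlq.map((d) => `${d.jobId}:${d.attempts}`)); // [ 'b:1', 'c:5' ]
```

The validation failure is dead-lettered after a single attempt, while the persistent timeout uses all five. Both end up somewhere visible instead of being dropped or retried forever.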

Severity tiers

| Tier | Examples | Action SLA |
|---|---|---|
| Critical | Non-idempotent job double-charging on retry; failed payment jobs silently dropped | Fix immediately |
| Major | No DLQ (infinite retry or data loss); no backoff (retry storm) | Fix this sprint |
| Minor | Concurrency untuned; missing queue-depth metric | Schedule within 2 sprints |

Completion Criteria

- [ ] Handlers idempotent (verified by double-run)
- [ ] Retries capped with exponential backoff + jitter
- [ ] DLQ configured + alerted
- [ ] Concurrency tuned to protect downstream
- [ ] Queue depth + failure rate observable

Output

Job definitions + worker config: retry/backoff/concurrency/DLQ
Idempotency keys per job type
Commit format: feat(jobs): add <job> with DLQ + backoff / fix(jobs): make <handler> idempotent

Implementation

TypeScript + BullMQ + Redis (default)

defaultJobOptions: { attempts: 5, backoff: { type: 'exponential', delay: 1000 } } (+ jitter via custom strategy)
DLQ: a separate queue; on a BullMQ failed event, once the job's attempts are exhausted, move the job there
Concurrency: new Worker(name, fn, { concurrency: N, limiter: { max, duration } })
Idempotent: dedupe table on job key, or use jobId for natural dedup
See BullMQ "Going to Production"
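
The "+ jitter via custom strategy" note above leaves the delay calculation open. One possible shape is exponential growth from the base delay, a cap, and "full jitter" (a random delay between zero and the capped value). The jitter variant, the cap, and the name `backoffDelayMs` are choices made for this sketch, not something the skill prescribes, and wiring the function into BullMQ is not shown.

```typescript
// Exponential backoff with full jitter, capped. `random` is injectable so the
// function is deterministic under test; it defaults to Math.random.
function backoffDelayMs(
  attemptsMade: number, // 1 for the first retry
  baseDelayMs: number = 1000,
  maxDelayMs: number = 60000,
  random: () => number = Math.random,
): number {
  const exponential = baseDelayMs * Math.pow(2, attemptsMade - 1);
  const capped = Math.min(exponential, maxDelayMs);
  return Math.round(random() * capped); // anywhere from 0 up to the capped delay
}

// Upper bounds per attempt (random fixed at 1 to show the ceiling).
const ceilings = [1, 2, 3, 4, 5].map((n) => backoffDelayMs(n, 1000, 60000, () => 1));
console.log(ceilings); // [ 1000, 2000, 4000, 8000, 16000 ]

// Midpoint of the range for the third retry.
console.log(backoffDelayMs(3, 1000, 60000, () => 0.5)); // 2000

// Late attempts are held at the cap instead of growing without bound.
console.log(backoffDelayMs(10, 1000, 60000, () => 1)); // 60000
```

Spreading retries across the whole interval keeps a burst of jobs that failed together from all retrying at the same instant, which is the retry storm the severity table warns about.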

Other stacks

Python: Celery (acks_late=True, max_retries, autoretry_for) + dead-letter via routing
Go: Asynq or River — same retry/DLQ/concurrency model
Universal: idempotency + capped backoff + DLQ are queue-agnostic; the at-least-once contract is the same everywhere

Related skills

async-messaging — the outbox relay and event consumers run as jobs
resilience-patterns — job retries reuse backoff + idempotency
webhook-design — received webhooks are processed as jobs

Reference

Key insight encoded: Configure exponential backoff with jitter in defaultJobOptions, cap attempts, and make handlers idempotent — at-least-once execution means a job can run more than once on retry.

Run work off the request thread reliably — queue design, retries with exponential backoff + jitter, dead-letter queues, concurrency control, and idempotent handlers. Use when an operation is slow/external, when jobs fail silently, or when retries cause duplicates. Not for write+event transactional reliability — the dual-write problem (use async-messaging Outbox) or webhook-receiver specifics (use webhook-design).

development

Updated Jun 9, 2026

npx skillsauth add jaykim88/claude-ai-engineering background-jobs

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | What it checks | Status | Reported % |
|---|---|---|---|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Jun 9, 2026, 8:25 AM · 162.0s · 1 file scanned

SKILL.md

name: background-jobs
description: Run work off the request thread reliably: queue design, retries with exponential backoff + jitter, dead-letter queues, concurrency control, and idempotent handlers. Use when an operation is slow/external, when jobs fail silently, or when retries cause duplicates. Not for write+event transactional reliability, i.e. the dual-write problem (use async-messaging Outbox), or webhook-receiver specifics (use webhook-design).
license: MIT

Related Skills

jaykim88/webhook-design (development)
Verified, Trusted, Community. SKILL.md updated Jun 9, 2026.
Design webhooks correctly on both sides: sending (HMAC signing, retries with backoff, at-least-once) and receiving (verify signature on raw body, enqueue + 200 fast, dedupe on event id). Use when adding webhook delivery or consuming a provider's webhooks. Not for internal service-to-service events (use async-messaging) or general outbound-call retry policy (use resilience-patterns).

jaykim88/transaction-management (testing)
Verified, Trusted, Community. SKILL.md updated Jun 9, 2026.
Use transactions and isolation levels correctly: keep them short, no network calls inside, explicit isolation, retry on serialization conflicts, and choose optimistic vs pessimistic locking. Use when a write spans multiple tables, when concurrent updates corrupt data, or when designing money/inventory flows. Not for cross-service event delivery (use async-messaging Outbox) or schema-level constraints (use schema-design).

jaykim88/test-strategy (development)
Verified, Trusted, Community. SKILL.md updated Jun 9, 2026.
Backend testing pyramid: unit for pure logic, integration against a real DB (Testcontainers), and consumer-driven contract testing (Pact) for service boundaries. Use before a feature, after a bug fix, or when services break each other on deploy. Not for load testing (use performance-profiling) or security testing (use backend-security-audit).

jaykim88/schema-design (data-ai)
Verified, Trusted, Community. SKILL.md updated Jun 9, 2026.
Design a relational schema: normalize to 3NF then denormalize with justification, choose the right Postgres index type per data shape, enforce constraints at the DB. Use when modeling a new domain, when queries are slow, or before a migration. Not for diagnosing slow queries (use query-optimization) or shipping the change without downtime (use migration-strategy).

Install manually

# Clone the repo
git clone https://github.com/jaykim88/claude-ai-engineering.git

# Copy into Claude Code skills folder (global)
cp -r claude-ai-engineering/plugins/backend-toolkit/skills/background-jobs ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository: jaykim88/claude-ai-engineering

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT