api-design-principles/skills/webhooks-and-events/SKILL.md
This skill should be used when the user is designing webhook systems, implementing webhook signing with HMAC-SHA256, building webhook retry logic, choosing event naming conventions, handling webhook ordering, implementing webhook endpoints, or building event-driven API integrations. Covers Stripe-style webhook patterns, signature verification, exponential backoff retries, and event deduplication.
A webhook is a contract: when something happens on your platform, you will tell the subscriber about it reliably, securely, and in a format they can trust. Every failed delivery, every unsigned payload, every missing retry erodes that trust. Stripe delivers billions of webhook events with cryptographic verification, exponential backoff retries over a 72-hour window, and a structured event format that has become the industry standard. Your webhook system must meet the same bar.
The webhook producer carries the burden of proof. Consumers must be able to verify the event is authentic, handle duplicates gracefully, and trust that missed deliveries will be retried.
## The `resource.action` Convention

Name every event type as `resource.action`, dot-separated, with the action in past tense. This is Stripe's convention, and it has become the de facto standard across modern APIs.
```
order.created
order.updated
order.cancelled
payment.succeeded
payment.failed
payment.refunded
invoice.paid
invoice.overdue
invoice.finalized
customer.subscription.created
customer.subscription.updated
customer.subscription.deleted
customer.subscription.trial_will_end
```
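A format check like the one below can catch malformed names before they reach consumers. This is a minimal Python sketch. `is_valid_event_type` is a hypothetical helper, and it verifies only the shape of the name: lowercase snake_case segments joined by dots, with at least `resource.action`. It does not check that the action is in past tense.

```python
import re

# Hypothetical helper. Checks shape only: lowercase snake_case segments
# joined by dots, with at least two segments (resource.action).
# It does not check that the action is in past tense.
EVENT_TYPE_PATTERN = re.compile(r"[a-z][a-z0-9_]*(?:\.[a-z][a-z0-9_]*)+")


def is_valid_event_type(name: str) -> bool:
    """Return True if `name` looks like resource.action or parent.child.action."""
    return EVENT_TYPE_PATTERN.fullmatch(name) is not None


for candidate in ["payment_intent.succeeded", "paymentIntent.succeeded", "order"]:
    print(candidate, is_valid_event_type(candidate))
```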
Rules:

- Use `parent.child.action` for nested resources (e.g., `customer.subscription.created`).
- Use past-tense actions: `.created`, `.updated`, `.deleted`, `.succeeded`, `.failed`. Never `.create` (that sounds like a command) or `.creating` (that implies in-progress).
- Use snake_case for multi-word names: `payment_intent.succeeded`, not `paymentIntent.succeeded`.

Every event follows the same envelope. Consumers parse one format regardless of the event type.
```json
{
  "id": "evt_1NdBKYLkdIwHu7ixr0rMHeVX",
  "type": "order.created",
  "created": 1689956724,
  "api_version": "2024-01-15",
  "data": {
    "object": {
      "id": "ord_01HXK3GJ5V8WJKPT",
      "status": "pending",
      "total": 4999,
      "currency": "usd",
      "customer": "cus_NffrFeUfNV2Hib"
    },
    "previous_attributes": {}
  },
  "request": {
    "id": "req_abc123def456",
    "idempotency_key": "KG5LxwFBepaKHyKt"
  }
}
```
Required fields:
| Field | Purpose |
|-------|---------|
| id | Unique event identifier with evt_ prefix. Consumers use this for deduplication. |
| type | The event type string (resource.action). Drives routing in the consumer. |
| created | Unix timestamp of event creation. Used for ordering and staleness checks. |
| data.object | The full resource in its current state after the event occurred. |
| data.previous_attributes | For .updated events: the fields that changed and their old values. Empty object for other event types. |
| request.id | The API request ID that triggered this event (null for async or system events). Enables end-to-end tracing. |
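A producer-side helper that assembles this envelope might look like the minimal Python sketch below. Several details are assumptions for illustration: the function name `build_event`, the format of the random suffix after `evt_`, and the default `api_version` (copied from the example payload).

```python
import secrets
import time
from typing import Any, Optional


def build_event(event_type: str, obj: dict[str, Any], *,
                previous_attributes: Optional[dict[str, Any]] = None,
                request_id: Optional[str] = None,
                idempotency_key: Optional[str] = None,
                api_version: str = "2024-01-15") -> dict[str, Any]:
    """Assemble the event envelope (hypothetical helper)."""
    # Only .updated events carry old values; every other type gets {}.
    is_update = event_type.endswith(".updated")
    prev = (previous_attributes or {}) if is_update else {}
    return {
        # evt_ prefix plus a random suffix; the suffix format is an assumption.
        "id": "evt_" + secrets.token_hex(12),
        "type": event_type,
        "created": int(time.time()),  # integer Unix seconds, no timezone
        "api_version": api_version,
        "data": {"object": obj, "previous_attributes": prev},  # full snapshot
        # request.id is None (JSON null) for async or system events.
        "request": {"id": request_id, "idempotency_key": idempotency_key},
    }
```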
Design decisions:

- `created` is a Unix timestamp integer, not an ISO 8601 string, so there is no timezone ambiguity.
- `data.object` contains the full resource snapshot, not a diff. Consumers can reconstruct current state from a single event without calling the API.
- `previous_attributes` is populated only for `.updated` events and is an empty object otherwise.

Sign every webhook payload with HMAC-SHA256, and include a timestamp in the signed content to prevent replay attacks. This is non-negotiable: an unsigned webhook endpoint is an open door for attackers to inject fake events.
The signature header:

```
X-Webhook-Signature: t=1689956724,v1=5257a869e7ecebeda32affa62cdca3fa51cad7e77a0e56ff536d0ce8e108d8bd
```
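Signing algorithm (sender side):

1. Build the signed content as `{timestamp}.{payload}`, where `{payload}` is the exact request body being sent.
2. Compute the HMAC-SHA256 of the signed content with the endpoint's secret and hex-encode it.
3. Send the header as `t={timestamp},v1={hex_signature}`.

Verification algorithm (receiver side):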
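1. Parse `t` and `v1` from the signature header.
2. Reject the request if `t` is more than 5 minutes old.
3. Recompute the HMAC-SHA256 of `{t}.{raw_body}` with the endpoint's secret.
4. Compare the result with `v1` using a constant-time comparison function.
5. If the header is missing or malformed, the timestamp is stale, or the signatures do not match, return `400 Bad Request`.

Critical implementation details:

- Always use a constant-time comparison (`crypto.timingSafeEqual` in Node.js, `hmac.compare_digest` in Python). Standard string equality leaks timing information that attackers can exploit to reconstruct the signature byte by byte.
- Prefix webhook secrets with `whsec_` (e.g., `whsec_test_51MqLiJLkdIwH`). Return the secret only once, at endpoint creation time. If the consumer loses it, rotate it and issue a new one.

The timestamp in the signature header prevents replay attacks. Without it, an attacker who intercepts a valid webhook payload can replay it days later, and the signature will still verify.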
Rule: Reject any event where the timestamp is more than 5 minutes old. This tolerance window accounts for clock skew between servers while keeping the replay window tight.
```
Attacker captures: t=1689956724,v1=abc123...
Replays 2 hours later.
Receiver: current_time - 1689956724 = 7200 seconds > 300 seconds
=> REJECTED
```
Five minutes is the standard tolerance. Stripe uses 300 seconds. If your infrastructure has exceptional clock skew, widen to 10 minutes but never more.
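The signing and verification steps, including the 300-second tolerance, can be sketched with Python's standard `hmac` and `hashlib` modules. The names `sign` and `verify` are hypothetical, as are the optional `timestamp` and `now` parameters, which exist only to make the sketch testable. A real receiver would map a `False` result to `400 Bad Request`.

```python
import hashlib
import hmac
import time
from typing import Optional

TOLERANCE_SECONDS = 300  # reject signatures more than 5 minutes old


def sign(payload: bytes, secret: str, timestamp: Optional[int] = None) -> str:
    """Sender side: return a header value of the form t={ts},v1={hex}."""
    ts = int(time.time()) if timestamp is None else timestamp
    signed_content = f"{ts}.".encode() + payload  # {timestamp}.{payload}
    digest = hmac.new(secret.encode(), signed_content, hashlib.sha256).hexdigest()
    return f"t={ts},v1={digest}"


def verify(raw_body: bytes, header: str, secret: str,
           now: Optional[int] = None) -> bool:
    """Receiver side: True only if the signature is valid and fresh."""
    try:
        parts = dict(item.split("=", 1) for item in header.split(","))
        ts = int(parts["t"])
        received = parts["v1"]
    except (KeyError, ValueError):
        return False  # malformed header
    current = int(time.time()) if now is None else now
    if current - ts > TOLERANCE_SECONDS:
        return False  # stale timestamp: possible replay
    expected = hmac.new(secret.encode(), f"{ts}.".encode() + raw_body,
                        hashlib.sha256).hexdigest()
    # Constant-time comparison on bytes; never use == for signatures.
    return hmac.compare_digest(expected.encode(), received.encode())
```

`verify` hashes the raw request bytes. Parsing the JSON and re-serializing it before verification can change whitespace or key order, and the signature will then fail to match.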
When delivery fails — the endpoint returns a non-2xx status code, the connection times out, or DNS resolution fails — retry with exponential backoff. Do not drop events after a single failure.
Recommended retry schedule (times are measured from the first attempt):

```
Attempt 1: Immediate
Attempt 2: 5 minutes
Attempt 3: 30 minutes
Attempt 4: 2 hours
Attempt 5: 8 hours
Attempt 6: 24 hours
Attempt 7: 48 hours
Final: 72 hours after first attempt
```
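One way to drive this schedule is to store the time of the first attempt and the number of attempts made, then look up the next offset. The minimal Python sketch below assumes that the listed times are offsets from the first attempt and that the 72-hour entry is the last delivery attempt. `next_attempt_at` and `is_failure` are hypothetical names.

```python
from datetime import datetime, timedelta
from typing import Optional

# Offsets from the first attempt, taken from the schedule above.
# Treating the 72-hour entry as the last delivery attempt is an assumption.
RETRY_OFFSETS = [
    timedelta(0),
    timedelta(minutes=5),
    timedelta(minutes=30),
    timedelta(hours=2),
    timedelta(hours=8),
    timedelta(hours=24),
    timedelta(hours=48),
    timedelta(hours=72),
]


def is_failure(status_code: Optional[int]) -> bool:
    """None stands for a timeout, connection error, or DNS failure."""
    return status_code is None or not 200 <= status_code < 300


def next_attempt_at(first_attempt: datetime, attempts_made: int) -> Optional[datetime]:
    """When to try again after `attempts_made` failed attempts; None means give up."""
    if attempts_made >= len(RETRY_OFFSETS):
        return None  # schedule exhausted: e.g., dead-letter the event, alert the owner
    return first_attempt + RETRY_OFFSETS[attempts_made]
```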
Rules:
Webhooks provide at-least-once delivery, never exactly-once. Network failures, retries, and edge cases mean the same event may arrive more than once. Every webhook consumer must be idempotent.
The pattern: Before processing an event, check if event.id has already been processed. If it has, return 200 OK without re-processing. Record the event ID within the same database transaction as the business logic to avoid race conditions.
```
Incoming event: evt_1NdBKYLkdIwHu7ixr0rMHeVX
  → Check processed_events table for this ID
  → Already exists? Return 200, skip processing
  → Not found? BEGIN transaction:
      1. INSERT into processed_events
      2. Execute business logic
      3. COMMIT
  → Return 200
```
The deduplication check and the business logic must happen atomically. Suppose you check for duplicates, process the event, and record the event ID as three separate steps. A crash after processing but before recording will then cause the event to be re-processed on the next delivery.
Events may arrive out of order. A customer.subscription.updated event might arrive before the customer.subscription.created event if the first delivery attempt for created fails and is retried.
The pattern: Use the `created` timestamp to determine which event reflects the latest state. Implement a latest-wins strategy:

1. Store a `last_event_timestamp` alongside each resource.
2. When an event arrives, compare its `created` timestamp to the stored `last_event_timestamp`.
3. If the event is older, skip it; otherwise, apply it and update `last_event_timestamp`.

```
Event A (created: 1000): customer.subscription.created → arrives second
Event B (created: 1001): customer.subscription.updated → arrives first

Processing B first: upsert with last_event_timestamp = 1001
Processing A second: created = 1000 < last_event_timestamp = 1001 → skip
```
This eliminates the brittle assumption that created always arrives before updated. Build your handlers to work regardless of event order.
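Here is a minimal Python sketch of the latest-wins upsert. For brevity it uses an in-memory dict as a stand-in for a database table, which is an assumption. `apply_latest_wins` is a hypothetical name.

```python
from typing import Any

# In-memory stand-in for a table keyed by resource id (an assumption for brevity).
subscriptions: dict[str, dict[str, Any]] = {}


def apply_latest_wins(event: dict[str, Any]) -> bool:
    """Upsert the event's object unless a newer event was already applied."""
    obj = event["data"]["object"]
    stored = subscriptions.get(obj["id"])
    if stored is not None and event["created"] < stored["last_event_timestamp"]:
        return False  # older than the stored state: skip
    subscriptions[obj["id"]] = {**obj, "last_event_timestamp": event["created"]}
    return True
```

This sketch applies ties (equal `created` values) in arrival order, which is a choice of the sketch, not a rule. In a real database, run the comparison and the upsert in one transaction, ideally the same one that records the event ID for deduplication.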
For webhook producers (your API):

- … 200.

For webhook consumers (your users):

- Return `200 OK` within 5 seconds. Acknowledge immediately, then process the event asynchronously using a background job queue.

Let consumers register endpoints via your API and filter which event types they receive. Do not require manual configuration in a dashboard.
```
POST /v1/webhook_endpoints
{
  "url": "https://example.com/webhooks",
  "enabled_events": ["order.created", "payment.succeeded", "payment.failed"],
  "description": "Production payment events"
}

→ 201 Created
{
  "id": "wh_1MqVTHLkdIwHu7ix5RbKdAnA",
  "url": "https://example.com/webhooks",
  "status": "enabled",
  "enabled_events": ["order.created", "payment.succeeded", "payment.failed"],
  "secret": "whsec_live_abc123...",
  "created": 1689956724
}
```
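On the producer side, routing an event means selecting the enabled endpoints whose `enabled_events` list contains either the event type or the `["*"]` wildcard. This is a minimal Python sketch. The `WebhookEndpoint` class, its methods, and the secret format after the `whsec_` prefix are all hypothetical.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class WebhookEndpoint:
    # Hypothetical in-memory model of the endpoint resource.
    id: str
    url: str
    enabled_events: list[str]
    status: str = "enabled"
    secret: str = field(default_factory=lambda: "whsec_" + secrets.token_hex(24),
                        repr=False)

    def to_public_dict(self, include_secret: bool = False) -> dict:
        """Serialize for the API; pass include_secret=True only on creation."""
        body = {"id": self.id, "url": self.url, "status": self.status,
                "enabled_events": self.enabled_events}
        if include_secret:
            body["secret"] = self.secret
        return body

    def wants(self, event_type: str) -> bool:
        """True if this endpoint should receive `event_type`."""
        return self.status == "enabled" and (
            "*" in self.enabled_events or event_type in self.enabled_events)


def endpoints_for(event_type: str, endpoints: list[WebhookEndpoint]) -> list[WebhookEndpoint]:
    """Select the endpoints that should receive an event of this type."""
    return [ep for ep in endpoints if ep.wants(event_type)]
```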
Rules:

- Return the `secret` only on creation. Never return it on subsequent GET or LIST requests.
- Support `enabled_events` filtering. A consumer processing only payments should not receive order events.
- Accept `["*"]` to subscribe to all event types.
- Include a `status` field (`enabled` / `disabled`) that reflects whether the endpoint is active.
- Provide an events endpoint (`GET /v1/events`) that lets consumers list recent events for debugging, with filters by type and date range.

A webhook system without monitoring is a black hole. Build these from day one:
Delivery dashboard. Show every delivery attempt for every endpoint: timestamp, event type, HTTP status code, response time, and whether it was a retry. Let consumers see their own endpoint's delivery history.
Failure alerts. When an endpoint accumulates consecutive failures (e.g., 5 in a row), send an email alert to the owner. Do not wait until the 72-hour retry window expires.
Manual replay. Provide a dashboard button and an API endpoint (POST /v1/events/:id/retry) to re-trigger delivery of a specific event. Replays recover data after consumer bug fixes.
Metrics. Track delivery success rate, p95 response time, retry rate, and failure rate per endpoint. Use these for automatic endpoint disabling decisions.
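The failure alerts and metrics described above can start as simple per-endpoint counters. Below is a minimal Python sketch that assumes the example threshold of 5 consecutive failures and computes a nearest-rank p95. `EndpointStats` is a hypothetical name, and persistence is left out.

```python
from dataclasses import dataclass, field
from typing import Optional

ALERT_AFTER_CONSECUTIVE_FAILURES = 5  # example threshold from the text


@dataclass
class EndpointStats:
    """Hypothetical per-endpoint delivery counters."""
    attempts: int = 0
    successes: int = 0
    retries: int = 0
    consecutive_failures: int = 0
    response_times_ms: list[float] = field(default_factory=list)

    def record(self, ok: bool, response_time_ms: float, is_retry: bool) -> bool:
        """Record one delivery attempt; return True when an alert should fire."""
        self.attempts += 1
        self.retries += int(is_retry)
        self.response_times_ms.append(response_time_ms)
        if ok:
            self.successes += 1
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        # Fire once, at the moment the failure streak reaches the threshold.
        return self.consecutive_failures == ALERT_AFTER_CONSECUTIVE_FAILURES

    @property
    def success_rate(self) -> float:
        return self.successes / self.attempts if self.attempts else 0.0

    @property
    def p95_response_ms(self) -> Optional[float]:
        """Nearest-rank 95th percentile of recorded response times."""
        if not self.response_times_ms:
            return None
        ordered = sorted(self.response_times_ms)
        rank = max(1, (95 * len(ordered) + 99) // 100)  # ceil(0.95 * n) in integers
        return ordered[rank - 1]
```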
Working implementations in `examples/`:

- `examples/webhook-signature-verification.md` -- Complete HMAC-SHA256 webhook signature generation (sender) and verification (receiver) with timestamp validation in Node.js and Python
- `examples/webhook-retry-system.md` -- Webhook delivery system with exponential backoff, dead letter queue, and delivery status tracking in Node.js and Python

When reviewing or building webhook systems:
- Event types follow the `resource.action` naming convention with past-tense actions
- Every event includes `id`, `type`, `created`, `data.object`, `request`
- Duplicate events are detected by `event.id` and skipped