event-driven-design/SKILL.md
Design, implement, and test event-driven systems — event modelling, schema evolution, transactional outbox, sagas (orchestration and choreography), event sourcing, CQRS, idempotent consumers, and tests that actually catch race conditions. Use whenever the user mentions events, event bus, message broker, Kafka, RabbitMQ, SNS/SQS, EventBridge, Pub/Sub, Kinesis, event sourcing, CQRS, sagas, choreography, outbox, CDC, dead-letter queue, eventual consistency, at-least-once delivery, or when designing asynchronous coupling between services. Also use when reviewing event-driven code for failure modes or converting a synchronous design to event-driven. Do NOT use for fire-and-forget in-process callbacks, UI event listeners, DOM events, plain pub/sub inside a single process, reactive UI state machines, or pure request/response API design (use api-design-principles instead). Do NOT use to pick a specific broker (Kafka vs RabbitMQ vs SQS) — this skill is vendor-neutral.
Design, implement, review, and test event-driven systems. The goal of this skill is to make the default choice the correct one — so the system stays debuggable, evolvable, and resilient as it grows beyond the first three services.
This file is a table of contents. Load referenced files only when the task needs them.
| Reference | When to load |
|---|---|
| references/PATTERNS.md | Choosing between event notification / state transfer / sourcing; designing aggregates, projections, CDC. |
| references/EVENT_SCHEMA.md | Designing a new event, changing an existing one, versioning, schema registry, compatibility rules. |
| references/SAGAS.md | Cross-service workflow, compensating actions, choreography vs orchestration, process managers. |
| references/TESTING.md | Writing tests — unit (given-when-then on aggregates), consumer-driven contracts, integration with Testcontainers, contract compatibility CI gate. |
| references/ANTIPATTERNS.md | Reviewing existing code or debugging "why is this system so painful?" |
| references/EVENT_STORMING.md | Starting from a blank domain — running the workshop, colour coding, translating stickies to bounded contexts. |
Scripts (invoke via bash; only the output consumes context):
- `scripts/validate-event-schema.py <event.json> <schema.json>`: validate an event instance against a JSON Schema envelope.
- `scripts/check-naming.py <path>`: enforce past-tense event names and envelope fields across a directory of schemas.

Most confusion in event-driven design starts here. Get this right first, then everything else composes.
Rule: Pick one mode (event notification, state transfer, or event sourcing) per boundary and state it in the contract. Mixing modes silently inside one topic is the largest source of mystery bugs in mature event systems. (Bellemare, Building Event-Driven Microservices, 2nd Edition, O'Reilly; Fowler's taxonomy as cited in Richardson, Microservices Patterns, O'Reilly.)
- Command (`PlaceOrder`). It can be rejected. Named in the imperative. One handler. Coupling is explicit.
- Event (`OrderPlaced`). It cannot be rejected because it already happened. Named in the past tense. N handlers. Coupling is inverted (the producer does not know who listens).

If you're debating whether something is a command or an event, ask: can a consumer refuse it? If yes, it is a command. If the "event" has a handler that validates and rejects, rename it to a command; you have misplaced authority.
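The distinction can be sketched in a few lines of Python. This is only an illustration under assumptions: the field names, `CommandRejected`, `handle_place_order`, and `notify_subscribers` are hypothetical and not part of this skill's assets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaceOrder:
    """Command: imperative name, one handler, may be rejected."""
    order_id: str
    quantity: int


@dataclass(frozen=True)
class OrderPlaced:
    """Event: past-tense name, immutable fact, any number of subscribers."""
    order_id: str
    quantity: int


class CommandRejected(Exception):
    """Raised by the single command handler when validation fails."""


def handle_place_order(cmd: PlaceOrder) -> OrderPlaced:
    # Authority lives here: the handler may refuse the command.
    if cmd.quantity <= 0:
        raise CommandRejected("quantity must be positive")
    return OrderPlaced(order_id=cmd.order_id, quantity=cmd.quantity)


def notify_subscribers(event: OrderPlaced, subscribers) -> list:
    # Subscribers react to the fact; none of them can veto it.
    return [subscriber(event) for subscriber in subscribers]
```

Validation sits only in the command handler. If a subscriber needs to refuse `OrderPlaced`, that is the sign the message should have been a command.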
These five rules hold across every event-driven system worth running in production. Load references/PATTERNS.md for the detailed treatment.
- Name events in the past tense: `OrderPlaced`, not `PlaceOrder` and not `OrderPlacing`. An immutable fact is the unit of interop. (Perry, The Art of Immutable Architecture, O'Reilly.)
- Every event carries an envelope: `id`, `type`, `version`, `source`, `occurredAt`, `causationId`, `correlationId`, plus the domain payload. The envelope exists so consumers can deduplicate, trace, and evolve without coordinating with the producer. See `assets/event-envelope.schema.json`.
- Consumers deduplicate by `event.id` or by idempotent state transitions, not by trusting the broker. (Kleppmann & Riccomini, DDIA, 2nd Edition, O'Reilly.)
- For versioning and compatibility rules, see `references/EVENT_SCHEMA.md`. (Bellemare, Building Event-Driven Microservices, 2e; Stopford, Designing Event-Driven Systems, O'Reilly.)

Reach for event-driven architecture (EDA) when:
Do NOT reach for it when:
- … `correlationId`, DLQ dashboards). Adopt those first; EDA on top of blind operations is a debugging nightmare.
- … `references/SAGAS.md`.

Adopting EDA is hard to reverse. Bias toward starting synchronous and extracting events at the seams that proved valuable, not toward greenfield EDA on day one.
Use this when adding a single new event to an existing system. For a greenfield domain, run event storming first (references/EVENT_STORMING.md).
- [ ] Event named in past tense, unambiguous, and scoped to one aggregate (`OrderPaid`, not `OrderUpdated`)
- [ ] Fits the chosen mode for this topic: notification, state transfer, or sourcing — pick one
- [ ] Envelope fields filled: id (UUIDv7 or ULID), type, version=1, source, occurredAt, causationId, correlationId
- [ ] Payload contains a stable identifier for every referenced entity (never only mutable display strings)
- [ ] Schema defined (Avro / Protobuf / JSON Schema), checked into the repo next to the producer
- [ ] Backward-compatibility rule declared for the topic (and documented in the schema file header)
- [ ] Producer writes event + state in one local DB transaction via the outbox table
- [ ] A relay (or CDC) publishes outbox rows to the broker — not application code directly
- [ ] Consumers dedupe on event.id (consumed-events table, or idempotent state transition)
- [ ] A replay strategy exists: can a new consumer backfill from the topic or the outbox?
- [ ] Tests: producer aggregate (given-when-then), consumer handler, schema-compatibility, end-to-end with Testcontainers
- [ ] Tracing: causationId/correlationId propagated through every handler; visible in logs
If any box is "no", load the matching reference file and fix it before merging.
    {
      "id": "01H8XYZ...",                        // UUIDv7 or ULID: monotonic + unique
      "type": "OrderPaid",                       // past tense, PascalCase
      "version": 1,                              // integer; bump on breaking changes
      "source": "payments.service",              // stable identifier for the producer
      "occurredAt": "2026-04-23T02:45:12.413Z",  // ISO-8601, UTC, millisecond precision
      "causationId": "01H8XYW...",               // id of the command/event that caused this
      "correlationId": "01H8XYV...",             // end-to-end flow id, constant across the saga
      "data": {
        "orderId": "ord_7Qk2",
        "amount": { "currency": "BRL", "minor": 42900 },
        "paidAt": "2026-04-23T02:45:11.900Z"
      }
    }
Full schema: assets/event-envelope.schema.json. Validate with scripts/validate-event-schema.py event.json assets/event-envelope.schema.json.
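A minimal sketch of building such an envelope in Python, so that `causationId` and `correlationId` propagate from the triggering event. Assumptions, not part of this skill: the helper names `new_event` and `missing_fields` are hypothetical; `uuid.uuid4()` stands in for a UUIDv7 or ULID generator only because it is available in every standard library version; and the first event in a flow is assumed to use its own id for both causation and correlation.

```python
import uuid
from datetime import datetime, timezone

REQUIRED_ENVELOPE_FIELDS = (
    "id", "type", "version", "source",
    "occurredAt", "causationId", "correlationId", "data",
)


def new_event(event_type, source, data, cause=None, version=1):
    """Build an envelope; `cause` is the envelope of the triggering event, if any."""
    # Assumption: uuid4 stands in for the UUIDv7/ULID the envelope calls for.
    event_id = str(uuid.uuid4())
    now = datetime.now(timezone.utc)
    return {
        "id": event_id,
        "type": event_type,
        "version": version,
        "source": source,
        # ISO-8601, UTC, millisecond precision, with a trailing "Z"
        "occurredAt": now.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
        # Assumption: the first event in a flow starts its own correlation.
        "causationId": cause["id"] if cause else event_id,
        "correlationId": cause["correlationId"] if cause else event_id,
        "data": data,
    }


def missing_fields(event):
    """Return the envelope fields absent from `event`, in declaration order."""
    return [f for f in REQUIRED_ENVELOPE_FIELDS if f not in event]
```

A handler that reacts to one event and emits another would pass the incoming envelope as `cause`, which keeps `correlationId` constant across the whole flow. The field check here is far weaker than validating against the JSON Schema.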
Load references/PATTERNS.md for the full catalogue. These three are the ones worth knowing at TOC level:
Transactional outbox

Problem. Writing to the DB and the broker in two network calls is a dual-write. One can succeed while the other fails, silently producing ghost state or lost events.
Solution. Write to an outbox table in the same local transaction as the business state. A separate process (a relay) reads unpublished rows and forwards them to the broker, then marks them published. Rows are eventually consistent with the broker; the DB and outbox are always consistent with each other.
Why this beats "just send to the broker in a finally block": the finally block runs after the transaction commits, which is exactly when the process can crash and you lose the event. The outbox removes that window because the commit itself records the intent to publish.
DDL skeleton: assets/outbox-table.sql. (Richardson, Microservices Patterns, O'Reilly.)
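The pattern can be sketched with the standard-library `sqlite3` module. This is an illustration under assumptions, not the contents of `assets/outbox-table.sql`: the table layout, `open_db`, `pay_order`, and `relay_once` are hypothetical, and `publish` is any callable standing in for a broker client.

```python
import json
import sqlite3


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE outbox ("
        " event_id TEXT PRIMARY KEY,"
        " payload TEXT NOT NULL,"
        " published INTEGER NOT NULL DEFAULT 0)"
    )
    conn.commit()
    return conn


def pay_order(conn, order_id, event):
    # One local transaction: the state change and the intent to publish
    # commit together or roll back together.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, status) VALUES (?, 'PAID')", (order_id,)
        )
        conn.execute(
            "INSERT INTO outbox (event_id, payload) VALUES (?, ?)",
            (event["id"], json.dumps(event)),
        )


def relay_once(conn, publish):
    """Forward unpublished rows to the broker, then mark them published."""
    rows = conn.execute(
        "SELECT event_id, payload FROM outbox WHERE published = 0 ORDER BY rowid"
    ).fetchall()
    for event_id, payload in rows:
        publish(json.loads(payload))  # a crash after this line re-sends the row
        with conn:
            conn.execute(
                "UPDATE outbox SET published = 1 WHERE event_id = ?", (event_id,)
            )
    return len(rows)
```

Because the relay marks a row published only after the send returns, a crash between the two steps causes a re-send rather than a lost event. Delivery is therefore at-least-once, and consumers must deduplicate.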
Idempotent consumer

Problem. Brokers deliver at-least-once. Retries, rebalances, and reprocessing all re-deliver events the consumer already saw. Processing the same `OrderPaid` twice debits the customer twice.
Solution (pick one):
- Consumed-events table: `INSERT … ON CONFLICT DO NOTHING` into `consumed_events(event_id, consumer_name)`. Skip if already there.
- Idempotent state transition: `balance = balance + amount` is not idempotent; `balance_at_seq(seq=42) = X` is.

Do not rely on the broker's "exactly-once" flag alone. It usually means exactly-once inside the broker's storage, not across your external side effects. (Kleppmann & Riccomini, DDIA, 2e, O'Reilly.)
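A minimal sketch of the consumed-events approach, again using `sqlite3` (the `ON CONFLICT DO NOTHING` clause needs SQLite 3.24 or newer). The `ledger` table, `open_consumer_db`, `handle_order_paid`, and the consumer name are hypothetical, chosen only for illustration.

```python
import sqlite3


def open_consumer_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE consumed_events ("
        " event_id TEXT NOT NULL,"
        " consumer_name TEXT NOT NULL,"
        " PRIMARY KEY (event_id, consumer_name))"
    )
    conn.execute(
        "CREATE TABLE ledger (order_id TEXT PRIMARY KEY, minor INTEGER NOT NULL)"
    )
    conn.commit()
    return conn


def handle_order_paid(conn, event, consumer_name="ledger-projector"):
    """Apply the event at most once per consumer; return True if it was applied."""
    with conn:  # the dedupe marker and the side effect commit together
        cur = conn.execute(
            "INSERT INTO consumed_events (event_id, consumer_name) VALUES (?, ?) "
            "ON CONFLICT DO NOTHING",
            (event["id"], consumer_name),
        )
        if cur.rowcount == 0:
            return False  # redelivery: already processed, skip
        conn.execute(
            "INSERT INTO ledger (order_id, minor) VALUES (?, ?)",
            (event["data"]["orderId"], event["data"]["amount"]["minor"]),
        )
        return True
```

The dedupe insert and the business write share one transaction, so a crash cannot record the event as consumed without applying it, or apply it without recording it. This only covers side effects that live in the same database.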
Sagas

Two flavours: choreography and orchestration.
For anything with more than two steps or any compensating logic, prefer orchestration. Load references/SAGAS.md. (Richardson, Microservices Patterns; Newman, Building Microservices, 2e, O'Reilly.)
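As a rough sketch of orchestration with compensating actions: a central coordinator runs the steps in order and, on failure, undoes the completed ones in reverse. `Step` and `run_saga` are hypothetical names. A production orchestrator would also persist its progress and dispatch each step as a command to another service, neither of which this in-memory version attempts.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[dict], None]
    compensate: Callable[[dict], None]


def run_saga(steps: List[Step], ctx: dict) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        try:
            step.action(ctx)
        except Exception as exc:
            ctx["failed_step"] = step.name
            ctx["error"] = str(exc)
            # Undo only what actually completed, newest first.
            for done in reversed(completed):
                done.compensate(ctx)
            return False
        completed.append(step)
    return True
```

The whole workflow, including which compensation pairs with which step, is readable in one place. That is the main argument for orchestration once a flow has more than two steps.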
Concrete failure modes previously seen in production — update this list as you accumulate scars.
- Do not assume `offset=42` means the same thing across topics; use `event.id` for dedupe.
- A payload of `{"userName": "ana"}` without `userId` will break the day someone renames. Always carry stable identifiers.
- … `occurredAt`.
- Partitioning by `customerId` when one customer does 80% of traffic pins throughput to a single consumer. Monitor per-partition lag; re-shard or introduce a sub-key if needed.
- Publishing `UserInfoRequested` and waiting for `UserInfoReplied` reinvents synchronous RPC with worse latency and worse debugging. Use the synchronous call or a materialized read model.

When the task is "review this code" or "why does this feel painful?", load `references/ANTIPATTERNS.md` and walk this order:
- Find every publish site: `rg 'producer\.send|kafkaTemplate\.send|sns:Publish|eventBridge\.putEvents'` plus your local wrappers.
- Look for `*Updated` bags. A `UserUpdated` with 14 optional fields is N events pretending to be one; split it.
- Check schema history (`git log` on the schema files). Breaking changes without a version bump or consumer-coordination plan indicate no governance.

Before declaring a schema or event design done:
- Run `scripts/validate-event-schema.py <sample.json> <schema.json>` against a realistic instance.
- Run `scripts/check-naming.py <schemas-dir>` to catch present-tense names and missing envelope fields.
- Load `references/TESTING.md` and ensure every bullet in the "Minimum viable test pyramid" section has at least one corresponding test committed.

Related skills:

- `api-design-principles`: when the boundary should be synchronous (REST/GraphQL).
- `domain-driven-design`: aggregates, bounded contexts, ubiquitous language. Event storming is a DDD technique; this skill focuses on implementation.
- `hexagonal-architecture` / `clean-architecture`: how to structure a service so its event handlers do not leak into the domain.