skills/microservices-architecture/SKILL.md
Microservices architecture patterns, trade-offs, and implementation guidance. Use when the user asks about service decomposition, bounded contexts, API gateways, distributed transactions, saga pattern, circuit breaker, service mesh, event-driven architecture, CQRS, event sourcing, message brokers, service discovery, distributed tracing, contract testing, strangler fig migration, polyglot persistence, eventual consistency, inter-service communication, or any other microservices design question, operational concern, or architectural decision.
Patterns, trade-offs, and practical guidance for designing, building, and operating microservices systems.
Start with a well-structured monolith. Extract services only when you have a clear, justified reason. Premature decomposition creates distributed monoliths.
Service boundaries should align with business domains, not technical layers.
Bounded Contexts define the boundaries within which a particular domain model applies. Each microservice should map to one bounded context.
Order Context          Inventory Context      Shipping Context
+-----------------+    +-----------------+    +-----------------+
| Order           |    | Product         |    | Shipment        |
| OrderLine       |    | StockLevel      |    | Carrier         |
| Payment         |    | Warehouse       |    | TrackingEvent   |
+-----------------+    +-----------------+    +-----------------+
Define explicit relationships between bounded contexts:
REST (HTTP/JSON)
Version APIs through the URL path (/v1/orders) or headers.
gRPC (Protocol Buffers)
Service contracts are defined in .proto files:
service OrderService {
  rpc CreateOrder (CreateOrderRequest) returns (OrderResponse);
  rpc StreamOrderUpdates (OrderQuery) returns (stream OrderUpdate);
}
Message Queues (Point-to-Point)
Events (Publish-Subscribe)
Choose async when:
Mobile App  --\
Web App     ----> API Gateway ----> Order Service
Partner API --/       |      \-----> Inventory Service
                      |       \----> User Service
                      v
               Authentication
               Rate Limiting
               Request Routing
               Response Aggregation
               Protocol Translation
               Logging / Metrics
Backend for Frontend (BFF) is a variant in which each client type (mobile, web, partner) gets its own gateway tailored to its needs, avoiding a one-size-fits-all gateway.
Services need to locate each other at runtime when instances scale dynamically.
Client-side discovery: the client queries a service registry and selects an instance itself.
Client -> Service Registry -> [Instance A, Instance B, Instance C]
Client -> Instance B (selected via load balancing)
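The client-side flow above can be sketched as follows. This is a minimal illustration, assuming an in-memory registry; `ServiceRegistry`, `RoundRobinClient`, and the addresses are hypothetical names, and round-robin stands in for whatever load-balancing policy the client actually uses.

```python
class ServiceRegistry:
    """Hypothetical in-memory registry; a real one is a networked service."""

    def __init__(self):
        self._instances = {}

    def register(self, service, address):
        self._instances.setdefault(service, []).append(address)

    def lookup(self, service):
        # Return a copy so callers cannot mutate the registry's state.
        return list(self._instances.get(service, []))


class RoundRobinClient:
    """Client that asks the registry for instances and picks one itself."""

    def __init__(self, registry, service):
        self._registry = registry
        self._service = service
        self._next = 0

    def choose_instance(self):
        instances = self._registry.lookup(self._service)
        if not instances:
            raise LookupError(f"no instances registered for {self._service}")
        instance = instances[self._next % len(instances)]
        self._next += 1
        return instance
```

Looking the instances up on every call keeps the sketch simple; a real client would typically cache the list and refresh it periodically.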
Server-side discovery: the client sends requests to a load balancer, which queries the registry on the client's behalf.
Client -> Load Balancer -> Service Registry -> Instance A
In Kubernetes environments, built-in DNS-based discovery (for example, order-service.default.svc.cluster.local) often eliminates the need for a separate registry.
Instead of storing current state, store the sequence of events that led to the current state.
Events (append-only log):
1. OrderCreated { orderId: 42, items: [...], total: 150.00 }
2. PaymentReceived { orderId: 42, amount: 150.00 }
3. OrderShipped { orderId: 42, trackingId: "XYZ123" }
Current state is derived by replaying events.
Benefits: Full audit trail, temporal queries ("what was the state at time T"), ability to rebuild read models, supports event replay for debugging.
Costs: Increased storage, more complex queries, eventual consistency, schema evolution for events requires care.
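Deriving state by replay can be sketched as a fold over the event log. This is a minimal illustration using the three events listed above (the elided `items` field is left out); `OrderState`, `apply_event`, and `replay` are hypothetical names, and the `upto` parameter is one simple way to answer a temporal query.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class OrderState:
    order_id: Optional[int] = None
    total: float = 0.0
    paid: float = 0.0
    status: str = "new"
    tracking_id: Optional[str] = None


EVENTS = [
    {"type": "OrderCreated", "data": {"orderId": 42, "total": 150.00}},
    {"type": "PaymentReceived", "data": {"orderId": 42, "amount": 150.00}},
    {"type": "OrderShipped", "data": {"orderId": 42, "trackingId": "XYZ123"}},
]


def apply_event(state, event):
    """Return the next state; events are never mutated or deleted."""
    kind, data = event["type"], event["data"]
    if kind == "OrderCreated":
        return replace(state, order_id=data["orderId"], total=data["total"], status="created")
    if kind == "PaymentReceived":
        return replace(state, paid=state.paid + data["amount"], status="paid")
    if kind == "OrderShipped":
        return replace(state, tracking_id=data["trackingId"], status="shipped")
    return state  # assumption: unknown event types are ignored


def replay(events, upto=None):
    """Fold the first `upto` events (all of them if None) into a state."""
    state = OrderState()
    for event in events[:upto]:
        state = apply_event(state, event)
    return state
```

For example, `replay(EVENTS, upto=2)` reconstructs the order as it stood before shipping.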
Separate the write model (commands) from the read model (queries).
Commands (writes)              Queries (reads)
      |                              |
      v                              v
 Write Model  ----events---->  Read Model (projection)
 (normalized)                  (denormalized, optimized for queries)
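A minimal sketch of this split, assuming an in-process callback in place of a message broker. `OrderWriteModel`, `CustomerSpendProjection`, and the `OrderPlaced` event are hypothetical names chosen for illustration.

```python
class OrderWriteModel:
    """Command side: validates and stores orders, then emits an event."""

    def __init__(self, publish):
        self._orders = {}
        self._publish = publish  # assumption: any callable that accepts an event

    def place_order(self, order_id, customer, total):
        if order_id in self._orders:
            raise ValueError(f"order {order_id} already exists")
        self._orders[order_id] = {"customer": customer, "total": total}
        self._publish({
            "type": "OrderPlaced",
            "order_id": order_id,
            "customer": customer,
            "total": total,
        })


class CustomerSpendProjection:
    """Query side: a denormalized view shaped for one specific question."""

    def __init__(self):
        self._totals = {}

    def handle(self, event):
        if event["type"] == "OrderPlaced":
            customer = event["customer"]
            self._totals[customer] = self._totals.get(customer, 0) + event["total"]

    def total_for(self, customer):
        return self._totals.get(customer, 0)
```

Here the projection updates synchronously because `publish` is a direct function call. With a broker between the two sides, the read model would lag the write model and queries would be eventually consistent.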
| Broker | Best for | Key characteristics |
|--------|----------|---------------------|
| RabbitMQ | Task queues, work distribution, routing | AMQP protocol, flexible routing (direct, topic, fanout), message acknowledgment, per-message delivery guarantees, lower throughput than Kafka |
| Apache Kafka | Event streaming, event sourcing, high-throughput pipelines | Append-only log, consumer groups, message retention by time/size, replay capability, horizontal scaling via partitions, high throughput |
| Amazon SQS | Simple cloud-native queuing | Fully managed, no infrastructure to operate, standard and FIFO queues, dead-letter queues, integrates with AWS Lambda |
Each service owns its database. No other service accesses it directly.
Order Service --> Orders DB (PostgreSQL)
Product Service --> Products DB (MongoDB)
Search Service --> Search Index (Elasticsearch)
Benefits: Independent schema evolution, technology choice per service, no shared-database coupling.
Challenge: Cross-service queries and distributed transactions.
Manage distributed transactions as a sequence of local transactions with compensating actions.
Choreography (event-based):
Order Service: CreateOrder --> emit OrderCreated
Payment Service: hears OrderCreated --> charge payment --> emit PaymentCompleted
Inventory Service: hears PaymentCompleted --> reserve stock --> emit StockReserved
Order Service: hears StockReserved --> confirm order
On failure at any step: each service listens for failure events and runs compensation
(e.g., PaymentFailed --> refund payment, release stock)
Orchestration (coordinator-based):
Saga Orchestrator:
1. Tell Order Service: create order
2. Tell Payment Service: charge payment
3. Tell Inventory Service: reserve stock
On failure: send compensating commands in reverse order
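The orchestrated variant can be sketched as a loop that remembers which steps completed and undoes them in reverse order on failure. This is a simplified, in-process illustration; `run_saga` and `SagaFailed` are hypothetical names, and a real orchestrator would also persist its progress so it can resume after a crash.

```python
class SagaFailed(Exception):
    """Raised after compensations have run for a failed saga."""


def run_saga(steps):
    """Run (name, action, compensation) steps in order.

    If a step raises, compensations for the steps that already
    completed are executed in reverse order, then SagaFailed is raised.
    """
    completed = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception as exc:
            for _done_name, undo in reversed(completed):
                undo()
            raise SagaFailed(f"step {name!r} failed") from exc
        completed.append((name, compensate))
```

The failing step itself is not compensated, on the assumption that a failed local transaction left nothing behind to undo.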
In a microservices system, strong consistency across services is impractical. Accept that:
Prevent cascading failures when a downstream service is unhealthy.
CLOSED (normal) --failures exceed threshold--> OPEN (failing fast)
OPEN --timeout expires--> HALF-OPEN (testing)
HALF-OPEN --success--> CLOSED
HALF-OPEN --failure--> OPEN
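The state machine above can be sketched in a few lines. This is a minimal, single-threaded illustration rather than a production breaker: `CircuitBreaker` and `CircuitOpenError` are hypothetical names, the threshold counts consecutive failures, and the `clock` parameter exists only so the timeout can be tested without sleeping.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is OPEN and the call is rejected immediately."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self.state = "CLOSED"
        self._failures = 0
        self._opened_at = 0.0

    def call(self, func):
        if self.state == "OPEN":
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit is open; failing fast")
            self.state = "HALF_OPEN"  # timeout expired: allow one trial call
        try:
            result = func()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        self._failures += 1
        # A failed trial call, or too many consecutive failures, opens the circuit.
        if self.state == "HALF_OPEN" or self._failures >= self.failure_threshold:
            self.state = "OPEN"
            self._opened_at = self._clock()

    def _on_success(self):
        self.state = "CLOSED"
        self._failures = 0
```

Wrapping a downstream call looks like `breaker.call(lambda: client.get_order(42))`, where `client` is whatever hypothetical client object the service uses.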
Retry with exponential backoff:
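import time


class TransientError(Exception):
    """Placeholder for errors worth retrying (e.g., timeouts, HTTP 503)."""


def call_with_retry(func, max_retries=3, base_delay=1.0):
    """Call func, retrying transient failures with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return func()
        except TransientError:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt)  # 1s, then 2s with the defaults
            time.sleep(delay)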
Timeout: Set timeouts on every external call. A missing timeout is a latency leak. Calculate cascading timeouts so upstream timeouts are longer than downstream.
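One way to keep upstream timeouts longer than downstream ones is to pass a shrinking deadline along the call chain. The sketch below is an assumption about how that could look, not a prescribed mechanism: `Deadline`, `downstream_timeout`, and the 50 ms margin are hypothetical.

```python
import time


class Deadline:
    """Tracks the time budget left for one incoming request."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self._clock = clock
        self._expires_at = clock() + timeout_s

    def remaining(self):
        return max(0.0, self._expires_at - self._clock())

    def downstream_timeout(self, margin_s=0.05):
        """Timeout to use for the next outbound call.

        The margin reserves time for this service to handle the
        response (or the failure) before its own caller gives up.
        """
        budget = self.remaining() - margin_s
        if budget <= 0:
            raise TimeoutError("deadline exceeded before calling downstream")
        return budget
```

A handler would create `Deadline(2.0)` on entry and pass `deadline.downstream_timeout()` as the timeout argument of each outbound call, so every hop waits strictly less than the hop above it.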
Fallback strategies:
Libraries: Resilience4j (Java), Polly (.NET), Hystrix (legacy), custom implementations.
Logs: Structured, machine-parsable records of discrete events.
Metrics: Numeric measurements aggregated over time.
Traces: End-to-end request paths across services.
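As an illustration of structured logs, the sketch below emits one JSON object per log line using only the standard library. `JsonFormatter`, `make_logger`, and the `trace_id` and `order_id` field names are assumptions chosen for the example.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single machine-parsable JSON line."""

    # Assumption: these are the context fields this service cares about.
    CONTEXT_FIELDS = ("trace_id", "order_id")

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Values passed via `extra=` become attributes on the record.
        for key in self.CONTEXT_FIELDS:
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload, sort_keys=True)


def make_logger(name, stream):
    """Build a logger that writes JSON lines to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

Calling `make_logger("order-service", sys.stdout).info("order created", extra={"trace_id": "abc123"})` writes a line that carries the trace ID, which is what lets logs be correlated with traces.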
OpenTelemetry is the emerging vendor-neutral standard for instrumentation. It provides APIs and SDKs for traces, metrics, and logs, with exporters to multiple backends. Prefer OpenTelemetry over vendor-specific instrumentation.
Every service should expose:
A service mesh is a dedicated infrastructure layer for managing service-to-service communication.
Service A -> Sidecar Proxy A ---network---> Sidecar Proxy B -> Service B
                  \                              /
                   -----> Control Plane <--------
                   (configuration, certificates, policies)
Microservices are typically deployed as containers managed by an orchestrator.
            /   E2E Tests    \          (few, slow, expensive)
           /   Integration    \
          /   Contract Tests   \
         /   Component Tests    \
        /      Unit Tests        \      (many, fast, cheap)
Contract tests verify that service interfaces remain compatible without running full integration tests.
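The idea behind a contract check can be sketched without any tooling: the consumer records the fields and types it relies on, and the provider's build verifies its responses against that record. This is a hand-rolled illustration with hypothetical names (`ORDER_CONTRACT`, `contract_violations`); dedicated tools such as Pact cover far more than this.

```python
# Assumption: what a hypothetical consumer needs from an order response.
ORDER_CONTRACT = {"orderId": int, "status": str, "total": float}


def contract_violations(contract, response):
    """Return a list of human-readable problems; an empty list means compatible.

    Extra fields in the response are allowed, so the provider can add
    fields without breaking existing consumers.
    """
    problems = []
    for field_name, expected_type in contract.items():
        if field_name not in response:
            problems.append(f"missing field: {field_name}")
        elif not isinstance(response[field_name], expected_type):
            actual = type(response[field_name]).__name__
            problems.append(
                f"wrong type for {field_name}: expected {expected_type.__name__}, got {actual}"
            )
    return problems
```

Running this check in the provider's pipeline catches a renamed or retyped field before deployment, without starting the consumer at all.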
Component tests exercise a single service in isolation, with its dependencies stubbed or mocked.
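A minimal sketch of testing one service with a stubbed dependency, using `unittest.mock` from the standard library. `OrderService`, its `inventory_client` parameter, and the `is_in_stock` method are hypothetical names for illustration.

```python
from unittest.mock import Mock


class OrderService:
    """Hypothetical service under test; it depends on an inventory client."""

    def __init__(self, inventory_client):
        self._inventory = inventory_client

    def place_order(self, sku, quantity):
        if not self._inventory.is_in_stock(sku, quantity):
            return {"status": "rejected", "reason": "out of stock"}
        return {"status": "accepted", "sku": sku, "quantity": quantity}


def test_rejects_order_when_out_of_stock():
    inventory = Mock()  # stands in for the real Inventory Service client
    inventory.is_in_stock.return_value = False

    result = OrderService(inventory).place_order("sku-1", 2)

    assert result == {"status": "rejected", "reason": "out of stock"}
    inventory.is_in_stock.assert_called_once_with("sku-1", 2)
```

No network or second service is involved, so tests like this stay fast enough to run on every commit.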
Integration tests exercise interactions between real services in a shared test environment.
E2E tests cover complete user journeys across the full system.
A distributed monolith: services are packaged as separately deployable units but must still be changed, tested, and deployed together. Causes:
Multiple services read/write the same database tables. Eliminates independent deployability and schema evolution. Always give each service its own data store.
Excessive fine-grained calls between services. If Service A makes 20 calls to Service B to fulfill one request:
One service grows to handle too many responsibilities, becoming a monolith in disguise. Enforce bounded contexts.
Treating remote calls like local function calls. Always account for latency, partial failure, and network partitions.
A -> B -> C -> D where each call is synchronous. Latency compounds, and failure in any service fails the entire chain. Prefer async communication or reduce chain depth.
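The compounding effect is easy to quantify. The sketch below assumes calls run strictly one after another and that services fail independently, which is a simplification; the function names and all numbers are illustrative, not measurements.

```python
import math


def chain_latency_ms(hop_latencies_ms):
    """Sequential synchronous hops: total latency is the sum of the hops."""
    return sum(hop_latencies_ms)


def chain_availability(availabilities):
    """The request succeeds only if every service in the chain succeeds."""
    return math.prod(availabilities)


# Illustrative numbers: A calls B, B calls C, C calls D.
latency = chain_latency_ms([20, 35, 50])
availability = chain_availability([0.999, 0.999, 0.999])
```

Under these assumptions, three downstream services at 99.9% each give the chain roughly 99.7% availability, and each additional hop lowers it further.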
Incrementally replace monolith functionality with microservices.
Phase 1: All traffic --> Monolith
Phase 2: Proxy/Router --> New Service (handles /orders)
                     \--> Monolith (handles everything else)
Phase 3: Proxy/Router --> Order Service
                     \--> Payment Service
                     \--> Monolith (shrinking)
Phase N: Monolith is decommissioned
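The proxy's routing decision in these phases can be sketched as a prefix table that grows as services are extracted. This is a minimal illustration; `route`, `MONOLITH`, and the backend names are hypothetical, and a real deployment would express the same rules in its gateway or reverse proxy configuration.

```python
MONOLITH = "monolith"


def route(path, extracted_routes):
    """Pick the backend for a request path.

    `extracted_routes` maps URL prefixes to the services that have taken
    them over. The longest matching prefix wins, and anything that has
    not been extracted yet falls through to the monolith.
    """
    matches = [
        prefix
        for prefix in extracted_routes
        if path == prefix or path.startswith(prefix + "/")
    ]
    if not matches:
        return MONOLITH
    return extracted_routes[max(matches, key=len)]


# Phase 2 and Phase 3 from the diagram, expressed as routing tables.
PHASE_2 = {"/orders": "order-service"}
PHASE_3 = {"/orders": "order-service", "/payments": "payment-service"}
```

Moving from one phase to the next is then a routing-table change, which also makes rollback cheap: removing an entry sends that traffic back to the monolith.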
Steps:
Refactor internally before extracting.