skills/2389-research/building-multiagent-systems/SKILL.md
This skill should be used when designing or implementing systems with multiple AI agents that coordinate to accomplish tasks. Triggers on "multi-agent", "orchestrator", "sub-agent", "coordination", "delegation", "parallel agents", "sequential pipeline", "fan-out", "map-reduce", "spawn agents", "agent hierarchy".
Install this skill globally with one command: `npx skillsauth add aiskillstore/marketplace building-multiagent-systems`. Works with Claude Code, Cursor, and Windsurf.
Comprehensive architecture patterns for multi-agent systems where AI agents coordinate to accomplish complex tasks using tools. Language-agnostic and applicable across TypeScript, Python, Go, Rust, and other environments.
Before architecting any system, ask these six mandatory questions:
Every agent follows the four-layer architecture for testability, safety, and modularity:
| Layer | Name | Responsibility |
|-------|------|----------------|
| 1 | Reasoning (LLM) | Plans, critiques, decides which tools to call |
| 2 | Orchestration | Validates, routes, enforces policy, spawns sub-agents |
| 3 | Tool Bus | Schema validation, tool execution coordination |
| 4 | Deterministic Adapters | File I/O, APIs, shell commands, database access |
Critical Rule: Everything below Layer 1 must be deterministic. No LLM calls in tools.
See references/four-layer-architecture.md for detailed implementation with code examples.
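The layering can be sketched in a few lines. This is only an illustration under stated assumptions: `ToolSpec`, `ToolBus`, `Orchestrator`, `word_count`, and `fake_reasoner` are hypothetical names, not taken from the reference files, and the reasoning layer is replaced by a plain function so the example runs without a model.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ToolSpec:
    """Layer 3 metadata: tool name, required argument types, and its adapter."""
    name: str
    schema: dict[str, type]
    adapter: Callable[..., Any]  # Layer 4: deterministic, no LLM calls


class ToolBus:
    """Layer 3: validates arguments against the schema, then runs the adapter."""

    def __init__(self) -> None:
        self._tools: dict[str, ToolSpec] = {}

    def register(self, spec: ToolSpec) -> None:
        self._tools[spec.name] = spec

    def call(self, name: str, args: dict[str, Any]) -> Any:
        spec = self._tools[name]
        if set(args) != set(spec.schema):
            raise ValueError(f"{name}: expected arguments {sorted(spec.schema)}")
        for key, expected in spec.schema.items():
            if not isinstance(args[key], expected):
                raise TypeError(f"{name}.{key} must be {expected.__name__}")
        return spec.adapter(**args)


class Orchestrator:
    """Layer 2: enforces policy before anything reaches the tool bus."""

    def __init__(self, bus: ToolBus, allowed: set[str]) -> None:
        self._bus = bus
        self._allowed = allowed

    def execute(self, decision: dict[str, Any]) -> Any:
        if decision["tool"] not in self._allowed:
            raise PermissionError(f"tool not permitted: {decision['tool']}")
        return self._bus.call(decision["tool"], decision["args"])


def word_count(text: str) -> int:
    """Layer 4 adapter (hypothetical): pure, so it can be unit-tested directly."""
    return len(text.split())


def fake_reasoner(task: str) -> dict[str, Any]:
    """Layer 1 stand-in: a real system would ask an LLM which tool to call."""
    return {"tool": "word_count", "args": {"text": task}}


bus = ToolBus()
bus.register(ToolSpec("word_count", {"text": str}, word_count))
orchestrator = Orchestrator(bus, allowed={"word_count"})
result = orchestrator.execute(fake_reasoner("count these four words"))
```

Because only `fake_reasoner` stands in for non-deterministic behavior, the policy check, the schema check, and the adapter can each be tested in isolation.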
| Pattern | Purpose |
|---------|---------|
| Event-Sourcing | All state changes as events for audit trails and replay |
| Hierarchical IDs | Encode delegation hierarchy (e.g., session.1.2) for cost aggregation |
| Agent State Machines | Explicit states (idle → thinking → tool_execution → stopped) with invalid transition errors |
| Communication | EventEmitter for state changes, promises for result collection |
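As a rough sketch of two rows in the table above, the following combines hierarchical IDs with an explicit state machine. The four state names come from the table; the transition table itself, the `Agent` class, and `InvalidTransition` are assumptions made for illustration.

```python
from enum import Enum


class AgentState(Enum):
    IDLE = "idle"
    THINKING = "thinking"
    TOOL_EXECUTION = "tool_execution"
    STOPPED = "stopped"


# Assumed transition table: the source names the states but not the edges.
ALLOWED = {
    AgentState.IDLE: {AgentState.THINKING, AgentState.STOPPED},
    AgentState.THINKING: {AgentState.TOOL_EXECUTION, AgentState.IDLE, AgentState.STOPPED},
    AgentState.TOOL_EXECUTION: {AgentState.THINKING, AgentState.STOPPED},
    AgentState.STOPPED: set(),  # terminal state
}


class InvalidTransition(Exception):
    """Raised when an agent is asked to make a transition that is not allowed."""


class Agent:
    def __init__(self, agent_id: str) -> None:
        self.agent_id = agent_id
        self.state = AgentState.IDLE
        self._next_child = 0

    def transition(self, target: AgentState) -> None:
        if target not in ALLOWED[self.state]:
            raise InvalidTransition(f"{self.agent_id}: {self.state.value} -> {target.value}")
        self.state = target

    def spawn(self) -> "Agent":
        """Child IDs extend the parent ID, so the delegation path is readable from the ID."""
        self._next_child += 1
        return Agent(f"{self.agent_id}.{self._next_child}")


root = Agent("session")
first_child = root.spawn()
first_child.spawn()
grandchild = first_child.spawn()

root.transition(AgentState.THINKING)
root.transition(AgentState.TOOL_EXECUTION)
```

Here `grandchild.agent_id` is `session.1.2`, matching the ID shape shown in the table.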
Choose based on discovery question answers:
| Pattern | Use Case | Trade-offs |
|---------|----------|------------|
| Fan-Out/Fan-In | Parallel independent work | Fast but costly; watch for orphans |
| Sequential Pipeline | Multi-stage transformations | Bottleneck at slowest stage |
| Recursive Delegation | Hierarchical task breakdown | Must add depth limits |
| Work-Stealing Queue | 1000+ tasks with load balancing | No built-in priority |
| Map-Reduce | Cost optimization | Cheap map ($0.01), smart reduce ($0.15) |
| Peer Collaboration | LLM council for bias reduction | Expensive (3N+1 calls), slow |
| MAKER | Zero-error tasks (100K+ steps) | 5× cost but ~0% error rate |
See references/coordination-patterns.md for detailed implementations.
| Requirement | Recommended Pattern |
|-------------|---------------------|
| Parallel independent tasks | Fan-Out/Fan-In |
| Each stage depends on previous | Sequential Pipeline |
| Complex task decomposition | Recursive Delegation |
| Large batch processing | Work-Stealing Queue |
| Cost-sensitive analysis | Map-Reduce |
| Need diverse perspectives | Peer Collaboration |
| Zero error tolerance | MAKER |
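Fan-Out/Fan-In is the simplest of these patterns to illustrate. The sketch below uses `asyncio` with a concurrency cap, a per-task timeout, and partial-failure collection. The `review_file` sub-agent, the file names, and the limits are all hypothetical; a real sub-agent would run its own reasoning loop.

```python
import asyncio


async def review_file(path: str) -> str:
    """Hypothetical sub-agent; it fails on lock files to show partial failure."""
    await asyncio.sleep(0)
    if path.endswith(".lock"):
        raise ValueError(f"cannot review {path}")
    return f"reviewed {path}"


async def fan_out_fan_in(paths, max_concurrency=2, timeout_s=5.0):
    # Bound concurrency so a large fan-out cannot exhaust resources.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(path):
        async with semaphore:
            # A timeout keeps one hung sub-agent from blocking the fan-in.
            return await asyncio.wait_for(review_file(path), timeout=timeout_s)

    # return_exceptions=True keeps one failure from discarding the other results.
    results = await asyncio.gather(
        *(run_one(p) for p in paths), return_exceptions=True
    )
    succeeded = {p: r for p, r in zip(paths, results) if not isinstance(r, BaseException)}
    failed = {p: repr(r) for p, r in zip(paths, results) if isinstance(r, BaseException)}
    return succeeded, failed


succeeded, failed = asyncio.run(fan_out_fan_in(["a.py", "b.py", "deps.lock"]))
```

Returning successes and failures separately lets the parent decide whether to retry, degrade, or abort, rather than letting one exception cancel the whole batch.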
For tasks requiring 100K+ steps with zero error tolerance (medical, financial, legal domains):
Cost comparison: Same cost as traditional approach, zero errors vs. 10+ errors.
See references/maker-pattern.md for full implementation with medical diagnosis example.
| Mechanism | Purpose |
|-----------|---------|
| Permission Inheritance | Children inherit subset of parent permissions (cannot escalate) |
| Resource Locking | Acquire/release patterns for shared resources |
| Rate Limiting | Token bucket algorithm across all agents |
| Result Caching | Cache read-only, idempotent, expensive operations |
Sub-Agent as Tool Pattern: Wrap specialized agents as tools the parent can call, providing composable abstractions and natural lifecycle management.
See references/tool-coordination.md for implementations.
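Permission inheritance and token-bucket rate limiting, both listed above, can be sketched as follows. The class names, the permission strings, and the choice to raise on an escalation attempt (rather than silently dropping the extra permissions) are assumptions. The clock is passed in explicitly so the bucket stays deterministic; callers would typically pass `time.monotonic()`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentPermissions:
    granted: frozenset

    def derive_child(self, requested) -> "AgentPermissions":
        """A child may receive any subset of the parent's permissions, never more."""
        escalation = set(requested) - self.granted
        if escalation:
            raise PermissionError(f"child cannot escalate: {sorted(escalation)}")
        return AgentPermissions(frozenset(requested))


class TokenBucket:
    """One bucket shared by all agents: refills steadily, spends one token per call."""

    def __init__(self, capacity: float, refill_per_second: float, now: float = 0.0) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = capacity
        self.updated_at = now

    def try_acquire(self, now: float, cost: float = 1.0) -> bool:
        # Add tokens for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


parent = AgentPermissions(frozenset({"read_file", "write_file", "run_shell"}))
child = parent.derive_child({"read_file"})

bucket = TokenBucket(capacity=2, refill_per_second=1.0)
burst = [bucket.try_acquire(now=0.0) for _ in range(3)]
after_refill = bucket.try_acquire(now=1.0)
```

With a capacity of 2, the third call in the burst is rejected, and one second later a single token has been refilled.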
"Always stop children before stopping self." This prevents orphaned agents.
1. Get all child agents
2. Stop all children in parallel
3. Stop self
4. Cancel ongoing work
5. Flush events
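The five steps above can be sketched with `asyncio`. The `Agent` class, its `start_work` helper, and the in-memory event list are hypothetical stand-ins; the only part taken from the text is the order of operations.

```python
import asyncio


class Agent:
    def __init__(self, name: str, stop_log: list) -> None:
        self.name = name
        self.children = []
        self.stopped = False
        self.flushed_events = []
        self._pending_events = [f"{name}:started"]
        self._stop_log = stop_log
        self._work = None

    def spawn(self, name: str) -> "Agent":
        child = Agent(name, self._stop_log)
        self.children.append(child)
        return child

    def start_work(self) -> None:
        # Stand-in for a long-running reasoning loop.
        self._work = asyncio.create_task(asyncio.sleep(3600))

    async def stop(self) -> None:
        children = list(self.children)                        # 1. get all child agents
        await asyncio.gather(*(c.stop() for c in children))   # 2. stop children in parallel
        self.stopped = True                                   # 3. stop self
        if self._work is not None:                            # 4. cancel ongoing work
            self._work.cancel()
            await asyncio.gather(self._work, return_exceptions=True)
        self.flushed_events.extend(self._pending_events)      # 5. flush events
        self._pending_events.clear()
        self._stop_log.append(self.name)


async def demo():
    log = []
    root = Agent("root", log)
    a = root.spawn("a")
    a.spawn("a1")
    root.spawn("b")
    root.start_work()
    await root.stop()
    return root, log


root, stop_order = asyncio.run(demo())
```

Every descendant appears in `stop_order` before its parent, and `root` is always last, so no agent outlives the agent that spawned it.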
If pause/resume is unavailable, implement manual checkpointing: save the agent's state (messages, context, tool results) and restore it later.
| Concern | Solution |
|---------|----------|
| Orphan Detection | Heartbeat monitoring every 30 seconds |
| Cost Tracking | Hierarchical aggregation across agent tree |
| Session Persistence | Project-level task store for cross-session work |
| Checkpointing | Save after 10+ tools, $1.00 cost, or 5 minutes elapsed |
| Self-Modification Safety | Blast radius assessment, branch isolation, test-first |
See references/production-hardening.md for detailed implementations.
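The checkpointing thresholds and the hierarchical cost aggregation listed above lend themselves to a short sketch. The thresholds (10 tools, $1.00, 5 minutes) come from the table; `CheckpointPolicy`, `subtree_cost`, and the sample costs are hypothetical, and the sketch assumes dot-separated agent IDs such as `session.1.2`.

```python
from dataclasses import dataclass


@dataclass
class CheckpointPolicy:
    max_tools: int = 10
    max_cost_usd: float = 1.00
    max_seconds: float = 300.0  # 5 minutes
    tools_since: int = 0
    cost_since: float = 0.0
    last_saved_at: float = 0.0

    def record_tool_call(self, cost_usd: float) -> None:
        self.tools_since += 1
        self.cost_since += cost_usd

    def should_checkpoint(self, now: float) -> bool:
        """True as soon as any one of the three thresholds is reached."""
        return (
            self.tools_since >= self.max_tools
            or self.cost_since >= self.max_cost_usd
            or now - self.last_saved_at >= self.max_seconds
        )

    def mark_saved(self, now: float) -> None:
        self.tools_since = 0
        self.cost_since = 0.0
        self.last_saved_at = now


def subtree_cost(costs: dict, agent_id: str) -> float:
    """Sum the cost of an agent and every descendant, using the ID prefix."""
    prefix = agent_id + "."
    return sum(c for aid, c in costs.items() if aid == agent_id or aid.startswith(prefix))


policy = CheckpointPolicy()
for _ in range(9):
    policy.record_tool_call(cost_usd=0.01)
due_after_nine = policy.should_checkpoint(now=10.0)
policy.record_tool_call(cost_usd=0.01)
due_after_ten = policy.should_checkpoint(now=10.0)

costs = {"session": 0.5, "session.1": 0.25, "session.1.2": 0.25, "session.2": 0.125}
branch_total = subtree_cost(costs, "session.1")
session_total = subtree_cost(costs, "session")
```

Matching on `agent_id + "."` rather than on the bare ID avoids counting `session.10` as a descendant of `session.1`.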
A pull request orchestrator using Fan-Out/Fan-In:
When guiding implementation of multi-agent systems:
| Pitfall | Impact |
|---------|--------|
| Missing four-layer architecture | Untestable, unsafe, hard to debug |
| LLM calls in tools (Layers 3-4) | Non-deterministic, can't unit test |
| No schema-first tool design | Sub-agents can't discover tools |
| Missing cascading stop | Orphaned agents consuming resources |
| No permission inheritance | Sub-agents can escalate privileges |
| No timeouts | Indefinite hangs waiting for sub-agents |
| Unbounded concurrency | Resource exhaustion from too many agents |
| Ignoring cost tracking | Budget surprises |
| No partial-failure handling | One failure cascades to all agents |
| Unpersisted state | Unrecoverable workflows on crash |
| Uncoordinated tool access | Race conditions on shared resources |
| Wrong model selection | Cost inefficiency (Sonnet for simple tasks) |
| Self-modification without safety | Sub-agents break themselves |
| No heartbeat monitoring | Can't detect orphans after parent crash |
Detailed implementations with code examples:
| File | Contents |
|------|----------|
| references/four-layer-architecture.md | Four-layer stack, deterministic boundary, schema-first tools |
| references/coordination-patterns.md | Seven coordination patterns with code |
| references/maker-pattern.md | MAKER implementation, voting, medical diagnosis example |
| references/tool-coordination.md | Permission inheritance, locking, rate limiting, caching |
| references/production-hardening.md | Cascading stop, orphan detection, cost tracking, checkpointing |