skills/agent-architecture/SKILL.md
AI agent orchestration patterns (ReAct, chain-of-thought, multi-agent), tool design, memory strategies, guardrails, and token cost modeling. Use when designing AI agent systems.
npx skillsauth add navraj007in/architecture-cowork-plugin agent-architectureInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Patterns, decision frameworks, and best practices for designing AI agent systems. Use this skill when the project type is agent or hybrid.
| Pattern | Best For | Complexity | Example Use Case | |---------|----------|-----------|------------------| | Single-turn | Simple Q&A, classification, extraction | Low | FAQ bot, sentiment analysis, data extraction | | ReAct | Tool-using tasks that need reasoning | Medium | Research assistant, data analysis agent | | Chain-of-thought | Complex reasoning without tools | Medium | Math problems, logic puzzles, decision analysis | | Multi-agent router | Multiple specialist domains | High | Customer support (billing + technical + sales agents) | | Multi-agent parallel | Independent subtasks that can run simultaneously | High | Content pipeline (research + write + review in parallel) | | Plan-and-execute | Multi-step tasks with dependencies | High | Project management bot, complex workflow automation |
User Message → LLM → Response
User Message → LLM thinks → Calls tool → Observes result → LLM thinks → ... → Final response
User Message → LLM step 1 → LLM step 2 → ... → Final response
User Message → Router Agent → Specialist Agent A or B or C → Response
User Message → Coordinator → [Agent A, Agent B, Agent C] (parallel) → Merger → Response
User Message → Planner creates step list → Executor runs step 1 → ... → step N → Response
Every agent tool should follow this structure:
Tool:
name: descriptive-verb-noun (e.g., "search-knowledge-base", "create-ticket")
type: <agent_tool_type>
description: What it does in one sentence
input:
- parameter: name
type: string | number | boolean | object
required: true | false
description: What this parameter controls
output:
- field: name
type: string | number | object | array
description: What this field contains
errors:
- error: name
description: When this error occurs
search-products not tool1, create-user not do-thing| Tool | Purpose | Input | Output |
|------|---------|-------|--------|
| search-knowledge-base | Find relevant documents | query string, top_k | Array of {document, score} |
| query-database | Execute a database query | collection, filter, fields | Array of documents |
| call-api | Make an HTTP request | url, method, headers, body | Response body + status |
| create-record | Insert data | collection, data | Created record ID |
| update-record | Modify data | collection, id, updates | Updated record |
| send-notification | Notify user/system | channel, recipient, message | Delivery status |
| generate-content | Create text/images | prompt, format, constraints | Generated content |
| human-handoff | Escalate to human | reason, context, priority | Handoff confirmation |
| Strategy | When to Use | Implementation | Cost Impact | |----------|-------------|----------------|-------------| | Session | Short conversations, stateless tasks | Include last N messages in context | Low — bounded context | | Persistent | Multi-session relationships, user preferences | Store in database, load relevant history | Medium — database reads | | Vector Store | Large knowledge bases, semantic retrieval | Embed documents, retrieve top-K similar | Medium-High — embedding + storage costs | | Hybrid | Complex agents needing both history and knowledge | Session memory + vector retrieval | High — multiple systems |
Every agent needs explicit guardrails. Define these in the system prompt and enforce them in the orchestration layer.
For each guardrail, document:
Guardrail: [Name]
Trigger: When does this guardrail activate?
Action: What does the agent do?
Message: What does the agent say to the user?
Fallback: What happens if the guardrail can't be enforced?
Monthly cost = conversations/mo × turns/conversation × (input_tokens × input_price + output_tokens × output_price)
| Pattern | Avg Turns | Avg Input Tokens | Avg Output Tokens | Cost per Conversation (Sonnet) | |---------|-----------|-----------------|-------------------|-------------------------------| | Single-turn | 1 | 1,500 | 500 | ~$0.01 | | ReAct (simple) | 3 | 3,000 | 1,500 | ~$0.03 | | ReAct (complex) | 7 | 8,000 | 3,000 | ~$0.07 | | Multi-agent router | 3 | 4,000 | 2,000 | ~$0.04 | | Plan-and-execute | 8 | 10,000 | 5,000 | ~$0.11 |
| Provider | Best For | Strengths | Weaknesses | |----------|----------|-----------|------------| | Anthropic (Claude) | Tool use, long context, safety | Best tool use, 200K context, strong guardrails | Higher cost at Opus tier | | OpenAI (GPT-4o) | General purpose, ecosystem | Huge ecosystem, function calling, vision | Slightly weaker at complex reasoning vs Opus | | Google (Gemini) | Long context, multimodal | 1M+ context window, good vision | Smaller ecosystem | | Mistral | EU data residency, multilingual | Strong European language support, fast | Smaller model selection | | Groq | Speed-critical applications | Ultra-fast inference | Limited model selection | | Local (Ollama) | Privacy, offline, cost control | No API costs, full data control | Requires GPU, lower quality |
When designing agent architecture, provide comprehensive specifications:
Always provide a visual flow diagram showing:
Format (use Mermaid):
graph TD
A[User Message] --> B{Router Agent}
B -->|Technical Question| C[Technical Agent]
B -->|Billing Question| D[Billing Agent]
B -->|General| E[General Agent]
C --> F[Search Knowledge Base]
F --> G{Found Answer?}
G -->|Yes| H[Format Response]
G -->|No| I[Escalate to Human]
D --> J[Query Billing DB]
J --> K[Format Invoice Data]
E --> L[Generate Response]
H --> M[Return to User]
I --> M
K --> M
L --> M
For each agent in the system, provide:
Agent: [Agent Name]
Purpose:
[1-2 sentences describing what this agent does and when it's invoked]
Pattern:
[Single-turn / ReAct / Multi-agent router / etc.]
System Prompt:
You are a [role] agent. Your job is to [specific task].
Guidelines:
Available tools:
Output format: [Expected output structure]
Guardrails:
Tools Available:
| Tool Name | Purpose | When to Use |
|-----------|---------|-------------|
| tool-1 | [Brief description] | [Trigger condition] |
| tool-2 | [Brief description] | [Trigger condition] |
Memory Configuration:
- Type: [Session / Persistent / Vector / Hybrid]
- Storage: [Where data is stored]
- Retention: [How long data is kept]
- Load strategy: [When/how to load memory]
Expected Input:
```json
{
"user_message": "string",
"conversation_id": "string",
"user_context": {
"user_id": "string",
"preferences": {}
}
}
Expected Output:
{
"response": "string",
"confidence": 0.0-1.0,
"sources": ["source1", "source2"],
"needs_escalation": boolean,
"metadata": {}
}
Error Handling: | Error Type | Trigger | Action | User Message | |------------|---------|--------|--------------| | Tool failure | API returns 500 | Retry 3x with backoff | "I'm having trouble accessing [system]. Please try again." | | Low confidence | confidence < 0.7 | Escalate to human | "I'm not confident in my answer. Let me connect you with a specialist." | | Out of scope | Topic outside domain | Redirect | "I specialize in [domain]. For [other topic], please contact [resource]." |
Token Estimate:
Performance Targets:
### 3. Tool Implementation Specs (REQUIRED for each tool)
**For each custom tool, provide complete specification:**
```markdown
Tool: [tool-name]
Type: [API call / Database query / Function / External service]
Description:
[2-3 sentences on what this tool does and why it exists]
Input Schema:
```typescript
interface ToolInput {
param1: string; // Description of param1
param2?: number; // Optional: Description of param2
param3: { // Nested object
field1: string;
field2: boolean;
};
}
Output Schema:
interface ToolOutput {
success: boolean;
data?: {
// Success case structure
};
error?: {
code: string;
message: string;
};
}
Implementation: [Detailed description of how the tool works internally]
External Dependencies:
Error Cases: | Error | Trigger | Code | Message | Retry? | |-------|---------|------|---------|--------| | [Error name] | [When it happens] | ERROR_CODE | "User-friendly message" | Yes/No |
Example Usage:
Input:
{
"param1": "example value",
"param3": {
"field1": "test",
"field2": true
}
}
Output (Success):
{
"success": true,
"data": {
"result": "example result"
}
}
Output (Failure):
{
"success": false,
"error": {
"code": "INVALID_INPUT",
"message": "param1 must not be empty"
}
}
Performance:
Testing Notes:
### 4. Memory Architecture Specification (REQUIRED if using memory)
**Provide detailed memory implementation:**
```markdown
Memory Strategy: [Session / Persistent / Vector / Hybrid]
Storage Backend:
- Technology: [PostgreSQL / MongoDB / Pinecone / Redis]
- Connection: [How to connect]
- Schema: [Data structure]
Session Memory (if applicable):
- Window size: Last N messages
- Summarization: After M messages, summarize older messages
- Retention: Until session ends (X minutes of inactivity)
Persistent Memory (if applicable):
Database Schema:
```sql
CREATE TABLE conversations (
id UUID PRIMARY KEY,
user_id UUID NOT NULL,
created_at TIMESTAMP,
last_message_at TIMESTAMP,
summary TEXT,
metadata JSONB
);
CREATE TABLE messages (
id UUID PRIMARY KEY,
conversation_id UUID REFERENCES conversations(id),
role VARCHAR(50), -- 'user' or 'assistant'
content TEXT,
tokens INTEGER,
created_at TIMESTAMP
);
Load Strategy:
Vector Memory (if applicable):
Chunk Strategy (for document embeddings):
Cost Impact:
Privacy Considerations:
### 5. Guardrail Implementation (REQUIRED)
**For each guardrail, provide enforceable specification:**
```markdown
Guardrail: [Name]
Priority: [Critical / High / Medium / Low]
Trigger:
[Specific, measurable condition that activates this guardrail]
Detection Method:
[How the system detects the trigger — pattern matching, classifier, heuristic, etc.]
Action:
1. [First action taken by system]
2. [Second action taken by system]
3. [Final outcome]
User Message:
[Exact message shown to user when guardrail activates]
Bypass Conditions:
[When is it OK to bypass this guardrail? Usually: "Never" or very specific exception]
Logging:
- Log event: Yes/No
- Alert team: Yes/No
- Include in analytics: Yes/No
Example:
User input: "[Example input that triggers guardrail]"
System detects: [What pattern/condition is detected]
System action: [What the system does]
User sees: "[Message shown]"
Testing:
- Test case 1: [Input that SHOULD trigger guardrail]
- Test case 2: [Input that should NOT trigger guardrail]
- Edge case: [Tricky input to test boundary]
Provide detailed token and cost analysis:
Agent Cost Analysis
Model: [Selected LLM model and tier]
Pricing: Input $X / MTok, Output $Y / MTok
Token Breakdown (per conversation):
| Component | Tokens | Cost |
|-----------|--------|------|
| System prompt | X | $X.XX |
| Average user message | X | $X.XX |
| Average agent response | X | $X.XX |
| Tool results | X | $X.XX |
| Context/memory | X | $X.XX |
| **Total per turn** | **X** | **$X.XX** |
Turns per conversation:
- Simple queries: A turns = $X
- Medium complexity: B turns = $Y
- Complex queries: C turns = $Z
- **Average: D turns = $W**
Monthly projection:
| Usage Level | Conversations/mo | Cost/mo |
|-------------|:----------------:|:-------:|
| Low | 1,000 | $X |
| Medium | 10,000 | $Y |
| High | 100,000 | $Z |
Cost Optimization Strategies:
✅ Strategy #1: [Name]
- Current cost: $X per conversation
- Optimized cost: $Y per conversation
- Savings: Z%
- How: [Specific implementation]
- Trade-off: [What you lose, if anything]
✅ Strategy #2: [Name]
- Current cost: $X per conversation
- Optimized cost: $Y per conversation
- Savings: Z%
- How: [Specific implementation]
- Trade-off: [What you lose, if anything]
[Continue for 3-5 strategies...]
Recommended optimization path:
1. [Start with this optimization — easiest/highest impact]
2. [Then this one]
3. [Finally this one if needed]
Cost with all optimizations:
- Before: $X per conversation
- After: $Y per conversation
- Savings: Z% ($W/month at medium usage)
Provide comprehensive testing plan:
Agent Testing Plan
Unit Tests (per tool):
| Tool | Test Case | Expected Output | Edge Cases |
|------|-----------|----------------|------------|
| tool-1 | [Normal input] | [Expected result] | [3-5 edge cases to test] |
| tool-2 | [Normal input] | [Expected result] | [3-5 edge cases to test] |
Integration Tests (agent workflows):
1. **Happy path test**: [Describe complete successful flow]
- Input: [User message]
- Expected: Agent uses [tools], returns [response]
- Success criteria: [Measurable criteria]
2. **Tool failure test**: [Describe tool failure scenario]
- Input: [User message]
- Failure: [Which tool fails and how]
- Expected: Agent gracefully handles, returns [response]
- Success criteria: [No crash, appropriate fallback]
3. **Guardrail test**: [Describe guardrail trigger]
- Input: [User message that should trigger guardrail]
- Expected: Guardrail activates, agent [action]
- Success criteria: [Specific guardrail behavior]
4. **Multi-turn conversation test**: [Describe conversation flow]
- Turn 1: [User message] → [Agent response]
- Turn 2: [User message] → [Agent response]
- Turn 3: [User message] → [Agent response]
- Success criteria: [Agent maintains context, proper memory usage]
Evaluation Metrics:
| Metric | Target | How to Measure |
|--------|--------|----------------|
| Success rate | >X% | % conversations that resolve without escalation |
| Average response time | <Y sec | p95 latency from user message to response |
| Tool call accuracy | >Z% | % of tool calls that return successful results |
| User satisfaction | >W rating | Post-conversation survey (1-5 scale) |
| Cost per conversation | <$X | Track actual token usage vs estimate |
Red Team Tests (adversarial):
1. **Jailbreak attempt**: [Try to make agent ignore guardrails]
2. **Infinite loop attempt**: [Try to make agent loop forever]
3. **PII extraction**: [Try to make agent reveal sensitive data]
4. **Out-of-scope task**: [Request something agent shouldn't do]
5. **Ambiguous input**: [Intentionally vague or unclear request]
Testing Timeline:
- Unit tests: Complete before integration
- Integration tests: Complete before user testing
- Evaluation metrics: Track from beta launch onwards
- Red team tests: Run monthly or after major changes
Provide production readiness checklist:
Production Deployment Checklist
Pre-Launch:
- [ ] All tools tested with success/failure cases
- [ ] Guardrails validated with red team tests
- [ ] Cost per conversation confirmed within budget
- [ ] Response time meets p95 target (<X sec)
- [ ] Error handling tested (all tools can fail gracefully)
- [ ] Memory/persistence layer load tested
- [ ] Rate limiting configured (per user and global)
- [ ] Monitoring and logging configured
- [ ] Escalation workflow tested (human handoff works)
- [ ] Documentation complete (runbook, system prompt, tool specs)
Monitoring Setup:
| Metric | Tool | Alert Threshold | Action |
|--------|------|-----------------|--------|
| Error rate | [Sentry/Datadog] | >X% in 5 min | Page on-call engineer |
| Response time | [Monitoring tool] | p95 >Y sec | Investigate slow tools |
| Cost per conversation | [Custom dashboard] | >$Z | Review recent conversations |
| Tool call failures | [Logging] | >W% for any tool | Check external service status |
| Escalation rate | [Analytics] | >X% | Review agent quality |
Logging Requirements:
```json
{
"conversation_id": "uuid",
"timestamp": "ISO 8601",
"user_id": "hashed_user_id",
"agent_name": "string",
"turn": 1,
"action": "tool_call | response | escalation",
"tool_name": "string | null",
"latency_ms": 123,
"tokens_used": {
"input": 1234,
"output": 567
},
"cost": 0.045,
"confidence": 0.85,
"error": null
}
Incident Response:
High error rate detected
Slow response times
Cost spike
Guardrail breach
Runbook Location: [Link to detailed runbook] On-Call Rotation: [How to contact on-call engineer]
development
# Trade-Off Analysis Skill Quantifies exact trade-offs when switching between architecture options. Shows users precisely what they gain and lose when choosing Option A over Option B. ## When to Use Use this skill to help users decide between options by showing: 1. **Cost difference** — how much more/less per month? 2. **Performance difference** — how much faster/slower? 3. **Complexity difference** — how much harder to build/maintain? 4. **Scalability difference** — when does this option hit
testing
# Stage Detection Skill Detects the current project stage (concept → mvp → growth → enterprise) based on `_state.json` field presence and completeness. Used by `/architect:next-steps`, `/architect:check-state`, and roadmap commands. ## When to Use Invoke this skill when you need to determine what stage a project is at based on its state file. Stage detection drives: - Command recommendations (what to run next) - Required fields validation (what should exist at this stage) - Risk assessment (w
development
# Stack Swap Simulator Skill Estimates cost and effort to switch from one tech stack to another. Helps answer: "Can we migrate later if needed?" ## When to Use Use this skill to understand: 1. **Cost of switching stacks** — engineer weeks + downtime risk 2. **Timeline to switch** — how long is the project? 3. **Risk of switching** — what can go wrong? 4. **ROI of switching** — does it save money long-term? 5. **Backwards compatibility** — can we do a gradual migration? ## Input Provide sour
tools
# Stack Compatibility Skill Verifies that chosen technologies integrate well together. Prevents "I picked these tools and they don't work well together" regrets. ## When to Use Use this skill to verify: 1. **Chosen tools work together** — React + Node + MongoDB = good? 2. **No hidden incompatibilities** — will I hit issues in production? 3. **Team can support it** — do we have expertise for this combo? 4. **Licenses compatible** — can we use these together commercially? 5. **Performance assum