Agent Architecture

Patterns, decision frameworks, and best practices for designing AI agent systems. Use this skill when the project type is agent or hybrid.

Orchestration Patterns

When to Use Each Pattern

| Pattern | Best For | Complexity | Example Use Case | |---------|----------|-----------|------------------| | Single-turn | Simple Q&A, classification, extraction | Low | FAQ bot, sentiment analysis, data extraction | | ReAct | Tool-using tasks that need reasoning | Medium | Research assistant, data analysis agent | | Chain-of-thought | Complex reasoning without tools | Medium | Math problems, logic puzzles, decision analysis | | Multi-agent router | Multiple specialist domains | High | Customer support (billing + technical + sales agents) | | Multi-agent parallel | Independent subtasks that can run simultaneously | High | Content pipeline (research + write + review in parallel) | | Plan-and-execute | Multi-step tasks with dependencies | High | Project management bot, complex workflow automation |

Pattern Details

Single-Turn

User Message → LLM → Response

No loop, no tools, no memory beyond the conversation
Use when: the task can be completed in one LLM call
Cost: lowest (1 LLM call per interaction)

ReAct (Reason + Act)

User Message → LLM thinks → Calls tool → Observes result → LLM thinks → ... → Final response

The agent reasons about what to do, takes an action (tool call), observes the result, and repeats until done
Use when: the agent needs to gather information or take actions to answer
Cost: 2-10 LLM calls per interaction (depends on task complexity)
Most common pattern for production agents

Chain-of-Thought

User Message → LLM step 1 → LLM step 2 → ... → Final response

Multi-step reasoning without external tools
Use when: the task requires breaking down a complex problem
Cost: 1-3 LLM calls (can often be done in a single call with prompting)

Multi-Agent Router

User Message → Router Agent → Specialist Agent A or B or C → Response

A router agent classifies the request and dispatches to the right specialist
Each specialist has its own tools, system prompt, and expertise
Use when: you have distinct domains that require different knowledge
Cost: 2+ LLM calls (router + specialist)

Multi-Agent Parallel

User Message → Coordinator → [Agent A, Agent B, Agent C] (parallel) → Merger → Response

Multiple agents work on different aspects simultaneously
A merger combines results
Use when: subtasks are independent and speed matters
Cost: highest (N agents + coordinator + merger)

Plan-and-Execute

User Message → Planner creates step list → Executor runs step 1 → ... → step N → Response

A planner creates an explicit plan, then an executor works through each step
Allows for re-planning if a step fails
Use when: the task has clear sequential dependencies
Cost: 2-10+ LLM calls

Tool Design

Tool Schema Pattern

Every agent tool should follow this structure:

Tool:
  name: descriptive-verb-noun (e.g., "search-knowledge-base", "create-ticket")
  type: <agent_tool_type>
  description: What it does in one sentence
  input:
    - parameter: name
      type: string | number | boolean | object
      required: true | false
      description: What this parameter controls
  output:
    - field: name
      type: string | number | object | array
      description: What this field contains
  errors:
    - error: name
      description: When this error occurs

Tool Design Rules

One tool, one job — a tool should do exactly one thing
Descriptive names — search-products not tool1, create-user not do-thing
Validate inputs — the tool should fail gracefully with a clear error message
Return structured data — tools should return JSON objects, not prose
Include error cases — define what happens when the tool fails
Limit scope — a database query tool should not also format the results for display

Common Tool Types

| Tool | Purpose | Input | Output | |------|---------|-------|--------| | search-knowledge-base | Find relevant documents | query string, top_k | Array of {document, score} | | query-database | Execute a database query | collection, filter, fields | Array of documents | | call-api | Make an HTTP request | url, method, headers, body | Response body + status | | create-record | Insert data | collection, data | Created record ID | | update-record | Modify data | collection, id, updates | Updated record | | send-notification | Notify user/system | channel, recipient, message | Delivery status | | generate-content | Create text/images | prompt, format, constraints | Generated content | | human-handoff | Escalate to human | reason, context, priority | Handoff confirmation |

Memory Strategies

| Strategy | When to Use | Implementation | Cost Impact | |----------|-------------|----------------|-------------| | Session | Short conversations, stateless tasks | Include last N messages in context | Low — bounded context | | Persistent | Multi-session relationships, user preferences | Store in database, load relevant history | Medium — database reads | | Vector Store | Large knowledge bases, semantic retrieval | Embed documents, retrieve top-K similar | Medium-High — embedding + storage costs | | Hybrid | Complex agents needing both history and knowledge | Session memory + vector retrieval | High — multiple systems |

Memory Selection Guide

Session only: The agent doesn't need to remember past conversations. Each interaction starts fresh. Most agents start here.
Persistent: The agent needs to remember user preferences, past decisions, or ongoing projects. Store in PostgreSQL or MongoDB.
Vector store: The agent needs to search a large body of documents (help articles, product docs, codebases). Use Pinecone, Weaviate, or pgvector.
Hybrid: The agent needs both conversation history AND document search. Combine persistent + vector.

Guardrails

Every agent needs explicit guardrails. Define these in the system prompt and enforce them in the orchestration layer.

Standard Guardrails

Identity disclosure: "Always acknowledge you are an AI when asked directly"
Scope boundaries: "Only answer questions related to [domain]. For out-of-scope questions, say: I can help with X, but for Y you should contact Z"
No harmful actions: "Never delete user data, send unauthorized messages, or make financial transactions without explicit user confirmation"
Escalation triggers: "Escalate to a human when: user is frustrated, topic is sensitive, confidence is low, legal/medical/financial advice is requested"
Data handling: "Never log or store: passwords, credit card numbers, SSNs, health records"
Rate limits: "Maximum N tool calls per conversation to prevent runaway loops"

Guardrail Documentation Pattern

For each guardrail, document:

Guardrail: [Name]
  Trigger: When does this guardrail activate?
  Action: What does the agent do?
  Message: What does the agent say to the user?
  Fallback: What happens if the guardrail can't be enforced?

Token Cost Modeling

Estimation Method

System prompt tokens: Count tokens in system prompt (typically 500-2,000)
Context tokens per turn: Average input tokens = system prompt + conversation history + tool results
Output tokens per turn: Average output tokens = agent reasoning + tool calls + final response
Turns per conversation: How many agent turns per conversation (1 for single-turn, 3-8 for ReAct)
Conversations per month: Estimated usage volume

Formula

Monthly cost = conversations/mo × turns/conversation × (input_tokens × input_price + output_tokens × output_price)

Quick Reference Costs

| Pattern | Avg Turns | Avg Input Tokens | Avg Output Tokens | Cost per Conversation (Sonnet) | |---------|-----------|-----------------|-------------------|-------------------------------| | Single-turn | 1 | 1,500 | 500 | ~$0.01 | | ReAct (simple) | 3 | 3,000 | 1,500 | ~$0.03 | | ReAct (complex) | 7 | 8,000 | 3,000 | ~$0.07 | | Multi-agent router | 3 | 4,000 | 2,000 | ~$0.04 | | Plan-and-execute | 8 | 10,000 | 5,000 | ~$0.11 |

Cost Optimization Tips

Use the cheapest model that works — start with Haiku/GPT-4o-mini, upgrade only if quality is insufficient
Cache system prompts — most providers offer prompt caching for repeated system prompts
Trim conversation history — summarize old messages instead of including full history
Route by complexity — use cheap model for simple queries, expensive model for complex ones
Batch tool calls — call multiple tools in one turn when possible
Set max turn limits — prevent runaway loops that burn tokens

LLM Provider Selection

| Provider | Best For | Strengths | Weaknesses | |----------|----------|-----------|------------| | Anthropic (Claude) | Tool use, long context, safety | Best tool use, 200K context, strong guardrails | Higher cost at Opus tier | | OpenAI (GPT-4o) | General purpose, ecosystem | Huge ecosystem, function calling, vision | Slightly weaker at complex reasoning vs Opus | | Google (Gemini) | Long context, multimodal | 1M+ context window, good vision | Smaller ecosystem | | Mistral | EU data residency, multilingual | Strong European language support, fast | Smaller model selection | | Groq | Speed-critical applications | Ultra-fast inference | Limited model selection | | Local (Ollama) | Privacy, offline, cost control | No API costs, full data control | Requires GPU, lower quality |

Selection Decision Tree

Need best tool use? → Anthropic Claude
Need largest ecosystem / most libraries? → OpenAI
Need longest context window? → Google Gemini
Need EU data residency? → Mistral
Need lowest latency? → Groq
Need full data privacy? → Local (Ollama)
Need cheapest at scale? → Compare Haiku vs GPT-4o-mini vs Gemini Flash

Implementation Guidance

When designing agent architecture, provide comprehensive specifications:

1. Agent Flow Diagram (REQUIRED)

Always provide a visual flow diagram showing:

User input entry point
Agent reasoning steps
Tool calls and their sequence
Decision points and branching logic
Final output format

Format (use Mermaid):

graph TD
    A[User Message] --> B{Router Agent}
    B -->|Technical Question| C[Technical Agent]
    B -->|Billing Question| D[Billing Agent]
    B -->|General| E[General Agent]

    C --> F[Search Knowledge Base]
    F --> G{Found Answer?}
    G -->|Yes| H[Format Response]
    G -->|No| I[Escalate to Human]

    D --> J[Query Billing DB]
    J --> K[Format Invoice Data]

    E --> L[Generate Response]

    H --> M[Return to User]
    I --> M
    K --> M
    L --> M

2. Detailed Agent Specification (REQUIRED for each agent)

For each agent in the system, provide:

Agent: [Agent Name]

Purpose:
[1-2 sentences describing what this agent does and when it's invoked]

Pattern:
[Single-turn / ReAct / Multi-agent router / etc.]

System Prompt:

You are a [role] agent. Your job is to [specific task].

Guidelines:

[Specific guideline 1]
[Specific guideline 2]
[Specific guideline 3]

Available tools:

tool-name-1: [when to use it]
tool-name-2: [when to use it]

Output format: [Expected output structure]

Guardrails:

[Specific restriction 1]
[Specific restriction 2]


Tools Available:
| Tool Name | Purpose | When to Use |
|-----------|---------|-------------|
| tool-1 | [Brief description] | [Trigger condition] |
| tool-2 | [Brief description] | [Trigger condition] |

Memory Configuration:
- Type: [Session / Persistent / Vector / Hybrid]
- Storage: [Where data is stored]
- Retention: [How long data is kept]
- Load strategy: [When/how to load memory]

Expected Input:
```json
{
  "user_message": "string",
  "conversation_id": "string",
  "user_context": {
    "user_id": "string",
    "preferences": {}
  }
}

Expected Output:

{
  "response": "string",
  "confidence": 0.0-1.0,
  "sources": ["source1", "source2"],
  "needs_escalation": boolean,
  "metadata": {}
}

Error Handling: | Error Type | Trigger | Action | User Message | |------------|---------|--------|--------------| | Tool failure | API returns 500 | Retry 3x with backoff | "I'm having trouble accessing [system]. Please try again." | | Low confidence | confidence < 0.7 | Escalate to human | "I'm not confident in my answer. Let me connect you with a specialist." | | Out of scope | Topic outside domain | Redirect | "I specialize in [domain]. For [other topic], please contact [resource]." |

Token Estimate:

System prompt: ~X tokens
Average input: ~Y tokens per turn
Average output: ~Z tokens per turn
Expected turns: A-B
Cost per conversation: $X - $Y (using [model name])

Performance Targets:

Response time: <X seconds (p95)
Tool call latency: <Y seconds per tool
Success rate: >Z% conversations resolved without escalation


### 3. Tool Implementation Specs (REQUIRED for each tool)

**For each custom tool, provide complete specification:**

```markdown
Tool: [tool-name]

Type: [API call / Database query / Function / External service]

Description:
[2-3 sentences on what this tool does and why it exists]

Input Schema:
```typescript
interface ToolInput {
  param1: string;  // Description of param1
  param2?: number; // Optional: Description of param2
  param3: {        // Nested object
    field1: string;
    field2: boolean;
  };
}

Output Schema:

interface ToolOutput {
  success: boolean;
  data?: {
    // Success case structure
  };
  error?: {
    code: string;
    message: string;
  };
}

Implementation: [Detailed description of how the tool works internally]

External Dependencies:

Service: [Name of external service, if any]
API endpoint: [Specific endpoint URL pattern]
Authentication: [How to authenticate]
Rate limits: [X requests per Y time period]

Error Cases: | Error | Trigger | Code | Message | Retry? | |-------|---------|------|---------|--------| | [Error name] | [When it happens] | ERROR_CODE | "User-friendly message" | Yes/No |

Example Usage:

Input:
{
  "param1": "example value",
  "param3": {
    "field1": "test",
    "field2": true
  }
}

Output (Success):
{
  "success": true,
  "data": {
    "result": "example result"
  }
}

Output (Failure):
{
  "success": false,
  "error": {
    "code": "INVALID_INPUT",
    "message": "param1 must not be empty"
  }
}

Performance:

Average latency: X ms
p95 latency: Y ms
Timeout: Z ms
Cost: $X per call (if applicable)

Testing Notes:

Test case 1: [Description of what to test]
Test case 2: [Description of what to test]
Edge case 1: [Description of edge case]


### 4. Memory Architecture Specification (REQUIRED if using memory)

**Provide detailed memory implementation:**

```markdown
Memory Strategy: [Session / Persistent / Vector / Hybrid]

Storage Backend:
- Technology: [PostgreSQL / MongoDB / Pinecone / Redis]
- Connection: [How to connect]
- Schema: [Data structure]

Session Memory (if applicable):
- Window size: Last N messages
- Summarization: After M messages, summarize older messages
- Retention: Until session ends (X minutes of inactivity)

Persistent Memory (if applicable):
Database Schema:
```sql
CREATE TABLE conversations (
  id UUID PRIMARY KEY,
  user_id UUID NOT NULL,
  created_at TIMESTAMP,
  last_message_at TIMESTAMP,
  summary TEXT,
  metadata JSONB
);

CREATE TABLE messages (
  id UUID PRIMARY KEY,
  conversation_id UUID REFERENCES conversations(id),
  role VARCHAR(50), -- 'user' or 'assistant'
  content TEXT,
  tokens INTEGER,
  created_at TIMESTAMP
);

Load Strategy:

When conversation starts: [What to load]
During conversation: [What to load when]
Summarization trigger: [When to summarize old messages]

Vector Memory (if applicable):

Embedding model: [Model name, dimension]
Vector database: [Pinecone / Weaviate / pgvector]
Index configuration: [Settings]
Retrieval strategy: Top-K with threshold
- K: [number]
- Similarity threshold: [0.0-1.0]
- Re-ranking: [Yes/No, if yes how]

Chunk Strategy (for document embeddings):

Chunk size: X tokens
Overlap: Y tokens
Metadata attached: [List fields included with each chunk]

Cost Impact:

Embedding cost: $X per 1M tokens
Storage cost: $Y per GB per month
Query cost: $Z per 1K queries
Expected monthly cost: $A - $B based on C users

Privacy Considerations:

PII handling: [How PII is handled]
Data retention: [How long data is kept]
Deletion process: [How users can delete their data]
Encryption: [At rest / in transit]


### 5. Guardrail Implementation (REQUIRED)

**For each guardrail, provide enforceable specification:**

```markdown
Guardrail: [Name]

Priority: [Critical / High / Medium / Low]

Trigger:
[Specific, measurable condition that activates this guardrail]

Detection Method:
[How the system detects the trigger — pattern matching, classifier, heuristic, etc.]

Action:
1. [First action taken by system]
2. [Second action taken by system]
3. [Final outcome]

User Message:

[Exact message shown to user when guardrail activates]


Bypass Conditions:
[When is it OK to bypass this guardrail? Usually: "Never" or very specific exception]

Logging:
- Log event: Yes/No
- Alert team: Yes/No
- Include in analytics: Yes/No

Example:
User input: "[Example input that triggers guardrail]"
System detects: [What pattern/condition is detected]
System action: [What the system does]
User sees: "[Message shown]"

Testing:
- Test case 1: [Input that SHOULD trigger guardrail]
- Test case 2: [Input that should NOT trigger guardrail]
- Edge case: [Tricky input to test boundary]

6. Cost Breakdown and Optimization (ALWAYS include)

Provide detailed token and cost analysis:

Agent Cost Analysis

Model: [Selected LLM model and tier]
Pricing: Input $X / MTok, Output $Y / MTok

Token Breakdown (per conversation):
| Component | Tokens | Cost |
|-----------|--------|------|
| System prompt | X | $X.XX |
| Average user message | X | $X.XX |
| Average agent response | X | $X.XX |
| Tool results | X | $X.XX |
| Context/memory | X | $X.XX |
| **Total per turn** | **X** | **$X.XX** |

Turns per conversation:
- Simple queries: A turns = $X
- Medium complexity: B turns = $Y
- Complex queries: C turns = $Z
- **Average: D turns = $W**

Monthly projection:
| Usage Level | Conversations/mo | Cost/mo |
|-------------|:----------------:|:-------:|
| Low | 1,000 | $X |
| Medium | 10,000 | $Y |
| High | 100,000 | $Z |

Cost Optimization Strategies:

✅ Strategy #1: [Name]
- Current cost: $X per conversation
- Optimized cost: $Y per conversation
- Savings: Z%
- How: [Specific implementation]
- Trade-off: [What you lose, if anything]

✅ Strategy #2: [Name]
- Current cost: $X per conversation
- Optimized cost: $Y per conversation
- Savings: Z%
- How: [Specific implementation]
- Trade-off: [What you lose, if anything]

[Continue for 3-5 strategies...]

Recommended optimization path:
1. [Start with this optimization — easiest/highest impact]
2. [Then this one]
3. [Finally this one if needed]

Cost with all optimizations:
- Before: $X per conversation
- After: $Y per conversation
- Savings: Z% ($W/month at medium usage)

7. Testing Strategy (REQUIRED)

Provide comprehensive testing plan:

Agent Testing Plan

Unit Tests (per tool):
| Tool | Test Case | Expected Output | Edge Cases |
|------|-----------|----------------|------------|
| tool-1 | [Normal input] | [Expected result] | [3-5 edge cases to test] |
| tool-2 | [Normal input] | [Expected result] | [3-5 edge cases to test] |

Integration Tests (agent workflows):
1. **Happy path test**: [Describe complete successful flow]
   - Input: [User message]
   - Expected: Agent uses [tools], returns [response]
   - Success criteria: [Measurable criteria]

2. **Tool failure test**: [Describe tool failure scenario]
   - Input: [User message]
   - Failure: [Which tool fails and how]
   - Expected: Agent gracefully handles, returns [response]
   - Success criteria: [No crash, appropriate fallback]

3. **Guardrail test**: [Describe guardrail trigger]
   - Input: [User message that should trigger guardrail]
   - Expected: Guardrail activates, agent [action]
   - Success criteria: [Specific guardrail behavior]

4. **Multi-turn conversation test**: [Describe conversation flow]
   - Turn 1: [User message] → [Agent response]
   - Turn 2: [User message] → [Agent response]
   - Turn 3: [User message] → [Agent response]
   - Success criteria: [Agent maintains context, proper memory usage]

Evaluation Metrics:
| Metric | Target | How to Measure |
|--------|--------|----------------|
| Success rate | >X% | % conversations that resolve without escalation |
| Average response time | <Y sec | p95 latency from user message to response |
| Tool call accuracy | >Z% | % of tool calls that return successful results |
| User satisfaction | >W rating | Post-conversation survey (1-5 scale) |
| Cost per conversation | <$X | Track actual token usage vs estimate |

Red Team Tests (adversarial):
1. **Jailbreak attempt**: [Try to make agent ignore guardrails]
2. **Infinite loop attempt**: [Try to make agent loop forever]
3. **PII extraction**: [Try to make agent reveal sensitive data]
4. **Out-of-scope task**: [Request something agent shouldn't do]
5. **Ambiguous input**: [Intentionally vague or unclear request]

Testing Timeline:
- Unit tests: Complete before integration
- Integration tests: Complete before user testing
- Evaluation metrics: Track from beta launch onwards
- Red team tests: Run monthly or after major changes

8. Deployment and Monitoring (REQUIRED)

Provide production readiness checklist:

Production Deployment Checklist

Pre-Launch:
- [ ] All tools tested with success/failure cases
- [ ] Guardrails validated with red team tests
- [ ] Cost per conversation confirmed within budget
- [ ] Response time meets p95 target (<X sec)
- [ ] Error handling tested (all tools can fail gracefully)
- [ ] Memory/persistence layer load tested
- [ ] Rate limiting configured (per user and global)
- [ ] Monitoring and logging configured
- [ ] Escalation workflow tested (human handoff works)
- [ ] Documentation complete (runbook, system prompt, tool specs)

Monitoring Setup:
| Metric | Tool | Alert Threshold | Action |
|--------|------|-----------------|--------|
| Error rate | [Sentry/Datadog] | >X% in 5 min | Page on-call engineer |
| Response time | [Monitoring tool] | p95 >Y sec | Investigate slow tools |
| Cost per conversation | [Custom dashboard] | >$Z | Review recent conversations |
| Tool call failures | [Logging] | >W% for any tool | Check external service status |
| Escalation rate | [Analytics] | >X% | Review agent quality |

Logging Requirements:
```json
{
  "conversation_id": "uuid",
  "timestamp": "ISO 8601",
  "user_id": "hashed_user_id",
  "agent_name": "string",
  "turn": 1,
  "action": "tool_call | response | escalation",
  "tool_name": "string | null",
  "latency_ms": 123,
  "tokens_used": {
    "input": 1234,
    "output": 567
  },
  "cost": 0.045,
  "confidence": 0.85,
  "error": null
}

Incident Response:

High error rate detected
- Check: External service status (API status pages)
- Action: Switch to degraded mode if service down
- Communicate: Update status page for users
Slow response times
- Check: Which tool is slow (latency logs)
- Action: Increase timeout, implement caching, or disable tool temporarily
- Communicate: Notify users of slower responses if needed
Cost spike
- Check: Recent conversations for anomalies (infinite loops, very long contexts)
- Action: Reduce max turns, implement stricter rate limiting
- Communicate: Internal alert to team
Guardrail breach
- Check: Logs for how the breach occurred
- Action: Update system prompt and guardrail logic
- Communicate: Security team if PII or harmful content involved

Runbook Location: [Link to detailed runbook] On-Call Rotation: [How to contact on-call engineer]

Agent Architecture

Patterns, decision frameworks, and best practices for designing AI agent systems. Use this skill when the project type is agent or hybrid.

Orchestration Patterns

When to Use Each Pattern

Pattern Details

Single-Turn

User Message → LLM → Response

No loop, no tools, no memory beyond the conversation
Use when: the task can be completed in one LLM call
Cost: lowest (1 LLM call per interaction)

ReAct (Reason + Act)

User Message → LLM thinks → Calls tool → Observes result → LLM thinks → ... → Final response

The agent reasons about what to do, takes an action (tool call), observes the result, and repeats until done
Use when: the agent needs to gather information or take actions to answer
Cost: 2-10 LLM calls per interaction (depends on task complexity)
Most common pattern for production agents

Chain-of-Thought

User Message → LLM step 1 → LLM step 2 → ... → Final response

Multi-step reasoning without external tools
Use when: the task requires breaking down a complex problem
Cost: 1-3 LLM calls (can often be done in a single call with prompting)

Multi-Agent Router

User Message → Router Agent → Specialist Agent A or B or C → Response

A router agent classifies the request and dispatches to the right specialist
Each specialist has its own tools, system prompt, and expertise
Use when: you have distinct domains that require different knowledge
Cost: 2+ LLM calls (router + specialist)

Multi-Agent Parallel

User Message → Coordinator → [Agent A, Agent B, Agent C] (parallel) → Merger → Response

Multiple agents work on different aspects simultaneously
A merger combines results
Use when: subtasks are independent and speed matters
Cost: highest (N agents + coordinator + merger)

Plan-and-Execute

User Message → Planner creates step list → Executor runs step 1 → ... → step N → Response

A planner creates an explicit plan, then an executor works through each step
Allows for re-planning if a step fails
Use when: the task has clear sequential dependencies
Cost: 2-10+ LLM calls

Tool Design

Tool Schema Pattern

Every agent tool should follow this structure:

Tool:
  name: descriptive-verb-noun (e.g., "search-knowledge-base", "create-ticket")
  type: <agent_tool_type>
  description: What it does in one sentence
  input:
    - parameter: name
      type: string | number | boolean | object
      required: true | false
      description: What this parameter controls
  output:
    - field: name
      type: string | number | object | array
      description: What this field contains
  errors:
    - error: name
      description: When this error occurs

Tool Design Rules

One tool, one job — a tool should do exactly one thing
Descriptive names — search-products not tool1, create-user not do-thing
Validate inputs — the tool should fail gracefully with a clear error message
Return structured data — tools should return JSON objects, not prose
Include error cases — define what happens when the tool fails
Limit scope — a database query tool should not also format the results for display

Common Tool Types

Memory Strategies

Memory Selection Guide

Session only: The agent doesn't need to remember past conversations. Each interaction starts fresh. Most agents start here.
Persistent: The agent needs to remember user preferences, past decisions, or ongoing projects. Store in PostgreSQL or MongoDB.
Vector store: The agent needs to search a large body of documents (help articles, product docs, codebases). Use Pinecone, Weaviate, or pgvector.
Hybrid: The agent needs both conversation history AND document search. Combine persistent + vector.

Guardrails

Every agent needs explicit guardrails. Define these in the system prompt and enforce them in the orchestration layer.

Standard Guardrails

Identity disclosure: "Always acknowledge you are an AI when asked directly"
Scope boundaries: "Only answer questions related to [domain]. For out-of-scope questions, say: I can help with X, but for Y you should contact Z"
No harmful actions: "Never delete user data, send unauthorized messages, or make financial transactions without explicit user confirmation"
Escalation triggers: "Escalate to a human when: user is frustrated, topic is sensitive, confidence is low, legal/medical/financial advice is requested"
Data handling: "Never log or store: passwords, credit card numbers, SSNs, health records"
Rate limits: "Maximum N tool calls per conversation to prevent runaway loops"

Guardrail Documentation Pattern

For each guardrail, document:

Guardrail: [Name]
  Trigger: When does this guardrail activate?
  Action: What does the agent do?
  Message: What does the agent say to the user?
  Fallback: What happens if the guardrail can't be enforced?

Token Cost Modeling

Estimation Method

System prompt tokens: Count tokens in system prompt (typically 500-2,000)
Context tokens per turn: Average input tokens = system prompt + conversation history + tool results
Output tokens per turn: Average output tokens = agent reasoning + tool calls + final response
Turns per conversation: How many agent turns per conversation (1 for single-turn, 3-8 for ReAct)
Conversations per month: Estimated usage volume

Formula

Monthly cost = conversations/mo × turns/conversation × (input_tokens × input_price + output_tokens × output_price)

Quick Reference Costs

Cost Optimization Tips

Use the cheapest model that works — start with Haiku/GPT-4o-mini, upgrade only if quality is insufficient
Cache system prompts — most providers offer prompt caching for repeated system prompts
Trim conversation history — summarize old messages instead of including full history
Route by complexity — use cheap model for simple queries, expensive model for complex ones
Batch tool calls — call multiple tools in one turn when possible
Set max turn limits — prevent runaway loops that burn tokens

LLM Provider Selection

Selection Decision Tree

Need best tool use? → Anthropic Claude
Need largest ecosystem / most libraries? → OpenAI
Need longest context window? → Google Gemini
Need EU data residency? → Mistral
Need lowest latency? → Groq
Need full data privacy? → Local (Ollama)
Need cheapest at scale? → Compare Haiku vs GPT-4o-mini vs Gemini Flash

Implementation Guidance

When designing agent architecture, provide comprehensive specifications:

1. Agent Flow Diagram (REQUIRED)

Always provide a visual flow diagram showing:

User input entry point
Agent reasoning steps
Tool calls and their sequence
Decision points and branching logic
Final output format

Format (use Mermaid):

graph TD
    A[User Message] --> B{Router Agent}
    B -->|Technical Question| C[Technical Agent]
    B -->|Billing Question| D[Billing Agent]
    B -->|General| E[General Agent]

    C --> F[Search Knowledge Base]
    F --> G{Found Answer?}
    G -->|Yes| H[Format Response]
    G -->|No| I[Escalate to Human]

    D --> J[Query Billing DB]
    J --> K[Format Invoice Data]

    E --> L[Generate Response]

    H --> M[Return to User]
    I --> M
    K --> M
    L --> M

2. Detailed Agent Specification (REQUIRED for each agent)

For each agent in the system, provide:

Agent: [Agent Name]

Purpose:
[1-2 sentences describing what this agent does and when it's invoked]

Pattern:
[Single-turn / ReAct / Multi-agent router / etc.]

System Prompt:

You are a [role] agent. Your job is to [specific task].

Guidelines:

[Specific guideline 1]
[Specific guideline 2]
[Specific guideline 3]

Available tools:

tool-name-1: [when to use it]
tool-name-2: [when to use it]

Output format: [Expected output structure]

Guardrails:

[Specific restriction 1]
[Specific restriction 2]


Tools Available:
| Tool Name | Purpose | When to Use |
|-----------|---------|-------------|
| tool-1 | [Brief description] | [Trigger condition] |
| tool-2 | [Brief description] | [Trigger condition] |

Memory Configuration:
- Type: [Session / Persistent / Vector / Hybrid]
- Storage: [Where data is stored]
- Retention: [How long data is kept]
- Load strategy: [When/how to load memory]

Expected Input:
```json
{
  "user_message": "string",
  "conversation_id": "string",
  "user_context": {
    "user_id": "string",
    "preferences": {}
  }
}

Expected Output:

{
  "response": "string",
  "confidence": 0.0-1.0,
  "sources": ["source1", "source2"],
  "needs_escalation": boolean,
  "metadata": {}
}

Token Estimate:

System prompt: ~X tokens
Average input: ~Y tokens per turn
Average output: ~Z tokens per turn
Expected turns: A-B
Cost per conversation: $X - $Y (using [model name])

Performance Targets:

Response time: <X seconds (p95)
Tool call latency: <Y seconds per tool
Success rate: >Z% conversations resolved without escalation


### 3. Tool Implementation Specs (REQUIRED for each tool)

**For each custom tool, provide complete specification:**

```markdown
Tool: [tool-name]

Type: [API call / Database query / Function / External service]

Description:
[2-3 sentences on what this tool does and why it exists]

Input Schema:
```typescript
interface ToolInput {
  param1: string;  // Description of param1
  param2?: number; // Optional: Description of param2
  param3: {        // Nested object
    field1: string;
    field2: boolean;
  };
}

Output Schema:

interface ToolOutput {
  success: boolean;
  data?: {
    // Success case structure
  };
  error?: {
    code: string;
    message: string;
  };
}

Implementation: [Detailed description of how the tool works internally]

External Dependencies:

Service: [Name of external service, if any]
API endpoint: [Specific endpoint URL pattern]
Authentication: [How to authenticate]
Rate limits: [X requests per Y time period]

Error Cases: | Error | Trigger | Code | Message | Retry? | |-------|---------|------|---------|--------| | [Error name] | [When it happens] | ERROR_CODE | "User-friendly message" | Yes/No |

Example Usage:

Input:
{
  "param1": "example value",
  "param3": {
    "field1": "test",
    "field2": true
  }
}

Output (Success):
{
  "success": true,
  "data": {
    "result": "example result"
  }
}

Output (Failure):
{
  "success": false,
  "error": {
    "code": "INVALID_INPUT",
    "message": "param1 must not be empty"
  }
}

Performance:

Average latency: X ms
p95 latency: Y ms
Timeout: Z ms
Cost: $X per call (if applicable)

Testing Notes:

Test case 1: [Description of what to test]
Test case 2: [Description of what to test]
Edge case 1: [Description of edge case]


### 4. Memory Architecture Specification (REQUIRED if using memory)

**Provide detailed memory implementation:**

```markdown
Memory Strategy: [Session / Persistent / Vector / Hybrid]

Storage Backend:
- Technology: [PostgreSQL / MongoDB / Pinecone / Redis]
- Connection: [How to connect]
- Schema: [Data structure]

Session Memory (if applicable):
- Window size: Last N messages
- Summarization: After M messages, summarize older messages
- Retention: Until session ends (X minutes of inactivity)

Persistent Memory (if applicable):
Database Schema:
```sql
CREATE TABLE conversations (
  id UUID PRIMARY KEY,
  user_id UUID NOT NULL,
  created_at TIMESTAMP,
  last_message_at TIMESTAMP,
  summary TEXT,
  metadata JSONB
);

CREATE TABLE messages (
  id UUID PRIMARY KEY,
  conversation_id UUID REFERENCES conversations(id),
  role VARCHAR(50), -- 'user' or 'assistant'
  content TEXT,
  tokens INTEGER,
  created_at TIMESTAMP
);

Load Strategy:

When conversation starts: [What to load]
During conversation: [What to load when]
Summarization trigger: [When to summarize old messages]

Vector Memory (if applicable):

Embedding model: [Model name, dimension]
Vector database: [Pinecone / Weaviate / pgvector]
Index configuration: [Settings]
Retrieval strategy: Top-K with threshold
- K: [number]
- Similarity threshold: [0.0-1.0]
- Re-ranking: [Yes/No, if yes how]

Chunk Strategy (for document embeddings):

Chunk size: X tokens
Overlap: Y tokens
Metadata attached: [List fields included with each chunk]

Cost Impact:

Embedding cost: $X per 1M tokens
Storage cost: $Y per GB per month
Query cost: $Z per 1K queries
Expected monthly cost: $A - $B based on C users

Privacy Considerations:

PII handling: [How PII is handled]
Data retention: [How long data is kept]
Deletion process: [How users can delete their data]
Encryption: [At rest / in transit]


### 5. Guardrail Implementation (REQUIRED)

**For each guardrail, provide enforceable specification:**

```markdown
Guardrail: [Name]

Priority: [Critical / High / Medium / Low]

Trigger:
[Specific, measurable condition that activates this guardrail]

Detection Method:
[How the system detects the trigger — pattern matching, classifier, heuristic, etc.]

Action:
1. [First action taken by system]
2. [Second action taken by system]
3. [Final outcome]

User Message:

[Exact message shown to user when guardrail activates]


Bypass Conditions:
[When is it OK to bypass this guardrail? Usually: "Never" or very specific exception]

Logging:
- Log event: Yes/No
- Alert team: Yes/No
- Include in analytics: Yes/No

Example:
User input: "[Example input that triggers guardrail]"
System detects: [What pattern/condition is detected]
System action: [What the system does]
User sees: "[Message shown]"

Testing:
- Test case 1: [Input that SHOULD trigger guardrail]
- Test case 2: [Input that should NOT trigger guardrail]
- Edge case: [Tricky input to test boundary]

6. Cost Breakdown and Optimization (ALWAYS include)

Provide detailed token and cost analysis:

Agent Cost Analysis

Model: [Selected LLM model and tier]
Pricing: Input $X / MTok, Output $Y / MTok

Token Breakdown (per conversation):
| Component | Tokens | Cost |
|-----------|--------|------|
| System prompt | X | $X.XX |
| Average user message | X | $X.XX |
| Average agent response | X | $X.XX |
| Tool results | X | $X.XX |
| Context/memory | X | $X.XX |
| **Total per turn** | **X** | **$X.XX** |

Turns per conversation:
- Simple queries: A turns = $X
- Medium complexity: B turns = $Y
- Complex queries: C turns = $Z
- **Average: D turns = $W**

Monthly projection:
| Usage Level | Conversations/mo | Cost/mo |
|-------------|:----------------:|:-------:|
| Low | 1,000 | $X |
| Medium | 10,000 | $Y |
| High | 100,000 | $Z |

Cost Optimization Strategies:

✅ Strategy #1: [Name]
- Current cost: $X per conversation
- Optimized cost: $Y per conversation
- Savings: Z%
- How: [Specific implementation]
- Trade-off: [What you lose, if anything]

✅ Strategy #2: [Name]
- Current cost: $X per conversation
- Optimized cost: $Y per conversation
- Savings: Z%
- How: [Specific implementation]
- Trade-off: [What you lose, if anything]

[Continue for 3-5 strategies...]

Recommended optimization path:
1. [Start with this optimization — easiest/highest impact]
2. [Then this one]
3. [Finally this one if needed]

Cost with all optimizations:
- Before: $X per conversation
- After: $Y per conversation
- Savings: Z% ($W/month at medium usage)

7. Testing Strategy (REQUIRED)

Provide comprehensive testing plan:

Agent Testing Plan

Unit Tests (per tool):
| Tool | Test Case | Expected Output | Edge Cases |
|------|-----------|----------------|------------|
| tool-1 | [Normal input] | [Expected result] | [3-5 edge cases to test] |
| tool-2 | [Normal input] | [Expected result] | [3-5 edge cases to test] |

Integration Tests (agent workflows):
1. **Happy path test**: [Describe complete successful flow]
   - Input: [User message]
   - Expected: Agent uses [tools], returns [response]
   - Success criteria: [Measurable criteria]

2. **Tool failure test**: [Describe tool failure scenario]
   - Input: [User message]
   - Failure: [Which tool fails and how]
   - Expected: Agent gracefully handles, returns [response]
   - Success criteria: [No crash, appropriate fallback]

3. **Guardrail test**: [Describe guardrail trigger]
   - Input: [User message that should trigger guardrail]
   - Expected: Guardrail activates, agent [action]
   - Success criteria: [Specific guardrail behavior]

4. **Multi-turn conversation test**: [Describe conversation flow]
   - Turn 1: [User message] → [Agent response]
   - Turn 2: [User message] → [Agent response]
   - Turn 3: [User message] → [Agent response]
   - Success criteria: [Agent maintains context, proper memory usage]

Evaluation Metrics:
| Metric | Target | How to Measure |
|--------|--------|----------------|
| Success rate | >X% | % conversations that resolve without escalation |
| Average response time | <Y sec | p95 latency from user message to response |
| Tool call accuracy | >Z% | % of tool calls that return successful results |
| User satisfaction | >W rating | Post-conversation survey (1-5 scale) |
| Cost per conversation | <$X | Track actual token usage vs estimate |

Red Team Tests (adversarial):
1. **Jailbreak attempt**: [Try to make agent ignore guardrails]
2. **Infinite loop attempt**: [Try to make agent loop forever]
3. **PII extraction**: [Try to make agent reveal sensitive data]
4. **Out-of-scope task**: [Request something agent shouldn't do]
5. **Ambiguous input**: [Intentionally vague or unclear request]

Testing Timeline:
- Unit tests: Complete before integration
- Integration tests: Complete before user testing
- Evaluation metrics: Track from beta launch onwards
- Red team tests: Run monthly or after major changes

8. Deployment and Monitoring (REQUIRED)

Provide production readiness checklist:

Production Deployment Checklist

Pre-Launch:
- [ ] All tools tested with success/failure cases
- [ ] Guardrails validated with red team tests
- [ ] Cost per conversation confirmed within budget
- [ ] Response time meets p95 target (<X sec)
- [ ] Error handling tested (all tools can fail gracefully)
- [ ] Memory/persistence layer load tested
- [ ] Rate limiting configured (per user and global)
- [ ] Monitoring and logging configured
- [ ] Escalation workflow tested (human handoff works)
- [ ] Documentation complete (runbook, system prompt, tool specs)

Monitoring Setup:
| Metric | Tool | Alert Threshold | Action |
|--------|------|-----------------|--------|
| Error rate | [Sentry/Datadog] | >X% in 5 min | Page on-call engineer |
| Response time | [Monitoring tool] | p95 >Y sec | Investigate slow tools |
| Cost per conversation | [Custom dashboard] | >$Z | Review recent conversations |
| Tool call failures | [Logging] | >W% for any tool | Check external service status |
| Escalation rate | [Analytics] | >X% | Review agent quality |

Logging Requirements:
```json
{
  "conversation_id": "uuid",
  "timestamp": "ISO 8601",
  "user_id": "hashed_user_id",
  "agent_name": "string",
  "turn": 1,
  "action": "tool_call | response | escalation",
  "tool_name": "string | null",
  "latency_ms": 123,
  "tokens_used": {
    "input": 1234,
    "output": 567
  },
  "cost": 0.045,
  "confidence": 0.85,
  "error": null
}

Incident Response:

High error rate detected
- Check: External service status (API status pages)
- Action: Switch to degraded mode if service down
- Communicate: Update status page for users
Slow response times
- Check: Which tool is slow (latency logs)
- Action: Increase timeout, implement caching, or disable tool temporarily
- Communicate: Notify users of slower responses if needed
Cost spike
- Check: Recent conversations for anomalies (infinite loops, very long contexts)
- Action: Reduce max turns, implement stricter rate limiting
- Communicate: Internal alert to team
Guardrail breach
- Check: Logs for how the breach occurred
- Action: Update system prompt and guardrail logic
- Communicate: Security team if PII or harmful content involved

Runbook Location: [Link to detailed runbook] On-Call Rotation: [How to contact on-call engineer]

Adoption

navraj007in/agent-architecture

$ install --global

Security Scan Results

SKILL.md

Agent Architecture

Orchestration Patterns

When to Use Each Pattern

Pattern Details

Single-Turn

ReAct (Reason + Act)

Chain-of-Thought

Multi-Agent Router

Multi-Agent Parallel

Plan-and-Execute

Tool Design

Tool Schema Pattern

Tool Design Rules

Common Tool Types

Memory Strategies

Memory Selection Guide

Guardrails

Standard Guardrails

Guardrail Documentation Pattern

Token Cost Modeling

Estimation Method

Formula

Quick Reference Costs

Cost Optimization Tips

LLM Provider Selection

Selection Decision Tree

Implementation Guidance

1. Agent Flow Diagram (REQUIRED)

2. Detailed Agent Specification (REQUIRED for each agent)

6. Cost Breakdown and Optimization (ALWAYS include)

7. Testing Strategy (REQUIRED)

8. Deployment and Monitoring (REQUIRED)

Related Skills

navraj007in/skills/tradeoff-analysis

navraj007in/skills/stage-detection

navraj007in/skills/stack-swap-simulator

navraj007in/skills/stack-compatibility

navraj007in/agent-architecture

$ install --global

Security Scan Results

SKILL.md

Agent Architecture

Orchestration Patterns

When to Use Each Pattern

Pattern Details

Single-Turn

ReAct (Reason + Act)

Chain-of-Thought

Multi-Agent Router

Multi-Agent Parallel

Plan-and-Execute

Tool Design

Tool Schema Pattern

Tool Design Rules

Common Tool Types

Memory Strategies

Memory Selection Guide

Guardrails

Standard Guardrails

Guardrail Documentation Pattern

Token Cost Modeling

Estimation Method

Formula

Quick Reference Costs

Cost Optimization Tips

LLM Provider Selection

Selection Decision Tree

Implementation Guidance

1. Agent Flow Diagram (REQUIRED)

2. Detailed Agent Specification (REQUIRED for each agent)

6. Cost Breakdown and Optimization (ALWAYS include)

7. Testing Strategy (REQUIRED)

8. Deployment and Monitoring (REQUIRED)

Related Skills

navraj007in/skills/tradeoff-analysis