skills/agent-review/SKILL.md
Review AI agent implementations for best practices in architecture, folder structure, design patterns, error handling, and observability. Use when auditing agent codebases or designing new agent systems.
npx skillsauth add igbuend/grimbard agent-reviewInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Review AI agent implementations for architectural best practices.
Target: $ARGUMENTS (path to agent project or codebase)
agent-project/
├── src/
│ ├── agents/ # Agent definitions
│ │ ├── base.py # Base agent class
│ │ ├── planner.py # Planning agent
│ │ └── executor.py # Execution agent
│ ├── tools/ # Tool implementations
│ │ ├── __init__.py
│ │ ├── base.py # Tool base class/interface
│ │ ├── search.py # Search tool
│ │ └── code.py # Code execution tool
│ ├── memory/ # Memory/state management
│ │ ├── short_term.py # Conversation context
│ │ ├── long_term.py # Persistent storage
│ │ └── vector_store.py # Embeddings/RAG
│ ├── prompts/ # Prompt templates
│ │ ├── system.py # System prompts
│ │ └── templates/ # Jinja/string templates
│ ├── orchestration/ # Multi-agent coordination
│ │ ├── router.py # Request routing
│ │ └── workflow.py # Agent workflows
│ ├── models/ # Data models/schemas
│ │ ├── messages.py # Message types
│ │ └── state.py # State schemas
│ └── utils/ # Shared utilities
│ ├── logging.py # Structured logging
│ └── retry.py # Retry logic
├── config/ # Configuration
│ ├── default.yaml # Default settings
│ └── prompts/ # External prompt files
├── tests/ # Test suite
│ ├── unit/
│ ├── integration/
│ └── fixtures/ # Test data
└── scripts/ # CLI/automation
| Component | Required | Check | |-----------|----------|-------| | Agent definitions separated | Yes | [ ] | | Tools in dedicated module | Yes | [ ] | | Prompts externalized | Recommended | [ ] | | Configuration separated | Yes | [ ] | | Tests present | Yes | [ ] | | Clear separation of concerns | Yes | [ ] |
Required Patterns:
BAD:
def search(query):
return requests.get(f"https://api.com?q={query}").json()
GOOD:
class SearchTool(BaseTool):
name = "search"
description = "Search the web for information"
class InputSchema(BaseModel):
query: str = Field(..., min_length=1, max_length=500)
def execute(self, query: str) -> ToolResult:
try:
response = self.client.search(query, timeout=10)
return ToolResult(success=True, data=response)
except Timeout:
return ToolResult(success=False, error="Search timed out")
except Exception as e:
return ToolResult(success=False, error=str(e))
Required Patterns:
GOOD:
class Agent:
MAX_ITERATIONS = 10
async def run(self, task: str) -> AgentResult:
state = AgentState(task=task)
for i in range(self.MAX_ITERATIONS):
if self._should_stop(state):
break
# Think
action = await self.plan(state)
# Act
result = await self.execute(action)
# Observe
state = self.update_state(state, result)
return self.finalize(state)
Required Patterns:
Memory Types:
| Type | Purpose | Persistence | |------|---------|-------------| | Working | Current task context | Session | | Short-term | Recent conversation | Session | | Long-term | User preferences, facts | Persistent | | Episodic | Past task summaries | Persistent | | Semantic | Embeddings/RAG | Persistent |
Required Patterns:
Error Categories:
| Category | Retry | Action | |----------|-------|--------| | Rate limit | Yes | Exponential backoff | | Timeout | Yes | Retry with longer timeout | | Auth failure | No | Fail with clear message | | Invalid input | No | Return validation error | | Tool failure | Maybe | Try alternative tool | | Model error | Yes | Retry or fallback model |
GOOD:
class AgentError(Exception):
def __init__(self, message: str, code: str, recoverable: bool = False):
self.message = message
self.code = code
self.recoverable = recoverable
@retry(
retry=retry_if_exception_type(RateLimitError),
wait=wait_exponential(multiplier=1, max=60),
stop=stop_after_attempt(3)
)
async def call_model(self, messages: list) -> str:
try:
return await self.client.complete(messages)
except RateLimitError:
raise # Let retry handle it
except AuthError as e:
raise AgentError("Authentication failed", "AUTH_ERROR", recoverable=False)
Required Patterns:
GOOD:
@dataclass(frozen=True)
class AgentState:
task: str
messages: tuple[Message, ...]
tool_results: tuple[ToolResult, ...]
iteration: int = 0
status: Literal["running", "completed", "failed"] = "running"
def with_message(self, message: Message) -> "AgentState":
return replace(self, messages=self.messages + (message,))
def with_tool_result(self, result: ToolResult) -> "AgentState":
return replace(self, tool_results=self.tool_results + (result,))
Patterns (if applicable):
Coordination Patterns:
| Pattern | Use Case | |---------|----------| | Supervisor | One agent routes to specialists | | Pipeline | Sequential agent processing | | Debate | Multiple agents propose, one decides | | Swarm | Autonomous agents, shared goals | | Hierarchical | Manager → workers structure |
Required Patterns:
GOOD:
# prompts/system.yaml
agent_system_prompt:
version: "1.2"
template: |
You are a helpful assistant with access to these tools:
{% for tool in tools %}
- {{ tool.name }}: {{ tool.description }}
{% endfor %}
Current date: {{ current_date }}
User preferences: {{ user_prefs }}
Required Patterns:
Logging Checklist:
| Event | Log Level | Required Fields | |-------|-----------|-----------------| | Agent start | INFO | task_id, user_id, task | | Tool call | DEBUG | tool_name, inputs, duration | | Model call | DEBUG | model, tokens_in, tokens_out, latency | | Error | ERROR | error_code, message, stack_trace | | Agent complete | INFO | task_id, status, total_duration, total_tokens |
GOOD:
logger.info("agent_started", extra={
"task_id": task_id,
"user_id": user_id,
"task_type": task.type,
})
logger.debug("tool_executed", extra={
"task_id": task_id,
"tool": tool.name,
"duration_ms": duration,
"success": result.success,
})
| Setting | Type | Description |
|---------|------|-------------|
| model | string | Model identifier |
| max_iterations | int | Loop limit |
| timeout_seconds | int | Overall timeout |
| tool_timeout | int | Per-tool timeout |
| max_tokens | int | Response limit |
| temperature | float | Model temperature |
| retry_attempts | int | Retry count |
1. Environment variables (secrets, deployment-specific)
2. Config files (default.yaml, production.yaml)
3. Code defaults (fallbacks only)
GOOD:
class AgentConfig(BaseSettings):
model: str = "claude-3-sonnet"
max_iterations: int = 10
timeout_seconds: int = 300
class Config:
env_prefix = "AGENT_"
env_file = ".env"
| Type | Coverage | Purpose | |------|----------|---------| | Unit | Tools, utilities | Isolated component tests | | Integration | Agent + tools | End-to-end flows | | Snapshot | Prompts | Detect prompt regressions | | Eval | Agent responses | Quality benchmarks |
GOOD:
def test_search_tool_timeout():
tool = SearchTool(timeout=0.001)
result = tool.execute("test query")
assert not result.success
assert "timeout" in result.error.lower()
def test_agent_max_iterations():
agent = Agent(max_iterations=3)
# Mock tool that never completes
agent.tools = [InfiniteLoopTool()]
result = agent.run("impossible task")
assert result.iterations == 3
assert result.status == "max_iterations_reached"
## Agent Review: [project-name]
### Summary
[1-2 sentence overview]
### Architecture Score
| Category | Score | Notes |
|----------|-------|-------|
| Folder Structure | X/5 | |
| Tool Design | X/5 | |
| Agent Loop | X/5 | |
| Memory Management | X/5 | |
| Error Handling | X/5 | |
| State Management | X/5 | |
| Observability | X/5 | |
| Testing | X/5 | |
| **Overall** | **X/5** | |
### Critical Issues
- [ ] [Issue] - Location: [file]
### Recommendations
- [ ] [Recommendation] - Priority: [High/Medium/Low]
### Strengths
- [What the implementation does well]
| Anti-Pattern | Problem | Fix | |--------------|---------|-----| | God Agent | Single agent does everything | Split by responsibility | | Infinite Loop | No termination condition | Add max iterations | | Silent Failures | Errors swallowed | Structured error handling | | Hardcoded Prompts | Prompts in code | Externalize to files | | No Observability | Can't debug production | Add structured logging | | Mutable State | Race conditions, bugs | Immutable state updates | | No Timeouts | Hanging requests | Configure all timeouts | | Missing Validation | Invalid inputs accepted | Schema validation |
development
Security anti-pattern for Cross-Site Scripting vulnerabilities (CWE-79). Use when generating or reviewing code that renders HTML, handles user input in web pages, uses innerHTML/document.write, or builds dynamic web content. Covers Reflected, Stored, and DOM-based XSS. AI code has 86% XSS failure rate.
development
Security anti-pattern for XPath injection vulnerabilities (CWE-643). Use when generating or reviewing code that queries XML documents, constructs XPath expressions, or handles user input in XML operations. Detects unescaped quotes and special characters in XPath queries.
development
Security anti-pattern for weak password hashing (CWE-327, CWE-759). Use when generating or reviewing code that stores or verifies user passwords. Detects use of MD5, SHA1, SHA256 without salt, or missing password hashing entirely. Recommends bcrypt, Argon2, or scrypt.
development
Security anti-pattern for weak encryption (CWE-326, CWE-327). Use when generating or reviewing code that encrypts data, handles encryption keys, or uses cryptographic modes. Detects DES, ECB mode, static IVs, and custom crypto implementations.