skills/ai-agents/SKILL.md
Building AI agents — tool use, chains, memory, and autonomous workflows with LLMs. Use when user mentions "AI agent", "agent development", "tool use", "function calling", "agent loop", "ReAct pattern", "agent memory", "autonomous agent", "multi-agent", "langchain agents", "crew AI", or building systems where LLMs take actions.
npx skillsauth add 1mangesh1/dev-skills-collection ai-agentsInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
An AI agent is an LLM connected to tools and running in a loop. The LLM decides what to do, calls a tool, observes the result, and repeats until the task is done. Without the loop and tools, it is just a chatbot.
Every agent follows this pattern:
1. OBSERVE - Receive input (user message or tool result)
2. THINK - LLM reasons about what to do next
3. ACT - Call a tool or return a final answer
4. OBSERVE - Get tool result, go back to step 2
The loop terminates when the LLM decides no more tool calls are needed and returns a final response. A maximum iteration limit prevents runaway loops.
tools = [
{
"type": "function",
"function": {
"name": "search_web",
"description": "Search the web for a query",
"parameters": {
"type": "object",
"properties": {
"query": {"type": "string", "description": "Search query"}
},
"required": ["query"]
}
}
}
]
response = client.chat.completions.create(
model="gpt-4o",
messages=messages,
tools=tools,
tool_choice="auto"
)
tools = [
{
"name": "search_web",
"description": "Search the web for a query",
"input_schema": {
"type": "object",
"properties": {
"query": {"type": "string", "description": "Search query"}
},
"required": ["query"]
}
}
]
response = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=1024,
messages=messages,
tools=tools
)
The model returns a tool_use block (Anthropic) or tool_calls array (OpenAI). Your code executes the tool and feeds the result back as the next message.
ReAct interleaves reasoning traces with actions. The LLM explicitly writes out its thinking before each tool call, making the decision process inspectable.
Thought: I need to find the current stock price of AAPL.
Action: search_web("AAPL stock price")
Observation: AAPL is trading at $187.44.
Thought: I have the price. I can answer the user now.
Answer: Apple (AAPL) is currently trading at $187.44.
With modern tool-calling APIs, ReAct happens naturally -- the model reasons in its text output and issues tool calls in structured blocks. You do not need to parse "Action:" strings from raw text anymore.
No frameworks. Just API calls and a tool dispatch dictionary.
import anthropic
import json
client = anthropic.Anthropic()
# Define tools
def read_file(path: str) -> str:
with open(path) as f:
return f.read()
def write_file(path: str, content: str) -> str:
with open(path, "w") as f:
f.write(content)
return f"Wrote {len(content)} bytes to {path}"
tool_definitions = [
{
"name": "read_file",
"description": "Read a file from disk",
"input_schema": {
"type": "object",
"properties": {"path": {"type": "string"}},
"required": ["path"]
}
},
{
"name": "write_file",
"description": "Write content to a file",
"input_schema": {
"type": "object",
"properties": {
"path": {"type": "string"},
"content": {"type": "string"}
},
"required": ["path", "content"]
}
}
]
dispatch = {
"read_file": lambda args: read_file(args["path"]),
"write_file": lambda args: write_file(args["path"], args["content"]),
}
def run_agent(user_message: str, max_iterations: int = 10):
messages = [{"role": "user", "content": user_message}]
for _ in range(max_iterations):
response = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=4096,
tools=tool_definitions,
messages=messages,
)
# Append assistant response
messages.append({"role": "assistant", "content": response.content})
# Check if the model wants to use tools
tool_blocks = [b for b in response.content if b.type == "tool_use"]
if not tool_blocks:
# No tool calls -- agent is done
text = "".join(b.text for b in response.content if b.type == "text")
return text
# Execute each tool and collect results
tool_results = []
for block in tool_blocks:
try:
result = dispatch[block.name](block.input)
tool_results.append({
"type": "tool_result",
"tool_use_id": block.id,
"content": str(result),
})
except Exception as e:
tool_results.append({
"type": "tool_result",
"tool_use_id": block.id,
"content": f"Error: {e}",
"is_error": True,
})
messages.append({"role": "user", "content": tool_results})
return "Agent hit max iterations without completing."
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
const tools: Anthropic.Tool[] = [
{
name: "search_web",
description: "Search the web",
input_schema: {
type: "object" as const,
properties: { query: { type: "string" } },
required: ["query"],
},
},
];
async function executeTool(name: string, input: Record<string, unknown>): Promise<string> {
if (name === "search_web") {
// Replace with real implementation
return `Results for: ${input.query}`;
}
throw new Error(`Unknown tool: ${name}`);
}
async function runAgent(userMessage: string, maxIterations = 10): Promise<string> {
const messages: Anthropic.MessageParam[] = [
{ role: "user", content: userMessage },
];
for (let i = 0; i < maxIterations; i++) {
const response = await client.messages.create({
model: "claude-sonnet-4-20250514",
max_tokens: 4096,
tools,
messages,
});
messages.push({ role: "assistant", content: response.content });
const toolBlocks = response.content.filter(
(b): b is Anthropic.ToolUseBlock => b.type === "tool_use"
);
if (toolBlocks.length === 0) {
return response.content
.filter((b): b is Anthropic.TextBlock => b.type === "text")
.map((b) => b.text)
.join("");
}
const toolResults: Anthropic.ToolResultBlockParam[] = await Promise.all(
toolBlocks.map(async (block) => {
try {
const result = await executeTool(block.name, block.input as Record<string, unknown>);
return { type: "tool_result" as const, tool_use_id: block.id, content: result };
} catch (e) {
return {
type: "tool_result" as const,
tool_use_id: block.id,
content: `Error: ${e}`,
is_error: true,
};
}
})
);
messages.push({ role: "user", content: toolResults });
}
return "Agent hit max iterations.";
}
Pass the full message array to each API call. This is the simplest form of memory but hits context window limits on long conversations.
When the conversation grows too long, summarize older messages. Keep the system prompt and last few exchanges intact, replace everything in between with a summary generated by a separate LLM call.
Store past interactions or documents as embeddings. Before each LLM call, retrieve the top-k relevant chunks and inject them into the prompt. Use any vector database (Pinecone, ChromaDB, pgvector, Qdrant).
Supervisor: One coordinating agent delegates subtasks to specialist agents and synthesizes their outputs.
Debate / Critique: Two agents review each other's work. Agent A drafts, Agent B critiques, Agent A revises. Improves output quality at the cost of more API calls.
Pipeline: Agents are chained sequentially. Agent 1 researches, Agent 2 writes, Agent 3 reviews. Each agent sees only the output of the previous stage.
Parallel Fan-Out: A router sends independent subtasks to multiple agents simultaneously, then merges results.
| Tool | Use Case | |------|----------| | Web search | Grounding in current information | | Code execution (sandbox) | Running Python/JS to verify answers | | File read/write | Persisting work products | | Shell commands | System operations, git, builds | | API calls (HTTP) | Interacting with external services | | Database queries | Reading/writing structured data | | Browser automation | Scraping, form filling |
Keep tool descriptions concise and specific. Vague descriptions cause the model to misuse tools.
is_error: true in examples above). The model can often self-correct.usage.input_tokens, usage.output_tokens).Validate user inputs before they reach the agent. Check for prompt injection attempts, excessively long inputs, and disallowed content.
Check agent outputs before returning to the user or executing dangerous operations:
BLOCKED_COMMANDS = ["rm -rf /", "DROP TABLE", "FORMAT C:"]
def validate_tool_call(name: str, args: dict) -> bool:
if name == "run_shell":
cmd = args.get("command", "")
if any(blocked in cmd for blocked in BLOCKED_COMMANDS):
return False
return True
For high-stakes actions (sending emails, making purchases, modifying production data), pause and ask for human approval before executing the tool. Return a rejection message to the LLM if the user declines.
| Framework | Language | Key Strength | |-----------|----------|-------------| | LangChain | Python/JS | Large ecosystem, many integrations | | LangGraph | Python/JS | Stateful, graph-based agent workflows | | CrewAI | Python | Multi-agent role-based collaboration | | AutoGen | Python | Multi-agent conversation patterns | | Claude Agent SDK | Python | Lightweight agent loop with Claude | | Vercel AI SDK | TypeScript | Streaming-first, React integration | | Mastra | TypeScript | Agent framework with built-in memory/tools |
Start without a framework. Add one when you need features you are reimplementing (state persistence, complex routing, built-in tool libraries). Frameworks add abstraction layers that make debugging harder.
Testing agents is harder than testing deterministic code. Strategies:
def test_research_agent():
result = run_agent("What is the population of Tokyo?")
assert "13" in result or "14" in result # millions, approximately
# Check that web search was called
assert any("search_web" in str(m) for m in recorded_messages)
Tools: web search, URL reader, note-taking. The agent searches for information, reads pages, extracts facts, and compiles a report. Useful for market research, literature review, competitive analysis.
Tools: file read/write, shell execution, web search. The agent reads existing code, plans changes, writes code, runs tests, and iterates on failures. Key design decision: sandbox the execution environment.
Tools: code execution (Python with pandas/numpy), file read, chart generation. The agent loads data, explores it, runs statistical analysis, and generates visualizations. Give it a Python sandbox with data science libraries pre-installed.
Tools: knowledge base search, ticket system API, escalation. The agent retrieves relevant documentation, answers questions, and escalates when confidence is low or the request requires human judgment.
Tools: email, calendar, project management APIs. The agent performs multi-step business processes (schedule meetings, send follow-ups, update tasks). Always use human-in-the-loop for actions with external side effects.
tools
Parallel execution with xargs, GNU parallel, and batch processing patterns. Use when user mentions "xargs", "parallel", "batch processing", "run in parallel", "parallel execution", "process list of files", "bulk operations", "concurrent commands", "map over files", or running commands on multiple inputs.
development
WebSocket implementation for real-time bidirectional communication. Use when user mentions "websocket", "ws://", "wss://", "real-time", "live updates", "chat application", "socket.io", "Server-Sent Events", "SSE", "push notifications", "live data", "streaming data", "bidirectional communication", "websocket server", "reconnection", or building real-time features.
tools
Frontend bundler configuration for Webpack and Vite. Use when user mentions "webpack", "vite", "bundler", "vite config", "webpack config", "code splitting", "tree shaking", "hot module replacement", "HMR", "build optimization", "bundle size", "chunk splitting", "loader", "plugin", "esbuild", "rollup", "dev server", or configuring JavaScript build tools.
tools
VS Code configuration, extensions, keybindings, and workspace optimization. Use when user mentions "vscode", "vs code", "vscode settings", "vscode extensions", "keybindings", "code editor", "workspace settings", "settings.json", "launch.json", "tasks.json", "vscode snippets", "devcontainer", "remote development", or customizing their VS Code setup.