skills/building-agents/SKILL.md
Building tool-using LLM agents in Python -- runtime context, prompt shape, tools, validation, parsing, context reduction, memory, delegation. Load when designing or implementing agents, ReAct loops, or multi-agent systems.
Use for agent runtime architecture: context collection, prompt shape, tool contracts, parsing, permissioning, context reduction, memory, delegation.
- `python` for Python impl/toolchain.
- `arch` when runtime must fit system architecture or SDD.
- `security` for tool risk, approvals, trust boundaries, threat modeling.
- `quality` for eval loops, regression checks, RCA after agent failures.
- `docs` when deliverable is design/ops doc.

Use assets over inline examples when implementing:

- `assets/project/pyproject.toml` -- Python agent project setup
- `assets/project/main.py` -- entrypoint
- `assets/project/agent.py` -- agent construction/result typing
- `assets/project/tools.py` -- tool registry + implementations
- `assets/project/session.py` -- memory/transcript shaping
- `assets/project/tests/test_agent.py` -- runtime tests

Agent = runtime harness around model. Model emits tool call or final answer; harness owns everything else.
Core loop:
Non-negotiables:
- `max_steps` and `max_attempts`; no infinite loops.

| Component | Contract |
| ----------------- | -------------------------------------------------------------------------- |
| Runtime context | immutable snapshot + render() prompt text |
| Prompt shape | stable prefix + volatile suffix; cache stable parts |
| Tools | closed registry of typed Tool objects |
| Validation | check tool name, args, paths, domains, mutation risk, recursion/delegation |
| Permissions | ASK / AUTO / NEVER; risky tools require approval |
| Parser | one response = one tool call, one final answer, or retry |
| Context reduction | clip tool output, dedupe old reads, summarize old transcript |
| Sessions | append-only transcript + small working memory |
| Delegation | bounded read-only child agents with smaller step budget and depth limit |
Default shape:
- `dataclass(frozen=True, slots=True)`
- `pathlib.Path`
- `render() -> str`

Coding context includes:

- `git rev-parse --show-toplevel`
- `AGENTS*`, `README.md`, manifests

Use assets for concrete code.
Split prompt:
Build stable prefix once. Cache if provider supports it; otherwise keep text identical for auto-cache hits.
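One way to keep the prefix byte-identical across turns is to build it through a cached function and append only the volatile part afterwards. This is a sketch under assumptions: `SYSTEM_RULES`, `stable_prefix`, and `build_prompt` are hypothetical names, and the section layout is illustrative.

```python
from functools import lru_cache

# Hypothetical rules text; the real rules live in the skill's assets.
SYSTEM_RULES = "You are a tool-using agent. Reply with one tool call or one final answer."


@lru_cache(maxsize=None)
def stable_prefix(tool_docs: str, context_text: str) -> str:
    """Built once per (tools, context) pair, so the text stays identical across turns."""
    return f"{SYSTEM_RULES}\n\n# Tools\n{tool_docs}\n\n# Context\n{context_text}\n"


def build_prompt(tool_docs: str, context_text: str, transcript: list[str], request: str) -> str:
    """Stable prefix first, volatile suffix (transcript + current request) last."""
    suffix = "\n".join(transcript + [f"User: {request}"])
    return stable_prefix(tool_docs, context_text) + "\n" + suffix
```

Because everything that changes per turn sits after the prefix, provider-side prefix caching (where available) can reuse the same leading text.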
Prompt rules:
- `<tool>...</tool>` or one `<final>...</final>`

Expose closed set of named tools. Avoid arbitrary command execution by default.
Tool contract:
- `name`
- `description`
- `signature`
- `risk`: `safe` or `risky`
- `run(args: dict[str, Any]) -> str`

Common tools:

- `list_files`, `read_file`, `search`, `write_file`, `patch_file`, `run_shell`
- `web_search`, `fetch_url`, `read_file`, `save_note`
- `list_tasks`, `create_task`, `complete_task`, `query_memory`
- `query_logs`, `query_metrics`, `list_alerts`, `run_runbook`, `page_oncall`
- `delegate`

Flat registry beats plugin hierarchy until real extension pressure exists.
Validate before execution:
Filesystem invariant:
- `Path.relative_to`

Approval invariant:

- `ASK`: ask human
- `AUTO`: allow
- `NEVER`: deny

Dispatcher owns validation, approval, execution, error capture. Return errors as strings the model can see and correct next turn.
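The two invariants and the dispatcher's responsibilities can be sketched together. Assumptions in this sketch: tools are plain callables keyed by name, `approve` is a hypothetical callback standing in for the human prompt, and tools missing from the policy default to `ASK`. `safe_path` is named in the v1 list; its body here is one possible implementation built on `Path.relative_to`.

```python
from enum import Enum
from pathlib import Path
from typing import Any, Callable


class Permission(Enum):
    ASK = "ask"      # ask a human before running
    AUTO = "auto"    # allow without asking
    NEVER = "never"  # always deny


def safe_path(root: Path, candidate: str) -> Path:
    """Resolve candidate under root; raise ValueError if it escapes root."""
    resolved = (root / candidate).resolve()
    resolved.relative_to(root.resolve())  # raises ValueError when outside root
    return resolved


def dispatch(
    tools: dict[str, Callable[[dict[str, Any]], str]],
    policy: dict[str, Permission],
    name: str,
    args: dict[str, Any],
    approve: Callable[[str, dict[str, Any]], bool],
) -> str:
    """Validate, check approval, execute, and capture errors as strings."""
    if name not in tools:
        return f"error: unknown tool {name!r}"
    permission = policy.get(name, Permission.ASK)  # assumption: unlisted tools need approval
    if permission is Permission.NEVER:
        return f"error: tool {name!r} is not permitted"
    if permission is Permission.ASK and not approve(name, args):
        return f"error: approval denied for {name!r}"
    try:
        return tools[name](args)
    except Exception as exc:  # returned to the model so it can correct next turn
        return f"error: {type(exc).__name__}: {exc}"
```

Every failure path returns a string rather than raising, so the loop can feed the message back to the model as an ordinary observation.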
Response contract: exactly one tool call or one final answer.
Supported formats:
- `<tool>` for simple args
- `<final>...</final>` for final answer

Parser returns tagged result:

- `("tool", payload)`
- `("final", text)`
- `("retry", message)`

Retry message is recorded into next turn. This makes weak/malformed model output recoverable.
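A sketch of a parser that returns the tagged results above. The payload format is an assumption: this version expects a JSON object with a `name` key inside `<tool>...</tool>`, which may differ from the format used in the skill's assets. `parse_response` is a hypothetical name.

```python
import json
import re
from typing import Any

TOOL_RE = re.compile(r"<tool>(.*?)</tool>", re.DOTALL)
FINAL_RE = re.compile(r"<final>(.*?)</final>", re.DOTALL)


def parse_response(text: str) -> tuple[str, Any]:
    """Return ("tool", payload), ("final", text), or ("retry", message)."""
    tools = TOOL_RE.findall(text)
    finals = FINAL_RE.findall(text)
    # Enforce the response contract: exactly one tool call or one final answer.
    if len(tools) + len(finals) != 1:
        return ("retry", "Reply with exactly one <tool>...</tool> or one <final>...</final>.")
    if finals:
        return ("final", finals[0].strip())
    try:
        payload = json.loads(tools[0])  # assumption: tool payload is JSON
    except json.JSONDecodeError as exc:
        return ("retry", f"Tool payload is not valid JSON: {exc.msg}")
    if not isinstance(payload, dict) or "name" not in payload:
        return ("retry", "Tool payload must be a JSON object with a 'name' key.")
    return ("tool", payload)
```

The parser never raises on malformed output; it always produces a retry message the loop can append to the next prompt.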
Context hygiene keeps agents alive after turn 8.
Rules:
Suggested limits:
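The specific rules and limits are not reproduced here. As an illustration of the clipping and read deduplication named in the component table, the sketch below uses a hypothetical 2000-character limit and a hypothetical transcript entry shape (`tool`, `path`, `output` keys).

```python
def clip(text: str, limit: int = 2000) -> str:
    """Keep the head and tail of long tool output and note how much was dropped."""
    if len(text) <= limit:
        return text
    half = limit // 2
    dropped = len(text) - 2 * half
    return f"{text[:half]}\n...[{dropped} chars clipped]...\n{text[-half:]}"


def dedupe_reads(entries: list[dict[str, str]]) -> list[dict[str, str]]:
    """Replace older reads of the same path with a stub; keep the latest read in full."""
    latest = {e["path"]: i for i, e in enumerate(entries) if e["tool"] == "read_file"}
    out = []
    for i, e in enumerate(entries):
        if e["tool"] == "read_file" and latest[e["path"]] != i:
            out.append({**e, "output": "[superseded by a later read]"})
        else:
            out.append(e)
    return out
```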
Two layers:
Working memory tracks:
Use Path.write_text + JSON first. Add database only after persistence pressure is real.
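A file-backed store along these lines might look like the sketch below. `SessionStore` and `WorkingMemory` are named in the v1 list, but their fields and methods here (`task`, `notes`, `append`, `save`, `load`) are assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class WorkingMemory:
    task: str = ""                                   # hypothetical field
    notes: list[str] = field(default_factory=list)   # hypothetical field


@dataclass
class SessionStore:
    path: Path
    transcript: list[dict[str, str]] = field(default_factory=list)
    memory: WorkingMemory = field(default_factory=WorkingMemory)

    def append(self, role: str, content: str) -> None:
        """Transcript is append-only: entries are never edited or removed."""
        self.transcript.append({"role": role, "content": content})
        self.save()

    def save(self) -> None:
        """Persist the whole session as one JSON file."""
        data = {"transcript": self.transcript, "memory": asdict(self.memory)}
        self.path.write_text(json.dumps(data, indent=2), encoding="utf-8")

    @classmethod
    def load(cls, path: Path) -> "SessionStore":
        data = json.loads(path.read_text(encoding="utf-8"))
        return cls(path, data["transcript"], WorkingMemory(**data["memory"]))
```

Rewriting the full file on every append is wasteful for long sessions, which is the kind of persistence pressure that would justify moving to a database later.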
Delegation reduces main transcript noise and parallelizes bounded side work.
Constraints:
- `NEVER`
- `max_depth` small, usually 1
- `max_steps` smaller than parent

Do not create SubAgent subclass unless responsibilities truly diverge. Same Agent class with stricter config is enough.
Agent.ask should stay small:
Use ModelClient Protocol with one method:
```python
from typing import Protocol


class ModelClient(Protocol):
    def complete(self, prompt: str, max_new_tokens: int = 512) -> str: ...
```
Test loop with FakeModelClient returning canned outputs.
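A sketch of such a test, assuming a deliberately tiny `ask` function in place of the real `Agent.ask`. The loop body, the observation placeholder, and the budget message are hypothetical; only `ModelClient.complete` and the `FakeModelClient` name come from the text.

```python
from typing import Protocol


class ModelClient(Protocol):
    def complete(self, prompt: str, max_new_tokens: int = 512) -> str: ...


class FakeModelClient:
    """Returns canned outputs in order and records prompts for assertions."""

    def __init__(self, outputs: list[str]) -> None:
        self._outputs = list(outputs)
        self.prompts: list[str] = []

    def complete(self, prompt: str, max_new_tokens: int = 512) -> str:
        self.prompts.append(prompt)
        return self._outputs.pop(0)


def ask(model: ModelClient, request: str, max_steps: int = 4) -> str:
    """Toy loop: stop at the first <final> answer or when the step budget runs out."""
    prompt = request
    for _ in range(max_steps):
        reply = model.complete(prompt)
        if reply.startswith("<final>") and reply.endswith("</final>"):
            return reply[len("<final>"):-len("</final>")]
        # A real harness would parse, validate, and run the tool here.
        prompt += f"\n{reply}\nObservation: (tool output would go here)"
    return "stopped: step budget exhausted"


fake = FakeModelClient(['<tool>{"name": "search"}</tool>', "<final>done</final>"])
assert ask(fake, "find it") == "done"
assert len(fake.prompts) == 2
```

Because the fake satisfies the protocol structurally, the loop can be tested without network access or a real model.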
| Agent | Context | First Tools | Memory |
| ------------------ | ------------------------------------------- | ---------------------------------------------- | --------------------------------- |
| Coding | WorkspaceContext, git state, anchor docs | files, search, patch/write, shell, delegate | task, last files, tool notes |
| Research | question, deadline, constraints | search, fetch, read, save/list notes, delegate | sources, hypotheses, subquestions |
| Personal assistant | user, time zone, calendar/tasks handles | tasks, memory, notes, calendar, web | goal, entities, decisions |
| Ops/support | connected systems, on-call, active incident | logs, metrics, alerts, runbook, page, status | incident state, actions, evidence |
Risky prod tools use approval.
Generic runtime:
```
src/agent_runtime/
├── agent.py
├── context.py
├── prompt.py
├── parser.py
├── compaction.py
├── permissions.py
├── session.py
├── models/
└── tools/
tests/
```
Specializations stay thin:
```
src/coding_agent/
├── workspace.py
├── tools.py
└── cli.py
```
Runtime shared. Domain layer chooses tools/context.
Build v1:
- `SessionStore`
- `WorkingMemory`
- `safe_path` if touching files

Defer:
References:

- rasbt/mini-coding-agent: https://github.com/rasbt/mini-coding-agent
- badlogic/pi-mono: https://github.com/badlogic/pi-mono/tree/main
- sanbuphy/learn-coding-agent: https://github.com/sanbuphy/learn-coding-agent
- Leonxlnx/agentic-ai-prompt-research: https://github.com/Leonxlnx/agentic-ai-prompt-research
- `python -c "import this"`