skills/designing-agent-teams/SKILL.md
Use this skill to design or refine a multi-agent coding team with model-to-role assignments. Triggers when creating an agent team for a codebase, adding agents to an existing team, reviewing an agent team configuration, choosing which AI model to assign to each role, or optimising cost/quality/speed tradeoffs across agents.
npx skillsauth add mattobee/skills designing-agent-teams
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Generate multi-agent coding teams with optimal model assignments. Outputs a reasoning document (AGENTS.md) plus agent instruction files (.agents/) and platform-specific configs where applicable.
Creating for an existing codebase? → Follow "Gather inputs: existing codebase" below
Starting a new project? → Follow "Gather inputs: new project" below
Adding agents to an existing team? → Follow "Gather inputs: add" below
Reviewing an existing team? → Follow "Gather inputs: review" below
Gather what is available from the codebase and the user's request. If the user has not specified priorities, infer them from context (a solo side project likely values speed and cost; a team production app likely values quality). Do not block on missing information — use sensible defaults and note assumptions in AGENTS.md.
Read the existing agent files (.agents/, AGENTS.md, and any platform-specific configs) to understand the current team: roles, models, tools, escalation paths.
Read references/model-guide.md for current model characteristics, pricing, and assignment heuristics.
Apply these principles in order:
1. Start from roles, not models. Identify what agent roles the project needs before choosing models.
2. Consider whether a team is needed at all. A well-prompted single agent with a good spec handles many projects effectively. Multi-agent coordination adds the most value when a single agent's accuracy is below ~45% on the task, or when the project has clearly parallelisable work streams with independent file boundaries. For simple projects, a single mid-tier agent with a clear AGENTS.md spec may outperform a poorly-coordinated team.
3. Assemble the core team, then bring in specialists.
Think of the agent team like a human dev team: a small core that's always present, plus specialist consultants brought in based on the project's requirements.
Core team (always present):
These three agents (Lead, Coder, Tester) are present in every team this skill generates. They form the plan-implement-verify loop observed across prominent multi-agent systems (MetaGPT, AgentCoder, ChatDev).
Specialist consultants (added per project):
Bring in specialists based on what the project needs. Each should cover a cross-cutting concern that the core team would otherwise miss or handle inconsistently:
Prefer cross-cutting concern specialists (security, accessibility, performance) over tool-specific ones. Tool-specific agents are justified primarily by context window constraints, not domain expertise — if the project only uses one or two platforms, the coder can handle them with MCP access.
When the project has relevant agent skills installed (e.g., reviewing-accessibility, writing-accessibility-tests), instruct specialist agents to load those skills for domain-specific procedures and checklists. This grounds the specialist in documented workflows rather than relying solely on the model's general knowledge.
4. Match cognitive demand to model tier. See references/model-guide.md for the full heuristics. Summary:
5. Use cross-family models for specialist review. When specialist agents (accessibility, security) review code written by the Coder, assign them a model from a different provider family. Models from the same family share training biases and blind spots — a Claude-based reviewer is unlikely to catch errors that a Claude-based Coder also missed. Cross-family review (e.g., GPT reviewing Claude-generated code) closes significantly more of the gap to frontier quality. Choose the reviewing family based on domain-specific benchmarks, not general coding benchmarks. See references/model-guide.md for guidance and references/rationale.md for the research backing.
6. Prevent write conflicts. How strictly to scope write access depends on whether agents run in parallel or sequentially:
In both cases, when a task genuinely crosses boundaries, the lead should coordinate the handoff rather than giving multiple agents overlapping write access.
7. Scope tools tightly. Each agent gets only the tools it needs. A documentation agent does not need shell access. A file explorer does not need write permissions.
8. Ground specialists in official sources. Specialist agents for specific tools or platforms (e.g., Supabase, Vercel, Sanity, Netlify) should defer to authoritative sources rather than relying on the model's training knowledge, which may be outdated or incomplete. When designing a specialist agent:
9. Design for escalation. Build concrete escalation paths: test failures, error counts, or complexity markers trigger handoff from cheaper to more capable agents.
10. Use deterministic quality gates. Do not rely on an LLM to judge whether its own work is good enough. Quality gates should be automated and deterministic: linting, type-checking, test pass rates, build success. An LLM reviewer in a fresh context adds value for nuanced issues (security patterns, architectural fit), but the primary gate must be tooling, not judgment.
11. Target the 70/20/10 split. Roughly 70% fast, 20% mid-tier, 10% frontier by call volume. If more than 30% of calls go to frontier models, re-examine whether those tasks genuinely need top-tier reasoning.
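The 70/20/10 target in principle 11 can be checked against a record of which tier served each call. The sketch below is a minimal illustration, assuming a call log is available as a plain list of tier names; the tier labels (`fast`, `mid`, `frontier`) and the log format are hypothetical and not something this skill prescribes.

```python
from collections import Counter

# Target shares by call volume, taken from the 70/20/10 guideline.
TARGET_SPLIT = {"fast": 0.70, "mid": 0.20, "frontier": 0.10}
# Share of frontier calls above which the assignments should be re-examined.
FRONTIER_REVIEW_THRESHOLD = 0.30


def tier_shares(calls):
    """Return each tier's share of total calls; `calls` is a list of tier names."""
    counts = Counter(calls)
    total = sum(counts.values())
    if total == 0:
        return {tier: 0.0 for tier in TARGET_SPLIT}
    return {tier: counts.get(tier, 0) / total for tier in TARGET_SPLIT}


def frontier_needs_review(calls):
    """True when more than 30% of calls went to frontier models."""
    return tier_shares(calls)["frontier"] > FRONTIER_REVIEW_THRESHOLD


# Hypothetical call log: 12 fast, 4 mid, 4 frontier calls.
call_log = ["fast"] * 12 + ["mid"] * 4 + ["frontier"] * 4
print(tier_shares(call_log))
print(frontier_needs_review(call_log))
```

In this example the frontier share is 20%, which is above the 10% target but below the 30% review threshold, so the check does not fire.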
All outputs go into the current working directory (the project repo root).
These are platform-agnostic and work with any agent that can read markdown files:
AGENTS.md # Team overview, orchestration, escalation map (repo root)
.agents/
├── lead.md # Core: always present
├── coder.md # Core: always present
├── tester.md # Core: always present
└── [specialist].md # Specialists: added per project (e.g., accessibility.md, security.md)
Create AGENTS.md in the project root if it does not exist. This is the team-level document that lives at the repo root.
AGENTS.md contains:
If AGENTS.md already exists, read it first and update in place. In add mode, append the new agent(s) to the roster and update the orchestration, cost, and escalation sections. Do not rewrite descriptions of unchanged agents.
Create .agents/ in the project root if it does not exist.
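The "create if it does not exist" rule for AGENTS.md and .agents/ could be scripted along these lines. This is a sketch only: `scaffold_team` is a hypothetical helper, and the one-line placeholder content stands in for the real generated text.

```python
from pathlib import Path

# Core agents that every generated team includes.
CORE_AGENTS = ("lead", "coder", "tester")


def scaffold_team(repo_root, specialists=()):
    """Create AGENTS.md and .agents/*.md where missing; never overwrite existing files.

    Returns the list of paths that were newly created.
    """
    root = Path(repo_root)
    agents_dir = root / ".agents"
    agents_dir.mkdir(parents=True, exist_ok=True)

    targets = [root / "AGENTS.md"]
    targets += [agents_dir / f"{name}.md" for name in (*CORE_AGENTS, *specialists)]

    created = []
    for path in targets:
        if not path.exists():
            # Placeholder heading only; real content comes from the design step.
            path.write_text(f"# {path.stem}\n", encoding="utf-8")
            created.append(path)
    return created
```

Calling `scaffold_team(".", specialists=("accessibility",))` a second time returns an empty list, because existing files are left untouched for in-place updates.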
Individual agent files (e.g., lead.md, coder.md, tester.md) — one file per agent. Each contains:
Keep each agent file under 500 words. The agent's capabilities come from the model and its tools, not from lengthy instructions.
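The 500-word limit is easy to check mechanically. The sketch below counts whitespace-separated tokens, which is only a rough proxy for words in a markdown file; the function names are illustrative.

```python
from pathlib import Path

MAX_WORDS = 500  # agent files should stay under this limit


def word_count(text):
    """Count whitespace-separated tokens as an approximation of word count."""
    return len(text.split())


def oversized_agent_files(agents_dir=".agents"):
    """Return (file name, word count) pairs for agent files at or over the limit."""
    results = []
    for path in sorted(Path(agents_dir).glob("*.md")):
        count = word_count(path.read_text(encoding="utf-8"))
        if count >= MAX_WORDS:
            results.append((path.name, count))
    return results
```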
After generating the core outputs, check whether the project already has platform-specific agent configuration. Look for directories and files such as .opencode/agents/, CLAUDE.md, .cursor/agents/, or similar. If found, generate or update the platform-specific config files to reference the .agents/*.md instruction files.
If no existing platform config is detected, ask the user which platform they are using. If the user does not specify, generate only the core outputs — they are sufficient for most platforms to pick up.
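Detection of an existing platform config can be as simple as checking for the marker paths named above. In this sketch the dictionary keys are hypothetical labels, and only the three paths mentioned in the text are covered; other platforms would need their own entries.

```python
from pathlib import Path

# Marker paths named in the text; the keys are illustrative labels only.
PLATFORM_MARKERS = {
    "opencode": ".opencode/agents",
    "claude": "CLAUDE.md",
    "cursor": ".cursor/agents",
}


def detect_platforms(repo_root="."):
    """Return the sorted labels of platforms whose marker path exists in the repo."""
    root = Path(repo_root)
    return sorted(
        label for label, marker in PLATFORM_MARKERS.items() if (root / marker).exists()
    )
```

An empty result corresponds to the case where the user should be asked which platform they use.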
When generating platform-specific configs:
Reference the .agents/*.md instruction file for the full role description. Do not duplicate instructions across files.
Set model IDs by tier (see references/model-guide.md for the tier mapping). Always use canonical stable names, not dated snapshot IDs; check the provider's documentation for current canonical IDs.
If the platform already has agent config files, read them first and merge new agents. Preserve existing agents not being replaced.
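A config linter could flag model IDs that look like dated snapshots. The sketch below is a heuristic, not a provider rule: it assumes snapshots end in either an 8-digit date or a `YYYY-MM-DD` date. It only detects the suffix and does not try to derive the canonical name, since that should come from the provider's documentation.

```python
import re

# Assumed snapshot suffix shapes: "-20250514" or "-2025-05-14" at the end of the ID.
_DATED_SUFFIX = re.compile(r"-(\d{8}|\d{4}-\d{2}-\d{2})$")


def is_dated_snapshot(model_id):
    """Heuristically report whether a model ID ends in a date-like suffix."""
    return bool(_DATED_SUFFIX.search(model_id))


for model_id in ("claude-sonnet-4", "claude-sonnet-4-20250514"):
    print(model_id, is_dated_snapshot(model_id))
```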
Keep instructions in .agents/*.md and reference them from platform configs.
These patterns emerged from real-world use and significantly improved team effectiveness. They are recommendations, not requirements; adapt them to your project's needs.
The lead should explain each handoff to the user: which agent is being called, why, and what it returned. Without narration, multi-agent workflows are a black box — the user sees a final result but has no idea what happened in between, making it hard to debug, trust, or improve the team.
Add this to the lead's instructions:
Before each delegation, explain which agent you are calling and why. After each agent returns, summarise what it found or did.
Reinforce this in any platform-specific config preamble so the lead sees it at invocation time, not just in its instruction file.
Domain-critical specialists (accessibility, security, performance) are most effective when called at two points in the workflow:
This is more effective than a single review gate at the end, because:
Not every specialist needs two touchpoints. Use dual-touchpoint for concerns that are expensive to retrofit (accessibility, security). A documentation writer, for instance, only needs a late pass.
Dual-touchpoint specialists can use different model tiers per touchpoint. Early passes (risk assessment, threat modelling) benefit from mid-tier reasoning depth — the specialist needs to reason about interaction patterns, attack vectors, or architectural implications. Late passes (checklist-driven review against established criteria like WCAG or OWASP) can run on fast-tier models when the agent instructions are structured for skill-guided evaluation. Since late passes are the higher-volume touchpoint (they run on both Full and Standard pipeline tasks — see "Task classification" below), making them fast-tier-friendly is a significant cost and speed optimisation.
When using dual-touchpoint specialists, name them "Specialist" rather than "Reviewer" — "Reviewer" implies they only look at finished work and doesn't reflect the advisory role.
The early-pass output should include a structured "Coder Requirements" section: a numbered list of specific, implementable requirements arising from the risk assessment. Each item states what to do and why (citing the relevant standard or criterion). This section is passed directly to the Coder — write it for that audience. The Lead should pass this section verbatim rather than summarising it; detail is lost in translation when the Lead paraphrases specialist guidance.
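One way to keep the "Coder Requirements" section structured is to model each item as a record with a what, a why, and the cited criterion, then render the numbered list from those records. The class and function names below are hypothetical, and the sample requirement is an illustration rather than output from a real assessment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    what: str       # the specific, implementable action
    why: str        # the reason, written for the Coder
    criterion: str  # the standard or criterion being cited


def render_coder_requirements(requirements):
    """Render requirements as a numbered section the Lead can pass on verbatim."""
    lines = ["## Coder Requirements", ""]
    for number, req in enumerate(requirements, start=1):
        lines.append(f"{number}. {req.what} Why: {req.why} ({req.criterion})")
    return "\n".join(lines)


example = [
    Requirement(
        what="Give the icon-only close button an accessible name.",
        why="Screen reader users cannot identify unlabelled controls.",
        criterion="WCAG 2.2 SC 4.1.2 Name, Role, Value",
    ),
]
print(render_coder_requirements(example))
```

Because the Lead forwards the rendered text unchanged, the wording in each record is what the Coder actually reads.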
When multiple quality-gate agents run after implementation, order them by dependency:
This avoids wasted specialist reviews on code that's about to change due to test failures.
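Dependency ordering can be sketched as a runner that executes gates in sequence and stops at the first failure, so later reviews never run against code that is about to change. The gate names and their order in the example are assumptions for illustration; each check is a plain callable standing in for a real tool invocation or agent call.

```python
def run_gates_in_order(gates):
    """Run (name, check) pairs in order and stop at the first failing gate.

    Returns (passed_names, failed_name); failed_name is None if every gate passed.
    """
    passed = []
    for name, check in gates:
        if not check():
            return passed, name
        passed.append(name)
    return passed, None


# Hypothetical ordering: deterministic gates first, specialist review last.
gates = [
    ("lint", lambda: True),
    ("tests", lambda: False),              # a failing test run
    ("specialist-review", lambda: True),   # skipped because tests failed
]
print(run_gates_in_order(gates))
```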
Not every task needs the full agent pipeline. The Lead should classify each incoming request to determine which specialist passes to invoke:
| Pipeline | Specialist passes | When |
|---|---|---|
| Full | Early + Late | Novel patterns, new auth flows, custom widgets, complex interactions |
| Standard | Late only | CRUD routes, forms, component additions, standard patterns |
| Lightweight | None (quality gates only) | Config, deps, copy, styling, pure refactors |
The default should be the cheapest pipeline that doesn't sacrifice quality for the task type. Most tasks in a mature codebase are Standard or Lightweight — a new CRUD endpoint doesn't need a threat model if the project has established auth and validation patterns.
For Standard tasks, the Lead substitutes early specialist passes with a standing guidance reference in the Coder's brief (see "Standing conventions" below). Late passes still run, but specialists focus on issues beyond the baseline.
This classification is a template — adapt the tiers and criteria to your project. The principle is: invest specialist attention where it changes outcomes, not uniformly across all tasks.
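The pipeline table can be expressed as a small lookup. In this sketch the task-kind labels are hypothetical; in practice the Lead classifies requests from their content. Falling back to the Full pipeline for unrecognised kinds is an assumption made here for caution, not a rule from the table.

```python
from enum import Enum


class Pipeline(Enum):
    FULL = "full"
    STANDARD = "standard"
    LIGHTWEIGHT = "lightweight"


# Specialist passes invoked per pipeline, mirroring the table.
SPECIALIST_PASSES = {
    Pipeline.FULL: ("early", "late"),
    Pipeline.STANDARD: ("late",),
    Pipeline.LIGHTWEIGHT: (),
}

# Hypothetical task-kind labels mapped to pipelines.
TASK_KINDS = {
    "novel_pattern": Pipeline.FULL,
    "new_auth_flow": Pipeline.FULL,
    "custom_widget": Pipeline.FULL,
    "crud_route": Pipeline.STANDARD,
    "form": Pipeline.STANDARD,
    "component_addition": Pipeline.STANDARD,
    "config": Pipeline.LIGHTWEIGHT,
    "dependency_update": Pipeline.LIGHTWEIGHT,
    "copy_change": Pipeline.LIGHTWEIGHT,
    "styling": Pipeline.LIGHTWEIGHT,
    "pure_refactor": Pipeline.LIGHTWEIGHT,
}


def classify(task_kind, default=Pipeline.FULL):
    """Map a task kind to a pipeline; unknown kinds use the (assumed) default."""
    return TASK_KINDS.get(task_kind, default)


def passes_for(task_kind):
    """Return the specialist passes to invoke for a task kind."""
    return SPECIALIST_PASSES[classify(task_kind)]


print(classify("crud_route"), passes_for("crud_route"))
```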
Task classification only works if the Coder has baseline guidance to follow when early specialist passes are skipped. Codify recurring specialist recommendations as standing conventions in a shared reference document (e.g., STACK.md, a project conventions file, or a dedicated section of the README).
Standing conventions should cover the fundamentals that apply to every task:
The Coder follows these on every task regardless of pipeline tier. When classifying a task as Standard, the Lead appends a brief guidance note to the Coder's brief:
Accessibility: Follow standing accessibility conventions in [reference doc]. [Task-specific notes.]
Security: Follow standing security conventions in [reference doc]. [Task-specific notes.]
This lets the Lead add task-specific guidance without invoking the full specialist pass. The specialist late pass then focuses on issues beyond the standing conventions — complex interaction patterns, non-obvious vulnerabilities, edge cases the baseline doesn't cover — rather than re-checking fundamentals.
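The guidance note follows a fixed template, so the Lead's brief can be assembled from a reference document name plus optional task-specific notes per concern. The helper below is a hypothetical sketch of that assembly, and the sample note is invented for illustration.

```python
def standing_guidance(reference_doc, task_notes):
    """Build the guidance note appended to the Coder's brief for Standard tasks.

    `task_notes` maps a concern name to task-specific notes (may be empty).
    """
    lines = []
    for concern, notes in task_notes.items():
        line = (
            f"{concern.capitalize()}: Follow standing {concern.lower()} "
            f"conventions in {reference_doc}."
        )
        if notes:
            line += f" {notes}"
        lines.append(line)
    return "\n".join(lines)


print(standing_guidance(
    "STACK.md",
    {"accessibility": "The new modal needs focus trapping.", "security": ""},
))
```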
For tasks where quality is subjective (visual design, UX, content tone), a single pass/fail gate is not enough. The Tester or a specialist can grade the Coder's output against explicit criteria and iterate, rather than just passing or failing.
This works when:
Deterministic gates (tests, linting, type-checks) stay primary. Evaluator loops cover what automated tools can't measure. Returns typically diminish after 3–5 iterations.
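An evaluator loop can be sketched as produce, grade, revise, with a score threshold and an iteration cap in the 3–5 range. The `produce`, `grade`, and `revise` callables below are placeholders for agent calls, and the numeric score is an assumption; a real rubric might grade several criteria separately.

```python
def evaluator_loop(produce, grade, revise, threshold, max_iterations=4):
    """Iterate until the score meets the threshold or the iteration budget runs out.

    Returns (best_output, best_score, iterations_used).
    """
    output = produce()
    best_output, best_score = output, grade(output)
    iterations = 1
    while best_score < threshold and iterations < max_iterations:
        output = revise(best_output, best_score)
        score = grade(output)
        iterations += 1
        # Keep the best attempt so a worse revision never replaces a better one.
        if score > best_score:
            best_output, best_score = output, score
    return best_output, best_score, iterations


# Toy stand-ins: each revision improves the "draft" by one point.
result = evaluator_loop(
    produce=lambda: 1,
    grade=lambda draft: draft,
    revise=lambda draft, score: draft + 1,
    threshold=3,
)
print(result)
```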
Before the Coder starts a chunk of work, have it agree a work contract with the Tester: what will be built, and what counts as "done." The Coder proposes scope and acceptance criteria, the Tester pushes back, and both agree before implementation starts.
This closes the gap between a high-level plan and testable output. Without a contract, the Tester grades against implicit expectations. Use work contracts when the Lead's plan is intentionally high-level and the spec-to-implementation gap is wide enough for misalignment.
If a platform config file already exists (e.g., CLAUDE.md), it may contain project-specific instructions unrelated to agents. Only touch the agent team section. Rewriting the whole file will destroy existing project configuration.
In add mode, generate only the new .agents/[name].md file and a corresponding platform config file. Do not rewrite existing agent files.
Use canonical model IDs (e.g., claude-sonnet-4, not claude-sonnet-4-20250514). Dated snapshot IDs pin to a specific version that goes stale. Check the provider's docs for current canonical IDs rather than guessing the format.
… references/rationale.md).
When reviewing a team (any mode), check for: