skills/aorchestra-automating-sub-agent-creation/SKILL.md
Dynamically create specialized sub-agents for complex multi-step tasks using the AOrchestra pattern: decompose goals, then spawn tailored (Instruction, Context, Tools, Model) executors on-the-fly. Use when: 'break this task into sub-agents', 'orchestrate agents for this problem', 'create a multi-agent workflow', 'delegate subtasks to specialized agents', 'build an agent pipeline for this', 'dynamically assign agents to subtasks'.
npx skillsauth add ndpvt-web/arxiv-claude-skills aorchestra-automating-sub-agent-creation
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
This skill enables Claude to apply the AOrchestra orchestration pattern from the paper "AOrchestra: Automating Sub-Agent Creation for Agentic Orchestration" (Ruan et al., 2026). Instead of using pre-defined agent roles, you dynamically decompose complex tasks and spawn specialized sub-agents on-the-fly by concretizing a four-tuple -- (Instruction, Context, Tools, Model) -- for each subtask. This produces better results than static multi-agent designs because each executor is purpose-built for its specific subtask, receives only relevant context (not the full history), and uses appropriately-scoped tools and model choices.
The Four-Tuple Agent Abstraction. AOrchestra models every agent as a tuple (I, C, T, M): Instruction (the specific, actionable subtask directive), Context (curated findings from prior steps -- not the full history), Tools (the subset of capabilities needed), and Model (the LLM chosen based on subtask complexity). This abstraction is framework-agnostic -- it works whether you're spawning Claude Code agents via the Task tool, calling APIs, or orchestrating shell processes.
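As a rough illustration of the four-tuple, the sketch below models one sub-agent as a small Python dataclass. The class name `AgentSpec`, the `to_prompt` helper, and the `"haiku"` default are assumptions made for this example; they are not taken from the paper or from any particular framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical container for one sub-agent's (I, C, T, M) four-tuple."""
    instruction: str               # I: the specific, actionable subtask directive
    context: str = ""              # C: curated findings, not the full history
    tools: tuple[str, ...] = ()    # T: only the capabilities this subtask needs
    model: str = "haiku"           # M: chosen by subtask complexity (assumed default)

    def to_prompt(self) -> str:
        """Render the instruction, plus curated context if any, as one prompt."""
        if not self.context:
            return self.instruction
        return f"{self.instruction}\n\nContext from prior steps:\n{self.context}"


spec = AgentSpec(
    instruction="Run pytest tests/api/ -x --tb=short and report failing tests.",
    tools=("Bash",),
)
print(spec.to_prompt())
```

Keeping the spec immutable makes it easy to log exactly what each sub-agent was given and to derive an adjusted copy later.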
Curated Context Over Full Context. A critical finding from the paper: passing the full accumulated history to sub-agents hurts performance (84% accuracy) relative to passing no context at all (86%). The winning strategy is curated context, which selectively injects only the relevant findings and artifacts from previous steps (96% accuracy). The orchestrator must therefore actively decide what each sub-agent needs to know, filtering out noise and distracting details from prior execution traces.
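One simple way to approximate curation is to keep a short summary per completed subtask and pass along only the summaries a downstream subtask depends on. The sketch below assumes this dependency-based filter; the names `Finding` and `curate_context` are hypothetical, and in practice the orchestrator model itself may decide what is relevant.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    subtask: str   # which subtask produced this finding
    summary: str   # short distilled result, not the raw execution trace


def curate_context(history: list[Finding], needed: set[str]) -> str:
    """Keep only findings from the subtasks this sub-agent depends on."""
    relevant = [f for f in history if f.subtask in needed]
    return "\n".join(f"- [{f.subtask}] {f.summary}" for f in relevant)


history = [
    Finding("diagnose", "test_create_user fails with KeyError: 'email'"),
    Finding("list-files", "42 files under src/"),  # noise for the fix step
    Finding("root-cause", "schema renamed 'email' to 'email_address'"),
]
context = curate_context(history, needed={"diagnose", "root-cause"})
print(context)
```

The unrelated `list-files` finding is dropped, so the downstream prompt carries only what the next step needs.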
The Orchestrator's Restricted Action Space. The central orchestrator never interacts with the environment directly. Its only two actions are Delegate(I, C, T, M) and Finish(answer). This forces clean separation between planning and execution. At each step, the orchestrator reviews subtask history, evaluates whether results are sufficient, and either delegates the next subtask or returns the final answer. This Review-Evaluate-Decide loop continues until the task is complete or the attempt budget is exhausted.
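The restricted action space can be sketched as a loop over two action types. This is a minimal illustration under stated assumptions: `decide` stands in for the orchestrator model, `execute` stands in for running a sub-agent, and the function names and budget-exhaustion message are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Delegate:
    instruction: str
    context: str
    tools: tuple[str, ...]
    model: str


@dataclass
class Finish:
    answer: str


Action = Union[Delegate, Finish]


def orchestrate(
    decide: Callable[[list[str]], Action],
    execute: Callable[[Delegate], str],
    budget: int = 5,
) -> str:
    """Review-Evaluate-Decide loop: the orchestrator only delegates or finishes."""
    results: list[str] = []
    for _ in range(budget):
        action = decide(results)          # review history, evaluate, decide
        if isinstance(action, Finish):
            return action.answer
        results.append(execute(action))   # only sub-agents touch the environment
    return "Budget exhausted; partial results: " + "; ".join(results)


# Toy policy (hypothetical): delegate once, then finish with the result.
def decide(results: list[str]) -> Action:
    if not results:
        return Delegate("Count the tests", "", ("Bash",), "haiku")
    return Finish(f"Done: {results[-1]}")


answer = orchestrate(decide, execute=lambda d: f"ran '{d.instruction}' on {d.model}")
print(answer)
```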
Analyze the top-level goal. Parse the user's request to identify the end objective, constraints, and success criteria. Determine whether the task genuinely requires multi-agent decomposition or can be solved directly.
Decompose into subtasks. Break the goal into a sequence of concrete, actionable subtasks. Each subtask should be independently executable and have a clear deliverable. Order them by dependency -- identify which subtasks can run in parallel and which require prior results.
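Dependency ordering of this kind can be computed with a topological sort. The sketch below uses Python's standard `graphlib.TopologicalSorter` to group subtasks into batches that could run in parallel; the helper name `parallel_batches` and the subtask names are hypothetical.

```python
from graphlib import TopologicalSorter


def parallel_batches(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group subtasks into batches; each batch depends only on earlier ones."""
    sorter = TopologicalSorter(deps)        # maps subtask -> its prerequisites
    sorter.prepare()                        # raises CycleError on circular deps
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # subtasks whose prerequisites are done
        batches.append(ready)
        sorter.done(*ready)
    return batches


deps = {
    "enumerate": set(),
    "extract_a": {"enumerate"},
    "extract_b": {"enumerate"},
    "flag": {"extract_a", "extract_b"},
}
print(parallel_batches(deps))
# → [['enumerate'], ['extract_a', 'extract_b'], ['flag']]
```

Subtasks in the same batch have no dependency on each other, so they are candidates for parallel delegation.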
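For each subtask, concretize the four-tuple: Instruction (I), the specific, actionable directive; Context (C), curated findings from prior steps; Tools (T), the subset of capabilities needed; and Model (M), the LLM matched to the subtask's complexity.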
Spawn the sub-agent. Use the Task tool with the appropriate subagent_type and pass the constructed instruction as the prompt. Include curated context inline in the prompt. Set model to match your M selection. Choose the subagent_type that matches the needed tools (e.g., Bash for command execution, general-purpose for multi-tool tasks, Explore for codebase search).
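The mapping from a tuple to delegation parameters might look like the sketch below. It only builds a dictionary with the `subagent_type`, `prompt`, and `model` fields named above and does not call any real tool; the function name `task_params` and the tool-to-type heuristic are assumptions for illustration.

```python
def task_params(instruction: str, context: str, tools: set[str], model: str) -> dict:
    """Build parameters for a Task-style delegation (illustrative heuristic)."""
    if tools == {"Bash"}:
        subagent_type = "Bash"             # command execution only
    elif tools and tools <= {"Glob", "Grep"}:
        subagent_type = "Explore"          # codebase search only
    else:
        subagent_type = "general-purpose"  # multi-tool or no-tool subtasks
    # Curated context is included inline in the prompt.
    prompt = instruction if not context else f"{instruction}\n\nContext:\n{context}"
    return {"subagent_type": subagent_type, "prompt": prompt, "model": model}


params = task_params(
    instruction="List all PDF files in /data/invoices/ and return their paths.",
    context="",
    tools={"Glob"},
    model="haiku",
)
print(params["subagent_type"])
```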
Collect and summarize results. When the sub-agent returns, extract the core findings, artifacts, and any errors. Summarize these into a structured format that can serve as curated context for downstream subtasks.
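A structured summary could be as simple as the record below, which separates findings, artifacts, and errors and renders them as compact text for later prompts. The class `SubtaskResult` and its field layout are assumptions for this sketch, not a format prescribed by the paper.

```python
from dataclasses import dataclass, field


@dataclass
class SubtaskResult:
    """Hypothetical distilled output of one sub-agent."""
    subtask: str
    findings: list[str] = field(default_factory=list)
    artifacts: list[str] = field(default_factory=list)  # e.g. created file paths
    errors: list[str] = field(default_factory=list)

    def as_context(self) -> str:
        """Render a compact summary suitable for a later sub-agent's context."""
        lines = [f"Subtask: {self.subtask}"]
        lines += [f"Finding: {item}" for item in self.findings]
        lines += [f"Artifact: {item}" for item in self.artifacts]
        lines += [f"Error: {item}" for item in self.errors]
        return "\n".join(lines)


result = SubtaskResult(
    subtask="Diagnose",
    findings=["tests/api/test_users.py::test_create_user FAILED"],
    errors=["KeyError: 'email' in src/api/users.py:42"],
)
print(result.as_context())
```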
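Review-Evaluate-Decide loop. After each sub-agent completes, review the subtask history, evaluate whether the results are sufficient, and decide whether to delegate the next subtask or finish with the final answer.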
Handle failures adaptively. If a sub-agent fails or times out, don't blindly retry. Analyze the failure, adjust the tuple -- tighten the instruction, add error context, upgrade the model, or change the tool set -- then re-delegate.
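Adjusting the tuple after a failure can be sketched as deriving a revised spec rather than resubmitting the same one. The example below covers two of the adjustments mentioned above, adding error context and upgrading the model; the `MODEL_LADDER` ordering, the `AgentSpec` class, and `adjust_after_failure` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, replace

# Assumed escalation order, cheapest to most capable.
MODEL_LADDER = ["haiku", "sonnet", "opus"]


@dataclass(frozen=True)
class AgentSpec:
    instruction: str
    context: str
    tools: tuple[str, ...]
    model: str


def adjust_after_failure(spec: AgentSpec, error: str) -> AgentSpec:
    """Return a revised tuple: add the error to context and upgrade the model."""
    idx = MODEL_LADDER.index(spec.model)
    next_model = MODEL_LADDER[min(idx + 1, len(MODEL_LADDER) - 1)]
    context = f"{spec.context}\nPrevious attempt failed: {error}".strip()
    return replace(spec, context=context, model=next_model)


failed = AgentSpec("Fix the failing test", "", ("Read", "Edit"), "haiku")
retry = adjust_after_failure(failed, "timed out after reading 40 files")
print(retry.model, "|", retry.context)
```

Tightening the instruction or changing the tool set would follow the same pattern: build a new spec with `replace` and re-delegate it.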
Synthesize the final answer. Once all necessary subtasks are complete, combine their results into a coherent response to the user's original request. Reference specific sub-agent outputs rather than re-deriving information.
Example 1: Multi-step research and code generation
User: "Find the top 3 most-starred Python testing frameworks on GitHub,
then create a comparison table as a CSV file."
Orchestration plan:
Subtask 1 - Research:
I: "Search GitHub for the top 3 most-starred Python testing frameworks.
Return each framework's name, star count, and one-line description."
C: (none -- first subtask)
T: WebFetch, Bash (for gh CLI)
M: haiku (straightforward search task)
→ subagent_type: general-purpose, model: haiku
Subtask 2 - Generate CSV:
I: "Create a CSV file at /tmp/testing_frameworks.csv with columns:
Name, Stars, Description. Populate with the provided data."
C: "Research results: 1) pytest - 12.5k stars - Simple powerful testing
2) unittest - (stdlib) - Built-in test framework
3) robot - 9.8k stars - Generic automation framework"
T: Write (file creation only)
M: haiku (simple formatting task)
→ subagent_type: general-purpose, model: haiku
Result: CSV file created with accurate, sourced data.
Example 2: Debugging a failing test suite
User: "The tests in tests/api/ are failing. Fix them."
Orchestration plan:
Subtask 1 - Diagnose:
I: "Run pytest tests/api/ -x --tb=short and report: which tests fail,
the exact error messages, and the relevant source file paths."
C: (none -- first subtask)
T: Bash
M: haiku (just running a command and parsing output)
→ subagent_type: Bash, model: haiku
Subtask 2 - Investigate root cause:
I: "Read the failing test file and the source module it tests.
Identify why test_create_user fails with 'KeyError: email'.
Report the root cause and proposed fix."
C: "pytest output: tests/api/test_users.py::test_create_user FAILED
KeyError: 'email' in src/api/users.py:42"
T: Read, Grep, Glob
M: sonnet (requires code reasoning)
→ subagent_type: general-purpose, model: sonnet
Subtask 3 - Implement fix:
I: "In src/api/users.py:42, the code accesses request.json['email']
but the field was renamed to 'email_address' in the schema update.
Fix the key access and update the test assertion to match."
C: "Root cause: schema migration renamed 'email' to 'email_address'
in UserCreate model. Source file: src/api/users.py line 42.
Test file: tests/api/test_users.py line 18."
T: Read, Edit
M: sonnet (code modification requires precision)
→ subagent_type: general-purpose, model: sonnet
Subtask 4 - Verify:
I: "Run pytest tests/api/ and confirm all tests pass. Report results."
C: "Fix applied: changed request.json['email'] to
request.json['email_address'] in src/api/users.py:42"
T: Bash
M: haiku (just running tests)
→ subagent_type: Bash, model: haiku
Result: Tests pass. User sees diagnosis, fix explanation, and green tests.
Example 3: Cost-optimized document processing pipeline
User: "Process all PDF invoices in /data/invoices/, extract totals,
and flag any with amounts over $10,000."
Orchestration plan:
Subtask 1 - Enumerate files:
I: "List all PDF files in /data/invoices/ and return their paths."
C: (none)
T: Glob
M: haiku (trivial file listing)
→ subagent_type: Explore, model: haiku
Subtask 2 - Extract totals (per-file, parallelizable):
I: "Read the PDF at {path}. Extract the invoice total amount.
Return: {filename, total_amount, currency}."
C: (none -- independent per file)
T: Read (PDF support)
M: haiku (structured extraction from clear documents)
→ subagent_type: general-purpose, model: haiku
Note: Launch multiple agents in parallel for throughput.
Subtask 3 - Analyze and flag:
I: "Given the extracted invoice data, identify all invoices with
total > $10,000. Format as a markdown table with columns:
File, Amount, Flagged."
C: "Extracted data: [{invoice_001.pdf, $4,500, USD}, ...]"
T: (none -- pure reasoning)
M: haiku (simple filtering and formatting)
→ subagent_type: general-purpose, model: haiku
Result: Flagged invoices table delivered to user.
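The fan-out and flagging steps of Example 3 can be mimicked locally as a sketch. Here `extract_total` is a stand-in for one per-file sub-agent and returns canned, made-up totals (only the $4,500 figure for invoice_001.pdf comes from the example above); a thread pool plays the role of launching agents in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

# Made-up totals standing in for real sub-agent extraction results.
FAKE_TOTALS = {"invoice_001.pdf": 4500.0, "invoice_002.pdf": 12750.0}


def extract_total(path: str) -> dict:
    """Stand-in for one per-file extraction sub-agent (hypothetical)."""
    return {"filename": path, "total_amount": FAKE_TOTALS[path], "currency": "USD"}


def flag_large(rows: list[dict], threshold: float = 10_000.0) -> list[str]:
    """Return filenames whose total exceeds the threshold."""
    return [r["filename"] for r in rows if r["total_amount"] > threshold]


paths = ["invoice_001.pdf", "invoice_002.pdf"]
with ThreadPoolExecutor(max_workers=4) as pool:
    rows = list(pool.map(extract_total, paths))  # map preserves input order

flagged = flag_large(rows)
print(flagged)
# → ['invoice_002.pdf']
```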
Do:
Use haiku for search, file operations, and formatting; use sonnet or opus for code reasoning, debugging, and architectural decisions.
Avoid:
[...] Bash agent can't use Edit).
[...] aorchestra/tools/delegate.py for tuple construction and aorchestra/main_agent.py for the Review-Evaluate-Decide loop.