skills/agent-primitives-reusable-latent/SKILL.md
Design and orchestrate multi-agent systems using reusable Agent Primitives (Review, Voting/Selection, Planning/Execution) that compose into task-specific pipelines. Use when asked to: 'build a multi-agent workflow', 'create an agent pipeline for this task', 'set up agents to review and refine output', 'orchestrate parallel agent voting', 'decompose this into a planning and execution pipeline', 'design a reusable agent architecture'.
npx skillsauth add ndpvt-web/arxiv-claude-skills agent-primitives-reusable-latent
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
This skill enables Claude to design, compose, and implement multi-agent systems (MAS) from three reusable primitives -- Review, Voting and Selection, and Planning and Execution -- instead of hand-crafting bespoke agent roles for every task. Inspired by how neural networks are built from standard layers (convolution, attention, etc.), Agent Primitives let you decompose a complex MAS architecture into a small set of recurring computation patterns, then recombine them for new problems. The paper reports that the resulting systems are 12-16% more accurate than single-agent baselines while using 3-4x fewer tokens than text-based multi-agent approaches.
The core insight is that most multi-agent architectures -- Self-Refine, Multi-Agent Debate, Self-Consistency, task decomposition -- reduce to just three internal computation patterns. Rather than designing unique agent roles and interaction prompts per task, you instantiate one or more of these primitives and wire them together:
Review Primitive -- A Solver generates an initial response; a Critic evaluates it and provides structured feedback; the Solver refines iteratively. This replaces Self-Refine, debate, and any iterative-improvement loop. The Critic must identify errors but never revise the solution itself -- separation of concerns prevents feedback collapse.
Voting and Selection Primitive -- Multiple independent Solvers each produce a candidate solution in parallel. A Selector agent evaluates all candidates and picks or aggregates the best one based on correctness and consistency. This replaces majority voting, self-consistency, and any ensemble approach.
Planning and Execution Primitive -- A Planner decomposes the task into structured intermediate steps. An Executor follows the plan to produce the final output, without modifying the plan. This replaces task decomposition, chain-of-thought scaffolding, and orchestrator-worker patterns.
Composition via an Organizer. An Organizer agent acts as a lightweight router: given a new query, it consults a knowledge pool of (query-pattern, effective-primitive-composition) pairs, selects which primitives to instantiate, determines execution order (sequential, parallel, nested), and emits a composition plan. The knowledge pool is populated from successful runs and can be bootstrapped from known MAS architectures mapped to their primitive decompositions.
Efficiency through structured state passing. In the original paper, primitives communicate via KV cache concatenation rather than serialized text, reducing information degradation across stages. When implementing with Claude Code agents (which communicate via text), we approximate this by passing structured JSON state objects between primitives -- not free-form natural language summaries. This preserves signal fidelity and keeps token counts low.
Analyze the task and identify the primitive composition. Classify the user's task: does it need iterative refinement (Review), parallel diversity (Voting/Selection), decomposition (Planning/Execution), or a combination? Map the task to a primitive graph. For example, a complex math problem might use Planning/Execution -> Review, while a code generation task might use Voting/Selection -> Review.
Design the Organizer's routing logic. Define a decision function that maps task characteristics to primitive compositions. Use a simple heuristic or a knowledge pool lookup:
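The heuristic branch of this decision function can be sketched as follows. This is a minimal illustration, not the paper's Organizer: the trait names (`needs_decomposition`, `high_variance`, `needs_polish`), the `route` function, and the fallback to a single Review pass are all assumptions introduced here. Only the three primitive names come from the text above, and the sketch handles sequential ordering only.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TaskTraits:
    """Hypothetical task characteristics the Organizer might inspect."""
    needs_decomposition: bool  # multi-step task that benefits from a plan
    high_variance: bool        # single attempts tend to disagree with each other
    needs_polish: bool         # output benefits from critique and revision


def route(traits: TaskTraits) -> List[str]:
    """Map task traits to an ordered, sequential primitive composition."""
    composition: List[str] = []
    if traits.needs_decomposition:
        composition.append("planning_execution")
    if traits.high_variance:
        composition.append("voting_selection")
    if traits.needs_polish:
        composition.append("review")
    # Assumed fallback: a plain Review pass when no trait fires.
    return composition or ["review"]


math_task = TaskTraits(needs_decomposition=True, high_variance=False, needs_polish=True)
print(route(math_task))  # ['planning_execution', 'review']
```

A knowledge pool lookup would replace the `if` chain with retrieval over stored compositions while keeping the same return type.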
Instantiate the Review primitive (if selected). Create two agents with distinct system prompts:
Instantiate the Voting/Selection primitive (if selected). Create N independent Solver agents (typically 3-5) and one Selector:
Instantiate the Planning/Execution primitive (if selected). Create a Planner and an Executor:
Wire primitives together via structured state. Define the data contract between primitives as a JSON schema. Each primitive receives a structured input and emits a structured output. Example state object:
{
  "query": "original user query",
  "plan": ["step1", "step2"],
  "candidates": [{"solution": "...", "confidence": 0.9}],
  "review_feedback": ["issue1", "issue2"],
  "final_answer": "..."
}
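The same contract can be pinned down in code so that every primitive reads and writes known fields. A minimal sketch, assuming a Python dataclass named `PipelineState` (a hypothetical name) whose fields mirror the JSON keys above; rejecting unknown keys is a design choice made here, not something the source prescribes.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class PipelineState:
    """Shared state passed between primitives; fields mirror the JSON example."""
    query: str
    plan: List[str] = field(default_factory=list)
    candidates: List[Dict[str, object]] = field(default_factory=list)
    review_feedback: List[str] = field(default_factory=list)
    final_answer: Optional[str] = None

    def to_json(self) -> str:
        """Serialize the state for handoff to the next primitive."""
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "PipelineState":
        """Parse a state object, refusing keys outside the contract."""
        data = json.loads(raw)
        unknown = set(data) - set(cls.__dataclass_fields__)
        if unknown:
            # Reject extras so stages cannot slip free-form prose into the state.
            raise ValueError(f"unexpected state keys: {sorted(unknown)}")
        return cls(**data)


state = PipelineState(query="original user query", plan=["step1", "step2"])
restored = PipelineState.from_json(state.to_json())
```

Round-tripping through `to_json` and `from_json` at each stage boundary keeps the handoff explicit and makes contract violations fail early.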
Implement the execution engine. Write the orchestration code that:
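As a rough illustration of such an engine, the sketch below runs a composition sequentially over a shared state dictionary. The names `run_composition`, `registry`, and the `trace` key are hypothetical, and the two stub primitives stand in for real agent calls so the example runs without any model access. Parallel and nested execution are not covered.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Primitive = Callable[[State], State]


def run_composition(state: State, composition: List[str],
                    registry: Dict[str, Primitive]) -> State:
    """Run primitives in order, feeding each one the previous stage's state."""
    for name in composition:
        if name not in registry:
            raise KeyError(f"unknown primitive: {name}")
        trace = [*state.get("trace", []), name]
        # Pass a copy so a primitive cannot mutate the caller's dictionary.
        state = {**registry[name](dict(state)), "trace": trace}
    return state


def stub_plan(state: State) -> State:
    """Stand-in for Planning/Execution: writes a fixed two-step plan."""
    state["plan"] = ["step1", "step2"]
    return state


def stub_review(state: State) -> State:
    """Stand-in for Review: records empty feedback and a final answer."""
    state["review_feedback"] = []
    state["final_answer"] = f"answer after {len(state['plan'])} planned steps"
    return state


result = run_composition(
    {"query": "q"},
    ["planning_execution", "review"],
    {"planning_execution": stub_plan, "review": stub_review},
)
print(result["trace"])  # ['planning_execution', 'review']
```

Recording the trace alongside the answer gives the knowledge pool the composition that was actually executed.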
Populate the knowledge pool. After each successful run, store the (task_type, primitive_composition, accuracy) triple. On future runs, the Organizer retrieves the top-k most similar past configurations to guide its routing decision. Start with these seed mappings:
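A minimal sketch of the store-and-retrieve cycle, under stated assumptions: `PoolEntry`, `jaccard`, and `top_k` are hypothetical names, token-overlap similarity is a stand-in for whatever retrieval method you prefer, and the two entries reuse the example compositions from the task-analysis step rather than any published seed list. Accuracy is left as `None` until a measured run supplies it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PoolEntry:
    task_type: str
    composition: List[str]
    accuracy: Optional[float]  # None until a measured run records a value


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two task descriptions."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def top_k(pool: List[PoolEntry], task_type: str, k: int = 2) -> List[PoolEntry]:
    """Return the k entries whose task_type is most similar to the query."""
    ranked = sorted(pool, key=lambda e: jaccard(e.task_type, task_type), reverse=True)
    return ranked[:k]


pool = [
    PoolEntry("complex math problem", ["planning_execution", "review"], None),
    PoolEntry("code generation", ["voting_selection", "review"], None),
]
best = top_k(pool, "python code generation task", k=1)[0]
print(best.composition)  # ['voting_selection', 'review']
```

Once real runs populate `accuracy`, the ranking key could combine similarity with measured accuracy; that weighting is left open here.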
Tune iteration and parallelism parameters. For Review: 2-3 iterations is typically optimal (diminishing returns beyond 3). For Voting: 3-5 candidates balances diversity against cost. For Planning: limit plan steps to 3-7 to avoid over-decomposition.
Validate and measure. Compare the primitive-based MAS against a single-agent baseline on the target task. Track three metrics: accuracy, total tokens used, and end-to-end latency. Expect 12-16% accuracy gains at 1.3-1.6x the single-agent token cost.
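The comparison itself is simple arithmetic, sketched below. `RunStats` and `compare` are hypothetical names, and the numbers passed in are placeholder inputs chosen to make the arithmetic easy to follow, not measurements from the paper or from any run. The sketch reports the accuracy difference in percentage points.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class RunStats:
    correct: int      # tasks answered correctly
    total: int        # tasks attempted
    tokens: int       # total tokens consumed across all agents
    latency_s: float  # end-to-end wall-clock time in seconds

    @property
    def accuracy(self) -> float:
        return self.correct / self.total


def compare(baseline: RunStats, mas: RunStats) -> Dict[str, float]:
    """Summarize the three tracked metrics relative to the baseline."""
    return {
        "accuracy_gain_pts": round(100 * (mas.accuracy - baseline.accuracy), 2),
        "token_ratio": round(mas.tokens / baseline.tokens, 2),
        "latency_ratio": round(mas.latency_s / baseline.latency_s, 2),
    }


# Placeholder inputs for illustration only.
single_agent = RunStats(correct=50, total=100, tokens=1000, latency_s=10.0)
primitive_mas = RunStats(correct=60, total=100, tokens=1500, latency_s=25.0)
print(compare(single_agent, primitive_mas))
# {'accuracy_gain_pts': 10.0, 'token_ratio': 1.5, 'latency_ratio': 2.5}
```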
Example 1: Code Generation with Review Primitive
User: "Build an agent pipeline that generates a Python function and then reviews it for bugs."
Approach:
Implementation:
import json
import subprocess


def run_agent(system_prompt, user_message):
    """Call Claude via CLI or API with a system prompt and user message."""
    # Replace with your preferred invocation method.
    # Assumes the Claude Code CLI, which takes the system prompt via --system-prompt.
    result = subprocess.run(
        ["claude", "-p", user_message, "--system-prompt", system_prompt, "--output-format", "json"],
        capture_output=True, text=True, check=True  # raise if the CLI exits non-zero
    )
    return json.loads(result.stdout)["result"]


def review_primitive(query, max_iterations=2):
    solver_prompt = (
        "You are a Solver. Write a Python function for the given specification. "
        "If review feedback is provided, fix the identified issues."
    )
    critic_prompt = (
        "You are a Critic. Review the code below for bugs, edge cases, and logic errors. "
        "Output a JSON array of issues found. Output an empty array [] if no issues remain. "
        "Do NOT fix or rewrite the code."
    )
    solution = run_agent(solver_prompt, f"Specification: {query}")
    for i in range(max_iterations):
        feedback = run_agent(critic_prompt, f"Code to review:\n```python\n{solution}\n```")
        # Stop early once the Critic reports no remaining issues.
        if feedback.strip() == "[]":
            break
        solution = run_agent(
            solver_prompt,
            f"Specification: {query}\n\nPrevious solution:\n```python\n{solution}\n```\n\nFeedback: {feedback}"
        )
    return solution


# Usage
result = review_primitive("Write a function that merges two sorted lists into one sorted list.")
print(result)
Output: A refined Python function that has been reviewed for bugs across up to 2 iterations (the loop exits early if the Critic returns an empty array), with each iteration targeting specific issues identified by the Critic.
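The review loop in Example 1 stops only when the Critic's reply is exactly `[]`, yet models often wrap JSON in a code fence or add a sentence around it. One hedged way to make the stop condition more tolerant is a small parser such as the hypothetical `parse_issues` below, which pulls the outermost JSON array out of the reply. It is a sketch, not part of the original skill.

```python
import json
import re
from typing import List


def parse_issues(feedback: str) -> List[str]:
    """Extract the Critic's JSON array, tolerating code fences and stray prose."""
    # Greedy match from the first '[' to the last ']' in the reply.
    match = re.search(r"\[.*\]", feedback, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON array found in critic output")
    return [str(item) for item in json.loads(match.group(0))]


# Build a fenced reply without writing a literal fence inside this block.
fence = "`" * 3
fenced_reply = f'{fence}json\n["off-by-one in loop bound"]\n{fence}'

print(parse_issues("[]"))          # []
print(parse_issues(fenced_reply))  # ['off-by-one in loop bound']
```

With this helper the loop's exit test could become `if not parse_issues(feedback): break`, and the parsed list could go straight into the `review_feedback` field of the state object.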
Example 2: Math Problem Solving with Voting/Selection
User: "I need a reliable agent setup for solving competition math problems. Single attempts are too inconsistent."
Approach:
Implementation:
import concurrent.futures

# run_agent is the helper defined in Example 1.


def voting_primitive(query, num_solvers=5):
    solver_prompt = (
        "Solve the following math problem step by step. "
        "Show your work and state the final answer clearly."
    )
    # Run solvers in parallel
    with concurrent.futures.ThreadPoolExecutor(max_workers=num_solvers) as pool:
        futures = [pool.submit(run_agent, solver_prompt, query) for _ in range(num_solvers)]
        # Candidates are collected in completion order, not submission order.
        candidates = [f.result() for f in concurrent.futures.as_completed(futures)]
    # Selector evaluates candidates
    selector_prompt = (
        "You are a Selector. You are given multiple candidate solutions to a math problem. "
        "Evaluate each for correctness. Identify the most common final answer. "
        "If answers disagree, verify the reasoning of each and select the correct one. "
        "Output ONLY the final verified answer."
    )
    candidate_text = "\n\n---\n\n".join(
        [f"Candidate {i+1}:\n{c}" for i, c in enumerate(candidates)]
    )
    return run_agent(selector_prompt, f"Problem: {query}\n\n{candidate_text}")


result = voting_primitive("Find all real solutions to x^4 - 5x^2 + 6 = 0.")
Output: The Selector identifies that 4 of 5 candidates agree on x = +/-sqrt(2), +/-sqrt(3), confirms the reasoning, and returns the verified answer.
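The Selector prompt asks the model to identify the most common final answer. A cheap deterministic tally can serve as a cross-check on that step. The sketch below is an optional addition, not part of the original skill: `normalize` and `majority_answer` are hypothetical helpers, the normalization is deliberately crude, and the code assumes the final answers have already been extracted from each candidate's text.

```python
from collections import Counter
from typing import List, Optional, Tuple


def normalize(answer: str) -> str:
    """Crude normalization: lowercase and remove all whitespace."""
    return "".join(answer.lower().split())


def majority_answer(answers: List[str],
                    threshold: float = 0.5) -> Optional[Tuple[str, int]]:
    """Return (answer, votes) if one answer exceeds the vote share threshold."""
    if not answers:
        return None
    counts = Counter(normalize(a) for a in answers)
    answer, votes = counts.most_common(1)[0]
    return (answer, votes) if votes / len(answers) > threshold else None


print(majority_answer(["42", " 42 ", "41", "42", "40"]))  # ('42', 3)
print(majority_answer(["1", "2"]))                        # None
```

When the tally returns `None`, the candidates genuinely disagree and the Selector's reasoning check matters most; when it returns a clear majority, a Selector verdict that contradicts it is worth a second look.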
Example 3: Research Synthesis with Composed Primitives (Planning/Execution -> Review)
User: "Design an agent system that can take a research question, break it down, gather findings, and produce a vetted synthesis."
Approach:
Implementation:
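def composed_pipeline(query):
    # Stage 1: Planning/Execution
    planner_prompt = (
        "Decompose this research question into 3-5 specific sub-questions. "
        "Output as a JSON array of strings."
    )
    # Assumes the Planner returns bare JSON with no surrounding prose or fences.
    plan = json.loads(run_agent(planner_prompt, query))
    executor_prompt = (
        "You are given a research question and a structured plan of sub-questions. "
        "Address each sub-question, then synthesize a coherent answer to the main question. "
        "Follow the plan order exactly."
    )
    plan_text = "\n".join([f"{i+1}. {step}" for i, step in enumerate(plan)])
    synthesis = run_agent(executor_prompt, f"Question: {query}\n\nPlan:\n{plan_text}")

    # Stage 2: Review. review_primitive from Example 1 hard-codes prompts for
    # Python code, so the Critic and Solver get synthesis-specific prompts here.
    critic_prompt = (
        "You are a Critic. Review this research synthesis for accuracy and completeness. "
        "Output a JSON array of issues found. Output an empty array [] if no issues remain. "
        "Do NOT fix or rewrite the synthesis."
    )
    reviser_prompt = (
        "You are a Solver. Revise the research synthesis to fix the issues in the feedback."
    )
    context = f"Original question: {query}\n\nSynthesis:\n{synthesis}"
    feedback = run_agent(critic_prompt, context)
    if feedback.strip() == "[]":
        return synthesis
    # One refinement pass (the equivalent of max_iterations=1).
    return run_agent(reviser_prompt, f"{context}\n\nFeedback: {feedback}")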
Do:
Avoid:
Paper: Agent Primitives: Reusable Latent Building Blocks for Multi-Agent Systems (Jin et al., 2026). Look for Section 3 (primitive definitions and KV cache formulation), Section 4 (Organizer and knowledge pool design), and Table 2 (accuracy comparisons across 8 benchmarks showing 12-16% gains over single-agent baselines at 1.3-1.6x cost).