skills/from-assumptions-actions-turning/SKILL.md
Build uncertainty-aware planners for multi-agent systems using the PCE (Planner-Composer-Evaluator) decision tree framework. Converts implicit LLM reasoning assumptions into scored decision trees that select actions under uncertainty without heavy inter-agent communication. Use when: 'build an agent that plans under uncertainty', 'create a decision tree from assumptions', 'multi-agent planning with partial observability', 'reduce agent communication overhead', 'score actions by likelihood and cost', 'uncertainty-aware action selection'.
This skill enables Claude to implement the Planner-Composer-Evaluator (PCE) framework from ICLR 2026, which turns the implicit assumptions buried inside LLM reasoning traces into explicit, scored decision trees. Instead of relying on expensive back-and-forth communication between agents to resolve uncertainty, PCE structures what the agent already suspects into a tree where internal nodes are binary assumptions about the world, leaves are candidate actions, and each path is scored by scenario likelihood, goal-directed gain, and execution cost. The result is principled action selection under partial observability with minimal communication.
The core insight is that when an LLM reasons about what to do next, its chain-of-thought already contains assumptions about the environment—they are just buried, fragmented, and never systematically evaluated. PCE extracts these assumptions, organizes them into a binary decision tree, and scores each root-to-leaf path to pick the best action. This replaces the common pattern of "ask the other agent to resolve my uncertainty" with "structure my uncertainty, estimate which scenario is most likely, and act on the best expected payoff."
The three phases work as follows. The Planner takes the agent's context (goal, progress, observations, message history, available actions) and generates a candidate action with a reasoning trace. The Composer then mines that trace for implicit assumptions and builds a decision tree of depth D (default 3) in which each internal node is a True/False assumption split and each leaf is an action (physical or communicative). The tree is expanded top-down, prioritizing assumptions that most reduce uncertainty and most influence the action choice; expansion stops early when further splits would not change the recommended action. Finally, the Evaluator scores every root-to-leaf path with three quantities: Scenario Likelihood L(S) (how probable is this branch's premise?), Conditional Gain G(a) (how much does this action advance the goal given the premise?), and Execution Cost C(a) (movement distance or communication token cost). The utility is U(S, a) = L(S) * G(a) - lambda * C(a), and the agent executes the leaf action with the maximum U.
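The scoring rule above, restated in LaTeX. The path set $\mathcal{P}$ is our own notation; the cost term is the one given in the step-by-step procedure, with d(a) the movement distance and l(a) the message length in tokens.

```latex
U(S, a) = L(S)\,G(a) - \lambda\,C(a), \qquad
C(a) = \alpha\, d(a)\,\mathbf{1}[a \text{ is a move}] + \beta\, l(a)\,\mathbf{1}[a \text{ is a message}]

a^{*} = \arg\max_{(S, a) \in \mathcal{P}} U(S, a), \qquad \mathcal{P} = \text{set of root-to-leaf paths}
```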
Why this works better than scaling communication: PCE treats communication as just another action to be scored against physical alternatives, rather than as the default mechanism for uncertainty resolution. This means the agent only communicates when the expected information gain exceeds the cost—producing communication patterns that human partners in user studies rated as more efficient and trustworthy.
Gather agent context. Collect the agent's current goal, task progress so far, observation history (last K_action=10 actions), message log (last K_message=3 messages), and the list of available actions (physical moves, object interactions, communication).
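A minimal sketch of a rolling context that enforces the two history windows (K_action=10, K_message=3) with bounded deques. The class and field names are hypothetical; `as_dict` emits the keys used in Example 3 (`goal`, `obs`, `messages`, `available_actions`), plus `progress`.

```python
from collections import deque
from dataclasses import dataclass, field

K_ACTION = 10   # observation/action history window (default from the text)
K_MESSAGE = 3   # message log window (default from the text)


@dataclass
class AgentContext:
    """Rolling planning context; names are illustrative, not from the paper."""
    goal: str
    available_actions: list[str]
    progress: list[str] = field(default_factory=list)
    history: deque = field(default_factory=lambda: deque(maxlen=K_ACTION))
    messages: deque = field(default_factory=lambda: deque(maxlen=K_MESSAGE))

    def record_action(self, action: str, observation: str) -> None:
        # Once the window is full, the oldest entry is dropped automatically.
        self.history.append((action, observation))

    def record_message(self, sender: str, text: str) -> None:
        self.messages.append((sender, text))

    def as_dict(self) -> dict:
        """Snapshot of the context, ready to be formatted into a prompt."""
        return {
            "goal": self.goal,
            "progress": list(self.progress),
            "obs": list(self.history),
            "messages": list(self.messages),
            "available_actions": list(self.available_actions),
        }
```

Using `deque(maxlen=...)` keeps the windows bounded without manual slicing, so prompt size stays roughly constant over a long episode.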
Run the Planner phase. Prompt the LLM with the context and ask it to select an action with full reasoning. Capture the entire chain-of-thought trace—this is the raw material containing latent assumptions.
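One way to capture the full trace while still recovering a proposed action is to ask for free-form reasoning followed by a marker line. The prompt wording and the `ACTION:` marker below are hypothetical choices, not the paper's Planner template.

```python
import re
from typing import Optional


def build_planner_prompt(context: dict) -> str:
    """Hypothetical Planner prompt: reason first, then end with one 'ACTION:' line."""
    return (
        f"Goal: {context['goal']}\n"
        f"Recent observations: {context.get('obs', [])}\n"
        f"Recent messages: {context.get('messages', [])}\n"
        f"Available actions: {context['available_actions']}\n"
        "Think step by step about what you know and what you are unsure of, "
        "then finish with a line of the form 'ACTION: <action>'."
    )


def split_planner_reply(reply: str) -> tuple[str, Optional[str]]:
    """Return (full reasoning trace, proposed action or None if no ACTION line)."""
    matches = re.findall(r"^ACTION:[ \t]*(.+)$", reply, flags=re.MULTILINE)
    action = matches[-1].strip() if matches else None
    # Keep the whole reply as the trace: its hedges are what the Composer mines.
    return reply, action
```

The whole reply is returned as the trace rather than only the text before `ACTION:`, because an assumption can appear anywhere in the reasoning.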
Extract assumptions from the reasoning trace. Parse the Planner's trace to identify conditional statements, hedges, and uncertainty markers ("might be," "if X is in Y," "assuming the other agent hasn't already," etc.). Each becomes a candidate assumption node.
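Before (or alongside) an LLM extraction call, a cheap regex pass can surface the sentences that carry uncertainty markers. This is a swapped-in heuristic, not the paper's method, and the marker list is an assumption to extend for your domain; rewriting each hit as a yes/no statement is still left to the LLM.

```python
import re

# Illustrative hedge/uncertainty markers (an assumption; extend for your domain).
HEDGE_PATTERNS = [
    r"\bmight\b", r"\bmay\b", r"\bprobably\b", r"\bpossibly\b", r"\bperhaps\b",
    r"\bif\b", r"\bassuming\b", r"\bunless\b", r"\bnot sure\b", r"\bunclear\b",
]
_HEDGE_RE = re.compile("|".join(HEDGE_PATTERNS), flags=re.IGNORECASE)


def candidate_assumptions(trace: str) -> list[str]:
    """Return trace sentences containing an uncertainty marker, deduplicated, in order."""
    sentences = re.split(r"(?<=[.!?])\s+|\n+", trace)
    seen: set[str] = set()
    out: list[str] = []
    for s in (x.strip() for x in sentences):
        if s and _HEDGE_RE.search(s) and s.lower() not in seen:
            seen.add(s.lower())
            out.append(s)
    return out
```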
Build the decision tree (Composer phase). Construct a binary tree of depth D (start with D=3). At each internal node, place the assumption that most divides the remaining action space. For each True/False branch, either add another assumption node or terminate with a leaf action. Stop expanding when further splits would not change the recommended action at that subtree.
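The stop rule in this step can also be applied after the fact. The sketch below is a self-contained pruning pass (the `Node` type is a hypothetical stand-in for Example 2's `TreeNode`) that collapses any split whose two branches end in the same action. Unlike stopping during expansion, it does not save the LLM calls already spent on the collapsed subtree, but it should leave the same compact tree for the Evaluator. It does not implement the "most divides the action space" ordering.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    assumption: Optional[str] = None   # set on internal nodes
    action: Optional[str] = None       # set on leaves
    true_branch: Optional["Node"] = None
    false_branch: Optional["Node"] = None


def prune(node: Node) -> Node:
    """Bottom-up: replace a split by a leaf when both branches recommend one action."""
    if node.action is not None:
        return node
    t = prune(node.true_branch)
    f = prune(node.false_branch)
    if t.action is not None and t.action == f.action:
        return Node(action=t.action)  # the split cannot change the decision
    return Node(assumption=node.assumption, true_branch=t, false_branch=f)
```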
Assign leaf actions. Each leaf gets a concrete action from the available action set. Multiple leaves may share the same action if it is robust across scenarios. Leaves can be physical actions (go to location, pick up object) or communication actions (send a message asking for information).
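LLM replies rarely match an action string exactly, so it helps to snap each leaf onto the available action set before scoring. This minimal sketch uses the standard library's `difflib.get_close_matches`. The 0.6 cutoff and the prefix lists that separate communication from movement are assumptions to tune for your action vocabulary.

```python
import difflib
from typing import Optional

COMM_PREFIXES = ("send", "ask", "tell")    # assumption: naming of communication actions
MOVE_PREFIXES = ("goto", "go to", "move")  # assumption: naming of movement actions


def snap_to_action_set(reply: str, available: list[str]) -> Optional[str]:
    """Map a free-text LLM reply onto one of the available actions, or None."""
    text = reply.strip().lower()
    for action in available:
        if action.lower() == text:
            return action
    # Closest string match; the cutoff avoids forcing an unrelated action.
    lowered = [a.lower() for a in available]
    close = difflib.get_close_matches(text, lowered, n=1, cutoff=0.6)
    return available[lowered.index(close[0])] if close else None


def action_kind(action: str) -> str:
    """Classify an action as 'comm', 'move', or 'other' for the cost model."""
    a = action.lower()
    if a.startswith(COMM_PREFIXES):
        return "comm"
    if a.startswith(MOVE_PREFIXES):
        return "move"
    return "other"
```

Returning `None` rather than a default action lets the caller decide whether to re-prompt or drop the leaf.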
Score each path with the Evaluator. For every root-to-leaf path, compute three values using LLM estimation:
- L(S): Likelihood that the conjunction of assumptions along this path is true (0-1 scale).
- G(a): Goal-directed gain if action a is executed and the scenario holds (0-1 scale).
- C(a) = alpha * d(a) * is_move + beta * l(a) * is_comm: Execution cost combining movement distance d(a) and communication token length l(a) (default alpha=1, beta=1).

Compute utility and select. For each leaf, calculate U = L(S) * G(a) - lambda * C(a) (default lambda=1). Select the action corresponding to the maximum-utility leaf.
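The three Evaluator terms can be computed as follows. One assumption to flag: the procedure above has the LLM estimate L(S) for the whole conjunction at once, whereas `path_likelihood` multiplies per-node conditional probabilities (chain rule). That is one way to obtain the conjunction if you prefer node-level estimates. The `kind` labels are hypothetical.

```python
from math import prod


def path_likelihood(conditional_probs: list[float]) -> float:
    """L(S) for a conjunction of assumptions via the chain rule.

    conditional_probs[i] = P(branch i holds | branches 0..i-1 hold).
    """
    return prod(conditional_probs)


def execution_cost(kind: str, distance: float = 0.0, msg_tokens: float = 0.0,
                   alpha: float = 1.0, beta: float = 1.0) -> float:
    """C(a) = alpha * d(a) * is_move + beta * l(a) * is_comm."""
    is_move = kind == "move"
    is_comm = kind == "comm"
    return alpha * distance * is_move + beta * msg_tokens * is_comm


def utility(likelihood: float, gain: float, cost: float, lam: float = 1.0) -> float:
    """U(S, a) = L(S) * G(a) - lambda * C(a)."""
    return likelihood * gain - lam * cost
```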
Execute and iterate. Execute the selected action, update the observation history, and repeat from the first step (gathering agent context) at the next planning cycle. The tree is rebuilt each cycle with fresh context.
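A minimal sketch of this plan-act cycle. `select_action` and `execute` are hypothetical callables, for instance Example 2's `pce_select_action` and your environment's step function. The only point it illustrates is that the context, and therefore the tree, is rebuilt from scratch on every cycle with the latest observations.

```python
from typing import Callable

History = list[tuple[str, str]]


def run_episode(goal: str, available_actions: list[str],
                select_action: Callable[[dict], str],
                execute: Callable[[str], tuple[str, bool]],
                max_steps: int = 50, k_action: int = 10) -> History:
    """Plan, act, observe, repeat; execute(action) returns (observation, done)."""
    history: History = []
    for _ in range(max_steps):
        # Fresh context every cycle, so the decision tree is rebuilt from scratch.
        context = {"goal": goal, "obs": history[-k_action:],
                   "available_actions": available_actions}
        action = select_action(context)
        observation, done = execute(action)  # hypothetical environment step
        history.append((action, observation))
        if done:
            break
    return history
```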
Tune hyperparameters. Adjust tree depth D (higher = more nuanced but costlier), lambda (higher = more cost-sensitive, favoring cheap actions), and alpha/beta ratio (higher alpha penalizes movement more; higher beta penalizes communication more) based on domain requirements.
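A quick way to see what lambda does is to sweep it over a fixed set of scored leaves. The numbers below are the illustrative estimates from Example 1, not results from the paper. Because L(S) * G(a) lies in [0, 1] while C(a) is in raw units (distance, tokens), lambda acts as an exchange rate between the two; with these numbers the choice moves from fetching plates in the kitchen (lambda = 0) to checking the sideboard (lambda = 0.2) to asking the partner (lambda = 1).

```python
# (scenario, L, G, C) for the four leaves of Example 1; illustrative estimates only.
PATHS = [
    ("kitchen+partner_has", 0.3, 0.6, 2.0),
    ("kitchen+partner_hasnt", 0.4, 0.9, 3.0),
    ("not_kitchen+sideboard", 0.2, 0.8, 1.0),
    ("not_kitchen+not_side", 0.1, 0.5, 0.5),
]


def best_scenario(lam: float) -> str:
    """Scenario whose leaf maximizes U = L*G - lam*C for the given lambda."""
    return max(PATHS, key=lambda p: p[1] * p[2] - lam * p[3])[0]


for lam in (0.0, 0.2, 1.0):
    print(f"lambda={lam}: {best_scenario(lam)}")
```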
Validate with ablation. Test the system with individual components removed (no tree structure, no cost scoring, no likelihood estimation) to confirm each contributes to performance in your specific domain.
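An ablation harness can be as small as the sketch below. `PCEConfig` and the `run_episode` callable are hypothetical. Your episode runner is expected to honor the flags (for example, act on the Planner's proposal when `use_tree` is false, set lambda to 0 when `use_cost` is false, and treat L(S) as 1 when `use_likelihood` is false) and to return whether the task succeeded.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PCEConfig:
    use_tree: bool = True
    use_cost: bool = True
    use_likelihood: bool = True


ABLATIONS = {
    "full": PCEConfig(),
    "no_tree": PCEConfig(use_tree=False),
    "no_cost": PCEConfig(use_cost=False),
    "no_likelihood": PCEConfig(use_likelihood=False),
}


def run_ablation(run_episode: Callable[[PCEConfig, int], bool],
                 n_episodes: int = 20) -> dict[str, float]:
    """Success rate of each variant over the same seeds 0..n_episodes-1."""
    results: dict[str, float] = {}
    for name, cfg in ABLATIONS.items():
        successes = sum(run_episode(cfg, seed) for seed in range(n_episodes))
        results[name] = successes / n_episodes
    return results
```

Reusing the same seeds for every variant keeps the comparison paired, which matters when episodes are expensive and few.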
Example 1: Multi-agent household task planner
User: "Build a planner for two agents collaborating to set a dinner table. Each agent has partial visibility and can move between rooms, pick up items, or send short messages."
Approach:
```
{goal: "set table with plates, cups, napkins", agent_obs: [...], partner_messages: [...], available_actions: [goto, pickup, putdown, send_message]}
```

```
Root: "Plates are in kitchen cabinet"
├─ True: "Partner already picked up plates"
│   ├─ True → [goto] dining_room (plates handled, do other tasks)
│   └─ False → [goto] kitchen (go get the plates)
└─ False: "Plates are in dining room sideboard"
    ├─ True → [goto] dining_room (check sideboard)
    └─ False → [send_message] "Do you know where the plates are?"
```

```python
# U = L * G - lambda * C, with lambda = 1
paths = [
    {"scenario": "kitchen+partner_has",   "L": 0.3, "G": 0.6, "C": 2,   "U": 0.3 * 0.6 - 1 * 2},    # -1.82
    {"scenario": "kitchen+partner_hasnt", "L": 0.4, "G": 0.9, "C": 3,   "U": 0.4 * 0.9 - 1 * 3},    # -2.64
    {"scenario": "not_kitchen+sideboard", "L": 0.2, "G": 0.8, "C": 1,   "U": 0.2 * 0.8 - 1 * 1},    # -0.84
    {"scenario": "not_kitchen+not_side",  "L": 0.1, "G": 0.5, "C": 0.5, "U": 0.1 * 0.5 - 1 * 0.5},  # -0.45
]
best = max(paths, key=lambda p: p["U"])
# Selected: path 4 → send_message (lowest cost, acceptable gain given high overall uncertainty)
# But if agent is near sideboard: path 3 cost drops, and it wins
```
Output: The agent selects the action with highest U given current position and observation history.
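The remark that path 3 wins once the agent is near the sideboard can be made quantitative: path 3 beats path 4 as soon as L3 * G3 - lambda * C3 exceeds U4. With Example 1's illustrative numbers and lambda = 1, that threshold is C3 < 0.61, down from the current C3 = 1. The helper name below is our own.

```python
def break_even_cost(l: float, g: float, rival_utility: float, lam: float = 1.0) -> float:
    """Cost below which a leaf with likelihood l and gain g beats rival_utility."""
    return (l * g - rival_utility) / lam


# Example 1: path 3 (check the sideboard) vs path 4 (send_message, U = -0.45).
u4 = 0.1 * 0.5 - 1.0 * 0.5
threshold = break_even_cost(0.2, 0.8, u4)
print(f"sideboard wins if its cost is below {threshold:.2f}")
```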
Example 2: Implementing PCE as a Python module
User: "Give me a reusable PCE decision tree implementation I can plug into my LLM agent loop."
```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

LLMFn = Callable[[str], str]  # prompt in, model text out


@dataclass
class TreeNode:
    assumption: Optional[str] = None  # None for leaf nodes
    action: Optional[str] = None      # None for internal nodes
    true_branch: Optional["TreeNode"] = None
    false_branch: Optional["TreeNode"] = None


@dataclass
class ScoredPath:
    assumptions: list[tuple[str, bool]]  # (assumption_text, assumed_true)
    action: str
    likelihood: float  # L(S): LLM-estimated likelihood of this path's scenario
    gain: float        # G(a): goal-directed gain
    cost: float        # C(a): execution cost
    utility: float = 0.0  # U = L*G - lambda*C


def _to_float(reply: str, default: float = 0.0) -> float:
    """Parse the first number in an LLM reply; models rarely return a bare float."""
    match = re.search(r"-?\d*\.?\d+", reply)
    return float(match.group()) if match else default


def _describe(path: list[tuple[str, bool]]) -> str:
    """Render (assumption, truth value) pairs as a scenario description."""
    return ", ".join(f"{a} is {'true' if v else 'false'}" for a, v in path)


def extract_assumptions(reasoning_trace: str, llm_fn: LLMFn) -> list[str]:
    """Ask LLM to extract binary assumptions from a reasoning trace."""
    prompt = (
        "Extract the key uncertain assumptions from this reasoning trace. "
        "Return each as a yes/no statement on its own line.\n\n"
        f"Trace: {reasoning_trace}"
    )
    response = llm_fn(prompt)
    return [line.strip("- ") for line in response.strip().split("\n") if line.strip()]


def build_tree(assumptions: list[str], actions: list[str], llm_fn: LLMFn,
               depth: int = 3, path: Optional[list[tuple[str, bool]]] = None) -> TreeNode:
    """Recursively build a binary decision tree from assumptions."""
    path = path or []
    if depth == 0 or not assumptions:
        # Leaf: condition the choice on the assumptions fixed along this branch;
        # otherwise the True and False subtrees would receive identical leaves.
        best_action = llm_fn(
            f"Assume: {_describe(path) or 'nothing beyond the observations'}. "
            f"Which single action from {actions} is best? Reply with the action only."
        ).strip()
        return TreeNode(action=best_action)
    node = TreeNode(assumption=assumptions[0])
    remaining = assumptions[1:]
    node.true_branch = build_tree(remaining, actions, llm_fn, depth - 1,
                                  path + [(node.assumption, True)])
    node.false_branch = build_tree(remaining, actions, llm_fn, depth - 1,
                                   path + [(node.assumption, False)])
    return node


def enumerate_paths(node: TreeNode,
                    path: Optional[list[tuple[str, bool]]] = None) -> list[ScoredPath]:
    """Walk the tree and collect all root-to-leaf paths."""
    if path is None:
        path = []
    if node.action is not None:
        return [ScoredPath(assumptions=list(path), action=node.action,
                           likelihood=0.0, gain=0.0, cost=0.0)]
    results = []
    results += enumerate_paths(node.true_branch, path + [(node.assumption, True)])
    results += enumerate_paths(node.false_branch, path + [(node.assumption, False)])
    return results


def score_paths(paths: list[ScoredPath], context: dict, llm_fn: LLMFn,
                alpha: float = 1.0, beta: float = 1.0, lam: float = 1.0) -> list[ScoredPath]:
    """Score each path using LLM-estimated likelihood, gain, and cost."""
    for p in paths:
        scenario_desc = _describe(p.assumptions)
        p.likelihood = _to_float(llm_fn(
            f"Rate 0-1 how likely this scenario is given observations: {scenario_desc}\n"
            f"Context: {context}"
        ))
        p.gain = _to_float(llm_fn(
            f"Rate 0-1 how much action '{p.action}' advances the goal "
            f"given scenario: {scenario_desc}\nGoal: {context['goal']}"
        ))
        is_move = p.action.startswith("goto")
        is_comm = p.action.startswith("send")
        dist = _to_float(llm_fn(f"Estimate distance for {p.action}")) if is_move else 0.0
        msg_len = _to_float(llm_fn(f"Estimate token length for {p.action}")) if is_comm else 0.0
        p.cost = alpha * dist * int(is_move) + beta * msg_len * int(is_comm)
        p.utility = p.likelihood * p.gain - lam * p.cost
    return sorted(paths, key=lambda p: p.utility, reverse=True)


def pce_select_action(context: dict, llm_fn: LLMFn, depth: int = 3, lam: float = 1.0) -> str:
    """Full PCE pipeline: plan, compose tree, evaluate, return best action."""
    # Planner
    reasoning = llm_fn(f"Plan next action with reasoning: {context}")
    # Composer
    assumptions = extract_assumptions(reasoning, llm_fn)[:depth]
    tree = build_tree(assumptions, context["available_actions"], llm_fn, depth)
    paths = enumerate_paths(tree)
    # Evaluator
    scored = score_paths(paths, context, llm_fn, lam=lam)
    return scored[0].action
```
Example 3: Adding PCE to an existing ReAct agent loop
User: "I have a ReAct agent that keeps asking its partner agent redundant questions. How do I add PCE to reduce unnecessary communication?"
Approach:
- send_message is one possible leaf, but physical actions are alternatives.
- Communicate only when send_message wins on utility despite its higher cost penalty.

```python
# In your ReAct loop, replace direct action emission:
# OLD:
# action = llm(f"Choose action: {context}")
# NEW (task_goal, observations, msg_log, action_list, your_llm_call are your own variables):
action = pce_select_action(
    context={"goal": task_goal, "obs": observations,
             "messages": msg_log[-3:], "available_actions": action_list},
    llm_fn=your_llm_call,
    depth=3,
    lam=1.0,  # increase to further suppress costly actions
)
```
Expected result: communication should drop by roughly 40-60% while the task success rate improves, in line with the paper's findings on the C-WAH and TDW-MAT benchmarks; actual figures depend on your domain and hyperparameters.
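To check the reduction in your own setup, log the executed actions of the baseline and the PCE agent over the same tasks and compare their communication rates. A minimal sketch; the `send` prefix for communication actions is an assumption matching the action names used above.

```python
from collections.abc import Iterable


def communication_rate(actions: Iterable[str], comm_prefix: str = "send") -> float:
    """Fraction of executed actions that are messages (0.0 for an empty log)."""
    actions = list(actions)
    if not actions:
        return 0.0
    return sum(a.startswith(comm_prefix) for a in actions) / len(actions)


def relative_reduction(before: float, after: float) -> float:
    """Relative drop in communication rate; 0.5 means the rate was halved."""
    return (before - after) / before if before else 0.0
```

Measuring message tokens instead of message count is a straightforward variant if communication cost in your system is dominated by message length.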
Paper: "From Assumptions to Actions: Turning LLM Reasoning into Uncertainty-Aware Planning for Embodied Agents" (ICLR 2026)
arXiv: https://arxiv.org/abs/2602.04326v1
What to look for: Section 3 for the full PCE formalization (Planner prompt templates, Composer tree-building algorithm, Evaluator utility formula), Section 5 for ablation results showing which components matter most, and the Appendix for prompt templates and hyperparameter sensitivity analysis.