skills/skillxiv-v0.0.2-claude-opus-4.6/dynaact-dynamic-action-spaces-reasoning/SKILL.md
Automatically construct compact, diverse action spaces for LLM reasoning through corpus-based estimation and submodular optimization—enabling efficient decision-making without manual specification or expensive exhaustive search.
Language model agents typically either rely on manually defined action spaces (a fixed set of operations available at each step), which do not scale, or generate every possible action exhaustively, which is expensive. DynaAct addresses both with a two-stage method: first estimate the space of viable actions from a corpus of reasoning tasks, then greedily select a compact subset that is relevant to the current problem and diverse across its members.
The result is a compact, adaptive action space that improves performance on six reasoning benchmarks without adding significant inference latency.
DynaAct treats action space construction as a submodular optimization problem. The system learns general action patterns from diverse reasoning examples, then selects task-specific actions balancing two criteria: relevance to the current problem and diversity among the selected actions.
This two-stage approach avoids both manual specification burdens and exhaustive generation costs, enabling scalable, efficient reasoning agents.
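For readers unfamiliar with the term, the standard definition of submodularity (diminishing returns) is given here for reference. The symbols $F$, $V$, $A$, $B$, $x$, $S$, and $k$ are notation introduced for this note; the text above does not state the exact function DynaAct optimizes.

```latex
% Diminishing returns: adding x helps a smaller set at least as much as a larger one
F(A \cup \{x\}) - F(A) \;\ge\; F(B \cup \{x\}) - F(B)
\qquad \text{for all } A \subseteq B \subseteq V,\; x \in V \setminus B

% Classical guarantee for greedy selection when F is monotone, submodular, and F(\emptyset) = 0
F(S_{\text{greedy}}) \;\ge\; \left(1 - \tfrac{1}{e}\right) \max_{|S| \le k} F(S)
```

This guarantee is what makes greedy selection attractive here: it needs only on the order of $k \cdot |V|$ objective evaluations instead of a search over all size-$k$ subsets.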
Step 1: Extract Action Sketches from Corpus
Analyze diverse reasoning problems to identify common solution patterns.
```python
import random
from typing import Dict, List


def extract_action_sketches(problem_corpus: List[str], llm_api) -> List[str]:
    """
    Extract high-level action patterns from diverse reasoning problems.

    Args:
        problem_corpus: List of reasoning problem descriptions
        llm_api: Language model API for analysis

    Returns:
        action_sketches: List of general solution strategy descriptions
    """
    prompt_template = """Analyze this reasoning problem and identify the key
solution strategy or action type needed:
Problem: {problem}
What high-level action or strategy would solve this?
Examples: 'decompose into subproblems', 'search through possibilities',
'construct a proof', 'optimize a sequence', etc.
Respond with only the action description."""

    # A set removes exact duplicate sketches
    action_sketches = set()

    # Sample diverse problems to discover action types
    sampled_problems = random.sample(
        problem_corpus,
        min(1000, len(problem_corpus))
    )

    for problem in sampled_problems:
        prompt = prompt_template.format(problem=problem)
        sketch = llm_api.generate(prompt, max_tokens=50)
        action_sketches.add(sketch.strip())

    return list(action_sketches)


def estimate_action_space(action_sketches: List[str]) -> Dict[str, List[str]]:
    """
    Build structured representation of estimated action space.

    Args:
        action_sketches: Extracted action strategy descriptions

    Returns:
        action_space: Dictionary mapping each action category to the
            sketches assigned to it
    """
    action_categories = {
        'decomposition': [],
        'search': [],
        'construction': [],
        'optimization': [],
        'verification': []
    }

    for sketch in action_sketches:
        sketch_lower = sketch.lower()

        # Categorize by the first matching keyword group; sketches that
        # match no group are not added to any category
        if any(word in sketch_lower for word in ['decompose', 'split', 'break']):
            action_categories['decomposition'].append(sketch)
        elif any(word in sketch_lower for word in ['search', 'find', 'explore']):
            action_categories['search'].append(sketch)
        elif any(word in sketch_lower for word in ['construct', 'build', 'create']):
            action_categories['construction'].append(sketch)
        elif any(word in sketch_lower for word in ['optimize', 'minimize', 'maximize']):
            action_categories['optimization'].append(sketch)
        elif any(word in sketch_lower for word in ['verify', 'check', 'validate']):
            action_categories['verification'].append(sketch)

    return action_categories
```
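Because the sketches come back as free-form LLM text, near-duplicates such as `Decompose into subproblems.` and `decompose into subproblems` can survive exact-match deduplication, and sketches that match no keyword are dropped by the categorizer. The following is a minimal sketch of one way to handle both, assuming the same five keyword groups; `normalize_sketch`, `categorize_with_fallback`, `CATEGORY_KEYWORDS`, and the `other` bucket are hypothetical names introduced here, not part of the original skill.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Hypothetical keyword map mirroring the categories used in Step 1.
CATEGORY_KEYWORDS: Dict[str, List[str]] = {
    "decomposition": ["decompose", "split", "break"],
    "search": ["search", "find", "explore"],
    "construction": ["construct", "build", "create"],
    "optimization": ["optimize", "minimize", "maximize"],
    "verification": ["verify", "check", "validate"],
}


def normalize_sketch(sketch: str) -> str:
    """Lowercase, drop surrounding quotes and trailing punctuation, collapse spaces."""
    text = sketch.strip().strip("'\"").lower()
    text = re.sub(r"[.!?]+$", "", text)
    return re.sub(r"\s+", " ", text).strip()


def categorize_with_fallback(sketches: List[str]) -> Dict[str, List[str]]:
    """Deduplicate normalized sketches and bucket them; unmatched ones go to 'other'."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    seen = set()
    for raw in sketches:
        sketch = normalize_sketch(raw)
        if not sketch or sketch in seen:
            continue
        seen.add(sketch)
        # First keyword group with a substring match wins, as in Step 1.
        category = next(
            (name for name, words in CATEGORY_KEYWORDS.items()
             if any(word in sketch for word in words)),
            "other",
        )
        buckets[category].append(sketch)
    return dict(buckets)


example = categorize_with_fallback([
    "Decompose into subproblems.",
    "decompose into   subproblems",
    "'Construct a proof'",
    "Work backwards from the goal",
])
print(example)
```

Keeping an `other` bucket makes it easy to see which strategies the keyword lists miss, which may be useful when deciding whether to add categories.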
Step 2: Define Submodular Objective Function
Create an objective that balances relevance and diversity in action selection.
```python
from typing import Callable, List

import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_distances


class SubmodularActionObjective:
    """
    Scores the quality of an action subset by relevance and diversity.
    """

    def __init__(self, all_actions: List[str], llm_relevance_fn: Callable):
        """
        Args:
            all_actions: Complete set of candidate actions
            llm_relevance_fn: Function computing action relevance to problem
        """
        self.all_actions = all_actions
        # Stored for LLM-based scoring; compute_relevance below uses a
        # cheaper keyword approximation and does not call it
        self.relevance_fn = llm_relevance_fn

        # Precompute action embeddings for diversity
        self.encoder = SentenceTransformer('all-MiniLM-L6-v2')
        self.embeddings = self.encoder.encode(all_actions)

    def compute_relevance(self, problem: str, actions: List[str]) -> np.ndarray:
        """
        Compute relevance scores: how well each action fits the problem.

        Args:
            problem: Current problem description
            actions: Candidate actions

        Returns:
            scores: Relevance scores in [0, 1]
        """
        scores = []

        # Could use LLM, but efficient approximation: check keyword overlap
        problem_keywords = set(problem.lower().split())

        for action in actions:
            action_keywords = set(action.lower().split())
            overlap = len(problem_keywords & action_keywords)
            score = overlap / max(len(problem_keywords), 1)
            scores.append(score)

        return np.array(scores)

    def compute_diversity(self, selected_actions_idx: List[int]) -> float:
        """
        Measure diversity of selected actions (average pairwise distance).

        Args:
            selected_actions_idx: Indices of selected actions

        Returns:
            diversity_score: Mean cosine distance; higher means more diverse.
                Cosine distance lies in [0, 2].
        """
        # Diversity is undefined for fewer than two actions
        if len(selected_actions_idx) <= 1:
            return 0.0

        selected_embeddings = self.embeddings[selected_actions_idx]

        # Average pairwise cosine distance over the upper triangle (i < j)
        distances = cosine_distances(selected_embeddings)
        return float(distances[np.triu_indices_from(distances, k=1)].mean())

    def evaluate_subset(self, problem: str, subset_indices: List[int],
                        relevance_weight: float = 0.7,
                        diversity_weight: float = 0.3) -> float:
        """
        Evaluate quality of action subset for given problem.

        Args:
            problem: Problem description
            subset_indices: Indices of selected actions
            relevance_weight: Weight for relevance term
            diversity_weight: Weight for diversity term

        Returns:
            objective_value: Weighted combination of the two terms
        """
        subset_actions = [self.all_actions[i] for i in subset_indices]

        # Compute relevance of subset (average relevance)
        relevance_scores = self.compute_relevance(problem, subset_actions)
        avg_relevance = relevance_scores.mean()

        # Compute diversity of subset
        diversity = self.compute_diversity(subset_indices)

        # Combined objective. Both terms are averages, so this is a
        # heuristic proxy rather than a strictly submodular function.
        objective = (relevance_weight * avg_relevance +
                     diversity_weight * diversity)

        return float(objective)
```
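Written out, the score computed by `evaluate_subset` can be read as the formula below. The notation is introduced here for illustration: $q$ is the problem, $S$ the selected subset, $W(\cdot)$ the set of lowercased whitespace-separated tokens, $e_a$ the sentence embedding of action $a$, and $\lambda_{\text{rel}}, \lambda_{\text{div}}$ the two weights (defaults 0.7 and 0.3 in the code).

```latex
\operatorname{rel}(q, a) = \frac{|W(q) \cap W(a)|}{\max(|W(q)|,\, 1)}

\operatorname{div}(S) =
\begin{cases}
0, & |S| \le 1 \\[4pt]
\dfrac{2}{|S|\,(|S|-1)} \displaystyle\sum_{\substack{a, b \in S \\ a < b}}
  \bigl(1 - \cos(e_a, e_b)\bigr), & |S| \ge 2
\end{cases}

F(S; q) = \lambda_{\text{rel}} \cdot \frac{1}{|S|} \sum_{a \in S} \operatorname{rel}(q, a)
        \;+\; \lambda_{\text{div}} \cdot \operatorname{div}(S)
```

Since both terms are averages over $S$, adding a weakly relevant action can lower the score, so $F$ is not monotone and the classical greedy approximation guarantee for monotone submodular functions should not be assumed for this particular form.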
Step 3: Greedy Selection Algorithm
Implement greedy selection to maximize the submodular objective.
```python
from typing import List, Optional


def greedy_action_selection(problem: str, objective: SubmodularActionObjective,
                            k: int = 5,
                            candidate_actions: Optional[List[str]] = None) -> List[str]:
    """
    Greedily select k actions maximizing submodular objective.

    Args:
        problem: Current problem description
        objective: SubmodularActionObjective instance
        k: Number of actions to select
        candidate_actions: Actions to consider (or use objective.all_actions);
            each must also appear in objective.all_actions

    Returns:
        selected_actions: Top-k selected action descriptions
    """
    if candidate_actions is None:
        candidate_indices = list(range(len(objective.all_actions)))
    else:
        # evaluate_subset indexes into objective.all_actions, so map each
        # candidate to its position there (raises KeyError if unknown)
        position = {a: i for i, a in enumerate(objective.all_actions)}
        candidate_indices = [position[a] for a in candidate_actions]

    # Greedy selection
    selected_indices: List[int] = []
    current_value = 0.0  # value assigned to the empty selection

    for _ in range(k):
        best_idx = None
        best_gain = -float('inf')
        best_value = current_value

        # Try adding each remaining action
        for idx in candidate_indices:
            if idx in selected_indices:
                continue

            # Evaluate objective with this action added
            value = objective.evaluate_subset(problem, selected_indices + [idx])

            # Marginal gain over the current selection
            gain = value - current_value

            if gain > best_gain:
                best_gain = gain
                best_idx = idx
                best_value = value

        # Stop early once every candidate has been selected
        if best_idx is None:
            break

        selected_indices.append(best_idx)
        current_value = best_value

    # Return selected actions
    return [objective.all_actions[i] for i in selected_indices]
```
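To see greedy selection on an objective that is submodular by construction, the sketch below swaps the averaged score from Step 2 for a weighted word-coverage function: one point per distinct problem word covered, plus a smaller bonus per distinct word of any kind. This is a hypothetical stand-in chosen because coverage functions are a textbook example of monotone submodular functions; it is not the objective from the source, and `coverage_objective`, `greedy_select`, and the sample actions are illustrative names and data.

```python
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lowercased whitespace-separated tokens."""
    return set(text.lower().split())


def coverage_objective(problem: str, subset: List[str],
                       diversity_weight: float = 0.5) -> float:
    """Problem words covered by the subset, plus a bonus per distinct word covered."""
    covered = set().union(*(tokens(a) for a in subset))
    relevance = len(tokens(problem) & covered)
    return relevance + diversity_weight * len(covered)


def greedy_select(problem: str, candidates: List[str], k: int) -> List[str]:
    """Repeatedly add the candidate with the largest marginal gain."""
    selected: List[str] = []
    current = coverage_objective(problem, selected)
    for _ in range(min(k, len(candidates))):
        best_action, best_gain = None, float("-inf")
        for action in candidates:
            if action in selected:
                continue
            gain = coverage_objective(problem, selected + [action]) - current
            if gain > best_gain:
                best_action, best_gain = action, gain
        if best_action is None:
            break
        selected.append(best_action)
        current += best_gain
    return selected


problem = "find the shortest path in a weighted graph"
candidates = [
    "search the graph for a path",
    "search the graph for a shortest path",  # near-duplicate of the first
    "decompose into subproblems",
    "verify the path weight",
]
picked = greedy_select(problem, candidates, k=3)
print(picked)
```

In this toy run the near-duplicate `search the graph for a path` is never chosen: once the longer variant is selected it covers no new words, so its marginal gain is zero, which is the diminishing-returns behavior that the diversity criterion relies on.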
Step 4: Action Space Augmented Reasoning
Integrate selected actions into LLM prompts during inference.
```python
from typing import Any, Dict, List


def construct_reasoning_prompt_with_actions(
        problem: str,
        selected_actions: List[str],
        reasoning_context: str = "") -> str:
    """
    Build prompt with dynamic action space.

    Args:
        problem: Problem to reason about
        selected_actions: Curated actions for this problem
        reasoning_context: Prior reasoning steps (if any)

    Returns:
        augmented_prompt: Prompt with available actions
    """
    prompt = f"""Solve this problem using the following available actions:
Problem: {problem}
Available actions:
"""
    # Number the actions starting from 1
    for i, action in enumerate(selected_actions, 1):
        prompt += f"{i}. {action}\n"

    if reasoning_context:
        prompt += f"\nCurrent progress: {reasoning_context}\n"

    prompt += "\nChoose the best action to take next and explain your reasoning."
    return prompt


def run_reasoning_with_dynamic_actions(
        problem: str,
        objective: SubmodularActionObjective,
        llm_api,
        max_steps: int = 10,
        k_actions: int = 5) -> Dict[str, Any]:
    """
    Execute multi-step reasoning with dynamically selected actions.

    Args:
        problem: Problem to solve
        objective: SubmodularActionObjective instance
        llm_api: Language model for reasoning
        max_steps: Maximum reasoning steps
        k_actions: Number of actions to select per step

    Returns:
        reasoning_result: {reasoning_steps, actions_used, final_reasoning}
    """
    reasoning_context = ""
    reasoning_steps = []
    actions_used = []

    for step in range(max_steps):
        # Select actions for current state (problem plus progress so far)
        selected_actions = greedy_action_selection(
            problem + " " + reasoning_context,
            objective,
            k=k_actions
        )
        actions_used.append(selected_actions)

        # Construct prompt with actions
        prompt = construct_reasoning_prompt_with_actions(
            problem, selected_actions, reasoning_context
        )

        # Get LLM reasoning step
        response = llm_api.generate(prompt, max_tokens=500)
        reasoning_steps.append(response)

        # Update context for next step
        reasoning_context += "\n" + response

        # Check for termination (simple heuristic); the loop also ends
        # on its own after max_steps iterations
        if "final answer" in response.lower():
            break

    return {
        'reasoning_steps': reasoning_steps,
        'actions_used': actions_used,
        'final_reasoning': reasoning_context
    }
```
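The substring check for `final answer` also fires on sentences like "I will give the final answer later", and the returned dictionary does not isolate the answer itself. The sketch below shows one possible stricter check, assuming the model is prompted to end with a line of the form `Final answer: ...`; that output convention, along with `extract_final_answer`, `ScriptedLLM`, and `run_until_answer`, is a hypothetical addition for illustration, and action selection is left out to keep the example dependency-free.

```python
import re
from typing import Dict, List, Optional

# Matches "Final answer: X" or "Final answer - X" and captures X to end of line.
FINAL_ANSWER_RE = re.compile(r"final answer\s*[:\-]\s*(.+)", re.IGNORECASE)


def extract_final_answer(response: str) -> Optional[str]:
    """Return the text after a 'Final answer:' marker, or None if absent."""
    match = FINAL_ANSWER_RE.search(response)
    return match.group(1).strip() if match else None


class ScriptedLLM:
    """Hypothetical stand-in for llm_api that replays canned responses."""

    def __init__(self, responses: List[str]):
        self._responses = iter(responses)

    def generate(self, prompt: str, max_tokens: int = 500) -> str:
        return next(self._responses)


def run_until_answer(llm, problem: str, max_steps: int = 10) -> Dict[str, object]:
    """Loop until a response contains an explicit final-answer marker."""
    steps: List[str] = []
    answer: Optional[str] = None
    for _ in range(max_steps):
        context = "\n".join(steps)
        response = llm.generate(f"Problem: {problem}\nProgress: {context}")
        steps.append(response)
        answer = extract_final_answer(response)
        if answer is not None:
            break
    return {"final_answer": answer, "steps": steps}


result = run_until_answer(
    ScriptedLLM([
        "Decompose: 17 * 6 = 10 * 6 + 7 * 6.",
        "Compute: 60 + 42 = 102. Final answer: 102",
        "This response is never requested.",
    ]),
    problem="What is 17 * 6?",
)
print(result["final_answer"], len(result["steps"]))
```

Requiring a delimiter after the marker trades some recall for precision: a model that writes "the final answer is 102" would not be detected, so the prompt and the pattern need to agree on the format.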
When to Use DynaAct:
When NOT to Use:
Hyperparameters and Configuration:
Pitfalls to Avoid:
Reference: https://arxiv.org/abs/2511.08043