skills/skillxiv-v0.0.2-claude-opus-4.6/bavt-budget-aware-search/SKILL.md
Allocate LLM reasoning budget optimally via value tree search: use residual value prediction to estimate step utility, then dynamically shift exploration-exploitation balance as budget depletes. Outperform high-budget baselines at 1/4 cost.
npx skillsauth add ADu2021/skillXiv bavt-budget-aware-searchInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Agentic reasoning requires exploring multiple paths, but computational budgets are finite. Budget-Aware Value Tree (BAVT) search makes principled allocation decisions: it estimates marginal utility per step using residual value prediction, then dynamically modulates exploration strength as budget depletes.
The key insight is a power-law scaling exponent inversely proportional to remaining budget, creating smooth transitions from broad exploration to aggressive greedy exploitation.
BAVT operates through three mechanisms:
This achieves superior performance under budget constraints: 4× budget-efficient compared to baseline.
Critic estimates incremental progress, not absolute returns.
import torch
import torch.nn as nn
class ResidualValuePredictor(nn.Module):
def __init__(self, hidden_dim=768):
super().__init__()
self.hidden_dim = hidden_dim
# MLP for residual value prediction
self.predictor = nn.Sequential(
nn.Linear(hidden_dim, hidden_dim * 2),
nn.ReLU(),
nn.Linear(hidden_dim * 2, 1) # Single scalar output (residual)
)
def forward(self, context_embeddings):
"""
Predict residual value from current state.
context_embeddings: (batch, hidden_dim) current reasoning state
returns: (batch, 1) residual value predictions (information delta)
"""
residual_value = self.predictor(context_embeddings)
return residual_value
class CriticHead(nn.Module):
def __init__(self, model, hidden_dim=768):
super().__init__()
self.model = model
self.value_predictor = ResidualValuePredictor(hidden_dim)
def estimate_residual_value(self, trajectory_text):
"""
Estimate marginal utility of reaching current state.
trajectory_text: str of reasoning steps so far
"""
# Encode trajectory
hidden_state = self.model.encode(trajectory_text) # (1, hidden_dim)
# Predict residual value
residual = self.value_predictor(hidden_state)
return residual.item()
Dynamically adjust exploration-exploitation trade-off based on remaining budget.
class BudgetAwareUCB:
def __init__(self, exploration_constant=1.0):
self.exploration_constant = exploration_constant
def compute_ucb_score(
self,
node_value,
visit_count,
total_budget,
remaining_budget
):
"""
Compute UCB score with budget-dependent exploration.
Scaling exponent: α_t = 1 / r_t (r_t = remaining budget)
"""
# Exploitation term: average value
exploitation = node_value / (visit_count + 1e-8)
# Exploration term with budget-dependent scaling
exploration_scale = self.exploration_constant / (remaining_budget + 1e-8)
# UCB = exploitation + sqrt(scale * ln(parent_visits) / visit_count)
import math
ucb_score = (
exploitation +
exploration_scale * math.sqrt(
math.log(total_budget + 1) / (visit_count + 1e-8)
)
)
return ucb_score
def select_best_child(self, node, remaining_budget, total_budget):
"""
Select child with highest UCB score.
"""
best_child = None
best_score = -float('inf')
for child in node.children:
ucb_score = self.compute_ucb_score(
child.value_sum,
child.visit_count,
total_budget,
remaining_budget
)
if ucb_score > best_score:
best_score = ucb_score
best_child = child
return best_child
Use single LLM model for both generation and critic roles.
class TreeNode:
def __init__(self, text="", parent=None):
self.text = text
self.parent = parent
self.children = []
self.visit_count = 0
self.value_sum = 0.0
self.residual_value = None
class BudgetAwareValueTreeSearch:
def __init__(self, model, critic_head, budget=1000):
self.model = model
self.critic = critic_head
self.total_budget = budget
self.remaining_budget = budget
def search(self, question, max_depth=10):
"""
Perform budget-aware tree search for question.
"""
root = TreeNode(question)
ucb = BudgetAwareUCB()
for budget_step in range(self.total_budget):
# Update remaining budget
self.remaining_budget = self.total_budget - budget_step
# Selection and expansion
leaf_node = self._select_and_expand(
root,
ucb,
max_depth
)
if leaf_node is None:
break
# Evaluation: critic estimates residual value
residual_value = self.critic.estimate_residual_value(
leaf_node.text
)
# Backup: propagate value up tree
self._backup(leaf_node, residual_value)
# Select best trajectory
best_path = self._extract_best_path(root)
return best_path
def _select_and_expand(self, node, ucb, max_depth):
"""
Traverse tree using UCB, expand at leaf.
"""
current = node
depth = 0
while current.children and depth < max_depth:
current = ucb.select_best_child(
current,
self.remaining_budget,
self.total_budget
)
depth += 1
# Expand: generate next steps
if depth < max_depth:
next_steps = self.model.generate_next_steps(
current.text,
num_candidates=3
)
for step in next_steps:
child = TreeNode(current.text + "\n" + step, parent=current)
current.children.append(child)
# Return first child for evaluation
if current.children:
return current.children[0]
return None
def _backup(self, node, value):
"""
Propagate value up tree from leaf.
"""
current = node
while current is not None:
current.visit_count += 1
current.value_sum += value
current = current.parent
def _extract_best_path(self, root):
"""
Extract highest-value path from root to leaf.
"""
path = []
current = root
while current.children:
# Select child with highest average value
best_child = max(
current.children,
key=lambda c: c.value_sum / (c.visit_count + 1e-8)
)
path.append(best_child.text)
current = best_child
return '\n'.join(path)
Use model for both roles without additional training.
class DualRoleLLMAgent:
def __init__(self, base_model):
self.model = base_model
self.critic_head = CriticHead(base_model)
def reason_with_budget(
self,
question,
budget_tokens=1000,
max_depth=10
):
"""
Multi-step reasoning within token budget.
"""
search = BudgetAwareValueTreeSearch(
self.model,
self.critic_head,
budget=budget_tokens
)
best_reasoning = search.search(question, max_depth)
return best_reasoning
def generate_next_steps(self, context, num_candidates=3):
"""
Generator role: propose next reasoning steps.
"""
prompt = f"{context}\n\nNext reasoning steps:"
steps = []
for _ in range(num_candidates):
step = self.model.generate(prompt, max_tokens=50, temperature=0.7)
steps.append(step.strip())
return steps
def compare_budgets(self, question, budgets=[100, 250, 500, 1000]):
"""
Demonstrate budget-performance trade-off.
"""
results = {}
for budget in budgets:
reasoning = self.reason_with_budget(question, budget)
# Evaluate (external)
score = self.evaluate(reasoning, question)
results[budget] = score
return results
Compare budget efficiency to standard approaches.
def benchmark_budget_efficiency():
"""
Demonstrate BAVT efficiency gains.
"""
agent = DualRoleLLMAgent(base_model)
questions = [...] # Test set
results = {
'bavt': {},
'baseline_fixed': {},
'baseline_10x_budget': {}
}
# BAVT with varying budgets
for budget in [250, 500, 1000]:
scores = []
for q in questions:
answer = agent.reason_with_budget(q, budget)
score = evaluate(answer, q)
scores.append(score)
results['bavt'][budget] = sum(scores) / len(scores)
# Baseline with fixed budget
baseline_budget = 1000
scores = []
for q in questions:
answer = agent.reason_fixed_budget(q, baseline_budget)
score = evaluate(answer, q)
scores.append(score)
results['baseline_fixed'][baseline_budget] = sum(scores) / len(scores)
# Baseline with 10x budget
scores = []
for q in questions:
answer = agent.reason_fixed_budget(q, baseline_budget * 10)
score = evaluate(answer, q)
scores.append(score)
results['baseline_10x_budget'][baseline_budget * 10] = sum(scores) / len(scores)
return results
When to Use:
When NOT to Use:
Hyperparameter Tuning:
Common Pitfalls:
BAVT paper on arXiv
testing
Uses flow maps as look-ahead operators to enable principled reward-guided diffusion by predicting trajectory endpoints at any denoising step. Deploy when applying rewards or preferences to diffusion trajectories with meaningful gradients throughout generation.
testing
Train language models where each expert learns independently on closed datasets, enabling flexible inference with selective data inclusion or exclusion. 41% performance improvement while allowing users to opt out of specific data sources without retraining.
data-ai
Understand how token generation flexibility in diffusion LMs paradoxically constrains reasoning, as models exploit ordering flexibility to avoid uncertain tokens, and apply simplified approaches that preserve parallel decoding benefits. Use when optimizing diffusion-based language models for reasoning tasks.
devops
Enable LLM agents to improve continuously during deployment by constructing structured experience libraries through self-reflection on successes and failures—achieving 23% improvement on reasoning without gradient-based parameter updates or external training.