skills/skillxiv-v0.0.2-claude-opus-4.6/adaptive-agent-foundation-model/SKILL.md
Route queries to specialized reasoning modes (internal reasoning, tool calling, or instant answers) using task-aware routing and Adaptive Policy Optimization to reduce inference costs by 45% while maintaining accuracy.
Install this skill globally with one command: `npx skillsauth add ADu2021/skillXiv adaptive-agent-foundation-model`. Works with Claude Code, Cursor, and Windsurf.
Current agent systems face a fundamental efficiency problem: reasoning-centric LLMs excel at internal chain-of-thought but cannot invoke external tools, while agentic LLMs can call tools but often lack deep reasoning. Both architectures tend to over-apply their primary capability—reasoning models overthink simple queries, agentic models make unnecessary tool calls. A²FM solves this by dynamically routing to the right mode for each query.
Rather than forcing all queries through the same pipeline, A²FM identifies simple queries that need instant answers, moderate queries requiring reasoning, and complex queries demanding tool interaction. This three-mode approach prevents wasted computation while maintaining performance across diverse benchmarks.
A²FM operates on a route-then-align principle.
Routing prevents over-application of an expensive mode: answering "What is the capital of France?" through multi-step tool calls is wasteful, as is reasoning deeply about straightforward requests.
The routing decision happens once per query. This example shows how to implement query classification and mode-specific forward passes.
```python
import torch
import torch.nn as nn


class AdaptiveRouterA2FM(nn.Module):
    """Route queries to instant/reasoning/tool modes based on complexity."""

    def __init__(self, embedding_dim=768, hidden_dim=512):
        super().__init__()
        self.query_encoder = nn.Linear(embedding_dim, hidden_dim)
        self.router = nn.Sequential(
            nn.ReLU(),
            nn.Linear(hidden_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 3),  # 3 modes: instant, reasoning, tool
        )
        self.mode_names = ["instant", "reasoning", "tool"]

    def forward(self, query_embedding):
        """
        Args:
            query_embedding: shape (batch, embedding_dim)
        Returns:
            mode_logits: shape (batch, 3) - scores for each mode
            selected_mode: shape (batch,) - argmax mode indices
        """
        encoded = self.query_encoder(query_embedding)
        mode_logits = self.router(encoded)
        selected_mode = torch.argmax(mode_logits, dim=1)
        return mode_logits, selected_mode


def adaptive_forward_pass(
    query,
    router_model,
    instant_handler,
    reasoning_model,
    tool_model,
    encode_query,
    tools,
):
    """
    Encode query, route to appropriate handler, return answer.

    `encode_query` (query -> tensor of shape (1, embedding_dim)) and `tools`
    are supplied by the caller. The handler methods below are placeholders
    for whatever interface the three backends expose.
    """
    # Encode query
    query_emb = encode_query(query)

    # Route to mode (inference only, so gradients are not needed)
    with torch.no_grad():
        mode_logits, selected_mode = router_model(query_emb)
    mode_idx = selected_mode.item()  # assumes a single query (batch of 1)

    # Execute mode-specific handler; costs are relative units
    if mode_idx == 0:  # Instant mode
        answer = instant_handler.answer(query)
        cost = 1  # Cheap
    elif mode_idx == 1:  # Reasoning mode
        answer = reasoning_model.chain_of_thought(query, max_steps=10)
        cost = 5  # Medium
    else:  # Tool mode
        answer = tool_model.invoke_tools_with_reasoning(
            query, available_tools=tools, max_calls=3
        )
        cost = 15  # Expensive

    return answer, cost, mode_idx
```
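To get a feel for how routing decisions translate into cost, the relative costs from the example above (1, 5, 15) can be averaged over a set of routing outcomes. The sketch below is illustrative only: `MODE_COSTS`, `softmax`, `expected_cost`, and `average_routed_cost` are hypothetical helper names, and the costs are the example's placeholder units, not measured inference costs.

```python
import math

# Relative per-mode costs copied from the example above (assumed units).
MODE_COSTS = {"instant": 1, "reasoning": 5, "tool": 15}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_cost(mode_logits, costs=MODE_COSTS):
    """Expected relative cost if a mode were sampled from softmax(logits)."""
    probs = softmax(mode_logits)
    return sum(p * c for p, c in zip(probs, costs.values()))


def average_routed_cost(selected_modes, costs=MODE_COSTS):
    """Mean relative cost over a batch of hard (argmax) routing decisions."""
    names = list(costs)
    return sum(costs[names[i]] for i in selected_modes) / len(selected_modes)


if __name__ == "__main__":
    # Mode indices follow the router: 0 = instant, 1 = reasoning, 2 = tool.
    print(average_routed_cost([0, 0, 1, 2]))
    print(expected_cost([0.0, 0.0, 0.0]))
```

Comparing `average_routed_cost` for a routed batch against the cost of sending every query to tool mode (15 in these units) gives a rough picture of where savings would come from, under the assumed cost values.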
Cost-regularized rewards: During RL training, reward = accuracy - λ * cost, where λ (`lambda_cost`) is 0.1 in the example below. A higher λ penalizes wasteful routing more heavily. Tune λ on a development set to balance accuracy against efficiency.
```python
def adaptive_policy_optimization_reward(
    answer_correct,
    mode_cost,
    lambda_cost=0.1
):
    """
    Reward function for APO training.
    Penalizes both incorrect answers AND expensive routing.
    """
    accuracy_reward = float(answer_correct)
    cost_penalty = -lambda_cost * mode_cost
    total_reward = accuracy_reward + cost_penalty
    return total_reward
```
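Tabulating the reward for every mode and outcome shows how the choice of λ interacts with the cost scale. The sketch below reuses the example costs (1, 5, 15) and the same reward formula. `reward_table` and `lambda_upper_bound` are hypothetical helpers, and the bound encodes one possible design goal (every correct answer should outrank every incorrect one), which is an assumption and not something the source states.

```python
# Relative per-mode costs copied from the routing example (assumed units).
MODE_COSTS = {"instant": 1, "reasoning": 5, "tool": 15}


def reward(answer_correct, mode_cost, lambda_cost=0.1):
    """Same formula as above: accuracy minus lambda times cost."""
    return float(answer_correct) - lambda_cost * mode_cost


def reward_table(lambda_cost=0.1, costs=MODE_COSTS):
    """Reward for every (mode, correct) pair, rounded to hide float noise."""
    return {
        (mode, correct): round(reward(correct, cost, lambda_cost), 6)
        for mode, cost in costs.items()
        for correct in (True, False)
    }


def lambda_upper_bound(costs=MODE_COSTS):
    """Largest lambda (exclusive) for which a correct answer in the most
    expensive mode still beats a wrong answer in the cheapest mode:
    1 - lambda * c_max > -lambda * c_min  =>  lambda < 1 / (c_max - c_min).
    """
    return 1.0 / (max(costs.values()) - min(costs.values()))


if __name__ == "__main__":
    for key, value in reward_table().items():
        print(key, value)
    print("lambda must stay below", lambda_upper_bound())
```

With λ = 0.1 and these costs, a correct tool-mode answer scores -0.5 while an incorrect instant answer scores -0.1, so the penalty outweighs correctness for the expensive mode. If that ordering is not intended, either λ or the cost scale would need to shrink; under the assumption above, λ would have to stay below 1/14 for these particular costs.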
| Query Type | Best Mode | Example |
|-----------|-----------|---------|
| Factual (capital, definition) | Instant | "What is Paris?" |
| Reasoning (math, analysis) | Reasoning | "Explain why democracies need checks and balances" |
| Tool-dependent (weather, current info) | Tool | "What events are trending today?" |
When to Use:
When NOT to Use:
Common Pitfalls:
A²FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning