skills/skillxiv-v0.0.2-claude-opus-4.6/deep-agent-reasoning/SKILL.md
Enables autonomous reasoning agents to discover and invoke tools efficiently through end-to-end training. Uses autonomous memory folding to compress interaction history and ToolPO to learn general-purpose tool use, applicable across diverse benchmarks from QA to web automation.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add ADu2021/skillXiv deep-agent-reasoning
Existing reasoning agents struggle with two key limitations: they accumulate errors across long-horizon tasks through verbose interaction histories, and they require task-specific tool interfaces rather than learning generalizable tool use patterns.
DeepAgent solves this by integrating autonomous thinking, tool discovery, and action execution into a single end-to-end reasoning process. The system combines memory compression with learned tool invocation, enabling agents to handle complex multi-step tasks efficiently.
DeepAgent operates through three integrated mechanisms:
The memory folding mechanism selectively summarizes interactions at each step. Rather than maintaining the full conversation history, compress past states and actions into dense representations:
class MemoryFolder:
    # Note: compress_reasoning and is_critical are not defined in this snippet;
    # they are assumed to be supplied elsewhere.
    def fold_interaction(self, history, current_state):
        # Compress episodic memory: factual outcomes from past steps
        episodic = self.compress_facts(history)
        # Working memory: intermediate reasoning state
        working = self.compress_reasoning(current_state)
        # Tool memory: effective tool patterns
        tools = self.extract_tool_patterns(history)
        # Return a dict keyed by memory type (a set literal would fail here
        # because lists and dicts are unhashable)
        return {"episodic": episodic, "working": working, "tools": tools}

    def compress_facts(self, history):
        # Extract key outcomes and state changes
        return [fact for fact in history if is_critical(fact)]

    def extract_tool_patterns(self, history):
        # Track which tools succeeded in which contexts;
        # expects each history entry to be a (context, goal, tool) triple
        return {(context, goal): tool for context, goal, tool in history}
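
The folding idea above can be sketched as a small runnable example. This is a heuristic stand-in, not DeepAgent's implementation: the `Step` and `FoldedMemory` types, their field names, and the "keep recent successful outcomes" rule are all assumptions made for illustration, whereas the description above implies the agent itself decides what to summarize.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One interaction step. All field names are illustrative assumptions."""
    context: str
    goal: str
    tool: str
    outcome: str
    succeeded: bool


@dataclass
class FoldedMemory:
    episodic: list = field(default_factory=list)  # key outcomes from past steps
    working: str = ""                             # truncated reasoning state
    tools: dict = field(default_factory=dict)     # (context, goal) -> tool that worked


def fold(history, current_state, max_facts=10, max_working_chars=200):
    """Fold a step history into a compact memory using simple heuristics."""
    # Episodic: keep only the most recent successful outcomes.
    facts = [s.outcome for s in history if s.succeeded][-max_facts:]
    # Working: keep the tail of the reasoning state, where the latest plan sits.
    working = current_state[-max_working_chars:]
    # Tool: remember which tool succeeded for each (context, goal) pair.
    tools = {(s.context, s.goal): s.tool for s in history if s.succeeded}
    return FoldedMemory(episodic=facts, working=working, tools=tools)


history = [
    Step("web", "find paper", "search", "found arXiv listing", True),
    Step("web", "open paper", "click", "timeout", False),
    Step("web", "open paper", "fetch_url", "abstract loaded", True),
]
memory = fold(history, "Next: extract the method section.")
print(memory.episodic)
print(memory.tools[("web", "open paper")])
```

The failed `click` step is dropped from both episodic and tool memory, so the folded state records only what worked and is shorter than the raw history.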
ToolPO applies advantage attribution at the token level for tool calls. Rather than assigning credit to entire generation steps, focus reward signals on the tokens that invoke tools:
class ToolPO:
    # Note: is_tool_invocation and compute_token_advantage are not defined in
    # this snippet; they are assumed to be supplied elsewhere.
    def compute_advantage(self, trajectory, reward):
        # Identify tool-call tokens in the generation
        tool_tokens = [idx for idx, token in enumerate(trajectory)
                       if is_tool_invocation(token)]
        # Assign advantage only to tool-invocation tokens
        advantage = {}
        for idx in tool_tokens:
            # Fine-grained credit based on outcome
            advantage[idx] = compute_token_advantage(trajectory, idx, reward)
        return advantage
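
A minimal runnable sketch of token-level credit for tool calls follows. It is one possible reading, not the ToolPO algorithm itself: the `<tool_call>` and `</tool_call>` delimiter tokens, the scalar `baseline`, and the default `tool_weight=3.0` are assumptions chosen for illustration. Every token receives the trajectory-level advantage, and tokens inside a tool-call span receive a multiple of it.

```python
def token_advantages(tokens, reward, baseline, tool_weight=3.0,
                     call_open="<tool_call>", call_close="</tool_call>"):
    """Per-token advantages, with extra weight on tokens inside tool-call spans.

    The delimiter tokens and the weighting scheme are illustrative assumptions.
    """
    base = reward - baseline  # trajectory-level advantage shared by all tokens
    advantages = []
    inside_call = False
    for tok in tokens:
        if tok == call_open:
            inside_call = True
        # Up-weight the delimiters and everything between them.
        weight = tool_weight if inside_call else 1.0
        advantages.append(base * weight)
        if tok == call_close:
            inside_call = False
    return advantages


tokens = ["I", "need", "<tool_call>", "search", "</tool_call>", "done"]
adv = token_advantages(tokens, reward=1.0, baseline=0.5)
print(adv)  # [0.5, 0.5, 1.5, 1.5, 1.5, 0.5]
```

Setting the weight for non-tool tokens to zero instead of 1.0 would recover the stricter variant in which only tool-invocation tokens receive credit.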
| Aspect | Recommendation |
|--------|-----------------|
| Memory compression ratio | 4:1 to 8:1 (reduce interaction sequences by 75-87%) |
| Tool-call token weighting | 2-5x higher than other tokens during RL training |
| Episodic memory retention | Keep last N=10 critical facts per domain |
| Simulated API complexity | Match target environment sophistication |
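
Two of these recommendations can be checked with a short sketch: the arithmetic that links a compression ratio to the fraction of the sequence removed, and a bounded store that keeps only the last N facts per domain. The `EpisodicStore` class and its method names are hypothetical, introduced here only to illustrate the retention rule.

```python
from collections import defaultdict, deque


def reduction_from_ratio(ratio):
    """Fraction of the sequence removed at a compression ratio of `ratio`:1."""
    return 1.0 - 1.0 / ratio


class EpisodicStore:
    """Keeps the most recent N facts per domain (hypothetical helper)."""

    def __init__(self, max_facts_per_domain=10):
        # deque(maxlen=N) silently discards the oldest item once it is full.
        self._facts = defaultdict(lambda: deque(maxlen=max_facts_per_domain))

    def add(self, domain, fact):
        self._facts[domain].append(fact)

    def facts(self, domain):
        return list(self._facts[domain])


print(reduction_from_ratio(4))  # 0.75
print(reduction_from_ratio(8))  # 0.875

store = EpisodicStore()
for i in range(12):
    store.add("web", f"fact {i}")
print(store.facts("web")[0], len(store.facts("web")))  # fact 2 10
```

A 4:1 ratio removes 75% of the sequence and 8:1 removes 87.5%, which matches the range given in the table.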
When to use DeepAgent:
When NOT to use:
Common pitfalls:
Reference: DeepAgent on arXiv