
ADu2021/deep-agent-reasoning

Name: deep-agent-reasoning
Author: ADu2021

skills/skillxiv-v0.0.2-claude-opus-4.6/deep-agent-reasoning/SKILL.md

npx skillsauth add ADu2021/skillXiv deep-agent-reasoning


DeepAgent: Unified Autonomous Reasoning with Tool Learning

Existing reasoning agents struggle with two key limitations: they accumulate errors across long-horizon tasks through verbose interaction histories, and they require task-specific tool interfaces rather than learning generalizable tool use patterns.

DeepAgent addresses both limitations by integrating autonomous thinking, tool discovery, and action execution into a single end-to-end reasoning process. The system combines memory compression with learned tool invocation, so an agent can carry a long multi-step task without dragging its full interaction history along.

Core Concept

DeepAgent operates through three integrated mechanisms:

Autonomous Memory Folding: Compresses past interactions into structured episodic, working, and tool memories, reducing error propagation
ToolPO (Tool Policy Optimization): An end-to-end RL strategy using simulated APIs and fine-grained tool-call advantage attribution
Tool Retrieval: Handles both labeled-tool and open-set discovery scenarios

Architecture Overview

Memory compression captures essential interaction patterns without verbose history
Tool-call advantage attribution isolates credit signals to tool invocation tokens
Memory types (episodic, working, tool) serve different reasoning stages
End-to-end training enables discovery of effective tool combinations

Implementation Steps

The memory folding mechanism selectively summarizes interactions at each step. Rather than maintaining full conversation history, compress past state and actions into dense representations:

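class MemoryFolder:
    def __init__(self, is_critical, summarize):
        # Both callables are supplied by the caller: a predicate that marks a
        # history step as worth keeping, and a summarizer for the current state
        self.is_critical = is_critical
        self.summarize = summarize

    def fold_interaction(self, history, current_state):
        # Compress episodic memory: factual outcomes from past steps
        episodic = self.compress_facts(history)
        # Working memory: intermediate reasoning state
        working = self.compress_reasoning(current_state)
        # Tool memory: effective tool patterns
        tools = self.extract_tool_patterns(history)
        # Keyed by memory type so callers can read each memory separately
        return {"episodic": episodic, "working": working, "tools": tools}

    def compress_facts(self, history):
        # Keep only the steps the predicate marks as critical
        return [step for step in history if self.is_critical(step)]

    def compress_reasoning(self, current_state):
        # Reduce the current reasoning state to a short summary
        return self.summarize(current_state)

    def extract_tool_patterns(self, history):
        # Each history step is a (context, goal, tool) tuple; map each
        # (context, goal) pair to its tool, with later steps overwriting earlier ones
        return {(context, goal): tool for context, goal, tool in history}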
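To make the folding step concrete, here is a small self-contained run on toy data. It is a sketch, not the skill's implementation: `fold_history` is a hypothetical function-style version of the folding step, and the history tuples, the criticality rule, and the first-sentence summarizer are all assumptions chosen to keep the example short.

```python
def fold_history(history, current_state, is_critical, summarize):
    """Fold a raw history into episodic, working, and tool memories."""
    return {
        # Episodic memory: only the steps judged critical
        "episodic": [step for step in history if is_critical(step)],
        # Working memory: a short summary of the current reasoning state
        "working": summarize(current_state),
        # Tool memory: which tool was used for each (context, goal) pair
        "tools": {(context, goal): tool for context, goal, tool in history},
    }


# Hypothetical toy data: each step is a (context, goal, tool) tuple
history = [
    ("flight booking", "find fares", "web_search"),
    ("flight booking", "convert currency", "calculator"),
    ("small talk", "greet user", "none"),
]

folded = fold_history(
    history,
    current_state="Comparing two fares; cheapest so far is 420 EUR. Next: check baggage fees.",
    # Assumption: steps that made no tool call can be dropped
    is_critical=lambda step: step[2] != "none",
    # Assumption: the first sentence is an adequate summary
    summarize=lambda text: text.split(".")[0] + ".",
)
```

Three raw steps and a two-sentence state shrink to two episodic entries and a one-sentence working memory, while the tool memory still records every (context, goal) pairing.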

ToolPO applies advantage attribution at the token level for tool calls. Rather than assigning credit to entire generation steps, focus reward signals on the tokens that invoke tools:

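class ToolPO:
    def __init__(self, is_tool_invocation, compute_token_advantage):
        # Both callables are supplied by the caller: a token-level predicate
        # and a function that scores one tool-call token against the outcome
        self.is_tool_invocation = is_tool_invocation
        self.compute_token_advantage = compute_token_advantage

    def compute_advantage(self, trajectory, reward):
        # Identify tool-call tokens in the generation
        tool_tokens = [idx for idx, token in enumerate(trajectory)
                       if self.is_tool_invocation(token)]

        # Assign advantage only to tool-invocation tokens
        advantage = {}
        for idx in tool_tokens:
            # Fine-grained credit based on outcome
            advantage[idx] = self.compute_token_advantage(trajectory, idx, reward)

        return advantage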
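One way to picture token-level attribution is with a mask over each rollout. The sketch below is an assumption-laden illustration rather than the ToolPO algorithm itself: normalizing rewards within a group of rollouts is borrowed from common policy-gradient practice and is not specified by this skill, and the function names and toy masks are hypothetical.

```python
from statistics import mean, pstdev


def group_normalized(rewards):
    """Center and scale rewards within a group of rollouts for the same task."""
    mu, sigma = mean(rewards), pstdev(rewards)
    # If every rollout earned the same reward there is no learning signal
    return [(r - mu) / sigma if sigma else 0.0 for r in rewards]


def tool_token_advantages(tool_masks, rewards):
    """Spread each rollout's advantage over its tool-call tokens only.

    tool_masks[i][t] is True when token t of rollout i belongs to a tool call.
    """
    advantages = group_normalized(rewards)
    return [
        [adv if is_tool else 0.0 for is_tool in mask]
        for mask, adv in zip(tool_masks, advantages)
    ]


# Hypothetical group of two rollouts, four tokens each
masks = [
    [False, True, True, False],   # rollout 0: tool call spans tokens 1-2
    [False, False, True, False],  # rollout 1: tool call is token 2
]
rewards = [1.0, 0.0]              # rollout 0 succeeded, rollout 1 failed

per_token = tool_token_advantages(masks, rewards)
```

Tokens outside a tool call receive zero advantage here, so the gradient from this term touches only the tokens that chose and parameterized the tool.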

Practical Guidance

| Aspect | Recommendation |
|--------|----------------|
| Memory compression ratio | 4:1 to 8:1 (reduces interaction sequences by 75% to 87.5%) |
| Tool-call token weighting | 2-5x higher than other tokens during RL training |
| Episodic memory retention | Keep last N=10 critical facts per domain |
| Simulated API complexity | Match target environment sophistication |
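The numbers in the table translate into a few lines of arithmetic. The helpers below are hypothetical names introduced only for illustration; the default weight of 3.0 is simply a value inside the recommended 2-5x range, not a setting taken from the source.

```python
def reduction_from_ratio(ratio):
    """Fraction of the original sequence removed at a given compression ratio.

    A ratio of 4 means 4:1, i.e. the folded memory is a quarter of the input.
    """
    return 1.0 - 1.0 / ratio


def token_weights(tool_mask, tool_weight=3.0):
    """Per-token loss weights: tool-call tokens get tool_weight, others 1.0."""
    return [tool_weight if is_tool else 1.0 for is_tool in tool_mask]


def keep_last_critical(facts, n=10):
    """Retain only the n most recent critical facts for one domain."""
    return facts[-n:] if n > 0 else []


low = reduction_from_ratio(4)    # 4:1 removes 75% of the sequence
high = reduction_from_ratio(8)   # 8:1 removes 87.5% of the sequence
weights = token_weights([False, True, True, False])
recent = keep_last_critical(list(range(25)))
```

Treat these values as starting points: a ratio near 8:1 is where the over-compression pitfall is most likely to appear.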

When to use DeepAgent:

Multi-step reasoning tasks requiring tool invocation
Long-horizon problems where error accumulation matters
Scenarios with large, diverse tool libraries to explore

When NOT to use:

Single-step tasks without tool requirements
Domains with strictly defined tool interfaces (use API-specific agents)
Real-time systems where memory compression adds latency

Common pitfalls:

Over-compressing memory and losing critical context
Under-weighting tool-specific advantage signals
Insufficient diversity in simulated API trajectories during training

Reference: DeepAgent on arXiv (https://arxiv.org/abs/2510.21618)


Enables autonomous reasoning agents to discover and invoke tools efficiently through end-to-end training. Uses autonomous memory folding to compress interaction history and ToolPO to learn general-purpose tool use, applicable across diverse benchmarks from QA to web automation.

2 stars

tools

Updated Apr 17, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add ADu2021/skillXiv deep-agent-reasoning

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Percentage |
|---------|-------------|--------|------------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Apr 17, 2026, 5:32 AM. Duration: 18.2s. 1 file scanned.

SKILL.md

name: deep-agent-reasoning
title: DeepAgent: A General Reasoning Agent with Scalable Toolsets
version: 0.0.2
engine: skillxiv-v0.0.2-claude-opus-4.6
license: MIT
url: https://arxiv.org/abs/2510.21618
keywords: [Agent, Reasoning, Tool Learning, RL, Memory Management]
description: Enables autonomous reasoning agents to discover and invoke tools efficiently through end-to-end training. Uses autonomous memory folding to compress interaction history and ToolPO to learn general-purpose tool use, applicable across diverse benchmarks from QA to web automation.




Install manually


# Clone the repo
git clone https://github.com/ADu2021/skillXiv.git

# Create the global Claude Code skills folder if it does not exist yet
mkdir -p ~/.claude/skills

# Copy the skill into it
cp -r skillXiv/skills/skillxiv-v0.0.2-claude-opus-4.6/deep-agent-reasoning ~/.claude/skills/


Repository: ADu2021/skillXiv (2 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT