ADu2021/agentic-science-cognitive-accumulation

Name: agentic-science-cognitive-accumulation
Author: ADu2021

skills/skillxiv-v0.0.2-claude-opus-4.6/agentic-science-cognitive-accumulation/SKILL.md

npx skillsauth add ADu2021/skillXiv agentic-science-cognitive-accumulation

Overview

Implement a hierarchical cognitive caching system for autonomous agents conducting multi-day ML engineering experiments. Rather than maintaining static context windows, the system dynamically distills execution traces into reusable knowledge representations, allowing agents to decouple immediate execution from long-term experimental strategy.

When to Use

For multi-day autonomous research or engineering projects requiring hundreds of experimental steps
When agents need to explore high-dimensional problem spaces beyond human precedent
For ML hyperparameter tuning, architecture search, or scientific discovery tasks
When you need agents to learn from prior experimental failures and optimize future attempts

When NOT to Use

For short-horizon tasks (single-session experiments under 1 hour)
When all relevant context fits in static context windows
For real-time systems where knowledge consolidation adds unacceptable latency
For tasks with simple, deterministic experimental spaces

Key Technical Components

Hierarchical Cognitive Caching (HCC)

Implement multi-tier knowledge distillation that converts verbose execution traces into compressed representations.

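# Three-tier hierarchy for knowledge consolidation
class HierarchicalCache:
    def __init__(self):
        self.immediate_context = {}    # Current task execution
        self.session_knowledge = {}    # Session-level insights
        self.cross_task_insights = {}  # Lessons across experiments

    def consolidate_traces(self, execution_trace, level="session"):
        """Distill a trace into stable knowledge at the specified level."""
        if level == "session":
            # Abstract specific values to principles
            pattern = self.extract_principles(execution_trace)
            self.session_knowledge.update(pattern)
        elif level == "cross_task":
            # Cross-session optimization insights
            insight = self.abstract_to_meta_strategy(execution_trace)
            self.cross_task_insights.update(insight)
        else:
            raise ValueError(f"unknown consolidation level: {level!r}")

    def extract_principles(self, execution_trace):
        """Placeholder (not defined in the source): return a dict of principles."""
        raise NotImplementedError

    def abstract_to_meta_strategy(self, execution_trace):
        """Placeholder (not defined in the source): return a dict of insights."""
        raise NotImplementedError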
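
As a concrete illustration of the three tiers, the following minimal sketch fills in the distillation steps with deliberately simple rules. The class name `SimpleHierarchicalCache`, the `record` method, the `score_delta` field, and the promote-if-positive rule are all assumptions made for this example; the source does not specify how principles are extracted or promoted.

```python
from collections import defaultdict


class SimpleHierarchicalCache:
    """Toy three-tier cache; the distillation rules are illustrative assumptions."""

    def __init__(self):
        self.immediate_context = []                   # raw steps of the current task
        self.session_knowledge = {}                   # principle -> mean score change
        self.cross_task_insights = defaultdict(int)   # principle -> sessions where it helped

    def record(self, step):
        """Append one raw execution step to the immediate tier."""
        self.immediate_context.append(step)

    def consolidate_session(self):
        """Distill raw steps into per-principle averages, then clear the raw tier."""
        deltas = defaultdict(list)
        for step in self.immediate_context:
            deltas[step["principle"]].append(step["score_delta"])
        self.session_knowledge = {p: sum(v) / len(v) for p, v in deltas.items()}
        self.immediate_context = []

    def consolidate_cross_task(self):
        """Promote principles whose average effect in this session was positive."""
        for principle, mean_delta in self.session_knowledge.items():
            if mean_delta > 0:
                self.cross_task_insights[principle] += 1


# Made-up steps, for illustration only
cache = SimpleHierarchicalCache()
cache.record({"principle": "lr_decay", "score_delta": 0.02})
cache.record({"principle": "lr_decay", "score_delta": 0.04})
cache.record({"principle": "bigger_batch", "score_delta": -0.01})
cache.consolidate_session()
cache.consolidate_cross_task()
print(cache.session_knowledge)
print(dict(cache.cross_task_insights))
```

Clearing the raw tier after consolidation is what keeps the verbose trace out of later context; only the averaged principles survive the session boundary.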

Transient Execution Trace Filtering

Identify and remove temporary, problem-specific information while preserving generalizable insights.

# Filter execution traces for essential information
def is_generalizable(step):
    """Assumed helper (not in the source): a step generalizes if it names a principle."""
    return bool(step.get("principle"))

def filter_trace_for_knowledge(trace):
    """Keep optimization patterns, discard problem-specific values"""
    knowledge = {}
    for step in trace:
        if is_generalizable(step):  # e.g., "learning rate decay helps"
            knowledge[step["principle"]] = step.get("evidence")
    return knowledge
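
To make the filtering step concrete, here is a small worked example on a made-up trace. The `runs_observed` field and the rule "keep a step only if it names a principle and was seen in at least two runs" are assumptions chosen for illustration; the source leaves the generalizability test unspecified.

```python
def is_generalizable(step):
    """Assumed heuristic: the step names a principle seen in at least two runs."""
    return bool(step.get("principle")) and step.get("runs_observed", 0) >= 2


def filter_trace_for_knowledge(trace):
    """Map each generalizable principle to its supporting evidence."""
    knowledge = {}
    for step in trace:
        if is_generalizable(step):
            knowledge[step["principle"]] = step["evidence"]
    return knowledge


# Made-up trace: one reusable lesson, one file-system detail, one one-off observation
trace = [
    {"principle": "learning rate decay helps",
     "evidence": "val loss 0.41 -> 0.37", "runs_observed": 3},
    {"principle": None,
     "evidence": "wrote submission to /tmp/run_17.csv", "runs_observed": 1},
    {"principle": "seed 1234 is lucky",
     "evidence": "single run", "runs_observed": 1},
]

knowledge = filter_trace_for_knowledge(trace)
print(knowledge)
```

Only the learning-rate entry survives: the file path carries no principle, and the seed observation was seen in a single run.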

Strategic Context Management

Manage which knowledge remains active to avoid context pollution from irrelevant prior experiments.

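# Active context selection for next experiment
def select_relevant_context(current_problem, cached_knowledge, similarity,
                            threshold=0.5, max_tokens=4000):
    """Select only relevant prior experiences for the current task.

    Assumptions (not in the source): each cached entry is a (problem, text)
    pair, `similarity` is a caller-supplied scoring function, and tokens are
    approximated by whitespace-separated words.
    """
    scored = [(similarity(current_problem, cached[0]), cached)
              for cached in cached_knowledge]
    selected, used = [], 0
    for score, cached in sorted(scored, key=lambda pair: pair[0], reverse=True):
        cost = len(str(cached[1]).split())
        if score > threshold and used + cost <= max_tokens:
            selected.append(cached)
            used += cost
    return selected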
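
One way to realize relevance scoring and the token budget is sketched below. It assumes a word-overlap (Jaccard) similarity, cached entries stored as `(problem, note)` pairs, and a token count approximated by whitespace-separated words; none of these choices come from the source, and a real system would more likely use embeddings and a proper tokenizer.

```python
def jaccard(a, b):
    """Word-overlap similarity in [0, 1]; a stand-in for a real relevance model."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def select_relevant_context(current_problem, cached_knowledge,
                            threshold=0.2, max_tokens=4000):
    """Return the most relevant (problem, note) pairs that fit the word budget."""
    scored = sorted(
        ((jaccard(current_problem, problem), problem, note)
         for problem, note in cached_knowledge),
        key=lambda item: item[0],
        reverse=True,
    )
    selected, used = [], 0
    for score, problem, note in scored:
        cost = len(note.split())  # crude token estimate
        if score > threshold and used + cost <= max_tokens:
            selected.append((problem, note))
            used += cost
    return selected


# Made-up cache entries, for illustration only
cached = [
    ("tabular classification with gradient boosting",
     "early stopping on a holdout fold reduced overfitting"),
    ("image segmentation with unet",
     "heavy augmentation helped small datasets"),
    ("tabular regression with gradient boosting",
     "target log transform stabilized training"),
]

picked = select_relevant_context(
    "tabular classification with boosting", cached, threshold=0.3, max_tokens=10
)
print(picked)
```

With a 10-word budget only the closest entry fits; raising `max_tokens` also admits the tabular regression note, while the image segmentation entry stays below the threshold either way.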

Long-Horizon Strategy Decoupling

Enable agents to maintain global exploration strategies independent of immediate execution details.

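# Strategy graph for long-horizon planning
class StrategyGraph:
    def __init__(self):
        self.global_strategy = None  # Overall approach
        self.checkpoints = []        # Key decision points

    def update_global_strategy(self, insights):
        """Refine strategy based on accumulated knowledge"""
        self.global_strategy = self.abstract_insights_to_strategy(insights)

    def get_next_direction(self):
        """Return next experimental direction without low-level details"""
        if self.global_strategy is None:
            raise RuntimeError("call update_global_strategy() first")
        return self.global_strategy.next_unexplored_branch()

    def abstract_insights_to_strategy(self, insights):
        """Placeholder (not defined in the source): must return an object
        that provides next_unexplored_branch()."""
        raise NotImplementedError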
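
The sketch below shows one possible shape for the strategy object that `get_next_direction` relies on. `BranchStrategy`, the evidence scores, and the "strongest evidence first" ordering are hypothetical choices for this example, not something the source defines.

```python
class BranchStrategy:
    """Hypothetical strategy object: an ordered list of directions to try."""

    def __init__(self, directions):
        self.directions = list(directions)
        self.explored = set()

    def next_unexplored_branch(self):
        """Return the first direction not yet tried, or None when exhausted."""
        for direction in self.directions:
            if direction not in self.explored:
                self.explored.add(direction)
                return direction
        return None


class StrategyGraph:
    def __init__(self):
        self.global_strategy = None  # overall approach
        self.checkpoints = []        # snapshots of the insights behind each update

    def abstract_insights_to_strategy(self, insights):
        # Assumed rule: try the directions with the strongest evidence first
        ranked = sorted(insights, key=insights.get, reverse=True)
        return BranchStrategy(ranked)

    def update_global_strategy(self, insights):
        self.global_strategy = self.abstract_insights_to_strategy(insights)
        self.checkpoints.append(dict(insights))

    def get_next_direction(self):
        if self.global_strategy is None:
            return None  # no strategy yet
        return self.global_strategy.next_unexplored_branch()


# Made-up evidence scores, for illustration only
graph = StrategyGraph()
graph.update_global_strategy(
    {"ensembling": 0.7, "feature engineering": 0.9, "larger model": 0.4}
)
first = graph.get_next_direction()
second = graph.get_next_direction()
print(first, second)
```

Note that in this sketch each call to `update_global_strategy` rebuilds the ordering and forgets which branches were explored; a longer-lived implementation would probably carry that state across updates.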

Performance Characteristics

Medal rate on MLE-Bench: 56.44% (within 24-hour budgets)
Supports 24-hour experiment budgets with hundreds of steps
Reduces context redundancy by distilling traces to compressed representations
Enables cross-task learning from accumulated experiences

Integration Pattern

1. Initialize HCC with empty knowledge tiers
2. Execute experimental step, capture full trace
3. At session boundaries, consolidate traces into session knowledge
4. Before new experiments, retrieve relevant context from cache
5. Update global strategy based on cross-task insights
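
The five steps above can be sketched as a single loop. Everything here is a toy stand-in: `run_step` fabricates trace records instead of running experiments, consolidation just sums score changes per principle, and the "strategy" is simply the principle with the best accumulated effect. These names and rules are assumptions for illustration, not the skill's actual implementation.

```python
def run_step(step_id):
    """Stand-in for a real experiment; returns a made-up trace record."""
    principle = "lr_decay" if step_id % 2 == 0 else "bigger_batch"
    delta = 0.02 if principle == "lr_decay" else -0.01
    return {"step": step_id, "principle": principle, "score_delta": delta}


def consolidate(traces):
    """Sum the score change attributed to each principle in one session."""
    totals = {}
    for record in traces:
        key = record["principle"]
        totals[key] = totals.get(key, 0.0) + record["score_delta"]
    return totals


def run_sessions(num_sessions=2, steps_per_session=4):
    cross_task = {}   # step 1: start with empty knowledge tiers
    contexts = []     # what each session retrieved before starting
    strategy = None
    step_id = 0
    for _ in range(num_sessions):
        # step 4: before new experiments, retrieve only what has helped so far
        contexts.append({p: d for p, d in cross_task.items() if d > 0})
        traces = []
        for _ in range(steps_per_session):
            traces.append(run_step(step_id))  # step 2: execute and capture the trace
            step_id += 1
        # step 3: at the session boundary, consolidate the raw traces
        for principle, delta in consolidate(traces).items():
            cross_task[principle] = cross_task.get(principle, 0.0) + delta
        # step 5: update the global strategy from cross-task insights
        strategy = max(cross_task, key=cross_task.get)
    return cross_task, strategy, contexts


cross_task, strategy, contexts = run_sessions()
print(strategy, contexts)
```

The first session starts with an empty context, and the second one retrieves only the principle that helped, which is the point of separating raw traces from consolidated knowledge.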

References

Context window limitations require trace distillation for long-horizon tasks
Knowledge consolidation enables pattern reuse across diverse problems
Hierarchical representation prevents context pollution

Enables agents to maintain strategic coherence over extended experimental cycles through hierarchical cognitive caching that distills execution traces into stable knowledge, achieving 56.44% on MLE-Bench within 24-hour budgets.

2 stars

testing

Updated Apr 16, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add ADu2021/skillXiv agentic-science-cognitive-accumulation

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each entry below.

Trivy: container and dependency vulnerability scanner. Clean (95%)
Semgrep: static code analysis for vulnerabilities. Clean (95%)
mcp-scan (Snyk): Model Context Protocol security validation. Clean (95%)
Snyk (dep): open source security scanning. Skipped (50%)
Socket.dev: supply chain security analysis. Skipped (50%)
VirusTotal: multi-engine malware detection. Skipped (50%)
CrowdStrike: advanced threat intelligence. Skipped (50%)
OSV-Scanner: Open Source Vulnerability database check. Skipped (50%)
OWASP Dep-Check: Skipped (50%)

Last scanned: Apr 16, 2026, 3:03 PM (4.4s, 1 file scanned)

SKILL.md

name: agentic-science-cognitive-accumulation
title: "Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering"
version: 0.0.2
engine: skillxiv-v0.0.2-claude-opus-4.6
license: MIT
url: https://arxiv.org/abs/2601.10402
keywords: [agentic-science, long-horizon, context-management, knowledge-distillation, ML-engineering]
description: Enables agents to maintain strategic coherence over extended experimental cycles through hierarchical cognitive caching that distills execution traces into stable knowledge, achieving 56.44% on MLE-Bench within 24-hour budgets.

Related Skills

ADu2021/flow-map-trajectory-tilting (testing, updated Apr 17, 2026)
Uses flow maps as look-ahead operators to enable principled reward-guided diffusion by predicting trajectory endpoints at any denoising step. Deploy when applying rewards or preferences to diffusion trajectories with meaningful gradients throughout generation.

ADu2021/flexible-data-mixture-of-experts (testing, updated Apr 17, 2026)
Train language models where each expert learns independently on closed datasets, enabling flexible inference with selective data inclusion or exclusion. 41% performance improvement while allowing users to opt out of specific data sources without retraining.

ADu2021/flexibility-trap-diffusion-reasoning (data-ai, updated Apr 17, 2026)
Understand how token generation flexibility in diffusion LMs paradoxically constrains reasoning, as models exploit ordering flexibility to avoid uncertain tokens, and apply simplified approaches that preserve parallel decoding benefits. Use when optimizing diffusion-based language models for reasoning tasks.

ADu2021/flex-continuous-agent-evolution (devops, updated Apr 17, 2026)
Enable LLM agents to improve continuously during deployment by constructing structured experience libraries through self-reflection on successes and failures, achieving 23% improvement on reasoning without gradient-based parameter updates or external training.

Install manually

# Clone the repo
git clone https://github.com/ADu2021/skillXiv.git

# Copy into Claude Code skills folder (global)
cp -r skillXiv/skills/skillxiv-v0.0.2-claude-opus-4.6/agentic-science-cognitive-accumulation ~/.claude/skills/

Repository

ADu2021/skillXiv

2 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT