skills/skillxiv-v0.0.2-claude-opus-4.6/agentic-science-cognitive-accumulation/SKILL.md
Enables agents to maintain strategic coherence over extended experimental cycles through hierarchical cognitive caching that distills execution traces into stable knowledge, achieving 56.44% on MLE-Bench within 24-hour budgets.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):

`npx skillsauth add ADu2021/skillXiv agentic-science-cognitive-accumulation`
Implement a hierarchical cognitive caching system for autonomous agents conducting multi-day ML engineering experiments. Rather than maintaining static context windows, the system dynamically distills execution traces into reusable knowledge representations, allowing agents to decouple immediate execution from long-term experimental strategy.
Implement multi-tier knowledge distillation that converts verbose execution traces into compressed representations.
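```python
# Three-tier hierarchy for knowledge consolidation
class HierarchicalCache:
    def __init__(self):
        self.immediate_context = {}    # Current task execution
        self.session_knowledge = {}    # Session-level insights
        self.cross_task_insights = {}  # Lessons across experiments

    def consolidate_traces(self, execution_trace, level="session"):
        """Distill a trace into stable knowledge at the specified level."""
        if level == "session":
            # Abstract specific values into principles
            pattern = self.extract_principles(execution_trace)
            self.session_knowledge.update(pattern)
        elif level == "cross_task":
            # Cross-session optimization insights
            insight = self.abstract_to_meta_strategy(execution_trace)
            self.cross_task_insights.update(insight)
        else:
            raise ValueError(f"Unknown consolidation level: {level!r}")

    def extract_principles(self, execution_trace):
        """Hook: return a {principle: evidence} dict (e.g. from an LLM summarizer)."""
        raise NotImplementedError

    def abstract_to_meta_strategy(self, execution_trace):
        """Hook: return a {strategy: rationale} dict that transfers across tasks."""
        raise NotImplementedError
```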
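The cache above does not specify how `extract_principles` or `abstract_to_meta_strategy` work. As one concrete illustration, here is a minimal sketch assuming each execution step can be summarized as a (change, metric delta) pair: task traces are folded into session-level evidence, and a change is promoted to a cross-task insight once it improved the metric on average in at least `promote_after` sessions. The class name `TieredTraceCache`, the averaging rule, and the promotion threshold are hypothetical, not the skill's published method.

```python
from collections import defaultdict
from statistics import mean


class TieredTraceCache:
    """Hypothetical three-tier cache: raw trace -> session evidence -> cross-task insights."""

    def __init__(self, promote_after=2):
        self.immediate_context = []                 # raw (change, delta) steps of the current task
        self.session_knowledge = defaultdict(list)  # change -> metric deltas seen this session
        self.cross_task_insights = {}               # change -> number of sessions where it helped
        self.promote_after = promote_after          # assumed promotion threshold

    def log_step(self, change, metric_delta):
        """Record one execution step, e.g. ("lr_decay", +0.01)."""
        self.immediate_context.append((change, metric_delta))

    def end_task(self):
        """Fold the raw task trace into session evidence, then drop the verbose trace."""
        for change, delta in self.immediate_context:
            self.session_knowledge[change].append(delta)
        self.immediate_context.clear()

    def end_session(self):
        """Count a session as confirming a change if its mean delta was positive."""
        for change, deltas in self.session_knowledge.items():
            if mean(deltas) > 0:
                self.cross_task_insights[change] = self.cross_task_insights.get(change, 0) + 1
        self.session_knowledge.clear()

    def stable_insights(self):
        """Changes confirmed in at least `promote_after` sessions."""
        return sorted(c for c, n in self.cross_task_insights.items() if n >= self.promote_after)


cache = TieredTraceCache(promote_after=2)
for _ in range(2):  # two sessions with identical toy traces
    cache.log_step("lr_decay", 0.01)
    cache.log_step("bigger_batch", -0.02)
    cache.end_task()
    cache.end_session()
print(cache.stable_insights())  # ['lr_decay']
```

Clearing the lower tier at each boundary is what keeps the working context small: only aggregated evidence survives a task, and only repeatedly confirmed changes survive a session.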
Identify and remove temporary, problem-specific information while preserving generalizable insights.
```python
# Filter execution traces for essential information
def filter_trace_for_knowledge(trace, is_generalizable):
    """Keep optimization patterns, discard problem-specific values.

    is_generalizable: caller-supplied predicate deciding whether a step is reusable.
    """
    knowledge = {}
    for step in trace:
        if is_generalizable(step):  # e.g., "learning rate decay helps"
            knowledge[step["principle"]] = step["evidence"]
    return knowledge
```
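`is_generalizable` is not defined in the snippet above. One crude, hypothetical heuristic is to accept a recorded principle only when it contains no problem-specific literals such as numbers or file paths. The sketch below applies that rule to a toy trace; the `principle`/`evidence` keys follow the snippet above, while the regex and the sample steps are assumptions made for illustration.

```python
import re

# Hypothetical rule: digits, path separators, or data-file extensions mark a
# principle as problem-specific rather than reusable.
_SPECIFIC = re.compile(r"\d|/|\\|\.csv|\.parquet")


def is_generalizable(step):
    """True if the step records a principle free of problem-specific literals."""
    principle = step.get("principle", "")
    return bool(principle) and not _SPECIFIC.search(principle)


trace = [
    {"principle": "learning rate decay helps", "evidence": "val loss fell in 3 of 3 runs"},
    {"principle": "set lr to 3e-4", "evidence": "best run used 3e-4"},
    {"principle": "read data from /tmp/train.csv", "evidence": "loader worked"},
    {"action": "pip install lightgbm"},  # no principle recorded at all
]

knowledge = {s["principle"]: s["evidence"] for s in trace if is_generalizable(s)}
print(knowledge)  # {'learning rate decay helps': 'val loss fell in 3 of 3 runs'}
```

The regex will also reject legitimate principles such as "use 5-fold CV", so a real system would more plausibly use an LLM judge or require the same principle to recur across tasks; the pattern only keeps the example small.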
Manage which knowledge remains active to avoid context pollution from irrelevant prior experiments.
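```python
# Active context selection for next experiment
def select_relevant_context(current_problem, cached_knowledge, similarity,
                            threshold=0.5, max_tokens=4000):
    """Pick the most relevant (description, content) pairs within a token budget."""
    scored = sorted(((similarity(current_problem, desc), desc, content)
                     for desc, content in cached_knowledge),
                    key=lambda t: t[0], reverse=True)
    selected, used = [], 0
    for score, desc, content in scored:
        cost = len(content.split())  # crude token estimate: whitespace-separated words
        if score > threshold and used + cost <= max_tokens:
            selected.append((desc, content))
            used += cost
    return selected
```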
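The selection step above needs some `similarity` function. A minimal stand-in, assuming each cached entry is a (problem description, lesson) pair, is word-level Jaccard overlap; an embedding-based cosine similarity would be the more typical choice in practice. The cached entries and the 0.3 threshold below are invented for illustration.

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Word-overlap similarity in [0, 1]; a stand-in for embedding similarity."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


cached = [
    ("tabular classification with gradient boosting", "early stopping on val AUC helped"),
    ("image segmentation with unet", "heavy augmentation reduced overfitting"),
    ("tabular regression with gradient boosting", "target log-transform stabilized RMSE"),
]
problem = "tabular classification with imbalanced labels"

threshold = 0.3  # assumed cut-off
ranked = sorted(cached, key=lambda c: jaccard_similarity(problem, c[0]), reverse=True)
relevant = [c for c in ranked if jaccard_similarity(problem, c[0]) > threshold]
print([desc for desc, _ in relevant])  # ['tabular classification with gradient boosting']
```

Only the first entry clears the threshold (3 shared words out of 7 distinct, about 0.43); the regression entry scores 0.25 and the segmentation entry 0.125, so their lessons stay out of the active context.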
Enable agents to maintain global exploration strategies independent of immediate execution details.
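```python
# Strategy graph for long-horizon planning
class StrategyGraph:
    def __init__(self):
        self.global_strategy = None  # Overall approach
        self.checkpoints = []        # Key decision points

    def update_global_strategy(self, insights):
        """Refine strategy based on accumulated knowledge."""
        self.global_strategy = self.abstract_insights_to_strategy(insights)

    def abstract_insights_to_strategy(self, insights):
        """Hook: build a strategy object that exposes next_unexplored_branch()."""
        raise NotImplementedError

    def get_next_direction(self):
        """Return the next experimental direction without low-level details."""
        if self.global_strategy is None:
            return None  # No strategy yet; the caller should run initial exploration
        return self.global_strategy.next_unexplored_branch()
```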
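`get_next_direction` relies on a strategy object with a `next_unexplored_branch()` method, which the snippet does not define. A minimal sketch, assuming the global strategy is a tree of experimental directions whose leaves carry a priority score: the next direction is the highest-priority leaf that has not been explored yet. The `Branch` and `Strategy` names, the tree shape, and the priority values are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Branch:
    """Hypothetical node in a strategy tree: a direction plus its sub-directions."""
    name: str
    priority: float = 0.0
    explored: bool = False
    children: list[Branch] = field(default_factory=list)


@dataclass
class Strategy:
    root: Branch

    def next_unexplored_branch(self) -> Optional[Branch]:
        """Return the highest-priority unexplored leaf, or None if all leaves are explored."""
        best, stack = None, [self.root]
        while stack:
            node = stack.pop()
            if node.children:
                stack.extend(node.children)
            elif not node.explored and (best is None or node.priority > best.priority):
                best = node
        return best


strategy = Strategy(Branch("root", children=[
    Branch("feature engineering", children=[Branch("target encoding", 0.6),
                                            Branch("lag features", 0.4)]),
    Branch("model family", children=[Branch("gradient boosting", 0.8),
                                     Branch("neural net", 0.3)]),
]))

first = strategy.next_unexplored_branch()
print(first.name)  # gradient boosting
first.explored = True  # mark the direction done after running its experiments
print(strategy.next_unexplored_branch().name)  # target encoding
```

Marking a leaf explored after running it is what lets the agent resume the global plan after any interruption without replaying low-level execution details.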