skills/fademem-biologically-inspired-forgetting-agent/SKILL.md
Implement biologically-inspired forgetting mechanisms for LLM agent memory systems. Build dual-layer memory hierarchies with adaptive exponential decay, semantic relevance scoring, and LLM-guided conflict resolution to keep agent context lean and high-quality. Use when: "add forgetting to my agent memory", "implement memory decay for my chatbot", "build an agent memory system with selective retention", "reduce memory bloat in my AI agent", "implement FadeMem-style memory management", "add adaptive memory consolidation to my agent".
This skill enables Claude to design and implement agent memory systems that actively forget irrelevant information using biologically-inspired decay mechanisms. Based on the FadeMem architecture, you build dual-layer memory hierarchies (working memory + long-term memory) where each memory entry decays at rates governed by semantic relevance, access frequency, and temporal recency. Rather than storing everything or dropping everything at a context boundary, this approach continuously prunes low-value memories while consolidating important ones -- achieving ~45% storage reduction while improving multi-hop reasoning and retrieval quality.
Dual-Layer Memory with Differential Decay. FadeMem divides agent memory into two tiers: a working memory that holds recent, high-activation entries (analogous to a conversation buffer), and a long-term memory that stores consolidated, durable knowledge. Each memory entry carries a retention score computed by an adaptive exponential decay function:
retention(m) = base_relevance(m) * exp(-lambda(m) * time_since_last_access(m))
The decay rate lambda is not fixed -- it adapts per-entry based on three modulators: (1) semantic relevance to the agent's current task or recent queries, which slows decay for on-topic memories; (2) access frequency, where frequently retrieved memories decay slower (a "use it or lose it" principle); and (3) temporal pattern, where memories from bursty, clustered access patterns are treated as more important than one-off mentions. Entries whose retention score drops below a threshold are either consolidated (merged with related entries via summarization) or permanently forgotten.
LLM-Guided Conflict Resolution and Fusion. When the system detects semantically overlapping or contradictory memories (e.g., a user changed their address, or two sessions give different preferences), it invokes the LLM to evaluate which memory is more current, more contextually grounded, or more consistent with the broader memory store. The winner is kept or a fused summary is generated; the loser decays faster. This prevents stale or contradictory information from polluting retrieval results. Memory fusion also compresses verbose multi-turn exchanges into concise factual entries, reducing storage without losing core information.
Define the memory entry schema. Each entry needs: id, content (text), embedding (vector), created_at (timestamp), last_accessed_at (timestamp), access_count (integer), base_relevance (float 0-1), retention_score (float), layer (enum: working | long_term), and tags (list of topic strings).
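One way to express this schema is a Python dataclass. This is a minimal sketch, not the FadeMem reference implementation: the names `MemoryEntry` and `Layer`, the default values, and the choice to start `retention_score` at `base_relevance` are assumptions made for illustration.

```python
import time
import uuid
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Layer(str, Enum):
    WORKING = "working"
    LONG_TERM = "long_term"


@dataclass
class MemoryEntry:
    content: str
    embedding: List[float]
    base_relevance: float = 0.5  # expected range 0-1
    tags: List[str] = field(default_factory=list)
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    created_at: float = field(default_factory=time.time)
    last_accessed_at: float = field(default_factory=time.time)
    access_count: int = 0
    retention_score: Optional[float] = None
    layer: Layer = Layer.WORKING  # new entries always start in working memory

    def __post_init__(self):
        if not 0.0 <= self.base_relevance <= 1.0:
            raise ValueError("base_relevance must be in [0, 1]")
        if self.retention_score is None:
            # Assumption: an entry that has not decayed yet retains its full relevance.
            self.retention_score = self.base_relevance
```

The other sketches in this document use plain dicts with the same keys; `dataclasses.asdict(entry)` converts between the two.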
Implement the working memory buffer. Create a fixed-capacity buffer (e.g., last 20-50 entries) that holds recent interactions. New entries always land here first. When the buffer is full, trigger a consolidation cycle rather than simply dropping the oldest entry.
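The buffer behaviour described above can be sketched as follows. The class name `WorkingMemory` and the `on_full` hook are hypothetical; the hook stands in for whatever consolidation cycle you implement and must return the entries that stay in working memory.

```python
from collections import deque


class WorkingMemory:
    """Fixed-capacity buffer that consolidates instead of silently evicting."""

    def __init__(self, capacity, on_full):
        self.capacity = capacity
        # on_full (hypothetical hook): receives the list of buffered entries,
        # returns the entries that should remain after consolidation.
        self.on_full = on_full
        self.entries = deque()

    def add(self, entry):
        # New entries always land in working memory first.
        self.entries.append(entry)
        if len(self.entries) > self.capacity:
            survivors = self.on_full(list(self.entries))
            self.entries = deque(survivors)
```

In practice `on_full` would run the decay sweep and fusion steps described later, so overflow shrinks the buffer by merging or forgetting rather than by age alone.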
Implement the adaptive decay function. For each memory entry, compute:
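```python
import math
import time


def cosine_similarity(a, b):
    """Plain-Python cosine similarity; swap in your vector library's version."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def compute_retention(entry, current_time, query_embedding=None):
    time_delta = (current_time - entry["last_accessed_at"]) / 3600  # hours
    # Frequency boost grows logarithmically and saturates at 1.0.
    freq_boost = min(math.log1p(entry["access_count"]) / 5.0, 1.0)
    if query_embedding is not None:
        semantic_sim = cosine_similarity(query_embedding, entry["embedding"])
    else:
        # No active query: fall back to the relevance assigned at ingestion.
        semantic_sim = entry["base_relevance"]
    # Both modulators reduce the base rate of 0.1 per hour.
    lambda_decay = 0.1 * (1.0 - 0.4 * freq_boost - 0.3 * semantic_sim)
    retention = entry["base_relevance"] * math.exp(-lambda_decay * time_delta)
    return max(retention, 0.0)
```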
The key insight: lambda_decay shrinks for frequently-accessed, semantically-relevant entries, so they decay much slower.
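To put numbers on this, the sketch below isolates the decay-rate formula from the function above and converts it to a half-life. The helper names `decay_rate` and `half_life_hours` are illustrative, and the coefficients (0.1, 0.4, 0.3) are the example values from this document, not tuned constants.

```python
import math


def decay_rate(access_count, semantic_sim, base_lambda=0.1):
    """Per-hour decay rate, using the same modulators as compute_retention."""
    freq_boost = min(math.log1p(access_count) / 5.0, 1.0)
    return base_lambda * (1.0 - 0.4 * freq_boost - 0.3 * semantic_sim)


def half_life_hours(rate):
    """Hours until retention drops to half of base_relevance."""
    return math.log(2) / rate


cold = decay_rate(access_count=0, semantic_sim=0.0)    # never accessed, off-topic
hot = decay_rate(access_count=1000, semantic_sim=1.0)  # saturated boost, on-topic

print(round(half_life_hours(cold), 1))  # about 6.9 hours
print(round(half_life_hours(hot), 1))   # about 23.1 hours
```

With these coefficients the rate can fall to 30% of its base value at most, so the slowest-decaying entry lasts roughly 3.3 times longer than the fastest. Raise the 0.4 and 0.3 weights, keeping their sum below 1, if you want a wider spread.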
Run periodic decay sweeps. On every N-th interaction (e.g., every 5 turns or on each new session), recompute retention_score for all entries. Entries below a forget_threshold (e.g., 0.15) are candidates for removal. Entries between forget_threshold and a consolidate_threshold (e.g., 0.35) are candidates for fusion.
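A sweep then reduces to bucketing entries by their freshly recomputed score. The sketch below assumes `retention_score` has already been updated on each entry; the function names are illustrative.

```python
def should_sweep(turn_count, every_n_turns=5):
    """True on every N-th interaction."""
    return turn_count > 0 and turn_count % every_n_turns == 0


def classify_by_retention(entries, forget_threshold=0.15, consolidate_threshold=0.35):
    """Split entries into (keep, fuse_candidates, forget_candidates)."""
    keep, fuse, forget = [], [], []
    for entry in entries:
        score = entry["retention_score"]
        if score < forget_threshold:
            forget.append(entry)   # candidate for removal
        elif score < consolidate_threshold:
            fuse.append(entry)     # candidate for fusion
        else:
            keep.append(entry)
    return keep, fuse, forget
```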
Implement memory fusion via LLM summarization. Group candidate entries by semantic similarity (cluster embeddings with a cosine threshold of 0.75+). For each cluster, prompt the LLM:
Summarize the following related memory entries into a single concise factual statement.
Preserve key facts, names, dates, and preferences. Drop conversational filler.
Entries: {entries}
Replace the cluster with the fused entry, inheriting the highest base_relevance and summed access_count from the group.
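The grouping and inheritance rules can be sketched as below. The greedy single-pass clustering is one simple choice, not necessarily what FadeMem uses, and `summarize` and `embed` are caller-supplied callables standing in for your LLM and embedding model. Carrying over the most recent `last_accessed_at` is an assumption of this sketch.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_by_similarity(entries, threshold=0.75):
    """Greedy clustering: an entry joins the first cluster whose seed is similar enough."""
    clusters = []
    for entry in entries:
        for cluster in clusters:
            if cosine_similarity(entry["embedding"], cluster[0]["embedding"]) >= threshold:
                cluster.append(entry)
                break
        else:
            clusters.append([entry])  # no match: start a new cluster
    return clusters


def fuse_cluster(cluster, summarize, embed):
    """Replace a cluster with one entry built from an LLM summary."""
    content = summarize([e["content"] for e in cluster])
    return {
        "content": content,
        "embedding": embed(content),
        "base_relevance": max(e["base_relevance"] for e in cluster),
        "access_count": sum(e["access_count"] for e in cluster),
        # Assumption: the fused entry is as fresh as its freshest member.
        "last_accessed_at": max(e["last_accessed_at"] for e in cluster),
    }
```

Clusters of size one need no LLM call and can be left as they are.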
Implement LLM-guided conflict resolution. When fusion detects contradictory entries (e.g., cosine similarity > 0.8 but semantic content diverges), prompt the LLM:
These two memory entries appear to conflict:
A (created {date_a}): {content_a}
B (created {date_b}): {content_b}
Which is more likely to be current/correct? Return the resolved fact or indicate which to keep.
Apply the resolution: keep the winner and accelerate decay on the loser (multiply its lambda by 3), or replace both with a merged entry.
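Applying the verdict might look like the sketch below. It assumes the LLM reply has already been parsed into `"A"`, `"B"`, or a merged-fact string, and it records the penalty in a hypothetical `lambda_multiplier` field that your decay function would multiply into its rate.

```python
def apply_resolution(entry_a, entry_b, verdict, penalty=3.0):
    """Return the entries that survive a conflict.

    verdict: "A" or "B" to keep that entry, or any other string,
    which is treated as the merged fact replacing both.
    """
    if verdict in ("A", "B"):
        winner, loser = (entry_a, entry_b) if verdict == "A" else (entry_b, entry_a)
        # lambda_multiplier is a hypothetical field; penalties compound on repeat losses.
        loser["lambda_multiplier"] = loser.get("lambda_multiplier", 1.0) * penalty
        return [winner, loser]
    merged = {
        "content": verdict,
        "base_relevance": max(entry_a["base_relevance"], entry_b["base_relevance"]),
        "access_count": entry_a["access_count"] + entry_b["access_count"],
        "lambda_multiplier": 1.0,
    }
    return [merged]
```

The loser is kept rather than deleted so that it can still win back relevance if later accesses show the verdict was wrong.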
Promote and demote between layers. After a decay sweep: promote working memory entries with retention_score > 0.7 and access_count >= 3 to long-term memory. Demote long-term entries with retention_score < consolidate_threshold back to working memory for re-evaluation or fusion. Delete any entry with retention_score < forget_threshold.
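These layer rules can be sketched as one pass over the store. The function name `rebalance_layers` is illustrative, and entries are assumed to be dicts with freshly recomputed `retention_score` values.

```python
def rebalance_layers(entries, forget_threshold=0.15, consolidate_threshold=0.35,
                     promotion_threshold=0.7, min_accesses=3):
    """Promote, demote, or drop entries after a decay sweep; returns survivors."""
    survivors = []
    for e in entries:
        score = e["retention_score"]
        if score < forget_threshold:
            continue  # forgotten, regardless of layer
        if (e["layer"] == "working" and score > promotion_threshold
                and e["access_count"] >= min_accesses):
            e["layer"] = "long_term"
        elif e["layer"] == "long_term" and score < consolidate_threshold:
            e["layer"] = "working"  # back for re-evaluation or fusion
        survivors.append(e)
    return survivors
```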
Implement retrieval with decay-aware ranking. When the agent needs context, score candidate memories by semantic similarity, then re-rank by blending similarity with the retention score (a 0.6/0.4 weighted sum in the code that follows). This down-ranks stale memories even if they are semantically close:
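```python
import time


def retrieve(query_embedding, memories, top_k=10):
    now = time.time()
    scored = []
    for m in memories:
        sim = cosine_similarity(query_embedding, m["embedding"])
        retention = compute_retention(m, now, query_embedding)
        # Weighted blend: similarity dominates, retention penalizes stale entries.
        scored.append((m, sim * 0.6 + retention * 0.4))
    scored.sort(key=lambda x: x[1], reverse=True)
    hits = [m for m, _ in scored[:top_k]]
    # Refresh access metadata only for the entries actually returned;
    # refreshing every candidate would reset decay for the whole store.
    for m in hits:
        m["last_accessed_at"] = now
        m["access_count"] += 1
    return hits
```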
Wire into the agent loop. Insert memory operations at three points: (a) after each user turn, encode and store new entries in working memory; (b) before each LLM call, retrieve top-k memories and inject as context; (c) after every N turns, run the decay sweep + consolidation cycle.
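The three insertion points can be sketched as a single turn handler. `MemoryAgent` is a hypothetical wrapper; it assumes a store exposing `add`, `retrieve`, and `decay_sweep`, an `llm` callable that maps a prompt string to a reply, and an `embed` callable that maps text to a vector.

```python
class MemoryAgent:
    """Hypothetical wrapper showing where memory calls sit in one turn."""

    def __init__(self, store, llm, embed, sweep_every=5, top_k=10):
        self.store = store
        self.llm = llm
        self.embed = embed
        self.sweep_every = sweep_every
        self.top_k = top_k
        self.turns = 0

    def handle_turn(self, user_message):
        query_embedding = self.embed(user_message)

        # (b) Before the LLM call: retrieve top-k memories and inject them as context.
        memories = self.store.retrieve(query_embedding, top_k=self.top_k)
        context = "\n".join(m["content"] for m in memories)
        reply = self.llm("Relevant memories:\n%s\n\nUser: %s" % (context, user_message))

        # (a) After the user turn: store the new entry in working memory.
        self.store.add(user_message, query_embedding)
        self.turns += 1

        # (c) Every N turns: run the decay sweep and consolidation cycle.
        if self.turns % self.sweep_every == 0:
            self.store.decay_sweep(query_embedding)
        return reply
```

Retrieval runs before the new message is stored so that the message does not come back as its own top hit; this ordering is a design choice of the sketch.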
Tune thresholds empirically. Start with forget_threshold=0.15, consolidate_threshold=0.35, promotion_threshold=0.7, base lambda=0.1. Monitor memory store size and retrieval quality. If the agent forgets too aggressively, lower lambda or lower the forget and consolidate thresholds, so that fewer entries fall below them. If memory bloats, do the opposite.
Example 1: Multi-session customer support agent
User: "Build a memory system for my support chatbot that remembers customer preferences across sessions but doesn't bloat over time."
Approach:
- Store memories in an embedded vector store (sqlite-vec or similar).
- Ingest each entry with base_relevance scored by the LLM (0.3 for small talk, 0.7 for product preferences, 0.9 for active issues).
- Over time, a customer's small-talk entry with access_count=1 decays to ~0.08 and gets forgotten. Their product preference accessed 12 times stays at ~0.82 and remains.

Output structure:
```python
# memory_store.py
import sqlite3


class FadeMemStore:
    def __init__(self, db_path, forget_threshold=0.15, consolidate_threshold=0.35):
        self.db = sqlite3.connect(db_path)
        self.forget_threshold = forget_threshold
        self.consolidate_threshold = consolidate_threshold
        self._init_tables()

    def _init_tables(self):
        """Create the memory tables if they do not exist yet."""

    def add(self, content, embedding, relevance=0.5):
        """Add new entry to working memory."""

    def retrieve(self, query_embedding, top_k=10):
        """Semantic search with decay-aware re-ranking."""

    def decay_sweep(self, current_query_embedding=None):
        """Recompute retention scores, consolidate or forget entries."""

    def resolve_conflicts(self, entries):
        """LLM-guided resolution for contradictory memory pairs."""

    def fuse(self, cluster):
        """Summarize a cluster of related entries into one."""
```
Example 2: Research assistant agent with long-running context
User: "My research agent accumulates thousands of paper summaries and notes. Help me add FadeMem-style decay so it keeps the most relevant ones."
Approach:
- Extend the existing note store with decay metadata (last_accessed_at, access_count, retention_score).
- Initialize base_relevance using the LLM to score each note's relevance to the agent's declared research topics (passed as a topic vector).

Output: A wrapper module that wraps the existing store:
```python
# fademem_wrapper.py
class FadeMemWrapper:
    def __init__(self, base_store, llm_client, embedding_fn):
        self.store = base_store
        self.llm = llm_client
        self.embed = embedding_fn

    def ingest(self, text, topic_relevance=None): ...
    def query(self, question, top_k=15): ...
    def maintenance_cycle(self): ...  # decay + fuse + conflict resolve
```
Example 3: Adding forgetting to a LangChain or LlamaIndex agent
User: "I'm using LangChain's ConversationBufferMemory but it grows too large. Add FadeMem-style forgetting."
Approach:
- Subclass ConversationBufferMemory or wrap it with a FadeMemMemory class.
- Override save_context to assign decay metadata to each new memory entry.
- Override load_memory_variables to run a lightweight decay check (skip the full sweep, just filter by precomputed retention_score), returning only entries above forget_threshold.
- Add a maintenance() method, called every N turns, that runs the full sweep with consolidation.

```python
from langchain.memory import ConversationBufferMemory


class FadeMemMemory(ConversationBufferMemory):
    # LangChain memory classes are Pydantic models, which reject undeclared
    # attributes assigned in __init__, so extra state is declared as fields.
    # Pydantic copies mutable defaults per instance.
    forget_threshold: float = 0.15
    decay_metadata: dict = {}
    turn_count: int = 0

    def save_context(self, inputs, outputs):
        super().save_context(inputs, outputs)
        # Attach decay metadata to newest entry
        ...

    def load_memory_variables(self, inputs):
        # Filter by retention_score before returning
        ...
```
- Score base_relevance at ingestion time using the LLM or a classifier. A well-calibrated initial score is the single biggest lever on retention quality. Factual user preferences should score 0.7-0.9; small talk should score 0.1-0.3.
- Update last_accessed_at and increment access_count on retrieval hits. This is the "use it or lose it" signal that keeps important memories alive.
- Do not set forget_threshold too aggressively at first. Start conservative (0.10-0.15) and tighten once you confirm the agent isn't losing important information.
- If important memories are being forgotten, check whether base_relevance was scored too low at ingestion, or whether the lambda base rate is too high. Add a pinned flag for critical memories (e.g., user's name, core preferences) that exempts them from decay.
- Call maintenance_cycle() in the agent loop. Add a turn counter that triggers sweeps automatically.
- ... base_relevance for the first N entries.
- Tune lambda, forget_threshold, and consolidate_threshold per use case.

Paper: FadeMem: Biologically-Inspired Forgetting for Efficient Agent Memory, Wei et al., 2026. Focus on Section 3 (the adaptive decay formulation and dual-layer architecture) and Section 5 (ablation studies showing the contribution of each decay modulator).