skills/memory-systems/SKILL.md
This skill should be used for persistent semantic memory in agent systems: cross-session knowledge retention, entity tracking, temporal validity, graph or vector retrieval, memory consolidation, and memory benchmark selection. Route file-backed scratchpads to filesystem-context, handoff summaries to context-compression, and token-efficiency tactics to context-optimization.
npx skillsauth add guanyang/antigravity-skills memory-systemsInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Memory provides the persistence layer that allows agents to maintain continuity across sessions and reason over accumulated knowledge. Simple agents rely entirely on context for memory, losing all state when sessions end. Sophisticated agents implement layered memory architectures that balance immediate context needs with long-term knowledge retention. The evolution from vector stores to knowledge graphs to temporal knowledge graphs represents increasing investment in structured memory for improved retrieval and reasoning.
Activate this skill when:
Do not activate this skill for adjacent work owned by other skills:
filesystem-context.context-compression.context-optimization.bdi-mental-states.Think of memory as a spectrum from volatile context window to persistent storage. Default to the simplest layer that meets retrieval needs, because benchmark evidence suggests tool complexity matters less than reliable retrieval for some memory workloads (claim-memory-locomo-filesystem-baseline). Add structure (graphs, temporal validity) only when retrieval quality degrades or the agent needs multi-hop reasoning, relationship traversal, or time-travel queries.
Select a framework based on the dominant retrieval pattern the agent requires. Use this table to narrow the shortlist, then validate with the benchmark data below.
| Framework | Architecture | Best For | Trade-off | |-----------|-------------|----------|-----------| | Mem0 | Vector store + graph memory, pluggable backends | Multi-tenant systems, broad integrations | Less specialized for multi-agent | | Zep/Graphiti | Temporal knowledge graph, bi-temporal model | Enterprise requiring relationship modeling + temporal reasoning | Advanced features cloud-locked | | Letta | Self-editing memory with tiered storage (in-context/core/archival) | Full agent introspection, stateful services | Complexity for simple use cases | | Cognee | Multi-layer semantic graph via customizable ECL pipeline with customizable Tasks | Evolving agent memory that adapts and learns; multi-hop reasoning | Heavier ingest-time processing | | LangMem | Memory tools for LangGraph workflows | Teams already on LangGraph | Tightly coupled to LangGraph | | File-system | Plain files with naming conventions | Simple agents, prototyping | No semantic search, no relationships |
Choose Zep/Graphiti when the agent needs bi-temporal modeling (tracking both when events occurred and when they were ingested) because its three-tier knowledge graph (episode, semantic entity, community subgraphs) excels at temporal queries. Choose Mem0 when the priority is fast time-to-production with managed infrastructure. Choose Letta when the agent needs deep self-introspection through its Agent Development Environment. Choose Cognee when the agent must build dense multi-layer semantic graphs — it layers text chunks and entity types as nodes with detailed relationship edges, and every core piece (ingestion, entity extraction, post-processing, retrieval) is customizable.
Benchmark Performance Comparison
Consult these benchmarks to set expectations, but treat them as source-specific signals for retrieval dimensions rather than absolute rankings. No single benchmark is definitive.
| System | DMR Accuracy | LoCoMo | HotPotQA (multi-hop) | Latency | |--------|-------------|--------|---------------------|---------| | Cognee | — | — | Published high score | Variable | | Zep (Temporal KG) | Published high score | — | Mid-range across metrics | Low-latency reported | | Letta (filesystem) | — | Published filesystem baseline | — | — | | Mem0 | — | Published specialized-tool baseline | Lower in one comparison | — | | MemGPT | Published high score | — | — | Variable | | GraphRAG | Published mid/high range | — | — | Variable | | Vector RAG baseline | Published lower range | — | — | Fast |
Key takeaway: compare memory systems by retrieval shape, not brand. Use benchmark numbers as dated evidence that must be rechecked before making product claims; the stable design rule is to start shallow, measure retrieval quality, then add semantic or graph structure only when a simpler layer fails.
Pick the shallowest memory layer that satisfies the persistence requirement. Each deeper layer adds infrastructure cost and operational complexity, so only escalate when the shallower layer cannot meet the retrieval or durability need.
| Layer | Persistence | Implementation | When to Use | |-------|------------|----------------|-------------| | Working | Context window only | Scratchpad in system prompt | Always — optimize with attention-favored positions | | Short-term | Session-scoped | File-system, in-memory cache | Intermediate tool results, conversation state | | Long-term | Cross-session | Key-value store → graph DB | User preferences, domain knowledge, entity registries | | Entity | Cross-session | Entity registry + properties | Maintaining identity ("John Doe" = same person across conversations) | | Temporal KG | Cross-session + history | Graph with validity intervals | Facts that change over time, time-travel queries, preventing context clash |
Match the retrieval strategy to the query shape. Semantic search handles direct factual lookups well but degrades on multi-hop reasoning; entity-based traversal handles "everything about X" queries but requires graph structure; temporal filtering handles changing facts but requires validity metadata. When accuracy is paramount and infrastructure budget allows, combine strategies into hybrid retrieval.
| Strategy | Use When | Limitation | |----------|----------|------------| | Semantic (embedding similarity) | Direct factual queries | Degrades on multi-hop reasoning | | Entity-based (graph traversal) | "Tell me everything about X" | Requires graph structure | | Temporal (validity filter) | Facts change over time | Requires validity metadata | | Hybrid (semantic + keyword + graph) | Best overall accuracy | Most infrastructure |
Hybrid approaches reduce active context by retrieving only relevant subgraphs or memories. Cognee implements hybrid retrieval through multiple search modes across graph, vector, and relational stores, letting agents select the retrieval strategy that fits the query type rather than using a one-size-fits-all approach.
Run consolidation periodically to prevent unbounded growth, because unchecked memory accumulation degrades retrieval quality over time. Invalidate but do not discard — preserving history matters for temporal queries that need to reconstruct past states. Trigger consolidation on memory count thresholds, degraded retrieval quality, or scheduled intervals. See Implementation Reference for working consolidation code.
Start with the simplest viable layer and add complexity only when retrieval quality degrades. Most agents do not need a temporal knowledge graph on day one. Follow this escalation path:
Load memories just-in-time rather than preloading everything, because large context payloads are expensive and degrade attention quality. Place retrieved memories in attention-favored positions (beginning or end of context) to maximize their influence on generation.
Handle retrieval failures gracefully because memory systems are inherently noisy. Apply these recovery strategies in order:
valid_until timestamps. If most results are expired, trigger consolidation before retrying.valid_from. Surface the conflict to the user if confidence is low.Example 1: Mem0 Integration
from mem0 import Memory
m = Memory()
m.add("User prefers dark mode and Python 3.12", user_id="alice")
m.add("User switched to light mode", user_id="alice")
# Retrieves current preference (light mode), not outdated one
results = m.search("What theme does the user prefer?", user_id="alice")
Example 2: Temporal Query
# Track entity with validity periods
graph.create_temporal_relationship(
source_id=user_node,
rel_type="LIVES_AT",
target_id=address_node,
valid_from=datetime(2024, 1, 15),
valid_until=datetime(2024, 9, 1), # moved out
)
# Query: Where did user live on March 1, 2024?
results = graph.query_at_time(
{"type": "LIVES_AT", "source_label": "User"},
query_time=datetime(2024, 3, 1)
)
Example 3: Cognee Memory Ingestion and Search
import cognee
from cognee.modules.search.types import SearchType
# Ingest and build knowledge graph
await cognee.add("./docs/")
await cognee.add("any data")
await cognee.cognify()
# Enrich memory
await cognee.memify()
# Agent retrieves relationship-aware context
results = await cognee.search(
query_text="Any query for your memory",
query_type=SearchType.GRAPH_COMPLETION,
)
This skill owns persistent semantic memory. Adjacent skills own scratch storage, compaction, and context tactics:
filesystem-context: file-backed scratchpads, logs, and simple run state before semantic retrieval is needed.context-compression: summaries and handoffs that preserve session state in prose.context-optimization: just-in-time memory loading and retrieval scoping inside active context budgets.context-degradation: stale or conflicting memories as context poisoning or clash.bdi-mental-states: formal mental-state modeling when beliefs, desires, intentions, and provenance chains matter.multi-agent-patterns: shared memory across agents.evaluation: memory quality, retrieval correctness, and benchmark selection.Internal references:
Related skills in this collection:
External resources:
Created: 2025-12-20 Last Updated: 2026-05-15 Author: Agent Skills for Context Engineering Contributors Version: 4.1.0
tools
Implements Manus-style file-based planning to organize and track progress on complex tasks. Creates task_plan.md, findings.md, and progress.md. Use when asked to plan out, break down, or organize a multi-step project, research task, or any work requiring 5+ tool calls. Supports automatic session recovery after /clear.
development
AI image generation with OpenAI GPT Image 2, Azure OpenAI, Google, OpenRouter, DashScope, Z.AI GLM-Image, MiniMax, Jimeng, Seedream, Replicate and Agnes APIs. Supports text-to-image, reference images, aspect ratios, and batch generation from saved prompt files. Sequential by default; use batch parallel generation when the user already has multiple prompts or wants stable multi-image throughput. Use when user asks to generate, create, or draw images.
development
Guidance for distinctive, intentional visual design when building new UI or reshaping an existing one. Helps with aesthetic direction, typography, and making choices that don't read as templated defaults.
tools
Reference for the Claude API / Anthropic SDK — model ids, pricing, params, streaming, tool use, MCP, agents, caching, token counting, model migration. TRIGGER — read BEFORE opening the target file; don't skip because it "looks like a one-liner" — whenever: the prompt names Claude/Anthropic in any form (Claude, Anthropic, Fable, Opus, Sonnet, Haiku, `anthropic`, `@anthropic-ai`, `claude-*`, `us.anthropic.*`, `[1m]`); the user asks about an LLM (pricing/model choice/limits/caching) — never answer from memory; OR the task is LLM-shaped with provider unstated (agent/MCP/tool-definition/multi-agent/RAG/LLM-judge/computer-use; generate/summarize/extract/classify/rewrite/converse over NL; debugging refusals/cutoffs/streaming/tool-calls/tokens). SKIP only when another provider is being worked on (overrides all triggers): OpenAI/GPT/Gemini/Llama/Mistral/Cohere/Ollama named in the query; OR `grep -rE 'openai|langchain_openai|google.generativeai|genai|mistralai|cohere|ollama'` over the project hits (run this grep FIRST if no provider named — don't Read the file).