skills/agent-memory-systems/SKILL.md
Use when designing memory architecture for AI agents or chatbots. Use when choosing between conversation buffer, summary, entity, knowledge graph, or vector store memory types. Use when implementing RAG chunking strategies or retrieval pipelines. Use when debugging agent memory failures — forgetting context, inconsistent answers, or retrieving wrong information. Use when planning memory lifecycle (TTL, consolidation, contradiction handling). NEVER for Claude Code's own MEMORY.md file management — that's a separate system.
Think like a cognitive architect, not a database engineer. Memory failures look like intelligence failures — when an agent "forgets" or gives inconsistent answers, it's almost always a retrieval problem, not a storage problem.
Before designing any memory system, ask:
These four questions determine your entire architecture. Skip them and you'll build the wrong system.
```
What does the agent need to remember?
│
├─ Recent conversation context (< 20 turns)
│  └─ Conversation Buffer
│     Cost: Linear token growth. Cap at ~10K tokens.
│     Trap: Devs use this as default because it's easy,
│           then wonder why costs explode at scale.
│
├─ Conversation gist across long sessions
│  └─ Conversation Summary Memory
│     Cost: Fixed token budget, but LOSSY.
│     Trap: Summaries silently drop numbers, names, dates.
│     Rule: If precision matters, don't summarize; extract.
│
├─ Facts about specific entities (users, products, topics)
│  └─ Entity Memory
│     Cost: Scales with entity count, not conversation length.
│     Trap: Entity extraction is fragile. "my wife" and "Sarah"
│           must resolve to the same entity, which requires
│           coreference resolution.
│
├─ Relationships between entities
│  └─ Knowledge Graph Memory
│     Cost: Most complex to build and query.
│     When: Relationships matter MORE than raw facts.
│     Example: "Who reported to whom during Q3?" requires a graph.
│     Trap: Graph queries are hard to write correctly. Test early.
│
└─ Large knowledge base (docs, FAQs, manuals)
   └─ Vector Store (RAG)
      Cost: Scales well for storage, but retrieval quality varies wildly.
      When: More than ~50 pages of reference material.
      Trap: Chunking determines 80% of retrieval quality.
            Bad chunks = bad answers, regardless of model quality.
```
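A capped buffer keeps the "< 20 turns" tier from turning into unbounded token growth. The following is a minimal sketch. It assumes whitespace word counts stand in for real tokenizer counts. The `ConversationBuffer` class, the `approx_tokens` helper and the 10K default are illustrative names, not a library API.

```python
from collections import deque


def approx_tokens(text: str) -> int:
    """Crude token estimate (assumption): whitespace-separated words.
    A production system would count with the model's own tokenizer."""
    return len(text.split())


class ConversationBuffer:
    """Hypothetical buffer that keeps recent turns under a token cap."""

    def __init__(self, max_tokens: int = 10_000):
        self.max_tokens = max_tokens
        self.turns = deque()  # (role, text) pairs, oldest on the left
        self.total = 0

    def add(self, role: str, text: str) -> None:
        self.turns.append((role, text))
        self.total += approx_tokens(text)
        # Drop the oldest turns until back under budget, but never the newest turn.
        while self.total > self.max_tokens and len(self.turns) > 1:
            _, dropped = self.turns.popleft()
            self.total -= approx_tokens(dropped)

    def render(self) -> str:
        return "\n".join(f"{role}: {text}" for role, text in self.turns)
```

Evicted turns are simply lost here. Pairing the buffer with summary or entity memory is what keeps the dropped facts recoverable.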
Hybrid is the norm. Production agents typically combine 2-3 types: conversation buffer for immediate context + entity memory for user facts + vector store for knowledge base.
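The following is a minimal sketch of how the three layers might feed one prompt context. `EntityMemory`, `build_context` and the `retrieve` callable are hypothetical names. The `aliases` table is an assumption: it stands in for a coreference step (not shown) that has already mapped "my wife" to "sarah".

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class EntityMemory:
    """Facts keyed by a canonical entity name. `aliases` maps surface mentions
    to that name; filling it is the job of a coreference step not shown here."""
    aliases: dict[str, str] = field(default_factory=dict)
    facts: dict[str, list[str]] = field(default_factory=dict)

    def resolve(self, mention: str) -> str:
        key = mention.strip().lower()
        return self.aliases.get(key, key)

    def add_fact(self, mention: str, fact: str) -> None:
        self.facts.setdefault(self.resolve(mention), []).append(fact)


def build_context(
    recent_turns: list[str],
    entities: EntityMemory,
    mentions: list[str],
    retrieve: Callable[[str], list[str]],
    query: str,
) -> str:
    """Assemble one prompt context from buffer + entity facts + vector retrieval."""
    sections = ["## Recent conversation", *recent_turns]
    # Resolve first and de-duplicate, so "Sarah" and "my wife" yield facts once.
    canonical = dict.fromkeys(entities.resolve(m) for m in mentions)
    facts = [f for name in canonical for f in entities.facts.get(name, [])]
    if facts:
        sections += ["## Known facts", *facts]
    docs = retrieve(query)  # stand-in for a vector-store search
    if docs:
        sections += ["## Reference material", *docs]
    return "\n".join(sections)
```

Each layer answers a different question. The buffer covers what was just said, the entity store covers what is known about someone, and retrieval covers what the docs say. A failure can therefore be traced to one layer.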
Chunking is the single highest-leverage decision in any RAG system. Good versus bad chunking can account for a 40-60% difference in retrieval accuracy.
| Strategy | When to Use | When It Fails |
|----------|-------------|---------------|
| Fixed-size (512 tokens) | Uniform content (API docs, glossaries) | Breaks mid-thought on narrative content |
| Recursive splitting | Code, structured documents | Still syntactic; misses semantic boundaries |
| Semantic chunking | Long-form content, articles, manuals | 3x slower to index; embedding quality dependent |
| Contextual chunking | High-stakes retrieval where accuracy matters | Increases storage 20-30% (each chunk gets context prepended) |
| Agentic chunking | High-value, low-volume content (contracts, policies) | Expensive: LLM call per boundary decision |
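Recursive splitting works by trying the coarsest separator first (paragraphs). It only falls back to finer ones (lines, sentences, words) for pieces that are still too large. The following is a minimal sketch under two assumptions: character counts stand in for token counts, and the separator list is illustrative. `recursive_split` is a hypothetical helper, not a specific library's splitter.

```python
def recursive_split(
    text: str,
    max_chars: int,
    separators: tuple[str, ...] = ("\n\n", "\n", ". ", " "),
) -> list[str]:
    """Split on the coarsest separator first; recurse to finer ones only for oversized pieces."""
    if len(text) <= max_chars:
        return [text] if text.strip() else []
    if not separators:
        # No separator left: fall back to a hard fixed-size cut.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    sep, rest = separators[0], separators[1:]
    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = f"{current}{sep}{piece}" if current else piece
        if len(candidate) <= max_chars:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current:
            chunks.append(current)
        if len(piece) <= max_chars:
            current = piece
        else:
            # This piece alone is too big: split it with the next, finer separator.
            chunks.extend(recursive_split(piece, max_chars, rest))
            current = ""
    if current:
        chunks.append(current)
    return chunks
```

As the table notes, the boundaries are still purely syntactic. A paragraph break is treated as a good cut even when the next paragraph continues the same thought.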
Prepend a 1-2 sentence context summary to each chunk before embedding:
Without context: "The fee is 2.5% per transaction."
With context: "From the Stripe pricing page, merchant processing section: The fee is 2.5% per transaction."
This costs 20-30% more storage but dramatically improves retrieval because the embedding captures WHERE the information lives, not just WHAT it says. Always use this for production systems.
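The following is a minimal sketch of contextual chunking that reproduces the Stripe example above. It assumes the context header is built from a fixed template over document metadata. An LLM-written 1-2 sentence summary could replace `header` instead. `ContextualChunk` and `contextualize` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ContextualChunk:
    raw: str         # original chunk text, used when quoting back to the user
    embed_text: str  # context header + chunk: the string that actually gets embedded


def contextualize(chunks: list[str], doc_title: str, section: str) -> list[ContextualChunk]:
    """Prefix every chunk with where it came from before embedding.
    Assumption: a fixed template from metadata rather than an LLM summary."""
    header = f"From the {doc_title}, {section}: "
    return [ContextualChunk(raw=c, embed_text=header + c) for c in chunks]
```

Keeping `raw` separate means the prefix shapes the embedding without leaking boilerplate into quoted answers.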
Overlap chunks by 10-20% of chunk size. Without overlap, answers that span chunk boundaries become invisible to retrieval. This is the #1 cause of "the answer is in the docs but the agent can't find it."
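Fixed-size windows with overlap can be sketched in a few lines. This assumes the input is already tokenized (whitespace words work for illustration). The 0.15 default is an arbitrary pick inside the 10-20% range, and `chunk_with_overlap` is a hypothetical name.

```python
def chunk_with_overlap(tokens: list[str], size: int, overlap_ratio: float = 0.15) -> list[list[str]]:
    """Fixed-size windows where each chunk repeats the last `overlap` tokens of the
    previous one, so text near a boundary lands in both neighbouring chunks."""
    if size < 1 or not 0 <= overlap_ratio < 1:
        raise ValueError("size must be >= 1 and overlap_ratio in [0, 1)")
    overlap = int(size * overlap_ratio)  # e.g. 15% of 512 -> 76 tokens
    step = size - overlap                # always >= 1 because overlap_ratio < 1
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):  # this window already reaches the end
            break
    return chunks
```

A sentence shorter than the overlap that straddles a boundary always appears intact in at least one chunk. Longer spans still need a larger overlap or semantic boundaries.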
When an agent gives wrong or incomplete answers despite having the right information stored, diagnose with this hierarchy:
Wrong chunks retrieved — Query doesn't semantically match stored content. Fix: hybrid search (vector + keyword). Hybrid search improves retrieval by 30-50% over vector-only.
Right chunk, split answer — The answer spans two chunks. Fix: increase chunk overlap to 20%. Check: do retrieved chunks end mid-sentence?
Semantic drift — Query wording diverges from stored wording ("cancel subscription" vs "terminate account"). Fix: query expansion — generate 3-5 rephrasings, retrieve for each, merge results.
Recency bias — Recent memories dominate older but more relevant ones. Fix: temporal decay function. Recent memories get a boost but it decays over days/weeks.
Embedding model mismatch — Changed embedding models without re-indexing. ALL existing vectors are now in a different semantic space. There is no fix except full re-indexing.
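Two of these fixes reduce to small scoring functions, sketched below under stated assumptions.

- **Merging.** Hybrid search and query expansion both end with merging several ranked lists. Reciprocal rank fusion (RRF) is one common way to do that; k = 60 is a widely used constant.
- **Recency bonus.** The recency fix is modelled as an additive bonus that halves every `half_life_days`. The 14-day half-life and 0.1 weight are placeholders to tune, not recommended values.

Generating the 3-5 rephrasings (typically an LLM call) is not shown.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of document IDs (vector hits, keyword hits, or the hits
    for each query rephrasing) by summing 1 / (k + rank) across lists."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


def recency_adjusted(relevance: float, age_days: float,
                     half_life_days: float = 14.0, recency_weight: float = 0.1) -> float:
    """Relevance plus a recency bonus that halves every `half_life_days`,
    so new memories get a nudge without burying older, more relevant ones."""
    return relevance + recency_weight * 0.5 ** (age_days / half_life_days)
```

RRF uses only ranks, so it sidesteps the problem that vector similarities and keyword scores live on different scales. Keeping the recency bonus small and bounded is what stops it from overriding relevance.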
Not all memories should live forever. Implement TTL (time-to-live) tiers:
| Memory Type | TTL | Why |
|-------------|-----|-----|
| Session context | End of session | Noise if kept: corrections, tangents, filler |
| Task-specific facts | 24-72 hours | Relevant only to the active task |
| User preferences | 30-90 days | Preferences change; stale ones cause friction |
| Domain knowledge | Permanent (with version) | Core knowledge base; version to handle updates |
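The following is a minimal sketch of the tiers above. It takes the upper end of each TTL range and expires session context by session ID rather than by clock. `MemoryType`, `Memory`, `is_expired` and `sweep` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class MemoryType(Enum):
    SESSION = "session"
    TASK = "task"
    PREFERENCE = "preference"
    DOMAIN = "domain"


# Upper ends of the table's ranges; None means "no time-based expiry".
# SESSION is absent on purpose: it expires when the session ends, not on a clock.
TTL: dict[MemoryType, Optional[timedelta]] = {
    MemoryType.TASK: timedelta(hours=72),
    MemoryType.PREFERENCE: timedelta(days=90),
    MemoryType.DOMAIN: None,
}


@dataclass
class Memory:
    text: str
    kind: MemoryType
    created_at: datetime
    session_id: str
    version: int = 1  # bumped when a DOMAIN entry is updated rather than expired


def is_expired(m: Memory, now: datetime, active_session: str) -> bool:
    if m.kind is MemoryType.SESSION:
        return m.session_id != active_session
    ttl = TTL[m.kind]
    return ttl is not None and now - m.created_at > ttl


def sweep(memories: list[Memory], now: datetime, active_session: str) -> list[Memory]:
    """Keep only memories that are still live."""
    return [m for m in memories if not is_expired(m, now, active_session)]
```

Running `sweep` on a schedule, or filtering at read time, keeps stale preferences and finished-task facts out of retrieval without any manual cleanup.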
When new information contradicts stored memory:
Periodically (daily or weekly), consolidate granular memories into higher-level summaries:
| Rationalization | When It Appears | Why It's Wrong |
|----------------|-----------------|----------------|
| "Just use a vector store for everything" | Starting a new agent project | Vector stores are terrible for structured facts, entity relationships, and session context. Match memory type to data type. |
| "We'll optimize chunking later" | MVP/prototype phase | Chunking determines 80% of retrieval quality. Bad chunks in production poison every answer. Get it right early. |
| "Bigger chunks = more context" | Choosing chunk size | Bigger chunks reduce retrieval precision. The embedding represents the average of the chunk; large chunks average out to generic vectors. |
| "The embedding model doesn't matter that much" | Selecting infrastructure | Domain-specific embeddings outperform general-purpose models by 20-40% on domain queries. This compounds across every retrieval. |
| "We can switch embedding models later" | Architecture decisions | Switching requires full re-indexing of every stored vector. On large datasets this is days of compute. Choose carefully upfront. |
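One way to make the embedding-model trap fail loudly instead of silently is to store the model ID alongside the index. The index then refuses vectors or queries from any other model. The following is a toy, in-memory sketch. `VectorIndex`, `reindex`, the `embed` callable and the IDs "model-a"/"model-b" are all hypothetical placeholders, not a real vector-store API.

```python
import math
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class VectorIndex:
    """Toy in-memory index that records which embedding model produced its vectors."""
    embedding_model: str
    vectors: dict[str, list[float]] = field(default_factory=dict)

    def _check(self, model: str) -> None:
        # Vectors from different models live in different spaces; refuse to mix them.
        if model != self.embedding_model:
            raise ValueError(
                f"index holds {self.embedding_model!r} vectors, got {model!r}; re-index first"
            )

    def add(self, doc_id: str, vector: list[float], model: str) -> None:
        self._check(model)
        self.vectors[doc_id] = vector

    def search(self, query: list[float], model: str, top_k: int = 5) -> list[str]:
        self._check(model)

        def cosine(a: list[float], b: list[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        ranked = sorted(self.vectors, key=lambda d: cosine(query, self.vectors[d]), reverse=True)
        return ranked[:top_k]


def reindex(texts: dict[str, str], new_model: str,
            embed: Callable[[str], list[float]]) -> VectorIndex:
    """The only safe migration: re-embed every stored text with the new model."""
    fresh = VectorIndex(new_model)
    for doc_id, text in texts.items():
        fresh.add(doc_id, embed(text), new_model)
    return fresh
```

The guard does not make switching models cheaper. It only guarantees that a half-migrated index raises an error rather than quietly returning answers from the wrong semantic space.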
When building a new agent memory system: