curiositech/always-on-agent-architecture

Name: always-on-agent-architecture
Author: curiositech

skills/always-on-agent-architecture/SKILL.md

npx skillsauth add curiositech/windags-skills always-on-agent-architecture


/always-on-agent-architecture — Building Agents That Never Forget

You are designing the architecture for an always-on AI agent with episodic memory. This is not a chatbot with a long context window. This is a system that persists state across sessions, manages its own memory hierarchy, runs as a service, and maintains identity over weeks and months. The core insight: treat the LLM as a CPU that operates on managed memory, not as a stateless function.

Decision Points

Memory Framework Selection Tree

Q1: Do you want a full agent runtime (server, APIs, tools)?
├─ Yes → Use Letta (most complete, production-ready)
└─ No, I have my own agent loop
   ├─ Q2: Do you need temporal/relationship tracking?
   │  ├─ Yes → Use Zep/Graphiti (best temporal knowledge graph)
   │  └─ No → Go to Q3
   │     ├─ Q3: Do you need graph + vector hybrid?
   │     │  ├─ Yes → Use Mem0 (graph mode)
   │     │  └─ No → Go to Q4
   │     │     ├─ Q4: Already on LangGraph?
   │     │     │  ├─ Yes → Use LangMem
   │     │     │  └─ No → Use pgvector or Chroma
   └─ Want zero dependencies? → Custom SQLite + local embeddings
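The tree above can be read as an ordered series of checks. The following is a minimal sketch of that reading, not part of the skill itself: the `Needs` dataclass, its field names, and `pick_memory_framework` are hypothetical, and the position of the zero-dependency check (tested right after Q1) is an assumption, since the tree does not say how it ranks against Q2 through Q4.

```python
from dataclasses import dataclass


@dataclass
class Needs:
    """Hypothetical answers to the questions in the selection tree."""
    full_runtime: bool = False          # Q1
    temporal_tracking: bool = False     # Q2
    graph_vector_hybrid: bool = False   # Q3
    on_langgraph: bool = False          # Q4
    zero_dependencies: bool = False     # side branch of the tree


def pick_memory_framework(needs: Needs) -> str:
    if needs.full_runtime:
        return "Letta"
    # Assumption: the zero-dependency branch is checked before Q2-Q4.
    if needs.zero_dependencies:
        return "Custom SQLite + local embeddings"
    if needs.temporal_tracking:
        return "Zep/Graphiti"
    if needs.graph_vector_hybrid:
        return "Mem0 (graph mode)"
    if needs.on_langgraph:
        return "LangMem"
    return "pgvector or Chroma"


if __name__ == "__main__":
    print(pick_memory_framework(Needs(temporal_tracking=True)))
```

Encoding the choice as a function makes the ordering explicit and lets the decision be recorded next to the answers that produced it.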

Core Memory Eviction Triggers

| Trigger | Threshold | Action |
|---------|-----------|--------|
| Size Overflow | Core memory > 4KB | Summarize least-recent block, move summary to archival |
| Age Decay | Data unused > 30 days | Mark for compaction review |
| Relevance Drop | Access score < 0.3 | Move to archival memory with decay tag |
| User Override | User says "forget X" | Immediate removal + archival tombstone |
| Conflict Detection | Contradictory facts stored | Prompt agent to reconcile or ask user |
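The three numeric triggers in the table (size, age, relevance) can be evaluated mechanically. The sketch below is one possible way to do that; the `Block` dataclass, the action labels, and the reading of 4KB as 4096 UTF-8 bytes are assumptions made for illustration. How the access score is computed is not specified here, so it is treated as a given number. The user-override and conflict triggers need language understanding and are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

CORE_LIMIT_BYTES = 4 * 1024        # assumption: "4KB" means 4096 UTF-8 bytes
MAX_IDLE = timedelta(days=30)      # Age Decay threshold
MIN_ACCESS_SCORE = 0.3             # Relevance Drop threshold


@dataclass
class Block:
    """Hypothetical core-memory block."""
    label: str
    text: str
    last_used: datetime
    access_score: float


def eviction_actions(blocks, now):
    """Return (block label, action) pairs for the numeric triggers."""
    actions = []
    total = sum(len(b.text.encode("utf-8")) for b in blocks)
    if blocks and total > CORE_LIMIT_BYTES:
        # Size Overflow: the least recently used block is summarized first.
        oldest = min(blocks, key=lambda b: b.last_used)
        actions.append((oldest.label, "summarize_and_archive"))
    for b in blocks:
        if now - b.last_used > MAX_IDLE:
            actions.append((b.label, "compaction_review"))
        if b.access_score < MIN_ACCESS_SCORE:
            actions.append((b.label, "archive_with_decay_tag"))
    return actions
```

A block can match more than one trigger; the caller decides which action wins.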

Vector DB Selection Criteria

If query_latency_requirement < 10ms AND data_size > 100M vectors:
    → Use Qdrant (optimized for speed)
Else if already_using_postgresql:
    → Use pgvector (single DB, simpler ops)
Else if need_hybrid_search (keyword + semantic):
    → Use Weaviate (best hybrid)
Else if zero_ops_preferred:
    → Use Pinecone (fully managed)
Else:
    → Use Chroma (local-first, simple API)

Memory Tier Routing Decision

Input: User message or agent observation
│
├─ Contains identity/preference update?
│  └─ Yes → Update core memory, persist immediately
├─ Requires conversation context?
│  └─ Yes → Search recall memory (conversation history)
├─ Needs factual knowledge?
│  └─ Yes → Search archival memory (vector store)
└─ External data needed?
   └─ Yes → Use external tools (APIs, files, etc.)
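A routing step like the one above might be sketched as follows. The `Signals` flags stand in for whatever classifier or prompt decides what a message contains; that part is not specified here, so the names are hypothetical. The sketch also assumes the branches are not mutually exclusive, so one input can touch several tiers.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Tier(Enum):
    CORE = "core"
    RECALL = "recall"
    ARCHIVAL = "archival"
    TOOLS = "tools"


@dataclass
class Signals:
    """Hypothetical classification of a user message or agent observation."""
    identity_or_preference_update: bool = False
    needs_conversation_context: bool = False
    needs_factual_knowledge: bool = False
    needs_external_data: bool = False


def route(signals: Signals) -> List[Tier]:
    """Return every tier the input should touch, in tree order."""
    targets = []
    if signals.identity_or_preference_update:
        targets.append(Tier.CORE)       # update core memory, persist immediately
    if signals.needs_conversation_context:
        targets.append(Tier.RECALL)     # search conversation history
    if signals.needs_factual_knowledge:
        targets.append(Tier.ARCHIVAL)   # search the vector store
    if signals.needs_external_data:
        targets.append(Tier.TOOLS)      # call APIs, read files, etc.
    return targets
```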

Failure Modes

Memory Corruption Cascade

Symptoms: Agent personality drift, contradictory responses, core memory conflicts
Root Cause: Concurrent writes to core memory without locking, or failed partial updates
Detection Rule: If core memory size suddenly drops >50% or contains malformed JSON/YAML
Recovery Procedure:

1. Stop agent immediately to prevent further corruption
2. Restore core memory from last known good backup (< 1hr old)
3. Replay conversation log since backup to reconstruct lost updates
4. Implement write locks on core memory updates before restart
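The write-lock step and the detection rule can be illustrated with a small sketch. It assumes core memory is a JSON file on local disk, which is only one possible layout; `write_core_memory` and `looks_corrupted` are hypothetical names. The write goes to a temporary file and is swapped in with `os.replace`, so a crash mid-write leaves the previous file intact.

```python
import json
import os
import tempfile
import threading

_core_lock = threading.Lock()  # serializes writers within this process only


def write_core_memory(path: str, memory: dict) -> None:
    """Write core memory atomically: temp file first, then rename over the target."""
    payload = json.dumps(memory, indent=2)
    with _core_lock:
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as tmp:
                tmp.write(payload)
                tmp.flush()
                os.fsync(tmp.fileno())
            os.replace(tmp_path, path)
        except BaseException:
            if os.path.exists(tmp_path):
                os.unlink(tmp_path)
            raise


def looks_corrupted(path: str, previous_size: int) -> bool:
    """Detection rule: unreadable or malformed JSON, or size dropped by more than 50%."""
    try:
        with open(path, encoding="utf-8") as f:
            raw = f.read()
        json.loads(raw)
    except (OSError, json.JSONDecodeError):
        return True
    return previous_size > 0 and len(raw.encode("utf-8")) < 0.5 * previous_size
```

A `threading.Lock` does not protect against a second process writing the same file. With multiple agent processes, a database transaction or an OS-level file lock would be needed instead.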

Vector Search Degradation

Symptoms: Increasingly irrelevant search results, agent can't find recently stored facts
Root Cause: Embedding model drift, index corruption, or no memory compaction
Detection Rule: If average cosine similarity of top-3 results < 0.7 for known queries
Recovery Procedure:

1. Run embedding consistency check on random sample of 100 vectors
2. If >10% show anomalous embeddings, rebuild entire index
3. Implement embedding model version pinning
4. Add embedding drift monitoring to prevent recurrence
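The detection rule above can be run as a periodic check against a fixed set of known queries. The sketch below uses plain Python lists as vectors and a brute-force scan, which is an assumption made to keep it self-contained; a real deployment would ask the vector store for its top-3 scores instead. Function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_top_k_similarity(query_vec, stored_vecs, k=3):
    """Average similarity of the k closest stored vectors to the query."""
    scores = sorted((cosine(query_vec, v) for v in stored_vecs), reverse=True)[:k]
    return sum(scores) / len(scores) if scores else 0.0


def search_degraded(known_query_vecs, stored_vecs, threshold=0.7):
    """True when the average top-3 similarity over known queries falls below the threshold."""
    if not known_query_vecs:
        return False
    per_query = [mean_top_k_similarity(q, stored_vecs) for q in known_query_vecs]
    return sum(per_query) / len(per_query) < threshold
```

Absolute cosine values depend on the embedding model, so the 0.7 threshold is best calibrated against a baseline measured while search is known to be healthy.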

Persistence Layer Deadlock

Symptoms: Agent hangs on memory operations, database connection timeouts
Root Cause: Simultaneous read/write to same memory blocks, insufficient connection pooling
Detection Rule: If memory operation takes >30s or database shows lock wait timeouts
Recovery Procedure:

1. Kill hanging connections to release locks
2. Implement exponential backoff retry logic for memory operations
3. Add connection pooling with max connection limits
4. Review transaction isolation levels for memory updates
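The retry step in the procedure above might look like the sketch below. `TransientMemoryError` is a hypothetical exception standing in for whatever lock-timeout or connection error the database driver raises. The delay values are illustrative defaults, not recommendations from this skill.

```python
import random
import time


class TransientMemoryError(Exception):
    """Hypothetical marker for retryable failures such as lock wait timeouts."""


def with_backoff(operation, max_attempts=5, base_delay=0.1, max_delay=5.0, sleep=time.sleep):
    """Call operation(), retrying transient failures with capped exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientMemoryError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads out retries so competing writers do not collide again.
            sleep(delay * random.uniform(0.5, 1.0))
```

Usage: wrap a single memory operation, for example `with_backoff(lambda: store.save(block))`, where `store.save` is whatever write call the application uses. The `sleep` parameter exists so tests can replace real waiting.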

Context Window Explosion

Symptoms: API costs spike, response latency increases, token limit errors
Root Cause: Core memory bloat, retrieving too many archival chunks per query
Detection Rule: If average tokens per request > 80% of model's context limit
Recovery Procedure:

1. Audit core memory size - compress or archive oversized blocks
2. Reduce archival search result count from default (10 → 5)
3. Implement token counting before LLM calls
4. Add cost monitoring alerts for >$1/conversation
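Token counting before the LLM call can be as simple as a budget check that drops the least relevant chunks. The sketch below uses a rough estimate of about four characters per token, which is an assumption and not an exact count; the model's own tokenizer should replace `estimate_tokens` in practice. Function names are hypothetical, and chunks are assumed to arrive ordered most relevant first.

```python
from typing import List


def estimate_tokens(text: str) -> int:
    """Crude heuristic (assumption): roughly 4 characters per token."""
    return max(1, len(text) // 4)


def fit_to_budget(
    core_memory: str,
    chunks: List[str],
    user_message: str,
    context_limit: int,
    max_fraction: float = 0.8,
) -> List[str]:
    """Keep as many leading chunks as fit within max_fraction of the context limit."""
    budget = int(context_limit * max_fraction)
    used = estimate_tokens(core_memory) + estimate_tokens(user_message)
    if used > budget:
        raise ValueError("core memory and message alone exceed the token budget")
    kept = []
    for chunk in chunks:
        cost = estimate_tokens(chunk)
        if used + cost > budget:
            break
        kept.append(chunk)
        used += cost
    return kept
```

Raising when core memory alone exceeds the budget turns silent bloat into a visible failure, which is the signal to run the audit step.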

Memory Leak - Unbounded Growth

Symptoms: Database size grows linearly, search performance degrades over time
Root Cause: No memory compaction, duplicate fact insertion, missing garbage collection
Detection Rule: If total memory size grows >100MB/month with normal usage
Recovery Procedure:

1. Run fact deduplication across archival memory (cosine similarity > 0.92)
2. Implement conversation summarization for recall memory >7 days old
3. Add relevance scoring with automatic pruning of low-score memories
4. Schedule weekly compaction jobs
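The deduplication step can be sketched as a greedy pass that keeps a fact only if it is not too similar to one already kept. This is an illustrative O(n²) version over in-memory `(text, vector)` pairs, an assumed representation. At scale the same idea would run as a nearest-neighbour query against the vector store for each candidate.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def deduplicate(facts, threshold=0.92):
    """Keep the first of any group of facts whose embeddings exceed the threshold.

    facts: list of (text, embedding) pairs, oldest first.
    """
    kept = []
    for text, vec in facts:
        if all(cosine(vec, kept_vec) <= threshold for _, kept_vec in kept):
            kept.append((text, vec))
    return kept
```

Keeping the earliest copy is a design choice made here for simplicity. Keeping the most recent or most accessed copy is equally plausible, depending on how facts are updated.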

Worked Examples

Example: Building a Personal Research Assistant

Scenario: Design architecture for an agent that helps with technical research, remembers your preferences, and builds knowledge over months.

Step 1 - Memory Tier Design

Core Memory (2KB):
- User name: "Sarah"
- Research domains: ["machine learning", "distributed systems"]
- Preferred paper sources: ["arxiv", "acm digital library"]
- Writing style: "detailed with code examples"
- Current project: "distributed training optimization"

Recall Memory:
- All conversations in PostgreSQL with full-text search
- 30-day retention window, then summarized

Archival Memory:
- Paper summaries, extracted insights, code snippets
- pgvector on PostgreSQL (already using it for recall)
- nomic-embed-text for local embedding (privacy + cost)

Step 2 - Framework Selection Decision

Following the decision tree:

- Need full agent runtime? No (building custom)
- Need temporal tracking? No (research facts are mostly timeless)
- Need graph+vector? No (simple semantic search sufficient)
- Already on LangGraph? No
- → Decision: pgvector + PostgreSQL

Step 3 - Agent Loop Implementation

async def research_step(user_query: str):
    # load_core_memory, search_archival, save_to_recall, and llm are
    # application-provided helpers that this example does not define.

    # Load core memory
    core = load_core_memory()  # User prefs, active project

    # Check if query relates to current project (simple keyword heuristic)
    if "optimization" in user_query.lower():
        # Search archival for project-specific knowledge
        relevant_papers = search_archival(core['current_project'])
        context = f"Current project context: {relevant_papers}"
    else:
        # Search for general domain knowledge
        context = search_archival(user_query)

    # Build prompt with core memory + retrieved context
    system_prompt = f"""
    You are Sarah's research assistant.
    User preferences: {core['preferences']}
    Current project: {core['current_project']}

    Retrieved context: {context}
    """

    response = await llm.chat([
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_query}
    ])

    # Persist interaction
    save_to_recall(user_query, response)

    return response

What a novice would miss:

- Storing raw papers instead of extracted insights in archival
- Not implementing conversation search (recall memory)
- Putting too much in core memory (research domains list with 50 entries)
- No memory compaction strategy

What an expert catches:

- Core memory stays focused on "working identity" not knowledge
- Archival memory gets curated facts, not raw documents
- Implements search before retrieval (agent decides what's relevant)
- Plans for memory growth from day one

Quality Gates

Deployment Validation Checklist

- [ ] Memory Latency SLO: Core memory loads in <100ms, archival search completes in <500ms
- [ ] Memory Consistency: Core memory survives agent restart without corruption (test with deliberate kill)
- [ ] Search Relevance: Top-3 archival results have cosine similarity >0.6 for known queries
- [ ] Memory Sizing Rules: Core memory ≤4KB, recall retention ≤30 days, archival chunks ≤1KB each
- [ ] Persistence Durability: All memory updates survive database restart (ACID compliance verified)
- [ ] Cost Controls: Memory operations cost <$0.01/conversation at 100 conversations/day
- [ ] Compaction Schedule: Memory compaction runs weekly and reduces total size by ≥10%
- [ ] Identity Consistency: Agent personality remains stable across 50+ conversation sessions
- [ ] Cold Start Recovery: Agent gracefully handles empty memory state (onboarding flow works)
- [ ] Backup Verification: Memory backup restores successfully and preserves agent identity

NOT-FOR Boundaries

Do NOT use this skill for:

- Choosing agent training data → Use /always-on-agent-inputs instead - that skill covers what data to feed the agent, this covers how to store and retrieve it
- Brainstorming agent applications → Use /always-on-agent-applications instead - that skill covers use case ideation, this covers technical implementation
- Agent safety and privacy → Use /always-on-agent-safety instead - that skill covers data governance, consent, and security; this assumes those are already designed
- General agentic patterns → Use /agentic-patterns instead - that skill covers ReAct loops, tool use, planning; this covers the persistence layer underneath
- One-shot agent tasks → Use /agent-creator instead - if the agent doesn't need to remember across sessions, you don't need always-on architecture
- Database schema design → This skill assumes you understand basic database concepts; use database-specific skills for schema optimization
- Cost optimization strategies → This skill mentions cost considerations but doesn't deep-dive optimization; delegate to cost-specific skills


Architecture and systems design for building always-on AI agents with episodic memory. Covers the memory hierarchy (core/recall/archival), persistence layers, agent server infrastructure, vector stores, and framework selection. Provides concrete deployment patterns for agents that maintain identity and learn across sessions. Activate on: "always-on agent", "persistent agent architecture", "episodic memory system", "agent memory design", "long-running agent", "stateful agent", "agent that remembers", "MemGPT architecture", "Letta deployment", "/always-on-agent-architecture". NOT for: choosing what data to feed the agent (use always-on-agent-inputs), brainstorming applications (use always-on-agent-applications), safety and privacy concerns (use always-on-agent-safety), general agentic patterns (use agentic-patterns).

development

Updated Apr 4, 2026


npx skillsauth add curiositech/windags-skills always-on-agent-architecture

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

- Trivy (container and dependency vulnerability scanner): Clean, 95%
- Semgrep (static code analysis for vulnerabilities): Clean, 95%
- mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
- Snyk (dep) (open source security scanning): Skipped, 50%
- Socket.dev (supply chain security analysis): Skipped, 50%
- VirusTotal (multi-engine malware detection): Skipped, 50%
- CrowdStrike (advanced threat intelligence): Skipped, 50%
- OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
- OWASP Dep-Check: Skipped, 50%

Last scanned: Apr 4, 2026, 1:32 PM · 187.8s · 1 file scanned

SKILL.md

license: Apache-2.0
name: always-on-agent-architecture
description: |
  Architecture and systems design for building always-on AI agents with episodic memory. Covers the memory hierarchy (core/recall/archival), persistence layers, agent server infrastructure, vector stores, and framework selection. Provides concrete deployment patterns for agents that maintain identity and learn across sessions. Activate on: "always-on agent", "persistent agent architecture", "episodic memory system", "agent memory design", "long-running agent", "stateful agent", "agent that remembers", "MemGPT architecture", "Letta deployment", "/always-on-agent-architecture". NOT for: choosing what data to feed the agent (use always-on-agent-inputs), brainstorming applications (use always-on-agent-applications), safety and privacy concerns (use always-on-agent-safety), general agentic patterns (use agentic-patterns).
category: Agent & Orchestration
- skill: agent-creator
  reason: The agent-creator skill handles building the agent itself; this skill handles the persistence layer




Install manually


# Clone the repo
git clone https://github.com/curiositech/windags-skills.git

# Copy into Claude Code skills folder (global)
cp -r windags-skills/skills/always-on-agent-architecture ~/.claude/skills/


Repository

curiositech/windags-skills

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT