moai-alfred-context-budget

Enterprise Context Window Optimization for Claude Code

Overview

Enterprise-grade context window management for Claude Code covering 200K token optimization, aggressive clearing strategies, memory file management, MCP server optimization, and 2025 best practices for maintaining high-quality AI-assisted development sessions.

Core Capabilities:

✅ Context budget allocation (200K tokens)
✅ Aggressive context clearing patterns
✅ Memory file optimization (<500 lines each)
✅ MCP server efficiency monitoring
✅ Strategic chunking for long-running tasks
✅ Quality-over-quantity principles

Quick Reference

When to Use This Skill

Automatic Activation:

Context window approaching 80% usage
Performance degradation detected
Session handoff preparation
Large project context management

Manual Invocation:

Skill("moai-alfred-context-budget")

Key Principles (2025)

Avoid Last 20% - Performance degrades in final fifth of context
Aggressive Clearing - /clear every 1-3 messages for quality
Lean Memory Files - Keep each file < 500 lines
Disable Unused MCPs - Each server adds tool definitions
Quality > Quantity - 10% with relevant info beats 90% with noise

Pattern 1: Context Budget Allocation

Overview

Claude Code provides 200K token context window with strategic allocation across system, tools, history, and working context.

Context Budget Breakdown

# Claude Code Context Budget (200K tokens)
Total Context Window: 200,000 tokens

Allocation:
  System Prompt: 2,000 tokens (1%)
    - Core instructions
    - CLAUDE.md project guidelines
    - Agent directives
  
  Tool Definitions: 5,000 tokens (2.5%)
    - Read, Write, Edit, Bash, etc.
    - MCP server tools (Context7, Playwright, etc.)
    - Skill() invocation metadata
  
  Session History: 30,000 tokens (15%)
    - Previous messages
    - Tool call results
    - User interactions
  
  Project Context: 40,000 tokens (20%)
    - Memory files (.moai/memory/)
    - Key source files
    - Documentation snippets
  
  Available for Response: 123,000 tokens (61.5%)
    - Current task processing
    - Code generation
    - Analysis output

Monitoring Context Usage

# Check current context usage
/context

# Example output interpretation:
# Context Usage: 156,234 / 200,000 tokens (78%)
# ⚠️ WARNING: Approaching 80% threshold
# Action: Consider /clear or archive old discussions

Context Budget Anti-Patterns

# BAD: Unoptimized Context
Session History: 80,000 tokens (40%)  # Too much history
  - 50 messages of exploratory debugging
  - Stale error logs from 2 hours ago
  - Repeated "try this" iterations

Project Context: 90,000 tokens (45%)  # Too much loaded
  - Entire src/ directory (unnecessary)
  - node_modules types (never needed)
  - 10 documentation files (only need 2)

Available for Response: 23,000 tokens (11.5%)  # TOO LOW!
  - Can't generate quality code
  - Forced to truncate responses
  - Poor reasoning quality

# GOOD: Optimized Context
Session History: 15,000 tokens (7.5%)  # Cleared regularly
  - Only last 5-7 relevant messages
  - Current task discussion
  - Key decisions documented

Project Context: 25,000 tokens (12.5%)  # Targeted loading
  - 3-4 files for current task
  - CLAUDE.md (always)
  - Specific memory files (on-demand)

Available for Response: 155,000 tokens (77.5%)  # OPTIMAL!
  - High-quality code generation
  - Deep reasoning capacity
  - Complex refactoring support

Pattern 2: Aggressive Context Clearing

Overview

The /clear command should become muscle memory, executed every 1-3 messages to maintain output quality.

When to Clear Context

// Decision Tree for /clear Usage

interface ContextClearingStrategy {
  trigger: string;
  frequency: string;
  action: string;
}

const clearingStrategies: ContextClearingStrategy[] = [
  {
    trigger: "Task completed",
    frequency: "Every task",
    action: "/clear immediately after success"
  },
  {
    trigger: "Context > 80%",
    frequency: "Automatic",
    action: "/clear + document key decisions in memory file"
  },
  {
    trigger: "Debugging session",
    frequency: "Every 3 attempts",
    action: "/clear stale error logs, keep only current"
  },
  {
    trigger: "Switching tasks",
    frequency: "Every switch",
    action: "/clear + update session-summary.md"
  },
  {
    trigger: "Poor output quality",
    frequency: "Immediate",
    action: "/clear + re-state requirements concisely"
  }
];

Clearing Workflow Pattern

#!/bin/bash
# Example: Task completion workflow with clearing

# Step 1: Complete current task
implement_feature() {
    echo "Implementing authentication..."
    # ... work done ...
    echo "✓ Authentication implemented"
}

# Step 2: Document key decisions BEFORE clearing
document_decision() {
    cat >> .moai/memory/auth-decisions.md <<EOF
## Authentication Implementation ($(date +%Y-%m-%d))

**Approach**: JWT with httpOnly cookies
**Rationale**: Prevents XSS attacks, CSRF protection via SameSite
**Key Files**: src/auth/jwt.ts, src/middleware/auth-check.ts
EOF
}

# Step 3: Clear context
# /clear

# Step 4: Start fresh with next task
start_next_task() {
    echo "Context cleared. Starting API rate limiting..."
}

# Workflow
implement_feature
document_decision
# User manually executes: /clear
# start_next_task

What to Clear vs Keep

# ✅ ALWAYS CLEAR (After Documenting)
Clear:
  - Exploratory debugging sessions
  - "Try this" iteration history
  - Stale error logs
  - Completed task discussions
  - Old file diffs
  - Abandoned approaches

# 📝 DOCUMENT BEFORE CLEARING
Document First:
  - Key architectural decisions
  - Non-obvious implementation choices
  - Failed approaches (why they failed)
  - Performance insights
  - Security considerations

# ❌ NEVER CLEAR (Part of Project Context)
Keep:
  - CLAUDE.md (project guidelines)
  - Active memory files
  - Current task requirements
  - Ongoing conversation
  - Recent (< 3 messages) exchanges

Pattern 3: Memory File Management

Overview

Memory files are read at session start, consuming context tokens. Keep them lean and focused.

Memory File Structure (Best Practices)

.moai/memory/
├── session-summary.md          # < 300 lines (current session state)
├── architectural-decisions.md   # < 400 lines (ADRs)
├── api-contracts.md            # < 200 lines (interface specs)
├── known-issues.md             # < 150 lines (blockers, workarounds)
└── team-conventions.md         # < 200 lines (code style, patterns)

Total Memory Budget: < 1,250 lines (~25K tokens)

Memory File Template

<!-- .moai/memory/session-summary.md -->
# Session Summary

**Last Updated**: 2025-01-12 14:30
**Current Sprint**: Feature/Auth-Refactor
**Active Tasks**: 2 in progress, 3 pending

## Current State

### ✅ Completed This Session
1. JWT authentication implementation (commit: abc123)
2. Password hashing with bcrypt (commit: def456)

### 🔄 In Progress
1. OAuth2 integration (70% complete)
   - Provider setup done
   - Callback handler in progress
   - Files: src/auth/oauth.ts

### 📋 Pending
1. Rate limiting middleware
2. Session management
3. CSRF protection

## Key Decisions

**Auth Strategy**: JWT in httpOnly cookies (XSS prevention)
**Password Min Length**: 12 chars (OWASP 2025 recommendation)

## Blockers

None currently.

## Next Actions

1. Complete OAuth callback handler
2. Add tests for OAuth flow
3. Document OAuth setup in README

Memory File Anti-Patterns

<!-- ❌ BAD: Bloated Memory File (1,200 lines) -->
# Session Summary

## Completed Tasks (Last 3 Weeks)
<!-- 800 lines of old task history -->
<!-- This is what git commit history is for! -->

## All Code Snippets Ever Written
```javascript
// 400 lines of full code snippets
// Should be in git, not memory files

Session Summary

Last Updated: 2025-01-12 14:30

Active Work (This Session)

OAuth integration: 70% (src/auth/oauth.ts)
Blocker: None

Key Decisions (Last 7 Days)

Auth: JWT in httpOnly cookies (XSS prevention)
Hashing: bcrypt, cost factor 12

Next Actions

Complete OAuth callback
Add OAuth tests
Update README


### Memory File Rotation Strategy

```bash
#!/bin/bash
# Rotate memory files when they exceed limits

rotate_memory_file() {
    local file="$1"
    local max_lines=500
    local current_lines=$(wc -l < "$file")
    
    if [[ $current_lines -gt $max_lines ]]; then
        echo "Rotating $file ($current_lines lines > $max_lines limit)"
        
        # Archive old content
        local timestamp=$(date +%Y%m%d)
        local archive_dir=".moai/memory/archive"
        mkdir -p "$archive_dir"
        
        # Keep only recent content (last 300 lines)
        tail -n 300 "$file" > "${file}.tmp"
        
        # Archive full file
        mv "$file" "${archive_dir}/$(basename "$file" .md)-${timestamp}.md"
        
        # Replace with trimmed version
        mv "${file}.tmp" "$file"
        
        echo "✓ Archived to ${archive_dir}/"
    fi
}

# Check all memory files
for file in .moai/memory/*.md; do
    rotate_memory_file "$file"
done

Pattern 4: MCP Server Optimization

Overview

Each enabled MCP server adds tool definitions to system prompt, consuming context tokens. Disable unused servers.

MCP Context Impact

// .claude/mcp.json - Context-optimized configuration

{
  "mcpServers": {
    // ✅ ENABLED: Active development tools
    "context7": {
      "command": "npx",
      "args": ["-y", "@context7/mcp"],
      "env": {
        "CONTEXT7_API_KEY": "your-key"
      }
    },
    
    // ❌ DISABLED: Not needed for current project
    // "playwright": {
    //   "command": "npx",
    //   "args": ["-y", "@playwright/mcp"]
    // },
    
    // ✅ ENABLED: Documentation research
    "sequential-thinking": {
      "command": "npx",
      "args": ["-y", "@sequential-thinking/mcp"]
    }
    
    // ❌ DISABLED: Slackbot not in use
    // "slack": {
    //   "command": "npx",
    //   "args": ["-y", "@slack/mcp"]
    // }
  }
}

Measuring MCP Overhead

# Monitor MCP context usage
/context

# Example output:
# MCP Servers (3 enabled):
# - context7: 847 tokens (tool definitions)
# - sequential-thinking: 412 tokens
# - playwright: 1,234 tokens (DISABLED to save tokens)
#
# Total MCP Overhead: 1,259 tokens

MCP Usage Strategy

class MCPUsageStrategy:
    """Strategic MCP server management for context optimization"""
    
    STRATEGIES = {
        "documentation_heavy": {
            "enable": ["context7"],
            "disable": ["playwright", "slack", "github"],
            "rationale": "Research phase, need API docs access"
        },
        "testing_phase": {
            "enable": ["playwright", "sequential-thinking"],
            "disable": ["context7", "slack"],
            "rationale": "E2E testing, browser automation needed"
        },
        "code_review": {
            "enable": ["github", "sequential-thinking"],
            "disable": ["context7", "playwright", "slack"],
            "rationale": "PR review, need GitHub API access"
        },
        "minimal": {
            "enable": [],
            "disable": ["*"],
            "rationale": "Maximum context availability, no external tools"
        }
    }
    
    @staticmethod
    def optimize_for_phase(phase: str):
        """
        Reconfigure .claude/mcp.json for current development phase
        """
        strategy = MCPUsageStrategy.STRATEGIES.get(phase, "minimal")
        print(f"Optimizing MCP servers for: {phase}")
        print(f"Enable: {strategy['enable']}")
        print(f"Disable: {strategy['disable']}")
        print(f"Rationale: {strategy['rationale']}")
        # Update .claude/mcp.json accordingly

Pattern 5: Strategic Chunking

Overview

Break large tasks into smaller pieces completable within optimal context bounds (< 80% usage).

Task Chunking Strategy

// Task size estimation and chunking

interface Task {
  name: string;
  estimatedTokens: number;
  dependencies: string[];
}

const chunkTask = (largeTask: Task): Task[] => {
  const MAX_CHUNK_TOKENS = 120_000; // 60% of 200K context
  
  if (largeTask.estimatedTokens <= MAX_CHUNK_TOKENS) {
    return [largeTask]; // No chunking needed
  }
  
  // Example: Authentication system (estimated 250K tokens)
  const chunks: Task[] = [
    {
      name: "Chunk 1: User model & password hashing",
      estimatedTokens: 80_000,
      dependencies: []
    },
    {
      name: "Chunk 2: JWT generation & validation",
      estimatedTokens: 70_000,
      dependencies: ["Chunk 1"]
    },
    {
      name: "Chunk 3: Login/logout endpoints",
      estimatedTokens: 60_000,
      dependencies: ["Chunk 2"]
    },
    {
      name: "Chunk 4: Session middleware & guards",
      estimatedTokens: 40_000,
      dependencies: ["Chunk 3"]
    }
  ];
  
  return chunks;
};

// Workflow:
// 1. Complete Chunk 1
// 2. /clear
// 3. Document Chunk 1 results in memory file
// 4. Start Chunk 2 with minimal context

Chunking Anti-Patterns

# ❌ BAD: Mixing Unrelated Tasks
Chunk 1 (200K tokens - OVERLOADED):
  - User authentication
  - Payment processing
  - Email notifications
  - Admin dashboard
  - Analytics integration
# Result: Poor quality on ALL tasks, context overflow

# ✅ GOOD: Focused Chunks
Chunk 1 (60K tokens):
  - User authentication only
  - Complete, test, document
  
Chunk 2 (70K tokens):
  - Payment processing only
  - Builds on auth from Chunk 1
  
Chunk 3 (50K tokens):
  - Email notifications
  - Uses auth + payment data

Pattern 6: Quality Over Quantity Context

Overview

10% context with highly relevant information produces better results than 90% filled with noise.

Context Quality Checklist

## Before Adding to Context

Ask yourself:

1. **Relevance**: Does this directly support current task?
   - ✅ YES: Load file
   - ❌ NO: Skip or summarize

2. **Freshness**: Is this information current?
   - ✅ Current: Keep in context
   - ❌ Stale (>1 hour): Archive or delete

3. **Actionability**: Will Claude use this to generate code?
   - ✅ Actionable: Include
   - ❌ FYI only: Document in memory file, remove from context

4. **Uniqueness**: Is this duplicated elsewhere?
   - ✅ Unique: Keep
   - ❌ Duplicate: Remove duplicates, keep one canonical source

## High-Quality Context Example (30K tokens, 15%)

Context Contents:
1. CLAUDE.md (2K tokens) - Always loaded
2. src/auth/jwt.ts (5K tokens) - Current file being edited
3. src/types/auth.ts (3K tokens) - Type definitions needed
4. .moai/memory/session-summary.md (4K tokens) - Current session state
5. tests/auth.test.ts (8K tokens) - Test file for reference
6. Last 5 messages (8K tokens) - Recent discussion

Total: 30K tokens
Quality: HIGH - Every token is directly relevant to current task

## Low-Quality Context Example (170K tokens, 85%)

Context Contents:
1. CLAUDE.md (2K tokens)
2. Entire src/ directory (80K tokens) - ❌ 90% irrelevant
3. node_modules/ types (40K tokens) - ❌ Never needed
4. 50 previous messages (30K tokens) - ❌ Stale debugging sessions
5. 10 documentation files (18K tokens) - ❌ Only need 1-2

Total: 170K tokens
Quality: LOW - <10% of tokens are actually useful
Result: Poor code generation, missed context, truncated responses

Best Practices Checklist

Context Allocation:

[ ] Context usage maintained below 80%
[ ] System prompt < 2K tokens
[ ] MCP tools < 5K tokens total
[ ] Session history < 30K tokens
[ ] Project context < 40K tokens
[ ] Available response capacity > 100K tokens

Aggressive Clearing:

[ ] /clear executed every 1-3 messages
[ ] Context cleared after each task completion
[ ] Key decisions documented before clearing
[ ] Stale error logs removed immediately
[ ] Exploratory sessions cleared regularly

Memory File Management:

[ ] Each memory file < 500 lines
[ ] Total memory files < 1,250 lines
[ ] session-summary.md updated before task switches
[ ] Old content archived to .moai/memory/archive/
[ ] No raw code stored in memory (summarize instead)

MCP Optimization:

[ ] Unused MCP servers disabled
[ ] /context checked regularly
[ ] MCP overhead < 5K tokens
[ ] Servers enabled/disabled per development phase

Strategic Chunking:

[ ] Large tasks split into < 120K token chunks
[ ] Related work grouped in same chunk
[ ] Chunk dependencies documented
[ ] /clear between chunks
[ ] Previous chunk results in memory file

Quality Over Quantity:

[ ] Only load files needed for current task
[ ] Remove stale information (>1 hour old)
[ ] Eliminate duplicate context
[ ] Summarize instead of including full files
[ ] Verify every loaded item is actionable

Common Pitfalls to Avoid

Pitfall 1: Loading Entire Codebase

# ❌ BAD
# User: "Help me understand this project"
# Claude loads all 200 files in src/

# ✅ GOOD
# User: "Help me understand the authentication flow"
# Claude loads only:
# - src/auth/jwt.ts
# - src/middleware/auth-check.ts
# - tests/auth.test.ts

Pitfall 2: Never Clearing Context

# ❌ BAD: 3-Hour Session Without Clearing
Context: 195K / 200K tokens (97.5%)
  - 80 messages of trial-and-error debugging
  - 15 failed approaches still in context
  - Stale error logs from 2 hours ago
Result: "I need to truncate my response..."

# ✅ GOOD: Clearing Every 5-10 Minutes
Context: 45K / 200K tokens (22.5%)
  - Only last 5 relevant messages
  - Current task files
  - Fresh, high-quality context
Result: Complete, high-quality responses

Pitfall 3: Bloated Memory Files

<!-- ❌ BAD: 2,000-line session-summary.md -->
- Takes 40K tokens just to load
- 90% is outdated information
- Prevents loading actual source files

<!-- ✅ GOOD: 250-line session-summary.md -->
- Takes 5K tokens to load
- 100% current and relevant
- Leaves room for source files

Tool Versions (2025)

| Tool | Version | Purpose | |------|---------|---------| | Claude Code | 1.5.0+ | CLI interface | | Claude Sonnet | 4.5+ | Model (200K context) | | Context7 MCP | Latest | Documentation research | | Sequential Thinking MCP | Latest | Problem solving |

References

Claude Code Context Management - Official documentation
Claude Code Best Practices - Community guide
Context Window Optimization - 2025 deep dive
Memory Management Strategies - Advanced patterns

Changelog

v4.0.0 (2025-01-12): Enterprise upgrade with 2025 best practices, aggressive clearing patterns, MCP optimization, strategic chunking
v1.0.0 (2025-03-29): Initial release

Works Well With

moai-alfred-practices - Development best practices
moai-alfred-session-state - Session management
moai-cc-memory - Memory file patterns
moai-alfred-workflow - 4-step workflow optimization

moai-alfred-context-budget

Enterprise Context Window Optimization for Claude Code

Overview

Core Capabilities:

✅ Context budget allocation (200K tokens)
✅ Aggressive context clearing patterns
✅ Memory file optimization (<500 lines each)
✅ MCP server efficiency monitoring
✅ Strategic chunking for long-running tasks
✅ Quality-over-quantity principles

Quick Reference

When to Use This Skill

Automatic Activation:

Context window approaching 80% usage
Performance degradation detected
Session handoff preparation
Large project context management

Manual Invocation:

Skill("moai-alfred-context-budget")

Key Principles (2025)

Avoid Last 20% - Performance degrades in final fifth of context
Aggressive Clearing - /clear every 1-3 messages for quality
Lean Memory Files - Keep each file < 500 lines
Disable Unused MCPs - Each server adds tool definitions
Quality > Quantity - 10% with relevant info beats 90% with noise

Pattern 1: Context Budget Allocation

Overview

Claude Code provides 200K token context window with strategic allocation across system, tools, history, and working context.

Context Budget Breakdown

# Claude Code Context Budget (200K tokens)
Total Context Window: 200,000 tokens

Allocation:
  System Prompt: 2,000 tokens (1%)
    - Core instructions
    - CLAUDE.md project guidelines
    - Agent directives
  
  Tool Definitions: 5,000 tokens (2.5%)
    - Read, Write, Edit, Bash, etc.
    - MCP server tools (Context7, Playwright, etc.)
    - Skill() invocation metadata
  
  Session History: 30,000 tokens (15%)
    - Previous messages
    - Tool call results
    - User interactions
  
  Project Context: 40,000 tokens (20%)
    - Memory files (.moai/memory/)
    - Key source files
    - Documentation snippets
  
  Available for Response: 123,000 tokens (61.5%)
    - Current task processing
    - Code generation
    - Analysis output

Monitoring Context Usage

# Check current context usage
/context

# Example output interpretation:
# Context Usage: 156,234 / 200,000 tokens (78%)
# ⚠️ WARNING: Approaching 80% threshold
# Action: Consider /clear or archive old discussions

Context Budget Anti-Patterns

# BAD: Unoptimized Context
Session History: 80,000 tokens (40%)  # Too much history
  - 50 messages of exploratory debugging
  - Stale error logs from 2 hours ago
  - Repeated "try this" iterations

Project Context: 90,000 tokens (45%)  # Too much loaded
  - Entire src/ directory (unnecessary)
  - node_modules types (never needed)
  - 10 documentation files (only need 2)

Available for Response: 23,000 tokens (11.5%)  # TOO LOW!
  - Can't generate quality code
  - Forced to truncate responses
  - Poor reasoning quality

# GOOD: Optimized Context
Session History: 15,000 tokens (7.5%)  # Cleared regularly
  - Only last 5-7 relevant messages
  - Current task discussion
  - Key decisions documented

Project Context: 25,000 tokens (12.5%)  # Targeted loading
  - 3-4 files for current task
  - CLAUDE.md (always)
  - Specific memory files (on-demand)

Available for Response: 155,000 tokens (77.5%)  # OPTIMAL!
  - High-quality code generation
  - Deep reasoning capacity
  - Complex refactoring support

Pattern 2: Aggressive Context Clearing

Overview

The /clear command should become muscle memory, executed every 1-3 messages to maintain output quality.

When to Clear Context

// Decision Tree for /clear Usage

interface ContextClearingStrategy {
  trigger: string;
  frequency: string;
  action: string;
}

const clearingStrategies: ContextClearingStrategy[] = [
  {
    trigger: "Task completed",
    frequency: "Every task",
    action: "/clear immediately after success"
  },
  {
    trigger: "Context > 80%",
    frequency: "Automatic",
    action: "/clear + document key decisions in memory file"
  },
  {
    trigger: "Debugging session",
    frequency: "Every 3 attempts",
    action: "/clear stale error logs, keep only current"
  },
  {
    trigger: "Switching tasks",
    frequency: "Every switch",
    action: "/clear + update session-summary.md"
  },
  {
    trigger: "Poor output quality",
    frequency: "Immediate",
    action: "/clear + re-state requirements concisely"
  }
];

Clearing Workflow Pattern

#!/bin/bash
# Example: Task completion workflow with clearing

# Step 1: Complete current task
implement_feature() {
    echo "Implementing authentication..."
    # ... work done ...
    echo "✓ Authentication implemented"
}

# Step 2: Document key decisions BEFORE clearing
document_decision() {
    cat >> .moai/memory/auth-decisions.md <<EOF
## Authentication Implementation ($(date +%Y-%m-%d))

**Approach**: JWT with httpOnly cookies
**Rationale**: Prevents XSS attacks, CSRF protection via SameSite
**Key Files**: src/auth/jwt.ts, src/middleware/auth-check.ts
EOF
}

# Step 3: Clear context
# /clear

# Step 4: Start fresh with next task
start_next_task() {
    echo "Context cleared. Starting API rate limiting..."
}

# Workflow
implement_feature
document_decision
# User manually executes: /clear
# start_next_task

What to Clear vs Keep

# ✅ ALWAYS CLEAR (After Documenting)
Clear:
  - Exploratory debugging sessions
  - "Try this" iteration history
  - Stale error logs
  - Completed task discussions
  - Old file diffs
  - Abandoned approaches

# 📝 DOCUMENT BEFORE CLEARING
Document First:
  - Key architectural decisions
  - Non-obvious implementation choices
  - Failed approaches (why they failed)
  - Performance insights
  - Security considerations

# ❌ NEVER CLEAR (Part of Project Context)
Keep:
  - CLAUDE.md (project guidelines)
  - Active memory files
  - Current task requirements
  - Ongoing conversation
  - Recent (< 3 messages) exchanges

Pattern 3: Memory File Management

Overview

Memory files are read at session start, consuming context tokens. Keep them lean and focused.

Memory File Structure (Best Practices)

.moai/memory/
├── session-summary.md          # < 300 lines (current session state)
├── architectural-decisions.md   # < 400 lines (ADRs)
├── api-contracts.md            # < 200 lines (interface specs)
├── known-issues.md             # < 150 lines (blockers, workarounds)
└── team-conventions.md         # < 200 lines (code style, patterns)

Total Memory Budget: < 1,250 lines (~25K tokens)

Memory File Template

<!-- .moai/memory/session-summary.md -->
# Session Summary

**Last Updated**: 2025-01-12 14:30
**Current Sprint**: Feature/Auth-Refactor
**Active Tasks**: 2 in progress, 3 pending

## Current State

### ✅ Completed This Session
1. JWT authentication implementation (commit: abc123)
2. Password hashing with bcrypt (commit: def456)

### 🔄 In Progress
1. OAuth2 integration (70% complete)
   - Provider setup done
   - Callback handler in progress
   - Files: src/auth/oauth.ts

### 📋 Pending
1. Rate limiting middleware
2. Session management
3. CSRF protection

## Key Decisions

**Auth Strategy**: JWT in httpOnly cookies (XSS prevention)
**Password Min Length**: 12 chars (OWASP 2025 recommendation)

## Blockers

None currently.

## Next Actions

1. Complete OAuth callback handler
2. Add tests for OAuth flow
3. Document OAuth setup in README

Memory File Anti-Patterns

<!-- ❌ BAD: Bloated Memory File (1,200 lines) -->
# Session Summary

## Completed Tasks (Last 3 Weeks)
<!-- 800 lines of old task history -->
<!-- This is what git commit history is for! -->

## All Code Snippets Ever Written
```javascript
// 400 lines of full code snippets
// Should be in git, not memory files

Session Summary

Last Updated: 2025-01-12 14:30

Active Work (This Session)

OAuth integration: 70% (src/auth/oauth.ts)
Blocker: None

Key Decisions (Last 7 Days)

Auth: JWT in httpOnly cookies (XSS prevention)
Hashing: bcrypt, cost factor 12

Next Actions

Complete OAuth callback
Add OAuth tests
Update README


### Memory File Rotation Strategy

```bash
#!/bin/bash
# Rotate memory files when they exceed limits

rotate_memory_file() {
    local file="$1"
    local max_lines=500
    local current_lines=$(wc -l < "$file")
    
    if [[ $current_lines -gt $max_lines ]]; then
        echo "Rotating $file ($current_lines lines > $max_lines limit)"
        
        # Archive old content
        local timestamp=$(date +%Y%m%d)
        local archive_dir=".moai/memory/archive"
        mkdir -p "$archive_dir"
        
        # Keep only recent content (last 300 lines)
        tail -n 300 "$file" > "${file}.tmp"
        
        # Archive full file
        mv "$file" "${archive_dir}/$(basename "$file" .md)-${timestamp}.md"
        
        # Replace with trimmed version
        mv "${file}.tmp" "$file"
        
        echo "✓ Archived to ${archive_dir}/"
    fi
}

# Check all memory files
for file in .moai/memory/*.md; do
    rotate_memory_file "$file"
done

Pattern 4: MCP Server Optimization

Overview

Each enabled MCP server adds tool definitions to system prompt, consuming context tokens. Disable unused servers.

MCP Context Impact

// .claude/mcp.json - Context-optimized configuration

{
  "mcpServers": {
    // ✅ ENABLED: Active development tools
    "context7": {
      "command": "npx",
      "args": ["-y", "@context7/mcp"],
      "env": {
        "CONTEXT7_API_KEY": "your-key"
      }
    },
    
    // ❌ DISABLED: Not needed for current project
    // "playwright": {
    //   "command": "npx",
    //   "args": ["-y", "@playwright/mcp"]
    // },
    
    // ✅ ENABLED: Documentation research
    "sequential-thinking": {
      "command": "npx",
      "args": ["-y", "@sequential-thinking/mcp"]
    }
    
    // ❌ DISABLED: Slackbot not in use
    // "slack": {
    //   "command": "npx",
    //   "args": ["-y", "@slack/mcp"]
    // }
  }
}

Measuring MCP Overhead

# Monitor MCP context usage
/context

# Example output:
# MCP Servers (3 enabled):
# - context7: 847 tokens (tool definitions)
# - sequential-thinking: 412 tokens
# - playwright: 1,234 tokens (DISABLED to save tokens)
#
# Total MCP Overhead: 1,259 tokens

MCP Usage Strategy

class MCPUsageStrategy:
    """Strategic MCP server management for context optimization"""
    
    STRATEGIES = {
        "documentation_heavy": {
            "enable": ["context7"],
            "disable": ["playwright", "slack", "github"],
            "rationale": "Research phase, need API docs access"
        },
        "testing_phase": {
            "enable": ["playwright", "sequential-thinking"],
            "disable": ["context7", "slack"],
            "rationale": "E2E testing, browser automation needed"
        },
        "code_review": {
            "enable": ["github", "sequential-thinking"],
            "disable": ["context7", "playwright", "slack"],
            "rationale": "PR review, need GitHub API access"
        },
        "minimal": {
            "enable": [],
            "disable": ["*"],
            "rationale": "Maximum context availability, no external tools"
        }
    }
    
    @staticmethod
    def optimize_for_phase(phase: str):
        """
        Reconfigure .claude/mcp.json for current development phase
        """
        strategy = MCPUsageStrategy.STRATEGIES.get(phase, "minimal")
        print(f"Optimizing MCP servers for: {phase}")
        print(f"Enable: {strategy['enable']}")
        print(f"Disable: {strategy['disable']}")
        print(f"Rationale: {strategy['rationale']}")
        # Update .claude/mcp.json accordingly

Pattern 5: Strategic Chunking

Overview

Break large tasks into smaller pieces completable within optimal context bounds (< 80% usage).

Task Chunking Strategy

// Task size estimation and chunking

interface Task {
  name: string;
  estimatedTokens: number;
  dependencies: string[];
}

const chunkTask = (largeTask: Task): Task[] => {
  const MAX_CHUNK_TOKENS = 120_000; // 60% of 200K context
  
  if (largeTask.estimatedTokens <= MAX_CHUNK_TOKENS) {
    return [largeTask]; // No chunking needed
  }
  
  // Example: Authentication system (estimated 250K tokens)
  const chunks: Task[] = [
    {
      name: "Chunk 1: User model & password hashing",
      estimatedTokens: 80_000,
      dependencies: []
    },
    {
      name: "Chunk 2: JWT generation & validation",
      estimatedTokens: 70_000,
      dependencies: ["Chunk 1"]
    },
    {
      name: "Chunk 3: Login/logout endpoints",
      estimatedTokens: 60_000,
      dependencies: ["Chunk 2"]
    },
    {
      name: "Chunk 4: Session middleware & guards",
      estimatedTokens: 40_000,
      dependencies: ["Chunk 3"]
    }
  ];
  
  return chunks;
};

// Workflow:
// 1. Complete Chunk 1
// 2. /clear
// 3. Document Chunk 1 results in memory file
// 4. Start Chunk 2 with minimal context

Chunking Anti-Patterns

# ❌ BAD: Mixing Unrelated Tasks
Chunk 1 (200K tokens - OVERLOADED):
  - User authentication
  - Payment processing
  - Email notifications
  - Admin dashboard
  - Analytics integration
# Result: Poor quality on ALL tasks, context overflow

# ✅ GOOD: Focused Chunks
Chunk 1 (60K tokens):
  - User authentication only
  - Complete, test, document
  
Chunk 2 (70K tokens):
  - Payment processing only
  - Builds on auth from Chunk 1
  
Chunk 3 (50K tokens):
  - Email notifications
  - Uses auth + payment data

Pattern 6: Quality Over Quantity Context

Overview

10% context with highly relevant information produces better results than 90% filled with noise.

Context Quality Checklist

## Before Adding to Context

Ask yourself:

1. **Relevance**: Does this directly support current task?
   - ✅ YES: Load file
   - ❌ NO: Skip or summarize

2. **Freshness**: Is this information current?
   - ✅ Current: Keep in context
   - ❌ Stale (>1 hour): Archive or delete

3. **Actionability**: Will Claude use this to generate code?
   - ✅ Actionable: Include
   - ❌ FYI only: Document in memory file, remove from context

4. **Uniqueness**: Is this duplicated elsewhere?
   - ✅ Unique: Keep
   - ❌ Duplicate: Remove duplicates, keep one canonical source

## High-Quality Context Example (30K tokens, 15%)

Context Contents:
1. CLAUDE.md (2K tokens) - Always loaded
2. src/auth/jwt.ts (5K tokens) - Current file being edited
3. src/types/auth.ts (3K tokens) - Type definitions needed
4. .moai/memory/session-summary.md (4K tokens) - Current session state
5. tests/auth.test.ts (8K tokens) - Test file for reference
6. Last 5 messages (8K tokens) - Recent discussion

Total: 30K tokens
Quality: HIGH - Every token is directly relevant to current task

## Low-Quality Context Example (170K tokens, 85%)

Context Contents:
1. CLAUDE.md (2K tokens)
2. Entire src/ directory (80K tokens) - ❌ 90% irrelevant
3. node_modules/ types (40K tokens) - ❌ Never needed
4. 50 previous messages (30K tokens) - ❌ Stale debugging sessions
5. 10 documentation files (18K tokens) - ❌ Only need 1-2

Total: 170K tokens
Quality: LOW - <10% of tokens are actually useful
Result: Poor code generation, missed context, truncated responses

Best Practices Checklist

Context Allocation:

[ ] Context usage maintained below 80%
[ ] System prompt < 2K tokens
[ ] MCP tools < 5K tokens total
[ ] Session history < 30K tokens
[ ] Project context < 40K tokens
[ ] Available response capacity > 100K tokens

Aggressive Clearing:

[ ] /clear executed every 1-3 messages
[ ] Context cleared after each task completion
[ ] Key decisions documented before clearing
[ ] Stale error logs removed immediately
[ ] Exploratory sessions cleared regularly

Memory File Management:

[ ] Each memory file < 500 lines
[ ] Total memory files < 1,250 lines
[ ] session-summary.md updated before task switches
[ ] Old content archived to .moai/memory/archive/
[ ] No raw code stored in memory (summarize instead)

MCP Optimization:

[ ] Unused MCP servers disabled
[ ] /context checked regularly
[ ] MCP overhead < 5K tokens
[ ] Servers enabled/disabled per development phase

Strategic Chunking:

[ ] Large tasks split into < 120K token chunks
[ ] Related work grouped in same chunk
[ ] Chunk dependencies documented
[ ] /clear between chunks
[ ] Previous chunk results in memory file

Quality Over Quantity:

[ ] Only load files needed for current task
[ ] Remove stale information (>1 hour old)
[ ] Eliminate duplicate context
[ ] Summarize instead of including full files
[ ] Verify every loaded item is actionable

Common Pitfalls to Avoid

Pitfall 1: Loading Entire Codebase

# ❌ BAD
# User: "Help me understand this project"
# Claude loads all 200 files in src/

# ✅ GOOD
# User: "Help me understand the authentication flow"
# Claude loads only:
# - src/auth/jwt.ts
# - src/middleware/auth-check.ts
# - tests/auth.test.ts

Pitfall 2: Never Clearing Context

# ❌ BAD: 3-Hour Session Without Clearing
Context: 195K / 200K tokens (97.5%)
  - 80 messages of trial-and-error debugging
  - 15 failed approaches still in context
  - Stale error logs from 2 hours ago
Result: "I need to truncate my response..."

# ✅ GOOD: Clearing Every 5-10 Minutes
Context: 45K / 200K tokens (22.5%)
  - Only last 5 relevant messages
  - Current task files
  - Fresh, high-quality context
Result: Complete, high-quality responses

Pitfall 3: Bloated Memory Files

<!-- ❌ BAD: 2,000-line session-summary.md -->
- Takes 40K tokens just to load
- 90% is outdated information
- Prevents loading actual source files

<!-- ✅ GOOD: 250-line session-summary.md -->
- Takes 5K tokens to load
- 100% current and relevant
- Leaves room for source files

Tool Versions (2025)

References

Claude Code Context Management - Official documentation
Claude Code Best Practices - Community guide
Context Window Optimization - 2025 deep dive
Memory Management Strategies - Advanced patterns

Changelog

v4.0.0 (2025-01-12): Enterprise upgrade with 2025 best practices, aggressive clearing patterns, MCP optimization, strategic chunking
v1.0.0 (2025-03-29): Initial release

Works Well With

moai-alfred-practices - Development best practices
moai-alfred-session-state - Session management
moai-cc-memory - Memory file patterns
moai-alfred-workflow - 4-step workflow optimization

Adoption

ajbcoding/moai-alfred-context-budget

$ install --global

Security Scan Results

SKILL.md

moai-alfred-context-budget

Overview

Quick Reference

When to Use This Skill

Key Principles (2025)

Pattern 1: Context Budget Allocation

Overview

Context Budget Breakdown

Monitoring Context Usage

Context Budget Anti-Patterns

Pattern 2: Aggressive Context Clearing

Overview

When to Clear Context

Clearing Workflow Pattern

What to Clear vs Keep

Pattern 3: Memory File Management

Overview

Memory File Structure (Best Practices)

Memory File Template

Memory File Anti-Patterns

Session Summary

Active Work (This Session)

Key Decisions (Last 7 Days)

Next Actions

Pattern 4: MCP Server Optimization

Overview

MCP Context Impact

Measuring MCP Overhead

MCP Usage Strategy

Pattern 5: Strategic Chunking

Overview

Task Chunking Strategy

Chunking Anti-Patterns

Pattern 6: Quality Over Quantity Context

Overview

Context Quality Checklist

Best Practices Checklist

Common Pitfalls to Avoid

Pitfall 1: Loading Entire Codebase

Pitfall 2: Never Clearing Context

Pitfall 3: Bloated Memory Files

Tool Versions (2025)

References

Changelog

Works Well With

Related Skills

ajbcoding/youtube-transcript

ajbcoding/ship-learn-next

ajbcoding/theme-factory

ajbcoding/template-skill

ajbcoding/moai-alfred-context-budget

$ install --global

Security Scan Results

SKILL.md

moai-alfred-context-budget

Overview

Quick Reference

When to Use This Skill

Key Principles (2025)

Pattern 1: Context Budget Allocation

Overview

Context Budget Breakdown

Monitoring Context Usage

Context Budget Anti-Patterns

Pattern 2: Aggressive Context Clearing

Overview

When to Clear Context

Clearing Workflow Pattern

What to Clear vs Keep

Pattern 3: Memory File Management

Overview

Memory File Structure (Best Practices)

Memory File Template

Memory File Anti-Patterns

Session Summary