skills/token-optimization/SKILL.md
Use when the user says 'token optimization', 'save tokens', 'context window', 'reduce tokens', 'RTK', 'Serena', 'token stack', or asks about extending context window capacity. Covers the 3-layer token optimization stack: Headroom (API compression), RTK (CLI output compression), and Serena (LSP-backed code navigation). Do NOT use for Headroom-only troubleshooting (Compress skill).
npx skillsauth add cwinvestments/memstack token-optimizationInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Three complementary tools that reduce token consumption by 50-80% across different layers of the Claude Code pipeline.
When this skill activates, output:
Token Optimization Guide — Configuring the 3-layer token stack...
Then execute the protocol below.
| Context | Status | |---------|--------| | User asks about token savings, context optimization | ACTIVE — full guide | | User says "RTK", "Serena", "token stack" | ACTIVE — relevant section | | User wants to install or configure any layer | ACTIVE — install steps | | User asks about context window limits | ACTIVE — explain stack | | Headroom-only troubleshooting (proxy crash, health check) | DORMANT — use Compress skill | | User is actively coding (no optimization discussion) | DORMANT — do not activate |
Claude Code Context Window
==========================
Layer 3: Serena (MCP) Prevents token waste at the SOURCE
───────────────────── Instead of reading entire files,
use LSP to fetch only the symbols
and references you need.
Savings: variable (avoids 1000s of
tokens per file read)
│
▼
Layer 2: RTK (CLI proxy) Compresses tool OUTPUT
──────────────────────── git diff, npm install, build logs —
all compressed 60-90% before they
enter the context window.
│
▼
Layer 1: Headroom (API proxy) Compresses API TRAFFIC
────────────────────────── Compresses the full conversation
payload between CC and the Anthropic
API. ~34% reduction on wire traffic.
│
▼
Anthropic API
Key insight: Each layer operates at a different point in the pipeline, so they multiply rather than overlap. A git diff that produces 5,000 tokens might become 1,000 after RTK, and the full conversation round-trip is further compressed by Headroom.
Headroom is a local proxy that compresses conversation payloads between Claude Code and the Anthropic API using LLMLingua-2.
pip install headroom-ai[code]
The [code] extra includes tree-sitter AST compression for code-aware filtering.
Terminal 1 — Start the proxy:
headroom proxy --llmlingua-device cpu --port 8787
Terminal 2 — Start Claude Code with proxy:
Windows:
set ANTHROPIC_BASE_URL=http://127.0.0.1:8787
claude
macOS/Linux:
ANTHROPIC_BASE_URL=http://127.0.0.1:8787 claude
# Health check
curl http://127.0.0.1:8787/health
# Token savings stats
curl http://127.0.0.1:8787/stats
Expected output from /stats: compression ratio, tokens saved, requests processed.
| Content Type | Compression | |-------------|-------------| | Code files | 30-46% | | Conversation text | 25-35% | | Tool output | 30-40% | | Average | ~34% |
| Issue | Fix |
|-------|-----|
| Compression at 0% | Install with [code] extra: pip install headroom-ai[code] |
| Proxy not reachable | Check curl http://127.0.0.1:8787/health — restart if needed |
| API errors in CC | Headroom may have crashed — unset ANTHROPIC_BASE_URL to bypass |
| Slow first request | Model weights downloading (~500MB) — one-time cost |
RTK (Rust Token Killer) is a Rust binary that sits between shell commands and Claude Code, compressing verbose CLI output before it enters the context window.
Windows (pre-built binary):
# Download from GitHub releases
gh release download --repo rtk-ai/rtk --pattern "rtk-x86_64-pc-windows-msvc.zip" --dir /tmp
unzip /tmp/rtk-x86_64-pc-windows-msvc.zip -d /tmp/rtk-extract
cp /tmp/rtk-extract/rtk.exe ~/.local/bin/rtk.exe
macOS/Linux (pre-built binary):
curl -fsSL https://raw.githubusercontent.com/rtk-ai/rtk/main/install.sh | sh
From source (any platform with Rust):
cargo install --git https://github.com/rtk-ai/rtk
# Global (all projects) — recommended
rtk init -g
# Per-project only
rtk init
Platform behavior:
rtk)Both approaches produce identical token savings.
Prefix any command with rtk:
rtk git status # Compact status (62% savings)
rtk git diff # Ultra-condensed diff
rtk git log # Compact log
rtk npm install # Filtered install output (70-90%)
rtk npm run build # Compressed build output
rtk ls -la # Token-optimized directory listing
rtk docker ps # Compact container list
rtk kubectl get pods # Compressed k8s output
RTK is a transparent proxy — if it has a dedicated filter for the command, it compresses. If not, it passes through unchanged. rtk <anything> is always safe.
rtk --version # Should show version number
rtk gain # Show cumulative token savings
| Command Category | Compression | |-----------------|-------------| | Git (status, log, diff) | 59-80% | | GitHub CLI (pr, run, issue) | 26-87% | | Package managers (npm, pnpm) | 70-90% | | File operations (ls, read) | 60-75% | | Infrastructure (docker, k8s) | 85% | | Network (curl, wget) | 65-70% | | Average | 60-90% |
Serena is an MCP server that provides IDE-like code navigation tools backed by Language Server Protocol. Instead of reading entire files to find a function definition, Serena uses LSP to return only the symbol you need.
pip install uv# Add to Claude Code as a global MCP server
claude mcp add --scope user serena -- \
uvx --from git+https://github.com/oraios/serena \
serena start-mcp-server \
--context=claude-code \
--project-from-cwd
Flags explained:
--context=claude-code disables tools that duplicate CC's built-in capabilities--project-from-cwd auto-detects the project from CC's working directoryclaude mcp list 2>&1 | grep serena
# Should show: serena: ... ✓ Connected
Symbol navigation (the core value):
| Tool | Purpose |
|------|---------|
| find_symbol | Global symbol search via LSP (functions, classes, variables) |
| find_referencing_symbols | Find all references to a symbol across the codebase |
| get_symbols_overview | List top-level symbols in a file (like an IDE outline) |
| rename_symbol | Refactor-safe rename across the entire codebase |
| replace_symbol_body | Replace a function/class definition by name |
| insert_before_symbol | Insert code before a symbol definition |
| insert_after_symbol | Insert code after a symbol definition |
File and search:
| Tool | Purpose |
|------|---------|
| find_file | Find files by name/pattern |
| read_file | Read file contents |
| search_for_pattern | Regex search across project |
| list_dir | Directory listing |
Memory (cross-session project knowledge):
| Tool | Purpose |
|------|---------|
| write_memory | Store project facts for future sessions |
| read_memory | Retrieve stored project knowledge |
| list_memories | List all stored memory files |
Workflow:
| Tool | Purpose |
|------|---------|
| onboarding | Auto-discover project structure |
| activate_project | Switch active project |
| get_current_config | Show current Serena configuration |
Traditional approach (brute force):
Read entire 500-line file → find the one function → 3,000 tokens consumed
Serena approach (surgical):
find_symbol("handleAuth") → returns only that function → 200 tokens consumed
For large codebases, this difference compounds across every file interaction.
Serena supports 40+ languages via LSP, including: TypeScript, JavaScript, Python, Rust, Go, Java, C#, C/C++, Ruby, PHP, Swift, Kotlin, and more.
For a fresh machine, install all three layers in order:
# 1. Headroom (API compression)
pip install headroom-ai[code]
# 2. RTK (CLI compression)
gh release download --repo rtk-ai/rtk --pattern "rtk-x86_64-pc-windows-msvc.zip" --dir /tmp
unzip /tmp/rtk-x86_64-pc-windows-msvc.zip -d /tmp/rtk-extract
cp /tmp/rtk-extract/rtk.exe ~/.local/bin/rtk.exe
rtk init -g
# 3. Serena (LSP navigation)
pip install uv
claude mcp add --scope user serena -- \
uvx --from git+https://github.com/oraios/serena \
serena start-mcp-server \
--context=claude-code \
--project-from-cwd
Start a session with all layers active:
# Terminal 1
headroom proxy --llmlingua-device cpu --port 8787
# Terminal 2
set ANTHROPIC_BASE_URL=http://127.0.0.1:8787 # Windows
claude
RTK and Serena activate automatically (CLAUDE.md injection and MCP server).
# Headroom
curl http://127.0.0.1:8787/health
# RTK
rtk --version
rtk gain
# Serena
claude mcp list 2>&1 | grep serena
| Component | Windows Behavior |
|-----------|-----------------|
| Headroom | Use set ANTHROPIC_BASE_URL=... (not export) |
| RTK | Uses CLAUDE.md injection instead of CC hooks. Download .zip from releases, not the install script. Binary goes in ~/.local/bin/rtk.exe |
| Serena | Works identically — uv/uvx handle Windows natively |
| PATH | Ensure ~/.local/bin is in your PATH for RTK |
| Skill | Scope | When to Use | |-------|-------|-------------| | Token Optimization (this) | Full 3-layer stack setup and reference | Installing, configuring, or understanding the optimization stack | | Compress | Headroom-only troubleshooting | Proxy crashes, health checks, stats monitoring | | Context DB | SQLite fact store | Reducing token waste from repeatedly reading project context |
tools
Use when the user says 'save diary', 'log session', 'wrapping up', or at end of a productive session.
tools
Use when the user says 'submit to marketplace', 'publish my skill', 'share this skill', 'list on marketplace', 'submit plugin', 'publish to community', or needs to submit a skill or plugin to a community marketplace via PR. Do NOT use for building skills or writing plugin code.
development
Use when the user says 'write browser tests', 'test this page', 'playwright test', 'e2e test', 'end to end test', 'browser test', 'test the UI', or needs Playwright-based browser testing for a web application. Do NOT use for unit tests, API tests, or non-browser testing.
development
Use when the user says 'teach me', 'explain as you go', 'mentor mode', 'walk me through', 'help me learn', 'explain why', 'learning mode', or wants real-time plain language narration of decisions and tradeoffs while building. Do NOT use for code review or debugging.