Practical guide to reduce token consumption, lower AI costs, and improve Claude Code performance through file organization, context management, and strategic model selection. Backed by real experiment data. Use when user mentions "optimize tokens", "reduce costs", "Claude is slow", "too many tokens", "token budget", "context window full", "organize codebase for AI", or "reduce token consumption". Do NOT use for general coding questions, debugging, or performance optimization unrelated to AI token usage.
A comprehensive toolkit to reduce token consumption, lower AI costs, and improve Claude Code performance. Every recommendation is backed by real experiment data from a controlled comparison of monolithic vs modular code architectures.
npx skills add alexismunoz1/token-optimizer
Or manually:
cp -r token-optimizer ~/.claude/skills/
The single highest-impact optimization. Small, focused files reduce token consumption by 18.2% and noise by 92% on focused tasks (the majority of daily development work).
Core rules:
Real example: Fixing an email validation bug required reading 814 lines in a monolithic file (49,466 tokens) vs only 67 lines in a modular setup (40,447 tokens) — 18.2% savings, 92% less noise.
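The percentages in this example follow directly from the quoted line and token counts. A minimal sketch of the arithmetic, where `percent_reduction` is a hypothetical helper name and not part of the skill:

```python
def percent_reduction(before: float, after: float) -> float:
    """Return the percentage drop from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100


# Figures quoted in the example above.
token_savings = percent_reduction(49_466, 40_447)  # tokens: monolithic vs modular
noise_reduction = percent_reduction(814, 67)       # lines read: monolithic vs modular

print(f"token savings: {token_savings:.1f}%")
print(f"noise reduction: {noise_reduction:.1f}%")
```

The same helper can be applied to any before/after pair of readings taken with /context.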
For naming conventions, avoid/prefer tables, and project structure templates, see references/file-organization-guide.md.
A well-structured CLAUDE.md can reduce token consumption by 50-70%. Most projects have bloated CLAUDE.md files that load unnecessary context on every interaction.
Key principles:
For a ready-to-use optimized template, see references/claude-md-template.md.
Token waste often comes from accumulated irrelevant context, not from individual operations.
Essential commands:
| Command | When to Use | Effect |
|---------|-------------|--------|
| /clear | Switching tasks, after major corrections | Resets context completely |
| /compact | Long conversation (>50 exchanges) | Compresses history, keeps essentials |
| /context | Diagnosing high token use | Shows what's consuming tokens |
Lazy loading: Don't front-load all information. One project achieved 54% reduction in initial tokens (7,584 → 3,434) by keeping only triggers in CLAUDE.md and loading details on demand.
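The lazy-loading idea can be illustrated outside of Claude Code with a small sketch. It assumes a short base context that is always present and a set of trigger keywords that pull in detail only when a task mentions them. The function name, triggers, and detail strings are all hypothetical, and this is an analogy for the pattern, not a description of how CLAUDE.md is actually processed.

```python
from typing import Callable, Dict, List


def build_context(
    task: str,
    base: str,
    loaders: Dict[str, Callable[[], str]],
) -> List[str]:
    """Return the base context plus any details whose trigger appears in the task."""
    context = [base]
    lowered = task.lower()
    for trigger, load in loaders.items():
        if trigger in lowered:
            context.append(load())  # detail is produced only when the trigger matches
    return context


# Hypothetical triggers and detail text, for illustration only.
loaders = {
    "deploy": lambda: "Deployment details: ...",
    "database": lambda: "Database conventions: ...",
}

context = build_context(
    "Fix the deploy script",
    base="Short project summary with trigger keywords only.",
    loaders=loaders,
)
print(len(context), "context blocks loaded")
```

A task that mentions no trigger loads only the base summary, which is where the reduction in initial tokens comes from.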
For advanced strategies, subagent patterns, and MCP management, see references/context-management-guide.md.
Choosing the right model per task type is one of the easiest cost savings to implement.
| Task Type | Model | Why |
|-----------|-------|-----|
| 80% of daily tasks | Sonnet | Best cost/performance ratio |
| Complex architecture | Opus | Deeper reasoning needed |
| Simple/quick tasks | Haiku | Up to 18x cheaper than Opus |
Default to Sonnet. Escalate to Opus only for genuinely complex problems. Use Haiku for simple tasks, tests, and searches.
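The selection rule can be written as a small lookup with Sonnet as the fallback. This is a sketch only: the task-type keys are hypothetical labels, and the returned strings are the tier names used in this guide, not API model identifiers. The second helper shows how "up to 18x cheaper" translates into a percentage saving.

```python
# Task-type keys are hypothetical labels chosen for illustration.
MODEL_BY_TASK = {
    "simple": "Haiku",
    "test": "Haiku",
    "search": "Haiku",
    "complex_architecture": "Opus",
}


def choose_model(task_type: str) -> str:
    """Default to Sonnet; switch tiers only for explicitly listed task types."""
    return MODEL_BY_TASK.get(task_type, "Sonnet")


def savings_percent(times_cheaper: float) -> float:
    """Cost reduction implied by a model being `times_cheaper` times cheaper."""
    return (1 - 1 / times_cheaper) * 100


print(choose_model("bug_fix"))
print(choose_model("complex_architecture"))
print(f"{savings_percent(18):.1f}% cheaper at an 18x price ratio")
```

Keeping Sonnet as the fallback means an unrecognized task type never escalates to the most expensive tier by accident.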
MCP Management:
Subagents for verbose tasks: Use the Task tool for operations that generate large output (test runs, builds, searches). The verbose output stays in the subagent's context — only the summary returns to your main conversation.
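The principle behind this can be shown with a plain-Python analogy: run the noisy operation in a separate process, keep its full output there, and hand back only a short summary. This is not the Task tool's API. `run_and_summarize` is a hypothetical helper, and the 500-line command stands in for a test run or build.

```python
import subprocess
import sys
from typing import List


def run_and_summarize(cmd: List[str], tail: int = 3) -> str:
    """Run a command and return a short summary instead of its full output."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    lines = result.stdout.splitlines()
    status = "ok" if result.returncode == 0 else f"failed ({result.returncode})"
    kept = lines[-tail:]  # only the last few lines leave this function
    header = f"{status}; {len(lines)} lines of output; last {len(kept)}:"
    return "\n".join([header, *kept])


# Stand-in for a verbose test run or build: prints 500 lines.
noisy = [sys.executable, "-c", "for i in range(500): print('line', i)"]
summary = run_and_summarize(noisy)
print(summary)
```

The caller sees four lines instead of 500, which mirrors how only the subagent's summary reaches the main conversation.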
Apply these in order of impact:
- /context first → establishes your baseline before any changes
- /clear between tasks → eliminates irrelevant context
- /compact in long conversations → compresses history

Results from our controlled experiment with an 814-line TypeScript e-commerce app:

| Optimization | Impact |
|-------------|--------|
| Modular files (focused tasks) | -18.2% tokens |
| Noise reduction (lines processed) | -92% |
| Optimized CLAUDE.md | -50-70% consumption |
| Lazy loading context | -54% initial tokens |
| Haiku vs Opus (simple tasks) | -94% cost |
Key insight: Focused tasks (bug fixes, specific changes — ~80% of daily work) benefit enormously from modular code. Cross-cutting tasks show minimal difference at small scale (+1-5%) but modular wins decisively at 5,000+ lines.
Note on scale: These results are from a controlled experiment with an 814-line codebase. At larger scales (5,000+ lines), the savings from modular architecture are even more significant because monolithic files start hitting context window limits while modular files maintain constant size (35-146 lines each).
For the complete experiment methodology and raw data, see references/metrics-report.md.
When activated, follow this process:
- Run /context. Without a baseline number, you can't prove any optimization worked. This step is not optional.
- Run /context again to measure improvement.

Important guidelines:
- /context
- /context to see current token consumption breakdown
- (utils.ts, helpers.ts, index.ts with logic)
- /context to identify what's consuming tokens
- /clear between tasks and /compact for long sessions

| Problem | Cause | Solution |
|---------|-------|----------|
| No improvement after optimizations | No baseline measurement taken | Run /context before AND after each change |
| Don't know how many tokens I'm using | Token consumption not visible by default | Use /context to see the full breakdown |
| /compact doesn't reduce enough | Compresses but keeps essentials | Use /clear if prior context is irrelevant |
| Cross-cutting tasks slower after splitting | Multiple reads needed (1-5% more tokens) | Expected and marginal — focused tasks (80% of work) still save 18%+ |
- references/file-organization-guide.md: Naming conventions, project structure templates, and implementation checklist
- references/context-management-guide.md: Lazy loading, subagents, MCP management, and model selection strategies
- references/metrics-report.md: Complete experiment data and methodology with raw numbers
- references/claude-md-template.md: Ready-to-use optimized CLAUDE.md template