.claude/skills/auto-claude-optimization/SKILL.md
Auto-Claude performance optimization and cost management. Use when optimizing token usage, reducing API costs, improving build speed, or tuning agent performance.
Performance tuning, cost reduction, and efficiency improvements.
| Metric | Impact | Optimization |
|--------|--------|--------------|
| API latency | Build speed | Model selection, caching |
| Token usage | Cost | Prompt efficiency, context limits |
| Memory queries | Speed | Embedding model, index tuning |
| Build iterations | Time | Spec quality, QA settings |
| Model | Speed | Cost | Quality | Use Case |
|-------|-------|------|---------|----------|
| claude-opus-4-5-20251101 | Slow | High | Best | Complex features |
| claude-sonnet-4-5-20250929 | Fast | Medium | Good | Standard features |
# Override model in .env
AUTO_BUILD_MODEL=claude-sonnet-4-5-20250929
Configure thinking budget per agent:
| Agent | Default | Recommended |
|-------|---------|-------------|
| Spec creation | 16000 | Keep default for quality |
| Planning | 5000 | Reduce to 3000 for speed |
| Coding | 0 | Keep disabled |
| QA Review | 10000 | Reduce to 5000 for speed |
# In agent configuration
max_thinking_tokens=5000 # or None to disable
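The per-agent budgets in the table can be captured in a small lookup. The following is a minimal sketch only: the dictionary keys and the `thinking_budget` helper are hypothetical names chosen for illustration, not part of Auto-Claude's configuration API. The numbers are copied from the table, and `None` stands for "thinking disabled", as in the comment above.

```python
from typing import Optional

# Defaults and speed-oriented values copied from the table above.
# The keys and the helper name are hypothetical, for illustration only.
DEFAULT_THINKING_TOKENS = {
    "spec_creation": 16000,
    "planning": 5000,
    "coding": 0,
    "qa_review": 10000,
}
SPEED_OVERRIDES = {"planning": 3000, "qa_review": 5000}


def thinking_budget(agent: str, prefer_speed: bool = False) -> Optional[int]:
    """Return a max_thinking_tokens value, or None when thinking is disabled."""
    budget = DEFAULT_THINKING_TOKENS[agent]
    if prefer_speed:
        # Only agents listed in SPEED_OVERRIDES get a reduced budget.
        budget = SPEED_OVERRIDES.get(agent, budget)
    return budget if budget > 0 else None


if __name__ == "__main__":
    for name in DEFAULT_THINKING_TOKENS:
        print(name, thinking_budget(name, prefer_speed=True))
```

Spec creation keeps its default even in speed mode, which matches the table's advice to preserve quality there.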
Smaller spec files
# Keep specs concise
# Bad: 5000-word spec
# Good: 500-word spec with clear criteria
Limit codebase scanning
# In context/builder.py
MAX_CONTEXT_FILES = 50 # Reduce from 100
Use targeted searches
# Instead of full codebase scan
# Focus on relevant directories
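One way to combine a file cap with targeted directories is sketched below. It assumes nothing about how `context/builder.py` actually works: `collect_context_files` and its parameters are hypothetical, and only the `MAX_CONTEXT_FILES = 50` value comes from the setting above.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

MAX_CONTEXT_FILES = 50  # same cap as the setting above


def collect_context_files(
    root: Path,
    relevant_dirs: Iterable[str],
    suffixes: Tuple[str, ...] = (".py",),
    limit: int = MAX_CONTEXT_FILES,
) -> List[Path]:
    """Gather source files only from the named subdirectories, stopping at `limit`."""
    found: List[Path] = []
    for rel in relevant_dirs:
        base = root / rel
        if not base.is_dir():
            continue  # silently skip directories that do not exist
        # Sorting keeps the selection deterministic between runs.
        for path in sorted(base.rglob("*")):
            if path.is_file() and path.suffix in suffixes:
                found.append(path)
                if len(found) >= limit:
                    return found
    return found
```

For example, `collect_context_files(Path("."), ["apps/backend"])` would look only under that directory instead of walking the whole repository.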
Optimize system prompts in apps/backend/prompts/:
<!-- Bad: Verbose -->
You are an expert software developer who specializes in building
high-quality, production-ready applications. You have extensive
experience with many programming languages and frameworks...
<!-- Good: Concise -->
Expert full-stack developer. Build production-quality code.
Follow existing patterns. Test thoroughly.
# Use efficient embedding model
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
# Or offline with smaller model
OLLAMA_EMBEDDING_MODEL=all-minilm
OLLAMA_EMBEDDING_DIM=384
# Enable more parallel agents (default: 4)
MAX_PARALLEL_AGENTS=8
# Limit QA loop iterations
MAX_QA_ITERATIONS=10 # Default: 50
# Skip QA for quick iterations
python run.py --spec 001 --skip-qa
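The effect of the `MAX_QA_ITERATIONS` cap can be illustrated with a bounded review loop. This is a sketch under assumptions, not Auto-Claude's actual QA implementation: `qa_loop` and the `review` callback are hypothetical, and only the cap value of 10 is taken from the setting above.

```python
from typing import Callable

MAX_QA_ITERATIONS = 10  # value from the setting above


def qa_loop(review: Callable[[int], bool], max_iterations: int = MAX_QA_ITERATIONS) -> int:
    """Run review passes until one approves.

    Returns the number of passes used, or -1 if the cap is reached first.
    """
    for attempt in range(1, max_iterations + 1):
        if review(attempt):
            return attempt
    return -1


if __name__ == "__main__":
    # A stand-in reviewer that approves on the third pass.
    print(qa_loop(lambda attempt: attempt >= 3))
```

A lower cap bounds the worst-case cost of a build that never converges, at the risk of stopping before a fixable issue is resolved.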
# Force simple complexity for quick tasks
python spec_runner.py --task "Fix typo" --complexity simple
# Skip research phase
SKIP_RESEARCH_PHASE=true python spec_runner.py --task "..."
# Reduce timeout for faster failure detection
API_TIMEOUT_MS=120000 # 2 minutes (default: 10 minutes)
# Enable cost tracking
ENABLE_COST_TRACKING=true
# View usage report
python usage_report.py --spec 001
Use cheaper models for simple tasks
# For simple specs
AUTO_BUILD_MODEL=claude-sonnet-4-5-20250929 python spec_runner.py --task "..."
Limit context window
MAX_CONTEXT_TOKENS=50000 # Reduce from 100000
Batch similar tasks
# Create specs together, run together
python spec_runner.py --task "Add feature A"
python spec_runner.py --task "Add feature B"
python run.py --spec 001
python run.py --spec 002
Use local models for memory
# Ollama for memory (free)
GRAPHITI_LLM_PROVIDER=ollama
GRAPHITI_EMBEDDER_PROVIDER=ollama
| Operation | Estimated Tokens | Cost (Opus) | Cost (Sonnet) |
|-----------|------------------|-------------|---------------|
| Simple spec | 10k | ~$0.30 | ~$0.06 |
| Standard spec | 50k | ~$1.50 | ~$0.30 |
| Complex spec | 200k | ~$6.00 | ~$1.20 |
| Build (simple) | 50k | ~$1.50 | ~$0.30 |
| Build (standard) | 200k | ~$6.00 | ~$1.20 |
| Build (complex) | 500k | ~$15.00 | ~$3.00 |
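The figures in the table scale linearly with token count, so they can be reproduced with a single blended rate per model. The sketch below derives those rates from the table itself (10k tokens at about $0.30 for Opus and $0.06 for Sonnet); they are this document's rough estimates, not official pricing, and `estimate_cost` is a hypothetical helper.

```python
# Blended USD rates per 1,000 tokens, implied by the table above.
# These are rough estimates from this document, not official pricing.
RATE_PER_1K = {"opus": 0.03, "sonnet": 0.006}


def estimate_cost(tokens: int, model: str) -> float:
    """Estimated USD cost for a token count at the table's implied rate."""
    return round(tokens / 1000 * RATE_PER_1K[model], 2)


if __name__ == "__main__":
    for label, tokens in [("Simple spec", 10_000), ("Build (complex)", 500_000)]:
        print(label, estimate_cost(tokens, "opus"), estimate_cost(tokens, "sonnet"))
```

At these rates Sonnet costs one fifth of Opus for the same token count, which is why model selection dominates the other cost levers.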
# Faster embeddings
OPENAI_EMBEDDING_MODEL=text-embedding-3-small # 1536 dim, fast
# Higher quality (slower)
OPENAI_EMBEDDING_MODEL=text-embedding-3-large # 3072 dim
# Offline (fastest, free)
OLLAMA_EMBEDDING_MODEL=all-minilm
OLLAMA_EMBEDDING_DIM=384
# Limit search results
memory.search("query", limit=10) # Instead of 100
# Use semantic caching
ENABLE_MEMORY_CACHE=true
# Compact database periodically
python -c "from integrations.graphiti.memory import compact_database; compact_database()"
# Clear old episodes
python query_memory.py --cleanup --older-than 30d
High-quality specs reduce iterations:
# Good spec (fewer iterations)
## Acceptance Criteria
- [ ] User can log in with email/password
- [ ] Invalid credentials show error message
- [ ] Successful login redirects to /dashboard
- [ ] Session persists for 24 hours
# Bad spec (more iterations)
## Acceptance Criteria
- [ ] Login works
Optimal subtask size:
Let agents spawn subagents for parallel execution:
Main Coder
├── Subagent 1: Frontend (parallel)
├── Subagent 2: Backend (parallel)
└── Subagent 3: Tests (parallel)
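The fan-out in the diagram can be sketched with `asyncio`, using a semaphore to enforce a parallelism cap like `MAX_PARALLEL_AGENTS`. This is an illustration under assumptions rather than how Auto-Claude schedules subagents: `run_subagent` and `run_all` are hypothetical, and the sleep is a placeholder for real agent work.

```python
import asyncio
from typing import List

MAX_PARALLEL_AGENTS = 4  # the default mentioned earlier in this document


async def run_subagent(name: str, limiter: asyncio.Semaphore) -> str:
    """Hypothetical subagent: waits for a free slot, then simulates work."""
    async with limiter:
        await asyncio.sleep(0.01)  # placeholder for the real agent call
        return f"{name}: done"


async def run_all(names: List[str], max_parallel: int = MAX_PARALLEL_AGENTS) -> List[str]:
    """Run all subagents concurrently, never more than `max_parallel` at once."""
    limiter = asyncio.Semaphore(max_parallel)
    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(run_subagent(n, limiter) for n in names))


if __name__ == "__main__":
    print(asyncio.run(run_all(["Frontend", "Backend", "Tests"])))
```

Raising the cap shortens wall-clock time only while the subtasks are independent; subtasks that touch the same files still need to be serialized.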
# Performance-focused configuration
AUTO_BUILD_MODEL=claude-sonnet-4-5-20250929
API_TIMEOUT_MS=180000
MAX_PARALLEL_AGENTS=6
# Memory optimization
GRAPHITI_LLM_PROVIDER=ollama
GRAPHITI_EMBEDDER_PROVIDER=ollama
OLLAMA_LLM_MODEL=llama3.2:3b
OLLAMA_EMBEDDING_MODEL=all-minilm
OLLAMA_EMBEDDING_DIM=384
# Reduce verbosity
DEBUG=false
ENABLE_FANCY_UI=false
# Use the C library allocator instead of pymalloc
# (changes how memory is allocated; it does not cap memory usage)
export PYTHONMALLOC=malloc
# Set max file descriptors
ulimit -n 4096
# Time a build
time python run.py --spec 001
# Compare models
time AUTO_BUILD_MODEL=claude-opus-4-5-20251101 python run.py --spec 001
time AUTO_BUILD_MODEL=claude-sonnet-4-5-20250929 python run.py --spec 001
# Monitor memory
watch -n 1 'ps aux | grep "[p]ython" | head -5'
# Profile script
python -m cProfile -o profile.stats run.py --spec 001
python -c "import pstats; p = pstats.Stats('profile.stats'); p.sort_stats('cumulative').print_stats(20)"
Switch to Sonnet for most tasks
AUTO_BUILD_MODEL=claude-sonnet-4-5-20250929
Use Ollama for memory
GRAPHITI_LLM_PROVIDER=ollama
GRAPHITI_EMBEDDER_PROVIDER=ollama
Skip QA for prototypes
python run.py --spec 001 --skip-qa
Force simple complexity for small tasks
python spec_runner.py --task "..." --complexity simple
apps/backend/prompts/