claude-code-framework/essential/skills/operations/cost-optimizer/SKILL.md
Monitor and reduce costs for Claude API, cloud infrastructure, and multi-agent operations. Use when costs are rising, before scaling agents, or for monthly cost reviews. Triggers on "costs too high", "API spending", "token usage", "optimize costs", "budget", "reduce spend".
npx skillsauth add tokenized2027/claude-initilization-v7 cost-optimizerInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Track, analyze, and reduce costs for your AI agent infrastructure — Claude API, cloud services, and mini PC operations.
Running 15 autonomous agents burns through API credits fast. Without visibility and controls, you can blow through hundreds of dollars in a day on bad prompts or stuck agents.
Before optimizing, know what you're spending.
Claude API cost check:
# Check recent API usage (if using Anthropic API directly)
# Estimated cost per model:
# Claude Sonnet 4.5: $3/M input, $15/M output
# Claude Haiku 4.5: $0.80/M input, $4/M output
# Count tokens in recent agent logs
grep -rn "tokens_used" ~/claude-multi-agent/logs/ | \
awk -F'tokens_used:' '{sum+=$2} END {print "Total tokens:", sum}'
Docker resource check:
# See what's eating resources on the mini PC
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"
Optimize in this order (biggest savings first):
| Task Type | Model | Est. Cost/1K tasks | |-----------|-------|-------------------| | Routing/classification | Haiku | $0.05 | | Code generation | Sonnet | $1.50 | | Architecture decisions | Opus | $5.00 | | Simple formatting | Haiku | $0.02 |
Rule: Default to Haiku for everything. Escalate to Sonnet only for code gen. Use Opus only for complex architecture/planning.
System prompts for agents are repeated on every call. Cache them:
# In orchestrator.py — use prompt caching for agent system prompts
response = client.messages.create(
model="claude-sonnet-4-5-20250514",
max_tokens=4096,
system=[
{
"type": "text",
"text": AGENT_SYSTEM_PROMPT, # Long system prompt
"cache_control": {"type": "ephemeral"}
}
],
messages=messages
)
# Cached input tokens cost 90% less
Each agent call includes context. Trim it:
# In orchestrator.py
MAX_TOKENS_BY_TASK = {
"routing": 200,
"code_review": 1000,
"code_generation": 4000,
"architecture": 2000,
"formatting": 500,
}
Stop runaway agents:
# Per-agent daily budget (in estimated USD)
AGENT_DAILY_BUDGET = {
"frontend-developer": 5.00,
"backend-developer": 5.00,
"system-architect": 3.00,
"system-tester": 2.00,
"devops-engineer": 2.00,
}
# Per-task iteration limit
MAX_ITERATIONS_PER_TASK = 10 # If agent needs >10 LLM calls, flag for human review
Run this on the 1st of every month:
## Monthly Cost Review — [Month Year]
### API Costs
- Total API spend: $___
- Breakdown by agent: [list]
- Highest-cost task: [describe]
- Tokens wasted on failed/retried tasks: ___
### Infrastructure Costs
- Mini PC electricity: ~$___ (est. 40W × 24h × 30d)
- Domain/DNS: $___
- Any cloud services: $___
### Optimization Actions
- [ ] Review top 5 most expensive tasks — can any use Haiku?
- [ ] Check for stuck/looping agents in logs
- [ ] Verify prompt caching is active
- [ ] Review context window sizes
- [ ] Update token limits if needed
### Budget vs Actual
- Budget: $___/month
- Actual: $___/month
- Variance: ___
| Action | Savings | Effort | |--------|---------|--------| | Switch routing to Haiku | 80-90% on routing calls | Low | | Enable prompt caching | 60-90% on system prompts | Low | | Set max_tokens per task type | 20-40% on over-generation | Low | | Add iteration circuit breakers | Prevents runaway costs | Medium | | Trim context to relevant files only | 30-50% on input tokens | Medium |
✅ Use cost-optimizer when:
❌ Don't use for:
development
Methodical debugging using reproducible steps, instrumentation, and root-cause analysis. Use when something is broken and you don't know why. Triggers on "bug", "broken", "not working", "error", "fails intermittently", "regression", "unexpected behavior".
development
Optimize prompts for Claude Code agents, API calls, and multi-agent orchestration. Use when writing system prompts, agent instructions, or refining LLM interactions. Triggers on "improve prompt", "write a prompt", "agent instructions", "system prompt", "prompt not working", "LLM output quality".
tools
Structured ideation and design review before any creative or constructive work. Use before building features, components, architecture, dashboards, or automation workflows. Triggers on "plan this", "design this", "brainstorm", "think through", "what should we build", "how should I approach".
testing
Generates test files for components and functions with setup, basic tests, and mocks. Use when user says "add tests", "create test", "test this component", or mentions testing.