.claude/skills/claude-cost-optimization/SKILL.md
Comprehensive cost tracking and optimization for production Claude deployments. Covers Admin API usage tracking, efficiency measurement, ROI calculation, optimization patterns (caching, batching, model selection, context editing, effort parameter), and cost prediction. Use when tracking costs, optimizing token usage, measuring efficiency, calculating ROI, reducing production expenses, or implementing cost-effective Claude integrations.
Install this skill globally with one command: `npx skillsauth add adaptationio/skrillz claude-cost-optimization`. Works with Claude Code, Cursor, and Windsurf.
Cost optimization is critical for production Claude deployments. A single inefficiently designed agent can cost hundreds or thousands of dollars per month, while an optimized implementation can cost 10-90% less for identical functionality. This skill provides a workflow for measuring, analyzing, and optimizing token costs.
Why This Matters:
Key Savings Available:
Use claude-cost-optimization when you need to:
Establish your current cost baseline before optimization.
What to Measure:
- Total monthly tokens (input + output)
- Cost breakdown by model
- Top 10 most expensive operations
- Average tokens per request
- Peak usage times and patterns
How to Measure (using Admin API):
```python
from anthropic import Anthropic

client = Anthropic()

# Get monthly usage
# NOTE: confirm that this method exists in your installed SDK version
# before relying on it.
response = client.beta.admin.usage_metrics.list(
    limit=30,
    sort_by="date",
)

total_input_tokens = sum(m.input_tokens for m in response.data)
total_output_tokens = sum(m.output_tokens for m in response.data)

# Opus 4.5 pricing: $5 per million input tokens, $25 per million output tokens
total_cost = (total_input_tokens * 0.000005) + (total_output_tokens * 0.000025)
print(f"Monthly cost: ${total_cost:.2f}")
```
Where to Start: See references/usage-tracking.md for detailed Admin API integration
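The first four items under "What to Measure" can be computed from per-request logs. The following is a minimal sketch, not the skill's own implementation: the `UsageRecord` shape, the `operation` label, and the short model keys are all hypothetical and should be adapted to whatever your logging captures. Prices are the Opus 4.5 and Sonnet 4.5 rates quoted in this document.

```python
from collections import defaultdict
from dataclasses import dataclass

# USD per million tokens (input, output). Keys are illustrative labels,
# not API model identifiers.
PRICES = {
    "opus-4.5": (5.00, 25.00),
    "sonnet-4.5": (3.00, 15.00),
}


@dataclass
class UsageRecord:
    """Hypothetical per-request log entry; adapt the fields to your own logs."""
    model: str
    operation: str
    input_tokens: int
    output_tokens: int


def record_cost(rec: UsageRecord) -> float:
    """Cost of one request in USD."""
    input_price, output_price = PRICES[rec.model]
    return (rec.input_tokens * input_price + rec.output_tokens * output_price) / 1_000_000


def summarize_baseline(records: list[UsageRecord], top_n: int = 10) -> dict:
    """Aggregate total tokens, cost by model, top operations, and average request size."""
    cost_by_model = defaultdict(float)
    cost_by_operation = defaultdict(float)
    total_tokens = 0
    for rec in records:
        cost = record_cost(rec)
        cost_by_model[rec.model] += cost
        cost_by_operation[rec.operation] += cost
        total_tokens += rec.input_tokens + rec.output_tokens
    # Most expensive operations first
    top_operations = sorted(
        cost_by_operation.items(), key=lambda kv: kv[1], reverse=True
    )[:top_n]
    return {
        "total_tokens": total_tokens,
        "total_cost": sum(cost_by_model.values()),
        "cost_by_model": dict(cost_by_model),
        "top_operations": top_operations,
        "avg_tokens_per_request": total_tokens / len(records) if records else 0.0,
    }
```

Peak usage times are not covered here; adding a timestamp field to the record and grouping by hour would be one way to extend it.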
Understand where your costs actually come from.
Identify Expensive Patterns:
Create Cost Breakdown (example):
Agent reasoning loops: 45% of costs
File analysis: 25% of costs
Web search: 15% of costs
Classification tasks: 10% of costs
Other: 5% of costs
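A breakdown like the one above is just each category's share of total spend. As a rough sketch, assuming you already have a cost total per category (the dollar figures in the usage example are invented purely to reproduce the percentages shown):

```python
def cost_shares(cost_by_category: dict[str, float]) -> dict[str, float]:
    """Return each category's percentage of total cost, largest first."""
    total = sum(cost_by_category.values())
    if total == 0:
        return {name: 0.0 for name in cost_by_category}
    ranked = sorted(cost_by_category.items(), key=lambda kv: kv[1], reverse=True)
    return {name: round(100 * cost / total, 1) for name, cost in ranked}


# Illustrative monthly costs in USD (made-up numbers)
shares = cost_shares({
    "Agent reasoning loops": 450.0,
    "File analysis": 250.0,
    "Web search": 150.0,
    "Classification tasks": 100.0,
    "Other": 50.0,
})
for name, pct in shares.items():
    print(f"{name}: {pct}% of costs")
```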
Key Metrics to Calculate:
Apply targeted optimizations to your biggest cost drivers.
Effort Parameter (if using Opus 4.5):
Context Editing (for long conversations):
Tool Optimization (for large tool sets):
Prompt Caching (for repeated content):
Model Selection:
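For model selection, one simple approach is to route each request to the cheapest tier expected to handle it. The sketch below is an assumption-laden illustration: the task categories and the mapping are hypothetical (loosely following the "Best For" guidance in this document's pricing table), and the model names are short labels rather than API model identifiers.

```python
# Hypothetical mapping from task type to model tier.
MODEL_BY_TASK = {
    "classification": "haiku-4.5",      # simple, high-volume work
    "extraction": "haiku-4.5",
    "summarization": "sonnet-4.5",      # balanced performance/cost
    "coding": "opus-4.5",               # complex reasoning, agents, coding
    "agentic_reasoning": "opus-4.5",
}

# Fallback for task types that have not been categorized yet.
DEFAULT_MODEL = "sonnet-4.5"


def select_model(task_type: str) -> str:
    """Pick a model tier for a task type, falling back to the default."""
    return MODEL_BY_TASK.get(task_type, DEFAULT_MODEL)
```

Validate any routing table against your own quality metrics before moving traffic to a cheaper tier.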
Monitor cost reductions and efficiency gains after optimizations.
Metrics to Track:
Measurement Period: Track for 1-2 weeks per optimization to see impact
Example Impact:
Optimization: Client-side compaction on long research tasks
Before: 450K tokens/request, $11.25 cost
After: 180K tokens/request, $4.50 cost
Savings: 60% cost reduction
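The savings figure in this example is the relative drop in cost per request. A small helper, written here as a sketch (the function name is hypothetical), makes the arithmetic explicit:

```python
def savings_percent(cost_before: float, cost_after: float) -> float:
    """Percentage reduction from cost_before to cost_after."""
    if cost_before <= 0:
        raise ValueError("cost_before must be positive")
    return 100 * (cost_before - cost_after) / cost_before


# Figures from the example above: $11.25 before, $4.50 after
print(f"Savings: {savings_percent(11.25, 4.50):.0f}% cost reduction")
```

The same function works on token counts (450K before, 180K after), which gives the same percentage here because the price per token is unchanged.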
Calculate business value of your optimizations.
ROI Calculation:
Monthly Savings = (Current Daily Cost × 30) - (Optimized Daily Cost × 30)
Implementation Hours = Time to implement optimizations
Cost per Hour = $100-300 (your engineering cost)
Implementation Cost = Implementation Hours × Cost per Hour
Payback Period (months) = Implementation Cost / Monthly Savings
ROI Example:
Monthly savings: $500/month
Implementation: 8 hours
Cost per hour: $150
Implementation cost: $1,200
Payback period: 2.4 months
First year ROI: 400%
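The ROI example can be reproduced in a few lines. This is a sketch under one assumption the text does not state explicitly: first-year ROI is taken to be net gain over twelve months divided by implementation cost, which is the definition that yields the 400% shown. The function name is hypothetical.

```python
def payback_and_roi(
    monthly_savings: float,
    implementation_hours: float,
    cost_per_hour: float,
    months: int = 12,
) -> dict:
    """Compute implementation cost, payback period, and ROI over `months`."""
    if monthly_savings <= 0:
        raise ValueError("monthly_savings must be positive")
    implementation_cost = implementation_hours * cost_per_hour
    payback_months = implementation_cost / monthly_savings
    # Net gain = savings over the period minus the one-time implementation cost
    net_gain = monthly_savings * months - implementation_cost
    roi_percent = 100 * net_gain / implementation_cost
    return {
        "implementation_cost": implementation_cost,
        "payback_months": payback_months,
        "roi_percent": roi_percent,
    }


result = payback_and_roi(monthly_savings=500, implementation_hours=8, cost_per_hour=150)
print(f"Implementation cost: ${result['implementation_cost']:,.0f}")
print(f"Payback period: {result['payback_months']:.1f} months")
print(f"First year ROI: {result['roi_percent']:.0f}%")
```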
Get started with Admin API cost tracking in 5 minutes:
```python
import anthropic
from datetime import datetime, timedelta

client = anthropic.Anthropic()


def get_monthly_costs():
    """Get token costs for the last 30 days."""
    # Reporting window, used only for the printed header; the call below
    # relies on limit=30 rather than on these dates
    now = datetime.now()
    thirty_days_ago = now - timedelta(days=30)

    # NOTE: confirm that this method exists in your installed SDK version
    response = client.beta.admin.usage_metrics.list(
        limit=30,
        sort_by="date",
    )

    total_input = sum(m.input_tokens for m in response.data)
    total_output = sum(m.output_tokens for m in response.data)

    # Opus 4.5 pricing: $5/M input, $25/M output
    input_cost = total_input * 0.000005
    output_cost = total_output * 0.000025
    total_cost = input_cost + output_cost

    print(f"Last 30 days ({thirty_days_ago:%Y-%m-%d} to {now:%Y-%m-%d}):")
    print(f"  Input tokens: {total_input:,}")
    print(f"  Output tokens: {total_output:,}")
    print(f"  Input cost: ${input_cost:.2f}")
    print(f"  Output cost: ${output_cost:.2f}")
    print(f"  Total cost: ${total_cost:.2f}")

    return {
        "input_tokens": total_input,
        "output_tokens": total_output,
        "input_cost": input_cost,
        "output_cost": output_cost,
        "total_cost": total_cost,
    }


# Run the function
costs = get_monthly_costs()
```
Calculate the business value of your Claude implementation:
```python
from typing import Optional


def calculate_roi(
    monthly_cost: float,
    monthly_transactions: int,
    cost_before_claude: Optional[float] = None,
    quality_improvement: float = 1.0,
) -> dict:
    """Calculate ROI metrics for a Claude implementation."""
    cost_per_transaction = monthly_cost / monthly_transactions

    metrics = {
        "monthly_cost": monthly_cost,
        "monthly_transactions": monthly_transactions,
        "cost_per_transaction": cost_per_transaction,
    }

    # If you had costs before Claude (manual process, previous tool, etc.)
    if cost_before_claude is not None:
        savings = cost_before_claude - monthly_cost
        # Savings expressed as a percentage of the previous cost
        roi_percentage = (savings / cost_before_claude) * 100
        metrics["previous_cost"] = cost_before_claude
        metrics["monthly_savings"] = savings
        metrics["roi_percentage"] = roi_percentage

    # Account for quality improvements (1.5 means results are 50% better)
    effective_cost = monthly_cost / quality_improvement
    metrics["quality_adjusted_cost"] = effective_cost

    return metrics


# Example: Research agent replacing manual research
result = calculate_roi(
    monthly_cost=500,            # Claude costs
    monthly_transactions=1000,   # Requests processed
    cost_before_claude=3000,     # Manual research was $3k/month
    quality_improvement=1.5,     # Claude results are 50% better
)

print(f"Cost per transaction: ${result['cost_per_transaction']:.4f}")
print(f"Monthly savings: ${result['monthly_savings']:.2f}")
print(f"ROI: {result['roi_percentage']:.0f}%")
```
Current Claude Model Pricing (as of November 2025, per million tokens):

| Model | Input | Output | Best For |
|-------|-------|--------|----------|
| Opus 4.5 | $5/M | $25/M | Complex reasoning, agents, coding |
| Sonnet 4.5 | $3/M | $15/M | Balanced performance/cost |
| Haiku 4.5 | $1/M | $5/M | Simple tasks, high volume |

Cost Impact of Optimization Techniques:

| Technique | Savings | Implementation Difficulty |
|-----------|---------|--------------------------|
| Effort parameter (medium) | 20-40% | Easy (add 2 lines) |
| Effort parameter (low) | 50-70% | Easy (add 2 lines) |
| Context editing | 60-90% | Medium (requires setup) |
| Tool optimization | 37-85% | Medium (architecture change) |
| Prompt caching | Up to 90% (on cached input tokens) | Hard (infrastructure) |
| Model selection | 50-75% | Hard (architecture change) |

Example Cost Comparison (1M transactions/month):

Scenario: Classification task

Opus 4.5, high effort:
- Input: 50M tokens @ $5/M = $250
- Output: 10M tokens @ $25/M = $250
- Total: $500/month

Opus 4.5, low effort:
- Input: 50M tokens @ $5/M = $250
- Output: 5M tokens @ $25/M = $125
- Total: $375/month (25% savings)

Haiku 4.5 (same token volumes as the Opus 4.5 high-effort case):
- Input: 50M tokens @ $1/M = $50
- Output: 10M tokens @ $5/M = $50
- Total: $100/month (80% savings)
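Each scenario in the comparison is the same two-term sum with different inputs. As a sketch, the helper below (a hypothetical name) takes token volumes in millions and per-million prices as arguments, so it can be checked against whichever price table is current. The usage lines reproduce the two Opus 4.5 scenarios.

```python
def scenario_cost(
    input_mtok: float,
    output_mtok: float,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Monthly cost in USD given token volumes (millions) and per-million prices."""
    return input_mtok * input_price_per_mtok + output_mtok * output_price_per_mtok


# Opus 4.5 at $5/M input and $25/M output
high_effort = scenario_cost(50, 10, 5.0, 25.0)
low_effort = scenario_cost(50, 5, 5.0, 25.0)
savings = 100 * (high_effort - low_effort) / high_effort

print(f"High effort: ${high_effort:.0f}/month")
print(f"Low effort:  ${low_effort:.0f}/month ({savings:.0f}% savings)")
```

Note that lowering effort only reduces the output term in this scenario, which is why a 50% cut in output tokens yields a 25% cut in total cost.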
```
START: Have high costs?
  ↓
Q1: Do you know what's causing the costs?
  NO  → Go to Step 2: Analyze Cost Drivers
  YES → Q2: Have you tried the effort parameter (Opus 4.5)?
    NO  → Apply effort parameter (medium/low)
          Expect 20-70% savings, 2-4 hours implementation
    YES → Q3: Do you have long conversations (>50K tokens)?
      NO  → Q4: Do you have 10+ tools in your agents?
        NO  → Q5: Can you cache repeated content?
          YES → Implement prompt caching
                Expect 90% savings on cached content
          NO  → Consider model selection
                Expect 2-5x cost reduction
        YES → Implement tool search + deferred loading
              Expect 85% context savings
      YES → Implement context editing
            Expect 60-90% savings on long tasks
```
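The decision tree above can also be expressed as a function, which is handy if you want to run the same triage across many services. This is an illustrative sketch; the function and parameter names are hypothetical, and the returned strings simply repeat the tree's recommendations.

```python
def recommend_optimization(
    knows_cost_drivers: bool,
    tried_effort_parameter: bool,
    has_long_conversations: bool,   # conversations over ~50K tokens
    has_many_tools: bool,           # 10+ tools per agent
    can_cache_repeated_content: bool,
) -> str:
    """Walk the decision tree (Q1 through Q5) and return the next step to take."""
    if not knows_cost_drivers:                      # Q1
        return "Analyze cost drivers"
    if not tried_effort_parameter:                  # Q2
        return "Apply effort parameter (medium/low)"
    if has_long_conversations:                      # Q3
        return "Implement context editing"
    if has_many_tools:                              # Q4
        return "Implement tool search + deferred loading"
    if can_cache_repeated_content:                  # Q5
        return "Implement prompt caching"
    return "Consider model selection"
```

The function returns only the first applicable recommendation, mirroring the tree; after applying it, re-run the check to find the next one.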
Before:
Optimizations (in order of impact):
Timeline: 20-30 hours implementation
Before:
Optimizations:
Timeline: 5-10 hours implementation
For deeper dives into specific optimization areas, see:
For complete optimization strategies, cost prediction models, and ROI measurement frameworks, see references/ directory.