Save 50-90% on Claude API costs with Batch API, Prompt Caching & Extended Thinking. Official techniques, verified.
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

`npx skillsauth add louishin/claude-api-cost-optimization claude-api-cost-optimization`
Save 50-90% on Claude API costs with three techniques verified against the official Anthropic documentation.
| Technique | Savings | Use When |
|-----------|---------|----------|
| Batch API | 50% | Tasks can wait up to 24h |
| Prompt Caching | 90% | Repeated system prompts (>1K tokens) |
| Extended Thinking | ~80% | Complex reasoning tasks |
| Batch + Cache | ~95% | Bulk tasks with shared context |
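The combined ~95% figure follows from multiplying the two discounts, since each acts as a price multiplier on the same tokens. A minimal sketch of that arithmetic, assuming both discounts from the table apply to the cached input tokens of a batched request; `input_price_multiplier` and `savings_percent` are illustrative names, not part of the Anthropic SDK:

```python
def input_price_multiplier(batch: bool = False, cache_read: bool = False) -> float:
    """Fraction of the base input price paid, using the discounts in the table."""
    multiplier = 1.0
    if batch:
        multiplier *= 0.5  # Batch API: 50% off
    if cache_read:
        multiplier *= 0.1  # cache read: 90% off
    return multiplier


def savings_percent(multiplier: float) -> float:
    """Convert a price multiplier into a savings percentage."""
    return round((1.0 - multiplier) * 100, 1)


if __name__ == "__main__":
    combined = input_price_multiplier(batch=True, cache_read=True)
    print(f"Batch + Cache: {savings_percent(combined)}% off cached input tokens")
```

The discount only reaches this level for tokens that are actually read from the cache; output tokens and uncached input in a batch see the 50% batch discount alone.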
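```python
import time

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Submit all requests in one batch; each request needs a unique custom_id.
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "task-001",
            "params": {
                "model": "claude-sonnet-4-5",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "Task 1"}],
            },
        }
    ]
)

# Results are available within 24h (usually <1h); poll until processing ends.
while client.messages.batches.retrieve(batch.id).processing_status != "ended":
    time.sleep(60)

for result in client.messages.batches.results(batch.id):
    if result.result.type == "succeeded":
        print(f"{result.custom_id}: {result.result.message.content[0].text}")
    else:
        # errored, canceled, or expired requests carry no message
        print(f"{result.custom_id}: {result.result.type}")
```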
| Batch Size | Time per Request |
|------------|------------------|
| Large (294) | 0.45 min |
| Small (10) | 9.84 min |
In this measurement the large batch needed about 22x less time per request (9.84 / 0.45 ≈ 22). Submit 100+ requests per batch whenever possible.
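To reach that batch size, requests can be generated from a list of inputs instead of being written by hand. A minimal sketch, assuming the inputs are plain prompt strings; `build_requests` is a hypothetical helper and the `task-NNN` id scheme is an arbitrary choice:

```python
def build_requests(
    prompts: list[str],
    model: str = "claude-sonnet-4-5",
    max_tokens: int = 1024,
) -> list[dict]:
    """Turn prompt strings into Batch API request dicts with unique custom_ids."""
    return [
        {
            "custom_id": f"task-{i:03d}",
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for i, prompt in enumerate(prompts, start=1)
    ]


if __name__ == "__main__":
    requests = build_requests([f"Task {n}" for n in range(1, 151)])
    print(len(requests), requests[0]["custom_id"], requests[-1]["custom_id"])
```

The returned list can be passed as `requests=` to `client.messages.batches.create(...)`.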
```python
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": "Your long system prompt here...",
        "cache_control": {"type": "ephemeral"},  # Enable caching!
    }],
    messages=[{"role": "user", "content": "User question"}],
)

# First call: cached tokens cost 25% more than base input (cache write)
# Subsequent calls that hit the cache: cached tokens cost 90% less (cache read)
```
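Because the first call pays a 25% surcharge, caching only pays off once the prompt is reused. A minimal sketch of the break-even arithmetic, assuming every call after the first hits the cache before it expires; `relative_cost` is an illustrative name and the multipliers are the ones quoted above:

```python
def relative_cost(calls: int, write: float = 1.25, read: float = 0.10) -> float:
    """Average cost of the cached prefix per call, relative to no caching (1.0)."""
    if calls < 1:
        raise ValueError("calls must be at least 1")
    total = write + read * (calls - 1)  # one cache write, then cache reads
    return total / calls


if __name__ == "__main__":
    for n in (1, 2, 10, 100):
        print(f"{n:>3} calls: {relative_cost(n):.3f}x the uncached price")
```

Under these assumptions the cached prefix is already cheaper on the second call, and the average approaches the 90% discount as the number of calls grows.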
```python
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=16000,  # must be larger than budget_tokens
    thinking={
        "type": "enabled",
        "budget_tokens": 10000,  # upper bound on tokens spent reasoning
    },
    messages=[{"role": "user", "content": "Design architecture for..."}],
)
```
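The thinking budget and `max_tokens` have to be chosen together, so it can help to validate them before sending a request. A minimal sketch, assuming the budget must be at least 1,024 tokens and smaller than `max_tokens` (check the current documentation for your model); `thinking_config` is a hypothetical helper, not an SDK function:

```python
MIN_BUDGET_TOKENS = 1024  # assumed minimum; verify against the current docs


def thinking_config(max_tokens: int, budget_tokens: int) -> dict:
    """Return a `thinking` parameter dict after basic sanity checks."""
    if budget_tokens < MIN_BUDGET_TOKENS:
        raise ValueError(f"budget_tokens must be at least {MIN_BUDGET_TOKENS}")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {"type": "enabled", "budget_tokens": budget_tokens}


if __name__ == "__main__":
    print(thinking_config(max_tokens=16000, budget_tokens=10000))
```

The result can be passed as `thinking=` to `client.messages.create(...)`. Thinking tokens are billed as output tokens, so a smaller budget directly limits what a request can cost.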
Can wait 24h? → Yes → Batch API (50% off)
↓ No
Repeated prompts >1K? → Yes → Prompt Caching (90% off)
↓ No
Complex reasoning? → Yes → Extended Thinking
↓ No
Use normal API
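The decision tree above can also be written as a small function. A minimal sketch that follows the same order of checks; `choose_technique` and its parameters are illustrative, and the 1,000-token threshold simply mirrors the ">1K" in the tree:

```python
def choose_technique(
    can_wait_24h: bool,
    repeated_prompt_tokens: int,
    complex_reasoning: bool,
) -> str:
    """Walk the decision tree top to bottom and return the first match."""
    if can_wait_24h:
        return "Batch API"
    if repeated_prompt_tokens > 1000:  # ">1K" in the tree
        return "Prompt Caching"
    if complex_reasoning:
        return "Extended Thinking"
    return "Normal API"


if __name__ == "__main__":
    print(choose_technique(False, 5000, False))
```

Like the tree, this returns only the first matching technique; as the table notes, batching and caching can also be combined for bulk tasks with shared context.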
Made with 🐾 by Washin Village - Verified against official Anthropic documentation