Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

louishin/claude-api-cost-optimization

Name: claude-api-cost-optimization
Author: louishin

/SKILL.md

npx skillsauth add louishin/claude-api-cost-optimization claude-api-cost-optimization

Claude API Cost Optimization

Save 50-90% on Claude API costs with three officially verified techniques

Quick Reference

| Technique | Savings | Use When |
|-----------|---------|----------|
| Batch API | 50% | Tasks can wait up to 24h |
| Prompt Caching | 90% on cached input tokens | Repeated system prompts (>1K tokens) |
| Extended Thinking | ~80% (author's estimate; not a price discount) | Complex reasoning tasks |
| Batch + Cache | ~95% on cached input tokens | Bulk tasks with shared context |
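The ~95% in the last row follows from multiplying the two discounts, assuming they stack on the same tokens: 0.5 × 0.1 = 0.05 of the standard input price. It applies only to cached input tokens; output tokens in a batch get the plain 50% discount. A minimal sketch of that arithmetic in relative price units (1.0 = standard per-token price); the constant and function names are hypothetical:

```python
# Relative price multipliers (1.0 = standard per-token input price).
BATCH_MULTIPLIER = 0.5       # Batch API: 50% off
CACHE_READ_MULTIPLIER = 0.1  # Prompt caching read: 90% off cached input tokens


def savings_percent(multiplier: float) -> float:
    """Convert a price multiplier into the percentage saved."""
    return round((1.0 - multiplier) * 100, 1)


# Assumption: the batch and cache-read discounts multiply on cached input tokens.
combined = BATCH_MULTIPLIER * CACHE_READ_MULTIPLIER

print(savings_percent(BATCH_MULTIPLIER))       # 50.0
print(savings_percent(CACHE_READ_MULTIPLIER))  # 90.0
print(savings_percent(combined))               # 95.0
```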

1. Batch API (50% Off)

When to Use

Bulk translations
Daily content generation
Overnight report processing
NOT for real-time chat

Code Example

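```python
import time

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "task-001",
            "params": {
                "model": "claude-sonnet-4-5",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "Task 1"}],
            },
        }
    ]
)

# Results available within 24h (usually <1h); poll until processing has ended
while client.messages.batches.retrieve(batch.id).processing_status != "ended":
    time.sleep(60)

for result in client.messages.batches.results(batch.id):
    if result.result.type == "succeeded":
        print(f"{result.custom_id}: {result.result.message.content[0].text}")
    else:  # "errored", "canceled", or "expired": there is no message to print
        print(f"{result.custom_id}: {result.result.type}")
```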
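For real bulk jobs the `requests` list is usually generated rather than written by hand. A minimal sketch of building it from plain prompts and splitting it into batches; `build_batch_requests`, `chunk`, and the 100,000-requests-per-batch constant are assumptions for illustration (check the current limit in the Batch Processing docs):

```python
from typing import Iterator

MAX_REQUESTS_PER_BATCH = 100_000  # assumed per-batch request limit; verify in the docs


def build_batch_requests(prompts: list[str], model: str = "claude-sonnet-4-5",
                         max_tokens: int = 1024) -> list[dict]:
    """Hypothetical helper: turn plain prompts into Batch API request dicts."""
    return [
        {
            "custom_id": f"task-{i:03d}",  # unique per request, used to match results
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for i, prompt in enumerate(prompts, start=1)
    ]


def chunk(requests: list[dict], size: int = MAX_REQUESTS_PER_BATCH) -> Iterator[list[dict]]:
    """Hypothetical helper: yield consecutive slices of at most `size` requests."""
    for start in range(0, len(requests), size):
        yield requests[start:start + size]


requests = build_batch_requests(["Translate: hello", "Translate: goodbye"])
print(requests[0]["custom_id"])            # task-001
print(len(list(chunk(requests, size=1))))  # 2
```

Each slice from `chunk` would then be passed as `requests=` to `client.messages.batches.create`. Keeping slices large fits the finding below that bigger batches finish faster per request.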

Key Finding: Bigger Batches Are Faster per Request!

| Batch Size | Time/Request |
|------------|--------------|
| Large (294 requests) | 0.45 min |
| Small (10 requests) | 9.84 min |

In the author's test, that is a 22x per-request difference. Batch 100+ requests together whenever you can.
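Assuming Time/Request means the batch's total wall-clock time divided by its size (the table does not state this), the two rows imply similar total durations, which suggests per-batch overhead dominates. A short worked check of the table's numbers:

```python
# Figures from the table: (requests in batch, minutes per request)
large = (294, 0.45)
small = (10, 9.84)

# Assumption: time/request = total batch time / batch size
total_large = large[0] * large[1]  # total minutes for the large batch
total_small = small[0] * small[1]  # total minutes for the small batch
speedup = small[1] / large[1]      # per-request speedup of the large batch

print(round(total_large, 1))  # 132.3
print(round(total_small, 1))  # 98.4
print(round(speedup, 1))      # 21.9
```

Under that reading, about 29x more requests took only about 35% longer in total, which is why filling batches pays off.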

2. Prompt Caching (90% Off)

When to Use

Long system prompts (>1K tokens)
Repeated instructions
RAG with large context
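For the RAG case, `cache_control` can also mark a content block inside a user message, not only the system prompt. A minimal sketch of the payload, where `build_rag_messages` is a hypothetical helper: the large, reused document goes first with the cache breakpoint, and the per-call question comes after it so that changing the question does not invalidate the cached prefix.

```python
def build_rag_messages(document: str, question: str) -> list[dict]:
    """Hypothetical helper: cacheable document prefix followed by a varying question."""
    return [
        {
            "role": "user",
            "content": [
                # Stable prefix: cached after the first call (needs >= 1,024 tokens on Sonnet)
                {"type": "text", "text": document, "cache_control": {"type": "ephemeral"}},
                # Varying suffix: sits after the breakpoint, so it is not part of the cache
                {"type": "text", "text": question},
            ],
        }
    ]


messages = build_rag_messages("<large reference document>", "What does section 2 say?")
print(messages[0]["content"][0]["cache_control"])  # {'type': 'ephemeral'}
```

The result would be passed as `messages=` to `client.messages.create`.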

Code Example

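```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": "Your long system prompt here...",  # needs >= 1,024 tokens to be cached
        "cache_control": {"type": "ephemeral"},  # cache everything up to this block
    }],
    messages=[{"role": "user", "content": "User question"}],
)
# First call: cached tokens cost 1.25x the base input price (cache write, +25%)
# Subsequent calls within the TTL: 0.1x the base input price (cache read, -90%)
print(response.usage.cache_creation_input_tokens, response.usage.cache_read_input_tokens)
```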

Cache Rules

Minimum: 1,024 tokens (Sonnet)
TTL: 5 minutes (refreshes on use)
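Because the first call costs more (+25% on cached tokens) and later calls cost far less (-90%), caching pays off as soon as the prefix is reused once within the TTL. A minimal sketch of that break-even, in units of one uncached send of the prefix; the function name is hypothetical and it assumes every call lands within the 5-minute window:

```python
CACHE_WRITE = 1.25  # cache write: 1.25x the base input price
CACHE_READ = 0.10   # cache read: 0.1x the base input price


def relative_prefix_cost(calls: int, cached: bool) -> float:
    """Cost of sending the same prefix `calls` times, relative to one uncached send."""
    if not cached:
        return float(calls)
    # One write on the first call, then a read on each later call (all within the TTL)
    return CACHE_WRITE + CACHE_READ * (calls - 1)


for n in (1, 2, 10):
    print(n, relative_prefix_cost(n, cached=False), round(relative_prefix_cost(n, cached=True), 2))
# 1 1.0 1.25
# 2 2.0 1.35
# 10 10.0 2.15
```

A single call is 25% more expensive with caching; from the second call on, it is cheaper.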

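3. Extended Thinking (~80% Savings, Estimated)

Unlike the Batch API and Prompt Caching, this is not a price discount: thinking tokens are billed as output tokens. The ~80% figure is the author's estimate; any savings have to come from solving hard problems in fewer calls or retries.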

When to Use

Complex code architecture
Strategic planning
Mathematical reasoning

Code Example

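```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=16000,  # must be larger than budget_tokens
    thinking={
        "type": "enabled",
        "budget_tokens": 10000,  # minimum 1,024; thinking tokens are billed as output
    },
    messages=[{"role": "user", "content": "Design architecture for..."}],
)

# The reply holds "thinking" blocks followed by the final "text" answer
for block in response.content:
    if block.type == "text":
        print(block.text)
```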
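Since thinking tokens are billed as output, `budget_tokens` is also the main cost lever here. A minimal sketch of a hypothetical `make_thinking_params` helper that builds the `thinking` argument and enforces the two constraints assumed from the Extended Thinking docs for standard (non-interleaved) thinking: a budget of at least 1,024 tokens, and a budget below `max_tokens`.

```python
MIN_THINKING_BUDGET = 1024  # assumed smallest budget_tokens the API accepts


def make_thinking_params(budget_tokens: int, max_tokens: int) -> dict:
    """Hypothetical helper: validate the budget and build the `thinking` argument."""
    if budget_tokens < MIN_THINKING_BUDGET:
        raise ValueError(f"budget_tokens must be at least {MIN_THINKING_BUDGET}")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be less than max_tokens")
    return {"type": "enabled", "budget_tokens": budget_tokens}


print(make_thinking_params(10000, 16000))  # {'type': 'enabled', 'budget_tokens': 10000}
```

Starting with a smaller budget and raising it only when answers fall short keeps the extra output-token spend bounded.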

Decision Flowchart

Can wait 24h? → Yes → Batch API (50% off)
                 ↓ No
Repeated prompts >1K? → Yes → Prompt Caching (90% off)
                         ↓ No
Complex reasoning? → Yes → Extended Thinking
                      ↓ No
Use normal API
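The flowchart reads top to bottom and stops at the first "Yes". A minimal sketch of the same logic as a function; `choose_technique` is hypothetical, and ">1K" is taken to mean at least 1,024 tokens, the Sonnet caching minimum listed under Cache Rules:

```python
def choose_technique(can_wait_24h: bool, repeated_prompt_tokens: int,
                     complex_reasoning: bool) -> str:
    """Hypothetical helper: walk the decision flowchart and return the first match."""
    if can_wait_24h:
        return "Batch API"
    if repeated_prompt_tokens >= 1024:  # ">1K" read as the 1,024-token cache minimum
        return "Prompt Caching"
    if complex_reasoning:
        return "Extended Thinking"
    return "Normal API"


print(choose_technique(False, 2000, False))  # Prompt Caching
```

Note that the flowchart picks one technique, while the Quick Reference lists Batch + Cache as a combination; a job that can wait and shares a long prompt can use both.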

Official Docs

Batch Processing
Prompt Caching
Extended Thinking

Made with 🐾 by Washin Village - Verified against official Anthropic documentation

Save 50-90% on Claude API costs with Batch API, Prompt Caching & Extended Thinking. Official techniques, verified.

1 star

development

Updated Apr 6, 2026

npx skillsauth add louishin/claude-api-cost-optimization claude-api-cost-optimization

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy · container and dependency vulnerability scanner · Clean · 95%
Semgrep · static code analysis for vulnerabilities · Clean · 95%
mcp-scan (Snyk) · Model Context Protocol security validation · Clean · 95%
Snyk (dep) · open source security scanning · Skipped · 50%
Socket.dev · supply chain security analysis · Skipped · 50%
VirusTotal · multi-engine malware detection · Skipped · 50%
CrowdStrike · advanced threat intelligence · Skipped · 50%
OSV-Scanner · Open Source Vulnerability database check · Skipped · 50%
OWASP Dep-Check · Skipped · 50%

Last scanned: Apr 6, 2026, 9:10 PM · 29.0s · 16 files scanned

Related Skills

openclaw/openclaw-secret-scanning-maintainer

development

Verified · Trusted · Community

Maintainer-only workflow for handling GitHub Secret Scanning alerts on OpenClaw. Use when Codex needs to triage, redact, clean up, and resolve secret leakage found in issue comments, issue bodies, PR comments, or other GitHub content.

357,764 · SKILL.md · Updated Apr 15, 2026

openclaw/openclaw-release-maintainer

development

Verified · Trusted · Community

Maintainer workflow for OpenClaw releases, prereleases, changelog release notes, and publish validation. Use when Codex needs to prepare or verify stable or beta release steps, align version naming, assemble release notes, check release auth requirements, or validate publish-time commands and artifacts.

357,764 · SKILL.md · Updated Apr 10, 2026

openclaw/openclaw-qa-testing

development

Verified · Trusted · Community

Run, watch, debug, and extend OpenClaw QA testing with qa-lab and qa-channel. Use when Codex needs to execute the repo-backed QA suite, inspect live QA artifacts, debug failing scenarios, add new QA scenarios, or explain the OpenClaw QA workflow. Prefer the live OpenAI lane with regular openai/gpt-5.4 in fast mode; do not use gpt-5.4-pro or gpt-5.4-mini unless the user explicitly overrides that policy.

357,764 · SKILL.md · Updated Apr 10, 2026

openclaw/openclaw-parallels-smoke

development

Verified · Trusted · Community

End-to-end Parallels smoke, upgrade, and rerun workflow for OpenClaw across macOS, Windows, and Linux guests. Use when Codex needs to run, rerun, debug, or interpret VM-based install, onboarding, gateway smoke tests, latest-release-to-main upgrade checks, fresh snapshot retests, or optional Discord roundtrip verification under Parallels.

357,764 · SKILL.md · Updated Apr 10, 2026

Download

For Claude Desktop: download once, then upload the file in the app. No terminal needed.

Install manually

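```bash
# Clone the repo
git clone https://github.com/louishin/claude-api-cost-optimization.git

# Copy into the Claude Code skills folder (global). No trailing slash on the source,
# so macOS/BSD cp copies the directory itself rather than only its contents.
mkdir -p ~/.claude/skills
cp -r claude-api-cost-optimization ~/.claude/skills/
```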

See the Claude Code Skills documentation for the official skills path.

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT