Implement cost-optimized LLM routing with NO OpenAI. Use tiered model selection (DeepSeek, Haiku, Sonnet) to achieve 70-90% cost savings. Triggers on "LLM costs", "model selection", "cost optimization", "which model", "DeepSeek", "Claude pricing", "reduce AI costs".
Achieve 70-90% cost savings with intelligent model routing. NO OpenAI allowed.
NEVER use OpenAI models in this ecosystem.
Allowed providers:
| Model | Cost per 1M tokens (USD) | Use Case |
|-------|--------------------------|----------|
| DeepSeek V3 | $0.14 input / $0.28 output | Simple queries, classification |
| Claude Haiku | $0.25 input / $1.25 output | Moderate complexity |
| Gemini Flash | FREE (limited) | MVP, prototyping |
| Claude Sonnet | $3.00 input / $15.00 output | Complex reasoning |
| Claude Opus | $15.00 input / $75.00 output | Expert tasks only |
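To make the per-1M prices concrete, the sketch below prices a single request on each paid tier. The request size (2,000 input and 500 output tokens) is an arbitrary assumption for illustration. `PRICES_PER_MTOK` and `request_cost` are hypothetical names, and the prices are copied from the table.

```python
# USD per 1M tokens (input, output), copied from the pricing table.
PRICES_PER_MTOK = {
    "DeepSeek V3": (0.14, 0.28),
    "Claude Haiku": (0.25, 1.25),
    "Claude Sonnet": (3.00, 15.00),
    "Claude Opus": (15.00, 75.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the table's list prices."""
    input_price, output_price = PRICES_PER_MTOK[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 2,000 input tokens, 500 output tokens.
    for name in PRICES_PER_MTOK:
        print(f"{name:14s} ${request_cost(name, 2_000, 500):.6f}")
```

Under that assumption, one request costs about $0.00042 on DeepSeek V3 and $0.0135 on Sonnet. Each simple request moved off Sonnet therefore saves roughly 97% of its cost.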
Use DeepSeek V3 (via OpenRouter) for simple queries and classification:
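```python
import os

# OpenRouter exposes an OpenAI-compatible API, so the OpenAI SDK is used only
# as an HTTP client here: no OpenAI models and no OPENAI_API_KEY.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

response = client.chat.completions.create(
    model="deepseek/deepseek-chat",
    messages=[{"role": "user", "content": prompt}],  # `prompt` is your input string
    max_tokens=500,
)
text = response.choices[0].message.content
```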
Use Claude Haiku for moderate-complexity tasks:
```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-5-haiku-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": prompt}],
)
text = response.content[0].text
```
Use Claude Sonnet for complex reasoning:
```python
import anthropic

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    messages=[{"role": "user", "content": prompt}],
)
text = response.content[0].text
```
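Cost tracking with `log_cost` needs token counts, and the two SDKs report them under different attribute names. Anthropic responses use `response.usage.input_tokens` / `output_tokens`. OpenAI-compatible responses, which is what OpenRouter returns, use `response.usage.prompt_tokens` / `completion_tokens`. The sketch below shows a small normalizing helper. `token_usage` is a hypothetical name, and the helper relies on duck typing so it can be exercised without any API calls.

```python
from types import SimpleNamespace


def token_usage(response) -> tuple[int, int]:
    """Return (input_tokens, output_tokens) from an Anthropic or OpenAI-compatible response."""
    usage = response.usage
    if hasattr(usage, "input_tokens"):
        # Anthropic Messages API naming
        return usage.input_tokens, usage.output_tokens
    # OpenAI-compatible chat completions naming (e.g. OpenRouter)
    return usage.prompt_tokens, usage.completion_tokens


if __name__ == "__main__":
    # Stand-in objects shaped like the two SDK responses (no network access).
    anthropic_like = SimpleNamespace(usage=SimpleNamespace(input_tokens=120, output_tokens=45))
    openrouter_like = SimpleNamespace(usage=SimpleNamespace(prompt_tokens=80, completion_tokens=30))
    print(token_usage(anthropic_like))
    print(token_usage(openrouter_like))
```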
```python
from enum import Enum
from typing import Optional


class TaskComplexity(Enum):
    SIMPLE = "simple"
    MODERATE = "moderate"
    COMPLEX = "complex"


def route_to_model(complexity: TaskComplexity) -> str:
    """Route to appropriate model based on complexity."""
    routing = {
        TaskComplexity.SIMPLE: "deepseek/deepseek-chat",
        TaskComplexity.MODERATE: "claude-3-5-haiku-20241022",
        TaskComplexity.COMPLEX: "claude-sonnet-4-20250514",
    }
    return routing[complexity]


def estimate_complexity(prompt: str) -> TaskComplexity:
    """Estimate task complexity from prompt characteristics."""
    # Simple heuristics: prompt length, presence of code, analysis keywords
    word_count = len(prompt.split())
    has_code = "```" in prompt or "def " in prompt or "function" in prompt
    has_analysis = any(w in prompt.lower() for w in ["analyze", "compare", "evaluate"])

    if word_count < 50 and not has_code and not has_analysis:
        return TaskComplexity.SIMPLE
    elif word_count < 200 or (has_code and not has_analysis):
        return TaskComplexity.MODERATE
    else:
        return TaskComplexity.COMPLEX


def smart_complete(prompt: str, force_model: Optional[str] = None) -> str:
    """Complete with automatic model routing."""
    if force_model:
        model = force_model
    else:
        complexity = estimate_complexity(prompt)
        model = route_to_model(complexity)

    # DeepSeek goes through OpenRouter; Claude models go through Anthropic.
    # call_openrouter() and call_anthropic() are not defined in this skill;
    # they are assumed to wrap the client calls shown above.
    if model.startswith("deepseek"):
        return call_openrouter(model, prompt)
    else:
        return call_anthropic(model, prompt)
```
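`smart_complete` calls `call_openrouter` and `call_anthropic`, but the skill does not define either helper. The sketch below assumes they simply wrap the per-tier calls shown earlier. The `max_tokens` defaults mirror the DeepSeek and Haiku examples. Sonnet's example used 4096, so pass a larger value for complex tasks.

```python
import os


def call_openrouter(model: str, prompt: str, max_tokens: int = 500) -> str:
    """Send a prompt to an OpenRouter-hosted model (e.g. DeepSeek) and return the text."""
    # Imported lazily; OpenRouter speaks the OpenAI wire format.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key=os.environ["OPENROUTER_API_KEY"],
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
    )
    return response.choices[0].message.content


def call_anthropic(model: str, prompt: str, max_tokens: int = 1024) -> str:
    """Send a prompt to a Claude model and return the text of the first content block."""
    # Imported lazily so this module loads even without the SDK installed.
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    response = client.messages.create(
        model=model,
        max_tokens=max_tokens,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text
```

Clients are created on every call for brevity. A long-running service would more likely create each client once and reuse it.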
For MVPs and prototyping, use Gemini Flash (FREE):
```python
import os

import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content(prompt)
text = response.text
```
Limits:
Track costs per project:
```python
import json
from datetime import datetime, timezone
from pathlib import Path

COST_LOG = Path.home() / ".claude" / "llm_costs.jsonl"


def log_cost(project: str, model: str, input_tokens: int, output_tokens: int) -> float:
    """Log LLM usage for cost tracking and return the cost in USD."""
    # Prices in USD per 1M tokens (input, output), matching the pricing table
    costs = {
        "deepseek/deepseek-chat": (0.14, 0.28),
        "claude-3-5-haiku-20241022": (0.25, 1.25),
        "claude-sonnet-4-20250514": (3.00, 15.00),
        "gemini-1.5-flash": (0.0, 0.0),  # Free
    }
    # Unknown models fall back to $10 / $30 per 1M tokens
    input_cost, output_cost = costs.get(model, (10.0, 30.0))
    total = (input_tokens / 1_000_000 * input_cost) + (output_tokens / 1_000_000 * output_cost)

    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "project": project,
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cost_usd": round(total, 6),
    }
    COST_LOG.parent.mkdir(parents=True, exist_ok=True)  # ~/.claude may not exist yet
    with open(COST_LOG, "a") as f:
        f.write(json.dumps(entry) + "\n")
    return total
```
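Each log entry is one JSON object per line, so rolling costs up per project takes a single read pass. The sketch below uses `cost_by_project`, a hypothetical helper. It assumes entries carry the `project` and `cost_usd` fields written by `log_cost`, and it takes an iterable of lines so it works on an open file or on a plain list.

```python
import json
from collections import defaultdict
from pathlib import Path
from typing import Iterable


def cost_by_project(lines: Iterable[str]) -> dict[str, float]:
    """Sum cost_usd per project from JSONL lines written by log_cost."""
    totals = defaultdict(float)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        totals[entry["project"]] += entry["cost_usd"]
    return dict(totals)


if __name__ == "__main__":
    log_path = Path.home() / ".claude" / "llm_costs.jsonl"
    if log_path.exists():
        with open(log_path) as f:
            totals = cost_by_project(f)
        # Most expensive projects first
        for project, usd in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
            print(f"{project:20s} ${usd:.4f}")
```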
For voice pipelines (vozlux, solarvoice-ai):
```python
def get_voice_tier(subscription: str) -> dict:
    tiers = {
        "starter": {
            "tts": "polly",
            "stt": "deepgram-base",
            "llm": "deepseek"
        },
        "pro": {
            "tts": "cartesia",
            "stt": "deepgram-nova",
            "llm": "haiku"
        },
        "enterprise": {
            "tts": "cartesia",
            "stt": "deepgram-nova",
            "llm": "sonnet"
        }
    }
    return tiers.get(subscription, tiers["starter"])
```
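The voice tiers name the LLM by a short alias (`deepseek`, `haiku`, `sonnet`), while the router works with full model IDs. One way to connect the two is a lookup whose result can be passed to `smart_complete(prompt, force_model=...)`. This assumes each alias means the same model as in the routing table. `LLM_ALIASES` and `voice_llm_model` are hypothetical names.

```python
# Hypothetical mapping from the short "llm" names used by get_voice_tier()
# to the model IDs used by route_to_model().
LLM_ALIASES = {
    "deepseek": "deepseek/deepseek-chat",
    "haiku": "claude-3-5-haiku-20241022",
    "sonnet": "claude-sonnet-4-20250514",
}


def voice_llm_model(tier: dict) -> str:
    """Resolve a voice tier's short LLM name to a full model ID."""
    return LLM_ALIASES[tier["llm"]]


if __name__ == "__main__":
    # Tier dict shaped like get_voice_tier("pro")
    pro = {"tts": "cartesia", "stt": "deepgram-nova", "llm": "haiku"}
    print(voice_llm_model(pro))
```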
Estimated LLM spend for a typical Scientia project at different query volumes:
| Usage Level | DeepSeek Heavy | Mixed Tier | Sonnet Heavy |
|-------------|----------------|------------|--------------|
| Light (10K queries) | $1.40 | $8 | $90 |
| Medium (100K queries) | $14 | $80 | $900 |
| Heavy (1M queries) | $140 | $800 | $9,000 |
Recommendation: Use Mixed Tier routing for 90%+ of use cases.
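The table does not say how many tokens a query uses, so its figures are hard to map onto your own workload. The sketch below projects spend from your own averages. `project_cost` and `PRICES` are hypothetical names. The example workload is an assumption and does not come from the table: 1,000 input and 300 output tokens per query, with 70% of traffic on DeepSeek, 25% on Haiku and 5% on Sonnet.

```python
# USD per 1M tokens (input, output), from the pricing table in this skill.
PRICES = {
    "deepseek": (0.14, 0.28),
    "haiku": (0.25, 1.25),
    "sonnet": (3.00, 15.00),
}


def project_cost(queries: int, avg_in: int, avg_out: int, mix: dict[str, float]) -> float:
    """Projected USD for `queries` requests split across tiers by `mix` (fractions summing to 1)."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("mix fractions must sum to 1")
    per_query = 0.0
    for tier, share in mix.items():
        in_price, out_price = PRICES[tier]
        per_query += share * (avg_in * in_price + avg_out * out_price) / 1_000_000
    return queries * per_query


if __name__ == "__main__":
    # Assumed workload: 100K queries, 1,000 input + 300 output tokens each.
    all_sonnet = project_cost(100_000, 1_000, 300, {"sonnet": 1.0})
    mixed = project_cost(100_000, 1_000, 300, {"deepseek": 0.70, "haiku": 0.25, "sonnet": 0.05})
    print(f"All Sonnet: ${all_sonnet:,.2f}")
    print(f"Mixed:      ${mixed:,.2f}  ({1 - mixed / all_sonnet:.0%} saved)")
```

With these assumed numbers, the mix comes to roughly $69 against $750 for all-Sonnet. The 5% routed to Sonnet still accounts for over half of the mixed bill, so that share is the lever worth tuning.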
Required in .env:
```bash
# Primary (Anthropic)
ANTHROPIC_API_KEY=sk-ant-...

# Cost optimization (OpenRouter for DeepSeek)
OPENROUTER_API_KEY=sk-or-...

# Free tier (Google)
GOOGLE_API_KEY=AIza...

# NEVER set these:
# OPENAI_API_KEY=  # FORBIDDEN
```
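A preflight check that reports missing or forbidden keys before any API call is cheap to add. The sketch below uses `check_env`, a hypothetical helper, and assumes all three keys listed above are required. It reads the process environment, so a `.env` file must already be loaded, for example by your framework or by python-dotenv's `load_dotenv()`.

```python
import os
from collections.abc import Mapping

REQUIRED_KEYS = ("ANTHROPIC_API_KEY", "OPENROUTER_API_KEY", "GOOGLE_API_KEY")
FORBIDDEN_KEYS = ("OPENAI_API_KEY",)


def check_env(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return a list of problems with the LLM-related environment (empty if OK)."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if not env.get(key)]
    problems += [f"forbidden {key} is set" for key in FORBIDDEN_KEYS if env.get(key)]
    return problems


if __name__ == "__main__":
    for problem in check_env():
        print(problem)
```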
lang-core enforces NO OpenAI at runtime:
```python
import os


def validate_environment():
    """Block OpenAI usage."""
    if os.environ.get("OPENAI_API_KEY"):
        raise EnvironmentError(
            "OpenAI is not allowed in Scientia projects. "
            "Use ANTHROPIC_API_KEY or OPENROUTER_API_KEY instead."
        )
```