skills/llm-router/SKILL.md
Selects the optimal LLM model and provider for each task based on complexity, cost budget, and capability requirements. Routes cheap tasks to Haiku/GPT-4o-mini and complex tasks to Sonnet/Opus/o1. Use when deciding which model to call, optimizing LLM costs, or building multi-model agent systems. Activate on "which model", "model selection", "route to model", "LLM cost", "model routing", "cheap vs expensive model". NOT for prompt engineering (use prompt-engineer), model fine-tuning, or training custom models.
`npx skillsauth add curiositech/windags-skills llm-router`
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Selects the optimal LLM model for each task. The single biggest cost lever in multi-agent systems — intelligent routing saves 45-85% while maintaining 95%+ of top-model quality.
✅ Use for:
- Deciding which model to call for a task
- Optimizing LLM costs
- Building multi-model agent systems

❌ NOT for:
- Prompt engineering (use prompt-engineer)
- Model fine-tuning
- Training custom models

```mermaid
flowchart TD
    A{Task type?} -->|Classify / validate / format / extract| T1["Tier 1: Haiku, GPT-4o-mini (~$0.001)"]
    A -->|Write / implement / review / synthesize| T2["Tier 2: Sonnet, GPT-4o (~$0.01)"]
    A -->|Reason / architect / judge / decompose| T3["Tier 3: Opus, o1 (~$0.10)"]
    T1 --> Q1{Quality sufficient?}
    Q1 -->|Yes| Done1[Use cheap model]
    Q1 -->|No| T2
    T2 --> Q2{Quality sufficient?}
    Q2 -->|Yes| Done2[Use balanced model]
    Q2 -->|No| T3
```
| Task Type | Tier | Models | Cost/Call | Why This Tier |
|-----------|------|--------|-----------|---------------|
| Classify input type | 1 | Haiku, GPT-4o-mini | ~$0.001 | Deterministic categorization |
| Validate schema/format | 1 | Haiku, GPT-4o-mini | ~$0.001 | Mechanical checking |
| Format output / template | 1 | Haiku, GPT-4o-mini | ~$0.001 | Structured transformation |
| Extract structured data | 1 | Haiku, GPT-4o-mini | ~$0.001 | Pattern matching |
| Summarize text | 1-2 | Haiku → Sonnet | ~$0.001-0.01 | Short summaries: Haiku; nuanced: Sonnet |
| Write content/docs | 2 | Sonnet, GPT-4o | ~$0.01 | Creative quality matters |
| Implement code | 2 | Sonnet, GPT-4o | ~$0.01 | Correctness + style |
| Review code/diffs | 2 | Sonnet, GPT-4o | ~$0.01 | Needs judgment, not just pattern matching |
| Research synthesis | 2 | Sonnet, GPT-4o | ~$0.01 | Multi-source reasoning |
| Decompose ambiguous problem | 3 | Opus, o1 | ~$0.10 | Requires deep understanding |
| Design architecture | 3 | Opus, o1 | ~$0.10 | Complex system reasoning |
| Judge output quality | 3 | Opus, o1 | ~$0.10 | Meta-reasoning about quality |
| Plan multi-step strategy | 3 | Opus, o1 | ~$0.10 | Long-horizon planning |
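The routing table can be expressed as a plain lookup. The sketch below is one possible shape, not the skill's actual implementation: the task-type keys, the `Tier` enum, and the `route` helper are hypothetical names, and falling back to Tier 2 for unknown task types is an assumption. The model identifiers are the ones used in this skill's node config.

```python
from enum import IntEnum


class Tier(IntEnum):
    CHEAP = 1      # Haiku-class, ~$0.001 per call
    BALANCED = 2   # Sonnet-class, ~$0.01 per call
    PREMIUM = 3    # Opus-class, ~$0.10 per call


# Keys are illustrative shorthand for the task types in the routing table.
TASK_TIERS = {
    "classify": Tier.CHEAP,
    "validate": Tier.CHEAP,
    "format": Tier.CHEAP,
    "extract": Tier.CHEAP,
    "summarize": Tier.CHEAP,      # nuanced summaries may need BALANCED
    "write": Tier.BALANCED,
    "implement": Tier.BALANCED,
    "review": Tier.BALANCED,
    "synthesize": Tier.BALANCED,
    "decompose": Tier.PREMIUM,
    "architect": Tier.PREMIUM,
    "judge": Tier.PREMIUM,
    "plan": Tier.PREMIUM,
}

TIER_MODELS = {
    Tier.CHEAP: "claude-haiku-4-5",
    Tier.BALANCED: "claude-sonnet-4-5",
    Tier.PREMIUM: "claude-opus-4-5",
}


def route(task_type: str, default: Tier = Tier.BALANCED) -> str:
    """Return the model for a task type; unknown types get the default tier."""
    tier = TASK_TIERS.get(task_type.strip().lower(), default)
    return TIER_MODELS[tier]
```

Keeping the task-to-tier map separate from the tier-to-model map means a model upgrade touches one entry rather than every task type.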
Static tiers: assign a model to each node by task type at DAG design time. There is no runtime logic. This captures 60-70% of the possible savings.
```yaml
nodes:
  - id: classify
    model: claude-haiku-4-5    # Tier 1: ~$0.001/call
  - id: implement
    model: claude-sonnet-4-5   # Tier 2: ~$0.01/call
  - id: evaluate
    model: claude-opus-4-5     # Tier 3: ~$0.10/call
```
Cascading: try the cheap model first; if quality falls below the threshold, escalate to the next tier. This adds ~1s of latency but saves 50-80% on nodes where the cheap model succeeds.
1. Execute with Tier 1 model
2. Quick quality check (also Tier 1 — costs ~$0.001)
3. If quality ≥ threshold → done
4. If quality < threshold → re-execute with Tier 2
Best for nodes where you're genuinely unsure which tier is needed.
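The four steps above can be sketched as a small loop. This is a minimal illustration under stated assumptions: `cascade`, `execute`, and `score` are hypothetical names, the 0.8 threshold is an arbitrary placeholder, and the model call and quality check are passed in as callables so that no particular SDK is implied. In the scheme described above, `score` would itself be a cheap Tier 1 call. The sketch generalizes the two-tier steps to any ordered list of models.

```python
from typing import Callable, Sequence


def cascade(
    prompt: str,
    models: Sequence[str],
    execute: Callable[[str, str], str],
    score: Callable[[str, str], float],
    threshold: float = 0.8,
) -> tuple[str, str]:
    """Try models cheapest-first and return (model, output) for the first pass.

    execute(model, prompt) runs the task on the given model.
    score(prompt, output) returns a quality estimate in [0, 1].
    """
    if not models:
        raise ValueError("models must not be empty")
    output = ""
    for model in models:
        output = execute(model, prompt)
        if score(prompt, output) >= threshold:
            return model, output
    # No tier met the threshold: return the last (strongest) model's attempt.
    return models[-1], output
```

A caller would pass the models in ascending cost order, for example `["claude-haiku-4-5", "claude-sonnet-4-5"]`, so the expensive model runs only when the cheap output fails the check.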
Adaptive: record success/failure per task type per model. Over time, the router learns which model tier each task type actually needs.
This reaches 75-85% savings after ~100 executions of training data.
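One way to picture this bookkeeping is a success-rate tracker that picks the cheapest model with a proven record for the task type. The class below is a hypothetical sketch, not the skill's actual learner: `AdaptiveRouter`, the 0.9 success-rate bar, and the 10-sample minimum are all assumptions chosen for illustration.

```python
from collections import defaultdict


class AdaptiveRouter:
    """Pick the cheapest model whose observed success rate clears a bar."""

    def __init__(self, tiers, min_success_rate=0.9, min_samples=10):
        self.tiers = list(tiers)              # ordered cheapest first
        self.min_success_rate = min_success_rate
        self.min_samples = min_samples
        # (task_type, model) -> [successes, attempts]
        self.stats = defaultdict(lambda: [0, 0])

    def record(self, task_type, model, success):
        """Store one observed outcome for a task type on a model."""
        entry = self.stats[(task_type, model)]
        entry[0] += int(success)
        entry[1] += 1

    def choose(self, task_type):
        """Return the cheapest model with enough evidence of success."""
        for model in self.tiers:
            successes, attempts = self.stats.get((task_type, model), (0, 0))
            if attempts >= self.min_samples and successes / attempts >= self.min_success_rate:
                return model
        # Not enough evidence for any cheaper model: stay with the strongest.
        return self.tiers[-1]
```

Defaulting to the strongest model until evidence accumulates is the conservative choice; a router that also explores cheaper tiers on a fraction of calls would gather that evidence faster.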
Once model tier is chosen, select the provider:
| Model Class | Provider Options | Selection Criteria |
|-------------|------------------|--------------------|
| Haiku-class | Anthropic, AWS Bedrock | Latency, regional availability |
| Sonnet-class | Anthropic, AWS Bedrock, GCP Vertex | Cost, rate limits |
| Opus-class | Anthropic | Only provider |
| GPT-4o-class | OpenAI, Azure OpenAI | Rate limits, compliance |
| Open-source | Ollama (local), Together.ai, Fireworks | Cost ($0), latency, GPU availability |
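Provider choice can then be a filter plus a ranking. In the sketch below the provider lists come from the table above, while everything else is assumed for illustration: `ProviderStatus`, its fields, and the rule of ranking by cost first and latency second are hypothetical, and the availability flag stands in for whatever region, rate-limit, or compliance checks apply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderStatus:
    name: str
    available: bool        # passes region, rate-limit, and compliance checks
    latency_ms: float      # recently observed latency (caller-supplied)
    cost_per_call: float   # caller-supplied price estimate


# Provider options per model class, copied from the table above.
PROVIDERS_BY_CLASS = {
    "haiku": ["Anthropic", "AWS Bedrock"],
    "sonnet": ["Anthropic", "AWS Bedrock", "GCP Vertex"],
    "opus": ["Anthropic"],
    "gpt-4o": ["OpenAI", "Azure OpenAI"],
}


def pick_provider(model_class: str, statuses: list[ProviderStatus]) -> str:
    """Return the cheapest available provider, breaking ties by latency."""
    allowed = set(PROVIDERS_BY_CLASS[model_class])
    candidates = [s for s in statuses if s.name in allowed and s.available]
    if not candidates:
        raise LookupError(f"no available provider for {model_class!r}")
    best = min(candidates, key=lambda s: (s.cost_per_call, s.latency_ms))
    return best.name
```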
10-node DAG, "refactor a codebase":
| Strategy | Mix | Cost | Savings |
|----------|-----|------|---------|
| All Opus | 10× $0.10 | $1.00 | baseline |
| All Sonnet | 10× $0.01 | $0.10 | 90% |
| Static tiers | 4× Haiku + 4× Sonnet + 2× Opus | $0.24 | 76% |
| Cascading | 6× Haiku + 3× Sonnet + 1× Opus | $0.14 | 86% |
| Adaptive (trained) | Dynamic | ~$0.08 | 92% |
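The fixed-mix rows can be reproduced with a few lines of arithmetic, using the approximate per-call costs from the tier table. The helper names `dag_cost` and `savings_pct` are illustrative. The unrounded totals are $0.244 for static tiers and $0.136 for cascading, which the table rounds to $0.24 and $0.14.

```python
# Approximate per-call costs from the tier table (USD).
COST_PER_CALL = {"haiku": 0.001, "sonnet": 0.01, "opus": 0.10}


def dag_cost(mix: dict[str, int]) -> float:
    """Total cost of a DAG given how many nodes run on each model."""
    return sum(COST_PER_CALL[model] * count for model, count in mix.items())


def savings_pct(cost: float, baseline: float) -> int:
    """Savings relative to the baseline, rounded to a whole percent."""
    return round(100 * (1 - cost / baseline))


baseline = dag_cost({"opus": 10})                              # all Opus
static = dag_cost({"haiku": 4, "sonnet": 4, "opus": 2})        # static tiers
cascading = dag_cost({"haiku": 6, "sonnet": 3, "opus": 1})     # cascading

for name, cost in [("static", static), ("cascading", cascading)]:
    print(f"{name}: ${cost:.3f}, saves {savings_pct(cost, baseline)}%")
```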
Wrong: Route everything to Opus/o1 "for quality." Reality: 60%+ of typical DAG nodes are classification, validation, or formatting — tasks where Haiku performs identically to Opus. You're burning money.
Wrong: Route everything to Haiku "for cost." Reality: Complex reasoning, architecture design, and quality judgment genuinely need stronger models. Haiku will produce plausible-looking but subtly wrong output on hard tasks.
Wrong: Only optimizing for cost, ignoring that Opus takes 5-10x longer than Haiku. Reality: In a 10-node DAG, model choice affects total execution time as much as cost. Route time-critical paths to faster models.
Wrong: Setting model tiers once and never adjusting. Reality: As models improve (Haiku gets smarter every generation), tasks that needed Sonnet last month may work on Haiku today. Record outcomes and adapt.
This skill produces: