src/skills/ai-infrastructure-litellm/SKILL.md
LiteLLM proxy server setup, TypeScript client patterns via OpenAI SDK, model routing, fallbacks, load balancing, spend tracking, virtual keys, and production deployment
Quick Guide: LiteLLM is an OpenAI-compatible proxy (AI gateway) that routes requests to 100+ LLM providers. TypeScript clients connect via the standard OpenAI SDK with baseURL pointed at the proxy. Configure models, fallbacks, load balancing, and budgets in config.yaml. Use provider/model-name format in litellm_params.model (e.g., anthropic/claude-sonnet-4-20250514). The model_name in config is the user-facing alias clients request. Virtual keys require PostgreSQL. Master key must start with sk-.
<critical_requirements>
All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering,
import type, named constants)
(You MUST use the provider/model-name format in litellm_params.model -- e.g., anthropic/claude-sonnet-4-20250514, openai/gpt-4o, azure/my-deployment -- the provider prefix is how LiteLLM routes to the correct API)
(You MUST set model_name as the user-facing alias that clients request -- this is NOT the provider model ID, it is the name your TypeScript client passes as model)
(You MUST point the OpenAI SDK baseURL at the proxy URL (e.g., http://localhost:4000) and pass the proxy key as apiKey -- do NOT use provider API keys directly in client code)
(You MUST start master keys with sk- -- LiteLLM rejects master keys that do not follow this prefix convention)
(You MUST configure database_url pointing to PostgreSQL before using virtual keys, spend tracking, or team/user management -- these features require persistent storage)
</critical_requirements>
Auto-detection: LiteLLM, litellm, litellm_params, litellm_settings, LLM proxy, LLM gateway, model_list, master_key, virtual keys, model fallback, load balancing LLM, provider/model, anthropic/claude, openai/gpt, azure/, litellm --config, LITELLM_MASTER_KEY, LITELLM_SALT_KEY
When to use:
Key patterns covered:
When NOT to use:
LiteLLM Proxy is an AI gateway -- a single OpenAI-compatible endpoint that routes to 100+ LLM providers. TypeScript applications never talk to providers directly; they talk to the proxy using the standard OpenAI SDK.
Core principles:
- Clients use baseURL and the standard OpenAI SDK. Switching providers means changing config.yaml, not application code.
- model_name is what clients request (e.g., "claude-sonnet"). litellm_params.model is the actual provider routing (e.g., "anthropic/claude-sonnet-4-20250514"). This decouples client code from provider specifics.
- Retries, fallbacks, and load balancing are configured in config.yaml. No application-level retry logic needed.

<patterns>

The proxy needs a config.yaml with at least one model defined. model_name is client-facing; litellm_params.model is the provider route.
```yaml
# config.yaml
model_list:
  - model_name: claude-sonnet # What clients request
    litellm_params:
      model: anthropic/claude-sonnet-4-20250514 # Provider/model route
      api_key: os.environ/ANTHROPIC_API_KEY # Never hardcode keys
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
```
Why good: Two-layer naming decouples clients from providers, os.environ/ syntax reads secrets from environment at runtime
```yaml
# BAD: Missing provider prefix, hardcoded key
model_list:
  - model_name: claude-sonnet-4-20250514 # Using provider model ID as name
    litellm_params:
      model: claude-sonnet-4-20250514 # No provider prefix -- routing fails
      api_key: sk-ant-abc123 # Hardcoded API key
```
Why bad: Without anthropic/ prefix, LiteLLM cannot route to the correct provider; hardcoded keys are a security risk; using the provider model ID as model_name couples clients to provider naming
See: examples/core.md for complete config with general_settings, Docker setup
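
The two config mistakes above can be caught before the proxy starts. The following is a minimal sketch of such a check, assuming the config has already been parsed into plain objects. `ModelEntry` and `lintModelEntry` are hypothetical names for illustration, not part of LiteLLM.

```typescript
// Hypothetical shape of one parsed model_list entry (illustration only).
type ModelEntry = {
  model_name: string;
  litellm_params: { model: string; api_key?: string };
};

const PROVIDER_SEPARATOR = "/";
const ENV_REFERENCE_PREFIX = "os.environ/";

// Returns a list of human-readable problems; an empty list means the entry passed.
function lintModelEntry(entry: ModelEntry): string[] {
  const problems: string[] = [];
  const { model, api_key: apiKey } = entry.litellm_params;

  // A provider route looks like "anthropic/claude-sonnet-4-20250514":
  // a non-empty provider, the separator, then a non-empty model ID.
  const separatorIndex = model.indexOf(PROVIDER_SEPARATOR);
  if (separatorIndex <= 0 || separatorIndex === model.length - 1) {
    problems.push(`litellm_params.model "${model}" has no provider prefix`);
  }

  // Keys should be read from the environment, not written into the file.
  if (apiKey !== undefined && !apiKey.startsWith(ENV_REFERENCE_PREFIX)) {
    problems.push(`api_key for "${entry.model_name}" is hardcoded; use os.environ/VAR_NAME`);
  }

  return problems;
}
```

Run against the BAD entry above, this sketch reports both the missing prefix and the hardcoded key. It only checks the shape of the string, so it cannot tell whether the provider name is one LiteLLM actually supports.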
Connect to the proxy using the standard OpenAI SDK. Point baseURL at the proxy, use the proxy key as apiKey.
```typescript
// lib/llm-client.ts
import OpenAI from "openai";

const PROXY_URL = "http://localhost:4000";

const client = new OpenAI({
  baseURL: PROXY_URL,
  apiKey: process.env.LITELLM_API_KEY, // Virtual key or master key
});

export { client };
```

```typescript
// usage.ts
import { client } from "./lib/llm-client.js";

const completion = await client.chat.completions.create({
  model: "claude-sonnet", // model_name from config.yaml, NOT provider model ID
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Explain TypeScript generics." },
  ],
});

console.log(completion.choices[0].message.content);
```
Why good: Standard OpenAI SDK, no custom dependencies; model name matches config.yaml model_name; proxy key keeps provider keys server-side
```typescript
// BAD: Using provider model ID, provider API key
const client = new OpenAI({
  baseURL: "http://localhost:4000",
  apiKey: process.env.ANTHROPIC_API_KEY, // Wrong -- use proxy key
});

const completion = await client.chat.completions.create({
  model: "anthropic/claude-sonnet-4-20250514", // Wrong -- use model_name alias
  messages: [{ role: "user", content: "Hello" }],
});
```
Why bad: Provider API key bypasses proxy auth and virtual key controls; using provider model ID instead of alias couples client to provider naming and bypasses proxy routing logic
See: examples/core.md for streaming, metadata tagging
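
One way to keep the proxy URL and proxy key out of application code is to derive the client options from the environment in a single place. This is a sketch under assumptions: `buildProxyClientOptions` is a hypothetical helper, and `LITELLM_PROXY_URL` is an assumed variable name that the source does not define (only `LITELLM_API_KEY` appears above).

```typescript
type ProxyEnv = Record<string, string | undefined>;
type ProxyClientOptions = { baseURL: string; apiKey: string };

const DEFAULT_PROXY_URL = "http://localhost:4000";

// Builds the { baseURL, apiKey } pair expected by the OpenAI SDK constructor.
function buildProxyClientOptions(env: ProxyEnv): ProxyClientOptions {
  const apiKey = env["LITELLM_API_KEY"];
  if (!apiKey) {
    // Fail early instead of sending unauthenticated requests to the proxy.
    throw new Error("LITELLM_API_KEY is not set; the client needs a proxy key");
  }

  // LITELLM_PROXY_URL is an assumed name; fall back to the local default.
  const baseURL = env["LITELLM_PROXY_URL"] ?? DEFAULT_PROXY_URL;
  return { baseURL, apiKey };
}
```

The result can be passed straight to the SDK, for example `new OpenAI(buildProxyClientOptions(process.env))`. Taking the environment as a parameter keeps the helper testable without touching real variables.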
Configure model fallbacks so requests automatically retry on a different model when the primary fails.
```yaml
# config.yaml
model_list:
  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-sonnet-4-20250514
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY

litellm_settings:
  num_retries: 2 # Retries per model before fallback
  fallbacks: [{ "claude-sonnet": ["gpt-4o"] }] # General fallback chain
  context_window_fallbacks: [{ "gpt-4o": ["claude-sonnet"] }] # Context overflow fallback
  default_fallbacks: ["gpt-4o"] # Catch-all for any model failure
```
Why good: Fallbacks use model_name aliases (not provider IDs), ordered chains tried sequentially, separate chains for context overflow vs general errors
See: examples/routing.md for content policy fallbacks, combining with load balancing
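
To reason about which models a request may reach, it can help to model the fallback chain as data. The sketch below computes the attempt order for a requested alias from the `fallbacks` and `default_fallbacks` settings shown above. It is an illustration, not LiteLLM's router: `resolveAttemptOrder` is a hypothetical name, and the rule that a model-specific chain takes precedence over `default_fallbacks` is an assumption of this sketch.

```typescript
type FallbackSettings = {
  fallbacks?: Array<Record<string, string[]>>;
  default_fallbacks?: string[];
};

// Returns the model_name aliases in the order they would be tried.
function resolveAttemptOrder(requested: string, settings: FallbackSettings): string[] {
  // Look for a chain keyed by the requested alias (exact, case-sensitive match).
  const specificEntry = (settings.fallbacks ?? []).find(
    (entry) => entry[requested] !== undefined,
  );

  // Assumption: a specific chain wins; otherwise use the catch-all list.
  const chain = specificEntry?.[requested] ?? settings.default_fallbacks ?? [];

  // The requested model is always tried first and never listed twice.
  const order = [requested];
  for (const modelName of chain) {
    if (!order.includes(modelName)) {
      order.push(modelName);
    }
  }
  return order;
}
```

With the settings from the config above, a request for `claude-sonnet` resolves to `claude-sonnet` followed by `gpt-4o`. Retries per model (`num_retries`) are not modeled here.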
Multiple entries with the same model_name create a load-balanced group. The proxy distributes requests using the configured strategy.
```yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: azure/gpt-4o-eastus
      api_base: https://eastus.openai.azure.com/
      api_key: os.environ/AZURE_EASTUS_KEY
      rpm: 100 # Requests per minute for this deployment
  - model_name: gpt-4o
    litellm_params:
      model: azure/gpt-4o-westus
      api_base: https://westus.openai.azure.com/
      api_key: os.environ/AZURE_WESTUS_KEY
      rpm: 100

router_settings:
  routing_strategy: usage-based-routing # Route to deployment with lowest RPM/TPM usage
  num_retries: 2
  timeout: 30
```
Why good: Same model_name across entries creates automatic load balancing, rpm/tpm limits per deployment enable usage-aware routing
See: examples/routing.md for all five routing strategies, priority routing with order
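
The idea behind usage-aware routing can be illustrated with a small model of a load-balanced group. This sketch is not LiteLLM's routing code; `Deployment`, `groupRpmCapacity`, and `pickLeastUsed` are hypothetical names, and the usage counters are assumed to come from elsewhere. It shows two things: per-deployment limits add up across a group, and a usage-based choice prefers the deployment with the most headroom.

```typescript
// Hypothetical view of one deployment inside a model group.
type Deployment = { id: string; rpmLimit: number; rpmUsed: number };

// rpm is configured per deployment, so group capacity is the sum of the limits.
function groupRpmCapacity(deployments: Deployment[]): number {
  return deployments.reduce((total, deployment) => total + deployment.rpmLimit, 0);
}

// Picks the deployment with the lowest current usage that is still under its limit.
function pickLeastUsed(deployments: Deployment[]): Deployment | undefined {
  let best: Deployment | undefined;
  for (const deployment of deployments) {
    if (deployment.rpmUsed >= deployment.rpmLimit) {
      continue; // Already at its limit for this minute.
    }
    if (best === undefined || deployment.rpmUsed < best.rpmUsed) {
      best = deployment;
    }
  }
  return best; // undefined when every deployment is saturated
}
```

For the two Azure deployments above at `rpm: 100` each, the group capacity in this model is 200 requests per minute. Without `rpm` or `tpm` values there would be nothing to compare, which is why usage-based routing depends on them.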
Virtual keys let you distribute access with per-key budgets, rate limits, and model restrictions. Requires PostgreSQL.
```yaml
# config.yaml
general_settings:
  master_key: sk-litellm-master-key-change-me # Must start with sk-
  database_url: os.environ/DATABASE_URL # PostgreSQL required
```

```bash
# Generate a virtual key via API
curl 'http://localhost:4000/key/generate' \
  -H 'Authorization: Bearer sk-litellm-master-key-change-me' \
  -H 'Content-Type: application/json' \
  -d '{
    "models": ["claude-sonnet", "gpt-4o"],
    "max_budget": 50.0,
    "duration": "30d",
    "metadata": {"team": "backend", "project": "search"}
  }'
# Returns: { "key": "sk-generated-key-abc123", ... }
```
Why good: Per-key model restrictions, budget caps, and expiry; metadata enables tag-based spend tracking; master key authentication protects key generation
See: examples/keys-and-spend.md for team management, spend queries, rate limit tiers
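
The same key-generation call can be prepared from TypeScript. The sketch below only builds the request for the `/key/generate` endpoint shown in the curl example; sending it is left to the caller. `KeyGenerateBody`, `HttpRequest`, and `buildKeyGenerateRequest` are hypothetical names, and the body type lists only the fields used above, not the endpoint's full schema.

```typescript
// Only the fields used in the curl example above; not the full endpoint schema.
type KeyGenerateBody = {
  models: string[];
  max_budget: number;
  duration: string;
  metadata?: Record<string, string>;
};

type HttpRequest = {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
};

const KEY_GENERATE_PATH = "/key/generate";

function buildKeyGenerateRequest(
  proxyUrl: string,
  masterKey: string,
  body: KeyGenerateBody,
): HttpRequest {
  // Strip trailing slashes so the path is not doubled.
  const normalizedProxyUrl = proxyUrl.replace(/\/+$/, "");
  return {
    url: `${normalizedProxyUrl}${KEY_GENERATE_PATH}`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${masterKey}`, // Master key authorizes key creation
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  };
}
```

A caller could send the result with `fetch(request.url, { method: request.method, headers: request.headers, body: request.body })` and read the `key` field from the JSON response. The master key should come from server-side configuration, never from client-facing code.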
Attach metadata tags to requests for granular cost attribution. The proxy tracks spend automatically per key, user, team, and tag.
```typescript
// Tag requests for cost attribution
const completion = await client.chat.completions.create({
  model: "claude-sonnet",
  messages: [{ role: "user", content: "Summarize this document." }],
  // LiteLLM-specific: pass metadata for spend tracking
  metadata: {
    tags: ["project:search", "team:backend"],
    trace_user_id: "user-123",
  },
} as any); // metadata is a LiteLLM extension, not in OpenAI types
```
Why good: Tags enable cost attribution by project, team, or feature without changing model routing; cost appears in x-litellm-response-cost response header
When to use: When you need cost visibility across teams, projects, or features
See: examples/keys-and-spend.md for querying spend by tag, user, and team
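
Because the metadata shape is not part of the OpenAI types, one option is to build it through a small typed helper so that tags stay consistent and the cast is confined to one call site. This is a sketch with local stand-in types: `ChatParams`, `SpendMetadata`, and `withSpendTags` are hypothetical names, and only the `tags` and `trace_user_id` fields shown above are modeled.

```typescript
// Local stand-ins for the request shape; the real types live in the OpenAI SDK.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };
type ChatParams = { model: string; messages: ChatMessage[] };

// Only the metadata fields used in the example above.
type SpendMetadata = { tags: string[]; trace_user_id?: string };
type TaggedChatParams = ChatParams & { metadata: SpendMetadata };

// Returns a copy of the params with spend-tracking metadata attached.
function withSpendTags(
  params: ChatParams,
  tags: string[],
  traceUserId?: string,
): TaggedChatParams {
  const metadata: SpendMetadata = { tags: [...tags] };
  if (traceUserId !== undefined) {
    metadata.trace_user_id = traceUserId;
  }
  return { ...params, metadata }; // The input object is not mutated
}
```

The returned object would still need a cast when handed to `client.chat.completions.create`, since the SDK types do not describe this metadata shape.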
</patterns><decision_framework>
```
Do you call multiple LLM providers?
+-- YES -> LiteLLM Proxy adds value (unified API, routing, fallbacks)
+-- NO -> Do you need budgets, rate limits, or virtual keys?
    +-- YES -> LiteLLM Proxy (governance layer)
    +-- NO -> Do you need fallbacks or load balancing?
        +-- YES -> LiteLLM Proxy (reliability layer)
        +-- NO -> Use the provider SDK directly (simpler)
```

```
What is your priority?
+-- Even distribution -> simple-shuffle (default)
+-- Minimize latency -> latency-based-routing
+-- Respect rate limits -> usage-based-routing
+-- Minimize cost -> cost-based-routing
+-- Handle concurrent load -> least-busy
```

```
Do you have multiple teams or users?
+-- YES -> Virtual keys (per-team budgets, model restrictions)
|   Requires: PostgreSQL database
+-- NO -> Do you need spend tracking?
    +-- YES -> Virtual keys (even for single user, enables spend logs)
    |   Requires: PostgreSQL database
    +-- NO -> Master key only (simplest setup, no database needed)
```
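
The decision trees above can also be written as data and small functions, which is convenient for setup scripts or documentation checks. This is a sketch that mirrors the trees; the type and function names (`RoutingPriority`, `shouldUseProxy`, `chooseKeySetup`) are hypothetical, while the strategy strings are the ones listed above.

```typescript
type RoutingPriority = "even-distribution" | "latency" | "rate-limits" | "cost" | "concurrency";
type RoutingStrategy =
  | "simple-shuffle"
  | "latency-based-routing"
  | "usage-based-routing"
  | "cost-based-routing"
  | "least-busy";

// Priority -> routing_strategy value, as in the second tree.
const STRATEGY_BY_PRIORITY: Record<RoutingPriority, RoutingStrategy> = {
  "even-distribution": "simple-shuffle",
  latency: "latency-based-routing",
  "rate-limits": "usage-based-routing",
  cost: "cost-based-routing",
  concurrency: "least-busy",
};

type ProxyNeeds = {
  multipleProviders: boolean;
  needsGovernance: boolean; // budgets, rate limits, or virtual keys
  needsReliability: boolean; // fallbacks or load balancing
};

// First tree: any single "yes" justifies the proxy.
function shouldUseProxy(needs: ProxyNeeds): boolean {
  return needs.multipleProviders || needs.needsGovernance || needs.needsReliability;
}

type KeySetup = { auth: "virtual-keys" | "master-key-only"; requiresPostgres: boolean };

// Third tree: virtual keys (and therefore PostgreSQL) for teams or spend tracking.
function chooseKeySetup(options: {
  multipleTeamsOrUsers: boolean;
  needsSpendTracking: boolean;
}): KeySetup {
  const useVirtualKeys = options.multipleTeamsOrUsers || options.needsSpendTracking;
  return useVirtualKeys
    ? { auth: "virtual-keys", requiresPostgres: true }
    : { auth: "master-key-only", requiresPostgres: false };
}
```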
</decision_framework>
<red_flags>
High Priority Issues:
- Missing provider prefix in litellm_params.model (e.g., claude-sonnet-4-20250514 instead of anthropic/claude-sonnet-4-20250514) -- proxy cannot route without the prefix
- Hardcoded API keys in config.yaml instead of os.environ/VAR_NAME -- security breach risk
- Using the provider model ID as model_name -- couples all clients to provider naming, breaks when you switch providers
- Master key that does not start with sk- -- LiteLLM silently rejects it
- Using virtual keys without database_url -- key generation fails

Medium Priority Issues:

- Not setting num_retries in litellm_settings -- defaults to 0, no retries on transient failures
- Confusing model_name (client-facing alias) with litellm_params.model (provider route) -- most common config mistake
- Not setting rpm/tpm on deployments when using usage-based-routing -- routing strategy has no data to work with
- Not setting LITELLM_SALT_KEY in production -- virtual key credentials stored without encryption

Common Mistakes:

- Using anthropic/claude-sonnet-4-20250514 as the model parameter in TypeScript client code -- use the model_name alias instead
- Expecting the metadata field to be typed in OpenAI SDK -- it is a LiteLLM extension, requires as any or extra_body
- Writing fallback chains with provider model IDs instead of model_name aliases -- fallbacks reference model names, not provider routes
- Forgetting that config.yaml changes require proxy restart (or use the /config/update API endpoint)

Gotchas & Edge Cases:

- The os.environ/ syntax in config.yaml (no $ prefix) is LiteLLM-specific -- not standard YAML environment variable substitution
- model_name matching is exact -- "claude-sonnet" and "Claude-Sonnet" are different models
- If you rely on default_fallbacks, they do NOT apply to ContentPolicyViolationError or ContextWindowExceededError -- use specialized fallback types for those
- rpm/tpm limits in config are per-deployment, not per-model-group -- a model group with 3 deployments at rpm: 100 each gets 300 RPM total
- The spend field on a key may lag a few seconds behind actual usage
- The /v1/ prefix on endpoints is optional -- both http://localhost:4000/chat/completions and http://localhost:4000/v1/chat/completions work
- The admin UI is available at http://localhost:4000/ui when the proxy is running

</red_flags>
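
Two of the high-priority issues, the master key prefix and the missing database, concern general_settings and can be checked mechanically. The following sketch assumes the settings have already been parsed into an object. `GeneralSettings` and `checkGeneralSettings` are hypothetical names, and the check covers only what this document states.

```typescript
// Hypothetical shape of the parsed general_settings block.
type GeneralSettings = { master_key?: string; database_url?: string };

const MASTER_KEY_PREFIX = "sk-";

// Returns a list of problems; an empty list means both checks passed.
function checkGeneralSettings(settings: GeneralSettings, usesVirtualKeys: boolean): string[] {
  const problems: string[] = [];

  // Master keys must follow the sk- prefix convention.
  if (settings.master_key !== undefined && !settings.master_key.startsWith(MASTER_KEY_PREFIX)) {
    problems.push("master_key must start with sk-");
  }

  // Virtual keys, spend tracking, and team management need persistent storage.
  if (usesVirtualKeys && !settings.database_url) {
    problems.push("database_url (PostgreSQL) is required for virtual keys");
  }

  return problems;
}
```

This does not verify that the database is reachable or that the URL points to PostgreSQL; it only flags the missing setting.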
<critical_reminders>
All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering,
import type, named constants)
(You MUST use the provider/model-name format in litellm_params.model -- e.g., anthropic/claude-sonnet-4-20250514, openai/gpt-4o, azure/my-deployment -- the provider prefix is how LiteLLM routes to the correct API)
(You MUST set model_name as the user-facing alias that clients request -- this is NOT the provider model ID, it is the name your TypeScript client passes as model)
(You MUST point the OpenAI SDK baseURL at the proxy URL (e.g., http://localhost:4000) and pass the proxy key as apiKey -- do NOT use provider API keys directly in client code)
(You MUST start master keys with sk- -- LiteLLM rejects master keys that do not follow this prefix convention)
(You MUST configure database_url pointing to PostgreSQL before using virtual keys, spend tracking, or team/user management -- these features require persistent storage)
Failure to follow these rules will produce misconfigured proxies with broken routing, security issues, or missing spend data.
</critical_reminders>