skills/dspy-lm/SKILL.md
Use when you need to configure which language model DSPy uses — setting up providers, API keys, model parameters, or assigning different models to different pipeline stages. Common scenarios - setting up OpenAI or Anthropic API keys, configuring model parameters like temperature and max_tokens, using different models for different pipeline stages, switching between providers, using local models with Ollama or vLLM, or setting up Azure OpenAI. Related - ai-switching-models, ai-cutting-costs, ai-kickoff. Also used for dspy.LM, dspy.configure, configure language model in DSPy, OpenAI API key setup DSPy, Anthropic Claude with DSPy, use Ollama with DSPy, local model DSPy, Azure OpenAI DSPy setup, model temperature and max_tokens, different models per module, multi-model DSPy pipeline, vLLM with DSPy, change provider without changing code, model configuration DSPy.
npx skillsauth add lebsral/dspy-programming-not-prompting-lms-skills dspy-lmInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
dspy.LM is DSPy's unified interface for calling language models. It wraps LiteLLM so any provider -- OpenAI, Anthropic, Google, Together AI, Ollama, vLLM, and 100+ others -- works through one consistent API. You configure a model once, then every DSPy module uses it automatically.
import dspy
# Create an LM instance with a provider/model string
lm = dspy.LM("openai/gpt-4o-mini") # or "anthropic/claude-sonnet-4-5-20250929", etc.
# Set it as the default for all DSPy modules
dspy.configure(lm=lm)
# Now any module uses this LM automatically
classify = dspy.ChainOfThought("text -> label")
result = classify(text="DSPy is great")
print(result.label)
The pattern is always: dspy.LM("provider/model") then dspy.configure(lm=lm).
DSPy uses the LiteLLM "provider/model-name" format. Here are the most common providers:
| Provider | Example string | Notes |
|----------|---------------|-------|
| OpenAI | "openai/gpt-4o" | Default provider, auto-detected |
| OpenAI | "openai/gpt-4o-mini" | Cheaper, faster |
| Anthropic | "anthropic/claude-sonnet-4-5-20250929" | |
| Anthropic | "anthropic/claude-haiku-4-5-20251001" | Fast and cheap |
| Google | "gemini/gemini-2.0-flash" | |
| Together AI | "together_ai/meta-llama/Llama-3-70b-chat-hf" | Open-source models |
| Groq | "groq/llama-3.1-70b-versatile" | Fast inference |
| Ollama (local) | "ollama_chat/llama3.1" | Requires api_base |
| Azure OpenAI | "azure/my-gpt4-deployment" | Requires api_base + api_key |
| OpenAI-compatible | "openai/my-model" | Any server with api_base |
See LiteLLM provider docs for the full list.
lm = dspy.LM(
model="openai/gpt-4o", # Required: "provider/model-name"
model_type="chat", # "chat" (default), "text", or "responses"
temperature=0.7, # Sampling temperature (default: provider default)
max_tokens=1000, # Max output tokens (default: provider default)
cache=True, # Enable built-in caching (default: True)
num_retries=3, # Retry on transient failures (default: 3)
use_developer_role=False, # Use developer/system role (default: False)
# Plus any extra kwargs passed to LiteLLM
)
model (required) -- The provider/model string. This is the only required argument.temperature -- Controls randomness. Lower = more deterministic. Set to 0.0 for reproducible outputs. Reasoning models (o1, o3) require temperature=1.0 or None.max_tokens -- Maximum tokens in the response. Reasoning models require max_tokens >= 16000 or None.cache -- When True (the default), DSPy caches LM responses to reduce costs and speed up repeated calls. Set to False to disable.num_retries -- Number of retries with exponential backoff on transient failures.model_type -- Usually leave as "chat". Use "text" for completion-only models. Use "responses" for OpenAI responses API.You do not have to use the same model for every step. Assign different LMs to different modules with set_lm():
expensive_lm = dspy.LM("openai/gpt-4o")
cheap_lm = dspy.LM("openai/gpt-4o-mini")
# Set a default
dspy.configure(lm=cheap_lm)
class MyPipeline(dspy.Module):
def __init__(self):
self.classify = dspy.Predict("text -> category")
self.generate = dspy.ChainOfThought("text, category -> summary")
def forward(self, text):
category = self.classify(text=text)
return self.generate(text=text, category=category.category)
pipeline = MyPipeline()
# Route: cheap model for classification, expensive for generation
pipeline.classify.set_lm(cheap_lm)
pipeline.generate.set_lm(expensive_lm)
dspy.contextUse dspy.context to temporarily switch LMs for a block of code:
with dspy.context(lm=expensive_lm):
# Everything inside uses expensive_lm
result = pipeline(text="important document")
# Back to the default LM outside the block
You can call an LM instance directly for one-off prompts outside of DSPy modules:
lm = dspy.LM("openai/gpt-4o-mini")
# Pass a string prompt
response = lm("What is the capital of France?")
print(response) # returns a list of strings
# Pass a messages list (chat format)
response = lm(messages=[
{"role": "user", "content": "What is the capital of France?"}
])
print(response) # returns a list of strings
Direct calls are useful for quick tests, but for structured tasks use DSPy modules and signatures -- they give you type checking, optimization, and caching.
Set API keys as environment variables. Never hardcode them.
# OpenAI
export OPENAI_API_KEY=sk-...
# Anthropic
export ANTHROPIC_API_KEY=sk-ant-...
# Together AI
export TOGETHER_API_KEY=...
# Google
export GEMINI_API_KEY=...
# Groq
export GROQ_API_KEY=...
# Azure OpenAI
export AZURE_API_KEY=...
export AZURE_API_BASE=https://your-resource.openai.azure.com/
DSPy (via LiteLLM) reads these automatically. You can also pass api_key directly to dspy.LM() if needed, but environment variables are preferred.
DSPy caches LM responses by default. This means:
# Caching enabled (default)
lm = dspy.LM("openai/gpt-4o-mini", cache=True)
# Disable caching for this LM
lm = dspy.LM("openai/gpt-4o-mini", cache=False)
# Configure cache settings globally
dspy.configure_cache(
enable=True, # Toggle caching on/off
)
Cache is stored locally. If you need different responses for the same prompt (e.g., generating diverse examples), disable caching or use different temperature values.
| Method | Purpose |
|--------|---------|
| lm("prompt") | Direct call -- returns list of strings |
| lm.copy(**kwargs) | Deep copy with updated parameters |
| lm.inspect_history() | View recent request/response history |
| lm.dump_state() | Serialize config (excludes API keys) |
lm = dspy.LM("openai/gpt-4o-mini")
dspy.configure(lm=lm)
classify = dspy.Predict("text -> label")
classify(text="Hello world")
# See what was sent to the LM
dspy.inspect_history(n=1)
# Start Ollama: ollama serve
# Pull a model: ollama pull llama3.1
lm = dspy.LM(
"ollama_chat/llama3.1",
api_base="http://localhost:11434",
api_key="",
temperature=0.7,
num_ctx=8192, # set context window explicitly — Ollama defaults to 4096
)
dspy.configure(lm=lm)
For full Ollama setup (model selection, GPU tuning, context window gotchas, optimization tips), see /dspy-ollama.
# Start vLLM: vllm serve meta-llama/Llama-3.1-8B-Instruct
lm = dspy.LM(
"openai/meta-llama/Llama-3.1-8B-Instruct",
api_base="http://localhost:8000/v1",
api_key="none",
)
dspy.configure(lm=lm)
For any server that exposes an OpenAI-compatible /v1/chat/completions endpoint, use the "openai/model-name" provider string with api_base pointing to your server.
For full vLLM setup (tensor parallelism, GPU sizing, quantization, production deployment), see /dspy-vllm.
dspy.LM("gpt-4o-mini") instead of dspy.LM("openai/gpt-4o-mini"). While some models auto-detect the provider, the explicit "provider/model" format is required for reliable routing through LiteLLM. Always include the provider prefix.temperature=0 for reasoning models. OpenAI reasoning models (o1, o3, o4, gpt-5 families) require temperature=1.0 or None. Setting temperature=0 raises an error. Similarly, max_tokens must be >= 16000 or None for these models.dspy.configure(lm=lm) inside forward(). Configuration should happen once at the top of your script, not per-call. Calling dspy.configure inside forward() resets global state on every invocation and breaks caching. Use set_lm() or dspy.context() for per-module or temporary overrides instead.api_base for local models. Ollama and vLLM require api_base pointing to the local server (http://localhost:11434 for Ollama, http://localhost:8000/v1 for vLLM). Without it, DSPy tries to reach the cloud API and fails with an authentication error.OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.), never passed directly to dspy.LM(). DSPy reads them automatically via LiteLLM.Install any skill:
npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill <name>
/dspy-signatures -- Define what your LM should do (inputs, outputs, types)/dspy-modules -- Wrap signatures with inference strategies (Predict, ChainOfThought, ReAct)/ai-switching-models -- Safely migrate between providers with re-optimization/ai-cutting-costs -- Reduce LM costs with per-module assignment and cheaper models/ai-do if you do not have it — it routes any AI problem to the right skill and is the fastest way to work: npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill ai-dotools
See what is happening during optimizer.compile() instead of waiting blind. Use when you want to watch optimization progress, see scores as they come in, know if your optimizer is working, check if optimization is stuck, understand why optimization is taking too long, get live progress during compile, monitor convergence, detect overfitting during optimization, interpret optimization results, or pick the right tool for watching optimization. Also used for optimizer progress bar, is my optimizer doing anything, optimization seems stuck, how long will optimization take, watch GEPA run, watch MIPROv2 run, live optimization dashboard, optimizer not improving, scores not going up, optimization taking forever, see what optimizer is doing, debug slow optimization, optimization visibility, optimizer metrics, track compile progress, optimization observability.
testing
Use when you want the highest-quality prompt optimization DSPy offers — jointly optimizes instructions and few-shot demos, with auto=light/medium/heavy presets. Common scenarios - you want the best possible accuracy from prompt optimization, jointly tuning instructions and few-shot demonstrations, using auto presets for different compute budgets, or when COPRO or BootstrapFewShot alone are not reaching your accuracy target. Related - ai-improving-accuracy, dspy-copro, dspy-bootstrap-few-shot. Also used for dspy.MIPROv2, best DSPy optimizer, highest quality optimization, auto=light medium heavy, joint instruction and demo optimization, most powerful prompt optimizer, MIPROv2 vs COPRO vs BootstrapFewShot, which optimizer should I use, state of the art prompt optimization, when to use MIPROv2, optimize both instructions and examples, heavy optimization for production, best optimizer for accuracy.
testing
Use LangWatch for DSPy auto-tracing and real-time optimizer progress. Use when you want to set up LangWatch, langwatch.dspy.init, auto-tracing DSPy, real-time optimization dashboard, optimizer progress tracking, app.langwatch.ai, or DSPy optimizer dashboard. Also used for langwatch setup, pip install langwatch, langwatch trace, optimizer progress, real-time optimization, watch optimizer run, LangWatch self-hosted, langwatch docker, langwatch vs langtrace, langwatch autotrack_dspy.
data-ai
Use when you want to optimize instructions without few-shot examples — a lightweight alternative to COPRO when you do not have or do not want to use demonstrations. Common scenarios - optimizing instructions when you do not have or do not want to use few-shot demonstrations, lightweight instruction search as a first step, tasks where examples in the prompt confuse the model, or when you want fast instruction optimization without the cost of COPRO. Related - ai-improving-accuracy, dspy-copro, dspy-miprov2. Also used for dspy.GEPA, instruction optimization without demos, lightweight prompt optimization, optimize instructions only, no few-shot examples needed, GEPA vs COPRO, quick instruction search, when demonstrations hurt performance, zero-shot optimization, instruction-only optimizer, simplest instruction tuner, fast prompt optimization, skip few-shot and just tune instructions, optimize Pydantic field descriptions, GEPA structured output, GEPA does not optimize field desc.