skills/dspy-miprov2/SKILL.md
Use when you want the highest-quality prompt optimization DSPy offers — jointly optimizes instructions and few-shot demos, with auto=light/medium/heavy presets. Common scenarios - you want the best possible accuracy from prompt optimization, jointly tuning instructions and few-shot demonstrations, using auto presets for different compute budgets, or when COPRO or BootstrapFewShot alone are not reaching your accuracy target. Related - ai-improving-accuracy, dspy-copro, dspy-bootstrap-few-shot. Also used for dspy.MIPROv2, best DSPy optimizer, highest quality optimization, auto=light medium heavy, joint instruction and demo optimization, most powerful prompt optimizer, MIPROv2 vs COPRO vs BootstrapFewShot, which optimizer should I use, state of the art prompt optimization, when to use MIPROv2, optimize both instructions and examples, heavy optimization for production, best optimizer for accuracy.
npx skillsauth add lebsral/dspy-programming-not-prompting-lms-skills dspy-miprov2
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Guide the user through using dspy.MIPROv2, DSPy's most powerful prompt optimizer. MIPROv2 jointly optimizes instructions and few-shot demonstrations to maximize a metric on your training data.
MIPROv2 (Multi-prompt Instruction PRoposal Optimizer v2) is DSPy's recommended optimizer for prompt optimization. Unlike simpler optimizers that only tune few-shot examples, MIPROv2 jointly optimizes the instructions and the few-shot demonstrations in each module's prompt.
It works by proposing candidate instructions, bootstrapping demonstrations, and searching over combinations using Bayesian optimization. The result is a program with better prompts that produce higher-quality outputs.
If you have fewer than 50 examples or need a quick first pass, start with BootstrapFewShot (see /dspy-bootstrap-few-shot), then upgrade to MIPROv2.
import dspy
from dspy.evaluate import Evaluate
lm = dspy.LM("openai/gpt-4o-mini") # or "anthropic/claude-sonnet-4-5-20250929", etc.
dspy.configure(lm=lm)
# 1. Your program
qa = dspy.ChainOfThought("question -> answer")
# 2. Your data (mark which fields are inputs)
trainset = [
    dspy.Example(question="What is the capital of France?", answer="Paris").with_inputs("question"),
    dspy.Example(question="What is 2+2?", answer="4").with_inputs("question"),
    # 50-200+ examples recommended
]
devset = [
    dspy.Example(question="Who wrote Hamlet?", answer="Shakespeare").with_inputs("question"),
    # 20-50 held-out examples for evaluation
]
# 3. Your metric
def metric(example, prediction, trace=None):
    return prediction.answer.strip().lower() == example.answer.strip().lower()
# 4. Optimize with MIPROv2
optimizer = dspy.MIPROv2(metric=metric, auto="medium")
optimized = optimizer.compile(qa, trainset=trainset)
# 5. Evaluate improvement
evaluator = Evaluate(devset=devset, metric=metric, num_threads=4, display_progress=True)
score = evaluator(optimized)
print(f"Optimized score: {score:.1f}%")
# 6. Save
optimized.save("optimized_qa.json")
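The trainset and devset above can be cut from a single pool of labeled rows. The following is a minimal sketch, assuming the rows are plain dicts; split_examples is a hypothetical helper, not a DSPy API, and each row would still need to be wrapped in dspy.Example(...).with_inputs(...) afterwards.

```python
import random


def split_examples(rows, dev_fraction=0.2, seed=0):
    """Shuffle rows deterministically and split them into (train, dev).

    Hypothetical helper for illustration; not part of DSPy.
    """
    if not 0.0 < dev_fraction < 1.0:
        raise ValueError("dev_fraction must be between 0 and 1")
    shuffled = list(rows)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_dev = max(1, int(len(shuffled) * dev_fraction))
    return shuffled[n_dev:], shuffled[:n_dev]


# Placeholder rows; real data would come from your own labeled set.
rows = [{"question": f"What is {i}+{i}?", "answer": str(2 * i)} for i in range(100)]
train_rows, dev_rows = split_examples(rows, dev_fraction=0.2, seed=42)
print(len(train_rows), len(dev_rows))
```

Fixing the seed keeps the held-out rows stable across runs, so baseline and optimized scores are measured on the same examples.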
The auto parameter controls how much computation MIPROv2 uses. It sets the number of instruction candidates, demo candidates, and search trials automatically:
| Level | What it does | Typical cost | When to use |
|-------|-------------|-------------|-------------|
| "light" (default) | Fewer candidates, fewer trials | ~$1-2 | Quick experiments, early iteration |
| "medium" | Balanced search | ~$5-10 | Recommended starting point for most tasks |
| "heavy" | More candidates, more trials | ~$15-30 | Production, maximum quality |
# Quick experiment
optimizer = dspy.MIPROv2(metric=metric, auto="light")
# Balanced (recommended starting point)
optimizer = dspy.MIPROv2(metric=metric, auto="medium")
# Maximum quality
optimizer = dspy.MIPROv2(metric=metric, auto="heavy")
Start with "medium". Only move to "heavy" if you have a large trainset (200+), a meaningful metric, and the budget for it. Use "light" for quick sanity checks during development.
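The guidance above can be written down as a small decision helper. This is only a sketch: choose_auto_preset is a hypothetical function, not part of DSPy, and its thresholds simply restate the rough cost figures from the table and the 200-example rule of thumb.

```python
# Upper ends of the typical cost ranges (USD) from the table above; rough figures only.
TYPICAL_MAX_COST = {"light": 2, "medium": 10, "heavy": 30}


def choose_auto_preset(budget_usd, n_train):
    """Return the heaviest preset that the budget and trainset size justify.

    Hypothetical helper: "heavy" needs both the budget and 200+ training examples.
    """
    if budget_usd >= TYPICAL_MAX_COST["heavy"] and n_train >= 200:
        return "heavy"
    if budget_usd >= TYPICAL_MAX_COST["medium"]:
        return "medium"
    return "light"


print(choose_auto_preset(budget_usd=50, n_train=300))
print(choose_auto_preset(budget_usd=50, n_train=80))
print(choose_auto_preset(budget_usd=3, n_train=80))
```

The returned string can be passed straight to the auto argument of dspy.MIPROv2.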
MIPROv2 optimizes every dspy.Predict (or dspy.ChainOfThought, etc.) module in your program. For each module, it tunes both the instructions and the few-shot demonstrations.
MIPROv2 generates candidate instructions by analyzing your training data and the task structure. It proposes multiple phrasings, then searches for the combination that maximizes your metric.
MIPROv2 bootstraps demonstrations by running your program on training examples and keeping successful traces (where the metric passes). It then selects which demos to include in each module's prompt.
The key advantage over simpler optimizers: MIPROv2 searches over combinations of instructions and demos together. Good instructions may need different demos than mediocre instructions, and MIPROv2 finds the best pairing.
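A toy sketch can show why searching instructions and demos together matters. It is not MIPROv2's algorithm: it uses an exhaustive grid in place of Bayesian optimization, the score table is made-up illustrative data rather than real LM results, and every name in it is hypothetical.

```python
from itertools import product

instructions = ["Answer briefly.", "Think step by step, then answer."]
demo_sets = ["demos_A", "demos_B"]

# Made-up scores for each (instruction, demo set) pair; illustration only.
toy_scores = {
    ("Answer briefly.", "demos_A"): 0.62,
    ("Answer briefly.", "demos_B"): 0.55,
    ("Think step by step, then answer.", "demos_A"): 0.58,
    ("Think step by step, then answer.", "demos_B"): 0.71,
}


def evaluate(instruction, demos):
    """Stand-in for running the program on a dev set and averaging the metric."""
    return toy_scores[(instruction, demos)]


# Joint search: score every (instruction, demos) combination.
joint_pair = max(product(instructions, demo_sets), key=lambda pair: evaluate(*pair))

# Separate search: pick the instruction first (with the first demo set),
# then pick the demos for that fixed instruction.
best_instruction = max(instructions, key=lambda i: evaluate(i, demo_sets[0]))
best_demos = max(demo_sets, key=lambda d: evaluate(best_instruction, d))
separate_pair = (best_instruction, best_demos)

print("joint:   ", joint_pair, evaluate(*joint_pair))
print("separate:", separate_pair, evaluate(*separate_pair))
```

In this toy table the separate search settles on the brief instruction with demos_A, while the joint search finds the stronger pairing of the step-by-step instruction with demos_B.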
optimizer = dspy.MIPROv2(
    metric=metric,      # Required: your metric function
    auto="medium",      # "light", "medium", or "heavy"; controls search budget
)
optimized = optimizer.compile(
    my_program,         # Required: the program to optimize
    trainset=trainset,  # Required: list of dspy.Example with .with_inputs()
)
If auto does not give you enough control, you can set parameters directly:
optimizer = dspy.MIPROv2(
    metric=metric,
    auto=None,                 # Disable auto presets for manual control
    num_candidates=10,         # Number of instruction candidates per module
    max_bootstrapped_demos=4,  # Max bootstrapped demos per module
    max_labeled_demos=4,       # Max labeled demos per module
)
optimized = optimizer.compile(
    my_program,
    trainset=trainset,
    num_trials=30,             # Bayesian optimization trials (passed to compile, not constructor)
)
Most users should stick with auto. Manual configuration is useful when you want to fine-tune the search budget or when you have domain-specific constraints (e.g., limiting demo count to keep prompts short).
MIPROv2 makes many LM calls during optimization. The cost depends on several factors, including the auto level: "heavy" makes roughly 5-10x more calls than "light".

To keep costs down:
- Iterate with "light", ship with "medium" or "heavy": iterate cheaply, then invest in the final optimization.
- Use a smaller trainset for "light" and "medium"; scale up for "heavy".
- Set num_threads in your evaluator to parallelize evaluation calls.

| auto level | 50 examples | 200 examples |
|-----------|------------|-------------|
| "light" | 2-5 min | 5-15 min |
| "medium" | 10-20 min | 20-40 min |
| "heavy" | 30-60 min | 1-3 hours |
Times vary significantly based on model latency, number of modules, and thread count.
| | MIPROv2 | BootstrapFewShot | SIMBA | BetterTogether | GEPA |
|---|---------|-----------------|-------|----------------|------|
| Tunes instructions | Yes | No | Yes | Yes | Yes |
| Tunes demos | Yes | Yes | Yes | Yes | No |
| Joint optimization | Yes | No | Yes | Yes (alternating) | No |
| Min examples | ~50 | ~10 | ~50 | ~50 | ~20 |
| Typical improvement | 15-35% | 5-20% | 15-35% | 15-35% | 10-25% |
| Cost | Medium-High | Low | Medium-High | High | Low |
| Best for | Production prompts | Quick start | Iterative refinement | Multi-strategy | Few examples, instruction-only, feedback-driven |
A common pattern is to run BootstrapFewShot first, then MIPROv2 on the result. Bootstrap finds good demonstrations quickly, then MIPRO refines the instructions around them:
# Step 1: Quick bootstrap
bootstrap = dspy.BootstrapFewShot(metric=metric, max_bootstrapped_demos=4)
bootstrapped = bootstrap.compile(my_program, trainset=trainset)
# Step 2: Refine with MIPROv2
mipro = dspy.MIPROv2(metric=metric, auto="medium")
final = mipro.compile(bootstrapped, trainset=trainset)
This often beats running either optimizer alone.
# Save the optimized program
optimized.save("optimized_program.json")
# Load later
from my_module import MyProgram # your program class
loaded = MyProgram()
loaded.load("optimized_program.json")
# Use it
result = loaded(question="What is DSPy?")
Optimized prompts are model-specific. If you switch LM providers or models, re-run the optimizer. See /ai-switching-models.
Always measure the baseline before optimizing so you know the improvement:
from dspy.evaluate import Evaluate
evaluator = Evaluate(devset=devset, metric=metric, num_threads=4, display_table=5)
# Baseline
baseline_score = evaluator(my_program)
# Optimize
optimizer = dspy.MIPROv2(metric=metric, auto="medium")
optimized = optimizer.compile(my_program, trainset=trainset)
# Compare
optimized_score = evaluator(optimized)
print(f"Baseline: {baseline_score:.1f}%")
print(f"Optimized: {optimized_score:.1f}%")
print(f"Delta: {optimized_score - baseline_score:+.1f}%")
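Depending on the installed DSPy version, the evaluator may return either a plain number or a result object that carries the score. The following is a defensive sketch, assuming such an object exposes a score attribute (an assumption to check against your version). as_score and report are hypothetical helpers, SimpleNamespace stands in for the real result object, and the numbers are placeholders.

```python
from types import SimpleNamespace


def as_score(result):
    """Return a float from a bare number or from an object with a `.score` attribute."""
    return float(getattr(result, "score", result))


def report(baseline, optimized):
    """Format a baseline vs. optimized comparison as a single line."""
    b, o = as_score(baseline), as_score(optimized)
    return f"Baseline: {b:.1f}% | Optimized: {o:.1f}% | Delta: {o - b:+.1f}%"


# Placeholder values, not measured results.
print(report(61.5, 74.0))
print(report(SimpleNamespace(score=61.5), SimpleNamespace(score=74.0)))
```

Both calls format the same way, so the comparison code does not need to change when the return type does.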
Use the trace parameter to require stricter quality during optimization. This makes MIPROv2 select higher-quality demonstrations:
def metric(example, prediction, trace=None):
    correct = prediction.answer.strip().lower() == example.answer.strip().lower()
    if trace is not None:
        # During optimization: require reasoning too
        has_reasoning = len(getattr(prediction, "reasoning", "")) > 50
        return correct and has_reasoning
    return correct
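The trace-dependent behavior can be checked without any LM calls. The sketch below repeats the metric so that it runs on its own, and uses types.SimpleNamespace as a stand-in for dspy.Example and the prediction object; that substitution is an assumption made purely for testing, and the sample answers are invented.

```python
from types import SimpleNamespace


def metric(example, prediction, trace=None):
    correct = prediction.answer.strip().lower() == example.answer.strip().lower()
    if trace is not None:
        # During optimization: require reasoning too
        has_reasoning = len(getattr(prediction, "reasoning", "")) > 50
        return correct and has_reasoning
    return correct


# Stand-ins for a labeled example and two predictions (hypothetical data).
example = SimpleNamespace(answer="Paris")
terse = SimpleNamespace(answer=" paris ", reasoning="It is Paris.")
thorough = SimpleNamespace(
    answer="Paris",
    reasoning="France's seat of government and largest city is Paris, so the capital is Paris.",
)

print(metric(example, terse))               # evaluation mode: correctness only
print(metric(example, terse, trace=[]))     # optimization mode: reasoning too short
print(metric(example, thorough, trace=[]))  # optimization mode: correct and reasoned
```

The same correct answer passes during evaluation but is rejected as a demonstration candidate when its reasoning is too short.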
MIPROv2 optimizes all modules in your program. For a multi-step pipeline, each module gets its own optimized instructions and demos:
class RAGPipeline(dspy.Module):
    def __init__(self):
        super().__init__()
        # Assumes a retrieval model has been configured for dspy.Retrieve
        self.retrieve = dspy.Retrieve(k=3)
        self.generate = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        context = self.retrieve(question).passages
        return self.generate(context=context, question=question)

rag = RAGPipeline()
optimizer = dspy.MIPROv2(metric=metric, auto="medium")
optimized_rag = optimizer.compile(rag, trainset=trainset)
- Using auto="heavy" by default for production. The auto parameter defaults to "light", and "medium" is the recommended starting point. Heavy is 5-10x more expensive and only justified with 200+ examples and a well-validated metric. Start with "medium" and upgrade only if the score plateaus.
- Passing trainset as a positional argument to compile(). The trainset parameter is keyword-only in MIPROv2: optimizer.compile(program, trainset=trainset), not optimizer.compile(program, trainset). Passing it positionally raises a TypeError.
- Omitting .with_inputs() on training examples. Every dspy.Example in the trainset must call .with_inputs("field1", "field2") to mark which fields are inputs vs labels. Without this, MIPROv2 cannot distinguish inputs from expected outputs and optimization silently underperforms.
- Setting num_candidates without also setting num_trials. When using manual configuration (no auto), both num_candidates and num_trials must be set. Setting only one produces a suboptimal search: more candidates without enough trials to evaluate them is wasted compute.
- Passing the requires_permission_to_run parameter. This parameter has been removed from MIPROv2. Passing True raises a ValueError. Remove it entirely from compile() calls.

Install any skill:
npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill <name>
- /ai-watching-optimization
- /dspy-data
- /dspy-evaluate
- /dspy-bootstrap-few-shot
- /dspy-bootstrap-rs
- /ai-improving-accuracy
- Install /ai-do if you do not have it. It routes any AI problem to the right skill and is the fastest way to work: npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill ai-do