skills/dspy-copro/SKILL.md
Use when you want to optimize instructions by generating many candidates and picking the best — useful when few-shot demos alone are not enough and you want to tune the task description itself. Common scenarios - your current task instructions produce mediocre results, you want to automatically generate and test many instruction variants, the task is hard to describe in one sentence, or few-shot examples alone are not improving quality enough. Related - ai-improving-accuracy, dspy-gepa, dspy-miprov2. Also used for dspy.COPRO, instruction optimization, optimize task description, generate better prompts automatically, prompt engineering automation, find the best instruction for my task, automatic prompt generation, instruction tuning without fine-tuning, COPRO vs MIPROv2, when to optimize instructions vs demos, instruction search, prompt optimization by generating candidates, systematic prompt improvement.
```
npx skillsauth add lebsral/dspy-programming-not-prompting-lms-skills dspy-copro
```
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Guide the user through using dspy.COPRO to automatically generate, evaluate, and select the best instructions for their DSPy program's signatures.
dspy.COPRO (Collaborative Prompting) is a DSPy optimizer that improves your program by finding better instructions for each signature. Instead of you hand-writing prompt instructions, COPRO generates many candidate instructions, evaluates each one against your metric, and keeps the best.
Key properties:
Use dspy.COPRO when:
- … breadth)

Do not use COPRO when:
- … dspy.MIPROv2 instead (it tunes both instructions and demos)
- … dspy.GEPA instead
- … dspy.BootstrapFinetune
- … dspy.BootstrapFewShot

Three things are needed: a program, a metric, and training data.
```python
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # or "anthropic/claude-sonnet-4-5-20250929", etc.

# 1. Define a program
classify = dspy.ChainOfThought("text -> label")

# 2. Define a metric
def metric(example, prediction, trace=None):
    return prediction.label.lower() == example.label.lower()

# 3. Prepare training data
trainset = [
    dspy.Example(text="Love this product!", label="positive").with_inputs("text"),
    dspy.Example(text="Terrible experience.", label="negative").with_inputs("text"),
    # ... 20-200 examples
]

# 4. Optimize with COPRO
optimizer = dspy.COPRO(
    metric=metric,
    breadth=10,
    depth=3,
)
optimized = optimizer.compile(
    classify,
    trainset=trainset,
    eval_kwargs=dict(num_threads=4, display_progress=True),
)

# 5. Use the optimized program
result = optimized(text="The quality exceeded my expectations.")
print(result.label)
```
```python
dspy.COPRO(
    prompt_model=None,      # LM for generating candidates (defaults to configured LM)
    metric=None,            # Evaluation function (required)
    breadth=10,             # Number of candidates per iteration (must be >1)
    depth=3,                # Number of optimization iterations
    init_temperature=1.4,   # Temperature for candidate generation
    track_stats=False,      # Collect optimization statistics
)
```
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| prompt_model | dspy.LM | None | LM used to generate instruction candidates. If None, uses the globally configured LM |
| metric | Callable | None | Scoring function with signature (example, prediction, trace=None) -> float/bool. Required |
| breadth | int | 10 | Number of candidate instructions generated per iteration. Higher = wider search, more LM calls |
| depth | int | 3 | Number of optimization rounds. Each round refines candidates from the previous round |
| init_temperature | float | 1.4 | Temperature for generating candidates. Higher = more diverse candidates |
| track_stats | bool | False | When True, collects per-iteration statistics (max, average, min, std dev of scores) |
```python
optimized = optimizer.compile(
    student,              # Program to optimize (modified in-place)
    trainset=trainset,    # Training examples
    eval_kwargs={},       # Extra kwargs for dspy.Evaluate
)
```
| Parameter | Type | Description |
|-----------|------|-------------|
| student | dspy.Module | The program to optimize. COPRO modifies it in-place and also returns it |
| trainset | list[dspy.Example] | Training examples for evaluating candidates |
| eval_kwargs | dict | Passed to dspy.Evaluate -- commonly num_threads, display_progress, display_table |
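Because the table above describes `student` as modified in-place, it can help to keep a copy for baseline comparison. The following is a minimal sketch, not part of DSPy: `compile_keeping_baseline` is a hypothetical helper name, and it assumes the program can be copied with `copy.deepcopy`.

```python
import copy


def compile_keeping_baseline(optimizer, program, trainset, eval_kwargs=None):
    """Hypothetical helper: copy the program before compile() may mutate it.

    Assumes `optimizer` exposes compile(student, *, trainset, eval_kwargs)
    and that `program` supports copy.deepcopy.
    """
    baseline = copy.deepcopy(program)  # untouched copy for later comparison
    optimized = optimizer.compile(
        program,
        trainset=trainset,
        eval_kwargs=eval_kwargs or {},  # always passed, even when empty
    )
    return baseline, optimized
```

With a real optimizer this would be called as `baseline, optimized = compile_keeping_baseline(optimizer, classify, trainset)`, after which both programs can be scored with the same metric.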
The returned program has additional metadata:
- optimized.candidate_programs -- dict of all evaluated candidates with their scores
- optimized.total_calls -- total LM API calls made during optimization

breadth controls how many instruction candidates COPRO generates per iteration. It is the most important tuning knob.
| Breadth | Candidates per round | Total candidates (depth=3) | Use case |
|---------|---------------------|---------------------------|----------|
| 5 | 4 new + 1 base | ~15 | Quick test, cheap |
| 10 (default) | 9 new + 1 base | ~30 | Good balance |
| 20 | 19 new + 1 base | ~60 | Thorough search |
| 50 | 49 new + 1 base | ~150 | Exhaustive, expensive |
The first iteration generates breadth - 1 new candidates from the base instruction. Subsequent iterations generate new candidates informed by the best performers so far.
Cost note: Each candidate is evaluated on the full trainset, so total LM calls scale as breadth * depth * len(trainset). With breadth=10, depth=3, and 100 training examples, expect roughly 3,000 evaluation calls plus candidate generation calls.
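The arithmetic in the cost note can be checked with a small helper before launching a run. This is a rough sketch only: `estimate_eval_calls` is a hypothetical name, not a DSPy function, and it counts evaluation calls using the breadth * depth * len(trainset) scaling stated above, ignoring candidate-generation calls.

```python
def estimate_eval_calls(breadth: int, depth: int, trainset_size: int) -> int:
    """Rough estimate of evaluation calls: one per candidate per training example.

    Hypothetical helper (not part of DSPy). Candidate-generation calls are not counted.
    """
    if breadth <= 1:
        raise ValueError("breadth must be greater than 1")
    return breadth * depth * trainset_size


if __name__ == "__main__":
    # Compare the breadth settings from the table, with depth=3 and 100 examples.
    for breadth in (5, 10, 20, 50):
        print(breadth, estimate_eval_calls(breadth, depth=3, trainset_size=100))
```

For the default breadth=10 and depth=3 with 100 examples, this gives the 3,000 figure quoted in the cost note.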
COPRO follows a seeding-and-refinement loop:
Seed phase (iteration 0): Takes the existing instruction from each signature. Generates breadth - 1 alternative instructions using temperature-controlled sampling from the prompt model.
Evaluate phase: Scores every candidate instruction by swapping it into the program and running the metric against the full training set. Duplicate (instruction, prefix) pairs are skipped.
Refine phase (iterations 1 through depth-1): Takes the top-performing candidates from the previous round. Generates new candidates informed by what worked and what did not.
Multi-predictor handling: When a program has multiple predictors, COPRO optimizes them sequentially. It locks in the best instruction for predictor 1 before moving to predictor 2, so later predictors benefit from earlier improvements.
Selection: After all iterations, the instruction with the highest metric score is selected for each predictor.
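The seed, evaluate, refine, and select phases above can be sketched as a toy loop for a single predictor. This is an illustration under stated assumptions, not COPRO's actual implementation: `instruction_search`, `propose`, and `score` are hypothetical names, `propose` stands in for the prompt model, `score` stands in for running the metric over the training set, and generating `breadth` candidates in each refine round is an assumption.

```python
from typing import Callable, Dict, List, Tuple

Scored = Tuple[float, str]  # (score, instruction text)


def instruction_search(
    base_instruction: str,
    propose: Callable[[List[Scored], int], List[str]],
    score: Callable[[str], float],
    breadth: int = 10,
    depth: int = 3,
) -> Scored:
    """Toy seed/evaluate/refine loop; all callables are hypothetical stand-ins."""
    if breadth <= 1:
        raise ValueError("breadth must be greater than 1")

    evaluated: Dict[str, float] = {}

    def evaluate(candidates: List[str]) -> None:
        for text in candidates:
            if text not in evaluated:  # duplicates are scored only once
                evaluated[text] = score(text)

    def ranked() -> List[Scored]:
        return sorted(((s, t) for t, s in evaluated.items()), reverse=True)

    # Seed phase: score the base instruction plus breadth - 1 alternatives.
    evaluate([base_instruction])
    evaluate(propose(ranked(), breadth - 1))

    # Refine phase: new candidates are informed by the best performers so far.
    for _ in range(depth - 1):
        evaluate(propose(ranked()[:breadth], breadth))

    # Selection: the highest-scoring instruction wins.
    return ranked()[0]


def toy_propose(ranked_so_far: List[Scored], n: int) -> List[str]:
    """Stand-in for the prompt model: derive variants of the current best."""
    best_text = ranked_so_far[0][1]
    return [f"{best_text} v{i}" for i in range(n)]


def toy_score(text: str) -> float:
    """Stand-in for the metric: longer instructions score higher."""
    return float(len(text))
```

Calling `instruction_search("base", toy_propose, toy_score, breadth=3, depth=2)` runs one seed round and one refine round. For a multi-predictor program, the same loop would be repeated per predictor with earlier winners locked in, as described above.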
Enable track_stats=True to see how candidates perform across iterations:
```python
optimizer = dspy.COPRO(
    metric=metric,
    breadth=15,
    depth=3,
    track_stats=True,
)
optimized = optimizer.compile(
    my_program,
    trainset=trainset,
    eval_kwargs=dict(num_threads=4),
)
```
When track_stats is enabled, COPRO logs per-iteration statistics including max, average, min, and standard deviation of candidate scores. This helps you understand whether the search is converging or whether more breadth/depth would help.
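To make the convergence check concrete, the sketch below computes the same four summary values from plain lists of candidate scores and reports how much the best score improved between rounds. It does not read COPRO's internal statistics: `summarize_scores` and `best_score_gains` are hypothetical helpers, the input scores are assumed to have been collected by hand, and the use of population standard deviation is an assumption.

```python
import statistics
from typing import Dict, List


def summarize_scores(scores: List[float]) -> Dict[str, float]:
    """Max, average, min, and (population) standard deviation for one round."""
    return {
        "max": max(scores),
        "average": statistics.mean(scores),
        "min": min(scores),
        "std": statistics.pstdev(scores),
    }


def best_score_gains(rounds: List[List[float]]) -> List[float]:
    """Improvement in the best score from each round to the next.

    Gains near zero suggest the search has converged; large gains suggest
    that more depth or breadth may still help.
    """
    bests = [max(round_scores) for round_scores in rounds]
    return [later - earlier for earlier, later in zip(bests, bests[1:])]
```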
You can use a stronger (or cheaper) LM specifically for generating instruction candidates:
```python
# Use a strong model to generate candidates, evaluate with the production model
candidate_lm = dspy.LM("openai/gpt-4o")  # or "anthropic/claude-sonnet-4-5-20250929", etc.
production_lm = dspy.LM("openai/gpt-4o-mini")  # or "anthropic/claude-haiku-4-5-20251001", etc.
dspy.configure(lm=production_lm)

optimizer = dspy.COPRO(
    prompt_model=candidate_lm,
    metric=metric,
    breadth=10,
    depth=3,
)
optimized = optimizer.compile(my_program, trainset=trainset, eval_kwargs={})
```
This is useful when you want a capable model to brainstorm instructions but evaluate and run with a cheaper model.
| Aspect | COPRO | GEPA | MIPROv2 |
|--------|-------|------|---------|
| What it tunes | Instructions + prefixes | Instructions | Instructions + few-shot demos |
| Search strategy | Breadth-first candidate generation | Evolutionary (genetic programming) | Bayesian optimization |
| Data needed | 20-200 examples | 20-100 examples | 50-500 examples |
| Key parameter | breadth (candidates per round) | Population/generations | auto ("light"/"medium"/"heavy") |
| Cost | Moderate (breadth * depth * trainset evals) | Low-moderate | Moderate-high |
| Best for | Exploring many instruction variants | Few examples, feedback-driven instruction tuning | Best overall prompt optimization |
When to pick COPRO over alternatives:
When to pick MIPROv2 instead:
- … auto setting over manual tuning of search parameters

When to pick GEPA instead:
- Setting breadth=1, which silently breaks optimization. breadth must be greater than 1; with breadth=1 there are no alternative candidates to evaluate. Use at least breadth=5 for a meaningful search.
- compile() modifies the student in-place. Unlike most optimizers, COPRO mutates the program you pass to compile(). If you need the original program for baseline comparison, clone it first or create a fresh instance before calling compile().
- Passing eval_kwargs as positional instead of keyword. The compile() signature is compile(student, *, trainset, eval_kwargs), so trainset and eval_kwargs are keyword-only. Always use optimizer.compile(program, trainset=trainset, eval_kwargs={}).
- Omitting the eval_kwargs parameter. COPRO requires eval_kwargs to be passed to compile(), even if empty. Omitting it causes a TypeError. Always include eval_kwargs={} or eval_kwargs=dict(num_threads=4).

Install any skill:
```
npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill <name>
```
- /ai-improving-accuracy
- /dspy-evaluate
- /dspy-data
- /dspy-signatures
- Install /ai-do if you do not have it. It routes any AI problem to the right skill and is the fastest way to work: npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill ai-do