skills/dspy-better-together/SKILL.md
Use when you have already tried prompt-only optimization and want the next level: jointly tuning prompts and model weights for maximum quality. Common scenarios: you have maxed out prompt optimization and need the next level, combining instruction tuning with weight tuning for maximum quality, making a small model match a large model through joint optimization, or squeezing the last few percent of accuracy. Related: ai-fine-tuning, ai-improving-accuracy, ai-cutting-costs. Also used for dspy.BetterTogether, joint prompt and weight optimization, beyond prompt engineering, combine fine-tuning with prompt optimization, maximum possible quality from DSPy, hybrid optimization strategy, prompt optimization hit a ceiling, fine-tune and optimize prompts at the same time, advanced DSPy optimization, best possible accuracy, what to try after MIPROv2, next level AI quality.
Guide the user through using dspy.BetterTogether to get the best possible quality by combining prompt optimization and model fine-tuning in alternating rounds. Each round builds on the improvements from the previous one, creating compounding gains that beat either approach alone.
BetterTogether is a DSPy optimizer that alternates between prompt optimization (instructions, few-shot examples) and weight optimization (fine-tuning). Instead of running these independently, it chains them so each phase builds on the previous one's improvements: the weights are fine-tuned using the optimized prompts, and the prompts are then re-optimized for the fine-tuned model.
Research shows this consistently outperforms either approach alone, with 5-78% gains over individual techniques (arXiv 2407.10930v2). A Databricks case study on IE Bench found that GEPA alone added +2.1 points and fine-tuning alone +1.9 points, while combining them added +4.8 points over baseline.
Before starting, confirm that prompt-only optimization has plateaued (see /ai-improving-accuracy) and that you have a fine-tunable model (for example gpt-4o-mini/gpt-4o, or local models). A minimal setup:
```python
import dspy

# The weight optimizer needs a fine-tunable model (e.g. OpenAI, Databricks, or a local model)
lm = dspy.LM("openai/gpt-4o-mini")
dspy.configure(lm=lm)

# Define your program
class Classify(dspy.Signature):
    """Classify the support ticket into a category."""
    text: str = dspy.InputField()
    category: str = dspy.OutputField()

program = dspy.ChainOfThought(Classify)

# IMPORTANT: All predictors must have explicit LMs assigned
program.set_lm(lm)

# Define your metric
def metric(example, prediction, trace=None):
    return prediction.category.strip().lower() == example.category.strip().lower()

# Prepare data: `data` is your own list of dicts with "text" and "category" keys
examples = [dspy.Example(text=x["text"], category=x["category"]).with_inputs("text") for x in data]
trainset, valset = examples[:800], examples[800:900]

# Run BetterTogether with defaults
optimizer = dspy.BetterTogether(metric=metric)
compiled = optimizer.compile(program, trainset=trainset, valset=valset)
```
By default, BetterTogether uses:
- p: BootstrapFewShotWithRandomSearch for prompt optimization
- w: BootstrapFinetune for weight optimization
- strategy "p -> w -> p" (prompts, then weights, then prompts again)

BetterTogether executes a strategy string that defines the order of optimization phases:

```
"p -> w -> p"
 |    |    |
 |    |    +-- Re-optimize prompts for the fine-tuned model
 |    +------- Fine-tune weights using the optimized prompts
 +------------ Optimize prompts first (instructions + few-shot)
```
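To make the strategy format concrete, here is a minimal pure-Python sketch of how a string like "p -> w -> p" breaks into ordered steps, including the rule that every key must match an optimizer passed to the constructor. parse_strategy is a hypothetical helper for illustration, not part of DSPy; DSPy's own parser may handle whitespace or errors differently.

```python
from typing import Iterable, List


def parse_strategy(strategy: str, optimizer_keys: Iterable[str]) -> List[str]:
    """Hypothetical helper: split "p -> w -> p" into ordered step keys.

    Raises ValueError if a step is empty or names a key that was not passed
    to the optimizer constructor (e.g. BetterTogether(metric=..., p=..., w=...)).
    """
    known = set(optimizer_keys)
    steps = [step.strip() for step in strategy.split("->")]
    if any(not step for step in steps):
        raise ValueError(f"empty step in strategy {strategy!r}")
    unknown = [step for step in steps if step not in known]
    if unknown:
        raise ValueError(f"strategy keys {unknown} have no matching optimizer; known: {sorted(known)}")
    return steps


print(parse_strategy("p -> w -> p", {"p", "w"}))  # ['p', 'w', 'p']
```

The same key can appear more than once; each occurrence is a separate round that starts from the previous round's program.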
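At each step, the optimizer for that key runs on the program produced by the previous step (with the trainset reshuffled first, by default), and the resulting program is scored on the validation set. After all steps, BetterTogether returns the best-scoring candidate across all phases (ties broken by earlier position).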
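The selection rule (highest score wins, earlier position breaks ties) can be sketched in a few lines. pick_best and the scores below are hypothetical; the dict keys follow the candidate_programs format described in the compile reference, and the strategy labels are only illustrative.

```python
from typing import Dict, List


def pick_best(candidates: List[Dict]) -> Dict:
    """Hypothetical helper: highest score wins; the earliest candidate wins ties."""
    if not candidates:
        raise ValueError("no candidates to choose from")
    # max() returns the first maximal item it encounters, so iterating in run
    # order hands ties to the candidate produced earlier.
    return max(candidates, key=lambda cand: cand["score"])


runs = [  # hypothetical scores, listed in the order the steps ran
    {"strategy": "p", "score": 70.0},
    {"strategy": "p -> w", "score": 74.0},
    {"strategy": "p -> w -> p", "score": 74.0},
]
print(pick_best(runs)["strategy"])  # p -> w
```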
Pass your own optimizers as keyword arguments. The keys become identifiers in the strategy string:
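```python
import dspy
from dspy.teleprompt import GEPA, BootstrapFinetune

# GEPA calls its metric with five arguments, so wrap the three-argument metric
def gepa_metric(gold, pred, trace=None, pred_name=None, pred_trace=None):
    return float(metric(gold, pred, trace))

optimizer = dspy.BetterTogether(
    metric=metric,
    p=GEPA(
        metric=gepa_metric,
        auto="medium",
        reflection_lm=dspy.LM("openai/gpt-4o"),  # GEPA requires an LM to propose new instructions
    ),
    w=BootstrapFinetune(metric=metric),
)

program.set_lm(lm)  # every predictor needs an explicit LM
compiled = optimizer.compile(
    program,
    trainset=trainset,
    valset=valset,
    strategy="p -> w -> p",
)
```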
You can use any DSPy Teleprompter as an optimizer. Common choices:
| Key | Optimizer | Best for |
|-----|-----------|----------|
| p | GEPA | Instruction tuning, fewer examples |
| p | MIPROv2 | Best general prompt optimization |
| p | BootstrapFewShotWithRandomSearch | Fast prompt optimization (default) |
| w | BootstrapFinetune | Weight optimization (default) |
Constructor: BetterTogether(metric, **optimizers)

| Parameter | Type | Description |
|-----------|------|-------------|
| metric | Callable | Evaluation function (example, prediction, trace=None) -> numeric |
| **optimizers | keyword args | Custom optimizers. Keys become strategy identifiers (e.g., p=GEPA(...), w=BootstrapFinetune(...)) |
Compile method: optimizer.compile(student, *, trainset, ...)

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| student | Module | required | Program to optimize. All predictors must have LMs via set_lm() |
| trainset | list[Example] | required | Training examples |
| valset | list[Example] | None | Validation set. If None, splits from trainset |
| valset_ratio | float | 0.1 | Fraction of trainset to use as valset when valset=None |
| strategy | str | "p -> w -> p" | Optimizer execution order using keys from constructor |
| teacher | Module or list[Module] | None | Optional teacher program(s) for distillation |
| num_threads | int | None | Parallel threads for evaluation |
| shuffle_trainset_between_steps | bool | True | Shuffle trainset before each step |
| seed | int | None | Random seed for reproducibility |
| optimizer_compile_args | dict | None | Per-optimizer custom compile arguments |
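When valset is None, valset_ratio decides how much of the trainset is held out for choosing the best candidate. The sketch below approximates that behavior with a seeded shuffle and a slice. split_trainset is hypothetical; DSPy's internal split may order or round differently.

```python
import random
from typing import List, Optional, Tuple


def split_trainset(trainset: List, valset_ratio: float = 0.1,
                   seed: Optional[int] = None) -> Tuple[List, List]:
    """Hypothetical helper: hold out a fraction of trainset for validation."""
    if not 0.0 <= valset_ratio < 1.0:
        raise ValueError("valset_ratio must be in [0, 1)")
    shuffled = list(trainset)              # copy; the caller's list is not modified
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_val = int(len(shuffled) * valset_ratio)
    return shuffled[n_val:], shuffled[:n_val]  # (train, val)


train, val = split_trainset(list(range(900)), valset_ratio=0.1, seed=0)
print(len(train), len(val))  # 810 90
```

With valset_ratio=0 the held-out list is empty, which is the case where BetterTogether falls back to returning the latest program.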
The compiled program has two extra attributes:
- candidate_programs: list of dicts with 'program', 'score', and 'strategy' keys, sorted by score in descending order
- flag_compilation_error_occurred: boolean indicating whether any step failed

Common strategies:

| Strategy | Rounds | Use case |
|----------|--------|----------|
| "p -> w -> p" | 3 | Default. Best balance of quality and cost |
| "p -> w" | 2 | Simpler, cheaper. Good starting point |
| "w -> p" | 2 | When your model needs weight tuning first |
| "p -> w -> p -> w" | 4 | Maximum quality, highest cost |
BetterTogether runs multiple optimization rounds, so it costs more than individual optimizers:
| Strategy | Approximate cost | Time |
|----------|-----------------|------|
| "p -> w" | 1x prompt opt + 1x fine-tune | Hours |
| "p -> w -> p" (default) | 2x prompt opt + 1x fine-tune | Hours to half a day |
| "p -> w -> p -> w" | 2x prompt opt + 2x fine-tune | Half a day to a day |
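The cost column follows directly from counting prompt (p) and weight (w) steps in the strategy string. A small sketch, assuming the default key meanings (p = prompt optimization, w = fine-tuning); KIND and cost_profile are hypothetical names, and custom keys would need their own entries.

```python
from collections import Counter

# Assumed meaning of the default keys; add entries for any custom keys you use.
KIND = {"p": "prompt opt", "w": "fine-tune"}


def cost_profile(strategy: str) -> str:
    """Hypothetical helper: count prompt-optimization and fine-tuning rounds."""
    steps = [step.strip() for step in strategy.split("->")]
    counts = Counter(KIND[step] for step in steps)
    return " + ".join(f"{counts[kind]}x {kind}" for kind in ("prompt opt", "fine-tune") if counts[kind])


for s in ["p -> w", "p -> w -> p", "p -> w -> p -> w"]:
    print(f"{s!r}: {cost_profile(s)}")
# 'p -> w': 1x prompt opt + 1x fine-tune
# 'p -> w -> p': 2x prompt opt + 1x fine-tune
# 'p -> w -> p -> w': 2x prompt opt + 2x fine-tune
```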
Fine-tuning is the expensive part. To control cost:

- Try "p -> w" to see if two rounds are enough
- Use optimizer_compile_args to limit individual optimizer budgets

Compared with running a single optimizer:

| Approach | Data needed | Quality | Cost | When to use |
|----------|-------------|---------|------|-------------|
| MIPROv2 alone | 200+ | Good | Low | First optimization attempt |
| BootstrapFinetune alone | 500+ | Better | Medium | When prompts hit a ceiling |
| BetterTogether | 500+ | Best | High | When you need maximum quality |
Rule of thumb: Try MIPROv2 first. If you're still short of your quality target, try BootstrapFinetune. If you need more, use BetterTogether.
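The rule of thumb is an escalation ladder, which can be written as a tiny decision helper. next_step and LADDER are hypothetical; the example counts are the rough minimums from the comparison table, not hard limits enforced by DSPy.

```python
from typing import Set

# (optimizer, rough minimum number of examples) from the comparison table
LADDER = [("MIPROv2", 200), ("BootstrapFinetune", 500), ("BetterTogether", 500)]


def next_step(meets_target: bool, tried: Set[str], n_examples: int) -> str:
    """Hypothetical helper: move one rung up the ladder only while still short of the target."""
    if meets_target:
        return "done"
    for name, min_examples in LADDER:
        if name in tried:
            continue  # already tried this rung and still short of the target
        if n_examples < min_examples:
            return f"collect data ({name} wants {min_examples}+ examples)"
        return name
    return "all three tried"


print(next_step(False, set(), 300))                             # MIPROv2
print(next_step(False, {"MIPROv2", "BootstrapFinetune"}, 800))  # BetterTogether
```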
Explicit LMs on all predictors: Every predictor must have an LM assigned with set_lm(). Global dspy.configure(lm=...) is not enough for BetterTogether.

```python
program = dspy.ChainOfThought(MySignature)
program.set_lm(lm)  # Required
```
Fine-tunable model: The weight optimizer needs a model that supports fine-tuning (OpenAI, Databricks, or local models with GPU).
Validation data: Provide either an explicit valset or set valset_ratio > 0. Without validation data, BetterTogether returns the latest program instead of the best one.
Strategy keys must match: Keys in the strategy string must match the keyword argument names from the constructor.
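The requirements above can be checked before spending money on optimization. This preflight sketch assumes DSPy-style modules, where program.named_predictors() yields (name, predictor) pairs and set_lm(lm) stores the LM on each predictor's lm attribute. preflight is a hypothetical helper, and the usage below uses SimpleNamespace stand-ins instead of a real dspy.Module so it runs without any LM.

```python
from types import SimpleNamespace
from typing import Iterable, List, Optional


def preflight(program, strategy: str, optimizer_keys: Iterable[str],
              valset: Optional[list] = None, valset_ratio: float = 0.1) -> List[str]:
    """Hypothetical pre-compile check; returns a list of problems (empty means ready)."""
    problems = []
    # Assumption: DSPy predictors expose their assigned LM as `.lm` (None if unset).
    missing = [name for name, pred in program.named_predictors()
               if getattr(pred, "lm", None) is None]
    if missing:
        problems.append(f"no LM on predictors {missing}: call program.set_lm(lm)")
    steps = {step.strip() for step in strategy.split("->")}
    unknown = sorted(steps - set(optimizer_keys))
    if unknown:
        problems.append(f"strategy keys {unknown} do not match any optimizer keyword")
    if not valset and valset_ratio <= 0:
        problems.append("no validation data: pass valset or use valset_ratio > 0")
    return problems


# Stand-in program whose only predictor has no LM assigned.
unset = SimpleNamespace(named_predictors=lambda: [("predict", SimpleNamespace(lm=None))])
for problem in preflight(unset, "p -> w -> q", ["p", "w"], valset=None, valset_ratio=0):
    print(problem)
```

With a real program you would call something like preflight(program, "p -> w -> p", ["p", "w"], valset=valset) and only proceed to optimizer.compile() when it returns an empty list.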
After compilation, examine all candidate programs:
```python
compiled = optimizer.compile(program, trainset=trainset, valset=valset)

# See all candidates ranked by score
for candidate in compiled.candidate_programs:
    print(f"Strategy step: {candidate['strategy']}, Score: {candidate['score']:.1f}%")

# Check if any errors occurred
if compiled.flag_compilation_error_occurred:
    print("Warning: one or more optimization steps failed")
```
BetterTogether has built-in resilience. If any optimization step fails, it returns the best program found before the failure and sets flag_compilation_error_occurred = True on the result. Always check this flag in production workflows.
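In automated pipelines it can help to turn those two attributes into a small report that a job can log or gate on. summarize_run is a hypothetical helper that relies only on candidate_programs and flag_compilation_error_occurred as documented here; the stand-in object and its values are made up so the sketch runs without DSPy.

```python
from types import SimpleNamespace


def summarize_run(compiled) -> dict:
    """Hypothetical helper: condense BetterTogether's result attributes into a report."""
    candidates = getattr(compiled, "candidate_programs", [])
    return {
        "error": bool(getattr(compiled, "flag_compilation_error_occurred", False)),
        "steps_scored": [cand["strategy"] for cand in candidates],
        "best_score": max((cand["score"] for cand in candidates), default=None),
    }


# Stand-in for a compiled program whose later steps failed (hypothetical values).
fake = SimpleNamespace(
    candidate_programs=[{"program": None, "score": 70.0, "strategy": "p"}],
    flag_compilation_error_occurred=True,
)
report = summarize_run(fake)
if report["error"]:
    print(f"Warning: a step failed; candidates exist only for {report['steps_scored']}")
```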
Common mistakes:

- Relying on global dspy.configure() instead of set_lm(). BetterTogether requires every predictor to have an explicit LM assignment via program.set_lm(lm). Without it, the weight optimizer cannot identify which model to fine-tune and raises an error. Always call set_lm() on the program before compile().
- Running without validation data. Without a valset (or valset_ratio > 0), BetterTogether returns the latest program instead of the best-scoring one across all phases. Always provide a valset or leave valset_ratio=0.1 so the optimizer can select the best candidate.
- Letting valset overlap with trainset. If the valset overlaps with the trainset, the optimizer selects based on inflated scores. Use a held-out split or let BetterTogether auto-split via valset_ratio.
- Not checking flag_compilation_error_occurred after compile. If a fine-tuning step fails silently (API timeout, quota exceeded), BetterTogether returns the best program found before the failure. Always check compiled.flag_compilation_error_occurred and inspect compiled.candidate_programs to verify which steps completed.

Install any skill:

```bash
npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill <name>
```

Related skills: /ai-fine-tuning, /ai-improving-accuracy, /dspy-evaluate, /dspy-data

Install /ai-do if you do not have it; it routes any AI problem to the right skill and is the fastest way to work:

```bash
npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill ai-do
```