lebsral/dspy-miprov2

Name: dspy-miprov2
Author: lebsral

skills/dspy-miprov2/SKILL.md

npx skillsauth add lebsral/dspy-programming-not-prompting-lms-skills dspy-miprov2

Optimize Prompts with MIPROv2

Guide the user through using dspy.MIPROv2, DSPy's most powerful prompt optimizer. MIPROv2 jointly optimizes instructions and few-shot demonstrations to maximize a metric on your training data.

What is MIPROv2

MIPROv2 (Multi-prompt Instruction PRoposal Optimizer v2) is DSPy's recommended optimizer for prompt optimization. Unlike simpler optimizers that only tune few-shot examples, MIPROv2 jointly optimizes:

Instructions — the natural-language task descriptions in each module's prompt
Few-shot demonstrations — the input-output examples included in each module's prompt

It works by proposing candidate instructions, bootstrapping demonstrations, and searching over combinations of the two with Bayesian optimization. The result is a program whose prompts score higher on your metric.

When to use MIPROv2

Production optimization — you want the best prompt quality DSPy can deliver
50+ training examples — MIPROv2 needs enough data to search effectively
Both instructions and demos matter — you want the optimizer to tune everything, not just examples
You have budget for multiple LM calls — MIPROv2 is more expensive than BootstrapFewShot but produces better results

If you have fewer than 50 examples or need a quick first pass, start with BootstrapFewShot (see /dspy-bootstrap-few-shot), then upgrade to MIPROv2.

Basic usage

import dspy
from dspy.evaluate import Evaluate

lm = dspy.LM("openai/gpt-4o-mini")  # or "anthropic/claude-sonnet-4-5-20250929", etc.
dspy.configure(lm=lm)

# 1. Your program
qa = dspy.ChainOfThought("question -> answer")

# 2. Your data (mark which fields are inputs)
trainset = [
    dspy.Example(question="What is the capital of France?", answer="Paris").with_inputs("question"),
    dspy.Example(question="What is 2+2?", answer="4").with_inputs("question"),
    # 50-200+ examples recommended
]

devset = [
    dspy.Example(question="Who wrote Hamlet?", answer="Shakespeare").with_inputs("question"),
    # 20-50 held-out examples for evaluation
]

# 3. Your metric
def metric(example, prediction, trace=None):
    return prediction.answer.strip().lower() == example.answer.strip().lower()

# 4. Optimize with MIPROv2
optimizer = dspy.MIPROv2(metric=metric, auto="medium")
optimized = optimizer.compile(qa, trainset=trainset)

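# 5. Evaluate improvement
evaluator = Evaluate(devset=devset, metric=metric, num_threads=4, display_progress=True)
result = evaluator(optimized)
# Assumption: older DSPy releases return a float here, newer ones a result object with a .score attribute
score = getattr(result, "score", result)
print(f"Optimized score: {score:.1f}%")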

# 6. Save
optimized.save("optimized_qa.json")
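The comments in the example above recommend 50-200+ training examples and 20-50 held-out examples. A minimal sketch of one way to produce that split from a single labeled list, assuming the records are plain dictionaries. `split_examples` and its parameters are hypothetical names, not part of DSPy, and the data is placeholder text:

```python
import random


def split_examples(records, dev_size=20, seed=0):
    """Shuffle a copy of `records` and hold out `dev_size` items for evaluation."""
    if dev_size >= len(records):
        raise ValueError("dev_size must be smaller than the number of records")
    shuffled = list(records)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    return shuffled[dev_size:], shuffled[:dev_size]


# Hypothetical placeholder data: 70 question/answer records
records = [{"question": f"q{i}", "answer": f"a{i}"} for i in range(70)]
train_records, dev_records = split_examples(records, dev_size=20)
print(len(train_records), len(dev_records))
```

Each record would then be wrapped as `dspy.Example(**record).with_inputs("question")` before being passed to the optimizer or evaluator.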

The auto parameter

The auto parameter controls how much computation MIPROv2 uses. It sets the number of instruction candidates, demo candidates, and search trials automatically:

| Level | What it does | Typical cost | When to use |
|-------|-------------|-------------|-------------|
| "light" (default) | Fewer candidates, fewer trials | ~$1-2 | Quick experiments, early iteration |
| "medium" | Balanced search | ~$5-10 | Recommended starting point for most tasks |
| "heavy" | More candidates, more trials | ~$15-30 | Production, maximum quality |

# Quick experiment
optimizer = dspy.MIPROv2(metric=metric, auto="light")

# Balanced (recommended starting point)
optimizer = dspy.MIPROv2(metric=metric, auto="medium")

# Maximum quality
optimizer = dspy.MIPROv2(metric=metric, auto="heavy")

Start with "medium". Only move to "heavy" if you have a large trainset (200+), a meaningful metric, and the budget for it. Use "light" for quick sanity checks during development.
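The guidance above can be written down as a small rule of thumb. This is a sketch only: `choose_auto_level` is a hypothetical helper, not a DSPy function, and its thresholds are copied from the text rather than derived from the library.

```python
def choose_auto_level(n_train, metric_validated, has_budget=True, quick_check=False):
    """Pick an `auto` preset following the rule of thumb in the text."""
    if quick_check:
        return "light"    # sanity checks during development
    if n_train >= 200 and metric_validated and has_budget:
        return "heavy"    # large trainset, meaningful metric, and budget
    return "medium"       # recommended starting point


print(choose_auto_level(60, metric_validated=True))
print(choose_auto_level(250, metric_validated=True))
print(choose_auto_level(250, metric_validated=True, quick_check=True))
```

The returned string can be passed directly as `auto=` when constructing the optimizer.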

What MIPROv2 tunes

MIPROv2 optimizes every dspy.Predict (or dspy.ChainOfThought, etc.) module in your program. For each module, it tunes:

Instructions

MIPROv2 generates candidate instructions by analyzing your training data and the task structure. It proposes multiple phrasings, then searches for the combination that maximizes your metric.

Few-shot demonstrations

MIPROv2 bootstraps demonstrations by running your program on training examples and keeping successful traces (where the metric passes). It then selects which demos to include in each module's prompt.
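The bootstrapping idea can be illustrated without an LM. The sketch below is a toy, not DSPy's implementation: `toy_program` is a hypothetical stand-in for an LM-backed module, examples are plain dictionaries, and only the input and output are kept rather than a full per-module trace.

```python
def bootstrap_demos(program, trainset, metric, max_demos=4):
    """Run `program` on training examples and keep the ones whose output passes `metric`."""
    demos = []
    for example in trainset:
        prediction = program(example["question"])
        if metric(example, prediction):
            demos.append({"question": example["question"], "answer": prediction})
        if len(demos) == max_demos:
            break  # stop once the demo budget is filled
    return demos


# Hypothetical stand-in for an LM-backed program: it only knows two answers
def toy_program(question):
    known = {"What is 2+2?": "4", "What is the capital of France?": "Paris"}
    return known.get(question, "unknown")


def exact_match(example, prediction):
    return prediction.strip().lower() == example["answer"].strip().lower()


trainset = [
    {"question": "What is 2+2?", "answer": "4"},
    {"question": "Who wrote Hamlet?", "answer": "Shakespeare"},
    {"question": "What is the capital of France?", "answer": "Paris"},
]
demos = bootstrap_demos(toy_program, trainset, exact_match)
print([d["question"] for d in demos])
```

The Hamlet example is dropped because the toy program answers it incorrectly, so only the two passing runs become candidate demonstrations.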

Joint optimization

The key advantage over simpler optimizers: MIPROv2 searches over combinations of instructions and demos together. Good instructions may need different demos than mediocre instructions, and MIPROv2 finds the best pairing.
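A toy illustration of why searching pairings together can matter. The scores are hypothetical, and the exhaustive grid search here is a simple stand-in for the Bayesian optimization that MIPROv2 is described as using; none of the names below come from DSPy.

```python
from itertools import product

# Hypothetical metric scores for each (instruction, demo set) pairing
scores = {
    ("terse", "demos_A"): 0.60,
    ("terse", "demos_B"): 0.70,
    ("detailed", "demos_A"): 0.85,
    ("detailed", "demos_B"): 0.55,
}
instructions = ["terse", "detailed"]
demo_sets = ["demos_A", "demos_B"]


def evaluate(instruction, demos):
    """Stand-in for running the program on data and averaging the metric."""
    return scores[(instruction, demos)]


# Joint search: score every pairing and keep the best one
best_pair = max(product(instructions, demo_sets), key=lambda pair: evaluate(*pair))

# Separate tuning: pick demos under the first instruction, then pick an instruction for those demos
fixed_demos = max(demo_sets, key=lambda d: evaluate(instructions[0], d))
separate_pair = (max(instructions, key=lambda i: evaluate(i, fixed_demos)), fixed_demos)

print(best_pair, evaluate(*best_pair))
print(separate_pair, evaluate(*separate_pair))
```

With these made-up numbers the joint search finds the 0.85 pairing, while tuning demos and instructions one after the other settles on the 0.70 pairing.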

Key parameters

optimizer = dspy.MIPROv2(
    metric=metric,          # Required: your metric function
    auto="medium",          # "light", "medium", "heavy" — controls search budget
)

optimized = optimizer.compile(
    my_program,             # Required: the program to optimize
    trainset=trainset,      # Required: list of dspy.Example with .with_inputs()
)

Manual configuration (advanced)

If auto does not give you enough control, you can set parameters directly:

optimizer = dspy.MIPROv2(
    metric=metric,
    auto=None,                      # Disable auto presets for manual control
    num_candidates=10,              # Number of instruction candidates per module
    max_bootstrapped_demos=4,       # Max bootstrapped demos per module
    max_labeled_demos=4,            # Max labeled demos per module
)

optimized = optimizer.compile(
    my_program,
    trainset=trainset,
    num_trials=30,                  # Bayesian optimization trials (passed to compile, not constructor)
)

Most users should stick with auto. Manual configuration is useful when you want to fine-tune the search budget or when you have domain-specific constraints (e.g., limiting demo count to keep prompts short).

Computational cost

MIPROv2 makes many LM calls during optimization. The cost depends on:

auto level — "heavy" makes roughly 5-10x more calls than "light"
Number of modules — programs with multiple Predict/ChainOfThought modules cost more
Trainset size — more examples means more bootstrapping runs
Model cost — using GPT-4o costs more per call than GPT-4o-mini
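The factors above multiply together, which a back-of-the-envelope estimate makes concrete. This is a rough sketch: `estimate_cost_usd` is a hypothetical helper, and every input value below is an illustrative placeholder rather than a measurement of MIPROv2 or of any model's pricing.

```python
def estimate_cost_usd(num_calls, tokens_per_call, usd_per_million_tokens):
    """Rough cost: total tokens times the per-token price."""
    return num_calls * tokens_per_call * usd_per_million_tokens / 1_000_000


# All inputs below are hypothetical placeholders, not measured values
light = estimate_cost_usd(num_calls=1_000, tokens_per_call=1_500, usd_per_million_tokens=0.50)
heavy = estimate_cost_usd(num_calls=8_000, tokens_per_call=1_500, usd_per_million_tokens=0.50)
print(f"light: ${light:.2f}, heavy: ${heavy:.2f}")
```

Substituting your own call counts and your provider's current prices gives a sanity check before launching a long run.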

Cost management tips

Develop with "light", ship with "medium" or "heavy" — iterate cheaply, then invest in the final optimization
Use a cheaper model for optimization, then evaluate on the target model — if your production model is expensive, optimize with a cheaper one first to validate the approach
Start with fewer training examples — 50-100 examples is enough for "light" and "medium"; scale up for "heavy"
Set num_threads in your evaluator to parallelize evaluation calls

Typical wall-clock time

| auto level | 50 examples | 200 examples |
|-----------|------------|-------------|
| "light" | 2-5 min | 5-15 min |
| "medium" | 10-20 min | 20-40 min |
| "heavy" | 30-60 min | 1-3 hours |

Times vary significantly based on model latency, number of modules, and thread count.

Comparison with other optimizers

| | MIPROv2 | BootstrapFewShot | SIMBA | BetterTogether | GEPA |
|---|---------|-----------------|-------|----------------|------|
| Tunes instructions | Yes | No | Yes | Yes | Yes |
| Tunes demos | Yes | Yes | Yes | Yes | No |
| Joint optimization | Yes | No | Yes | Yes (alternating) | No |
| Min examples | ~50 | ~10 | ~50 | ~50 | ~20 |
| Typical improvement | 15-35% | 5-20% | 15-35% | 15-35% | 10-25% |
| Cost | Medium-High | Low | Medium-High | High | Low |
| Best for | Production prompts | Quick start | Iterative refinement | Multi-strategy | Few examples, instruction-only, feedback-driven |

When to use what

BootstrapFewShot — first optimization pass, quick iteration, small datasets
MIPROv2 — best prompt optimization, production use, 50+ examples
SIMBA — iterative refinement with support for minibatching; good alternative to MIPROv2
BetterTogether — alternates between prompt optimization and fine-tuning for maximum quality
GEPA — instruction-only tuning with textual feedback, 20-100 examples
BootstrapFinetune — fine-tuning model weights (different category entirely)

Stacking optimizers

A common pattern is to run BootstrapFewShot first, then MIPROv2 on the result. Bootstrap finds good demonstrations quickly, then MIPRO refines the instructions around them:

# Step 1: Quick bootstrap
bootstrap = dspy.BootstrapFewShot(metric=metric, max_bootstrapped_demos=4)
bootstrapped = bootstrap.compile(my_program, trainset=trainset)

# Step 2: Refine with MIPROv2
mipro = dspy.MIPROv2(metric=metric, auto="medium")
final = mipro.compile(bootstrapped, trainset=trainset)

This often beats running either optimizer alone.

Save and load

# Save the optimized program
optimized.save("optimized_program.json")

# Load later
from my_module import MyProgram  # your program class
loaded = MyProgram()
loaded.load("optimized_program.json")

# Use it
result = loaded(question="What is DSPy?")

Optimized prompts are model-specific. If you switch LM providers or models, re-run the optimizer. See /ai-switching-models.

Common patterns

Evaluate before and after

Always measure the baseline before optimizing so you know the improvement:

from dspy.evaluate import Evaluate

evaluator = Evaluate(devset=devset, metric=metric, num_threads=4, display_table=5)

def as_score(result):
    # Assumption: older DSPy releases return a float, newer ones a result object with .score
    return getattr(result, "score", result)

# Baseline
baseline_score = as_score(evaluator(my_program))

# Optimize
optimizer = dspy.MIPROv2(metric=metric, auto="medium")
optimized = optimizer.compile(my_program, trainset=trainset)

# Compare
optimized_score = as_score(evaluator(optimized))
print(f"Baseline:  {baseline_score:.1f}%")
print(f"Optimized: {optimized_score:.1f}%")
print(f"Delta:     {optimized_score - baseline_score:+.1f}%")

Trace-aware metric for better demos

Use the trace parameter to require stricter quality during optimization. This makes MIPROv2 select higher-quality demonstrations:

def metric(example, prediction, trace=None):
    correct = prediction.answer.strip().lower() == example.answer.strip().lower()
    if trace is not None:
        # During optimization: require reasoning too
        has_reasoning = len(getattr(prediction, "reasoning", "")) > 50
        return correct and has_reasoning
    return correct
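To see how the two branches behave, the metric can be exercised with stand-in objects. In this sketch, `SimpleNamespace` instances substitute for `dspy.Example` and `dspy.Prediction`, and an empty list substitutes for the trace an optimizer would pass; the example values are made up.

```python
from types import SimpleNamespace


def metric(example, prediction, trace=None):
    correct = prediction.answer.strip().lower() == example.answer.strip().lower()
    if trace is not None:
        # During optimization: require reasoning too
        has_reasoning = len(getattr(prediction, "reasoning", "")) > 50
        return correct and has_reasoning
    return correct


# SimpleNamespace objects stand in for dspy.Example and dspy.Prediction here
example = SimpleNamespace(question="What is 2+2?", answer="4")
short = SimpleNamespace(answer=" 4 ", reasoning="2+2=4")
long = SimpleNamespace(
    answer="4",
    reasoning="Adding two and two: start at 2, count up two more, and land on 4.",
)

print(metric(example, short))            # evaluation mode: correctness only
print(metric(example, short, trace=[]))  # optimization mode: reasoning too short
print(metric(example, long, trace=[]))   # optimization mode: correct with enough reasoning
```

The same correct answer passes during evaluation but is rejected as a demonstration when its reasoning is too short.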

Multi-module programs

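MIPROv2 optimizes every predictor in your program. In a multi-step pipeline, each LM-backed module gets its own optimized instructions and demos, while non-LM steps such as retrieval run unchanged:

class RAGPipeline(dspy.Module):
    def __init__(self):
        super().__init__()
        # Assumes a retrieval model was configured beforehand, e.g. dspy.configure(rm=...)
        self.retrieve = dspy.Retrieve(k=3)
        # The only LM-backed predictor here, so the only part MIPROv2 tunes
        self.generate = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        context = self.retrieve(question).passages
        return self.generate(context=context, question=question)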

rag = RAGPipeline()
optimizer = dspy.MIPROv2(metric=metric, auto="medium")
optimized_rag = optimizer.compile(rag, trainset=trainset)

Gotchas

Claude sets auto="heavy" by default for production. The auto parameter defaults to "light", and "medium" is the recommended starting point. Heavy is 5-10x more expensive and only justified with 200+ examples and a well-validated metric. Start with "medium" and upgrade only if the score plateaus.
Claude passes trainset as a positional argument to compile(). The trainset parameter is keyword-only in MIPROv2: optimizer.compile(program, trainset=trainset), not optimizer.compile(program, trainset). Passing it positionally raises a TypeError.
Claude forgets .with_inputs() on training examples. Every dspy.Example in the trainset must call .with_inputs("field1", "field2") to mark which fields are inputs vs labels. Without this, MIPROv2 cannot distinguish inputs from expected outputs and optimization silently underperforms.
Claude sets num_candidates without also setting num_trials. When using manual configuration (no auto), both num_candidates and num_trials must be set. Setting only one produces suboptimal search — more candidates without enough trials to evaluate them is wasted compute.
Claude uses the deprecated requires_permission_to_run parameter. This parameter has been removed from MIPROv2. Passing True raises a ValueError. Remove it entirely from compile() calls.
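The keyword-only gotcha above is plain Python behavior and can be reproduced without DSPy. In this sketch, `compile_program` is a hypothetical stand-in that only mirrors the keyword-only shape described in the text; it is not the real `compile()` method.

```python
def compile_program(student, *, trainset, num_trials=None):
    """Hypothetical stand-in: everything after `*` must be passed by keyword."""
    return f"compiled {student} on {len(trainset)} examples"


trainset = ["ex1", "ex2"]
print(compile_program("qa", trainset=trainset))  # keyword form works

try:
    compile_program("qa", trainset)              # positional form is rejected
except TypeError as err:
    print("TypeError:", err)
```

The bare `*` in the signature is what makes Python raise `TypeError` for the positional call.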

Additional resources

dspy.MIPROv2 API docs
For API details, see reference.md
For worked examples, see examples.md

Cross-references

Install any skill: npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill <name>

Watch MIPROv2 optimization progress -- see /ai-watching-optimization
Need to prepare training data? Use /dspy-data
Want to write and run metrics? Use /dspy-evaluate
Starting with a simpler optimizer first? Use /dspy-bootstrap-few-shot
Want random search over few-shot demos? Use /dspy-bootstrap-rs
For the full measure-improve-verify loop, see /ai-improving-accuracy
Install /ai-do if you do not have it — it routes any AI problem to the right skill and is the fastest way to work: npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill ai-do

Use when you want the highest-quality prompt optimization DSPy offers — jointly optimizes instructions and few-shot demos, with auto=light/medium/heavy presets. Common scenarios - you want the best possible accuracy from prompt optimization, jointly tuning instructions and few-shot demonstrations, using auto presets for different compute budgets, or when COPRO or BootstrapFewShot alone are not reaching your accuracy target. Related - ai-improving-accuracy, dspy-copro, dspy-bootstrap-few-shot. Also used for dspy.MIPROv2, best DSPy optimizer, highest quality optimization, auto=light medium heavy, joint instruction and demo optimization, most powerful prompt optimizer, MIPROv2 vs COPRO vs BootstrapFewShot, which optimizer should I use, state of the art prompt optimization, when to use MIPROv2, optimize both instructions and examples, heavy optimization for production, best optimizer for accuracy.

6 stars

testing

Updated May 31, 2026

$ install --global

skillsauth

npx skillsauth add lebsral/dspy-programming-not-prompting-lms-skills dspy-miprov2

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Purpose | Status | Percentage |
|---|---|---|---|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: May 31, 2026, 6:40 AM (36.4s, 4 files scanned)

Related Skills

lebsral/ai-watching-optimization

tools

Verified · Trusted · Community

See what is happening during optimizer.compile() instead of waiting blind. Use when you want to watch optimization progress, see scores as they come in, know if your optimizer is working, check if optimization is stuck, understand why optimization is taking too long, get live progress during compile, monitor convergence, detect overfitting during optimization, interpret optimization results, or pick the right tool for watching optimization. Also used for optimizer progress bar, is my optimizer doing anything, optimization seems stuck, how long will optimization take, watch GEPA run, watch MIPROv2 run, live optimization dashboard, optimizer not improving, scores not going up, optimization taking forever, see what optimizer is doing, debug slow optimization, optimization visibility, optimizer metrics, track compile progress, optimization observability.

6 · SKILL.md · Updated May 31, 2026

lebsral/dspy-langwatch

testing

Verified · Trusted · Community

Use LangWatch for DSPy auto-tracing and real-time optimizer progress. Use when you want to set up LangWatch, langwatch.dspy.init, auto-tracing DSPy, real-time optimization dashboard, optimizer progress tracking, app.langwatch.ai, or DSPy optimizer dashboard. Also used for langwatch setup, pip install langwatch, langwatch trace, optimizer progress, real-time optimization, watch optimizer run, LangWatch self-hosted, langwatch docker, langwatch vs langtrace, langwatch autotrack_dspy.

6 · SKILL.md · Updated Apr 27, 2026

lebsral/dspy-gepa

data-ai

Verified · Trusted · Community

Use when you want to optimize instructions without few-shot examples — a lightweight alternative to COPRO when you do not have or do not want to use demonstrations. Common scenarios - optimizing instructions when you do not have or do not want to use few-shot demonstrations, lightweight instruction search as a first step, tasks where examples in the prompt confuse the model, or when you want fast instruction optimization without the cost of COPRO. Related - ai-improving-accuracy, dspy-copro, dspy-miprov2. Also used for dspy.GEPA, instruction optimization without demos, lightweight prompt optimization, optimize instructions only, no few-shot examples needed, GEPA vs COPRO, quick instruction search, when demonstrations hurt performance, zero-shot optimization, instruction-only optimizer, simplest instruction tuner, fast prompt optimization, skip few-shot and just tune instructions, optimize Pydantic field descriptions, GEPA structured output, GEPA does not optimize field desc.

6 · SKILL.md · Updated Apr 27, 2026

lebsral/ai-improving-accuracy

development

Verified · Trusted · Community

Measure and improve how well your AI works. Use when AI gives wrong answers, accuracy is bad, responses are unreliable, you need to test AI quality, evaluate your AI, write metrics, benchmark performance, optimize prompts, improve results, or systematically make your AI better. Also used for spent hours tweaking prompts, trial and error prompt engineering is not working, quality plateaued early, stale prompts everywhere in your codebase, my AI is only 60% accurate, how to measure AI quality, AI evaluation framework, benchmark my LLM, prompt optimization not working, systematic way to improve AI, AI accuracy plateaued, DSPy optimizer tutorial, MIPROv2 optimization, how to go from 70% to 90% accuracy.

6 · SKILL.md · Updated Apr 27, 2026

Install manually

# Clone the repo
git clone https://github.com/lebsral/dspy-programming-not-prompting-lms-skills.git

# Copy into Claude Code skills folder (global)
cp -r dspy-programming-not-prompting-lms-skills/skills/dspy-miprov2 ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

lebsral/dspy-programming-not-prompting-lms-skills

6 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT