skills/dspy-labeled-few-shot/SKILL.md
Use when you have hand-picked high-quality examples and want to use them directly as few-shot demonstrations — no bootstrapping, just your curated demos. Common scenarios - you have expert-curated examples that you trust more than bootstrapped ones, hand-picked demonstrations for high-stakes tasks, using existing labeled data directly without bootstrapping, or when you want full control over which examples appear in the prompt. Related - dspy-bootstrap-few-shot, dspy-knn-few-shot, ai-generating-data. Also used for dspy.LabeledFewShot, hand-picked examples in prompt, curated demonstrations, use my own examples directly, manual few-shot setup, expert-labeled demonstrations, no bootstrapping just my examples, static few-shot with labeled data, gold standard examples, when you trust your examples more than auto-generated ones, controlled few-shot demos, fixed example set in prompt.
npx skillsauth add lebsral/dspy-programming-not-prompting-lms-skills dspy-labeled-few-shot
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Guide the user through using dspy.LabeledFewShot -- the simplest DSPy optimizer. It takes labeled examples you provide and attaches them as few-shot demonstrations to your program's predictors. No bootstrapping, no metric, no LM calls during optimization.
dspy.LabeledFewShot is an optimizer that takes a set of labeled training examples and injects them directly as few-shot demonstrations into every predictor in your DSPy program.
Under the hood, compile() creates a copy of your program, iterates over each predictor, and assigns up to k examples from your training set as that predictor's demos.
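The steps described above can be modelled in a few lines of plain Python. This is a simplified sketch, not DSPy's source: `ToyPredictor`, `ToyProgram`, and `labeled_few_shot_compile` are hypothetical stand-ins that need no LM or DSPy install, and the fixed seed of 0 is taken from this guide's notes on sampling.

```python
import copy
import random


class ToyPredictor:
    """Hypothetical stand-in for a DSPy predictor: it only holds demos."""

    def __init__(self, name):
        self.name = name
        self.demos = []


class ToyProgram:
    """Hypothetical stand-in for a DSPy program with several predictors."""

    def __init__(self, predictors):
        self._predictors = predictors

    def predictors(self):
        return self._predictors


def labeled_few_shot_compile(student, trainset, k=16, sample=True):
    """Model of the compile step: copy, then attach up to k demos per predictor."""
    compiled = copy.deepcopy(student)  # the original program is left untouched
    if not trainset:
        return compiled  # nothing to attach
    rng = random.Random(0)  # fixed seed, assumed from this guide's sampling notes
    n = min(k, len(trainset))
    for predictor in compiled.predictors():
        predictor.demos = rng.sample(trainset, n) if sample else trainset[:n]
    return compiled


program = ToyProgram([ToyPredictor("classify"), ToyPredictor("respond")])
examples = [{"message": f"msg {i}", "intent": "question"} for i in range(6)]
compiled = labeled_few_shot_compile(program, examples, k=4, sample=False)

print([len(p.demos) for p in compiled.predictors()])  # [4, 4]
print([len(p.demos) for p in program.predictors()])   # [0, 0]
```

Nothing in this model calls an LM or scores an example, which is why the optimizer needs no metric and finishes almost instantly.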
| Use LabeledFewShot when... | Use something else when... |
|---|---|
| You have hand-curated, high-quality examples | You want the optimizer to discover good examples (BootstrapFewShot) |
| You want a quick baseline before trying fancier optimizers | You need instruction tuning too (MIPROv2) |
| You need full control over which demonstrations the LM sees | You have enough data to let DSPy search (BootstrapFewShotWithRandomSearch) |
| Your task is simple enough that a few good examples suffice | Quality requires filtering examples by a metric |
| You want deterministic, reproducible behavior | You want the optimizer to explore different combinations |
Rule of thumb: Use LabeledFewShot as your first optimization step. If accuracy is not high enough, upgrade to BootstrapFewShot, which evaluates examples against a metric and keeps only the ones that work.
dspy.LabeledFewShot(k=16)
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| k | int | 16 | Maximum number of demonstration examples to include per predictor |
optimizer.compile(student, *, trainset, sample=True)
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| student | dspy.Module | required | The DSPy program to optimize |
| trainset | list[dspy.Example] | required | Labeled examples to use as demonstrations |
| sample | bool | True | True = randomly sample k examples; False = take the first k sequentially |
Returns: A copy of student with demonstrations attached to each predictor.
If trainset is empty, the student is returned unmodified.
```python
import dspy
from typing import Literal

# Configure any LM provider
lm = dspy.LM("openai/gpt-4o-mini")  # or "anthropic/claude-sonnet-4-5-20250929", etc.
dspy.configure(lm=lm)

# 1. Define your signature
class ClassifyIntent(dspy.Signature):
    """Classify the user message into an intent category."""

    message: str = dspy.InputField(desc="User message")
    intent: Literal["question", "complaint", "praise", "request"] = dspy.OutputField()

# 2. Build your program
classify = dspy.Predict(ClassifyIntent)

# 3. Create hand-picked training examples
trainset = [
    dspy.Example(message="How do I reset my password?", intent="question").with_inputs("message"),
    dspy.Example(message="This is broken and I want a refund", intent="complaint").with_inputs("message"),
    dspy.Example(message="Your team was incredibly helpful!", intent="praise").with_inputs("message"),
    dspy.Example(message="Please update my billing address", intent="request").with_inputs("message"),
    dspy.Example(message="What formats do you export to?", intent="question").with_inputs("message"),
    dspy.Example(message="The app crashes every time I open it", intent="complaint").with_inputs("message"),
]

# 4. Compile with LabeledFewShot
optimizer = dspy.LabeledFewShot(k=4)
optimized = optimizer.compile(classify, trainset=trainset)

# 5. Use the optimized program -- it now includes few-shot demos in every call
result = optimized(message="Can you send me last month's invoice?")
print(result.intent)  # request
```
When sample=True (the default), compile() randomly samples k examples from trainset using a fixed seed (0).
When sample=False, compile() takes the first k examples from trainset in order.
If your trainset has fewer than k examples, all examples are used.
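A small stdlib-only sketch of the two selection modes. It assumes that selection amounts to a seeded `random.Random(0).sample(...)` when `sample=True` and to list slicing when `sample=False`. `pick_demos` is a hypothetical helper for illustration, not a DSPy function.

```python
import random


def pick_demos(trainset, k, sample=True):
    """Hypothetical helper mirroring the two selection modes described above."""
    n = min(k, len(trainset))  # never ask for more examples than exist
    if sample:
        return random.Random(0).sample(trainset, n)  # fresh RNG, fixed seed
    return trainset[:n]  # first n examples, original order


trainset = ["ex0", "ex1", "ex2", "ex3", "ex4", "ex5"]

first_run = pick_demos(trainset, k=4)
second_run = pick_demos(trainset, k=4)
print(first_run == second_run)  # True: same seed, same selection

print(pick_demos(trainset, k=4, sample=False))  # ['ex0', 'ex1', 'ex2', 'ex3']
print(len(pick_demos(trainset, k=16)))          # 6: k exceeds the trainset size
```

Because the seed is fixed, changing which demos get sampled requires changing the trainset itself (its contents or order) or switching to `sample=False`.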
The k parameter controls how many demonstrations appear in the prompt.
Keep in mind that each demonstration adds tokens to every LM call. For long input/output fields, use a smaller k to stay within context limits.
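To get a feel for how k drives prompt size, here is a back-of-the-envelope estimate. Both `estimate_demo_tokens` and the four-characters-per-token ratio are assumptions made for illustration; a real budget should come from your provider's tokenizer.

```python
def estimate_demo_tokens(demos, chars_per_token=4):
    """Rough token estimate for a list of demo dicts (field name -> text).

    chars_per_token=4 is a crude rule of thumb for English text, not an
    exact tokenizer.
    """
    total_chars = sum(
        len(name) + len(str(value))
        for demo in demos
        for name, value in demo.items()
    )
    return total_chars // chars_per_token


demo = {"message": "How do I reset my password?", "intent": "question"}

# The overhead grows linearly with k and is paid on every LM call.
for k in (2, 4, 16):
    print(k, estimate_demo_tokens([demo] * k))
```

The estimate ignores the formatting DSPy adds around each demo, so treat it as a lower bound.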
When you want precise control over which examples appear, disable sampling:
```python
# Place your best, most representative examples first
trainset = [
    dspy.Example(message="What's your return policy?", intent="question").with_inputs("message"),
    dspy.Example(message="This product is defective", intent="complaint").with_inputs("message"),
    dspy.Example(message="Love the new feature!", intent="praise").with_inputs("message"),
    dspy.Example(message="Please cancel my subscription", intent="request").with_inputs("message"),
    # ... more examples, ordered by importance
]

optimizer = dspy.LabeledFewShot(k=4)
optimized = optimizer.compile(classify, trainset=trainset, sample=False)
# The first 4 examples are used as demos, in order
```
After compilation, save the optimized program so you can reuse it without recompiling:
```python
# Save
optimized.save("intent_classifier.json")

# Load later
loaded = dspy.Predict(ClassifyIntent)
loaded.load("intent_classifier.json")
result = loaded(message="How do I upgrade my plan?")
```
LabeledFewShot attaches demos to every predictor in your program. This works with multi-step pipelines too:
```python
class SupportRouter(dspy.Module):
    def __init__(self):
        super().__init__()  # initialize dspy.Module before registering predictors
        self.classify = dspy.Predict(ClassifyIntent)
        self.respond = dspy.ChainOfThought("message, intent -> response")

    def forward(self, message):
        intent = self.classify(message=message).intent
        return self.respond(message=message, intent=intent)

router = SupportRouter()

# Both self.classify and self.respond get demos from the same trainset
optimizer = dspy.LabeledFewShot(k=3)
optimized_router = optimizer.compile(router, trainset=trainset)
```
Note: every predictor receives demos from the same trainset. If your predictors have different signatures, make sure your training examples include all fields needed across all predictors, or consider compiling predictors separately.
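Before compiling a multi-predictor program, it can help to check that every example carries the fields each predictor needs. The sketch below uses plain dicts as stand-ins for `dspy.Example` objects. `missing_fields`, the `predictor_fields` mapping, and the sample response text are all hypothetical, written only for this illustration.

```python
def missing_fields(trainset, predictor_fields):
    """Report, per predictor, which required fields each example lacks.

    trainset: list of dicts (field name -> value), standing in for dspy.Example.
    predictor_fields: dict mapping a predictor name to the set of field names
    its signature uses.
    """
    problems = {}
    for name, fields in predictor_fields.items():
        for index, example in enumerate(trainset):
            missing = sorted(fields - set(example))
            if missing:
                problems.setdefault(name, []).append((index, missing))
    return problems


trainset = [
    {"message": "How do I reset my password?", "intent": "question",
     "response": "Use the 'Forgot password' link on the sign-in page."},
    {"message": "This is broken and I want a refund", "intent": "complaint"},
]

predictor_fields = {
    "classify": {"message", "intent"},
    "respond": {"message", "intent", "response"},
}

print(missing_fields(trainset, predictor_fields))
# {'respond': [(1, ['response'])]}
```

An empty result means the shared trainset covers every predictor. Otherwise, fill in the missing fields or compile the affected predictors separately.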
LabeledFewShot is a great starting point, but it has limitations:
dspy.BootstrapFewShot addresses all three. It runs your program on each training example, evaluates with a metric, and keeps only the demonstrations that led to correct outputs. The upgrade is straightforward:
```python
# Before: LabeledFewShot (no metric needed)
optimizer = dspy.LabeledFewShot(k=4)
optimized = optimizer.compile(program, trainset=trainset)

# After: BootstrapFewShot (needs a metric)
def metric(example, prediction, trace=None):
    return prediction.intent == example.intent

optimizer = dspy.BootstrapFewShot(metric=metric, max_bootstrapped_demos=4)
optimized = optimizer.compile(program, trainset=trainset)
```
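The difference between the two optimizers comes down to filtering by a metric. The sketch below shows only that idea in plain Python and leaves out everything else the real optimizer does. `keyword_program` is a hypothetical stand-in for an LM-backed program, and `keep_passing_demos` is not a DSPy function.

```python
def keyword_program(message):
    """Hypothetical stand-in for an LM-backed program: a naive keyword rule."""
    return "question" if message.rstrip().endswith("?") else "request"


def metric(example, prediction):
    """Exact-match metric, analogous to the one in the snippet above."""
    return prediction == example["intent"]


def keep_passing_demos(program, trainset, metric, max_demos=4):
    """Run the program on each example and keep those whose output passes the metric."""
    kept = []
    for example in trainset:
        prediction = program(example["message"])
        if metric(example, prediction):
            kept.append(example)
        if len(kept) == max_demos:
            break
    return kept


trainset = [
    {"message": "How do I reset my password?", "intent": "question"},
    {"message": "This is broken and I want a refund", "intent": "complaint"},
    {"message": "Please update my billing address", "intent": "request"},
]

for demo in keep_passing_demos(keyword_program, trainset, metric):
    print(demo["intent"], "|", demo["message"])
```

Here the complaint example is dropped because the stand-in program gets it wrong. LabeledFewShot would have used it as a demo regardless.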
Forgetting .with_inputs() on training examples. Without .with_inputs("field_name"), DSPy does not know which fields are inputs vs labels. The demonstrations appear malformed in the prompt: the LM sees all fields as input, which confuses it. Always call .with_inputs() on every dspy.Example in your trainset.
When demo quality depends on filtering examples by a metric, use BootstrapFewShot instead: it evaluates examples against a metric and keeps only the ones that help.
Setting k larger than the trainset without explaining the behavior. When k exceeds len(trainset), DSPy silently uses all available examples. This is fine, but Claude should tell the user: "You have 5 examples and k=16, so all 5 will be used as demos."
LabeledFewShot assigns the same demos to every predictor. If predictors have different signatures, the examples need all fields across all signatures, or the user should compile predictors separately. Claude sometimes splits the trainset incorrectly, so explain the shared-demo behavior.
Install any skill:
npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill <name>
Related skills: /dspy-data, /dspy-signatures, /ai-improving-accuracy, /dspy-evaluate, /dspy-modules
Install /ai-do if you do not have it; it routes any AI problem to the right skill and is the fastest way to work:
npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill ai-do