lebsral/dspy-labeled-few-shot

Name: dspy-labeled-few-shot
Author: lebsral

skills/dspy-labeled-few-shot/SKILL.md

npx skillsauth add lebsral/dspy-programming-not-prompting-lms-skills dspy-labeled-few-shot

Hand-Picked Demonstrations with dspy.LabeledFewShot

Guide the user through using dspy.LabeledFewShot -- the simplest DSPy optimizer. It takes labeled examples you provide and attaches them as few-shot demonstrations to your program's predictors. No bootstrapping, no metric, no LM calls during optimization.

What is LabeledFewShot

dspy.LabeledFewShot is an optimizer that takes a set of labeled training examples and injects them directly as few-shot demonstrations into every predictor in your DSPy program.

No metric required -- unlike other optimizers, it does not evaluate or filter examples
No LM calls during compilation -- it just copies your examples into the prompt
Deterministic -- uses a fixed random seed (0) for reproducible example selection
Fast -- compilation is instant because there is no search or bootstrapping step

Under the hood, compile() creates a copy of your program, iterates over each predictor, and assigns up to k examples from your training set as that predictor's demos.
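The steps just described can be sketched in plain Python. This is only an illustration, not DSPy's source: `ToyPredictor`, `ToyProgram`, and `labeled_few_shot_compile` are hypothetical stand-ins, and a deep copy is assumed for the "copy of your program" step.

```python
import copy
import random


class ToyPredictor:
    """Hypothetical stand-in for a DSPy predictor: it only holds demos."""

    def __init__(self):
        self.demos = []


class ToyProgram:
    """Hypothetical stand-in for a DSPy program with two predictors."""

    def __init__(self):
        self.classify = ToyPredictor()
        self.respond = ToyPredictor()

    def predictors(self):
        return [self.classify, self.respond]


def labeled_few_shot_compile(student, trainset, k=16):
    """Copy the program and attach up to k labeled examples to each predictor."""
    compiled = copy.deepcopy(student)  # the original program is left untouched
    if not trainset:
        return compiled
    rng = random.Random(0)  # fixed seed, so the selection is reproducible
    for predictor in compiled.predictors():
        # Each predictor draws in turn from the same seeded generator.
        predictor.demos = rng.sample(trainset, min(k, len(trainset)))
    return compiled


trainset = [{"message": f"example {i}", "intent": "question"} for i in range(6)]
program = ToyProgram()
compiled = labeled_few_shot_compile(program, trainset, k=4)

print(len(compiled.classify.demos))  # 4
print(len(program.classify.demos))   # 0
```

The point of the sketch is that nothing here calls an LM or scores anything: compilation is a copy followed by a list assignment per predictor.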

When to use LabeledFewShot

| Use LabeledFewShot when... | Use something else when... |
|---|---|
| You have hand-curated, high-quality examples | You want the optimizer to discover good examples (BootstrapFewShot) |
| You want a quick baseline before trying fancier optimizers | You need instruction tuning too (MIPROv2) |
| You need full control over which demonstrations the LM sees | You have enough data to let DSPy search (BootstrapFewShotWithRandomSearch) |
| Your task is simple enough that a few good examples suffice | Quality requires filtering examples by a metric |
| You want deterministic, reproducible behavior | You want the optimizer to explore different combinations |

Rule of thumb: Use LabeledFewShot as your first optimization step. If accuracy is not high enough, upgrade to BootstrapFewShot which evaluates examples against a metric and keeps only the ones that work.

API reference

Constructor

dspy.LabeledFewShot(k=16)

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| k | int | 16 | Maximum number of demonstration examples to include per predictor |

compile()

optimizer.compile(student, *, trainset, sample=True)

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| student | dspy.Module | required | The DSPy program to optimize |
| trainset | list[dspy.Example] | required | Labeled examples to use as demonstrations |
| sample | bool | True | True = randomly sample k examples; False = take the first k sequentially |

Returns: A copy of student with demonstrations attached to each predictor.

If trainset is empty, no demos are attached and the copy of student is returned as-is.

Basic usage

import dspy
from typing import Literal

# Configure any LM provider
lm = dspy.LM("openai/gpt-4o-mini")  # or "anthropic/claude-sonnet-4-5-20250929", etc.
dspy.configure(lm=lm)

# 1. Define your signature
class ClassifyIntent(dspy.Signature):
    """Classify the user message into an intent category."""
    message: str = dspy.InputField(desc="User message")
    intent: Literal["question", "complaint", "praise", "request"] = dspy.OutputField()

# 2. Build your program
classify = dspy.Predict(ClassifyIntent)

# 3. Create hand-picked training examples
trainset = [
    dspy.Example(message="How do I reset my password?", intent="question").with_inputs("message"),
    dspy.Example(message="This is broken and I want a refund", intent="complaint").with_inputs("message"),
    dspy.Example(message="Your team was incredibly helpful!", intent="praise").with_inputs("message"),
    dspy.Example(message="Please update my billing address", intent="request").with_inputs("message"),
    dspy.Example(message="What formats do you export to?", intent="question").with_inputs("message"),
    dspy.Example(message="The app crashes every time I open it", intent="complaint").with_inputs("message"),
]

# 4. Compile with LabeledFewShot
optimizer = dspy.LabeledFewShot(k=4)
optimized = optimizer.compile(classify, trainset=trainset)

# 5. Use the optimized program -- it now includes few-shot demos in every call
result = optimized(message="Can you send me last month's invoice?")
print(result.intent)  # e.g. "request" (LM output, so not guaranteed)

How example selection works

When sample=True (the default):

DSPy randomly selects k examples from trainset using a fixed seed (0)
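Each predictor draws its own k examples from the same trainset, one predictor after another from a single seeded generator, so two predictors can receive different subsets when trainset has more than k examples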
The selection is reproducible across runs because of the fixed seed

When sample=False:

DSPy takes the first k examples from trainset in order
Use this when the order of your examples matters or you want exact control

If your trainset has fewer than k examples, all examples are used.
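The two selection modes can be illustrated without DSPy at all. The sketch below is a simplified model of the behavior described above; `pick_demos` is a hypothetical helper, and the strings stand in for `dspy.Example` objects.

```python
import random


def pick_demos(trainset, k, sample=True, seed=0):
    """Return at most k examples, sampled with a fixed seed or taken in order."""
    n = min(k, len(trainset))
    if sample:
        return random.Random(seed).sample(trainset, n)
    return trainset[:n]


trainset = ["ex0", "ex1", "ex2", "ex3", "ex4", "ex5"]

first_run = pick_demos(trainset, k=4)
second_run = pick_demos(trainset, k=4)
print(first_run == second_run)  # True: same seed, same selection

print(pick_demos(trainset, k=4, sample=False))  # ['ex0', 'ex1', 'ex2', 'ex3']
print(len(pick_demos(trainset, k=16)))  # 6: fewer than k examples, so all are used
```

Note that with sampling enabled the result comes back in sampled order rather than trainset order, which is one more reason to prefer sample=False when ordering matters.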

Choosing k

The k parameter controls how many demonstrations appear in the prompt.

Smaller k (2-4): Lower token cost, faster inference. Good when your examples are diverse and high-quality.
Larger k (8-16): More context for the LM. Good when the task has many edge cases or subtle distinctions.
Default (16): A reasonable starting point. Reduce if you hit token limits or want faster responses.

Keep in mind that each demonstration adds tokens to every LM call. For long input/output fields, use a smaller k to stay within context limits.
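To get a feel for the token cost of a given k, a rough estimate is often enough. The sketch below assumes about 4 characters per token, which is a rule of thumb rather than a real tokenizer, and it ignores the field labels and formatting that the prompt adds around each demo. `estimate_demo_tokens` is a hypothetical helper, not part of DSPy.

```python
def estimate_demo_tokens(examples, k, chars_per_token=4):
    """Roughly estimate prompt tokens added by up to k demonstrations.

    chars_per_token=4 is an assumed rule of thumb, not a real tokenizer.
    """
    used = examples[:k]
    total_chars = sum(len(str(value)) for ex in used for value in ex.values())
    return total_chars // chars_per_token


examples = [
    {"message": "How do I reset my password?", "intent": "question"},
    {"message": "This is broken and I want a refund", "intent": "complaint"},
    {"message": "Your team was incredibly helpful!", "intent": "praise"},
    {"message": "Please update my billing address", "intent": "request"},
]

# Compare the estimated overhead for two candidate values of k.
for k in (2, 4):
    print(k, estimate_demo_tokens(examples, k))
```

Because the demos are sent on every call, multiply the estimate by your expected call volume when comparing values of k.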

Using sample=False for ordered examples

When you want precise control over which examples appear, disable sampling:

# Place your best, most representative examples first
trainset = [
    dspy.Example(message="What's your return policy?", intent="question").with_inputs("message"),
    dspy.Example(message="This product is defective", intent="complaint").with_inputs("message"),
    dspy.Example(message="Love the new feature!", intent="praise").with_inputs("message"),
    dspy.Example(message="Please cancel my subscription", intent="request").with_inputs("message"),
    # ... more examples, ordered by importance
]

optimizer = dspy.LabeledFewShot(k=4)
optimized = optimizer.compile(classify, trainset=trainset, sample=False)
# The first 4 examples are used as demos, in order

Saving and loading an optimized program

After compilation, save the optimized program so you can reuse it without recompiling:

# Save
optimized.save("intent_classifier.json")

# Load later
loaded = dspy.Predict(ClassifyIntent)
loaded.load("intent_classifier.json")
result = loaded(message="How do I upgrade my plan?")

Multi-predictor programs

LabeledFewShot attaches demos to every predictor in your program. This works with multi-step pipelines too:

class SupportRouter(dspy.Module):
    def __init__(self):
        super().__init__()  # initialize dspy.Module before registering predictors
        self.classify = dspy.Predict(ClassifyIntent)
        self.respond = dspy.ChainOfThought("message, intent -> response")

    def forward(self, message):
        # Step 1: classify the message, then pass the intent to the responder
        intent = self.classify(message=message).intent
        return self.respond(message=message, intent=intent)

router = SupportRouter()

# Both self.classify and self.respond get demos from the same trainset
optimizer = dspy.LabeledFewShot(k=3)
optimized_router = optimizer.compile(router, trainset=trainset)

Note: every predictor receives demos from the same trainset. If your predictors have different signatures, make sure your training examples include all fields needed across all predictors, or consider compiling predictors separately.
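One way to act on this note is to check field coverage before compiling. The sketch below uses plain dicts in place of `dspy.Example` objects; `find_missing_fields` is a hypothetical helper, and the sample response text is made up for illustration.

```python
def find_missing_fields(examples, required_fields):
    """Map example index -> sorted list of required fields that example lacks."""
    missing = {}
    for index, example in enumerate(examples):
        absent = sorted(set(required_fields) - set(example))
        if absent:
            missing[index] = absent
    return missing


# Fields used across both predictors in a classify -> respond pipeline.
required = {"message", "intent", "response"}

examples = [
    {"message": "How do I reset my password?", "intent": "question",
     "response": "Use the 'Forgot password' link on the sign-in page."},
    {"message": "The app crashes every time I open it", "intent": "complaint"},
]

print(find_missing_fields(examples, required))  # {1: ['response']}
```

An empty result means every example carries every field the pipeline's signatures mention; anything else points at the examples to complete or to leave out.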

When to upgrade to BootstrapFewShot

LabeledFewShot is a great starting point, but it has limitations:

No quality filtering -- it uses your examples as-is, even if some are misleading or ambiguous
No metric evaluation -- it cannot tell which examples actually help the LM perform better
Same pool for all predictors -- every predictor's demos come from the same trainset, with no per-predictor tailoring

dspy.BootstrapFewShot addresses all three. It runs your program on each training example, evaluates with a metric, and keeps only the demonstrations that led to correct outputs. The upgrade is straightforward:

# Before: LabeledFewShot (no metric needed)
# `program` is any dspy.Module, for example `classify` from Basic usage
optimizer = dspy.LabeledFewShot(k=4)
optimized = optimizer.compile(program, trainset=trainset)

# After: BootstrapFewShot (needs a metric)
def metric(example, prediction, trace=None):
    return prediction.intent == example.intent

optimizer = dspy.BootstrapFewShot(metric=metric, max_bootstrapped_demos=4)
optimized = optimizer.compile(program, trainset=trainset)
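Before upgrading, it can help to measure how far the LabeledFewShot baseline gets you. The sketch below shows a slightly more forgiving variant of the metric above plus a simple accuracy calculation. `intent_match` and `accuracy` are hypothetical helpers, and `SimpleNamespace` objects stand in for `dspy.Example` and prediction objects so the snippet runs without an LM.

```python
from types import SimpleNamespace


def intent_match(example, prediction, trace=None):
    """Return True when the predicted intent equals the label, ignoring case and spaces."""
    expected = str(example.intent).strip().lower()
    predicted = str(prediction.intent).strip().lower()
    return expected == predicted


def accuracy(examples, predictions):
    """Fraction of predictions that the metric accepts."""
    hits = sum(intent_match(ex, pred) for ex, pred in zip(examples, predictions))
    return hits / len(examples)


# Stand-in objects; in a real run these would come from your dev set and program.
labels = [SimpleNamespace(intent="question"), SimpleNamespace(intent="complaint")]
outputs = [SimpleNamespace(intent=" Question "), SimpleNamespace(intent="praise")]

print(accuracy(labels, outputs))  # 0.5
```

The metric keeps the same `(example, prediction, trace=None)` shape as the one above, so it could be passed to BootstrapFewShot unchanged if the baseline falls short.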

Gotchas

Claude forgets .with_inputs() on training examples. Without .with_inputs("field_name"), DSPy does not know which fields are inputs vs labels. The demonstrations appear malformed in the prompt — the LM sees all fields as input, which confuses it. Always call .with_inputs() on every dspy.Example in your trainset.
Claude uses LabeledFewShot when the user needs metric-driven selection. LabeledFewShot uses examples as-is with no quality filtering. If the user mentions "accuracy is low" or "some examples are noisy," recommend BootstrapFewShot instead — it evaluates examples against a metric and keeps only the ones that help.
Claude sets k larger than the trainset without explaining the behavior. When k exceeds len(trainset), DSPy silently uses all available examples. This is fine, but Claude should tell the user: "You have 5 examples and k=16, so all 5 will be used as demos."
Claude creates separate trainsets for multi-predictor programs. LabeledFewShot draws every predictor's demos from the same trainset. If predictors have different signatures, the examples need all fields across all signatures, or the user should compile predictors separately. Claude sometimes splits the trainset incorrectly -- explain the shared-trainset behavior.
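The k-versus-trainset gotcha is easy to turn into a message for the user. The sketch below is one possible wording helper; `explain_k` is a hypothetical function, not part of DSPy.

```python
def explain_k(k, trainset_size):
    """Describe how many demos end up in use for a given k."""
    if trainset_size == 0:
        return "The trainset is empty, so no demos will be attached."
    if k > trainset_size:
        return (f"You have {trainset_size} examples and k={k}, "
                f"so all {trainset_size} will be used as demos.")
    return f"{k} of your {trainset_size} examples will be used as demos."


print(explain_k(16, 5))
# You have 5 examples and k=16, so all 5 will be used as demos.
```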

Additional resources

dspy.LabeledFewShot API docs
For API details, see reference.md
For worked examples, see examples.md

Cross-references

Install any skill: npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill <name>

Creating training examples (dspy.Example, with_inputs, datasets) -- see /dspy-data
Defining signatures (inline and class-based, typed fields) -- see /dspy-signatures
BootstrapFewShot for metric-driven demo selection -- see /ai-improving-accuracy
Evaluating your program to measure if LabeledFewShot is enough -- see /dspy-evaluate
Building modules with multiple predictors -- see /dspy-modules
Install /ai-do if you do not have it — it routes any AI problem to the right skill and is the fastest way to work: npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill ai-do

Use when you have hand-picked high-quality examples and want to use them directly as few-shot demonstrations: no bootstrapping, just your curated demos.

Common scenarios: you have expert-curated examples that you trust more than bootstrapped ones, hand-picked demonstrations for high-stakes tasks, using existing labeled data directly without bootstrapping, or when you want full control over which examples appear in the prompt.

Related: dspy-bootstrap-few-shot, dspy-knn-few-shot, ai-generating-data.

Also used for: dspy.LabeledFewShot, hand-picked examples in prompt, curated demonstrations, use my own examples directly, manual few-shot setup, expert-labeled demonstrations, no bootstrapping just my examples, static few-shot with labeled data, gold standard examples, when you trust your examples more than auto-generated ones, controlled few-shot demos, fixed example set in prompt.

5 stars

development

Updated May 7, 2026

npx skillsauth add lebsral/dspy-programming-not-prompting-lms-skills dspy-labeled-few-shot

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Percentage |
|---|---|---|---|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: May 7, 2026, 6:58 AM · 118.1s · 4 files scanned

Install manually

# Clone the repo
git clone https://github.com/lebsral/dspy-programming-not-prompting-lms-skills.git

# Copy into Claude Code skills folder (global)
cp -r dspy-programming-not-prompting-lms-skills/skills/dspy-labeled-few-shot ~/.claude/skills/

Claude Code Skills: see the official documentation for the skills path.

Repository

lebsral/dspy-programming-not-prompting-lms-skills

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT