skills/dspy-data/SKILL.md
Use when you need to prepare training/dev data for DSPy optimizers — loading from CSV/JSON/HuggingFace, creating Examples, setting input keys, or building train/dev splits. Common scenarios - loading a CSV of labeled examples for optimization, converting HuggingFace datasets to DSPy format, creating train/dev/test splits, building Examples with proper input keys, converting JSON data for DSPy, or preparing evaluation datasets. Related - ai-generating-data, dspy-evaluate. Also used for dspy.Example, dspy.Dataset, load training data for DSPy, CSV to DSPy examples, HuggingFace dataset in DSPy, prepare data for optimization, input_keys in DSPy, train dev split for DSPy, how to format data for DSPy optimizer, labeled examples format, create examples from JSON, what format does DSPy expect, dataset preparation for DSPy, with_inputs in DSPy Example, build evaluation dataset.
npx skillsauth add lebsral/dspy-programming-not-prompting-lms-skills dspy-dataInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Guide the user through creating, loading, and managing data for DSPy programs. Data is the fuel for DSPy optimizers — getting it right is the difference between a program that works and one that doesn't.
dspy.Example is DSPy's data container. Think of it as a dictionary with one extra feature: you can mark which fields are inputs and which are outputs. This distinction is critical because optimizers need to know what to feed into your program (inputs) and what to compare against (outputs).
import dspy
# An Example holds named fields — like a dict
example = dspy.Example(question="What is DSPy?", answer="A framework for programming LMs")
# Access fields with dot notation or bracket notation
print(example.question) # "What is DSPy?"
print(example["answer"]) # "A framework for programming LMs"
Every DSPy optimizer, evaluator, and metric function expects data as a list of dspy.Example objects.
Create examples with keyword arguments. Every keyword becomes a field.
# Simple question-answer pair
ex = dspy.Example(question="What color is the sky?", answer="Blue")
# Classification example with more fields
ex = dspy.Example(
text="The product broke after one day",
label="negative",
category="quality",
)
# Fields can be any Python type
ex = dspy.Example(
query="hiking trails near Portland",
results=["Forest Park", "Eagle Creek"],
count=2,
)
You can also create an Example from a dictionary:
data = {"question": "What is Python?", "answer": "A programming language"}
ex = dspy.Example(**data)
with_inputs() tells DSPy which fields are inputs (what the program receives) and which are outputs (what the program should produce). This is required for optimizers and evaluation.
# Mark "question" as input — "answer" becomes the expected output
ex = dspy.Example(question="What is DSPy?", answer="A framework").with_inputs("question")
# Multiple input fields
ex = dspy.Example(
context="DSPy is a Python framework...",
question="What is DSPy?",
answer="A framework for programming LMs",
).with_inputs("context", "question")
If you skip with_inputs(), optimizers won't know which fields to pass to your program and which to hold back for scoring. You'll get errors or wrong results. Always call it.
After with_inputs("question"):
example.inputs() returns an Example with only {"question": "What is DSPy?"}example.labels() returns an Example with only {"answer": "A framework"}DSPy uses .inputs() to feed data into your module and .labels() to check the output against expected values.
ex = dspy.Example(question="What is DSPy?", answer="A framework").with_inputs("question")
print(ex.inputs()) # Example(question="What is DSPy?")
print(ex.labels()) # Example(answer="A framework")
When you call a DSPy module, it returns a dspy.Prediction, which extends Example. A Prediction has the same dot-access and dict-like behavior.
classify = dspy.ChainOfThought("text -> label")
result = classify(text="Great product!")
# result is a Prediction
print(result.label) # "positive"
print(result.reasoning) # chain-of-thought reasoning (added by ChainOfThought)
# Predictions work like Examples
print(result.keys()) # dict_keys(['reasoning', 'label'])
In metric functions, prediction is always a Prediction and example is always an Example:
def metric(example, prediction, trace=None):
# example has the gold fields you defined
# prediction has the fields your module produced
return prediction.label == example.label
A dataset in DSPy is just a Python list of Example objects. Use list comprehensions to build them.
# From parallel lists
questions = ["What is Python?", "What is DSPy?", "What is an LM?"]
answers = ["A programming language", "A framework for LMs", "A language model"]
trainset = [
dspy.Example(question=q, answer=a).with_inputs("question")
for q, a in zip(questions, answers)
]
# From a list of dicts
raw_data = [
{"text": "Love it!", "label": "positive"},
{"text": "Terrible.", "label": "negative"},
{"text": "It's okay.", "label": "neutral"},
]
trainset = [
dspy.Example(**row).with_inputs("text")
for row in raw_data
]
import csv
def load_csv_as_examples(filepath, input_fields):
"""Load a CSV file into a list of dspy.Example objects."""
examples = []
with open(filepath, "r") as f:
reader = csv.DictReader(f)
for row in reader:
ex = dspy.Example(**row).with_inputs(*input_fields)
examples.append(ex)
return examples
# Usage
trainset = load_csv_as_examples("tickets.csv", input_fields=["message"])
With pandas (if you prefer):
import pandas as pd
df = pd.read_csv("tickets.csv")
trainset = [
dspy.Example(**row.to_dict()).with_inputs("message")
for _, row in df.iterrows()
]
# Skip rows with missing values
trainset = [
dspy.Example(**row).with_inputs("text")
for row in csv.DictReader(open("data.csv"))
if row["text"] and row["label"] # skip blanks
]
# Rename columns to match your signature
trainset = [
dspy.Example(
text=row["customer_message"],
label=row["assigned_category"],
).with_inputs("text")
for row in csv.DictReader(open("data.csv"))
]
import json
def load_json_as_examples(filepath, input_fields):
"""Load a JSON array file into dspy.Example objects."""
with open(filepath, "r") as f:
data = json.load(f)
return [
dspy.Example(**item).with_inputs(*input_fields)
for item in data
]
# Usage — file contains [{"question": "...", "answer": "..."}, ...]
trainset = load_json_as_examples("qa_pairs.json", input_fields=["question"])
For JSON Lines (one JSON object per line):
def load_jsonl_as_examples(filepath, input_fields):
"""Load a JSONL file into dspy.Example objects."""
examples = []
with open(filepath, "r") as f:
for line in f:
if line.strip():
item = json.loads(line)
examples.append(dspy.Example(**item).with_inputs(*input_fields))
return examples
The HuggingFace datasets library gives you access to thousands of ready-to-use datasets.
pip install datasets
from datasets import load_dataset
# Load a dataset
dataset = load_dataset("hotpotqa", "fullwiki")
# Convert to DSPy Examples
trainset = [
dspy.Example(
question=x["question"],
answer=x["answer"],
).with_inputs("question")
for x in dataset["train"]
]
# Limit the number of examples (large datasets)
trainset = [
dspy.Example(question=x["question"], answer=x["answer"]).with_inputs("question")
for x in list(dataset["train"])[:500]
]
# Rename fields to match your signature
trainset = [
dspy.Example(
text=x["sentence"],
label="positive" if x["label"] == 1 else "negative",
).with_inputs("text")
for x in dataset["train"]
]
# Filter rows
trainset = [
dspy.Example(question=x["question"], answer=x["answer"]).with_inputs("question")
for x in dataset["train"]
if len(x["answer"]) > 0 # skip empty answers
]
DSPy ships with a few standard datasets for prototyping and benchmarking. These return pre-built dspy.Example objects — no HuggingFace dependency needed.
from dspy.datasets import HotPotQA
# Multi-hop question answering
dataset = HotPotQA(train_seed=1, train_size=200, dev_size=50, test_size=0)
trainset = dataset.train
devset = dataset.dev
| Dataset | Import | Task |
|---------|--------|------|
| HotPotQA | from dspy.datasets import HotPotQA | Multi-hop QA over Wikipedia |
| GSM8k | from dspy.datasets import GSM8k | Grade-school math word problems |
| Colors | from dspy.datasets import Colors | Simple color identification |
Constructor parameters (all optional): train_seed, train_size, dev_size, test_size.
Note: You still need .with_inputs() if you're passing these to an optimizer:
trainset = [ex.with_inputs("question") for ex in dataset.train]
Optimizers train on trainset and you evaluate on devset. Keep them separate to measure real performance.
import random
def train_dev_split(examples, train_ratio=0.8, seed=42):
"""Split a list of examples into train and dev sets."""
random.seed(seed)
shuffled = list(examples)
random.shuffle(shuffled)
split_idx = int(len(shuffled) * train_ratio)
return shuffled[:split_idx], shuffled[split_idx:]
# Usage
all_examples = load_csv_as_examples("data.csv", input_fields=["text"])
trainset, devset = train_dev_split(all_examples)
print(f"Train: {len(trainset)}, Dev: {len(devset)}")
Use this when your categories are imbalanced (e.g., 90% "general", 10% "urgent").
from collections import defaultdict
def stratified_split(examples, label_field, train_ratio=0.8, seed=42):
"""Split examples while preserving the distribution of a label field."""
random.seed(seed)
buckets = defaultdict(list)
for ex in examples:
buckets[ex[label_field]].append(ex)
trainset, devset = [], []
for label, items in buckets.items():
random.shuffle(items)
split_idx = int(len(items) * train_ratio)
trainset.extend(items[:split_idx])
devset.extend(items[split_idx:])
random.shuffle(trainset)
random.shuffle(devset)
return trainset, devset
# Usage
trainset, devset = stratified_split(all_examples, label_field="category")
Many HuggingFace datasets come pre-split:
dataset = load_dataset("hotpotqa", "fullwiki")
trainset = [dspy.Example(**x).with_inputs("question") for x in dataset["train"][:500]]
devset = [dspy.Example(**x).with_inputs("question") for x in dataset["validation"][:200]]
ex = dspy.Example(question="What?", answer="That", source="wiki")
# Dot access
ex.question
# Dict-style access
ex["question"]
# Get all field names
ex.keys() # dict_keys(['question', 'answer', 'source'])
# Check if a field exists
"question" in ex # True
# Example to dict
d = dict(ex) # {"question": "What?", "answer": "That", "source": "wiki"}
# Dict to Example
ex = dspy.Example(**d).with_inputs("question")
# Keep only examples where the answer is short
short_answers = [ex for ex in trainset if len(ex.answer.split()) < 20]
# Keep only a specific category
urgent_only = [ex for ex in trainset if ex.category == "urgent"]
# Remove duplicates (by a field)
seen = set()
unique = []
for ex in trainset:
if ex.question not in seen:
seen.add(ex.question)
unique.append(ex)
# Quick summary
print(f"Total examples: {len(trainset)}")
print(f"Fields: {trainset[0].keys()}")
print(f"First example: {trainset[0]}")
# Label distribution
from collections import Counter
labels = Counter(ex.label for ex in trainset)
print(f"Label distribution: {labels}")
with_inputs() on every Example. Without it, optimizers cannot distinguish inputs from expected outputs. Every example passed to an optimizer or evaluator must have with_inputs() called. Claude often creates examples and only calls with_inputs() on the first one or skips it entirely when building lists inline.with_inputs() with output field names. Mark only the fields your module receives as input — not the fields it should produce. If your signature is question -> answer, call .with_inputs("question"), not .with_inputs("question", "answer"). Including output fields means the optimizer has nothing to score against.Literal[list] instead of Literal[tuple(list)] for dynamic categories. When building categories from data (CATEGORIES = list(set(...))), the type annotation must be Literal[tuple(CATEGORIES)], not Literal[CATEGORIES]. The latter silently fails to constrain the output.dspy.Example objects. DSPy optimizers and evaluators require dspy.Example objects, not plain Python dicts. Always convert with dspy.Example(**row).with_inputs(...).Install any skill:
npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill <name>
/dspy-evaluate — evaluate your program on a devset with metrics/ai-generating-data — generate synthetic training data when you have none/ai-improving-accuracy — use optimizers that consume your trainset to boost quality/ai-do if you do not have it — it routes any AI problem to the right skill and is the fastest way to work: npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill ai-dotools
See what is happening during optimizer.compile() instead of waiting blind. Use when you want to watch optimization progress, see scores as they come in, know if your optimizer is working, check if optimization is stuck, understand why optimization is taking too long, get live progress during compile, monitor convergence, detect overfitting during optimization, interpret optimization results, or pick the right tool for watching optimization. Also used for optimizer progress bar, is my optimizer doing anything, optimization seems stuck, how long will optimization take, watch GEPA run, watch MIPROv2 run, live optimization dashboard, optimizer not improving, scores not going up, optimization taking forever, see what optimizer is doing, debug slow optimization, optimization visibility, optimizer metrics, track compile progress, optimization observability.
testing
Use when you want the highest-quality prompt optimization DSPy offers — jointly optimizes instructions and few-shot demos, with auto=light/medium/heavy presets. Common scenarios - you want the best possible accuracy from prompt optimization, jointly tuning instructions and few-shot demonstrations, using auto presets for different compute budgets, or when COPRO or BootstrapFewShot alone are not reaching your accuracy target. Related - ai-improving-accuracy, dspy-copro, dspy-bootstrap-few-shot. Also used for dspy.MIPROv2, best DSPy optimizer, highest quality optimization, auto=light medium heavy, joint instruction and demo optimization, most powerful prompt optimizer, MIPROv2 vs COPRO vs BootstrapFewShot, which optimizer should I use, state of the art prompt optimization, when to use MIPROv2, optimize both instructions and examples, heavy optimization for production, best optimizer for accuracy.
testing
Use LangWatch for DSPy auto-tracing and real-time optimizer progress. Use when you want to set up LangWatch, langwatch.dspy.init, auto-tracing DSPy, real-time optimization dashboard, optimizer progress tracking, app.langwatch.ai, or DSPy optimizer dashboard. Also used for langwatch setup, pip install langwatch, langwatch trace, optimizer progress, real-time optimization, watch optimizer run, LangWatch self-hosted, langwatch docker, langwatch vs langtrace, langwatch autotrack_dspy.
data-ai
Use when you want to optimize instructions without few-shot examples — a lightweight alternative to COPRO when you do not have or do not want to use demonstrations. Common scenarios - optimizing instructions when you do not have or do not want to use few-shot demonstrations, lightweight instruction search as a first step, tasks where examples in the prompt confuse the model, or when you want fast instruction optimization without the cost of COPRO. Related - ai-improving-accuracy, dspy-copro, dspy-miprov2. Also used for dspy.GEPA, instruction optimization without demos, lightweight prompt optimization, optimize instructions only, no few-shot examples needed, GEPA vs COPRO, quick instruction search, when demonstrations hurt performance, zero-shot optimization, instruction-only optimizer, simplest instruction tuner, fast prompt optimization, skip few-shot and just tune instructions, optimize Pydantic field descriptions, GEPA structured output, GEPA does not optimize field desc.