skills/dspy-finetune-bootstrap/SKILL.md
This skill should be used when the user asks to "fine-tune a DSPy model", "distill a program into weights", "use BootstrapFinetune", "create a student model", "reduce inference costs with fine-tuning", mentions "model distillation", "teacher-student training", or wants to deploy a DSPy program as fine-tuned weights for production efficiency.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add omidzamani/dspy-skills dspy-finetune-bootstrap
Distill a DSPy program into fine-tuned model weights for efficient production deployment.
| Input | Type | Description |
|-------|------|-------------|
| program | dspy.Module | Teacher program to distill |
| trainset | list[dspy.Example] | Training examples |
| metric | callable | Validation metric (optional) |
| train_kwargs | dict | Training hyperparameters |

| Output | Type | Description |
|--------|------|-------------|
| finetuned_program | dspy.Module | Program with fine-tuned weights |
| model_path | str | Path to saved model |
import dspy

# Configure with strong teacher model
dspy.configure(lm=dspy.LM("openai/gpt-4o"))


class TeacherQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.cot = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        return self.cot(question=question)
Assign the LM directly to predictors before fine-tuning:
import dspy
from dspy.teleprompt import BootstrapFinetune

optimizer = BootstrapFinetune(
    metric=lambda gold, pred, trace=None: gold.answer.lower() in pred.answer.lower(),
    train_kwargs={
        'learning_rate': 5e-5,
        'num_train_epochs': 3,
        'per_device_train_batch_size': 4,
        'warmup_ratio': 0.1
    }
)

teacher = TeacherQA()
teacher.set_lm(dspy.settings.lm)
finetuned = optimizer.compile(teacher, trainset=trainset)
# Save the fine-tuned model (saves state-only by default)
finetuned.save("finetuned_qa_model.json")
# Load and use (must recreate architecture first)
loaded = TeacherQA()
loaded.load("finetuned_qa_model.json")
result = loaded(question="What is machine learning?")
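The quick-start metric above accepts any prediction that contains the gold answer as a substring. Because the metric typically decides which teacher traces are kept as fine-tuning data, a lenient check can let wrong answers through. The sketch below compares it with a stricter exact match. It uses `types.SimpleNamespace` objects as stand-ins for `dspy.Example` and `dspy.Prediction`, and the function names `substring_match` and `exact_match` are hypothetical, chosen only for this illustration.

```python
from types import SimpleNamespace


def substring_match(gold, pred, trace=None):
    """Lenient: passes when the gold answer appears anywhere in the prediction."""
    return gold.answer.lower() in pred.answer.lower()


def exact_match(gold, pred, trace=None):
    """Strict: passes only when the normalized strings are identical."""
    return gold.answer.lower().strip() == pred.answer.lower().strip()


# Stand-ins for dspy.Example / dspy.Prediction; only `.answer` is read here.
gold = SimpleNamespace(answer="Paris")
verbose = SimpleNamespace(answer="The capital of France is Paris.")
wrong = SimpleNamespace(answer="Comparison with Paris aside, it is Lyon.")

print(substring_match(gold, verbose))  # True: correct but wordy answer passes
print(exact_match(gold, verbose))      # False: strict match rejects the extra words
print(substring_match(gold, wrong))    # True: a wrong answer that mentions "Paris" passes
print(exact_match(gold, wrong))        # False
```

Which check fits depends on the task: free-form answers may need the lenient version, while closed label sets usually tolerate the strict one.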
import dspy
from dspy.teleprompt import BootstrapFinetune
from dspy.evaluate import Evaluate
import logging
import os

logger = logging.getLogger(__name__)


class ClassificationSignature(dspy.Signature):
    """Classify text into categories."""
    text: str = dspy.InputField()
    label: str = dspy.OutputField(desc="Category: positive, negative, neutral")


class TextClassifier(dspy.Module):
    def __init__(self):
        super().__init__()
        self.classify = dspy.Predict(ClassificationSignature)

    def forward(self, text):
        return self.classify(text=text)


def classification_metric(gold, pred, trace=None):
    """Exact label match (case-insensitive, whitespace-trimmed)."""
    gold_label = gold.label.lower().strip()
    pred_label = pred.label.lower().strip() if pred.label else ""
    return gold_label == pred_label
def finetune_classifier(trainset, devset, output_dir="./finetuned_model"):
    """Full fine-tuning pipeline."""
    # Configure teacher (strong model)
    dspy.configure(lm=dspy.LM("openai/gpt-4o"))
    teacher = TextClassifier()
    teacher.set_lm(dspy.settings.lm)

    # Evaluate teacher
    evaluator = Evaluate(devset=devset, metric=classification_metric, num_threads=8)
    teacher_score = evaluator(teacher)
    # Log the score as returned: Evaluate typically reports it on a 0-100 scale,
    # so a `%` format spec would rescale it a second time.
    logger.info("Teacher score: %s", teacher_score)

    # Fine-tune (train_kwargs passed to constructor)
    optimizer = BootstrapFinetune(
        metric=classification_metric,
        train_kwargs={
            'learning_rate': 2e-5,
            'num_train_epochs': 3,
            'per_device_train_batch_size': 8,
            'gradient_accumulation_steps': 2,
            'warmup_ratio': 0.1,
            'weight_decay': 0.01,
            'logging_steps': 10,
            'save_strategy': 'epoch',
            'output_dir': output_dir
        }
    )
    finetuned = optimizer.compile(
        teacher,
        trainset=trainset
    )

    # Evaluate fine-tuned model
    student_score = evaluator(finetuned)
    logger.info("Student score: %s", student_score)

    # Save (state-only as JSON); make sure the target directory exists first
    os.makedirs(output_dir, exist_ok=True)
    model_path = os.path.join(output_dir, "final_model.json")
    finetuned.save(model_path)

    return {
        "teacher_score": teacher_score,
        "student_score": student_score,
        "model_path": model_path
    }
# For RAG fine-tuning
class RAGClassifier(dspy.Module):
    """RAG pipeline that can be fine-tuned."""

    def __init__(self, num_passages=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.classify = dspy.ChainOfThought("context, text -> label")

    def forward(self, text):
        context = self.retrieve(text).passages
        return self.classify(context=context, text=text)


def finetune_rag_classifier(trainset, devset):
    """Fine-tune a RAG-based classifier."""
    # Configure retriever and LM
    colbert = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')
    dspy.configure(
        lm=dspy.LM("openai/gpt-4o"),
        rm=colbert
    )
    rag = RAGClassifier()
    rag.set_lm(dspy.settings.lm)

    # Fine-tune (train_kwargs in constructor)
    optimizer = BootstrapFinetune(
        metric=classification_metric,
        train_kwargs={
            'learning_rate': 1e-5,
            'num_train_epochs': 5
        }
    )
    finetuned = optimizer.compile(
        rag,
        trainset=trainset
    )
    return finetuned
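The dictionary returned by `finetune_classifier` above holds both the teacher and the student score, which makes it easy to check how much quality survived distillation. The helper below is a minimal sketch of such a check. The names `score_retention` and `should_deploy` and the 0.95 threshold are hypothetical, and it assumes both scores are plain numbers on the same scale (if your DSPy version returns a result object from `Evaluate`, pass its numeric score instead).

```python
def score_retention(teacher_score: float, student_score: float) -> float:
    """Fraction of the teacher's score that the student keeps (1.0 means parity)."""
    if teacher_score <= 0:
        raise ValueError("teacher_score must be positive")
    return student_score / teacher_score


def should_deploy(results: dict, min_retention: float = 0.95) -> bool:
    """Hypothetical gate: accept the student if it keeps enough of the teacher's score."""
    retention = score_retention(results["teacher_score"], results["student_score"])
    return retention >= min_retention


# Illustrative values only, shaped like the pipeline's return dictionary.
results = {
    "teacher_score": 90.0,
    "student_score": 87.0,
    "model_path": "./finetuned_model/final_model.json",
}
print(round(score_retention(90.0, 87.0), 3))  # 0.967
print(should_deploy(results))                 # True
```

A ratio is used rather than an absolute gap so the same threshold works whether scores are reported on a 0-1 or a 0-100 scale.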
| Argument | Description | Typical Value |
|----------|-------------|---------------|
| learning_rate | Learning rate | 1e-5 to 5e-5 |
| num_train_epochs | Training epochs | 3-5 |
| per_device_train_batch_size | Batch size | 4-16 |
| gradient_accumulation_steps | Gradient accumulation | 2-8 |
| warmup_ratio | Warmup proportion | 0.1 |
| weight_decay | L2 regularization | 0.01 |
| max_grad_norm | Gradient clipping | 1.0 |
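The arguments in the table interact: the batch size the optimizer effectively sees is the per-device batch size times the gradient accumulation steps (times the number of devices), and `warmup_ratio` is a fraction of the total optimizer steps. The sketch below works through that arithmetic. The function `training_schedule` and its `num_examples` and `num_devices` parameters are hypothetical, and the rounding is an approximation; the exact step counts depend on the trainer that consumes `train_kwargs`.

```python
import math


def training_schedule(num_examples, per_device_train_batch_size,
                      gradient_accumulation_steps, num_train_epochs,
                      warmup_ratio, num_devices=1):
    """Approximate optimizer-step counts implied by a set of training arguments."""
    # Examples consumed per optimizer update.
    effective_batch = (per_device_train_batch_size
                       * gradient_accumulation_steps
                       * num_devices)
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    total_steps = steps_per_epoch * num_train_epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {
        "effective_batch_size": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


# 1,000 bootstrapped examples with the values from the production example above.
schedule = training_schedule(
    num_examples=1000,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,
    num_train_epochs=3,
    warmup_ratio=0.1,
)
print(schedule)
# {'effective_batch_size': 16, 'steps_per_epoch': 63, 'total_steps': 189, 'warmup_steps': 19}
```

With small bootstrapped datasets the total step count can be low enough that a 0.1 warmup covers only a handful of updates, which is worth checking before tuning the learning rate.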