skills/dspy-simba-optimizer/SKILL.md
This skill should be used when the user asks to "optimize with SIMBA", "use mini-batch introspective optimization", "generate self-reflective rules", mentions "SIMBA optimizer", "stochastic mini-batch ascent", "output variability", or needs an alternative to MIPROv2/GEPA that evolves rules and demonstrations from numeric metrics.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add omidzamani/dspy-skills dspy-simba-optimizer
Optimize DSPy programs using stochastic mini-batch sampling, output variability, self-reflective rules, and successful demonstrations.
| Input | Type | Description |
|-------|------|-------------|
| program | dspy.Module | Program to optimize |
| trainset | list[dspy.Example] | Training examples |
| metric | callable | Returns a numeric score |
| max_steps | int | Number of optimization steps |
| bsize | int | Mini-batch size |
| Output | Type | Description |
|--------|------|-------------|
| optimized_program | dspy.Module | SIMBA-optimized program |
SIMBA (Stochastic Introspective Mini-Batch Ascent):
prompt_model for introspection

Comparison:
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# Program to optimize
class QAPipeline(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        return self.generate(question=question)

# Metric returns a numeric score
def qa_metric(example, pred, trace=None):
    correct = example.answer.lower() in pred.answer.lower()
    return 1.0 if correct else 0.0

# SIMBA optimizer
optimizer = dspy.SIMBA(
    metric=qa_metric,
    max_steps=10,  # Optimization iterations
    bsize=5        # Mini-batch size
)

program = QAPipeline()
# trainset is assumed to be a list[dspy.Example] defined elsewhere
compiled = optimizer.compile(program, trainset=trainset)
compiled.save("qa_simba.json")
Use a graded numeric metric when exact match is too coarse:
import dspy

def detailed_metric(example, pred, trace=None):
    """Return a graded numeric score."""
    expected = example.answer.lower()
    actual = pred.answer.lower()
    if expected == actual:
        return 1.0
    elif expected in actual:
        return 0.7
    else:
        overlap = len(set(expected.split()) & set(actual.split()))
        if overlap > 0:
            return 0.3
        return 0.0

optimizer = dspy.SIMBA(
    metric=detailed_metric,
    max_steps=20,  # Optimization iterations
    bsize=8        # Mini-batch size
)
compiled = optimizer.compile(program, trainset=trainset)
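
Before spending optimizer steps on a graded metric, it can help to confirm that each tier fires as intended. The sketch below repeats `detailed_metric` and calls it on stand-in objects. `SimpleNamespace` is only a lightweight substitute for a `dspy.Example` and a prediction (an assumption, so the check runs without an LM), `score` is a hypothetical helper, and the sample strings are made up for illustration.

```python
from types import SimpleNamespace


def detailed_metric(example, pred, trace=None):
    """Return a graded numeric score (same logic as the metric above)."""
    expected = example.answer.lower()
    actual = pred.answer.lower()
    if expected == actual:
        return 1.0
    elif expected in actual:
        return 0.7
    else:
        overlap = len(set(expected.split()) & set(actual.split()))
        if overlap > 0:
            return 0.3
        return 0.0


def score(expected, actual):
    """Hypothetical helper: wrap plain strings in stand-in objects and score them."""
    return detailed_metric(SimpleNamespace(answer=expected), SimpleNamespace(answer=actual))


# One made-up case per tier of the metric.
tier_scores = {
    "exact": score("Paris", "paris"),
    "substring": score("Paris", "The capital is Paris."),
    "word_overlap": score("New York City", "York"),
    "miss": score("Paris", "London"),
}
print(tier_scores)
```

If a tier never shows up on realistic answers, the metric may be coarser than it looks, and the graded scores will give the optimizer little extra signal over exact match.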
import dspy
from dspy.evaluate import Evaluate
import logging

logger = logging.getLogger(__name__)

# Define tools as functions
def search(query: str) -> str:
    """Search knowledge base for relevant information."""
    retriever = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')
    results = retriever(query, k=3)
    return "\n".join([r['text'] for r in results])

def calculate(expr: str) -> str:
    """Evaluate Python expressions safely."""
    try:
        with dspy.PythonInterpreter() as interp:
            return str(interp.execute(expr))
    except Exception as e:
        return f"Error: {e}"

class ResearchAgent(dspy.Module):
    def __init__(self):
        super().__init__()
        self.agent = dspy.ReAct(
            "question -> answer",
            tools=[search, calculate]
        )

    def forward(self, question):
        return self.agent(question=question)

def agent_metric(example, pred, trace=None):
    """Numeric metric for agent optimization."""
    expected = example.answer.lower().strip()
    actual = pred.answer.lower().strip() if pred.answer else ""

    # Exact match
    if expected == actual:
        return 1.0

    # Partial match
    if expected in actual:
        return 0.7

    # Check key terms: at least half of the expected words appear in the answer
    expected_terms = set(expected.split())
    actual_terms = set(actual.split())
    overlap = len(expected_terms & actual_terms)
    if overlap >= len(expected_terms) * 0.5:
        return 0.5

    return 0.0

def optimize_agent(trainset, devset):
    """Full SIMBA optimization pipeline."""
    dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))
    agent = ResearchAgent()

    # Baseline evaluation
    evaluator = dspy.Evaluate(devset=devset, metric=agent_metric, num_threads=4)
    baseline = evaluator(agent)
    # Evaluate reports a score on a 0-100 scale; newer DSPy releases are assumed
    # to wrap it in a result object exposing .score, so handle both forms.
    baseline_score = getattr(baseline, "score", baseline)
    logger.info(f"Baseline: {baseline_score:.2f}%")

    # SIMBA optimization
    optimizer = dspy.SIMBA(
        metric=agent_metric,
        max_steps=25,  # Optimization iterations
        bsize=6        # Mini-batch size
    )
    compiled = optimizer.compile(agent, trainset=trainset)

    # Evaluate optimized
    optimized = evaluator(compiled)
    optimized_score = getattr(optimized, "score", optimized)
    logger.info(f"SIMBA optimized: {optimized_score:.2f}%")

    compiled.save("research_agent_simba.json")
    return compiled
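
`optimize_agent` expects separate `trainset` and `devset` arguments, but the source does not show how they are produced. A minimal sketch of one way to get them, assuming a single list of labelled examples is available: `split_examples` is a hypothetical helper (not part of DSPy), and integers stand in for `dspy.Example` objects so the snippet runs on its own.

```python
import random


def split_examples(examples, dev_fraction=0.2, seed=0):
    """Hypothetical helper: shuffle a copy and split it into (trainset, devset)."""
    if not 0.0 < dev_fraction < 1.0:
        raise ValueError("dev_fraction must be between 0 and 1")
    shuffled = list(examples)
    # A seeded Random instance keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_dev = max(1, int(len(shuffled) * dev_fraction))
    return shuffled[n_dev:], shuffled[:n_dev]


# Integers stand in for dspy.Example objects in this sketch.
trainset, devset = split_examples(range(50), dev_fraction=0.2, seed=0)
print(len(trainset), len(devset))
```

Keeping the dev set out of `optimizer.compile` means the baseline and optimized scores are measured on examples the optimizer never saw.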
optimizer = dspy.SIMBA(
    metric=metric_fn,
    max_steps=20,                   # Optimization iterations
    bsize=32,                       # Mini-batch size (default: 32)
    num_candidates=6,               # Candidates per iteration (default: 6)
    max_demos=4,                    # Max demos per predictor (default: 4)
    temperature_for_sampling=0.2,   # Sampling temperature (default: 0.2)
    temperature_for_candidates=0.2  # Candidate selection temperature (default: 0.2)
)
Choose `bsize` (default 32) and `max_steps` (default 8) based on dataset size.
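
The guidance on sizing `bsize` to the dataset can be sketched as a small helper that caps the mini-batch at the number of training examples. This assumes SIMBA rejects a trainset smaller than `bsize`, which is worth confirming against the installed DSPy version. `simba_kwargs` is a hypothetical helper, not part of DSPy; its defaults are the ones stated in this document.

```python
def simba_kwargs(n_train, bsize=32, max_steps=8):
    """Hypothetical helper: keep the mini-batch no larger than the trainset."""
    if n_train < 1:
        raise ValueError("trainset must contain at least one example")
    return {"bsize": min(bsize, n_train), "max_steps": max_steps}


small = simba_kwargs(n_train=12)   # fewer examples than the default bsize
large = simba_kwargs(n_train=500)  # the default bsize fits
print(small, large)
```

The result could then be unpacked into the constructor, for example `dspy.SIMBA(metric=metric_fn, **simba_kwargs(len(trainset)))`.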