omidzamani/dspy-gepa-reflective

Name: dspy-gepa-reflective
Author: omidzamani

skills/dspy-gepa-reflective/SKILL.md

npx skillsauth add omidzamani/dspy-skills dspy-gepa-reflective

DSPy GEPA Optimizer

Goal

Optimize complex agentic systems using LLM reflection on full execution traces with Pareto-based evolutionary search.
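To make "Pareto-based" concrete, here is a minimal sketch of the selection idea only. It is not DSPy's implementation. It keeps every candidate that achieves the best score on at least one example, instead of keeping only the candidate with the best average. The function name, candidate names, and scores are all hypothetical.

```python
from typing import Dict, List


def pareto_candidates(scores: Dict[str, List[float]]) -> Dict[str, int]:
    """Map each candidate that is best on at least one example to its win count.

    scores[name][i] is the score of candidate `name` on example `i`.
    Ties count as a win for every tied candidate.
    """
    if not scores:
        return {}
    n_examples = len(next(iter(scores.values())))
    wins: Dict[str, int] = {}
    for i in range(n_examples):
        best = max(per_example[i] for per_example in scores.values())
        for name, per_example in scores.items():
            if per_example[i] == best:
                wins[name] = wins.get(name, 0) + 1
    return wins


# Hypothetical per-example scores for three prompt variants.
scores = {
    "prompt_a": [1.0, 0.0, 0.5],
    "prompt_b": [0.0, 1.0, 0.5],
    "prompt_c": [0.0, 0.0, 0.0],
}
front = pareto_candidates(scores)
```

Here `prompt_a` and `prompt_b` both survive because each wins on a different example, while `prompt_c` never wins and is dropped. A search could then sample the next candidate to mutate in proportion to the win counts.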

When to Use

Agentic systems with tool use
When you have rich textual feedback on failures
Complex multi-step workflows
Instruction-only optimization needed

Related Skills

For non-agentic programs: dspy-miprov2-optimizer, dspy-bootstrap-fewshot
Measure improvements: dspy-evaluation-suite

Inputs

| Input | Type | Description |
|-------|------|-------------|
| program | dspy.Module | Agent or complex program |
| trainset | list[dspy.Example] | Training examples |
| metric | callable | Accepts five arguments and returns dspy.Prediction(score=..., feedback=...) |
| reflection_lm | dspy.LM | Strong LM for reflection (GPT-4) |
| auto | str | "light", "medium", "heavy" |

Outputs

| Output | Type | Description |
|--------|------|-------------|
| compiled_program | dspy.Module | Reflectively optimized program |

Workflow

Phase 1: Define Feedback Metric

GEPA requires metrics that return textual feedback:

def gepa_metric(example, pred, trace=None, pred_name=None, pred_trace=None):
    """Return score and actionable feedback for GEPA reflection."""
    is_correct = example.answer.lower() in pred.answer.lower()
    
    if is_correct:
        feedback = "Correct. The answer accurately addresses the question."
    else:
        feedback = f"Incorrect. Expected '{example.answer}' but got '{pred.answer}'. The model may have misunderstood the question or retrieved irrelevant information."
    
    return dspy.Prediction(score=float(is_correct), feedback=feedback)
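The scoring logic can be checked offline before spending any LM calls. The sketch below repeats the same containment check, but returns a plain tuple and uses `SimpleNamespace` stand-ins in place of `dspy.Example` and `dspy.Prediction`, so it runs without DSPy installed. The helper name `feedback_for` and the sample answers are hypothetical.

```python
from types import SimpleNamespace


def feedback_for(example, pred):
    """Same containment check as gepa_metric, returned as a (score, feedback) tuple."""
    is_correct = example.answer.lower() in pred.answer.lower()
    if is_correct:
        return 1.0, "Correct. The answer accurately addresses the question."
    return 0.0, f"Incorrect. Expected '{example.answer}' but got '{pred.answer}'."


gold = SimpleNamespace(answer="Paris")
hit = feedback_for(gold, SimpleNamespace(answer="The capital is Paris."))
miss = feedback_for(gold, SimpleNamespace(answer="Lyon"))
```

Because the check is substring containment, an answer such as "It is not Paris" would also score 1.0. A stricter comparison may be worth considering when answers are short.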

Phase 2: Setup Agent

import dspy

def search(query: str) -> list[str]:
    """Search knowledge base for relevant information."""
    rm = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')
    results = rm(query, k=3)
    # ColBERTv2 returns passage records; keep only the text so the tool yields list[str]
    return [r["text"] for r in results]

def calculate(expression: str) -> float:
    """Safely evaluate mathematical expressions."""
    with dspy.PythonInterpreter() as interp:
        return interp(expression)

agent = dspy.ReAct("question -> answer", tools=[search, calculate])

Phase 3: Optimize with GEPA

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

optimizer = dspy.GEPA(
    metric=gepa_metric,
    reflection_lm=dspy.LM("openai/gpt-4o"),  # Strong model for reflection
    auto="medium"
)

compiled_agent = optimizer.compile(agent, trainset=trainset)
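The call above passes only a trainset. If a held-out validation set is wanted as well, one way to get it is a deterministic split of the available examples. This is a sketch under assumptions: the helper name `split_examples`, the 20% fraction, and the seed are illustrative choices, and plain strings stand in for `dspy.Example` objects.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_examples(
    examples: Sequence[T], val_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle deterministically and return (train, val)."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


examples = [f"example-{i}" for i in range(10)]  # stand-ins for dspy.Example objects
train, val = split_examples(examples)
# With DSPy installed, the two lists could then be passed as
# optimizer.compile(agent, trainset=train, valset=val)
```

Fixing the seed keeps the split stable across runs, which makes before and after scores comparable.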

Production Example

import dspy
from dspy.evaluate import Evaluate
import logging

logger = logging.getLogger(__name__)

class ResearchAgent(dspy.Module):
    def __init__(self):
        super().__init__()  # initialize dspy.Module bookkeeping before adding submodules
        self.react = dspy.ReAct(
            "question -> answer",
            tools=[self.search, self.summarize]
        )
    
    def search(self, query: str) -> list[str]:
        """Search for relevant documents."""
        rm = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')
        results = rm(query, k=5)
        # Keep only the passage text so the tool really returns list[str]
        return [r["text"] for r in results]
    
    def summarize(self, text: str) -> str:
        """Summarize long text into key points."""
        summarizer = dspy.Predict("text -> summary")
        return summarizer(text=text).summary
    
    def forward(self, question):
        return self.react(question=question)

def detailed_feedback_metric(example, pred, trace=None, pred_name=None, pred_trace=None):
    """Rich feedback for GEPA reflection."""
    expected = example.answer.lower().strip()
    actual = pred.answer.lower().strip() if pred.answer else ""
    
    # Exact match
    if expected == actual:
        return dspy.Prediction(score=1.0, feedback="Perfect match. Answer is correct and concise.")
    
    # Partial match (skip empty predictions: "" is a substring of every string)
    if actual and (expected in actual or actual in expected):
        return dspy.Prediction(score=0.7, feedback=f"Partial match. Expected '{example.answer}', got '{pred.answer}'. Answer contains correct info but may be verbose or incomplete.")
    
    # Check for key terms
    expected_terms = set(expected.split())
    actual_terms = set(actual.split())
    overlap = len(expected_terms & actual_terms) / max(len(expected_terms), 1)
    
    if overlap > 0.5:
        return dspy.Prediction(score=0.5, feedback=f"Some overlap. Expected '{example.answer}', got '{pred.answer}'. Key terms present but answer structure differs.")
    
    return dspy.Prediction(score=0.0, feedback=f"Incorrect. Expected '{example.answer}', got '{pred.answer}'. The agent may need better search queries or reasoning.")

def optimize_research_agent(trainset, devset):
    """Full GEPA optimization pipeline."""
    
    dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))
    
    agent = ResearchAgent()
    
    # Convert metric for evaluation (just score)
    def eval_metric(example, pred, trace=None):
        return detailed_feedback_metric(example, pred, trace).score
    
    evaluator = Evaluate(devset=devset, num_threads=8, metric=eval_metric)
    baseline = evaluator(agent)
    # In DSPy 3.x, Evaluate returns a result object; .score is a percentage (0-100)
    logger.info("Baseline: %.2f%%", baseline.score)
    
    # GEPA optimization
    optimizer = dspy.GEPA(
        metric=detailed_feedback_metric,
        reflection_lm=dspy.LM("openai/gpt-4o"),
        auto="medium"
    )
    
    compiled = optimizer.compile(agent, trainset=trainset)
    optimized = evaluator(compiled)
    # .score is already a percentage (0-100), so log it as a plain number
    logger.info("Optimized: %.2f%%", optimized.score)
    
    compiled.save("research_agent_gepa.json")
    return compiled
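The score tiers in detailed_feedback_metric can be exercised without DSPy or any LM call. The sketch below reproduces the tier logic on raw strings and includes one guard: an empty prediction is not treated as a partial match, since the empty string is a substring of every string. The helper name `score_tier` and the sample answers are hypothetical.

```python
def score_tier(expected: str, actual: str) -> float:
    """Return the score tier for two raw answer strings."""
    expected = expected.lower().strip()
    actual = actual.lower().strip()
    if expected == actual:
        return 1.0
    # Guard against empty predictions before the substring check.
    if actual and (expected in actual or actual in expected):
        return 0.7
    expected_terms = set(expected.split())
    actual_terms = set(actual.split())
    overlap = len(expected_terms & actual_terms) / max(len(expected_terms), 1)
    return 0.5 if overlap > 0.5 else 0.0


exact = score_tier("Marie Curie", "marie curie ")
partial = score_tier("Marie Curie", "It was Marie Curie")
some_overlap = score_tier("Marie Sklodowska Curie", "Marie Curie")
wrong = score_tier("Marie Curie", "Albert Einstein")
empty = score_tier("Marie Curie", "")
```

The third case lands in the overlap tier because "marie curie" is not a contiguous substring of "marie sklodowska curie", yet two of the three expected terms are present.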

Metric Contract

GEPA metrics must accept (gold, pred, trace, pred_name, pred_trace). Return dspy.Prediction(score=..., feedback=...) when textual feedback is available. Do not pass enable_tool_optimization; it is not a DSPy 3.2.1 GEPA constructor argument.
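One consequence of this contract is that the same metric cannot be handed unchanged to code that expects a three-argument, score-only metric. A small adapter is one way to bridge the two. This is a sketch under assumptions: `score_only` and `toy_metric` are hypothetical names, and `SimpleNamespace` stands in for `dspy.Prediction` so the block runs without DSPy.

```python
from types import SimpleNamespace
from typing import Any, Callable


def score_only(feedback_metric: Callable[..., Any]) -> Callable[..., float]:
    """Wrap a five-argument feedback metric so it returns only a float score."""

    def wrapped(example, pred, trace=None):
        result = feedback_metric(example, pred, trace, None, None)
        # Accept either a bare number or an object exposing `.score`.
        return float(getattr(result, "score", result))

    return wrapped


def toy_metric(gold, pred, trace=None, pred_name=None, pred_trace=None):
    """Stand-in for a real GEPA metric."""
    ok = gold.answer == pred.answer
    return SimpleNamespace(score=float(ok), feedback="Correct." if ok else "Wrong answer.")


metric = score_only(toy_metric)
value = metric(SimpleNamespace(answer="4"), SimpleNamespace(answer="4"))
```

The `getattr` fallback means the adapter also works for metrics that return a plain number when no feedback is available.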

Best Practices

Rich feedback - The more specific the feedback, the more the reflection LM has to work with
Strong reflection LM - Use GPT-4 or Claude for reflection
Agentic focus - Best for ReAct and multi-tool systems
Trace analysis - GEPA analyzes full execution trajectories
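As an illustration of what "rich feedback" might look like in practice, the sketch below builds a feedback string that names which expected terms are missing and which terms were unexpected, rather than only saying the answer was wrong. The helper name `describe_mismatch` and the whitespace-based term splitting are assumptions made for this example.

```python
def describe_mismatch(expected: str, actual: str) -> str:
    """Build feedback that names missing and unexpected terms."""
    expected_terms = set(expected.lower().split())
    actual_terms = set(actual.lower().split())
    missing = sorted(expected_terms - actual_terms)
    extra = sorted(actual_terms - expected_terms)
    if not missing and not extra:
        return "All expected terms are present and nothing extra was added."
    parts = []
    if missing:
        parts.append("Missing terms: " + ", ".join(missing) + ".")
    if extra:
        parts.append("Unexpected terms: " + ", ".join(extra) + ".")
    return " ".join(parts)


message = describe_mismatch("Marie Curie", "Pierre Curie")
```

A string like this could be appended to the feedback field of the metric's return value so that the reflection LM sees exactly what differed.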

Limitations

Requires custom feedback metrics (not just scores)
Expensive: uses strong LM for reflection
Newer optimizer, less battle-tested than MIPROv2
Best for instruction optimization, less for demos

Official Documentation

DSPy Documentation: https://dspy.ai/
DSPy GitHub: https://github.com/stanfordnlp/dspy
GEPA Optimizer: https://dspy.ai/api/optimizers/GEPA/
Agents Guide: https://dspy.ai/tutorials/agents/

This skill should be used when the user asks to "optimize an agent with GEPA", "use reflective optimization", "optimize ReAct agents", "provide feedback metrics", mentions "GEPA optimizer", "LLM reflection", "execution trajectories", "agentic systems optimization", or needs to optimize complex multi-step agents using textual feedback on execution traces.

78 stars

development

Updated Jun 3, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add omidzamani/dspy-skills dspy-gepa-reflective

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy (container and dependency vulnerability scanner): Clean, 95%
Semgrep (static code analysis for vulnerabilities): Clean, 95%
mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
Snyk (dep) (open source security scanning): Skipped, 50%
Socket.dev (supply chain security analysis): Skipped, 50%
VirusTotal (multi-engine malware detection): Skipped, 50%
CrowdStrike (advanced threat intelligence): Skipped, 50%
OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: Jun 3, 2026, 3:10 AM (17.2s, 1 file scanned)

SKILL.md

name: dspy-gepa-reflective
version: 1.0.0
dspy-compatibility: 3.2.1
description: This skill should be used when the user asks to "optimize an agent with GEPA", "use reflective optimization", "optimize ReAct agents", "provide feedback metrics", mentions "GEPA optimizer", "LLM reflection", "execution trajectories", "agentic systems optimization", or needs to optimize complex multi-step agents using textual feedback on execution traces.

Related Skills

omidzamani/dspy-simba-optimizer (tools)

This skill should be used when the user asks to "optimize with SIMBA", "use mini-batch introspective optimization", "generate self-reflective rules", mentions "SIMBA optimizer", "stochastic mini-batch ascent", "output variability", or needs an alternative to MIPROv2/GEPA that evolves rules and demonstrations from numeric metrics.

omidzamani/dspy-signature-designer (data-ai)

This skill should be used when the user asks to "create a DSPy signature", "define inputs and outputs", "design a signature", "use InputField or OutputField", "add type hints to DSPy", mentions "signature class", "type-safe DSPy", "Pydantic models in DSPy", or needs to define what a DSPy module should do with structured inputs and outputs.

omidzamani/dspy-reasoning-modules (development)

This skill should be used when the user asks to "use DSPy RLM", "process a very long context", "use ProgramOfThought", "use CodeAct", "run DSPy modules in parallel", mentions Recursive Language Models, sandboxed Python execution, Deno, `dspy.RLM`, `dspy.ProgramOfThought`, `dspy.CodeAct`, or `dspy.Parallel`, or needs to choose a DSPy reasoning module beyond Predict, ChainOfThought, and ReAct.

omidzamani/dspy-react-agent-builder (tools)

This skill should be used when the user asks to "create a ReAct agent", "build an agent with tools", "implement tool-calling agent", "use dspy.ReAct", mentions "agent with tools", "reasoning and acting", "multi-step agent", "agent optimization with GEPA", or needs to build production agents that use tools to solve complex tasks.

Install manually

# Clone the repo
git clone https://github.com/omidzamani/dspy-skills.git

# Create the global Claude Code skills folder if it does not exist yet
mkdir -p ~/.claude/skills

# Copy the skill into it
cp -r dspy-skills/skills/dspy-gepa-reflective ~/.claude/skills/

See the official Claude Code Skills documentation for the skills path.

Repository

omidzamani/dspy-skills (78 stars)

Compatible with

Claude Code, OpenAI Codex CLI, ChatGPT