skills/ai-choosing-architecture/SKILL.md
Pick the right DSPy module and architecture for your AI feature. Use when you are not sure whether to use Predict, ChainOfThought, ReAct, or a pipeline, need to choose between DSPy patterns, want architecture advice for your AI feature, or are deciding between a single module and a multi-step pipeline. Also use for which DSPy module should I use, Predict vs ChainOfThought, when to use ReAct, single module vs pipeline, DSPy architecture decision, CoT vs PoT vs ReAct, do I need a pipeline, module selection guide, DSPy pattern selection, how to structure my DSPy program.
Before recommending anything, get answers to these three questions from the user (or infer them from context):
Walk the decision tree:
Does it need tools?
├── Yes: Does it need to write and run code?
│   ├── Yes → CodeAct
│   └── No → ReAct
└── No: How complex is the reasoning?
    ├── Simple (direct mapping) → Predict
    ├── Moderate (needs explanation) → ChainOfThought
    ├── Complex (math/computation) → ProgramOfThought
    └── Very complex (compare approaches) → MultiChainComparison
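
The decision tree can also be written as a small function, which makes the branching explicit and easy to unit-test. This is only a sketch: `choose_module` and the `reasoning` labels are hypothetical names introduced here, and the function returns module names as strings rather than touching DSPy itself.

```python
from typing import Literal

# Hypothetical labels for the four reasoning levels in the tree.
Reasoning = Literal["simple", "moderate", "complex", "very_complex"]


def choose_module(
    needs_tools: bool,
    needs_code_execution: bool = False,
    reasoning: Reasoning = "moderate",
) -> str:
    """Return the DSPy module name suggested by the decision tree."""
    if needs_tools:
        # Tool-using tasks: CodeAct if the agent must write and run code, else ReAct.
        return "CodeAct" if needs_code_execution else "ReAct"
    # No tools: choose by how much reasoning the task needs.
    by_reasoning = {
        "simple": "Predict",
        "moderate": "ChainOfThought",
        "complex": "ProgramOfThought",
        "very_complex": "MultiChainComparison",
    }
    return by_reasoning[reasoning]


print(choose_module(needs_tools=False, reasoning="simple"))
print(choose_module(needs_tools=True))
```

The first call follows the "No tools, simple reasoning" branch and the second follows "Tools, no code execution".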
Module tradeoff summary:
| Module | Accuracy | Latency | Cost | Best for |
|---|---|---|---|---|
| Predict | Baseline | 1x | 1x | Simple classification, extraction, formatting |
| ChainOfThought | +10-30% | 1.5-2x | 1.5-2x | Most tasks; default choice when unsure |
| ProgramOfThought | +20-40% on math | 2-3x | 2-3x | Math, computation, data manipulation |
| ReAct | Varies | 3-10x | 3-10x | Tasks requiring external information or actions |
| CodeAct | Varies | 3-10x | 3-10x | Tasks requiring code generation and execution |
| MultiChainComparison | +5-15% | 3-5x | 3-5x | When you need the best possible single answer |
| BestOfN | +5-10% | Nx | Nx | When you have a reward function and acceptance threshold |
For the full module list including Refine, RLM, and Parallel, see reference.md.
Use this table to decide whether one module is enough or a pipeline is warranted:
| Signal | Single module | Pipeline |
|---|---|---|
| Input maps directly to output | Yes | -- |
| Task has distinct phases (classify then generate) | -- | Yes |
| Different parts need different LM capabilities | -- | Yes |
| Need to validate intermediate results | -- | Yes |
| Simple input-output with clear signature | Yes | -- |
| Need to combine retrieval + generation | -- | Yes |
Rule of thumb: start with a single module. Add pipeline stages only when you have measured a quality gap that a single module cannot close.
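
One possible reading of the table together with the rule of thumb is sketched below: a pipeline is suggested only when at least one pipeline signal is present and a quality gap has actually been measured. The `PipelineSignals` checklist and `suggest_structure` are hypothetical helpers, not part of DSPy, and the "signal plus measured gap" rule is an interpretation rather than something the table states outright.

```python
from dataclasses import asdict, dataclass


@dataclass
class PipelineSignals:
    """Hypothetical checklist; each field mirrors a 'Pipeline: Yes' row in the table."""

    distinct_phases: bool = False
    needs_different_lms: bool = False
    validates_intermediate_results: bool = False
    combines_retrieval_and_generation: bool = False


def suggest_structure(signals: PipelineSignals, measured_quality_gap: bool) -> str:
    """Suggest a pipeline only when a signal is present and a gap was measured."""
    present = [name for name, value in asdict(signals).items() if value]
    if present and measured_quality_gap:
        return "pipeline (" + ", ".join(present) + ")"
    return "single module"


signals = PipelineSignals(distinct_phases=True)
print(suggest_structure(signals, measured_quality_gap=False))
print(suggest_structure(signals, measured_quality_gap=True))
```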
Verification: After implementing the chosen architecture, build an evaluator with dspy.Evaluate(devset=devset, metric=your_metric) and call it on your program, using 20-50 examples, to confirm the module choice was correct before optimizing.
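
What this check computes is an average of a metric over a small dev set. The sketch below shows that computation in plain Python so it runs without an LM or DSPy installed: the candidate "programs" are stand-in functions, the dev set is toy data, and every name is hypothetical. In a real project the candidates would be DSPy modules and dspy.Evaluate would do the averaging.

```python
from typing import Callable

# Toy dev set for illustration only; a real check would use 20-50 labeled examples.
DEVSET = [
    {"text": "refund my order", "label": "billing"},
    {"text": "app crashes on login", "label": "bug"},
    {"text": "charge appeared twice", "label": "billing"},
    {"text": "button does nothing", "label": "bug"},
]


def exact_match(example: dict, predicted_label: str) -> bool:
    """Metric: a prediction counts only if it equals the gold label."""
    return example["label"] == predicted_label


def accuracy(program: Callable[[str], str], devset: list[dict]) -> float:
    """Average the metric over the dev set."""
    return sum(exact_match(ex, program(ex["text"])) for ex in devset) / len(devset)


def always_bug(text: str) -> str:
    """Stand-in for the simplest candidate."""
    return "bug"


def keyword_rules(text: str) -> str:
    """Stand-in for a second candidate."""
    return "billing" if ("refund" in text or "charge" in text) else "bug"


for candidate in (always_bug, keyword_rules):
    print(candidate.__name__, accuracy(candidate, DEVSET))
```

Comparing candidates on the same dev set with the same metric is what makes the numbers meaningful; changing either between runs invalidates the comparison.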
| Architecture | First optimizer | Best optimizer | Why |
|---|---|---|---|
| Single Predict | BootstrapFewShot | MIPROv2 | Simple, fast to optimize |
| Single ChainOfThought | BootstrapFewShot | MIPROv2 | Reasoning benefits from good demos |
| ReAct agent | BootstrapFewShot | BootstrapFewShot | Agents are hard to optimize, start simple |
| Multi-module pipeline | BootstrapFewShot | MIPROv2 | End-to-end optimization tunes all stages |
| Pipeline with fine-tuning | BootstrapFinetune | BetterTogether | Weight tuning for max quality |
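
If the optimizer table needs to be applied programmatically, for example to fill in the "Optimizer path" part of a recommendation, a plain lookup is enough. The sketch below copies the table rows into a dictionary; `OPTIMIZERS` and `optimizer_path` are hypothetical names, and the optimizer names are plain strings rather than DSPy objects.

```python
# Rows copied from the optimizer table: architecture -> (first optimizer, best optimizer).
OPTIMIZERS = {
    "Single Predict": ("BootstrapFewShot", "MIPROv2"),
    "Single ChainOfThought": ("BootstrapFewShot", "MIPROv2"),
    "ReAct agent": ("BootstrapFewShot", "BootstrapFewShot"),
    "Multi-module pipeline": ("BootstrapFewShot", "MIPROv2"),
    "Pipeline with fine-tuning": ("BootstrapFinetune", "BetterTogether"),
}


def optimizer_path(architecture: str) -> list[str]:
    """Return numbered optimizer-path lines for the given architecture."""
    first, best = OPTIMIZERS[architecture]
    steps = [f"1. Start with {first} (quick baseline)"]
    if best != first:
        # Only add a second step when the table names a different best optimizer.
        steps.append(f"2. Move to {best} if accuracy needs to improve")
    return steps


print("\n".join(optimizer_path("Multi-module pipeline")))
```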
Output the recommendation in this format:
## Architecture Recommendation
**Module:** dspy.ChainOfThought (or whatever was chosen)
**Why:** [1-2 sentences tying the module to the task]
**Skeleton:**
[minimal code showing the module or pipeline structure]
**Optimizer path:**
1. Start with BootstrapFewShot (quick baseline)
2. Move to MIPROv2 if accuracy needs to improve
**Alternative considered:** [what else was considered and why it was not chosen]
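
# Skeleton: single Predict module. Assumes an LM is already configured via dspy.configure(lm=...).
import dspy

class MyTask(dspy.Signature):
    """One sentence describing the task."""
    input_text: str = dspy.InputField()
    output_label: str = dspy.OutputField()

# Predict maps inputs straight to outputs with no intermediate reasoning.
predictor = dspy.Predict(MyTask)
result = predictor(input_text="...")
print(result.output_label)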
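
# Skeleton: single ChainOfThought module. Assumes an LM is already configured via dspy.configure(lm=...).
import dspy

class MyTask(dspy.Signature):
    """One sentence describing the task."""
    question: str = dspy.InputField()
    answer: str = dspy.OutputField()

# ChainOfThought produces step-by-step reasoning before filling the output fields.
cot = dspy.ChainOfThought(MyTask)
result = cot(question="...")
print(result.answer)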
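
# Skeleton: ReAct agent with tools. Assumes an LM is already configured via dspy.configure(lm=...).
import dspy

# Tools are plain functions; the agent sees their names, type hints, and docstrings.
def search(query: str) -> str:
    """Search external knowledge base."""
    ...

def lookup(term: str) -> str:
    """Look up a term in a database."""
    ...

class MyAgentTask(dspy.Signature):
    """Answer questions using search and lookup tools."""
    question: str = dspy.InputField()
    answer: str = dspy.OutputField()

# The agent decides at runtime which tool to call and when to stop.
agent = dspy.ReAct(MyAgentTask, tools=[search, lookup])
result = agent(question="...")
print(result.answer)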

# Skeleton: two-stage pipeline (classify, then generate). Assumes an LM is already configured.
import dspy

class Classify(dspy.Signature):
    """Classify the input into a category."""
    text: str = dspy.InputField()
    category: str = dspy.OutputField()

class Generate(dspy.Signature):
    """Generate a response given the category and original text."""
    text: str = dspy.InputField()
    category: str = dspy.InputField()
    response: str = dspy.OutputField()

class ClassifyThenGenerate(dspy.Module):
    def __init__(self):
        super().__init__()
        # Cheap direct mapping for the label, reasoning for the generated response.
        self.classify = dspy.Predict(Classify)
        self.generate = dspy.ChainOfThought(Generate)

    def forward(self, text: str) -> dspy.Prediction:
        category = self.classify(text=text).category
        response = self.generate(text=text, category=category).response
        return dspy.Prediction(category=category, response=response)

# Skeleton: retrieve, reason, answer. Assumes both an LM and a retrieval model are
# already configured (for example via dspy.configure(lm=..., rm=...)).
import dspy

retriever = dspy.Retrieve(k=3)

class Reason(dspy.Signature):
    """Given context passages, identify the key facts relevant to the question."""
    question: str = dspy.InputField()
    context: list[str] = dspy.InputField()
    key_facts: str = dspy.OutputField()

class Answer(dspy.Signature):
    """Answer the question using the identified key facts."""
    question: str = dspy.InputField()
    key_facts: str = dspy.InputField()
    answer: str = dspy.OutputField()

class RAGPipeline(dspy.Module):
    def __init__(self):
        super().__init__()
        self.retrieve = retriever
        self.reason = dspy.ChainOfThought(Reason)
        self.answer = dspy.ChainOfThought(Answer)

    def forward(self, question: str) -> dspy.Prediction:
        # Stage 1: fetch passages. Stage 2: distill key facts. Stage 3: answer from them.
        passages = self.retrieve(question).passages
        key_facts = self.reason(question=question, context=passages).key_facts
        answer = self.answer(question=question, key_facts=key_facts).answer
        return dspy.Prediction(answer=answer, passages=passages)
Defaulting to ChainOfThought for everything. Predict is better for simple classification or extraction where reasoning adds noise, not signal. If the correct output is a fixed label from a known set, CoT can hallucinate reasoning that leads it astray.
Using ReAct when a pipeline suffices. ReAct is for tasks that need dynamic tool selection at runtime. If you know the steps upfront (e.g., always retrieve then answer), use a pipeline — it is cheaper, faster, and easier to optimize.
Over-engineering with MultiChainComparison. MCC runs 3-5x the cost of a single pass. Only reach for it after measuring that single-pass accuracy is insufficient for your use case.
Building a pipeline before proving a single module works. Always start with the simplest module that could work. Measure it on your eval set. Add pipeline stages only when you have a specific, measured quality gap.
Ignoring cost implications early. A ReAct agent with 10 tool calls costs roughly 10x a single Predict call. Factor cost and latency into architecture decisions before you build, not after.
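
A back-of-the-envelope estimate is often enough to factor cost in before building. The sketch below multiplies a baseline Predict cost by the relative cost ranges from the module tradeoff table. The ranges are the table's rough multipliers, not measurements, and `monthly_cost_range`, the call volume, and the per-call price are hypothetical inputs chosen for illustration.

```python
# Relative cost ranges (multiples of one Predict call) copied from the tradeoff table.
COST_RANGE = {
    "Predict": (1.0, 1.0),
    "ChainOfThought": (1.5, 2.0),
    "ProgramOfThought": (2.0, 3.0),
    "ReAct": (3.0, 10.0),
    "CodeAct": (3.0, 10.0),
    "MultiChainComparison": (3.0, 5.0),
}


def monthly_cost_range(
    module: str, calls_per_month: int, predict_cost_usd: float
) -> tuple[float, float]:
    """Return (low, high) monthly cost given the cost of one Predict call."""
    low, high = COST_RANGE[module]
    base = calls_per_month * predict_cost_usd
    return base * low, base * high


# Hypothetical workload: 100,000 calls per month at 0.001 USD per Predict call.
for name in ("Predict", "ChainOfThought", "ReAct"):
    low, high = monthly_cost_range(name, 100_000, 0.001)
    print(f"{name}: {low:.2f} to {high:.2f} USD per month")
```

The same multipliers apply to latency in the table, so the function can be reused with a per-call latency in place of a per-call price.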
Install any skill:
npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill <name>
- /dspy-* skill for your chosen module
- /ai-building-pipelines
- /ai-planning
- /ai-auditing-code
- /ai-do if you do not have it. It routes any AI problem to the right skill and is the fastest way to work: npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill ai-do