skills/dspy-program-of-thought/SKILL.md
Use when the task requires precise computation, math, or data manipulation — the LM writes Python code that executes in a sandbox instead of reasoning in natural language. Common scenarios - math word problems, data manipulation tasks, precise calculations the LLM gets wrong in natural language, statistical analysis, or any task where writing and executing code gives better results than reasoning in text. Related - ai-reasoning, dspy-chain-of-thought, dspy-codeact. Also used for dspy.ProgramOfThought, LLM writes code to solve problem, code generation for computation, math with LLM via code, execute Python to get answer, when chain of thought gives wrong math, computation via code not text, precise calculations with LLM, data analysis by generating code, sandbox code execution, code-based reasoning, ProgramOfThought vs ChainOfThought, solve with code not words.
npx skillsauth add lebsral/dspy-programming-not-prompting-lms-skills dspy-program-of-thought
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Guide the user through using DSPy's ProgramOfThought module, which has the LM write Python code to solve a problem and then executes that code to produce the answer.
Before using ProgramOfThought, clarify:
dspy.ProgramOfThought is a module that asks the LM to express its reasoning as executable Python code instead of natural language. The generated code runs in a sandboxed environment, and the execution result becomes the output.
This is fundamentally different from ChainOfThought:
Think of it as: the LM becomes a programmer that writes a small script to solve your problem, rather than trying to solve it in its head.
Use ProgramOfThought when the task involves:
Do not use it when:
dspy.Predict or dspy.ChainOfThought instead
dspy.ReAct instead
ProgramOfThought requires Deno for sandboxed code execution:
# macOS
brew install deno
# Linux (also works on macOS and under WSL)
curl -fsSL https://deno.land/install.sh | sh
# Windows (PowerShell)
irm https://deno.land/install.ps1 | iex
Verify: deno --version. The first run will download Pyodide (~30s).
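Because a missing Deno binary only surfaces when the module first tries to execute code, it can help to check for it up front. The following is a minimal sketch using only the Python standard library; the helper name `deno_available` is hypothetical and not part of DSPy.

```python
import shutil
import subprocess


def deno_available():
    """Return True if a `deno` executable is on PATH and runs successfully."""
    path = shutil.which("deno")
    if path is None:
        return False
    try:
        # Same check as running `deno --version` by hand.
        subprocess.run(
            [path, "--version"], check=True, capture_output=True, timeout=10
        )
    except (OSError, subprocess.SubprocessError):
        return False
    return True


if not deno_available():
    print("Deno not found; install it before using dspy.ProgramOfThought.")
```

Running a check like this at application startup gives a clearer message than a subprocess error raised in the middle of a prediction.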
import dspy
lm = dspy.LM("openai/gpt-4o-mini") # or "anthropic/claude-sonnet-4-5-20250929", etc.
dspy.configure(lm=lm)
# Inline signature
solver = dspy.ProgramOfThought("question -> answer")
result = solver(question="What is 15% tip on a $84.50 dinner bill split 3 ways?")
print(result.answer) # Precise computed result
ProgramOfThought works with any signature -- inline strings or class-based:
class MathProblem(dspy.Signature):
    """Solve the given math problem by writing and executing Python code."""

    problem: str = dspy.InputField(desc="A math word problem")
    answer: float = dspy.OutputField(desc="The numerical answer")

solver = dspy.ProgramOfThought(MathProblem)
result = solver(problem="A store has a 20% off sale. An item costs $45. What is the sale price after 8% tax?")
print(result.answer)
When you call a ProgramOfThought module, here is what happens:
The LM does not directly produce the answer. It produces code, and the code produces the answer. This means arithmetic is done by Python (exact), not by the LM (approximate).
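To make this concrete, here is the kind of script the LM might write for the tip question shown earlier. This is a hypothetical illustration, not captured output from DSPy; the variable names and the use of `Decimal` are assumptions, and real generated code will vary between runs and models.

```python
from decimal import Decimal

# Values taken from the question: 15% tip on an $84.50 bill, split 3 ways.
bill = Decimal("84.50")
tip_rate = Decimal("0.15")
people = 3

total_tip = bill * tip_rate           # 12.6750
tip_per_person = total_tip / people   # 4.225

# The execution result is what would be mapped to the `answer` field.
print(tip_per_person)
```

The point is that the arithmetic is carried out by the Python runtime rather than predicted token by token.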
The generated code runs in a Deno/Pyodide WASM sandbox — isolated from your host filesystem, network, and environment. Pyodide includes Python's standard library (math, datetime, collections, itertools, re, json, statistics) plus some scientific packages. External packages not bundled with Pyodide are not available by default.
dspy.ProgramOfThought(
    signature,          # str | type[Signature] -- required
    max_iters=3,        # int -- max code generation/retry attempts
    interpreter=None,   # PythonInterpreter | None -- custom sandbox config
)
If the generated code raises an exception, ProgramOfThought retries by generating new code. The LM sees the error traceback from the previous attempt, which helps it self-correct. Control retries with max_iters (default: 3).
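The retry behavior can be pictured as a simple generate, execute, and feed-back-the-error loop. The sketch below is a conceptual model only, not DSPy's actual implementation: `run_with_retries` and `fake_lm` are hypothetical names, the "LM" is a stub, and plain `exec()` is used purely for illustration where the real module uses the Deno/Pyodide sandbox.

```python
import traceback


def run_with_retries(generate_code, max_iters=3):
    """Conceptual retry loop: generate code, run it, feed any error back."""
    error = None
    for attempt in range(1, max_iters + 1):
        code = generate_code(error)  # hypothetical stand-in for the LM call
        namespace = {}
        try:
            exec(code, namespace)  # illustration only: this is NOT sandboxed
            return namespace["answer"], attempt
        except Exception:
            # Keep the traceback so the next attempt can see what went wrong.
            error = traceback.format_exc()
    raise RuntimeError(f"No working code after {max_iters} attempts:\n{error}")


def fake_lm(previous_error):
    """Stub 'LM': the first attempt is buggy, the second one is corrected."""
    if previous_error is None:
        return "answer = 10 / 0"
    return "answer = 10 / 4"


result, attempts = run_with_retries(fake_lm)
print(result, attempts)  # 2.5 2
```

Each extra iteration costs another LM call, which is why raising max_iters increases token usage.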
Wrap ProgramOfThought in a custom module to combine computation with other reasoning steps:
import dspy
class FinancialAnalyzer(dspy.Module):
    def __init__(self):
        self.compute = dspy.ProgramOfThought("scenario, question -> result: float")
        self.explain = dspy.ChainOfThought("scenario, question, result -> explanation")

    def forward(self, scenario, question):
        # Step 1: Compute the exact numerical answer
        computed = self.compute(scenario=scenario, question=question)
        # Step 2: Explain the result in plain language
        explained = self.explain(
            scenario=scenario,
            question=question,
            result=str(computed.result),
        )
        return dspy.Prediction(
            result=computed.result,
            explanation=explained.explanation,
        )

lm = dspy.LM("openai/gpt-4o-mini") # or "anthropic/claude-sonnet-4-5-20250929", etc.
dspy.configure(lm=lm)
analyzer = FinancialAnalyzer()
result = analyzer(
scenario="Revenue was $1.2M in Q1, $1.5M in Q2, $1.1M in Q3, $1.8M in Q4.",
question="What is the year-over-year growth rate if last year's total was $4.8M?",
)
print(result.result)
print(result.explanation)
This pattern -- compute first, explain second -- gives you both precision and readability.
ProgramOfThought modules work with DSPy optimizers just like any other module. The optimizer tunes the instructions and few-shot examples that guide code generation:
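def metric(example, prediction, trace=None):
    # Correct if the predicted number is within 0.01 of the expected answer
    return abs(float(prediction.answer) - float(example.answer)) < 0.01

# trainset is assumed to be a list of dspy.Example objects defined elsewhere
optimizer = dspy.BootstrapFewShot(metric=metric, max_bootstrapped_demos=4)
optimized_solver = optimizer.compile(solver, trainset=trainset)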
Optimization improves the quality of the generated code by showing the LM examples of good code-generation patterns.
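One practical caveat: a metric that calls `float()` directly will raise if an answer comes back as non-numeric text. A more defensive variant might look like the sketch below. The name `tolerant_metric` and the `tol` parameter are hypothetical, and `SimpleNamespace` objects stand in for DSPy examples and predictions so the snippet runs without an LM.

```python
from types import SimpleNamespace


def tolerant_metric(example, prediction, trace=None, tol=0.01):
    """Return True if both answers parse as numbers and differ by less than tol."""
    try:
        predicted = float(prediction.answer)
        expected = float(example.answer)
    except (TypeError, ValueError):
        # Unparseable output counts as a miss instead of crashing the run.
        return False
    return abs(predicted - expected) < tol


# Stand-in objects; in real use these would be dspy.Example / dspy.Prediction.
example = SimpleNamespace(answer="38.88")
print(tolerant_metric(example, SimpleNamespace(answer=38.879)))      # True
print(tolerant_metric(example, SimpleNamespace(answer="about 39")))  # False
```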
PythonInterpreter with enable_read_paths, enable_network_access, etc. if needed.
ChainOfThought is faster and cheaper.
| Scenario | Use | Why |
|----------|-----|-----|
| "What is 17% of $234.89?" | ProgramOfThought | Arithmetic -- code is exact |
| "Summarize this article" | ChainOfThought | No computation needed |
| "How many days between March 3 and November 17?" | ProgramOfThought | Date math -- code handles edge cases |
| "Classify this support ticket" | ChainOfThought | Qualitative judgment |
| "Given these 50 data points, what is the standard deviation?" | ProgramOfThought | Statistical computation |
| "Explain why this code has a bug" | ChainOfThought | Reasoning about code, not running code |
| "Sort these 20 items by priority score and return the top 5" | ProgramOfThought | Data manipulation |
Rule of thumb: if you would reach for a calculator or a spreadsheet, use ProgramOfThought.
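The ProgramOfThought rows in the table all reduce to a few lines of standard-library Python, which is why code execution is reliable for them. The sketch below shows hypothetical code of the sort the LM could generate. The year 2025, the sample data, and the item names are assumptions made for illustration.

```python
import statistics
from datetime import date

# Date math: days between March 3 and November 17 (year assumed to be 2025).
days_between = (date(2025, 11, 17) - date(2025, 3, 3)).days
print(days_between)  # 259

# Statistical computation: population standard deviation of sample data.
data = [2, 4, 4, 4, 5, 5, 7, 9]
std_dev = statistics.pstdev(data)
print(std_dev)  # 2.0

# Data manipulation: sort items by priority score and keep the top 2.
items = [("backup", 3), ("deploy", 9), ("docs", 1), ("hotfix", 7)]
top_items = sorted(items, key=lambda item: item[1], reverse=True)[:2]
print(top_items)  # [('deploy', 9), ('hotfix', 7)]
```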
exec() call. Without Deno installed, the module crashes with a subprocess error. Always check deno --version or include Deno installation in setup instructions.
max_iters=5 without justification. The default is 3, which handles most retry scenarios. Only increase if you have evidence that code generation is failing due to complex logic. Higher values burn more tokens on retries.
interpreter parameter. For tasks that need file access, network access, or environment variables, pass a custom PythonInterpreter(enable_read_paths=[...], enable_network_access=[...]) instead of trying to work around sandbox restrictions.
Install any skill:
npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill <name>
/dspy-predict
/dspy-chain-of-thought
/dspy-modules
/ai-reasoning
/ai-do if you do not have it. It routes any AI problem to the right skill and is the fastest way to work: npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill ai-do