lebsral/dspy-program-of-thought

Name: dspy-program-of-thought
Author: lebsral

skills/dspy-program-of-thought/SKILL.md

npx skillsauth add lebsral/dspy-programming-not-prompting-lms-skills dspy-program-of-thought

Solve Problems by Generating and Executing Code with dspy.ProgramOfThought

Guide the user through using DSPy's ProgramOfThought module, which has the LM write Python code to solve a problem and then executes that code to produce the answer.

Step 1: Understand the task

Before using ProgramOfThought, clarify:

Does the task involve computation? ProgramOfThought shines for math, data manipulation, date reasoning — anything where running code gives a more reliable answer than verbal reasoning. If the task is purely qualitative (classification, summarization), use ChainOfThought instead.
Is Deno installed? ProgramOfThought requires Deno to run generated code in a WASM sandbox. Without it, the module will crash.
What should the output look like? A single number, a list, a formatted string? This determines your signature.

What is ProgramOfThought

dspy.ProgramOfThought is a module that asks the LM to express its reasoning as executable Python code instead of natural language. The generated code runs in a sandboxed environment, and the execution result becomes the output.

This is fundamentally different from ChainOfThought:

ChainOfThought -- the LM reasons in natural language, then produces an answer. Good for qualitative reasoning but prone to arithmetic and counting errors.
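ProgramOfThought -- the LM writes Python code that computes the answer. The code runs, and the result is computed by the interpreter rather than estimated by the LM, so it is as reliable as the generated code is correct. Good for anything where computation produces a more reliable answer than verbal reasoning.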

Think of it as: the LM becomes a programmer that writes a small script to solve your problem, rather than trying to solve it in its head.

When to use ProgramOfThought

Use ProgramOfThought when the task involves:

Math and arithmetic -- compound interest, tax calculations, unit conversions, statistics
Counting and aggregation -- "how many items match this condition", tallying, grouping
Data manipulation -- sorting, filtering, transforming structured data
Date/time reasoning -- days between dates, business day calculations, timezone math
Precise string operations -- regex matching, character counting, formatting
Logic puzzles -- constraint satisfaction, combinatorics, permutations

Do not use it when:

The task is purely qualitative (summarization, classification, creative writing)
No computation is needed -- use dspy.Predict or dspy.ChainOfThought instead
You need tool use or external API calls -- use dspy.ReAct instead

Setup

ProgramOfThought requires Deno for sandboxed code execution:

# macOS
brew install deno

# Linux (the same script also works on macOS)
curl -fsSL https://deno.land/install.sh | sh

# Windows (PowerShell)
irm https://deno.land/install.ps1 | iex

Verify: deno --version. The first run will download Pyodide (~30s).
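The same check can be scripted before a ProgramOfThought module is constructed. The following is a minimal sketch using only the standard library; the helper name deno_version is hypothetical and not part of DSPy.

```python
import shutil
import subprocess


def deno_version(executable="deno"):
    """Return the first line of `<executable> --version`, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    completed = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    lines = completed.stdout.strip().splitlines()
    return lines[0] if lines else None


version = deno_version()
if version is None:
    print("Deno not found on PATH; install it before using ProgramOfThought.")
else:
    print(f"Found {version}")
```

Running this at startup turns a later subprocess error into an explicit message.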

Basic usage

import dspy

lm = dspy.LM("openai/gpt-4o-mini")  # or "anthropic/claude-sonnet-4-5-20250929", etc.
dspy.configure(lm=lm)

# Inline signature
solver = dspy.ProgramOfThought("question -> answer")

result = solver(question="What is 15% tip on a $84.50 dinner bill split 3 ways?")
print(result.answer)  # Precise computed result

ProgramOfThought works with any signature -- inline strings or class-based:

class MathProblem(dspy.Signature):
    """Solve the given math problem by writing and executing Python code."""
    problem: str = dspy.InputField(desc="A math word problem")
    answer: float = dspy.OutputField(desc="The numerical answer")

solver = dspy.ProgramOfThought(MathProblem)
result = solver(problem="A store has a 20% off sale. An item costs $45. What is the sale price after 8% tax?")
print(result.answer)
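For the sale-price problem above, the program the LM writes might resemble the sketch below. This is an illustration of plausible generated code, not captured model output; it assumes the discount is applied before tax and uses Decimal (a choice made here, not something the module requires) so that the cents come out exactly.

```python
from decimal import Decimal

price = Decimal("45.00")
discount = Decimal("0.20")
tax = Decimal("0.08")

# Apply the discount first, then add tax to the discounted price.
sale_price = price * (1 - discount)
total = sale_price * (1 + tax)

# Round to whole cents.
answer = total.quantize(Decimal("0.01"))
print(answer)  # 38.88
```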

How it works

When you call a ProgramOfThought module, here is what happens:

Code generation -- the LM receives the signature and inputs, then generates Python code that computes the answer
Sandbox execution -- DSPy executes the generated code in a restricted Python environment
Result extraction -- the output of the code execution is captured and returned as the prediction

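The LM does not directly produce the answer. It produces code, and the code produces the answer. This means arithmetic is carried out by the Python interpreter (deterministic, with exact integer arithmetic and standard floating-point behavior), not predicted token by token by the LM (approximate).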
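The three steps can be pictured with a toy pipeline. This is a conceptual sketch only, not DSPy's implementation: fake_lm_generate is a hypothetical stand-in for the LM call, the convention of storing the output in a variable named result is an assumption of this sketch, and plain exec() is used purely for brevity (it provides no sandboxing, unlike the real module).

```python
def fake_lm_generate(question):
    """Hypothetical stand-in for the LM: returns Python source code as text."""
    return 'word = "strawberry"\nresult = word.count("r")'


def run_generated_code(source):
    """Execute the source in a fresh namespace and return its `result` variable.

    Plain exec() is NOT a sandbox; it only keeps this sketch short.
    """
    namespace = {}
    exec(source, namespace)
    return namespace["result"]


code = fake_lm_generate('How many times does "r" appear in "strawberry"?')  # 1. generate
answer = run_generated_code(code)                                           # 2. execute
print(answer)                                                               # 3. extract -> 3
```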

What the sandbox provides

The generated code runs in a Deno/Pyodide WASM sandbox — isolated from your host filesystem, network, and environment. Pyodide includes Python's standard library (math, datetime, collections, itertools, re, json, statistics) plus some scientific packages. External packages not bundled with Pyodide are not available by default.
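As an illustration of code that stays within those limits, the sketch below uses only standard-library modules named above. The data values are made up for the example.

```python
import statistics
from collections import Counter

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample values

summary = {
    "mean": statistics.mean(data),
    "population_stdev": statistics.pstdev(data),
    "most_common": Counter(data).most_common(1)[0][0],
}
print(summary)
```

Code that imports a package not bundled with Pyodide would fail at the import line instead.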

Constructor

dspy.ProgramOfThought(
    signature,                    # str | type[Signature] -- required
    max_iters=3,                  # int -- max code generation/retry attempts
    interpreter=None,             # PythonInterpreter | None -- custom sandbox config
)

Retry on execution failure

If the generated code raises an exception, ProgramOfThought retries by generating new code. The LM sees the error traceback from the previous attempt, which helps it self-correct. Control retries with max_iters (default: 3).
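The retry behavior can be sketched as a loop that feeds the traceback back into the next generation attempt. This is a simplified illustration under stated assumptions, not the library's code: fake_lm_generate is a hypothetical stand-in that fails on the first try, and exec() again replaces the real sandbox.

```python
import traceback


def fake_lm_generate(question, previous_error=None):
    """Hypothetical LM stand-in: the first attempt is buggy, the retry is fixed."""
    if previous_error is None:
        return "result = total / count"  # raises NameError: names are undefined
    return "values = [3, 5, 10]\nresult = sum(values) / len(values)"


def solve_with_retries(question, max_iters=3):
    """Generate and run code up to max_iters times, passing errors back each time."""
    error = None
    for attempt in range(1, max_iters + 1):
        source = fake_lm_generate(question, previous_error=error)
        namespace = {}
        try:
            exec(source, namespace)  # illustration only; not a sandbox
            return namespace["result"], attempt
        except Exception:
            error = traceback.format_exc()
    raise RuntimeError(f"No working code after {max_iters} attempts:\n{error}")


answer, attempts = solve_with_retries("What is the average of 3, 5 and 10?")
print(answer, attempts)  # 6.0 2
```

Each extra iteration costs another LM call, which is why a larger max_iters increases token usage.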

Using ProgramOfThought in a module

Wrap ProgramOfThought in a custom module to combine computation with other reasoning steps:

import dspy


class FinancialAnalyzer(dspy.Module):
    def __init__(self):
        super().__init__()
        self.compute = dspy.ProgramOfThought("scenario, question -> result: float")
        self.explain = dspy.ChainOfThought("scenario, question, result -> explanation")

    def forward(self, scenario, question):
        # Step 1: Compute the exact numerical answer
        computed = self.compute(scenario=scenario, question=question)

        # Step 2: Explain the result in plain language
        explained = self.explain(
            scenario=scenario,
            question=question,
            result=str(computed.result),
        )

        return dspy.Prediction(
            result=computed.result,
            explanation=explained.explanation,
        )


lm = dspy.LM("openai/gpt-4o-mini")  # or "anthropic/claude-sonnet-4-5-20250929", etc.
dspy.configure(lm=lm)

analyzer = FinancialAnalyzer()
result = analyzer(
    scenario="Revenue was $1.2M in Q1, $1.5M in Q2, $1.1M in Q3, $1.8M in Q4.",
    question="What is the year-over-year growth rate if last year's total was $4.8M?",
)
print(result.result)
print(result.explanation)

This pattern -- compute first, explain second -- gives you both precision and readability.
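For the revenue scenario above, the compute step is expected to produce something equivalent to the calculation below. This is a hand-written sketch of the arithmetic, not output from the module, and it assumes the growth rate should be expressed as a percentage.

```python
quarterly_revenue = [1.2, 1.5, 1.1, 1.8]  # millions of dollars, Q1 through Q4
last_year_total = 4.8                     # millions of dollars

this_year_total = sum(quarterly_revenue)
growth_pct = (this_year_total - last_year_total) / last_year_total * 100

result = round(growth_pct, 2)
print(result)  # 16.67
```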

Optimizing ProgramOfThought

ProgramOfThought modules work with DSPy optimizers just like any other module. Depending on the optimizer, this tunes the instructions, the few-shot examples, or both. BootstrapFewShot, used here, bootstraps few-shot demonstrations that guide code generation:

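def metric(example, prediction, trace=None):
    # Count a prediction as correct when it is within 0.01 of the labeled answer.
    try:
        return abs(float(prediction.answer) - float(example.answer)) < 0.01
    except (TypeError, ValueError):
        return False  # a non-numeric answer counts as a miss

# solver is the ProgramOfThought module from Basic usage; trainset is a list of dspy.Example objects
optimizer = dspy.BootstrapFewShot(metric=metric, max_bootstrapped_demos=4)
optimized_solver = optimizer.compile(solver, trainset=trainset)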

Optimization improves the quality of the generated code by showing the LM examples of good code-generation patterns.
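A tolerance-based metric like the one used here can be sanity-checked without an LM. In the sketch below, SimpleNamespace objects are hypothetical stand-ins for dspy.Example and dspy.Prediction, and the metric is restated so the snippet runs on its own. The float() coercion matters because an answer may arrive as text rather than as a number.

```python
from types import SimpleNamespace


def metric(example, prediction, trace=None):
    """Correct when the predicted answer is within 0.01 of the labeled answer."""
    return abs(float(prediction.answer) - float(example.answer)) < 0.01


# Stand-ins for dspy.Example / dspy.Prediction (assumption for this sketch).
example = SimpleNamespace(answer="38.88")
close = SimpleNamespace(answer=38.8800001)
off_by_ten_cents = SimpleNamespace(answer=38.98)

print(metric(example, close))             # True
print(metric(example, off_by_ten_cents))  # False
```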

Limitations

Requires Deno -- the sandbox uses Deno/Pyodide WASM. Install Deno first or the module crashes.
Sandboxed by default -- no host filesystem, network, or environment access. Use a custom PythonInterpreter with enable_read_paths, enable_network_access, etc. if needed.
Code generation cost -- generating code takes more tokens than a direct answer. For trivial arithmetic (2 + 2), ChainOfThought is faster and cheaper.
LM capability matters -- weaker models generate buggier code. Use a capable model (GPT-4o, Claude Sonnet, etc.) for complex computations.
First run is slow -- Deno downloads and caches Pyodide (~30s) on the first execution. Subsequent runs are fast.

ProgramOfThought vs ChainOfThought -- when to use which

| Scenario | Use | Why |
|----------|-----|-----|
| "What is 17% of $234.89?" | ProgramOfThought | Arithmetic -- code is exact |
| "Summarize this article" | ChainOfThought | No computation needed |
| "How many days between March 3 and November 17?" | ProgramOfThought | Date math -- code handles edge cases |
| "Classify this support ticket" | ChainOfThought | Qualitative judgment |
| "Given these 50 data points, what is the standard deviation?" | ProgramOfThought | Statistical computation |
| "Explain why this code has a bug" | ChainOfThought | Reasoning about code, not running code |
| "Sort these 20 items by priority score and return the top 5" | ProgramOfThought | Data manipulation |

Rule of thumb: if you would reach for a calculator or a spreadsheet, use ProgramOfThought.
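The date question from the comparison is a good example of why code helps: the month lengths are handled by the datetime module rather than counted by hand. A short sketch follows; the year 2025 is an assumption, since the question does not specify one.

```python
from datetime import date

start = date(2025, 3, 3)    # year assumed; the question gives none
end = date(2025, 11, 17)

days_between = (end - start).days
print(days_between)  # 259
```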

Gotchas

Claude forgets Deno is required. ProgramOfThought uses a Deno/Pyodide WASM sandbox — not a simple exec() call. Without Deno installed, the module crashes with a subprocess error. Always check deno --version or include Deno installation in setup instructions.
Claude uses ProgramOfThought for tasks that do not need computation. Classification, summarization, and extraction are qualitative — ProgramOfThought adds code-generation overhead with no benefit. Use ChainOfThought or Predict for non-computational tasks.
Claude sets max_iters=5 without justification. The default is 3, which handles most retry scenarios. Only increase if you have evidence that code generation is failing due to complex logic. Higher values burn more tokens on retries.
Claude ignores the interpreter parameter. For tasks that need file access, network access, or environment variables, pass a custom PythonInterpreter(enable_read_paths=[...], enable_network_access=[...]) instead of trying to work around sandbox restrictions.

Additional resources

dspy.ProgramOfThought API docs
Deno installation
For API details, see reference.md
For worked examples, see examples.md

Cross-references

Install any skill: npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill <name>

dspy.Predict for simple direct LM calls -- see /dspy-predict
dspy.ChainOfThought for natural language reasoning -- see /dspy-chain-of-thought
Building modules that combine ProgramOfThought with other steps -- see /dspy-modules
Reasoning patterns and when to add structured thinking -- see /ai-reasoning
Install /ai-do if you do not have it — it routes any AI problem to the right skill and is the fastest way to work: npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill ai-do

Use when the task requires precise computation, math, or data manipulation — the LM writes Python code that executes in a sandbox instead of reasoning in natural language. Common scenarios - math word problems, data manipulation tasks, precise calculations the LLM gets wrong in natural language, statistical analysis, or any task where writing and executing code gives better results than reasoning in text. Related - ai-reasoning, dspy-chain-of-thought, dspy-codeact. Also used for dspy.ProgramOfThought, LLM writes code to solve problem, code generation for computation, math with LLM via code, execute Python to get answer, when chain of thought gives wrong math, computation via code not text, precise calculations with LLM, data analysis by generating code, sandbox code execution, code-based reasoning, ProgramOfThought vs ChainOfThought, solve with code not words.

5 stars

development

Updated May 7, 2026

npx skillsauth add lebsral/dspy-programming-not-prompting-lms-skills dspy-program-of-thought

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy -- Container and dependency vulnerability scanner -- Clean (95%)
Semgrep -- Static code analysis for vulnerabilities -- Clean (95%)
mcp-scan (Snyk) -- Model Context Protocol security validation -- Clean (95%)
Snyk (dep) -- Open source security scanning -- Skipped (50%)
Socket.dev -- Supply chain security analysis -- Skipped (50%)
VirusTotal -- Multi-engine malware detection -- Skipped (50%)
CrowdStrike -- Advanced threat intelligence -- Skipped (50%)
OSV-Scanner -- Open Source Vulnerability database check -- Skipped (50%)
OWASP Dep-Check -- Skipped (50%)

Last scanned: May 7, 2026, 7:00 AM (134.7s, 4 files scanned)

Related Skills

lebsral/ai-watching-optimization

tools

Verified, Trusted, Community

See what is happening during optimizer.compile() instead of waiting blind. Use when you want to watch optimization progress, see scores as they come in, know if your optimizer is working, check if optimization is stuck, understand why optimization is taking too long, get live progress during compile, monitor convergence, detect overfitting during optimization, interpret optimization results, or pick the right tool for watching optimization. Also used for optimizer progress bar, is my optimizer doing anything, optimization seems stuck, how long will optimization take, watch GEPA run, watch MIPROv2 run, live optimization dashboard, optimizer not improving, scores not going up, optimization taking forever, see what optimizer is doing, debug slow optimization, optimization visibility, optimizer metrics, track compile progress, optimization observability.

6, SKILL.md, Updated May 31, 2026

lebsral/dspy-miprov2

testing

Verified, Trusted, Community

Use when you want the highest-quality prompt optimization DSPy offers — jointly optimizes instructions and few-shot demos, with auto=light/medium/heavy presets. Common scenarios - you want the best possible accuracy from prompt optimization, jointly tuning instructions and few-shot demonstrations, using auto presets for different compute budgets, or when COPRO or BootstrapFewShot alone are not reaching your accuracy target. Related - ai-improving-accuracy, dspy-copro, dspy-bootstrap-few-shot. Also used for dspy.MIPROv2, best DSPy optimizer, highest quality optimization, auto=light medium heavy, joint instruction and demo optimization, most powerful prompt optimizer, MIPROv2 vs COPRO vs BootstrapFewShot, which optimizer should I use, state of the art prompt optimization, when to use MIPROv2, optimize both instructions and examples, heavy optimization for production, best optimizer for accuracy.

6, SKILL.md, Updated Apr 27, 2026

lebsral/dspy-langwatch

testing

Verified, Trusted, Community

Use LangWatch for DSPy auto-tracing and real-time optimizer progress. Use when you want to set up LangWatch, langwatch.dspy.init, auto-tracing DSPy, real-time optimization dashboard, optimizer progress tracking, app.langwatch.ai, or DSPy optimizer dashboard. Also used for langwatch setup, pip install langwatch, langwatch trace, optimizer progress, real-time optimization, watch optimizer run, LangWatch self-hosted, langwatch docker, langwatch vs langtrace, langwatch autotrack_dspy.

6, SKILL.md, Updated Apr 27, 2026

lebsral/dspy-gepa

data-ai

Verified, Trusted, Community

Use when you want to optimize instructions without few-shot examples — a lightweight alternative to COPRO when you do not have or do not want to use demonstrations. Common scenarios - optimizing instructions when you do not have or do not want to use few-shot demonstrations, lightweight instruction search as a first step, tasks where examples in the prompt confuse the model, or when you want fast instruction optimization without the cost of COPRO. Related - ai-improving-accuracy, dspy-copro, dspy-miprov2. Also used for dspy.GEPA, instruction optimization without demos, lightweight prompt optimization, optimize instructions only, no few-shot examples needed, GEPA vs COPRO, quick instruction search, when demonstrations hurt performance, zero-shot optimization, instruction-only optimizer, simplest instruction tuner, fast prompt optimization, skip few-shot and just tune instructions, optimize Pydantic field descriptions, GEPA structured output, GEPA does not optimize field desc.

6, SKILL.md, Updated Apr 27, 2026

Install manually

# Clone the repo
git clone https://github.com/lebsral/dspy-programming-not-prompting-lms-skills.git

# Copy into Claude Code skills folder (global)
cp -r dspy-programming-not-prompting-lms-skills/skills/dspy-program-of-thought ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

lebsral/dspy-programming-not-prompting-lms-skills

5 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT