skills/dspy-utils/SKILL.md
Use when you need DSPy infrastructure - caching control, debugging with inspect_history, saving/loading optimized programs, or runtime validation with Refine/BestOfN. Common scenarios - controlling the cache to avoid stale results, debugging with inspect_history to see raw prompts, saving and loading optimized programs, or validating outputs with reward functions. For streaming see /dspy-streaming, for async see /dspy-async, for MCP see /dspy-mcp. Related - ai-tracing-requests, ai-serving-apis, ai-monitoring, dspy-streaming, dspy-async, dspy-mcp. Also used for dspy.inspect_history, dspy.settings.configure, cache control in DSPy, save and load DSPy program, debug DSPy prompts, see what DSPy sent to the model, DSPy program serialization, production DSPy utilities, clear DSPy cache, view prompt history.
Guide the user through DSPy's utility functions -- controlling caching, debugging calls, persisting optimized programs, and enforcing runtime constraints with reward functions.
Looking for streaming, async, or MCP? These have dedicated skills now:
- Streaming tokens to a UI -- see /dspy-streaming
- Async execution and FastAPI -- see /dspy-async
- MCP server integration -- see /dspy-mcp
Ask the user before diving in:
Then jump to the relevant section below.
DSPy caches LM responses by default to reduce costs and speed up development. Use dspy.configure_cache to control this globally.
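# Disable caching entirely (both the on-disk and the in-memory cache)
dspy.configure_cache(enable_disk_cache=False, enable_memory_cache=False)
# Re-enable caching
dspy.configure_cache(enable_disk_cache=True, enable_memory_cache=True)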
You can also control caching per LM instance:
# This LM never caches
lm_no_cache = dspy.LM("openai/gpt-4o-mini", cache=False)
# This LM caches (default)
lm_cached = dspy.LM("openai/gpt-4o-mini", cache=True)
By default, responses are cached in memory and on local disk. Identical calls (same prompt, parameters, model) return the cached result with no API call.
When NOT to disable caching: During optimization runs -- optimizers rely heavily on cache to avoid redundant LM calls. Disabling cache globally during optimization dramatically increases cost and time.
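The lookup rule above (same prompt, parameters, and model means a cache hit) can be pictured with a toy model. This is a minimal sketch, not DSPy's implementation: `ToyCache`, `fake_lm`, and the hashing scheme are hypothetical names chosen for illustration. It only shows why an identical request skips the call while a changed parameter triggers a new one.

```python
import hashlib
import json


class ToyCache:
    """Illustrative request cache keyed on model, prompt, and parameters."""

    def __init__(self):
        self._store = {}
        self.misses = 0  # how many times the underlying function actually ran

    def _key(self, model, prompt, **params):
        # Serialize deterministically so equal requests hash to the same key.
        payload = json.dumps(
            {"model": model, "prompt": prompt, "params": params}, sort_keys=True
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def call(self, fn, model, prompt, **params):
        key = self._key(model, prompt, **params)
        if key not in self._store:
            self.misses += 1
            self._store[key] = fn(model, prompt, **params)
        return self._store[key]


def fake_lm(model, prompt, **params):
    # Hypothetical stand-in for a real LM call.
    return f"{model} answered: {prompt.upper()}"


cache = ToyCache()
first = cache.call(fake_lm, "toy-model", "hello", temperature=0.0)
second = cache.call(fake_lm, "toy-model", "hello", temperature=0.0)  # cache hit
third = cache.call(fake_lm, "toy-model", "hello", temperature=0.7)   # new key
```

Here `fake_lm` runs twice for three requests: the second request reuses the first result, and the changed `temperature` produces a separate entry.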
dspy.inspect_history shows the raw prompts and responses from recent LM calls. This is the single most useful debugging tool in DSPy.
import dspy
lm = dspy.LM("openai/gpt-4o-mini") # or "anthropic/claude-sonnet-4-5-20250929", etc.
dspy.configure(lm=lm)
classify = dspy.Predict("text -> label")
classify(text="Great product!")
# See what was actually sent to and received from the LM
dspy.inspect_history(n=1) # Show last 1 call
dspy.inspect_history(n=3) # Show last 3 calls
Call dspy.inspect_history(n=1) to see the last LM call.
For more detailed tracing, configure DSPy with an empty trace list:
dspy.configure(lm=lm, trace=[])
You can also print a module to see its structure:
print(my_program) # Shows module tree with all sub-modules and signatures
After optimizing a DSPy program, save its learned state (few-shot demos, instructions) for production use.
# After optimization
optimized = optimizer.compile(my_program, trainset=trainset)
optimized.save("optimized_program.json")
# In production -- create a fresh instance, then load state
program = MyProgram()
program.load("optimized_program.json")
# Use it
result = program(question="What is DSPy?")
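- Saved: learned state (few-shot demos, instructions) of the dspy.Predict modules
- Not saved: forward() -- that's your code, it must exist at load time
- [...] BootstrapFinetune)
- Call dspy.configure() before loading

import dspy

class MyPipeline(dspy.Module):
    def __init__(self):
        super().__init__()
        # Two sub-modules whose demos and instructions are part of the saved state
        self.classify = dspy.Predict("text -> category")
        self.respond = dspy.ChainOfThought("text, category -> response")

    def forward(self, text):
        # Classify first, then condition the response on the predicted category
        cat = self.classify(text=text)
        return self.respond(text=text, category=cat.category)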
# --- Optimization (run once) ---
# optimizer = dspy.MIPROv2(metric=metric, auto="medium")
# optimized = optimizer.compile(MyPipeline(), trainset=trainset)
# optimized.save("pipeline_v1.json")
# --- Production (run on every request) ---
lm = dspy.LM("openai/gpt-4o-mini") # or "anthropic/claude-sonnet-4-5-20250929", etc.
dspy.configure(lm=lm)
pipeline = MyPipeline()
pipeline.load("pipeline_v1.json")
result = pipeline(text="How do I reset my password?")
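The split between saved state and unsaved logic can be illustrated without DSPy at all. The sketch below is an assumption-laden toy: `ToyProgram`, its JSON layout, and the file name are hypothetical and do not reflect DSPy's actual file format. It shows the general pattern of persisting learned state only and restoring it into a fresh instance whose `forward()` comes from the class definition.

```python
import json
import os
import tempfile


class ToyProgram:
    """Hypothetical stand-in for a module: logic in code, learned state in attributes."""

    def __init__(self):
        self.instructions = "Answer the question."
        self.demos = []

    def forward(self, question):
        # Logic lives in the class definition and is never written to disk.
        return f"{self.instructions} [{len(self.demos)} demos] Q: {question}"

    def save(self, path):
        # Persist learned state only.
        with open(path, "w", encoding="utf-8") as f:
            json.dump({"instructions": self.instructions, "demos": self.demos}, f)

    def load(self, path):
        with open(path, encoding="utf-8") as f:
            state = json.load(f)
        self.instructions = state["instructions"]
        self.demos = state["demos"]


# "Optimization": mutate the learned state, then persist it.
optimized = ToyProgram()
optimized.instructions = "Answer concisely and cite the docs."
optimized.demos = [{"question": "What is 2 + 2?", "answer": "4"}]

path = os.path.join(tempfile.mkdtemp(), "toy_state.json")
optimized.save(path)

# "Production": a fresh instance picks up the state; forward() comes from the class.
fresh = ToyProgram()
fresh.load(path)
output = fresh.forward("What is DSPy?")
```

If the class definition were missing or different in production, the file alone could not reproduce the behavior, which is why the class must exist at load time.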
Use dspy.Refine to wrap any module and retry until a reward function returns a score meeting a threshold. This replaced dspy.Assert/dspy.Suggest in DSPy 3.x:
import dspy
qa = dspy.ChainOfThought("question -> answer")
def answer_reward(args, pred):
    """Score answer quality. Returns 0.0-1.0."""
    if not pred.answer.strip():
        return 0.0
    if len(pred.answer.split()) < 5:
        return 0.5  # soft penalty for short answers
    return 1.0

validated_qa = dspy.Refine(
    module=qa,
    N=3,
    reward_fn=answer_reward,
    threshold=1.0,
)
result = validated_qa(question="What is DSPy?")
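Because a reward function is plain Python, it can be checked without any LM call. The sketch below reuses `answer_reward` from the example above and feeds it `types.SimpleNamespace` objects as stand-ins for predictions. That stand-in is an assumption made for illustration; a real run would pass the module's prediction object instead.

```python
from types import SimpleNamespace


def answer_reward(args, pred):
    """Score answer quality. Returns 0.0-1.0."""
    if not pred.answer.strip():
        return 0.0
    if len(pred.answer.split()) < 5:
        return 0.5  # soft penalty for short answers
    return 1.0


# Hypothetical inputs and predictions covering each branch of the reward.
args = {"question": "What is DSPy?"}
empty = SimpleNamespace(answer="   ")
short = SimpleNamespace(answer="A framework.")
full = SimpleNamespace(answer="DSPy is a framework for programming language models.")

scores = [answer_reward(args, p) for p in (empty, short, full)]
print(scores)  # [0.0, 0.5, 1.0]
```

With threshold=1.0, only the last of these three answers would be accepted without a retry.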
- dspy.Refine -- retries with feedback from the reward function until threshold is met or N attempts exhausted. Use when later attempts can improve based on earlier failures.
- dspy.BestOfN -- runs N independent attempts and returns the best-scoring one. Use when attempts are independent and cross-attempt feedback would not help.

For detailed patterns and examples, see /dspy-refine and /dspy-best-of-n.
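The difference between the two strategies comes down to whether one attempt can see how the previous one scored. The following is a simplified pure-Python sketch of that control flow, not DSPy's implementation: `run_attempts`, the attempt functions, and the feedback string are all hypothetical names invented for illustration.

```python
def run_attempts(attempt_fn, reward_fn, n, threshold, use_feedback=False):
    """Return (best_output, best_score, attempts_used).

    use_feedback=False mimics independent sampling (BestOfN-style);
    use_feedback=True passes a note about the previous score into the
    next attempt (Refine-style). Simplified illustration only.
    """
    best_output, best_score = None, float("-inf")
    feedback = None
    for i in range(n):
        output = attempt_fn(i, feedback)
        score = reward_fn(output)
        if score > best_score:
            best_output, best_score = output, score
        if score >= threshold:
            return best_output, best_score, i + 1  # stop early once good enough
        if use_feedback:
            feedback = f"Previous answer scored {score}; make it longer."
    return best_output, best_score, n


def word_count_reward(output):
    # 1.0 for answers with at least five words, 0.5 otherwise.
    return 1.0 if len(output.split()) >= 5 else 0.5


def independent_attempt(i, feedback):
    # Ignores feedback: each attempt is a fixed, unrelated sample.
    samples = ["Too short.", "Still short.", "Also brief."]
    return samples[i]


def feedback_aware_attempt(i, feedback):
    # Produces a longer answer once told that the last one fell short.
    if feedback is None:
        return "Too short."
    return "DSPy is a framework for programming language models."


best_of_n = run_attempts(independent_attempt, word_count_reward, n=3, threshold=1.0)
refined = run_attempts(
    feedback_aware_attempt, word_count_reward, n=3, threshold=1.0, use_feedback=True
)
```

In this toy run the independent sampler uses all three attempts and settles for its best score of 0.5, while the feedback-aware sampler reaches the threshold on its second attempt.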
- save() does not persist forward() logic -- only learned state (demos, instructions) is saved. The class definition must exist in your production code at load time.
- dspy.configure() before load() -- loading a saved program before configuring the LM causes silent failures where the program runs but uses no LM (or the wrong one).
- inspect_history shows cached calls too -- after a cache hit, inspect_history still shows the call, but the prompt may look different from what was originally sent. Disable cache if you need exact prompt inspection.

Install any skill:
npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill <name>
- /dspy-streaming
- /dspy-async
- /dspy-mcp
- /dspy-lm -- Configure language models, per-LM caching, inspect_history on LM instances
- /dspy-modules -- Build composable programs with dspy.Module, save/load patterns
- /ai-tracing-requests -- Production observability and tracing for DSPy programs
- /dspy-refine -- Refine patterns, reward functions, and iterative improvement
- /dspy-best-of-n -- BestOfN for independent sampling without cross-attempt feedback
- /ai-serving-apis -- Serve DSPy programs as web APIs
- Install /ai-do if you do not have it -- it routes any AI problem to the right skill and is the fastest way to work:

npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill ai-do