skills/dspy-production-deployment/SKILL.md
This skill should be used when the user asks to "deploy DSPy", "save and load a DSPy program", "configure DSPy cache", "harden pickle cache", "track DSPy token usage", "run DSPy asynchronously", "stream DSPy output", mentions `configure_cache`, `restrict_pickle`, `track_usage`, `acall`, `asyncify`, `streamify`, `StreamListener`, MLflow deployment, or needs production runtime guidance for a DSPy application.
Prepare a DSPy program for repeatable, observable, scalable, and safer production execution.
DSPy enables memory and disk caches by default. Disk cache deserialization uses pickle unless restricted. Enable the allowlist mode in production:
import dspy
dspy.configure_cache(restrict_pickle=True)
Register trusted custom cache types only when needed:
dspy.configure_cache(
    restrict_pickle=True,
    safe_types=[MyResult, Metadata],
)
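To make the allowlist idea concrete, here is a minimal standard-library sketch of restricted unpickling. It is not DSPy's implementation; `AllowlistUnpickler` and `safe_loads` are hypothetical names, and `Fraction` merely stands in for a trusted custom type such as `MyResult`.

```python
import io
import pickle
from collections import OrderedDict
from fractions import Fraction


class AllowlistUnpickler(pickle.Unpickler):
    """Unpickler that resolves only classes named in an explicit allowlist."""

    def __init__(self, file, allowed):
        super().__init__(file)
        # Map (module, qualified name) -> class for every trusted type.
        self._allowed = {(c.__module__, c.__qualname__): c for c in allowed}

    def find_class(self, module, name):
        # Called whenever the pickle stream references a global (class or function).
        try:
            return self._allowed[(module, name)]
        except KeyError:
            raise pickle.UnpicklingError(f"blocked: {module}.{name}") from None


def safe_loads(data, allowed):
    """Deserialize `data`, refusing any type outside `allowed`."""
    return AllowlistUnpickler(io.BytesIO(data), allowed).load()


trusted = pickle.dumps(Fraction(1, 3))        # stand-in for a trusted type
untrusted = pickle.dumps(OrderedDict(a=1))    # not on the allowlist

restored = safe_loads(trusted, [Fraction])
try:
    safe_loads(untrusted, [Fraction])
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

Plain containers and scalars (dicts, lists, strings, numbers) never reach `find_class`, which is why only custom types need to be registered in a scheme like this.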
Disable a cache layer explicitly when a deployment cannot persist data or requires fresh model responses:
dspy.configure_cache(
    enable_disk_cache=False,
    enable_memory_cache=True,
)
Prefer state-only JSON for readable, safer artifacts:
compiled.save("./artifacts/program.json", save_program=False)
loaded = MyProgram()
loaded.load("./artifacts/program.json")
Use whole-program save only for trusted artifacts. It uses cloudpickle:
compiled.save("./artifacts/program/", save_program=True)
loaded = dspy.load("./artifacts/program/")
Keep the DSPy major version compatible when loading saved programs.
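One way to enforce the version rule at load time is to record the DSPy version beside the artifact and compare major versions before loading. The sketch below is an assumption-laden illustration: the `versions.json` sidecar and the helper names are hypothetical, and the version strings are made up. In a real deployment the running version could plausibly come from `importlib.metadata.version("dspy")`, assuming that is the installed distribution name.

```python
import json
import tempfile
from pathlib import Path

SIDECAR_NAME = "versions.json"  # hypothetical file name, not a DSPy convention


def major(version: str) -> int:
    """Return the leading integer of a dotted version string."""
    return int(version.split(".")[0])


def write_sidecar(artifact_dir: Path, dspy_version: str) -> None:
    """Record the DSPy version used at save time next to the artifact."""
    (artifact_dir / SIDECAR_NAME).write_text(json.dumps({"dspy": dspy_version}))


def check_sidecar(artifact_dir: Path, running_version: str) -> str:
    """Raise if the recorded major version differs from the running one."""
    recorded = json.loads((artifact_dir / SIDECAR_NAME).read_text())["dspy"]
    if major(recorded) != major(running_version):
        raise RuntimeError(
            f"artifact saved with DSPy {recorded}, running {running_version}"
        )
    return recorded


with tempfile.TemporaryDirectory() as tmp:
    artifact_dir = Path(tmp)
    write_sidecar(artifact_dir, "3.0.1")  # illustrative version strings
    same_major = check_sidecar(artifact_dir, "3.1.0")
    try:
        check_sidecar(artifact_dir, "4.0.0")
        mismatch_detected = False
    except RuntimeError:
        mismatch_detected = True
```

Failing fast before `load()` keeps a version mismatch from surfacing later as a confusing deserialization error.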
Enable usage tracking to record token counts for each call:
dspy.configure(
    lm=dspy.LM("openai/gpt-4o-mini"),
    track_usage=True,
)
prediction = program(question="What is DSPy?")
print(prediction.get_lm_usage())
Calls served from the cache report no new token usage, so tracked totals cover only requests that actually reached the model.
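For cost reporting it can help to sum usage across many predictions. The sketch below assumes `get_lm_usage()` returns a mapping from model name to a dict of token counts, and that a cached call yields an empty mapping. Both shapes are assumptions to verify against your DSPy version; `merge_usage` is a hypothetical helper and the sample numbers are invented.

```python
from collections import defaultdict


def merge_usage(usages):
    """Sum numeric token counts per model across several usage mappings."""
    totals = defaultdict(lambda: defaultdict(int))
    for usage in usages:
        for model, counts in usage.items():
            for key, value in counts.items():
                # Skip nested detail dicts and other non-numeric entries.
                if isinstance(value, (int, float)) and not isinstance(value, bool):
                    totals[model][key] += value
    return {model: dict(counts) for model, counts in totals.items()}


# Invented sample data in the assumed shape; the empty dict models a cache hit.
samples = [
    {"openai/gpt-4o-mini": {"prompt_tokens": 260, "completion_tokens": 61, "total_tokens": 321}},
    {},
    {"openai/gpt-4o-mini": {"prompt_tokens": 120, "completion_tokens": 30, "total_tokens": 150}},
]

totals = merge_usage(samples)
```

In practice the input would be something like `[p.get_lm_usage() for p in predictions]`.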
Most built-in modules support acall():
import asyncio

async def main():
    prediction = await program.acall(question="What is DSPy?")
    print(prediction.answer)

asyncio.run(main())
Implement aforward() for custom async modules. Use dspy.asyncify(program) only when adapting a synchronous callable is the right boundary.
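When many requests run through `acall()` at once, a concurrency cap helps stay under provider rate limits. The following is a minimal sketch using only `asyncio`; `run_bounded` is a hypothetical helper and `fake_acall` is a stand-in for `program.acall`, so no model is contacted.

```python
import asyncio


async def run_bounded(acall, inputs, limit=4):
    """Await `acall(**kwargs)` for each input with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def one(kwargs):
        async with semaphore:
            return await acall(**kwargs)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(one(kwargs) for kwargs in inputs))


async def fake_acall(question):
    # Stand-in for `program.acall`; yields control once, then returns.
    await asyncio.sleep(0)
    return f"answer to: {question}"


results = asyncio.run(
    run_bounded(fake_acall, [{"question": "a"}, {"question": "b"}], limit=1)
)
```

Swapping `fake_acall` for `program.acall` would be the intended use; the right `limit` depends on the provider's quotas.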
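Wrap a module with dspy.streamify and attach a StreamListener for each output field to stream:
import asyncio
import dspy

stream_program = dspy.streamify(
    dspy.Predict("question -> answer"),
    stream_listeners=[
        dspy.streaming.StreamListener(signature_field_name="answer"),
    ],
)

async def main():
    # Yields incremental chunks for the listened field, then the final Prediction.
    async for chunk in stream_program(question="Explain DSPy briefly."):
        print(chunk)

asyncio.run(main())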
For looped modules such as ReAct, set allow_reuse=True on listeners for repeated fields. Cache hits yield the final Prediction without replaying token chunks.
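A consumer usually needs to tell token chunks apart from the final result, and to cope with the cache-hit case where no chunks arrive. The sketch below uses stand-in classes rather than DSPy's own types. `FakeChunk`, `FakePrediction`, and `consume` are hypothetical, and the `chunk` attribute name is an assumption to check against the streaming objects your DSPy version emits.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeChunk:
    # Stand-in for a streamed token chunk; attribute names are assumptions.
    signature_field_name: str
    chunk: str


@dataclass
class FakePrediction:
    # Stand-in for the final prediction object.
    answer: str


async def fake_stream(cached=False):
    """Yield chunks then a final result, or only the result on a cache hit."""
    if not cached:
        for piece in ("DSPy ", "is ", "a framework."):
            yield FakeChunk("answer", piece)
    yield FakePrediction(answer="DSPy is a framework.")


async def consume(stream, final_type):
    """Collect streamed text and the final object from an async stream."""
    pieces, final = [], None
    async for item in stream:
        if isinstance(item, final_type):
            final = item
        else:
            pieces.append(item.chunk)
    return "".join(pieces), final


live_text, live_final = asyncio.run(consume(fake_stream(), FakePrediction))
hit_text, hit_final = asyncio.run(consume(fake_stream(cached=True), FakePrediction))
```

Because a cache hit produces no chunks, user-facing code should fall back to the final object's field instead of assuming streamed text is present.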