skills/dspy-adapters-multimodal/SKILL.md
This skill should be used when the user asks to "choose a DSPy adapter", "use JSONAdapter", "use XMLAdapter", "enable native function calling", "send images, audio, or files to DSPy", mentions `dspy.ChatAdapter`, `dspy.JSONAdapter`, `dspy.XMLAdapter`, `dspy.Image`, `dspy.Audio`, `dspy.File`, structured outputs, or multimodal DSPy signatures.
Choose an adapter deliberately and model image, audio, and file inputs with DSPy's typed primitives.
| Adapter | Use it for |
|---------|------------|
| dspy.ChatAdapter() | Default, human-readable field markers, broad model compatibility |
| dspy.JSONAdapter() | Structured JSON output and native function calling where supported |
| dspy.XMLAdapter() | XML-tagged fields when XML is easier for the target LM to follow |
| dspy.TwoStepAdapter() | A separate extraction pass when parsing needs extra help |
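To make the difference between the first three rows concrete, here is a rough sketch of what each output style looks like and how it could be parsed by hand. The sample completions are hand-written, the parser names are hypothetical, and the marker shapes (`[[ ## field ## ]]`, a JSON object, `<field>` tags) are assumptions about how the adapters lay out fields. This is not DSPy's own parsing code, so check the prompts your DSPy version actually produces.

```python
import json
import re

# Hand-written sample completions, one per adapter style (not real LM output).
CHAT_STYLE = "[[ ## answer ## ]]\nParis\n\n[[ ## completed ## ]]"
JSON_STYLE = '{"answer": "Paris"}'
XML_STYLE = "<answer>Paris</answer>"


def parse_chat(text: str) -> dict:
    """Map each [[ ## name ## ]] marker to the text that follows it."""
    # With one capture group, re.split yields [prefix, name1, body1, name2, body2, ...].
    parts = re.split(r"\[\[ ## (\w+) ## \]\]", text)
    fields = {name: body.strip() for name, body in zip(parts[1::2], parts[2::2])}
    fields.pop("completed", None)  # assumed end-of-output sentinel, not a real field
    return fields


def parse_json(text: str) -> dict:
    """Read the output fields from a single JSON object."""
    return json.loads(text)


def parse_xml(text: str) -> dict:
    """Map each <name>...</name> pair to its inner text."""
    # The back-reference \1 forces the closing tag to match the opening tag.
    pattern = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)
    return {m.group(1): m.group(2).strip() for m in pattern.finditer(text)}
```

All three parsers recover the same `answer` field here. The practical difference is which layout the target LM reproduces most reliably.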
Configure globally or for a limited scope:
import dspy
dspy.configure(
    lm=dspy.LM("openai/gpt-4o-mini"),
    adapter=dspy.JSONAdapter(),
)
with dspy.context(adapter=dspy.XMLAdapter()):
    result = dspy.Predict("question -> answer")(question="What is DSPy?")
JSONAdapter enables native function calling by default. ChatAdapter keeps text parsing by default. Override either behavior explicitly:
chat_native = dspy.ChatAdapter(use_native_function_calling=True)
json_manual = dspy.JSONAdapter(use_native_function_calling=False)
DSPy falls back to manual parsing when the configured LM does not support native function calling.
class DescribeImage(dspy.Signature):
    image: dspy.Image = dspy.InputField()
    description: str = dspy.OutputField()
describe = dspy.Predict(DescribeImage)
result = describe(image=dspy.Image("./diagram.png"))
Pass a local path, HTTP URL, bytes, PIL image, or existing data URI directly to dspy.Image(...).
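For readers unfamiliar with the last input form, a data URI is a string of the shape `data:<mime>;base64,<payload>`. The sketch below builds one with the standard library only. `to_data_uri` is a hypothetical helper written for illustration. It is not part of DSPy and does not show how DSPy encodes images internally.

```python
import base64
import mimetypes


def to_data_uri(data: bytes, filename: str) -> str:
    """Encode raw bytes as a base64 data URI, guessing the MIME type from the file name."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        raise ValueError(f"cannot guess a MIME type for {filename!r}")
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

A string produced this way from real image bytes is the kind of "existing data URI" the sentence above refers to.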
class SummarizeAudio(dspy.Signature):
    audio: dspy.Audio = dspy.InputField()
    summary: str = dspy.OutputField()
audio = dspy.Audio.from_file("./meeting.wav")
summary = dspy.Predict(SummarizeAudio)(audio=audio)
class SummarizeFile(dspy.Signature):
    file: dspy.File = dspy.InputField()
    summary: str = dspy.OutputField()
document = dspy.File.from_path("./research.pdf")
summary = dspy.Predict(SummarizeFile)(file=document)
Provider capabilities vary. Verify that the selected model accepts the media type before deployment.
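One way to act on that advice is a small pre-flight check that runs before any media is sent. The sketch below is only an illustration. The `SUPPORTED_MEDIA` table, the model names in it, and `media_supported` are all hypothetical, and the real capabilities must come from the provider's documentation.

```python
import mimetypes

# Hypothetical capability table: model name -> accepted MIME types or top-level kinds.
# Placeholder entries only; fill this in from the provider's documentation.
SUPPORTED_MEDIA = {
    "example/vision-model": {"image"},
    "example/omni-model": {"image", "audio", "application/pdf"},
}


def media_supported(model: str, path: str) -> bool:
    """Return True if the table says `model` accepts the media type guessed for `path`."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        return False  # unknown extension: treat as unsupported
    accepted = SUPPORTED_MEDIA.get(model, set())
    # Match either the full MIME type or its top-level kind (e.g. "image").
    return mime in accepted or mime.split("/")[0] in accepted
```

Running such a check in tests or at startup surfaces an unsupported media type as an explicit failure instead of a provider error at request time.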
Default to ChatAdapter; switch only for a measured reason.
Skip the Image.from_file() and Image.from_url() helpers; call dspy.Image(...) instead.