skills/dspy-phoenix/SKILL.md
Use Arize Phoenix for DSPy tracing and evaluation. Use when you want to set up Phoenix, arize-phoenix, openinference, DSPyInstrumentor, open-source trace viewer, localhost:6006, or LLM evals. Also used for phoenix setup, arize phoenix, pip install arize-phoenix, phoenix local UI, phoenix evaluations, DSPy trace viewer, open-source LLM observability, phoenix vs langtrace, openinference-instrumentation-dspy, phoenix.otel register.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):

```bash
npx skillsauth add lebsral/dspy-programming-not-prompting-lms-skills dspy-phoenix
```
Guide the user through setting up Arize Phoenix for DSPy tracing, visualization, and evaluation.
Phoenix is an open-source LLM observability platform that runs locally or in the cloud. It provides a trace viewer, evaluation tools, and dataset management — all with DSPy auto-instrumentation via the OpenInference plugin.
`px.launch_app()` starts a UI at http://localhost:6006; no account is needed.

| Component | Details captured |
|-----------|-----------------|
| LM calls | Prompts, responses, token counts, latency |
| Retrievals | Queries, passages, relevance scores |
| Module executions | Input/output per module step |
| Full pipeline | Nested spans showing the complete call tree |
Use Phoenix when:
Do NOT use Phoenix when:
Alternatives: /dspy-langtrace, /dspy-weave, /dspy-mlflow

```bash
pip install arize-phoenix openinference-instrumentation-dspy openinference-instrumentation-litellm
```
DSPy uses LiteLLM under the hood — install both instrumentors to get token counts and cost tracking.
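A missing LiteLLM instrumentor fails silently. Traces still appear, but without token counts. A quick import check can catch this before you start tracing.

Below is a minimal standard-library sketch. The helper name `missing_instrumentors` is hypothetical. The module paths assume the `openinference.instrumentation.<name>` layout these packages install.

```python
import importlib.util

# pip package name -> importable module path (assumed OpenInference layout)
REQUIRED_INSTRUMENTORS = {
    "openinference-instrumentation-dspy": "openinference.instrumentation.dspy",
    "openinference-instrumentation-litellm": "openinference.instrumentation.litellm",
}


def missing_instrumentors(required=REQUIRED_INSTRUMENTORS):
    """Return the pip names of instrumentors whose modules cannot be found."""
    missing = []
    for pip_name, module_path in required.items():
        try:
            found = importlib.util.find_spec(module_path) is not None
        except ImportError:
            # find_spec imports parent packages; a missing parent raises instead of returning None
            found = False
        if not found:
            missing.append(pip_name)
    return missing


if __name__ == "__main__":
    missing = missing_instrumentors()
    if missing:
        print("Install with: pip install " + " ".join(missing))
    else:
        print("DSPy and LiteLLM instrumentors are both installed.")
```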
```python
import phoenix as px
from phoenix.otel import register

# Launch local UI at http://localhost:6006
px.launch_app()

# Register with auto-instrumentation (instruments DSPy + LiteLLM automatically)
tracer_provider = register(
    project_name="my-dspy-project",
    auto_instrument=True,
)

# All DSPy calls are now traced
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # or any LiteLLM-supported provider
program = dspy.ChainOfThought("question -> answer")
result = program(question="What is DSPy?")
# View traces at http://localhost:6006
```
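If the UI at localhost:6006 shows no traces, first confirm that something is actually listening on that port. Below is a minimal standard-library sketch. The helper name `collector_is_listening` is hypothetical. It only checks that the TCP port accepts connections, not that the service behind it is Phoenix.

```python
import socket


def collector_is_listening(host="localhost", port=6006, timeout=0.5):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host could not be resolved
        return False


if __name__ == "__main__":
    if collector_is_listening():
        print("Something is listening on localhost:6006")
    else:
        print("Nothing on localhost:6006: call px.launch_app() before register()")
```

Run it right after `px.launch_app()`. A `False` result in local mode means `register()` has nowhere to send spans.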
For teams that want persistent storage and collaboration:
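```python
import os
from phoenix.otel import register

# register() reads these variables, so set them before calling it
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "https://app.phoenix.arize.com"
os.environ["PHOENIX_API_KEY"] = "your-api-key"  # placeholder: load from a secret store in real code

tracer_provider = register(
    project_name="my-dspy-project",
    auto_instrument=True,
)
```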
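Rather than hardcoding the key in source, you can choose cloud or local mode from the environment. Below is a minimal sketch. The `phoenix_mode` helper is hypothetical. It assumes that a set `PHOENIX_API_KEY` means you want Phoenix Cloud at the endpoint shown in the cloud example.

```python
import os

PHOENIX_CLOUD_ENDPOINT = "https://app.phoenix.arize.com"  # endpoint from the cloud example


def phoenix_mode(env=None):
    """Return "cloud" if PHOENIX_API_KEY is set, else "local".

    In cloud mode, fills in PHOENIX_COLLECTOR_ENDPOINT only if it is not
    already set, so an explicitly configured endpoint always wins.
    """
    env = os.environ if env is None else env
    if env.get("PHOENIX_API_KEY"):
        env.setdefault("PHOENIX_COLLECTOR_ENDPOINT", PHOENIX_CLOUD_ENDPOINT)
        return "cloud"
    return "local"
```

Call `phoenix_mode()` before `register()`. When it returns `"local"`, call `px.launch_app()` first, as in the local setup.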
Use `using_attributes` (from the `openinference.instrumentation` package) to attach session, user, and tag metadata:

```python
from openinference.instrumentation import using_attributes

# `program` is the DSPy module from the local setup example
with using_attributes(
    session_id="session-123",
    user_id="user-456",
    metadata={"environment": "staging"},
    tags=["experiment-v2"],
):
    result = program(question="What is DSPy?")
    # This trace will carry the session/user/tag metadata in Phoenix
```
```python
import phoenix as px
from phoenix.otel import register

px.launch_app()
register(project_name="rag-pipeline", auto_instrument=True)

import dspy

# dspy.Retrieve queries the retrieval model set via dspy.configure(rm=...);
# add one alongside lm, or the retrieve step has nothing to query
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # or any LiteLLM-supported provider

class RAGPipeline(dspy.Module):
    def __init__(self):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=3)
        self.answer = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        context = self.retrieve(question).passages
        return self.answer(context=context, question=question)

pipeline = RAGPipeline()
result = pipeline(question="How do refunds work?")
# Open http://localhost:6006 to see the trace tree:
# RAGPipeline
# +-- Retrieve (query, passages, latency)
# +-- ChainOfThought (prompt, response, tokens)
```
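The trace tree is built from parent/child links between spans: each span records the ID of the span that called it. Below is a toy sketch of that idea in plain Python. The `(span_id, parent_id, name)` tuples are hypothetical stand-ins for exported span data, not a Phoenix API. The output format only mimics the tree drawn above.

```python
from collections import defaultdict


def render_span_tree(spans):
    """Render (span_id, parent_id, name) tuples as an indented tree.

    Spans whose parent_id is None are roots; siblings keep their input order.
    """
    children = defaultdict(list)
    for span_id, parent_id, name in spans:
        children[parent_id].append((span_id, name))

    lines = []

    def walk(parent_id, depth):
        for span_id, name in children[parent_id]:
            prefix = "" if depth == 0 else "    " * (depth - 1) + "+-- "
            lines.append(prefix + name)
            walk(span_id, depth + 1)

    walk(None, 0)
    return lines


spans = [
    ("s1", None, "RAGPipeline"),
    ("s2", "s1", "Retrieve"),
    ("s3", "s1", "ChainOfThought"),
]
print("\n".join(render_span_tree(spans)))
```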
Phoenix includes a built-in evals module for scoring LM outputs:
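```python
import phoenix as px
from phoenix.evals import OpenAIModel, llm_classify

# Fetch root spans (one per top-level program call) from the project traced earlier
spans = px.Client().get_spans_dataframe(project_name="my-dspy-project", root_spans_only=True)
# Template variables must match column names, so expose the span output as "output"
eval_df = spans.rename(columns={"attributes.output.value": "output"})

eval_model = OpenAIModel(model="gpt-4o-mini")

# Score traces against criteria; the judge must answer with one of the rails
eval_results = llm_classify(
    dataframe=eval_df,
    model=eval_model,
    template="Is this response helpful and accurate? Answer 'helpful' or 'not helpful'.\n\nResponse: {output}",
    rails=["helpful", "not helpful"],
)
```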
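The `rails` argument restricts the judge to a fixed label set, so the raw LLM text has to be mapped onto exactly one label. Here one rail ("helpful") is a substring of the other ("not helpful"), which makes a naive substring check ambiguous.

Below is a simplified sketch of one way to do the mapping: match the longest rail first and remove each match before checking shorter rails. It illustrates the idea and is not Phoenix's own implementation. The `NOT_PARSABLE` fallback label is chosen for this sketch.

```python
NOT_PARSABLE = "NOT_PARSABLE"  # fallback label chosen for this sketch


def snap_to_rail(raw_output, rails):
    """Map free-form judge output onto exactly one (lowercased) rail, or NOT_PARSABLE.

    Rails are matched case-insensitively, longest first; each match is removed
    from the text so "not helpful" is not also counted as "helpful".
    """
    text = raw_output.lower()
    found = []
    for rail in sorted({r.lower() for r in rails}, key=len, reverse=True):
        if rail in text:
            found.append(rail)
            text = text.replace(rail, "")
    return found[0] if len(found) == 1 else NOT_PARSABLE


rails = ["helpful", "not helpful"]
print(snap_to_rail("Not helpful.", rails))
print(snap_to_rail("I'm unsure", rails))
```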
This is useful for:
| Feature | Arize Phoenix | Langtrace | Jaeger |
|---------|---------------|-----------|--------|
| DSPy auto-instrumentation | Yes (plugin) | Yes (built-in) | Manual |
| Setup effort | Two lines + launch | One line | Docker + manual spans |
| Local mode (no cloud) | Yes (px.launch_app()) | Yes (Docker) | Yes (Docker) |
| Cloud option | Yes (Arize platform) | Yes (app.langtrace.ai) | No |
| Built-in evals | Yes (evals module) | Basic | No |
| Dataset management | Yes | No | No |
| LM call details | Prompts, tokens, latency | Prompts, tokens, cost | Custom attributes |
| Best for | Teams wanting evals + traces | DSPy-first teams | Teams already using Jaeger |
```text
Want DSPy tracing?
|
+- Need built-in evals + dataset management? -> Arize Phoenix
+- Want easiest one-line setup? -> Langtrace (/dspy-langtrace)
+- Team already uses W&B? -> W&B Weave (/dspy-weave)
+- Need full ML lifecycle (registry, deploy)? -> MLflow (/dspy-mlflow)
+- Team already uses Jaeger? -> Jaeger (see /ai-tracing-requests)
```
- `openinference-instrumentation-dspy` but forgets `openinference-instrumentation-litellm`. Without it, traces show LM calls but token counts and costs are missing. Always install both.
- `DSPyInstrumentor().instrument()` pattern instead of `register(auto_instrument=True)`. The `register` function from `phoenix.otel` is the current recommended approach: it auto-discovers and instruments all installed OpenInference packages. Manual `DSPyInstrumentor().instrument()` still works but misses LiteLLM spans.
- `px.launch_app()` before `register()` in local mode. Without `px.launch_app()`, there is no local collector to receive traces. Call `px.launch_app()` first, then `register()`. In cloud mode, set `PHOENIX_COLLECTOR_ENDPOINT` instead.
- `using_attributes`, all traces look identical in the UI. Wrap DSPy calls in `using_attributes(session_id=..., user_id=..., tags=[...])` to make traces filterable and attributable.

Install any skill:
npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill <name>
Related skills: /dspy-langtrace, /dspy-weave, /dspy-mlflow, /ai-monitoring, /ai-tracing-requests

Install /ai-do if you do not have it; it routes any AI problem to the right skill and is the fastest way to work:

```bash
npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill ai-do
```