Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

lebsral/dspy-phoenix

Name: dspy-phoenix
Author: lebsral

skills/dspy-phoenix/SKILL.md

npx skillsauth add lebsral/dspy-programming-not-prompting-lms-skills dspy-phoenix

Arize Phoenix — Open-Source LLM Observability for DSPy

Guide the user through setting up Arize Phoenix for DSPy tracing, visualization, and evaluation.

What is Arize Phoenix

Phoenix is an open-source LLM observability platform that runs locally or in the cloud. It provides a trace viewer, evaluation tools, and dataset management — all with DSPy auto-instrumentation via the OpenInference plugin.

Local mode: px.launch_app() starts a UI at http://localhost:6006 — no account needed
Cloud mode: Hosted on the Arize platform
Open source: github.com/Arize-ai/phoenix

What gets traced

| Component | Details captured |
|-----------|-----------------|
| LM calls | Prompts, responses, token counts, latency |
| Retrievals | Queries, passages, relevance scores |
| Module executions | Input/output per module step |
| Full pipeline | Nested spans showing the complete call tree |

When to use Phoenix

Use Phoenix when:

You want a local trace viewer with no cloud dependency
You need built-in evaluation tools (evals module)
You want an open-source solution you can self-host
You want to visually inspect what your DSPy pipeline is doing

Do NOT use Phoenix when:

You want the absolute easiest one-line setup — see /dspy-langtrace
Your team already uses W&B — see /dspy-weave
You need the full ML lifecycle (model registry, deployment) — see /dspy-mlflow

Setup

Install

pip install arize-phoenix openinference-instrumentation-dspy openinference-instrumentation-litellm

DSPy uses LiteLLM under the hood — install both instrumentors to get token counts and cost tracking.
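
One way to confirm that all three distributions from the install command are present is to query package metadata before wiring up tracing. This is a minimal sketch using only the standard library; `missing_packages` and `REQUIRED` are hypothetical names for illustration, not part of Phoenix or OpenInference.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the pip install command above.
REQUIRED = (
    "arize-phoenix",
    "openinference-instrumentation-dspy",
    "openinference-instrumentation-litellm",
)


def missing_packages(names=REQUIRED):
    """Return the distributions from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            version(name)
        except PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_packages()
    if absent:
        print("Missing:", ", ".join(absent))
    else:
        print("All Phoenix tracing packages are installed.")
```

If the LiteLLM instrumentor shows up in the missing list, traces may still appear but without token counts, which is the first gotcha described later in this guide.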

Local mode (recommended for development)

import phoenix as px
from phoenix.otel import register

# Launch local UI at http://localhost:6006
px.launch_app()

# Register with auto-instrumentation (instruments DSPy + LiteLLM automatically)
tracer_provider = register(
    project_name="my-dspy-project",
    auto_instrument=True,
)

# All DSPy calls are now traced
import dspy
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # or any LiteLLM-supported provider

program = dspy.ChainOfThought("question -> answer")
result = program(question="What is DSPy?")
# View traces at http://localhost:6006

Cloud mode (Arize platform)

For teams that want persistent storage and collaboration:

import os
from phoenix.otel import register

os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "https://app.phoenix.arize.com"
os.environ["PHOENIX_API_KEY"] = "your-api-key"  # placeholder: load the real key from a secret store, not source code

tracer_provider = register(
    project_name="my-dspy-project",
    auto_instrument=True,
)
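
Rather than hardcoding the key, a small guard can validate the environment before `register()` runs. The sketch below assumes the two variable names shown above; `prepare_cloud_env` is a hypothetical helper, and the placeholder check is an illustrative convention rather than anything Phoenix enforces.

```python
import os

DEFAULT_ENDPOINT = "https://app.phoenix.arize.com"  # endpoint shown above


def prepare_cloud_env(env=None, endpoint=DEFAULT_ENDPOINT):
    """Validate cloud settings before register() is called.

    Hypothetical helper: fails fast if the API key is absent and fills in
    the collector endpoint only when the caller has not set one.
    """
    env = os.environ if env is None else env
    api_key = env.get("PHOENIX_API_KEY", "").strip()
    if not api_key or api_key == "your-api-key":
        raise RuntimeError("PHOENIX_API_KEY is unset or still a placeholder")
    # Keep an endpoint the caller already chose; otherwise use the default.
    env.setdefault("PHOENIX_COLLECTOR_ENDPOINT", endpoint)
    return env["PHOENIX_COLLECTOR_ENDPOINT"]
```

Calling `prepare_cloud_env()` with no arguments operates on `os.environ`; passing a plain dict makes the helper easy to exercise in tests.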

Adding metadata to traces

Use using_attributes to attach session, user, and tag metadata:

# using_attributes ships with the openinference-instrumentation package
from openinference.instrumentation import using_attributes

with using_attributes(
    session_id="session-123",
    user_id="user-456",
    metadata={"environment": "staging"},
    tags=["experiment-v2"],
):
    result = program(question="What is DSPy?")
    # This trace will carry the session/user/tag metadata in Phoenix

Tracing a DSPy pipeline

import phoenix as px
from phoenix.otel import register

px.launch_app()
register(project_name="rag-pipeline", auto_instrument=True)

import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # or any LiteLLM-supported provider

class RAGPipeline(dspy.Module):
    def __init__(self):
        super().__init__()
        # dspy.Retrieve needs a retrieval model, e.g. dspy.configure(rm=...)
        self.retrieve = dspy.Retrieve(k=3)
        self.answer = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        context = self.retrieve(question).passages
        return self.answer(context=context, question=question)

pipeline = RAGPipeline()
result = pipeline(question="How do refunds work?")
# Open http://localhost:6006 to see the trace tree:
#   RAGPipeline
#   +-- Retrieve (query, passages, latency)
#   +-- ChainOfThought (prompt, response, tokens)
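
`dspy.Retrieve` depends on a retrieval model being configured, which can get in the way of a quick local dry run. A toy in-memory retriever can stand in while you check that traces arrive. This is a sketch under stated assumptions: `CORPUS` and `top_k_passages` are hypothetical names, the scoring is plain keyword overlap, and none of it is a DSPy or Phoenix API.

```python
import re

# Hypothetical in-memory corpus; replace with your own documents.
CORPUS = [
    "Refunds are issued to the original payment method within 5 business days.",
    "Shipping is free for orders over 50 dollars.",
    "Accounts can be deleted from the settings page.",
]


def _tokens(text):
    """Lowercase word tokens, used for a crude overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_k_passages(question, corpus=CORPUS, k=3):
    """Rank passages by the number of words they share with the question."""
    query = _tokens(question)
    scored = [(len(query & _tokens(passage)), passage) for passage in corpus]
    # Highest score first; sorted() is stable, so ties keep corpus order.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    # Drop passages that share no words with the question.
    return [passage for score, passage in ranked[:k] if score > 0]
```

Inside `forward`, `context = top_k_passages(question)` could replace the `self.retrieve(question).passages` call. Because a plain function is not a DSPy module, it would probably not appear as its own span in the trace tree.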

Evaluations with Phoenix

Phoenix includes a built-in evals module for scoring LM outputs:

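import phoenix as px
from phoenix.evals import llm_classify, OpenAIModel

# Judge model that grades each response
eval_model = OpenAIModel(model="gpt-4o-mini")

# Pull recorded spans. The template below needs a column named "output";
# span outputs usually live under "attributes.output.value", so check
# spans_df.columns and adjust the rename if your version differs.
spans_df = px.Client().get_spans_dataframe()
spans_df = spans_df.rename(columns={"attributes.output.value": "output"})

# Score traces against criteria
eval_results = llm_classify(
    spans_df,  # keyword has been `dataframe` or `data` depending on release, so pass positionally
    model=eval_model,
    template="Is this response helpful and accurate? {output}",
    rails=["helpful", "not helpful"],
)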

This is useful for:

Automated quality checks: score every response in a batch
Finding failure patterns: filter by low-scoring traces
Regression testing: compare eval scores before and after changes
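
The regression-testing use can be sketched as a comparison of label shares between two eval runs. The example assumes the labels have already been pulled out of the eval results into plain lists (for example from a `label` column, if your version returns one); `label_share`, `regression_delta`, and the sample labels are hypothetical.

```python
from collections import Counter


def label_share(labels, target="helpful"):
    """Fraction of labels equal to `target`; 0.0 for an empty batch."""
    if not labels:
        return 0.0
    return Counter(labels)[target] / len(labels)


def regression_delta(before, after, target="helpful"):
    """Change in the share of `target` labels between two eval runs."""
    return label_share(after, target) - label_share(before, target)


# Hypothetical labels from two eval runs over the same questions.
baseline = ["helpful", "helpful", "not helpful", "helpful"]
candidate = ["helpful", "not helpful", "not helpful", "helpful"]

delta = regression_delta(baseline, candidate)
print(f"helpful share changed by {delta:+.2f}")
```

A negative delta means the candidate run produced fewer "helpful" labels than the baseline, which is a signal to inspect the low-scoring traces before shipping the change.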

Phoenix vs Langtrace vs Jaeger

| Feature | Arize Phoenix | Langtrace | Jaeger |
|---------|---------------|-----------|--------|
| DSPy auto-instrumentation | Yes (plugin) | Yes (built-in) | Manual |
| Setup effort | Two lines + launch | One line | Docker + manual spans |
| Local mode (no cloud) | Yes (px.launch_app()) | Yes (Docker) | Yes (Docker) |
| Cloud option | Yes (Arize platform) | Yes (app.langtrace.ai) | No |
| Built-in evals | Yes (evals module) | Basic | No |
| Dataset management | Yes | No | No |
| LM call details | Prompts, tokens, latency | Prompts, tokens, cost | Custom attributes |
| Best for | Teams wanting evals + traces | DSPy-first teams | Teams already using Jaeger |

Decision guide

Want DSPy tracing?
|
+- Need built-in evals + dataset management? -> Arize Phoenix
+- Want easiest one-line setup? -> Langtrace (/dspy-langtrace)
+- Team already uses W&B? -> W&B Weave (/dspy-weave)
+- Need full ML lifecycle (registry, deploy)? -> MLflow (/dspy-mlflow)
+- Team already uses Jaeger? -> Jaeger (see /ai-tracing-requests)
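
The decision guide can also be written as a small function. This is a sketch only: the guide does not say which branch wins when several apply, so checking them top to bottom is an assumption, and `pick_tracing_tool` with its parameter names is hypothetical.

```python
def pick_tracing_tool(
    needs_evals=False,
    wants_one_line_setup=False,
    uses_wandb=False,
    needs_ml_lifecycle=False,
    uses_jaeger=False,
):
    """Walk the decision guide top to bottom and return the first match."""
    if needs_evals:
        return "Arize Phoenix"
    if wants_one_line_setup:
        return "Langtrace (/dspy-langtrace)"
    if uses_wandb:
        return "W&B Weave (/dspy-weave)"
    if needs_ml_lifecycle:
        return "MLflow (/dspy-mlflow)"
    if uses_jaeger:
        return "Jaeger (/ai-tracing-requests)"
    # The guide names no default, so the sketch leaves the choice open.
    return None
```

For example, `pick_tracing_tool(needs_evals=True, uses_wandb=True)` returns "Arize Phoenix" because the evals branch is checked first.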

Gotchas

Missing LiteLLM instrumentor hides token counts. Claude installs openinference-instrumentation-dspy but forgets openinference-instrumentation-litellm. Without it, traces show LM calls but token counts and costs are missing. Always install both.
Using the old DSPyInstrumentor().instrument() pattern instead of register(auto_instrument=True). The register function from phoenix.otel is the current recommended approach — it auto-discovers and instruments all installed OpenInference packages. Manual DSPyInstrumentor().instrument() still works but misses LiteLLM spans.
Forgetting px.launch_app() before register() in local mode. Without px.launch_app(), there is no local collector to receive traces. Call px.launch_app() first, then register(). In cloud mode, set PHOENIX_COLLECTOR_ENDPOINT instead.
Traces missing metadata for filtering. Without using_attributes, all traces look identical in the UI. Wrap DSPy calls in using_attributes(session_id=..., user_id=..., tags=[...]) to make traces filterable and attributable.
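
The launch-order gotcha can be caught early by checking whether anything is listening on the local UI port before calling `register()`. The sketch below uses only the standard library and assumes the default address `localhost:6006` mentioned in this guide; `collector_reachable` is a hypothetical helper, not a Phoenix function.

```python
import socket


def collector_reachable(host="localhost", port=6006, timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    if collector_reachable():
        print("Port 6006 is open; register() should find a collector.")
    else:
        print("Nothing is listening on localhost:6006; call px.launch_app() first.")
```

An open port only shows that some process is listening there, not that the process is Phoenix, so treat a positive result as a hint rather than a guarantee.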

Cross-references

Install any skill: npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill <name>

Langtrace (easiest DSPy auto-instrumentation) — /dspy-langtrace
W&B Weave (team dashboards, experiment tracking) — /dspy-weave
MLflow (full ML lifecycle) — /dspy-mlflow
Aggregate monitoring (not per-request) — /ai-monitoring
Per-request debugging (inspect_history, JSONL traces) — /ai-tracing-requests
For worked examples, see examples.md
Install /ai-do if you do not have it — it routes any AI problem to the right skill and is the fastest way to work: npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill ai-do

Additional resources

Phoenix DSPy integration docs
Phoenix GitHub
For complete setup options and API details, see reference.md
For worked examples, see examples.md

Use Arize Phoenix for DSPy tracing and evaluation. Use when you want to set up Phoenix, arize-phoenix, openinference, DSPyInstrumentor, open-source trace viewer, localhost:6006, or LLM evals. Also used for phoenix setup, arize phoenix, pip install arize-phoenix, phoenix local UI, phoenix evaluations, DSPy trace viewer, open-source LLM observability, phoenix vs langtrace, openinference-instrumentation-dspy, phoenix.otel register.

5 stars

testing

Updated May 7, 2026

$ install --global

skillsauth

npx skillsauth add lebsral/dspy-programming-not-prompting-lms-skills dspy-phoenix

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy: Container and dependency vulnerability scanner. Clean (95%)
Semgrep: Static code analysis for vulnerabilities. Clean (95%)
mcp-scan (Snyk): Model Context Protocol security validation. Clean (95%)
Snyk (dep): Open source security scanning. Skipped (50%)
Socket.dev: Supply chain security analysis. Skipped (50%)
VirusTotal: Multi-engine malware detection. Skipped (50%)
CrowdStrike: Advanced threat intelligence. Skipped (50%)
OSV-Scanner: Open Source Vulnerability database check. Skipped (50%)
OWASP Dep-Check: Skipped (50%)

Last scanned: May 7, 2026, 7:00 AM (132.8s, 4 files scanned)

Related Skills

lebsral/ai-watching-optimization

tools

Verified · Trusted · Community

See what is happening during optimizer.compile() instead of waiting blind. Use when you want to watch optimization progress, see scores as they come in, know if your optimizer is working, check if optimization is stuck, understand why optimization is taking too long, get live progress during compile, monitor convergence, detect overfitting during optimization, interpret optimization results, or pick the right tool for watching optimization. Also used for optimizer progress bar, is my optimizer doing anything, optimization seems stuck, how long will optimization take, watch GEPA run, watch MIPROv2 run, live optimization dashboard, optimizer not improving, scores not going up, optimization taking forever, see what optimizer is doing, debug slow optimization, optimization visibility, optimizer metrics, track compile progress, optimization observability.

6 · SKILL.md · Updated May 31, 2026

lebsral/dspy-miprov2

testing

Verified · Trusted · Community

Use when you want the highest-quality prompt optimization DSPy offers — jointly optimizes instructions and few-shot demos, with auto=light/medium/heavy presets. Common scenarios - you want the best possible accuracy from prompt optimization, jointly tuning instructions and few-shot demonstrations, using auto presets for different compute budgets, or when COPRO or BootstrapFewShot alone are not reaching your accuracy target. Related - ai-improving-accuracy, dspy-copro, dspy-bootstrap-few-shot. Also used for dspy.MIPROv2, best DSPy optimizer, highest quality optimization, auto=light medium heavy, joint instruction and demo optimization, most powerful prompt optimizer, MIPROv2 vs COPRO vs BootstrapFewShot, which optimizer should I use, state of the art prompt optimization, when to use MIPROv2, optimize both instructions and examples, heavy optimization for production, best optimizer for accuracy.

6 · SKILL.md · Updated Apr 27, 2026

lebsral/dspy-langwatch

testing

Verified · Trusted · Community

Use LangWatch for DSPy auto-tracing and real-time optimizer progress. Use when you want to set up LangWatch, langwatch.dspy.init, auto-tracing DSPy, real-time optimization dashboard, optimizer progress tracking, app.langwatch.ai, or DSPy optimizer dashboard. Also used for langwatch setup, pip install langwatch, langwatch trace, optimizer progress, real-time optimization, watch optimizer run, LangWatch self-hosted, langwatch docker, langwatch vs langtrace, langwatch autotrack_dspy.

6 · SKILL.md · Updated Apr 27, 2026

lebsral/dspy-gepa

data-ai

Verified · Trusted · Community

Use when you want to optimize instructions without few-shot examples — a lightweight alternative to COPRO when you do not have or do not want to use demonstrations. Common scenarios - optimizing instructions when you do not have or do not want to use few-shot demonstrations, lightweight instruction search as a first step, tasks where examples in the prompt confuse the model, or when you want fast instruction optimization without the cost of COPRO. Related - ai-improving-accuracy, dspy-copro, dspy-miprov2. Also used for dspy.GEPA, instruction optimization without demos, lightweight prompt optimization, optimize instructions only, no few-shot examples needed, GEPA vs COPRO, quick instruction search, when demonstrations hurt performance, zero-shot optimization, instruction-only optimizer, simplest instruction tuner, fast prompt optimization, skip few-shot and just tune instructions, optimize Pydantic field descriptions, GEPA structured output, GEPA does not optimize field desc.

6 · SKILL.md · Updated Apr 27, 2026

Install manually

# Clone the repo
git clone https://github.com/lebsral/dspy-programming-not-prompting-lms-skills.git

# Copy into Claude Code skills folder (global)
cp -r dspy-programming-not-prompting-lms-skills/skills/dspy-phoenix ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

lebsral/dspy-programming-not-prompting-lms-skills

5 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT