skills/phoenix-cli/SKILL.md
Debug LLM applications using the Phoenix CLI. Fetch traces, analyze errors, review experiments, inspect datasets, and query the GraphQL API. Use when debugging AI/LLM applications, analyzing trace data, working with Phoenix observability, or investigating LLM performance issues.
npx skillsauth add github/awesome-copilot phoenix-cliInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
px <resource> <action> # if installed globally
npx @arizeai/phoenix-cli <resource> <action> # no install required
The CLI uses singular resource commands with subcommands like list and get:
px trace list
px trace get <trace-id>
px span list
px dataset list
px dataset get <name>
export PHOENIX_HOST=http://localhost:6006
export PHOENIX_PROJECT=my-project
export PHOENIX_API_KEY=your-api-key # if auth is enabled
Always use --format raw --no-progress when piping to jq.
px trace list --limit 20 --format raw --no-progress | jq .
px trace list --last-n-minutes 60 --limit 20 --format raw --no-progress | jq '.[] | select(.status == "ERROR")'
px trace list --format raw --no-progress | jq 'sort_by(-.duration) | .[0:5]'
px trace get <trace-id> --format raw | jq .
px trace get <trace-id> --format raw | jq '.spans[] | select(.status_code != "OK")'
px span list --limit 20 # recent spans (table view)
px span list --last-n-minutes 60 --limit 50 # spans from last hour
px span list --span-kind LLM --limit 10 # only LLM spans
px span list --status-code ERROR --limit 20 # only errored spans
px span list --name chat_completion --limit 10 # filter by span name
px span list --trace-id <id> --format raw --no-progress | jq . # all spans for a trace
px span list --include-annotations --limit 10 # include annotation scores
px span list output.json --limit 100 # save to JSON file
px span list --format raw --no-progress | jq '.[] | select(.status_code == "ERROR")'
Span
name, span_kind ("LLM"|"CHAIN"|"TOOL"|"RETRIEVER"|"EMBEDDING"|"AGENT"|"RERANKER"|"GUARDRAIL"|"EVALUATOR"|"UNKNOWN")
status_code ("OK"|"ERROR"|"UNSET"), status_message
context.span_id, context.trace_id, parent_id
start_time, end_time
attributes (same as trace span attributes above)
annotations[] (with --include-annotations)
name, result { score, label, explanation }
Trace
traceId, status ("OK"|"ERROR"), duration (ms), startTime, endTime
rootSpan — top-level span (parent_id: null)
spans[]
name, span_kind ("LLM"|"CHAIN"|"TOOL"|"RETRIEVER"|"EMBEDDING"|"AGENT")
status_code ("OK"|"ERROR"), parent_id, context.span_id
attributes
input.value, output.value — raw input/output
llm.model_name, llm.provider
llm.token_count.prompt/completion/total
llm.token_count.prompt_details.cache_read
llm.token_count.completion_details.reasoning
llm.input_messages.{N}.message.role/content
llm.output_messages.{N}.message.role/content
llm.invocation_parameters — JSON string (temperature, etc.)
exception.message — set if span errored
px session list --limit 10 --format raw --no-progress | jq .
px session list --order asc --format raw --no-progress | jq '.[].session_id'
px session get <session-id> --format raw | jq .
px session get <session-id> --include-annotations --format raw | jq '.annotations'
SessionData
id, session_id, project_id
start_time, end_time
traces[]
id, trace_id, start_time, end_time
SessionAnnotation (with --include-annotations)
id, name, annotator_kind ("LLM"|"CODE"|"HUMAN"), session_id
result { label, score, explanation }
metadata, identifier, source, created_at, updated_at
px dataset list --format raw --no-progress | jq '.[].name'
px dataset get <name> --format raw | jq '.examples[] | {input, output: .expected_output}'
px experiment list --dataset <name> --format raw --no-progress | jq '.[] | {id, name, failed_run_count}'
px experiment get <id> --format raw --no-progress | jq '.[] | select(.error != null) | {input, error}'
px prompt list --format raw --no-progress | jq '.[].name'
px prompt get <name> --format text --no-progress # plain text, ideal for piping to AI
For ad-hoc queries not covered by the commands above. Output is {"data": {...}}.
px api graphql '{ projectCount datasetCount promptCount evaluatorCount }'
px api graphql '{ projects { edges { node { name traceCount tokenCountTotal } } } }' | jq '.data.projects.edges[].node'
px api graphql '{ datasets { edges { node { name exampleCount experimentCount } } } }' | jq '.data.datasets.edges[].node'
px api graphql '{ evaluators { edges { node { name kind } } } }' | jq '.data.evaluators.edges[].node'
# Introspect any type
px api graphql '{ __type(name: "Project") { fields { name type { name } } } }' | jq '.data.__type.fields[]'
Key root fields: projects, datasets, prompts, evaluators, projectCount, datasetCount, promptCount, evaluatorCount, viewer.
Download Phoenix documentation markdown for local use by coding agents.
px docs fetch # fetch default workflow docs to .px/docs
px docs fetch --workflow tracing # fetch only tracing docs
px docs fetch --workflow tracing --workflow evaluation
px docs fetch --dry-run # preview what would be downloaded
px docs fetch --refresh # clear .px/docs and re-download
px docs fetch --output-dir ./my-docs # custom output directory
Key options: --workflow (repeatable, values: tracing, evaluation, datasets, prompts, integrations, sdk, self-hosting, all), --dry-run, --refresh, --output-dir (default .px/docs), --workers (default 10).
tools
End-to-end skill for building, testing, linting, versioning, and publishing a production-grade Python library to PyPI. Covers all four build backends (setuptools+setuptools_scm, hatchling, flit, poetry), PEP 440 versioning, semantic versioning, dynamic git-tag versioning, OOP/SOLID design, type hints (PEP 484/526/544/561), Trusted Publishing (OIDC), and the full PyPA packaging flow. Use for: creating Python packages, pip-installable SDKs, CLI tools, framework plugins, pyproject.toml setup, py.typed, setuptools_scm, semver, mypy, pre-commit, GitHub Actions CI/CD, or PyPI publishing.
tools
Audit MCP (Model Context Protocol) server configurations for security issues. Use this skill when: - Reviewing .mcp.json files for security risks - Checking MCP server args for hardcoded secrets or shell injection patterns - Validating that MCP servers use pinned versions (not @latest) - Detecting unpinned dependencies in MCP server configurations - Auditing which MCP servers a project registers and whether they're on an approved list - Checking for environment variable usage vs. hardcoded credentials in MCP configs - Any request like "is my MCP config secure?", "audit my MCP servers", or "check .mcp.json" keywords: [mcp, security, audit, secrets, shell-injection, supply-chain, governance]
tools
Enable code intelligence (go-to-definition, find-references, hover, type info) for any programming language by installing and configuring an LSP server for Copilot CLI. Detects the OS, installs the right server, and generates the JSON configuration (user-level or repo-level). Use when you need deeper code understanding and no LSP server is configured, or when the user asks to set up, install, or configure an LSP server.
development
Use this skill whenever the user wants to build scroll animations, scroll effects, parallax, scroll-triggered reveals, pinned sections, horizontal scroll, text animations, or any motion tied to scroll position — in vanilla JS, React, or Next.js. Covers GSAP ScrollTrigger (pinning, scrubbing, snapping, timelines, horizontal scroll, ScrollSmoother, matchMedia) and Framer Motion / Motion v12 (useScroll, useTransform, useSpring, whileInView, variants). Use this skill even if the user just says "animate on scroll", "fade in as I scroll", "make it scroll like Apple", "parallax effect", "sticky section", "scroll progress bar", or "entrance animation". Also triggers for Copilot prompt patterns for GSAP or Framer Motion code generation. Pairs with the premium-frontend-ui skill for creative philosophy and design-level polish.