.claude/skills/ts-arize/SKILL.md
You are an expert in Arize and its open-source Phoenix library for AI observability. You help developers monitor LLM applications with tracing, evaluation, embedding analysis, drift detection, and retrieval quality metrics, using Phoenix for local development (open-source, self-hosted) and the Arize platform for production monitoring at scale.
```python
import phoenix as px
from phoenix.otel import register

# Launch Phoenix locally (browser UI on localhost:6006)
px.launch_app()

# Register Phoenix as the OpenTelemetry trace provider
tracer_provider = register(project_name="my-llm-app")

# Auto-instrument the OpenAI SDK
from openinference.instrumentation.openai import OpenAIInstrumentor

OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)

# From here on, every OpenAI call is traced
import openai

client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain CRDT to a junior dev"}],
)
# Open localhost:6006 to see traces, latency, tokens, and cost
```
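Phoenix computes latency and token summaries for you in the UI. If you export spans and want similar numbers offline, here is a minimal plain-Python sketch. `LlmSpan` is a hypothetical flattened record. OpenInference itself stores token usage under span attributes such as `llm.token_count.total`. Nearest-rank is only one of several percentile definitions.

```python
import math
from dataclasses import dataclass


@dataclass
class LlmSpan:
    """Hypothetical flattened view of one exported LLM span."""

    latency_ms: float
    total_tokens: int


def percentile(values: list[float], q: float) -> float:
    """Nearest-rank percentile for q in (0, 100]."""
    if not values:
        raise ValueError("percentile of an empty list")
    if not 0 < q <= 100:
        raise ValueError("q must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(q * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarize(spans: list[LlmSpan]) -> dict[str, float]:
    """Median and tail latency plus total token usage across spans."""
    latencies = [s.latency_ms for s in spans]
    return {
        "p50_latency_ms": percentile(latencies, 50),
        "p95_latency_ms": percentile(latencies, 95),
        "total_tokens": sum(s.total_tokens for s in spans),
    }


# Ten synthetic spans: 100 ms, 200 ms, ..., 1000 ms, 50 tokens each
spans = [LlmSpan(latency_ms=100 * i, total_tokens=50) for i in range(1, 11)]
print(summarize(spans))
```

Tail latency (p95) is usually more telling than the mean for LLM calls, because a few slow completions dominate user-visible delays.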
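```python
import phoenix as px
from phoenix.evals import (
    HallucinationEvaluator,
    QAEvaluator,
    RelevanceEvaluator,
    run_evals,
)
from phoenix.evals.models import OpenAIModel
from phoenix.session.evaluation import get_qa_with_reference, get_retrieved_documents
from phoenix.trace import DocumentEvaluations, SpanEvaluations

eval_model = OpenAIModel(model="gpt-4o")

# Evaluate RAG quality on your traces
hallucination_eval = HallucinationEvaluator(eval_model)
qa_eval = QAEvaluator(eval_model)
relevance_eval = RelevanceEvaluator(eval_model)

# Pull data from Phoenix, shaped with the input/output/reference columns
# the evaluators expect (raw span dataframes do not have these columns)
client = px.Client()
queries_df = get_qa_with_reference(client)
documents_df = get_retrieved_documents(client)

# Run evaluations: run_evals returns one dataframe per evaluator
hallucination_df, qa_df = run_evals(
    dataframe=queries_df,
    evaluators=[hallucination_eval, qa_eval],
    provide_explanation=True,
)
(relevance_df,) = run_evals(
    dataframe=documents_df,
    evaluators=[relevance_eval],
    provide_explanation=True,
)

# Log results back so per-span hallucination and QA scores and per-document
# relevance, with explanations, appear in the Phoenix UI
client.log_evaluations(
    SpanEvaluations(eval_name="Hallucination", dataframe=hallucination_df),
    SpanEvaluations(eval_name="QA Correctness", dataframe=qa_df),
    DocumentEvaluations(eval_name="Relevance", dataframe=relevance_df),
)
```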
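Retrieval quality metrics are built from per-document relevance labels. Once document evaluations are logged, Phoenix surfaces ranking metrics such as precision@k, hit rate, and NDCG. The sketch below shows the standard textbook definitions in plain Python; it is not Phoenix's own code. It assumes the evaluator labels are `"relevant"` / `"unrelated"` and that each query's documents are listed in retrieval order.

```python
import math

RELEVANT = "relevant"  # assumed label string emitted by the relevance evaluator


def to_binary(labels: list[str]) -> list[int]:
    """Map evaluator labels (in retrieval rank order) to 1 = relevant, 0 = not."""
    return [1 if label == RELEVANT else 0 for label in labels]


def precision_at_k(relevance: list[int], k: int) -> float:
    """Fraction of the top-k slots filled by relevant documents."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(relevance[:k]) / k


def hit_at_k(relevance: list[int], k: int) -> bool:
    """True if any of the top-k documents is relevant."""
    return any(relevance[:k])


def ndcg_at_k(relevance: list[int], k: int) -> float:
    """Normalized discounted cumulative gain with a log2(rank + 1) discount."""

    def dcg(labels: list[int]) -> float:
        # enumerate() is 0-based, so rank r sits at index r - 1 -> log2(index + 2)
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(labels[:k]))

    ideal = dcg(sorted(relevance, reverse=True))
    return dcg(relevance) / ideal if ideal > 0 else 0.0


relevance = to_binary(["relevant", "unrelated", "relevant"])
print(precision_at_k(relevance, 3), hit_at_k(relevance, 1), ndcg_at_k(relevance, 3))
```

Precision@k and hit rate ignore ordering within the top k. NDCG rewards putting relevant documents first, which matters when only the top few chunks fit in the prompt.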
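```python
import pandas as pd
import phoenix as px

# Analyze embedding drift and clustering.
# documents, embeddings, and categories are your own equal-length lists.
embeddings_df = pd.DataFrame({
    "text": documents,
    "embedding": embeddings,  # one numpy array per row
    "category": categories,
})

schema = px.Schema(
    embedding_feature_column_names={
        "text_embedding": px.EmbeddingColumnNames(
            vector_column_name="embedding",
            raw_data_column_name="text",
        ),
    },
    tag_column_names=["category"],
)

# Launch with embedding visualization
session = px.launch_app(primary=px.Inferences(embeddings_df, schema=schema))
# UMAP visualization in the browser: clusters and outliers.
# To see drift, also pass reference=px.Inferences(baseline_df, schema=schema),
# where baseline_df is a hypothetical dataframe of baseline data.
```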
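Drift needs a baseline: Phoenix compares the primary inferences against a `reference` set and reports embedding drift as a Euclidean distance. The sketch below assumes the centroid-to-centroid variant, which is the distance between the mean embedding of each set. It uses plain Python lists and illustrates the idea; it is not Phoenix's implementation.

```python
import math


def centroid(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean of equal-length embedding vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def centroid_drift(primary: list[list[float]], reference: list[list[float]]) -> float:
    """Euclidean distance between the primary and reference centroids."""
    return math.dist(centroid(primary), centroid(reference))


reference = [[0.0, 0.0], [2.0, 0.0]]  # baseline embeddings (centroid [1, 0])
primary = [[4.0, 4.0], [4.0, 4.0]]    # production embeddings (centroid [4, 4])
print(centroid_drift(primary, reference))  # 5.0
```

A single distance is cheap to track over time. It can miss changes that keep the mean fixed but reshape the distribution, which the UMAP view helps reveal.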
```python
import os

from arize.pandas.logger import Client
from arize.utils.types import Environments, ModelTypes

arize_client = Client(
    space_key=os.environ["ARIZE_SPACE_KEY"],
    api_key=os.environ["ARIZE_API_KEY"],
)

# Log predictions for monitoring.
# predictions_df and arize_schema (an arize.utils.types.Schema describing
# the dataframe's columns) are assumed to be defined elsewhere.
arize_client.log(
    dataframe=predictions_df,
    model_id="support-chatbot-v2",
    model_version="2.1.0",
    model_type=ModelTypes.GENERATIVE_LLM,
    environment=Environments.PRODUCTION,
    schema=arize_schema,
)
# Arize platform: drift detection, performance dashboards, alerting
```
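Drift monitors compare a production distribution against a baseline. The population stability index (PSI) is one widely used metric of this kind, and Arize offers it among its drift metrics. The sketch below implements the textbook PSI formula over pre-binned counts in plain Python; it is not Arize's implementation. The binning and the `eps` floor that avoids `log(0)` are assumptions.

```python
import math


def psi(expected_counts: list[int], actual_counts: list[int], eps: float = 1e-6) -> float:
    """Population stability index between a baseline and a current binned distribution."""
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both distributions must use the same bins")
    e_total, a_total = sum(expected_counts), sum(actual_counts)
    if e_total <= 0 or a_total <= 0:
        raise ValueError("each distribution needs a positive total count")
    score = 0.0
    for e, a in zip(expected_counts, actual_counts):
        # Floor empty bins at eps so the ratio and log() stay finite
        e_pct = max(e / e_total, eps)
        a_pct = max(a / a_total, eps)
        score += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return score


baseline = [50, 50]    # counts per bin in the baseline window
production = [90, 10]  # counts per bin in the current production window
print(round(psi(baseline, production), 4))
```

Identical distributions score 0. Every term is non-negative, so the score only grows as production bins move away from the baseline.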
```bash
pip install arize-phoenix                          # Open-source local
pip install arize                                  # Arize platform client
pip install openinference-instrumentation-openai  # Auto-instrumentation
```
Use `px.launch_app()` for local development: it is free and open-source, and no data leaves your machine.