skills/sentry-setup-ai-monitoring/SKILL.md
Setup Sentry AI Agent Monitoring in any project. Use when asked to monitor LLM calls, track AI agents, track conversations, or instrument OpenAI/Anthropic/Vercel AI/LangChain/Google GenAI/Pydantic AI. Detects installed AI SDKs and configures appropriate integrations.
npx skillsauth add getsentry/sentry-for-ai sentry-setup-ai-monitoringInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
All Skills > Feature Setup > AI Monitoring
Configure Sentry to track LLM calls, agent executions, tool usage, and token consumption.
Important: The SDK versions, API names, and code samples below are examples. Always verify against docs.sentry.io before implementing, as APIs and minimum versions may have changed.
AI monitoring requires tracing enabled (tracesSampleRate > 0).
Prompt and output recording captures user content that is likely PII. Before enabling send-default-PII (sendDefaultPii: true in JavaScript or send_default_pii=True in Python) or per-integration prompt/output capture (recordInputs/recordOutputs in JS, include_prompts in Python), confirm:
Ask the user whether they want prompt/output capture enabled. Do not enable prompt/output capture without explicit confirmation. Use tracesSampleRate: 1.0 only in development; in production, use a lower value or a tracesSampler function.
Always detect installed AI SDKs before configuring:
# JavaScript
grep -E '"(openai|@anthropic-ai/sdk|ai|@langchain|@google/genai)"' package.json
# Python
grep -E '(openai|anthropic|langchain|huggingface)' requirements.txt pyproject.toml 2>/dev/null
After detecting AI SDKs, check the current sampling configuration:
# JavaScript
grep -E 'tracesSampleRate|tracesSampler' sentry.*.config.* instrument.* src/instrument.* app/instrument.* 2>/dev/null
# Python
grep -E 'traces_sample_rate|traces_sampler' *.py **/*.py 2>/dev/null
If tracesSampleRate / traces_sample_rate is below 1.0 AND no tracesSampler / traces_sampler is configured:
Ask the user:
"Your current sample rate is {rate}. Agent runs are sampled as complete span trees — if the root span is dropped, all child gen_ai spans are lost. For full AI visibility, gen_ai-related transactions should be sampled at 100%. Would you like me to set up a
tracesSamplerthat keeps AI traces at 100% while sampling other traffic at your current rate?"
If user confirms, read ${SKILL_ROOT}/references/sampling.md for implementation patterns.
| Package | Integration | Min Sentry SDK | Auto? |
|---------|-------------|----------------|-------|
| openai | openAIIntegration() | 10.53.0 | Yes |
| @anthropic-ai/sdk | anthropicAIIntegration() | 10.53.0 | Yes |
| ai (Vercel) | vercelAIIntegration() | 10.53.0 | Yes* |
| @langchain/* | langChainIntegration() | 10.53.0 | Yes |
| @langchain/langgraph | langGraphIntegration() | 10.53.0 | Yes |
| @google/genai | googleGenAIIntegration() | 10.53.0 | Yes |
*Vercel AI: 10.53.0+ required. Requires experimental_telemetry per-call.
Integrations auto-enable when the AI package is installed — no explicit registration needed:
| Package | Auto? | Notes |
|---------|-------|-------|
| openai | Yes | Includes OpenAI Agents SDK |
| anthropic | Yes | |
| langchain / langgraph | Yes | |
| huggingface_hub | Yes | |
| google-genai | Yes | |
| pydantic-ai | Yes | |
| litellm | No | Requires explicit integration |
| mcp (Model Context Protocol) | Yes | |
Just ensure tracing is enabled. Integrations auto-enable when the AI package is installed:
Sentry.init({
dsn: "YOUR_DSN",
tracesSampleRate: 1.0, // Lower in production (e.g., 0.1)
streamGenAiSpans: true, // SDK ≥10.53.0
// OpenAI, Anthropic, Google GenAI, LangChain integrations auto-enable in Node.js
});
To customize (e.g., enable prompt capture after user confirmation — see Data Capture Warning):
Sentry.init({
dsn: "YOUR_DSN",
tracesSampleRate: 1.0,
streamGenAiSpans: true,
sendDefaultPii: true,
integrations: [
Sentry.openAIIntegration({
// recordInputs/recordOutputs default to true when sendDefaultPii is true
}),
],
});
In browser-side code or Next.js meta-framework apps, auto-instrumentation is not available. Wrap the client manually:
import OpenAI from "openai";
import * as Sentry from "@sentry/nextjs"; // or @sentry/react, @sentry/browser
const openai = Sentry.instrumentOpenAiClient(new OpenAI());
// Use 'openai' client as normal
Sentry.init({
dsn: "YOUR_DSN",
tracesSampleRate: 1.0,
streamGenAiSpans: true,
sendDefaultPii: true,
integrations: [
Sentry.langChainIntegration(),
Sentry.langGraphIntegration(),
],
});
Add to sentry.edge.config.ts for Edge runtime:
Sentry.init({
dsn: "YOUR_DSN",
tracesSampleRate: 1.0,
streamGenAiSpans: true,
sendDefaultPii: true,
integrations: [Sentry.vercelAIIntegration()],
});
Enable telemetry per-call:
await generateText({
model: openai("gpt-4o"),
prompt: "Hello",
experimental_telemetry: {
isEnabled: true,
recordInputs: true,
recordOutputs: true,
},
});
Integrations auto-enable — just init with tracing. Only add explicit imports to customize options:
import sentry_sdk
sentry_sdk.init(
dsn="YOUR_DSN",
traces_sample_rate=1.0, # Lower in production (e.g., 0.1)
stream_gen_ai_spans=True, # SDK ≥2.60.0
send_default_pii=True,
# Integrations auto-enable when the AI package is installed.
# Only specify explicitly to customize (e.g., include_prompts):
# integrations=[OpenAIIntegration(include_prompts=True)],
)
Use when no supported SDK is detected. Follow the canonical Sentry Conventions for gen_ai.* attributes — the JS docs may lag behind; do not set attributes marked deprecated in the conventions.
| op | Span name pattern | Purpose |
|------|---------------------|---------|
| gen_ai.{operation} (e.g. gen_ai.chat, gen_ai.request) | {operation} {model} (e.g. chat gpt-4o) | Individual LLM call |
| gen_ai.invoke_agent | invoke_agent {agent_name} | Agent execution lifecycle |
| gen_ai.execute_tool | execute_tool {tool_name} | Tool/function call |
| gen_ai.handoff | handoff from {source} to {target} | Agent-to-agent transition |
For LLM-call spans, the op follows the pattern gen_ai.{gen_ai.operation.name} — use gen_ai.chat, gen_ai.embeddings, gen_ai.generate_content, or gen_ai.text_completion where the operation is known. Span attributes only accept primitives; arrays/objects must be JSON-stringified.
const inputMessages = [
{ role: "user", parts: [{ type: "text", content: "Tell me a joke" }] },
];
await Sentry.startSpan({
op: "gen_ai.chat",
name: "chat gpt-4o",
attributes: {
"gen_ai.request.model": "gpt-4o",
"gen_ai.operation.name": "chat",
"gen_ai.input.messages": JSON.stringify(inputMessages),
},
}, async (span) => {
const result = await llmClient.complete(inputMessages);
const outputMessages = [
{
role: "assistant",
parts: [{ type: "text", content: result.text }],
finish_reason: result.finishReason,
},
];
span.setAttribute("gen_ai.output.messages", JSON.stringify(outputMessages));
span.setAttribute("gen_ai.usage.input_tokens", result.inputTokens);
span.setAttribute("gen_ai.usage.output_tokens", result.outputTokens);
return result;
});
Common (all AI spans):
| Attribute | Required | Description |
|-----------|----------|-------------|
| gen_ai.request.model | Yes | Model identifier (e.g., gpt-4o, claude-sonnet-4-6) |
| gen_ai.operation.name | No | Operation label (chat, embeddings, invoke_agent, execute_tool, handoff, etc.) |
| gen_ai.agent.name | No | Agent name (set on agent and tool spans) |
Request / response content (PII — enable only after confirming; see Data Capture Warning above):
| Attribute | Description |
|-----------|-------------|
| gen_ai.input.messages | JSON-stringified array of input messages. Each item uses {role, parts} where parts is [{type, content}]; role is "user", "assistant", "tool", or "system" |
| gen_ai.output.messages | JSON-stringified array of response messages (text + tool calls), same shape as inputs |
| gen_ai.system_instructions | System prompt passed to the model |
| gen_ai.tool.definitions | JSON-stringified list of tools available to the model |
Token usage:
| Attribute | Description |
|-----------|-------------|
| gen_ai.usage.input_tokens | Total input tokens — includes cached tokens |
| gen_ai.usage.input_tokens.cached | Subset of input tokens served from cache |
| gen_ai.usage.input_tokens.cache_write | Tokens written to cache while processing input |
| gen_ai.usage.output_tokens | Total output tokens — includes reasoning tokens |
| gen_ai.usage.output_tokens.reasoning | Subset of output tokens used for reasoning |
| gen_ai.usage.total_tokens | Sum of input + output tokens |
Tool spans (gen_ai.execute_tool):
| Attribute | Description |
|-----------|-------------|
| gen_ai.tool.name | Tool identifier |
| gen_ai.tool.description | Human-readable tool description |
| gen_ai.tool.call.arguments | JSON-stringified tool arguments |
| gen_ai.tool.call.result | JSON-stringified tool result |
Sentry uses token attributes to calculate model costs. Cached and reasoning tokens are subsets, not separate counts — gen_ai.usage.input_tokens already includes gen_ai.usage.input_tokens.cached, and gen_ai.usage.output_tokens already includes gen_ai.usage.output_tokens.reasoning.
Sentry subtracts the cached/reasoning counts from the totals to compute the uncached/non-reasoning portion. Reporting a cached or reasoning count greater than its total produces negative costs in the dashboard.
Example — 100 input tokens total, 90 served from cache:
input_tokens = 100, input_tokens.cached = 90input_tokens = 10, input_tokens.cached = 90 (cached larger than total → negative cost)The same rule applies to gen_ai.usage.output_tokens vs. gen_ai.usage.output_tokens.reasoning.
After configuring, make an LLM call and check the Sentry Traces dashboard. AI spans appear with gen_ai.* operations showing model, token counts, and latency.
Conversations gives a readable, chat-style view of past sessions with your AI agent. It groups spans by gen_ai.conversation.id — so whether a user talked across multiple traces or multiple conversations happened inside one trace, you get a timeline of every message, tool call, and response.
Find it at Explore > Conversations in Sentry.
tracesSampleRate > 0streamGenAiSpans: true (JS SDK >=10.53.0) / stream_gen_ai_spans=True (Python SDK >=2.60.0) — required so AI spans are sent as standalone items. Without this, spans with large inputs/outputs can hit transaction payload size limits and be dropped.gen_ai.input.messages and gen_ai.output.messages attributes. Set sendDefaultPii: true (JS) / send_default_pii=True (Python). Without it, conversations appear empty.Some integrations (OpenAI Agents SDK for Python, OpenAI SDK for Node) infer the conversation ID automatically. For all others, set it manually.
import * as Sentry from "@sentry/node"; // or @sentry/nextjs, @sentry/nestjs, etc.
// Set at the start of a conversation
Sentry.setConversationId("conv_abc123");
// All subsequent AI calls carry gen_ai.conversation.id: "conv_abc123"
await openai.chat.completions.create({
model: "gpt-5.5",
messages: [{ role: "user", content: "Hello" }],
});
import sentry_sdk.ai
# Set at the start of a conversation
sentry_sdk.ai.set_conversation_id("conv_abc123")
# All subsequent AI calls carry gen_ai.conversation.id = "conv_abc123"
Some integrations infer the conversation ID automatically. For example, the Python OpenAI integration picks it up when you use the conversation parameter:
import openai
import sentry_sdk
sentry_sdk.init(...)
conversation = openai.conversations.create()
response = openai.responses.create(
model="gpt-5.4",
input=[{"role": "user", "content": "What are the 5 Ds of dodgeball?"}],
conversation=conversation.id # automatically sets gen_ai.conversation.id
)
These are independent concepts:
| Issue | Solution |
|-------|----------|
| AI spans not appearing | Verify tracesSampleRate > 0, check SDK version |
| Token counts missing | Some providers don't return tokens for streaming |
| Negative or wrong costs in dashboard | Cached/reasoning tokens are subsets of totals — see Token Usage and Cost Calculation |
| Prompts not captured | Set sendDefaultPii: true (JS) or send_default_pii=True (Python); use recordInputs/include_prompts only for explicit overrides |
| Vercel AI not working | Add experimental_telemetry to each call |
| Conversations view empty | Ensure streamGenAiSpans: true / stream_gen_ai_spans=True, sendDefaultPii: true / send_default_pii=True, and a conversation ID is set |
development
Decide which Sentry signal to reach for when instrumenting code — error, span, span attribute, log, or metric. Use when adding instrumentation and unsure whether something should be a log vs a span vs a metric, when deciding "what to instrument where", when reviewing instrumentation for gaps, or when a coding agent needs a rule for choosing between errors, traces, logs, and metrics. This skill decides WHAT to emit; the sentry-*-sdk skills handle HOW to set each pillar up.
development
Full Sentry SDK setup for React Native and Expo. Use when asked to "add Sentry to React Native", "install @sentry/react-native", "setup Sentry in Expo", or configure error monitoring, tracing, profiling, session replay, or logging for React Native applications. Supports Expo managed, Expo bare, and vanilla React Native.
development
Full Sentry SDK setup for Python. Use when asked to "add Sentry to Python", "install sentry-sdk", "setup Sentry in Python", or configure error monitoring, tracing, profiling, logging, metrics, crons, or AI monitoring for Python applications. Supports Django, Flask, FastAPI, Celery, Starlette, AIOHTTP, Tornado, and more.
development
Configure specific Sentry features beyond basic SDK setup. Use when asked to monitor AI/LLM calls, set up OpenTelemetry pipelines, or create alerts and notifications.