dist/plugins/api-ai-langfuse/skills/api-ai-langfuse/SKILL.md
LLM observability with Langfuse — OpenTelemetry-based tracing, evaluations, prompt management, datasets, and production best practices
Quick Guide: Use the Langfuse TypeScript SDK (built on OpenTelemetry) to add observability to LLM applications. Install @langfuse/tracing, @langfuse/otel, and @opentelemetry/sdk-node for core tracing. Use startActiveObservation() for automatic context propagation or observe() to wrap functions. Use @langfuse/openai with observeOpenAI() for zero-config OpenAI tracing. Use LangfuseClient from @langfuse/client for prompt management, scores, and datasets. Always call forceFlush() or sdk.shutdown() in short-lived processes.
<critical_requirements>
All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering,
import type, named constants)
(You MUST import and register instrumentation.ts at the top of your entry point BEFORE any other imports -- OpenTelemetry must instrument modules before they are loaded)
(You MUST call forceFlush() or sdk.shutdown() in short-lived processes (serverless, scripts, CLI tools) -- events are batched and will be lost without explicit flushing)
(You MUST use @langfuse/openai with observeOpenAI() for OpenAI SDK tracing -- do NOT manually create generation observations for OpenAI calls when the wrapper handles it automatically)
(You MUST set LANGFUSE_SECRET_KEY, LANGFUSE_PUBLIC_KEY, and LANGFUSE_BASE_URL via environment variables -- never hardcode credentials)
(You MUST use startActiveObservation() or observe() for nested tracing -- manual startObservation() requires explicit .end() calls and does NOT propagate context automatically)
</critical_requirements>
Auto-detection: Langfuse, langfuse, @langfuse/tracing, @langfuse/otel, @langfuse/client, @langfuse/openai, LangfuseSpanProcessor, LangfuseClient, startActiveObservation, startObservation, observeOpenAI, langfuse.score, langfuse.prompt, langfuse.dataset, LANGFUSE_SECRET_KEY, LANGFUSE_PUBLIC_KEY, forceFlush
When to use:
Key patterns covered:
- Setup with LangfuseSpanProcessor
- Tracing with startActiveObservation, observe, and manual startObservation
- OpenAI tracing with observeOpenAI()
When NOT to use:
- console.log debugging -- Langfuse is for structured production observability
Langfuse provides open-source LLM observability built on OpenTelemetry. The SDK (v4+, August 2025) is a ground-up rewrite using OTel as the tracing backbone, meaning traces integrate naturally with the broader observability ecosystem.
Core principles:
- @langfuse/tracing for instrumentation, @langfuse/client for prompts/scores/datasets, @langfuse/openai for OpenAI auto-instrumentation. Install only what you need.
- startActiveObservation() automatically propagates parent-child relationships. Nested observations inherit context without manual ID threading.
- Observation types (generation, agent, tool, retriever, evaluator, embedding) provide semantic meaning to traces, enabling richer dashboard views and filtering.
When to use Langfuse:
When NOT to use:
- Cases where console.log is sufficient for local development
Create an instrumentation.ts file and import it at the top of your entry point.
// instrumentation.ts
import { NodeSDK } from "@opentelemetry/sdk-node";
import { LangfuseSpanProcessor } from "@langfuse/otel";
const sdk = new NodeSDK({
  spanProcessors: [new LangfuseSpanProcessor()],
});
sdk.start();
export { sdk };

// index.ts -- import instrumentation FIRST
import "./instrumentation";
// All other imports AFTER instrumentation
import { startActiveObservation } from "@langfuse/tracing";
Why good: OTel must instrument modules before they are loaded; importing instrumentation first ensures all subsequent imports are traced automatically
// BAD: importing instrumentation after other modules
import { startActiveObservation } from "@langfuse/tracing";
import "./instrumentation"; // TOO LATE -- tracing won't capture earlier imports
Why bad: OpenAI/LangChain auto-instrumentation requires OTel to be initialized before those SDKs are imported
See: examples/core.md for environment variables, sampling, masking, and production configuration
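Because credentials must come from environment variables, it can help to fail fast when one is missing rather than discover it through absent traces. The following is a minimal sketch, not part of the Langfuse SDK: readLangfuseEnv and the LangfuseEnv shape are hypothetical names, and the environment is passed in as a plain record so the helper stays easy to test.

```typescript
// Hypothetical helper (not a Langfuse API): validates the three variables
// this guide requires before the OTel SDK is started.
interface LangfuseEnv {
  secretKey: string;
  publicKey: string;
  baseUrl: string;
}

const REQUIRED_VARS = [
  "LANGFUSE_SECRET_KEY",
  "LANGFUSE_PUBLIC_KEY",
  "LANGFUSE_BASE_URL",
] as const;

function readLangfuseEnv(
  env: Record<string, string | undefined>,
): LangfuseEnv {
  // Collect every missing name so the error reports all of them at once.
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(
      `Missing Langfuse environment variables: ${missing.join(", ")}`,
    );
  }
  return {
    secretKey: env["LANGFUSE_SECRET_KEY"] as string,
    publicKey: env["LANGFUSE_PUBLIC_KEY"] as string,
    baseUrl: env["LANGFUSE_BASE_URL"] as string,
  };
}
```

In a Node entry point this would typically be called with process.env at the top of instrumentation.ts, before sdk.start().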
The primary instrumentation pattern. Creates an observation, makes it the active context, and automatically ends it when the callback completes.
import { startActiveObservation } from "@langfuse/tracing";
async function handleRequest(query: string): Promise<string> {
  return await startActiveObservation("handle-request", async (span) => {
    span.update({ input: { query } });
    // Nested observation -- automatically becomes a child
    const result = await startActiveObservation(
      "process-query",
      async (child) => {
        child.update({ input: { query } });
        const answer = await callLLM(query);
        child.update({ output: { answer } });
        return answer;
      },
    );
    span.update({ output: { result } });
    return result;
  });
}
Why good: Automatic context propagation, automatic end on callback completion, nesting creates parent-child hierarchy without manual ID management
// BAD: using startObservation without ending it
import { startObservation } from "@langfuse/tracing";
const span = startObservation("my-span");
await doWork();
// span.end() never called -- observation stays open forever
Why bad: Manual startObservation requires explicit .end() calls; forgetting creates open-ended observations
See: examples/tracing.md for observe wrapper, observation types, metadata, and manual tracing
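When manual start/end control is genuinely needed, wrapping the work in try/finally guards against the missing .end() call. The sketch below does not use the real SDK types: Endable is a stand-in for whatever startObservation() returns (only an end() method is assumed), and withManualObservation is a hypothetical helper name.

```typescript
// Stand-in for the object returned by startObservation(); only .end() is assumed.
interface Endable {
  end(): void;
}

// Hypothetical helper: starts an observation, runs the work, and always ends it.
async function withManualObservation<T extends Endable, R>(
  start: () => T,
  work: (observation: T) => Promise<R>,
): Promise<R> {
  const observation = start();
  try {
    return await work(observation);
  } finally {
    // Runs on success and on throw, so the observation is never left open.
    observation.end();
  }
}

// Intended usage with the real SDK (not executed here):
//   await withManualObservation(
//     () => startObservation("my-span"),
//     async () => doWork(),
//   );
```

This only addresses the forgotten .end(); it does not add context propagation, which is why startActiveObservation() remains the default choice.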
Wraps a function to automatically capture inputs, outputs, timings, and errors.
import { observe } from "@langfuse/tracing";
const classifyIntent = observe(
  async (query: string) => {
    const result = await callLLM(query);
    return result.intent;
  },
  { name: "classify-intent", asType: "generation" },
);
// Usage -- automatically traced
const intent = await classifyIntent("Book a flight to Paris");
Why good: Declarative tracing, inputs/outputs captured automatically, asType tags the observation type for richer dashboard filtering
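To make the wrapper idea concrete, here is a minimal sketch of a higher-order function that records inputs, outputs, timings, and errors around an async function. It illustrates the general pattern only and is not how observe() is implemented: traced, CallRecord, and the sink callback are hypothetical names, and the record goes to a local callback rather than to Langfuse.

```typescript
// Hypothetical record shape; the real SDK creates observations instead.
interface CallRecord {
  name: string;
  input: unknown[];
  output?: unknown;
  error?: string;
  durationMs: number;
}

function traced<A extends unknown[], R>(
  fn: (...args: A) => Promise<R>,
  name: string,
  sink: (record: CallRecord) => void,
): (...args: A) => Promise<R> {
  return async (...args: A): Promise<R> => {
    const startedAt = Date.now();
    try {
      const output = await fn(...args);
      sink({ name, input: args, output, durationMs: Date.now() - startedAt });
      return output;
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err);
      sink({ name, input: args, error, durationMs: Date.now() - startedAt });
      throw err; // re-throw so callers still see the original failure
    }
  };
}
```

The wrapped function keeps the original signature, which is what lets call sites stay unchanged.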
Use observeOpenAI() to wrap the OpenAI client for automatic tracing of all calls.
import OpenAI from "openai";
import { observeOpenAI } from "@langfuse/openai";
const openai = observeOpenAI(new OpenAI());
// All calls automatically traced with model, tokens, cost
const completion = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello" }],
});
Why good: Zero manual instrumentation, captures model name, token counts, estimated costs, latency, and streaming metrics automatically
// BAD: manually creating generation observations for OpenAI calls
await startActiveObservation("openai-call", async (span) => {
  const result = await rawOpenai.chat.completions.create({ ... });
  span.update({
    model: "gpt-4o",
    input: messages,
    output: result.choices[0].message.content,
  });
}, { asType: "generation" });
Why bad: observeOpenAI handles all of this automatically with more accurate token/cost data; manual tracking is error-prone and duplicates effort
See: examples/openai-integration.md for streaming, custom attributes, and token tracking on streams
Fetch versioned prompts, compile with variables, and link to traces.
import { LangfuseClient } from "@langfuse/client";
const langfuse = new LangfuseClient();
// Fetch a text prompt (production label by default)
const prompt = await langfuse.prompt.get("summarize-article");
const compiled = prompt.compile({ topic: "AI safety", length: "brief" });
// -> "Write a brief summary about AI safety."
// Fetch a chat prompt
const chatPrompt = await langfuse.prompt.get("assistant-v2", { type: "chat" });
const messages = chatPrompt.compile({ userName: "Alice" });
// -> [{ role: "system", content: "You are helping Alice..." }, ...]
Why good: Centralized prompt management with versioning, labels for A/B testing, variable compilation, and built-in caching
See: examples/prompt-management.md for versioning, labels, cache control, and linking prompts to traces
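The variable compilation step can be pictured as plain {{variable}} substitution. The sketch below is an approximation for illustration, not the SDK's compile() implementation: compileTemplate and missingVariables are hypothetical helpers, and leaving unmatched placeholders as literal text follows the behavior this guide describes.

```typescript
const PLACEHOLDER = /\{\{\s*(\w+)\s*\}\}/g;

function hasVariable(variables: Record<string, string>, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(variables, key);
}

// Replaces each {{name}} with its value; unknown names are left untouched.
function compileTemplate(
  template: string,
  variables: Record<string, string>,
): string {
  return template.replace(
    PLACEHOLDER,
    (placeholder: string, key: string): string => {
      const value = hasVariable(variables, key) ? variables[key] : undefined;
      return value ?? placeholder;
    },
  );
}

// Lists placeholder names that have no matching variable, without duplicates.
function missingVariables(
  template: string,
  variables: Record<string, string>,
): string[] {
  const missing: string[] = [];
  const pattern = new RegExp(PLACEHOLDER.source, "g");
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(template)) !== null) {
    const key = match[1];
    if (key !== undefined && !hasVariable(variables, key) && missing.indexOf(key) === -1) {
      missing.push(key);
    }
  }
  return missing;
}
```

Running a check like missingVariables before compiling is one way to catch a forgotten variable in tests instead of shipping a prompt that still contains {{name}}.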
Attach quality measurements to traces and observations.
import { LangfuseClient } from "@langfuse/client";
const langfuse = new LangfuseClient();
// Numeric score
langfuse.score.create({
  traceId: "trace-123",
  name: "relevance",
  value: 0.95,
  dataType: "NUMERIC",
});
// Categorical score
langfuse.score.create({
  traceId: "trace-123",
  name: "quality",
  value: "good",
  dataType: "CATEGORICAL",
});
// Boolean score (0 or 1)
langfuse.score.create({
  traceId: "trace-123",
  name: "contains-hallucination",
  value: 0,
  dataType: "BOOLEAN",
});
// Score a specific observation within a trace
langfuse.score.create({
  traceId: "trace-123",
  observationId: "obs-456",
  name: "accuracy",
  value: 0.88,
  dataType: "NUMERIC",
});
// Flush in short-lived processes
await langfuse.score.flush();
Why good: Three data types cover all evaluation needs, scores attach at trace or observation level, fire-and-forget API with batching
See: examples/scores-datasets.md for active observation scoring, session scores, datasets, and experiments
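Since each data type expects a different value type, a discriminated union plus a small runtime check can catch mismatches before a score is queued. This is a sketch under assumptions: ScoreValue, ScoreInput, and validateScore are hypothetical names that mirror the fields used above, not types exported by @langfuse/client.

```typescript
// Hypothetical types mirroring the score fields shown in this guide.
type ScoreValue =
  | { dataType: "NUMERIC"; value: number }
  | { dataType: "CATEGORICAL"; value: string }
  | { dataType: "BOOLEAN"; value: 0 | 1 };

type ScoreInput = ScoreValue & {
  traceId: string;
  name: string;
  observationId?: string;
};

// Returns null when the value matches its dataType, else a description of the problem.
function validateScore(input: { dataType: string; value: unknown }): string | null {
  switch (input.dataType) {
    case "NUMERIC":
      return typeof input.value === "number" && Number.isFinite(input.value)
        ? null
        : "NUMERIC scores need a finite number";
    case "CATEGORICAL":
      return typeof input.value === "string"
        ? null
        : "CATEGORICAL scores need a string";
    case "BOOLEAN":
      return input.value === 0 || input.value === 1
        ? null
        : "BOOLEAN scores need 0 or 1, not true/false";
    default:
      return `Unknown dataType: ${input.dataType}`;
  }
}

// The union rejects mismatches at compile time; the validator covers untyped data.
const exampleScore: ScoreInput = {
  traceId: "trace-123",
  name: "relevance",
  value: 0.95,
  dataType: "NUMERIC",
};
validateScore(exampleScore);
```

The compile-time union helps in typed code paths, while the runtime check is useful when scores are built from parsed JSON or evaluator output.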
Always flush in short-lived processes. The SDK batches events and sends them asynchronously.
import { sdk } from "./instrumentation";
import { LangfuseClient } from "@langfuse/client";
const langfuse = new LangfuseClient();
async function main() {
  // ... do work ...
  // Flush scores
  await langfuse.score.flush();
  // Shutdown OTel SDK (flushes all pending spans)
  await sdk.shutdown();
}
main();
Why good: Explicit flush/shutdown ensures all events are sent before the process exits; without this, data is silently lost in serverless and scripts
// BAD: exiting without flushing
async function handler() {
  await startActiveObservation("my-trace", async (span) => {
    span.update({ output: "done" });
  });
  // Process exits -- batched events never sent
}
Why bad: Langfuse batches events locally; if the process exits before the flush interval, events are lost
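The reason events go missing is easier to see with a toy batching queue. The sketch below is a simplified model, not the SDK's internals: BatchQueue is a hypothetical class, it has no timer-based flush interval, and send stands in for the network call.

```typescript
// Toy model of a batching exporter; names and behavior are illustrative only.
class BatchQueue<T> {
  private pending: T[] = [];
  private readonly flushAt: number;
  private readonly send: (batch: T[]) => Promise<void>;

  constructor(flushAt: number, send: (batch: T[]) => Promise<void>) {
    this.flushAt = flushAt;
    this.send = send;
  }

  // Fire-and-forget: the event is only sent once the batch is full.
  // A rejected send is not handled here; a real exporter would retry or log.
  enqueue(event: T): void {
    this.pending.push(event);
    if (this.pending.length >= this.flushAt) {
      void this.flush();
    }
  }

  // Sends whatever is still queued; await this before the process exits.
  async flush(): Promise<void> {
    if (this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = [];
    await this.send(batch);
  }

  get size(): number {
    return this.pending.length;
  }
}
```

With a batch size of 10 and only 2 events queued, nothing is sent until flush() runs, which is the situation a short-lived script or serverless handler ends up in.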
</patterns>
Reduce costs by sampling a subset of traces:
import { NodeSDK } from "@opentelemetry/sdk-node";
import { TraceIdRatioBasedSampler } from "@opentelemetry/sdk-trace-base";
import { LangfuseSpanProcessor } from "@langfuse/otel";
const sdk = new NodeSDK({
  sampler: new TraceIdRatioBasedSampler(0.2), // Sample 20% of traces
  spanProcessors: [new LangfuseSpanProcessor()],
});
Or via environment variable: LANGFUSE_SAMPLE_RATE=0.2
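Ratio-based sampling decides per trace ID, so every span in a trace gets the same keep-or-drop decision. The sketch below shows the idea with a simplified rule and is not the algorithm used by TraceIdRatioBasedSampler: shouldSample is a hypothetical function that maps the first 32 bits of a hex trace ID onto the sampling ratio.

```typescript
// Simplified illustration of deterministic ratio sampling; not the OTel algorithm.
function shouldSample(traceId: string, ratio: number): boolean {
  if (ratio <= 0) return false;
  if (ratio >= 1) return true;
  // Interpret the first 8 hex characters as a number in [0, 2^32).
  const bucket = parseInt(traceId.slice(0, 8), 16);
  if (Number.isNaN(bucket)) return false;
  // Keep the trace when its bucket falls in the lowest `ratio` share of the range.
  return bucket < ratio * 0x100000000;
}
```

Because the decision depends only on the trace ID, repeated calls for the same trace agree, and roughly the configured share of uniformly distributed IDs is kept.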
- Tune LANGFUSE_FLUSH_AT (default 10) and LANGFUSE_FLUSH_INTERVAL (default 1s) for your workload
- Use shouldExportSpan on LangfuseSpanProcessor to drop noisy non-LLM spans
- Use the mask option to avoid storing sensitive data
- Set stream_options: { include_usage: true } on OpenAI streaming calls so observeOpenAI captures token counts
<decision_framework>
What do you need?
+-- Tracing LLM calls?
| +-- YES -> npm install @langfuse/tracing @langfuse/otel @opentelemetry/sdk-node
| +-- Also using OpenAI SDK?
| +-- YES -> npm install @langfuse/openai
+-- Prompt management, scores, or datasets?
| +-- YES -> npm install @langfuse/client
+-- Both tracing AND client features?
+-- YES -> Install all: @langfuse/tracing @langfuse/otel @opentelemetry/sdk-node @langfuse/client
How do you want to instrument?
+-- Wrapping a function? -> observe() (declarative, auto-captures inputs/outputs)
+-- Block of code with nesting? -> startActiveObservation() (context propagation, auto-end)
+-- Need manual start/end control? -> startObservation() (requires explicit .end())
+-- OpenAI SDK calls? -> observeOpenAI() (zero-config auto-instrumentation)
+-- Update active span without reference? -> updateActiveObservation()
What is this observation?
+-- LLM call (prompt -> completion) -> "generation"
+-- AI agent decision-making step -> "agent"
+-- External API or function call -> "tool"
+-- Vector store or DB retrieval -> "retriever"
+-- Quality assessment step -> "evaluator"
+-- Embedding creation -> "embedding"
+-- Link between application steps -> "chain"
+-- Content safety / jailbreak check -> "guardrail"
+-- Generic duration operation -> "span" (default)
+-- Point-in-time event -> "event"
</decision_framework>
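The observation types in the last tree can be captured as a string-literal union so that a typo in asType is caught early. This is a sketch built from the list above: OBSERVATION_TYPES, ObservationType, and the two helpers are hypothetical names, and the SDK may export its own type for this.

```typescript
// Values taken from the decision tree above; the names here are illustrative.
const OBSERVATION_TYPES = [
  "generation",
  "agent",
  "tool",
  "retriever",
  "evaluator",
  "embedding",
  "chain",
  "guardrail",
  "span",
  "event",
] as const;

type ObservationType = (typeof OBSERVATION_TYPES)[number];

function isObservationType(value: string): value is ObservationType {
  return (OBSERVATION_TYPES as readonly string[]).indexOf(value) !== -1;
}

// Falls back to "span", which the tree lists as the default type.
function toObservationType(value: string): ObservationType {
  return isObservationType(value) ? value : "span";
}
```

A guard like this is mainly useful when the type comes from configuration or user input rather than from a literal in code.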
<red_flags>
High Priority Issues:
- Not importing instrumentation.ts before other modules (auto-instrumentation silently fails)
- Not calling forceFlush() or sdk.shutdown() (events are silently lost)
- Hardcoding LANGFUSE_SECRET_KEY or LANGFUSE_PUBLIC_KEY in source code (credential exposure)
- Manually tracing OpenAI calls when observeOpenAI() would handle it automatically (duplicated effort, less accurate data)
- Using startObservation() without calling .end() (observation stays open indefinitely)
Medium Priority Issues:
- Missing stream_options: { include_usage: true } on OpenAI streaming calls (token counts missing from observeOpenAI traces)
- Not calling langfuse.score.flush() in short-lived processes (scores are batched and may be lost)
- Using startObservation() when startActiveObservation() would work (no automatic context propagation or auto-end)
- Not setting asType on observations (all observations appear as generic spans, losing semantic meaning)
- Not setting LANGFUSE_BASE_URL for self-hosted instances (defaults to cloud.langfuse.com)
Common Mistakes:
- Using @langfuse/openai without setting up the OTel NodeSDK first -- the OpenAI wrapper requires OTel context to send traces
- Confusing LangfuseClient (from @langfuse/client, for prompts/scores/datasets) with the OTel tracing functions (from @langfuse/tracing)
- Calling prompt.compile() without matching all {{variable}} placeholders -- unmatched variables remain as literal {{name}} in output
- Calling langfuse.score.create() with a value of type string for NUMERIC scores or number for CATEGORICAL scores (type mismatch)
- startActiveObservation which requires OTel
Gotchas & Edge Cases:
- observeOpenAI() does NOT support the OpenAI Assistants API -- only Chat Completions and Responses API
- If a span you need is not being exported, use shouldExportSpan to include it.
- LangfuseClient.prompt.get() caches prompts with a default TTL. If you update a prompt and don't see changes, set cacheTtlSeconds: 0 to bypass caching.
- BOOLEAN scores take numeric values (0 or 1), not JavaScript booleans (true/false).
- score.create() is fire-and-forget (synchronous) -- it queues the score for batched delivery. You only need await on flush().
- Dataset names containing slashes (e.g. evaluation/qa-dataset) must be URL-encoded when used as path parameters.
- The Langfuse class, trace(), span(), generation() from v3 are replaced by OTel-based APIs.
</red_flags>
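For the URL-encoding gotcha, the standard encodeURIComponent function is enough to make a name such as evaluation/qa-dataset safe as a single path segment. In this short sketch, datasetPath and the /datasets/ route shape are hypothetical and serve only to show where the encoding applies.

```typescript
// Hypothetical route shape, used only to show where encoding matters.
function datasetPath(name: string): string {
  // Without encoding, the slash would split the name into two path segments.
  return `/datasets/${encodeURIComponent(name)}`;
}

const encodedPath = datasetPath("evaluation/qa-dataset");
// encodedPath is "/datasets/evaluation%2Fqa-dataset"
```

Hyphens and letters pass through unchanged; only reserved characters such as the slash are percent-encoded.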
<critical_reminders>
All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering,
import type, named constants)
(You MUST import and register instrumentation.ts at the top of your entry point BEFORE any other imports -- OpenTelemetry must instrument modules before they are loaded)
(You MUST call forceFlush() or sdk.shutdown() in short-lived processes (serverless, scripts, CLI tools) -- events are batched and will be lost without explicit flushing)
(You MUST use @langfuse/openai with observeOpenAI() for OpenAI SDK tracing -- do NOT manually create generation observations for OpenAI calls when the wrapper handles it automatically)
(You MUST set LANGFUSE_SECRET_KEY, LANGFUSE_PUBLIC_KEY, and LANGFUSE_BASE_URL via environment variables -- never hardcode credentials)
(You MUST use startActiveObservation() or observe() for nested tracing -- manual startObservation() requires explicit .end() calls and does NOT propagate context automatically)
Failure to follow these rules will produce silent data loss, missing traces, or credential exposure in LLM observability.
</critical_reminders>