plugins/langfuse/skills/langfuse-cli/SKILL.md
This skill should be used when the user asks to "query Langfuse traces", "show sessions", "check LLM costs", "analyse token usage", "view observations", "get scores", "create score", "add score to trace", "query metrics", or mentions Langfuse, traces, or LLM observability. Also triggers on requests to analyse API latency, debug LLM calls, or investigate model performance. Use for prompt management tasks like "list prompts", "get prompt", "create prompt", "update prompt labels", or "deploy prompt to production". Use for dataset management tasks like "list datasets", "create dataset", "add dataset item", "view dataset runs", or "manage evaluation datasets".
npx skillsauth add tavva/ben-claude-plugins langfuse-cli
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Langfuse CLI (lf)
Command-line interface for the Langfuse LLM observability platform. Query traces, sessions, observations, scores, and metrics. Manage prompts with versioning and labels. Manage datasets for evaluation.
lf traces list [OPTIONS] # List traces with filters
lf traces get <ID> # Get specific trace (--with-observations)
lf sessions list [OPTIONS] # List sessions
lf sessions show <ID> # Show session details (--with-traces)
lf observations list [OPTIONS] # List observations (spans/generations/events)
lf observations get <ID> # Get specific observation
lf scores list [OPTIONS] # List scores
lf scores get <ID> # Get specific score
lf scores create [OPTIONS] # Create a new score
lf metrics query [OPTIONS] # Query aggregated metrics
lf prompts list [OPTIONS] # List prompts
lf prompts get <NAME> # Get prompt (by label or version)
lf prompts create-text # Create text prompt (-m for commit message)
lf prompts create-chat # Create chat prompt (-m for commit message)
lf prompts label <NAME> <VER> # Set labels on prompt version
lf prompts delete <NAME> # Delete prompt
lf datasets list [OPTIONS] # List datasets
lf datasets get <NAME> # Get dataset by name
lf datasets create <NAME> # Create a new dataset
lf datasets items [OPTIONS] # List dataset items (--dataset to filter)
lf datasets item-get <ID> # Get dataset item by ID
lf datasets item-create # Create dataset item (--dataset, --input required)
lf datasets runs <DATASET> # List runs for a dataset
lf datasets run-get <DS> <RUN> # Get a specific run
lf traces list --limit 20
# Today's traces
lf traces list --from "$(date -u +%Y-%m-%dT00:00:00Z)"
# Last 24 hours (the -v-1d flag is BSD/macOS date syntax; GNU date uses -d '1 day ago' instead)
lf traces list --from "$(date -u -v-1d +%Y-%m-%dT%H:%M:%SZ)"
# Specific date range
lf traces list --from 2024-01-15T00:00:00Z --to 2024-01-16T00:00:00Z
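The relative-date example above relies on a flag that differs between `date` implementations. Below is a minimal sketch of a helper that works with either GNU or BSD/macOS `date`. The function name `hours_ago_utc` is hypothetical and not part of `lf`, and other `date` variants (for example BusyBox) are not handled.

```shell
# Hypothetical helper: print the UTC timestamp N hours ago as ISO 8601.
# Assumes either GNU date (supports --version and -d) or BSD/macOS date (-v).
hours_ago_utc() {
  hours="$1"
  if date --version >/dev/null 2>&1; then
    # GNU date: relative time expressed with -d
    date -u -d "${hours} hours ago" +%Y-%m-%dT%H:%M:%SZ
  else
    # BSD/macOS date: relative time expressed with -v
    date -u -v-"${hours}"H +%Y-%m-%dT%H:%M:%SZ
  fi
}
```

The result can then be passed to the `--from` filter, for example `lf traces list --from "$(hours_ago_utc 24)"`.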
lf traces list --user-id user123
lf traces list --session-id sess456
lf traces list --name "chat-completion"
lf traces list --tags production --tags v2
# Total cost over time
lf metrics query --view traces --measure total-cost --aggregation sum --granularity day
# Cost by model
lf metrics query --view observations --measure total-cost --aggregation sum --dimensions model
# Average cost per trace
lf metrics query --view traces --measure total-cost --aggregation avg
# P95 latency
lf metrics query --view traces --measure latency --aggregation p95
# Latency by trace name
lf metrics query --view traces --measure latency --aggregation avg --dimensions traceName
# Latency trends
lf metrics query --view traces --measure latency --aggregation p50 --granularity hour
# Total tokens
lf metrics query --view observations --measure total-tokens --aggregation sum
# Tokens by model
lf metrics query --view observations --measure total-tokens --aggregation sum --dimensions model
# Input vs output tokens
lf metrics query --view observations --measure input-tokens --aggregation sum
lf metrics query --view observations --measure output-tokens --aggregation sum
# Get trace details
lf traces get tr-abc123
# Get trace with observation metadata (recommended - faster, less noise)
lf traces get tr-abc123 --with-observations --summary
# Get trace with full observation content (large input/output fields)
lf traces get tr-abc123 --with-observations
# Fetch full content for a specific observation when needed
lf observations get obs-xyz789
# See all observations in a trace
lf observations list --trace-id tr-abc123
# List scores by name (here: accuracy)
lf scores list --name accuracy
# Score a trace
lf scores create --name accuracy --value 0.95 --trace-id tr-abc123
# Score an observation with comment
lf scores create --name relevance --value 0.8 \
--observation-id obs-xyz789 --comment "Good but could be more specific"
# Categorical score
lf scores create --name sentiment --value 1 \
--data-type CATEGORICAL --trace-id tr-abc123
# Boolean score
lf scores create --name approved --value 1 \
--data-type BOOLEAN --trace-id tr-abc123
# List all prompts
lf prompts list
# Filter by label or tag
lf prompts list --label production
lf prompts list --tag summarisation
# Get production version of a prompt
lf prompts get my-prompt
# Get specific version or label
lf prompts get my-prompt --version 3
lf prompts get my-prompt --label staging
# Get raw content (for piping)
lf prompts get my-prompt --raw > prompt.txt
# Create text prompt from file
lf prompts create-text --name my-prompt -f prompt.txt
# Create with commit message documenting the change
lf prompts create-text --name my-prompt -f prompt.txt \
-m "Add context about user preferences"
# Create from stdin
echo "You are a helpful assistant." | lf prompts create-text --name my-prompt
# Create with labels and config
lf prompts create-text --name my-prompt -f prompt.txt \
--labels production --tags summarisation \
--config '{"model": "gpt-4", "temperature": 0.7}'
# Create chat prompt from JSON
lf prompts create-chat --name chat-prompt -f messages.json
# Label a version as production
lf prompts label my-prompt 5 --labels production
# Delete a prompt
lf prompts delete old-prompt
lf prompts delete my-prompt --version 2
Datasets store input/output pairs for evaluation. Items can be created manually or from existing traces.
# List all datasets
lf datasets list
# Create a dataset
lf datasets create my-eval-dataset -d "Test cases for summarisation"
# Create with metadata
lf datasets create my-eval-dataset \
-d "Test cases for summarisation" \
-m '{"version": "1.0", "owner": "team-ml"}'
# Get dataset details
lf datasets get my-eval-dataset
# Create item with input and expected output
lf datasets item-create --dataset my-eval-dataset \
--input '{"text": "Long article content..."}' \
--expected-output '{"summary": "Brief summary..."}'
# Create item from existing trace
lf datasets item-create --dataset my-eval-dataset \
--input '{"prompt": "Summarise this"}' \
--source-trace-id tr-abc123
# Create item with metadata
lf datasets item-create --dataset my-eval-dataset \
--input '{"text": "Content"}' \
--expected-output '{"result": "Expected"}' \
--metadata '{"category": "short-form", "difficulty": "easy"}'
# List items in a dataset
lf datasets items --dataset my-eval-dataset
# Get specific item
lf datasets item-get item-abc123
Runs represent evaluation executions against a dataset.
# List runs for a dataset
lf datasets runs my-eval-dataset
# Get details of a specific run
lf datasets run-get my-eval-dataset run-2024-01-15
All list and query commands support output format selection:
lf traces list --format table # Default, human-readable
lf traces list --format json # Machine-readable, full details
lf traces list --format csv # Spreadsheet-compatible
lf traces list --format markdown # Documentation-friendly
Save to file:
lf traces list --format json --output traces.json
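To export the same query in several formats, the commands can be generated in a loop. The sketch below only prints the commands, so they can be reviewed before running. The helper name `export_cmds` and the file-extension mapping are hypothetical, and it assumes `--output` can be combined with every format, which the text above shows only for JSON.

```shell
# Hypothetical helper: print one export command per format for a given
# lf subcommand (e.g. "traces list"). Nothing is executed.
export_cmds() {
  subcommand="$1"
  stem="$2"
  for format in json csv markdown; do
    # Map the format name to a file extension (assumed convention).
    case "$format" in
      markdown) ext="md" ;;
      *) ext="$format" ;;
    esac
    echo "lf $subcommand --format $format --output $stem.$ext"
  done
}

export_cmds "traces list" traces
```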
The CLI uses profile-based configuration. Credentials resolve in order:
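1. Command-line flags (--public-key, --secret-key, --host)
2. Environment variables (LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, LANGFUSE_HOST)
3. Config file (~/.config/langfuse/config.yml)
# Set up a profile
lf config setup
# Use a named profile
lf traces list --profile production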
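The resolution order described above amounts to "first non-empty value wins". The sketch below illustrates that precedence only; it is not the CLI's actual implementation. The function name `resolve_setting` and the example URLs are hypothetical.

```shell
# Hypothetical helper: return the first non-empty value among
# a flag value, an environment-variable value, and a config-file value.
resolve_setting() {
  flag_value="$1"
  env_value="$2"
  config_value="$3"
  if [ -n "$flag_value" ]; then
    echo "$flag_value"      # 1. command-line flag
  elif [ -n "$env_value" ]; then
    echo "$env_value"       # 2. environment variable
  else
    echo "$config_value"    # 3. config file
  fi
}

# No flag given, so the environment value takes precedence over the config file.
resolve_setting "" "https://env.example" "https://config.example"
```

In practice this means a one-off `--host` flag overrides `LANGFUSE_HOST`, which in turn overrides whatever the profile in the config file specifies.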
The metrics command provides aggregated analytics:
Required parameters:
- --view: traces or observations
- --measure: What to measure
- --aggregation: How to aggregate
Measures:
- count - Number of items
- latency - Duration in milliseconds
- input-tokens, output-tokens, total-tokens - Token counts
- input-cost, output-cost, total-cost - Cost in USD
Aggregations:
- count - Total count
- sum - Total sum
- avg - Average
- p50, p95, p99 - Percentiles
- histogram - Distribution buckets
Dimensions (group by):
- traceName, model, environment, version, release
Granularity (time bucketing):
- auto, minute, hour, day, week, month
For complete CLI documentation including all options:
- references/cli-reference.md - Full command reference with all flags
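Because `lf metrics query` needs a view, a measure, and an aggregation, it can help to check the three values against the lists above before running a query. The following is a minimal sketch; the helper name `metrics_cmd` is hypothetical, and it prints the command rather than executing it.

```shell
# Hypothetical helper: validate view, measure, and aggregation against the
# values listed in this document, then print the resulting command.
metrics_cmd() {
  view="$1"
  measure="$2"
  aggregation="$3"
  case "$view" in
    traces|observations) ;;
    *) echo "unknown view: $view" >&2; return 1 ;;
  esac
  case "$measure" in
    count|latency|input-tokens|output-tokens|total-tokens) ;;
    input-cost|output-cost|total-cost) ;;
    *) echo "unknown measure: $measure" >&2; return 1 ;;
  esac
  case "$aggregation" in
    count|sum|avg|p50|p95|p99|histogram) ;;
    *) echo "unknown aggregation: $aggregation" >&2; return 1 ;;
  esac
  echo "lf metrics query --view $view --measure $measure --aggregation $aggregation"
}

metrics_cmd observations total-cost sum
```

A typo such as `metrics_cmd traces latency median` fails with a non-zero status and a message on stderr instead of producing a command.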