toolkit/packages/skills/claude-langfuse/SKILL.md
Claude Code observability skill: analyze session traces stored in Langfuse, extract learnings from corrections, identify success patterns, and propose agent/skill improvements based on historical data. Powers self-improvement through trace analysis of Claude Code sessions.
npx skillsauth add stevengonsalvez/agents-in-a-box claude-langfuse
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Analyze Claude Code session traces stored in Langfuse to extract learnings, identify patterns, and drive continuous improvement.
| Command | Description |
|---------|-------------|
| /claude-langfuse | Show help and available sub-commands |
| /claude-langfuse:status | Current session status and recent traces |
| /claude-langfuse:reflect | Analyze recent sessions for learnings and corrections |
| /claude-langfuse:insights [trace_id] | Deep analysis of a specific session |
| /claude-langfuse:patterns | Identify recurring patterns across sessions |
/claude-langfuse:status
Shows:
/claude-langfuse:reflect
/claude-langfuse:reflect --sessions 10
/claude-langfuse:reflect --since 2024-01-01
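The two flags in the usage lines above could be parsed roughly as follows. This is a minimal sketch, not the actual contents of reflect.py: the function name `parse_reflect_args` and the default of 5 sessions are assumptions made for illustration.

```python
import argparse
from datetime import date


def parse_reflect_args(argv):
    """Parse the --sessions and --since flags shown in the usage examples."""
    parser = argparse.ArgumentParser(prog="reflect.py")
    # Number of recent sessions to analyze (the default of 5 is an assumption).
    parser.add_argument("--sessions", type=int, default=5)
    # Only consider sessions on or after this ISO date (YYYY-MM-DD).
    parser.add_argument("--since", type=date.fromisoformat, default=None)
    return parser.parse_args(argv)


args = parse_reflect_args(["--sessions", "10", "--since", "2024-01-01"])
print(args.sessions, args.since)  # prints "10 2024-01-01"
```

Parsing `--since` into a `date` up front means a malformed value is rejected before any trace is fetched.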
Analyzes traces to find:
/claude-langfuse:insights <trace_id>
Provides detailed analysis of a specific session including:
When this skill is invoked, execute the appropriate sub-command:
/claude-langfuse or /claude-langfuse:status:
source ~/.secrets && python3 {{HOME_TOOL_DIR}}/skills/claude-langfuse/utils/status.py
/claude-langfuse:reflect:
source ~/.secrets && python3 {{HOME_TOOL_DIR}}/skills/claude-langfuse/utils/reflect.py $ARGUMENTS
/claude-langfuse:insights <trace_id>:
source ~/.secrets && python3 {{HOME_TOOL_DIR}}/skills/claude-langfuse/utils/insights.py $ARGUMENTS
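The dispatch described above amounts to a lookup from sub-command to script. The sketch below only builds the command string and does not run it. `build_command` and the `SCRIPTS` table are hypothetical names, and the `home_tool_dir` argument stands in for the `{{HOME_TOOL_DIR}}` placeholder, which is assumed to be expanded elsewhere. No script is listed for `patterns`, so the sketch treats it as unknown.

```python
import shlex

# Script names are taken from the commands above; nothing else is assumed to exist.
SCRIPTS = {
    "": "status.py",        # bare /claude-langfuse
    "status": "status.py",
    "reflect": "reflect.py",
    "insights": "insights.py",
}


def build_command(invocation, home_tool_dir):
    """Turn '/claude-langfuse:reflect --sessions 10' into a shell command line."""
    parts = shlex.split(invocation)
    name, _, sub = parts[0].partition(":")
    if name != "/claude-langfuse" or sub not in SCRIPTS:
        raise ValueError(f"unknown sub-command: {parts[0]}")
    script = f"{home_tool_dir}/skills/claude-langfuse/utils/{SCRIPTS[sub]}"
    # Quote every argument so user input cannot break out of the command line.
    args = " ".join(shlex.quote(a) for a in parts[1:])
    return f"source ~/.secrets && python3 {shlex.quote(script)} {args}".rstrip()


print(build_command("/claude-langfuse:reflect --sessions 10", "/opt/tools"))
```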
| Category | Examples | Target Files |
|----------|----------|--------------|
| Code Style | Formatting, naming conventions | agents/code-reviewer.md |
| Architecture | Design patterns, boundaries | agents/solution-architect.md |
| Process | Workflow, review practices | CLAUDE.md |
| Tools | Preferred utilities, commands | agents/superstar-engineer.md |
| Domain | Project-specific knowledge | Project CLAUDE.md |
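One way to read the category table is as a routing map from a learning's category to the file where it would be proposed. The sketch below illustrates that reading only. The `(category, text)` tuple shape, the `group_by_target` helper, the fallback to CLAUDE.md for unknown categories, and the category assigned to each example learning are all assumptions.

```python
from collections import defaultdict

# Category-to-file mapping copied from the table above.
TARGET_FILES = {
    "Code Style": "agents/code-reviewer.md",
    "Architecture": "agents/solution-architect.md",
    "Process": "CLAUDE.md",
    "Tools": "agents/superstar-engineer.md",
    "Domain": "Project CLAUDE.md",
}


def group_by_target(learnings):
    """Group (category, text) pairs by the file each would be proposed for."""
    grouped = defaultdict(list)
    for category, text in learnings:
        # Unknown categories fall back to CLAUDE.md (an assumption, not documented behavior).
        grouped[TARGET_FILES.get(category, "CLAUDE.md")].append(text)
    return dict(grouped)


learnings = [
    ("Tools", "Never guess file paths - always verify with ls first"),
    ("Process", "Always use ast-grep for code searches"),
]
for target, items in group_by_target(learnings).items():
    print(target, items)
```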
═══════════════════════════════════════════════════════════════
LANGFUSE REFLECT - Session Analysis
═══════════════════════════════════════════════════════════════
Sessions Analyzed: 5
Time Range: 2024-01-08 to 2024-01-10
┌─────────────────────────────────────────────────────────────┐
│ HIGH CONFIDENCE SIGNALS (3 found) │
├─────────────────────────────────────────────────────────────┤
│ [1] "Never guess file paths - always verify with ls first" │
│ Session: abc123... @ 2024-01-09 │
│ Target: agents/superstar-engineer.md │
│ Proposed: Add to working rules section │
├─────────────────────────────────────────────────────────────┤
│ [2] "Always use ast-grep for code searches" │
│ Session: def456... @ 2024-01-10 │
│ Target: CLAUDE.md │
│ Proposed: Already exists - reinforce │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ MEDIUM CONFIDENCE SIGNALS (2 found) │
├─────────────────────────────────────────────────────────────┤
│ [1] User approved parallel agent pattern │
│ Session: ghi789... @ 2024-01-10 │
│ Pattern: Launch 3+ agents for independent tasks │
└─────────────────────────────────────────────────────────────┘
Apply these learnings? [Y/n/modify]:
The Langfuse hooks (session_start, pre_tool_use, post_tool_use, stop) automatically capture:
This skill reads that data to power reflection and insights.
Requires Langfuse credentials in ~/.secrets:
export LANGFUSE_PUBLIC_KEY="pk-lf-..."
export LANGFUSE_SECRET_KEY="sk-lf-..."
export LANGFUSE_HOST="https://cloud.langfuse.com" # optional
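A script that depends on these variables might validate them before contacting Langfuse. The following is a minimal sketch, assuming the variables have already been exported into the environment by `source ~/.secrets`. The `load_langfuse_config` helper is hypothetical, and treating https://cloud.langfuse.com as the fallback host is inferred from the "optional" comment above rather than confirmed.

```python
import os

# Assumed fallback, inferred from the optional LANGFUSE_HOST line above.
DEFAULT_HOST = "https://cloud.langfuse.com"
REQUIRED_KEYS = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY")


def load_langfuse_config(env=None):
    """Return the Langfuse settings, failing early if a required key is missing."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing Langfuse credentials: " + ", ".join(missing))
    return {
        "public_key": env["LANGFUSE_PUBLIC_KEY"],
        "secret_key": env["LANGFUSE_SECRET_KEY"],
        # An unset or empty LANGFUSE_HOST falls back to the default.
        "host": env.get("LANGFUSE_HOST") or DEFAULT_HOST,
    }


# Placeholder values for illustration; real keys come from ~/.secrets.
config = load_langfuse_config(
    {"LANGFUSE_PUBLIC_KEY": "pk-lf-example", "LANGFUSE_SECRET_KEY": "sk-lf-example"}
)
print(config["host"])  # prints "https://cloud.langfuse.com"
```

Passing the environment in as a dictionary keeps the check testable without touching real credentials.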