skills/braintrust-tracing/SKILL.md
Braintrust tracing for Claude Code - hook architecture, sub-agent correlation, debugging
Comprehensive guide to tracing Claude Code sessions in Braintrust, including sub-agent correlation.
                            PARENT SESSION
                        +---------------------+
                        |    SessionStart     |
                        |   (creates root)    |
                        +----------+----------+
                                   |
                        +----------v----------+
                        |  UserPromptSubmit   |
                        |   (creates Turn)    |
                        +----------+----------+
                                   |
              +--------------------+--------------------+
              |                    |                    |
    +---------v--------+  +--------v--------+  +--------v--------+
    |   PostToolUse    |  |   PostToolUse   |  |   PreToolUse    |
    |   (Read span)    |  |   (Edit span)   |  | (Task - inject) |
    +------------------+  +-----------------+  +--------+--------+
                                                        |
                                             +----------v----------+
                                             |      SUB-AGENT      |
                                             |    SessionStart     |
                                             | (NEW root_span_id)  |
                                             +----------+----------+
                                                        |
                                             +----------v----------+
                                             |    SubagentStop     |
                                             |  (has session_id)   |
                                             +---------------------+
| Hook | Trigger | Creates | Key Fields |
|------|---------|---------|------------|
| SessionStart | Session begins | Root span | session_id, root_span_id |
| UserPromptSubmit | User sends prompt | Turn span | prompt, turn_number |
| PreToolUse | Before tool runs | (modifies Task prompts) | tool_input.prompt |
| PostToolUse | After tool runs | Tool span | tool_name, input, output |
| Stop | Turn completes | LLM spans | model, tokens, tool_calls |
| SubagentStop | Sub-agent finishes | (no span) | session_id of sub-agent |
| SessionEnd | Session ends | (finalizes root) | turn_count, tool_count |
    Session (task span) - root_span_id = session_id
    |
    +-- Turn 1 (task span)
    |   |
    |   +-- claude-sonnet (llm span) - model call with tool_use
    |   +-- Read (tool span)
    |   +-- Edit (tool span)
    |   +-- claude-sonnet (llm span) - response after tools
    |
    +-- Turn 2 (task span)
    |   |
    |   +-- claude-sonnet (llm span)
    |   +-- Task (tool span) -----> [Sub-agent session - SEPARATE trace]
    |   +-- claude-sonnet (llm span)
    |
    +-- Turn 3 ...
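The hierarchy above can be rebuilt from a flat list of span rows. The following is a minimal sketch, assuming each exported row carries `id`, `parent_id`, `name`, and `type` keys; those key names are hypothetical and the real export format may differ.

```python
from collections import defaultdict


def render_tree(spans):
    """Return an indented outline of spans, given flat rows with parent links."""
    # Group rows by their (assumed) parent_id; root rows have parent_id None.
    children = defaultdict(list)
    for span in spans:
        children[span.get("parent_id")].append(span)

    lines = []

    def walk(parent_id, depth):
        for span in children[parent_id]:
            lines.append("  " * depth + f"{span['name']} ({span['type']} span)")
            walk(span["id"], depth + 1)

    walk(None, 0)
    return lines


# Illustrative rows mirroring the first turn of the hierarchy.
spans = [
    {"id": "s1", "parent_id": None, "name": "Session", "type": "task"},
    {"id": "t1", "parent_id": "s1", "name": "Turn 1", "type": "task"},
    {"id": "l1", "parent_id": "t1", "name": "claude-sonnet", "type": "llm"},
    {"id": "r1", "parent_id": "t1", "name": "Read", "type": "tool"},
]
print("\n".join(render_tree(spans)))
```

A sub-agent session would show up as a second root in this outline, because its spans carry a different root and no parent link back to the Task span.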
SessionStart doesn't receive the Task prompt.
We tried injecting trace context into Task prompts via PreToolUse:
# PreToolUse hook injects:
[BRAINTRUST_TRACE_CONTEXT]
{"root_span_id": "abc", "parent_span_id": "xyz", "project_id": "123"}
[/BRAINTRUST_TRACE_CONTEXT]
But SessionStart only receives session metadata, not the modified prompt. The injected context is lost.
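To make the failure concrete, here is a small Python sketch of the injection and of what a receiver would have to do to read it back. The real hook is a shell script; the function names here are hypothetical, and prepending the block to the prompt is an assumption.

```python
import json
import re

OPEN_TAG = "[BRAINTRUST_TRACE_CONTEXT]"
CLOSE_TAG = "[/BRAINTRUST_TRACE_CONTEXT]"


def inject_trace_context(prompt, context):
    """Prepend a tagged JSON block to a Task prompt (what PreToolUse attempts)."""
    block = f"{OPEN_TAG}\n{json.dumps(context)}\n{CLOSE_TAG}"
    return f"{block}\n{prompt}"


def extract_trace_context(text):
    """Return the injected context dict, or None when no tagged block is present."""
    pattern = re.escape(OPEN_TAG) + r"\s*(\{.*?\})\s*" + re.escape(CLOSE_TAG)
    match = re.search(pattern, text, re.DOTALL)
    return json.loads(match.group(1)) if match else None


context = {"root_span_id": "abc", "parent_span_id": "xyz", "project_id": "123"}
modified_prompt = inject_trace_context("Investigate the bug", context)

# The modified prompt does carry the context...
print(extract_trace_context(modified_prompt))

# ...but a SessionStart payload has no prompt field, so there is nothing to parse.
session_start_payload = json.dumps({"session_id": "s1", "type": "start", "cwd": "/tmp"})
print(extract_trace_context(session_start_payload))
```

The second call returns `None`, which is the situation the sub-agent's SessionStart hook is in.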
Task spans in the parent session contain everything:
- agentId - identifier for the sub-agent run
- totalTokens, totalToolUseCount - metrics
- content - full agent response/summary
- tool_input.prompt - original task prompt
- tool_input.subagent_type - agent type (e.g., "oracle")

SubagentStop hook receives the sub-agent's session_id:
- This session_id equals the orphaned trace's root_span_id

Current state: Sub-agents create orphaned traces (new root_span_id).
Correlation method:
- Match agentId or timing with orphaned traces
- The sub-agent's session_id = its trace's root_span_id

Future solution (not yet implemented):
SubagentStop fires -> writes session_id to temp file
PostToolUse (Task) -> reads temp file -> adds child_session_id to Task span metadata
This would link: Task.agentId + Task.child_session_id -> orphaned trace root_span_id
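The handoff described above is not implemented. A rough Python sketch of its shape follows, assuming a single handoff file named `pending_child_session.json` (a hypothetical name) and hypothetical function names. The hooks are independent, so the reader side has to tolerate the file not being there yet.

```python
import json
from pathlib import Path

HANDOFF_NAME = "pending_child_session.json"  # hypothetical file name


def record_child_session(handoff_dir, session_id):
    """SubagentStop side: persist the sub-agent's session_id for later pickup."""
    path = Path(handoff_dir) / HANDOFF_NAME
    path.write_text(json.dumps({"child_session_id": session_id}))
    return path


def attach_child_session(handoff_dir, task_metadata):
    """PostToolUse (Task) side: add child_session_id to the metadata if available."""
    path = Path(handoff_dir) / HANDOFF_NAME
    if not path.exists():
        # Nothing has been written yet: return the metadata unchanged.
        return dict(task_metadata)
    child_session_id = json.loads(path.read_text())["child_session_id"]
    path.unlink()  # consume the handoff so it is not reused for another Task
    return {**task_metadata, "child_session_id": child_session_id}
```

When the file is missing the Task span is left without a `child_session_id`, and correlation has to fall back to `agentId` and timing.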
~/.claude/state/braintrust_sessions/
{session_id}.json # Per-session state
Each session file contains:
{
  "root_span_id": "abc-123",
  "project_id": "proj-456",
  "turn_count": 5,
  "tool_count": 23,
  "current_turn_span_id": "turn-789",
  "current_turn_start": 1703456789,
  "started": "2025-12-24T10:00:00.000Z",
  "is_subagent": false
}
~/.claude/state/braintrust_global.json # Cached project_id
~/.claude/state/braintrust_hook.log # Debug log
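For a quick look at what is on disk, the per-session files can be read with a few lines of Python. This is a sketch that relies only on the directory and field names shown above; the helper names are hypothetical, and a file that fails to parse is reported rather than raised.

```python
import json
from pathlib import Path

# Directory named in this guide; pass another path when experimenting.
DEFAULT_STATE_DIR = Path.home() / ".claude" / "state" / "braintrust_sessions"


def load_session_states(state_dir=DEFAULT_STATE_DIR):
    """Map session_id -> parsed state dict, or None when a file is not valid JSON."""
    states = {}
    for path in sorted(Path(state_dir).glob("*.json")):
        try:
            states[path.stem] = json.loads(path.read_text())
        except json.JSONDecodeError:
            states[path.stem] = None
    return states


def describe(session_id, state):
    """One-line summary built from the fields in the example state file."""
    if state is None:
        return f"{session_id}: unreadable state file"
    kind = "sub-agent" if state.get("is_subagent") else "parent"
    return (
        f"{session_id}: {kind}, turns={state.get('turn_count', 0)}, "
        f"tools={state.get('tool_count', 0)}, root={state.get('root_span_id')}"
    )
```

Calling `describe` on each item of `load_session_states()` gives one line per session, which makes it easy to spot sub-agent sessions and corrupted files.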
# View hook logs in real-time
tail -f ~/.claude/state/braintrust_hook.log
# Check if session has state
cat ~/.claude/state/braintrust_sessions/*.json | jq -s '.'
# Verify environment
echo "TRACE_TO_BRAINTRUST=$TRACE_TO_BRAINTRUST"
echo "BRAINTRUST_API_KEY=${BRAINTRUST_API_KEY:+set}"
# List recent sessions
uv run python -m runtime.harness scripts/braintrust_analyze.py --sessions 5
# Analyze last session
uv run python -m runtime.harness scripts/braintrust_analyze.py --last-session
# Replay specific session
uv run python -m runtime.harness scripts/braintrust_analyze.py --replay <session-id>
# Find sub-agent traces (orphaned roots)
uv run python -m runtime.harness scripts/braintrust_analyze.py --agent-stats
# Enable verbose logging
export BRAINTRUST_CC_DEBUG=true
# Test hooks manually
echo '{"session_id":"test-123","type":"resume"}' | \
bash "$CLAUDE_PROJECT_DIR/.claude/plugins/braintrust-tracing/hooks/session_start.sh"
# Test PreToolUse (Task injection)
echo '{"session_id":"test-123","tool_name":"Task","tool_input":{"prompt":"test"}}' | \
bash "$CLAUDE_PROJECT_DIR/.claude/plugins/braintrust-tracing/hooks/pre_tool_use.sh"
No traces appearing:
- Check TRACE_TO_BRAINTRUST=true in .claude/settings.local.json
- Verify the API key: echo $BRAINTRUST_API_KEY
- Check the log: tail -20 ~/.claude/state/braintrust_hook.log

Sub-agents not linking:
- Run --agent-stats to find agent activity
- Look for agentId in the parent Task span

Missing spans:
- Check current_turn_span_id in session state

State corruption:
- Remove per-session state: rm ~/.claude/state/braintrust_sessions/*.json
- Remove the cached project_id: rm ~/.claude/state/braintrust_global.json

| File | Purpose |
|------|---------|
| .claude/plugins/braintrust-tracing/hooks/common.sh | Shared utilities, API, state management |
| .claude/plugins/braintrust-tracing/hooks/session_start.sh | Creates root span, handles sub-agent context |
| .claude/plugins/braintrust-tracing/hooks/user_prompt_submit.sh | Creates Turn spans per user message |
| .claude/plugins/braintrust-tracing/hooks/pre_tool_use.sh | Injects trace context into Task prompts |
| .claude/plugins/braintrust-tracing/hooks/post_tool_use.sh | Creates tool spans, captures agent/skill metadata |
| .claude/plugins/braintrust-tracing/hooks/stop_hook.sh | Creates LLM spans, finalizes Turns |
| .claude/plugins/braintrust-tracing/hooks/session_end.sh | Finalizes session, triggers learning extraction |
| scripts/braintrust_analyze.py | Query and analyze traced sessions |
| ~/.claude/state/braintrust_sessions/ | Per-session state files |
| ~/.claude/state/braintrust_hook.log | Debug log |
| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| TRACE_TO_BRAINTRUST | Yes | - | Set to "true" to enable |
| BRAINTRUST_API_KEY | Yes | - | API key for Braintrust |
| BRAINTRUST_CC_PROJECT | No | claude-code | Project name |
| BRAINTRUST_CC_DEBUG | No | false | Verbose logging |
| BRAINTRUST_API_URL | No | https://api.braintrust.dev | API endpoint |
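
The table can be mirrored in a small Python check when debugging from outside the hooks. This is a sketch, not the hooks' own parsing: `TracingConfig`, `load_config`, and `missing_required` are hypothetical names, and comparing against the exact string "true" is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class TracingConfig:
    enabled: bool
    api_key: str
    project: str
    debug: bool
    api_url: str


def load_config(env=None):
    """Resolve the variables from the table, applying the documented defaults."""
    env = os.environ if env is None else env
    return TracingConfig(
        enabled=env.get("TRACE_TO_BRAINTRUST", "") == "true",
        api_key=env.get("BRAINTRUST_API_KEY", ""),
        project=env.get("BRAINTRUST_CC_PROJECT", "claude-code"),
        debug=env.get("BRAINTRUST_CC_DEBUG", "false") == "true",
        api_url=env.get("BRAINTRUST_API_URL", "https://api.braintrust.dev"),
    )


def missing_required(config):
    """List the required settings that would stop traces from being sent."""
    problems = []
    if not config.enabled:
        problems.append('TRACE_TO_BRAINTRUST is not "true"')
    if not config.api_key:
        problems.append("BRAINTRUST_API_KEY is not set")
    return problems
```

Running `missing_required(load_config())` in the same shell as Claude Code shows whether either required variable is absent.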
Attempted: Inject trace context via PreToolUse into Task prompts.
Result: Failed - SessionStart only receives session metadata, not the prompt.
Discovery: Task spans already contain rich sub-agent data:
- metadata.agent_type - agent type from subagent_type
- metadata.skill_name - skill from Skill tool
- tool_input - full prompt sent to agent
- tool_output - agent response

Current correlation path:
- Parent Task span carries agentId and timing
- Sub-agent trace has root_span_id = session_id
- SubagentStop receives the sub-agent's session_id
- Still missing: a direct agentId to session_id link

Future work: Write child_session_id to Task span metadata from PostToolUse after SubagentStop.
The Task span has agentId, but the sub-agent has a separate session_id.

1. Prompt injection via PreToolUse
SessionStart hook only receives session metadata (session_id, type, cwd), NOT the prompt. Injected trace context is never seen.
The hook receives:
{
  "session_id": "...",
  "type": "start|resume|compact|clear",
  "cwd": "...",
  "env": {...}
}
No prompt field exists - context injection is impossible at SessionStart.
2. SubagentStop → PostToolUse file handoff
Race condition. These are independent async hooks with no timing guarantees, so PostToolUse for the Task can run before SubagentStop has written the session_id.
3. PreToolUse correlation files
SessionStart can't access the task_span_id because it has no context about which Task spawned it. PreToolUse modifies prompts but doesn't create a reliably accessible state file that SessionStart can find.
Post-hoc matching for dataset building:
Parent session Task spans contain:
- agentId - identifier for the sub-agent run
- totalTokens, totalToolUseCount - aggregated metrics
- content - full agent response/summary
- tool_input.prompt - original task prompt
- tool_input.subagent_type - agent type (e.g., "oracle")

Sub-agent sessions contain:
- session_id (equals orphaned trace root_span_id)

Correlation strategy:
- Find orphaned traces (those with a new root_span_id)
- Match on subagent_type from the Task prompt
- Use the SubagentStop session_id (can be captured and logged)

SessionStart input is intentionally minimal - it contains no prompt or tool context:
interface SessionStartInput {
  session_id: string;
  type: "start" | "resume" | "compact" | "clear";
  cwd: string;
  env: { [key: string]: string };
  // NO: prompt, tool_context, task_span_id, parent_span_id
}
This design boundary prevents real-time correlation at hook time.
For building agent run datasets with sub-agent correlation:
- Capture the sub-agent's session_id in logs or state

Example script pattern:
# 1. Export parent session
uv run python -m runtime.harness scripts/braintrust_analyze.py --replay <parent-session-id> > parent_traces.json
# 2. Query for orphaned sub-agent traces (those created during parent's time window)
uv run python -m runtime.harness scripts/braintrust_analyze.py --agent-stats > all_agent_traces.json
# 3. Correlate in Python:
# - Parent Task spans -> agentId, timestamps, subagent_type
# - Orphaned traces -> root_span_id, timestamps
# - Match by timing and type
This approach is reliable, testable, and doesn't require hooks to maintain implicit state.
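Step 3 of the pattern could look roughly like the sketch below. The record shapes are assumptions: each Task span is taken to have `agentId`, `subagent_type`, `start`, and `end`, and each orphaned trace `root_span_id` and `start` (epoch seconds), with an optional `subagent_type`. The actual output of braintrust_analyze.py may be structured differently and would need mapping into this form first.

```python
def correlate(task_spans, orphan_traces):
    """Pair each parent Task span with the earliest unclaimed orphaned trace
    that started inside the Task's time window (and matches its type, if known)."""
    matches = []
    unclaimed = sorted(orphan_traces, key=lambda trace: trace["start"])
    for task in sorted(task_spans, key=lambda span: span["start"]):
        for trace in unclaimed:
            in_window = task["start"] <= trace["start"] <= task["end"]
            trace_type = trace.get("subagent_type")  # often unknown on the orphan side
            type_ok = trace_type is None or trace_type == task["subagent_type"]
            if in_window and type_ok:
                matches.append({
                    "agentId": task["agentId"],
                    "subagent_type": task["subagent_type"],
                    "child_root_span_id": trace["root_span_id"],
                })
                unclaimed.remove(trace)  # safe: we break out of the loop right after
                break
    return matches, unclaimed


# Illustrative data only.
task_spans = [
    {"agentId": "a1", "subagent_type": "oracle", "start": 100.0, "end": 160.0},
    {"agentId": "a2", "subagent_type": "oracle", "start": 200.0, "end": 230.0},
]
orphan_traces = [
    {"root_span_id": "sess-B", "start": 201.5},
    {"root_span_id": "sess-A", "start": 101.2},
    {"root_span_id": "sess-C", "start": 500.0},
]
matches, leftover = correlate(task_spans, orphan_traces)
```

Timing alone becomes ambiguous when several Task calls overlap, so leftover traces and any overlapping windows are worth reviewing by hand before the pairs go into a dataset.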