skills/exploring-apm-traces/SKILL.md
Investigates distributed application performance using PostHog APM (OpenTelemetry span) data via MCP. Use when the user asks about service traces, slow HTTP/database spans, error spans, trace IDs, or span attributes — not AI observability traces or product logs. Uses posthog:query-apm-spans, posthog:apm-trace-get, posthog:apm-services-list, posthog:apm-attributes-list, and posthog:apm-attribute-values-list.
PostHog captures distributed traces from OpenTelemetry. Each trace is a tree of spans representing a request's path through services.
Disambiguation: This skill is for APM / OpenTelemetry traces. Do not confuse with AI observability traces (agent/model $ai_* events) or logs (posthog:query-logs, posthog:logs-*).
| Tool | Purpose |
| ----------------------------------- | ------------------------------------------------- |
| posthog:query-apm-spans | Search and filter spans (compact list view) |
| posthog:apm-trace-get | Get the full span list for one hex trace_id |
| posthog:apm-spans-aggregate | Per-operation aggregates (count, p50/p95, errors) |
| posthog:apm-spans-tree | Call-tree aggregates per (parent, child) edge |
| posthog:apm-services-list | List distinct service names |
| posthog:apm-attributes-list | List span or resource attribute keys |
| posthog:apm-attribute-values-list | List values for a specific attribute key |
See references/spans-and-fields.md for the response schema and the kind/status_code enums.
posthog:apm-trace-get
{
"trace_id": "<hex_trace_id>"
}
The response is { results: [span, span, …] } — a flat list of every span in the trace.
The list can be very large for fan-out request flows; when it exceeds the inline limit, Claude Code auto-persists it to a file.
From the result you get:
- name, service_name, kind, status_code, parent_span_id, duration_nano, is_root_span
- _posthogUrl: always include this in your response so the user can click through to the UI

When the result is persisted to a file (traces with hundreds of spans across services), use the parsing scripts to explore it.
Start with the summary to get the full picture, then drill into specifics:
# 1. Overview: services, span count, slowest spans, errors
python3 scripts/print_summary.py /path/to/persisted-file.json
# 2. Indented chronological tree (DFS by parent_span_id)
python3 scripts/print_timeline.py /path/to/persisted-file.json
# 3. Drill into a specific span by name
SPAN="HTTP GET /api/users" python3 scripts/extract_span.py /path/to/persisted-file.json
# 4. Search for a keyword across span names, services, IDs
SEARCH="keyword" python3 scripts/search_spans.py /path/to/persisted-file.json
# 5. When the JSON shape looks unfamiliar
python3 scripts/show_structure.py /path/to/persisted-file.json
All scripts support MAX_LEN=N env var to control truncation (0 = unlimited).
The flat span list encodes a tree. Each span carries:

- trace_id: same on every span in the trace
- span_id: this span's unique hex ID
- parent_span_id: points to the parent's span_id (zero-padded hex 000…000 for the root)
- is_root_span: convenience flag for the trace entry

To rebuild the tree:

1. Spans where is_root_span is true (or parent_span_id == "00000000…") are root spans.
2. For every other span, the parent is the span whose span_id matches its parent_span_id.
3. Group spans by parent_span_id, then walk from each root downward.

scripts/print_timeline.py does this for you and prints a DFS-indented tree.
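The rebuild steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code of scripts/print_timeline.py; the function names, the output format, and the three-span example trace are assumptions made for the sketch. Only the field names come from the span schema.

```python
from collections import defaultdict

# Root spans carry an all-zero parent_span_id (16 hex chars for an 8-byte span ID).
ZERO_PARENT = "0" * 16


def build_tree(spans):
    """Return (roots, children); children maps a span_id to its child spans."""
    children = defaultdict(list)
    roots = []
    for span in spans:
        if span.get("is_root_span") or span.get("parent_span_id") == ZERO_PARENT:
            roots.append(span)
        else:
            children[span["parent_span_id"]].append(span)
    # Sort siblings chronologically. Assumption: timestamps share one ISO 8601
    # format, so plain string comparison orders them correctly.
    for siblings in children.values():
        siblings.sort(key=lambda s: s.get("timestamp", ""))
    return roots, children


def render(spans):
    """Depth-first walk from each root, one indented line per span."""
    roots, children = build_tree(spans)
    lines = []

    def walk(span, depth):
        ms = span["duration_nano"] / 1_000_000
        lines.append(f"{'  ' * depth}{span['service_name']} {span['name']} ({ms:.1f} ms)")
        for child in children.get(span["span_id"], []):
            walk(child, depth + 1)

    for root in roots:
        walk(root, 0)
    return lines


# Hypothetical three-span trace, used only to illustrate the walk.
example = [
    {"span_id": "cc", "parent_span_id": "bb", "is_root_span": False,
     "service_name": "user-service", "name": "SELECT users",
     "duration_nano": 5_500_000, "timestamp": "2024-01-01T00:00:00.002Z"},
    {"span_id": "aa", "parent_span_id": ZERO_PARENT, "is_root_span": True,
     "service_name": "api-gateway", "name": "GET /users",
     "duration_nano": 12_000_000, "timestamp": "2024-01-01T00:00:00.000Z"},
    {"span_id": "bb", "parent_span_id": "aa", "is_root_span": False,
     "service_name": "user-service", "name": "UserService.List",
     "duration_nano": 9_000_000, "timestamp": "2024-01-01T00:00:00.001Z"},
]

print("\n".join(render(example)))
```

Spans whose parent is missing from the payload are never reached by this walk, so a production version would probably want to report such orphans separately.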
To find where the time goes:

1. Run print_summary.py; it surfaces the top-5 slowest spans by duration_nano.
2. Run print_timeline.py and scan the indented durations to see whether time is dominated by one child span or by fan-out across many.
3. Drill into a suspect span with SPAN="<name>" python3 scripts/extract_span.py FILE.

To find what failed:

1. print_summary.py lists every span with status_code == 2 (Error). Each entry shows service, span name, and parent context.
2. Follow parent_span_id to see what request path led there.
3. Exception details live in the span's attributes map (e.g. exception.message, exception.type), which is returned in the trace payload, so read it directly off the error span. apm-attribute-values-list is for discovering values across spans, not a prerequisite for reading one span's attributes.

To check which services were involved:

1. Run print_summary.py; it prints the set of services involved in the trace.
2. If an expected service X is missing, use apm-services-list to confirm X has emitted spans recently at all.

To see how the services call each other:

1. print_timeline.py shows the indentation: wide trees mean parallel calls, deep trees mean sequential dependencies.
2. Look for Client (3) spans followed by matching Server (2) spans on the called service; that is a synchronous downstream call.

To work with an attribute (e.g. http.method=POST): each span carries an attributes map (span-level OTel attributes like http.method, db.statement) in the payload, so for a span you already have, just read it. Resource attributes (k8s labels, service.version) are not in the payload. To filter the whole dataset by an attribute:

1. Use apm-attributes-list / apm-attribute-values-list to discover keys and values (resource attributes especially).
2. Call query-apm-spans with a filterGroup entry of type span_attribute or span_resource_attribute.

Both apm-trace-get and query-apm-spans return _posthogUrl; always surface this to the user so they can verify in the PostHog UI.
When presenting findings, include the relevant PostHog URL.
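When the scripts are not at hand, the slow-span and error-span checks described above can be approximated directly on the span list. The sketch below is illustrative only: the helper names and the sample spans are hypothetical, and it assumes that exception details, when present, sit in the span's attributes map under exception.type and exception.message.

```python
def slowest_spans(spans, top=5):
    """Top-N spans by duration_nano, slowest first."""
    return sorted(spans, key=lambda s: s.get("duration_nano", 0), reverse=True)[:top]


def error_details(spans):
    """One summary dict per span whose status_code is 2 (Error)."""
    details = []
    for span in spans:
        if span.get("status_code") != 2:
            continue
        # Attributes may be absent; missing exception keys simply become None.
        attrs = span.get("attributes") or {}
        details.append({
            "service_name": span.get("service_name"),
            "name": span.get("name"),
            "parent_span_id": span.get("parent_span_id"),
            "exception_type": attrs.get("exception.type"),
            "exception_message": attrs.get("exception.message"),
        })
    return details


# Hypothetical spans, for illustration only.
sample = [
    {"service_name": "api-gateway", "name": "GET /users", "status_code": 0,
     "duration_nano": 12_000_000, "parent_span_id": "0000000000000000",
     "attributes": {"http.method": "GET"}},
    {"service_name": "user-service", "name": "SELECT users", "status_code": 2,
     "duration_nano": 9_000_000, "parent_span_id": "aa",
     "attributes": {"exception.type": "TimeoutError",
                    "exception.message": "query timed out"}},
]

for span in slowest_spans(sample, top=1):
    print(span["name"], span["duration_nano"] / 1_000_000, "ms")
for err in error_details(sample):
    print(err["service_name"], err["name"], err["exception_type"])
```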
Use posthog:query-apm-spans to search and filter spans. Note this returns spans, not a tree — pass query.traceId or grab a trace_id from the results and feed it to apm-trace-get for the tree.
Before constructing filters, discover what's actually in the project:
1. Call apm-services-list to see which services have emitted spans.
2. Call apm-attributes-list with attribute_type: "span" or "resource".
3. Call apm-attribute-values-list with a key to see the real values in use.

Only then construct query-apm-spans filters. Custom attributes vary per project and cannot be guessed.
posthog:query-apm-spans
{
"query": {
"serviceNames": ["api-gateway"],
"dateRange": {"date_from": "-1h"},
"filterGroup": [
{"key": "http.status_code", "operator": "gt", "type": "span_attribute", "value": "499"}
]
}
}
posthog:apm-trace-get
{
"trace_id": "0123456789abcdef0123456789abcdef"
}
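Since query-apm-spans returns spans rather than trees, one way to choose which traces to pass on to apm-trace-get is to collect the distinct trace_id values of the spans that actually matched the filter. The helper below is a hypothetical sketch: it relies on the matched_filter field from the span schema, and the limit parameter and sample IDs are assumptions for illustration.

```python
def trace_ids_to_fetch(results, limit=3):
    """Distinct trace IDs of spans that matched the filter, in result order."""
    ordered = []
    for span in results:
        trace_id = span.get("trace_id")
        # matched_filter == 1 marks spans that matched the query itself;
        # 0 marks spans that only share a trace with a match.
        if span.get("matched_filter") == 1 and trace_id and trace_id not in ordered:
            ordered.append(trace_id)
    return ordered[:limit]


# Hypothetical query-apm-spans results, trimmed to the relevant fields.
results = [
    {"trace_id": "0123456789abcdef0123456789abcdef", "matched_filter": 1},
    {"trace_id": "0123456789abcdef0123456789abcdef", "matched_filter": 0},
    {"trace_id": "fedcba9876543210fedcba9876543210", "matched_filter": 1},
]

for trace_id in trace_ids_to_fetch(results):
    print(trace_id)  # each of these can be passed to apm-trace-get
```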
- duration_nano is in nanoseconds; divide by 1_000_000_000 to get seconds. Filter values in query-apm-spans for duration are also nanoseconds.
- status_code == 2 is Error. 0 is Unset, 1 is OK. In the UI filter, OK matches both 0 and 1.
- kind is an integer 0–5: 0 Unspecified, 1 Internal, 2 Server, 3 Client, 4 Producer, 5 Consumer.
- parent_span_id of a root span is "0000000000000000" (16 zero hex chars, matching the 8-byte span ID width, not the 16-byte trace ID width), not null.

Trace tool results are JSON. When too large to read inline, Claude Code persists them to a file.
[{ "type": "text", "text": "{\"results\": [...], \"_posthogUrl\": \"...\"}" }]
Every script in scripts/ unwraps this envelope before parsing.
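The unwrapping step can be sketched as follows. This is an assumption about how such a helper might look, not the code the bundled scripts use; load_results and the sample payload are hypothetical names and data. It accepts either the envelope form shown above or a bare JSON object.

```python
import json


def load_results(raw_text):
    """Return (spans, posthog_url) from a raw or envelope-wrapped tool result."""
    data = json.loads(raw_text)
    # Envelope form: [{"type": "text", "text": "<JSON string>"}]
    if isinstance(data, list):
        text_items = [
            item["text"] for item in data
            if isinstance(item, dict)
            and item.get("type") == "text"
            and isinstance(item.get("text"), str)
        ]
        if not text_items:
            raise ValueError("no text item found in envelope")
        data = json.loads(text_items[0])
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object with a 'results' key")
    return data.get("results", []), data.get("_posthogUrl")


# Hypothetical payload for illustration; real results carry full span dicts.
inner = {"results": [{"name": "GET /users"}], "_posthogUrl": "https://example.invalid/trace"}
envelope = json.dumps([{"type": "text", "text": json.dumps(inner)}])

spans, url = load_results(envelope)
print(len(spans), url)
```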
results (array of span dicts)
└── each span:
├── uuid, trace_id, span_id, parent_span_id (hex strings)
├── name, kind (int 0–5), service_name
├── status_code (int 0–2), is_root_span (bool)
├── timestamp, end_time (ISO 8601)
├── duration_nano (int, nanoseconds)
├── attributes (map of span-level OTel attributes, e.g. db.statement, http.url)
└── matched_filter (0/1 — 1 if this span matched the query-apm-spans filter, 0 if it
only shares a trace with a match; always present, only meaningful from query-apm-spans)
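The integer enums and nanosecond durations in the span fields above are easy to misread, so a small decoding helper can help when presenting results. The sketch below is illustrative: the mapping values follow the kind and status_code enums documented in this skill, while describe and the sample span are hypothetical.

```python
KIND_NAMES = {
    0: "Unspecified", 1: "Internal", 2: "Server",
    3: "Client", 4: "Producer", 5: "Consumer",
}
STATUS_NAMES = {0: "Unset", 1: "OK", 2: "Error"}


def describe(span):
    """Translate a span's integer enums and nanosecond duration to readable values."""
    kind = span.get("kind")
    status = span.get("status_code")
    nanos = span.get("duration_nano", 0)
    return {
        "kind": KIND_NAMES.get(kind, f"Unknown({kind})"),
        "status": STATUS_NAMES.get(status, f"Unknown({status})"),
        "duration_ms": nanos / 1_000_000,      # nanoseconds to milliseconds
        "duration_s": nanos / 1_000_000_000,   # nanoseconds to seconds
    }


# Hypothetical span, trimmed to the fields being decoded.
print(describe({"kind": 3, "status_code": 2, "duration_nano": 1_500_000_000}))
```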
| Script | Purpose | Usage |
| -------------------------------------------------- | ---------------------------------------------------- | -------------------------------------------------- |
| print_summary.py | Trace metadata, services, slowest spans, errors | python3 scripts/print_summary.py FILE |
| print_timeline.py | DFS-indented tree from parent_span_id walk | python3 scripts/print_timeline.py FILE |
| extract_span.py | Full row + parent/children for spans matching a name | SPAN="name" python3 scripts/extract_span.py FILE |
| search_spans.py | Find a keyword across name, service_name, IDs | SEARCH="kw" python3 scripts/search_spans.py FILE |
| show_structure.py | Show JSON keys and types without values | python3 scripts/show_structure.py FILE |
- Always set dateRange on query-apm-spans; queries without a time range are slow. Default is -1h; widen only when needed.
- Always include _posthogUrl in your response so the user can click through.
- Span attributes are in the apm-trace-get / query-apm-spans payload (each span's attributes map). Resource attributes are not; use apm-attributes-list (type resource) and apm-attribute-values-list for those.
- is_root_span is the cheap way to find the trace entry; don't string-match 00000000….
- For aggregates, use apm-spans-aggregate for a flat view or apm-spans-tree for parent→child edges; don't reach for SQL.