skills/investigating-error-issue/SKILL.md
Investigates a single PostHog error tracking issue end-to-end. Use when the user provides an issue ID or pastes an issue URL (`/error_tracking/<id>`) and wants to understand the error — who it affects, what triggers it, when it started, whether it correlates with a release, browser, OS, or feature flag, and what the next step should be. Pulls aggregated metrics, sample exception events, segment breakdowns, linked replays, and synthesizes a hypothesis-grade summary in one pass.
When a user asks "what's going on with this error?" or pastes an issue URL, gather the context they would otherwise have to assemble manually: who is hitting it, what changed, where it happens, and whether a replay shows the cause.
| Tool | Purpose |
| ------------------------------------------- | ------------------------------------------------------------------------------------------- |
| posthog:query-error-tracking-issue | Compact issue details (status, assignee, top frame, release, aggregates) |
| posthog:query-error-tracking-issue-events | Sampled $exception events with stack, URL, browser, $session_id |
| posthog:execute-sql | Breakdowns, release / flag correlations, surrounding events + console logs around the error |
| posthog:query-logs | OTEL log entries around the error timestamp for server-side issues |
| posthog:query-session-recordings-list | Linked replays (delegate ranking to finding-replay-for-issue) |
| posthog:read-data-schema | Confirm property keys before filtering on them |
Fetch the issue record with its compact aggregates and a sparkline:
posthog:query-error-tracking-issue
```json
{
  "issueId": "<issue_id>",
  "dateRange": { "date_from": "-30d" },
  "includeSparkline": true,
  "volumeResolution": 12
}
```
Capture: name, description, status, first_seen, last_seen, assignee,
total occurrences / users / sessions, top in-app frame, latest release
metadata, and the volume buckets.
The sparkline tells you the shape — flat, spike, ramp, or recurring — and that
shape drives the rest of the investigation. If the user only asked a status
question, skip includeSparkline to save tokens.
A captured event has the stack frames, URL, browser, and properties needed to reason about cause. Pull a recent sample first, then an early one to compare.
posthog:query-error-tracking-issue-events
```json
{
  "issueId": "<issue_id>",
  "limit": 1,
  "verbosity": "stack"
}
```
Use verbosity: "raw" only if the truncated stack hides the answer. The tool
defaults to onlyAppFrames: true, which strips vendor frames; flip to false
when the bug appears to live in a third-party library — or when the response
comes back with stacktrace.type: "resolved" but no frames at all (common for
minified bundles where every frame looks vendor-y to the resolver, e.g. React
production builds).
For the earliest sample, narrow dateRange to a tight window around the
issue's first_seen (e.g. set date_from slightly before and date_to
slightly after) and pass orderDirection: "ASC" so you get the earliest
event in the window rather than the latest — the tool defaults to DESC,
which would return a recent event and silently duplicate the first call.
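The earliest-sample call described above can be sketched as a small argument builder. This is a minimal sketch, not the tool's documented contract: the helper name `earliest_sample_args`, the 5-minute padding, and the use of absolute ISO 8601 strings in dateRange are all assumptions (the source only shows a relative value such as "-30d").

```python
from datetime import datetime, timedelta


def earliest_sample_args(issue_id: str, first_seen: str, pad_minutes: int = 5) -> dict:
    """Build arguments for fetching the earliest event around first_seen.

    Hypothetical helper. Assumes first_seen is an ISO 8601 timestamp and that
    the tool accepts absolute ISO timestamps in dateRange.
    """
    # Normalize a trailing "Z" so older Python versions can parse the value.
    seen = datetime.fromisoformat(first_seen.replace("Z", "+00:00"))
    pad = timedelta(minutes=pad_minutes)
    return {
        "issueId": issue_id,
        "dateRange": {
            "date_from": (seen - pad).isoformat(),
            "date_to": (seen + pad).isoformat(),
        },
        # ASC returns the oldest event in the window; the default is DESC.
        "orderDirection": "ASC",
        "limit": 1,
        "verbosity": "stack",
    }
```

Widen `pad_minutes` if the narrow window comes back empty, since first_seen and the event timestamp may not line up exactly.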
If recent and earliest events look materially different — different stack
root, different URL pattern — the issue may be a grouping mistake. Flag for
grouping-noisy-errors instead of continuing as if it were one bug.
Breakdowns aren't a typed tool — drop into execute-sql. Run only the
breakdowns the issue's shape suggests; each one costs a query and clutters the
synthesis.
| Sparkline shape | First breakdown to try |
| ----------------- | ------------------------------------------------------------------------ |
| Spike from zero | By app version / release — almost always a deploy regression (see below) |
| Steady-state high | By browser / OS — rendering or platform-specific bug |
| Ramp | By geography or feature flag — gradual rollout exposure |
| Bursts then quiet | By time of day or $current_url — scheduled job or specific page |
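As a rough illustration of the table above, the shape can be guessed from the volume buckets and mapped to a first breakdown. The thresholds and the function name `classify_shape` are assumptions made for this sketch; the source gives no formal definition of each shape, so treat the output as a hint and confirm by looking at the sparkline.

```python
from statistics import mean

# First breakdown per sparkline shape, copied from the table above.
FIRST_BREAKDOWN = {
    "spike_from_zero": "app version / release",
    "steady_state_high": "browser / OS",
    "ramp": "geography or feature flag",
    "bursts_then_quiet": "time of day or $current_url",
}


def classify_shape(buckets: list[int]) -> str:
    """Label per-bucket occurrence counts with one of the four shapes.

    All thresholds here are illustrative guesses, not PostHog logic.
    """
    if len(buckets) < 4 or max(buckets) == 0:
        raise ValueError("need at least four buckets with some volume")

    # Spike from zero: the first half of the window is silent.
    first_active = next(i for i, v in enumerate(buckets) if v > 0)
    if first_active >= len(buckets) // 2:
        return "spike_from_zero"

    # Bursts then quiet: many empty buckets after the issue first appears.
    zeros_after_start = sum(1 for v in buckets[first_active:] if v == 0)
    if zeros_after_start >= len(buckets) // 3:
        return "bursts_then_quiet"

    # Ramp: the second half is at least twice as busy as the first half.
    half = len(buckets) // 2
    if mean(buckets[half:]) >= 2 * max(mean(buckets[:half]), 1):
        return "ramp"

    return "steady_state_high"
```

For example, `FIRST_BREAKDOWN[classify_shape(buckets)]` gives the breakdown to try first for a list of 12 bucket counts.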
PostHog emits three version-shaped fields. They mean different things and only one of them answers "what version of the user's app introduced this?":
| Property | What it is | Auto-captured by | Use for |
| --------------------- | --------------------------------------------------------- | ---------------------------------------------------------------------------------- | ------------------------------------------------------------------------ |
| $exception_releases | Cymbal-managed release map, keyed by release ID | Only when SDK publishes release metadata (e.g. sourcemap upload tied to a release) | Most precise release attribution when present |
| $app_version | The user's deployed app version | iOS (CFBundleShortVersionString), React Native (Expo / react-native-device-info) | "What deploy of my app introduced this?" — the question users care about |
| $lib_version | The PostHog SDK library version (e.g. posthog-js 1.298.0) | Every SDK on every event | The narrow "did upgrading the PostHog SDK introduce this?" question |
$lib_version is on virtually every event, which makes it tempting — but it's
the PostHog library version, not the user's app version. A constant
$lib_version paired with a spike means the user shipped a regression in
their own code with the SDK unchanged, which is the common case. Reach for
$lib_version only when nothing else is populated and you're explicitly
asking "did upgrading PostHog cause this?".
Web / server / Node / Java / Python projects do not auto-capture
$app_version; the customer has to set it (via register, a context
provider, or before_send). If the breakdown comes back as a single row whose
$app_version is NULL, say so explicitly in the synthesis and suggest the
customer wire it up. Falling back to $exception_releases or to a per-day
timeline by first_seen keeps the investigation moving.
Example ($app_version — populated automatically on mobile, manually on
web / server):
posthog:execute-sql
```sql
SELECT
    properties.$app_version AS app_version,
    count() AS occurrences,
    uniq(person_id) AS users,
    min(timestamp) AS first_seen,
    max(timestamp) AS last_seen
FROM events
WHERE event = '$exception'
    AND (issue_id = '<issue_id>' OR properties.$exception_issue_id = '<issue_id>')
    AND timestamp > now() - INTERVAL 30 DAY
GROUP BY app_version
ORDER BY occurrences DESC
LIMIT 20
```
The (issue_id = ... OR properties.$exception_issue_id = ...) pattern
mirrors the canonical build_issue_where clause from
products/error_tracking/backend/api/query_utils.py. issue_id is the
resolved virtual field on events (it follows fingerprint overrides so
merged/split issues route correctly); properties.$exception_issue_id is
the raw event property captured at ingestion. Filtering on only the property
silently undercounts events for issues that have been merged or split.
If first_seen for one app_version is much later than the issue's overall
first_seen, that release introduced or worsened the bug — strong root-cause
signal. If every row is NULL, the SDK isn't reporting an app version on
this project (common on web / server) — switch to $exception_releases if
the customer ships releases, or fall back to a toDate(timestamp) timeline.
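The reading rule above can be sketched as a post-processing step over the breakdown rows. This is a hedged sketch: the row shape (dicts with `app_version` and `first_seen` as datetime values), the helper names, and the one-day threshold for "much later" are assumptions, not something the tool or the source defines.

```python
from datetime import datetime, timedelta


def all_versions_null(rows: list[dict]) -> bool:
    """True when no row carries an app version (common on web / server)."""
    return all(row["app_version"] is None for row in rows)


def late_starting_versions(
    rows: list[dict],
    issue_first_seen: datetime,
    min_gap: timedelta = timedelta(days=1),  # assumed meaning of "much later"
) -> list[tuple[str, timedelta]]:
    """Return (version, gap) pairs whose first_seen trails the issue's first_seen."""
    suspects = []
    for row in rows:
        if row["app_version"] is None:
            continue  # unattributed events cannot point at a release
        gap = row["first_seen"] - issue_first_seen
        if gap >= min_gap:
            suspects.append((row["app_version"], gap))
    # Smallest gap first, so versions are listed in order of appearance.
    return sorted(suspects, key=lambda pair: pair[1])
```

A version that shows up here is a candidate to investigate, not proof of a regression; check its occurrence share before naming it in the synthesis.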
When $exception_releases is populated, it's a JSON dict keyed by release
ID. There is no top-level $release property; query $exception_releases
directly when you need release attribution and the customer has it wired up.
Repeat with properties.$browser, properties.$os, properties.$current_url,
or any feature flag the project tags errors with.
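Repeating the breakdown for another property only changes the selected column, so the query text can be generated. A minimal sketch follows; the helper name `breakdown_sql`, the `breakdown_value` alias, and both validation patterns are assumptions for illustration. It only handles plain property names such as $browser or $os, not keys that need backtick quoting.

```python
import re

# Conservative allow-lists. Both patterns are assumptions made for this sketch.
_PROPERTY_RE = re.compile(r"\$?[A-Za-z_][A-Za-z0-9_]*")
_ISSUE_ID_RE = re.compile(r"[A-Za-z0-9-]+")


def breakdown_sql(issue_id: str, prop: str, days: int = 30, limit: int = 20) -> str:
    """Render the breakdown query from this step for an arbitrary property."""
    if not _PROPERTY_RE.fullmatch(prop):
        raise ValueError(f"unsupported property name: {prop!r}")
    if not _ISSUE_ID_RE.fullmatch(issue_id):
        raise ValueError(f"unexpected issue id: {issue_id!r}")
    return (
        "SELECT\n"
        f"    properties.{prop} AS breakdown_value,\n"
        "    count() AS occurrences,\n"
        "    uniq(person_id) AS users,\n"
        "    min(timestamp) AS first_seen,\n"
        "    max(timestamp) AS last_seen\n"
        "FROM events\n"
        "WHERE event = '$exception'\n"
        # Same issue filter as the query above: virtual field plus raw property.
        f"    AND (issue_id = '{issue_id}' OR properties.$exception_issue_id = '{issue_id}')\n"
        f"    AND timestamp > now() - INTERVAL {int(days)} DAY\n"
        "GROUP BY breakdown_value\n"
        "ORDER BY occurrences DESC\n"
        f"LIMIT {int(limit)}"
    )
```

Calling `breakdown_sql("<issue_id>", "$browser")` would fail validation on the placeholder's angle brackets, so pass the real issue ID.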
If the user suspects an experiment or rollout, check whether affected users had a flag enabled when the error fired.
To enumerate which flags were evaluated on affected users, parse the
$active_feature_flags property — it is materialized as a JSON-encoded string in
ClickHouse, so arrayJoin(properties.$active_feature_flags) directly will fail;
JSONExtract is the working pattern:
posthog:execute-sql
```sql
SELECT
    arrayJoin(JSONExtract(toString(properties.$active_feature_flags), 'Array(String)')) AS flag,
    count() AS occurrences,
    uniq(person_id) AS users
FROM events
WHERE event = '$exception'
    AND (issue_id = '<issue_id>' OR properties.$exception_issue_id = '<issue_id>')
    AND timestamp > now() - INTERVAL 14 DAY
    AND notEmpty(toString(properties.$active_feature_flags))
GROUP BY flag
ORDER BY occurrences DESC
LIMIT 20
```
Caveat: every event captures every evaluated flag key, so this enumeration often
returns identical counts across flags and doesn't tell you which flag
correlates with the error — only which were on the user. To actually test a
hypothesis, query the per-flag value column properties.$feature/<flag-key>,
which carries the evaluated value (true/false/variant name):
posthog:execute-sql
```sql
SELECT
    properties.`$feature/my-flag-key` AS variant,
    count() AS occurrences,
    uniq(person_id) AS users
FROM events
WHERE event = '$exception'
    AND (issue_id = '<issue_id>' OR properties.$exception_issue_id = '<issue_id>')
    AND timestamp > now() - INTERVAL 14 DAY
GROUP BY variant
ORDER BY occurrences DESC
```
Compare the variant split here to the project's overall exposure on the same flag in the same window. Disproportionate representation of one variant suggests the flag is involved in the cause — not a guarantee, but a strong hypothesis.
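One way to make "disproportionate" concrete is a per-variant ratio of error share to exposure share. The sketch below assumes two inputs gathered separately: affected users per variant (from the query above) and exposed users per variant over the same window (from a baseline query the source does not spell out). The function name `variant_lift` and the ratio itself are illustrative choices, not a PostHog metric.

```python
def variant_lift(error_users: dict[str, int], exposed_users: dict[str, int]) -> dict[str, float]:
    """Share of affected users per variant divided by share of exposed users.

    A value near 1.0 means the variant is hit in proportion to its exposure;
    values well above 1.0 suggest the variant is over-represented.
    """
    total_err = sum(error_users.values())
    total_exp = sum(exposed_users.values())
    if total_err == 0 or total_exp == 0:
        raise ValueError("both inputs need at least one user")

    lift = {}
    for variant, exposed in exposed_users.items():
        if exposed == 0:
            continue  # no exposure, so no meaningful ratio
        err_share = error_users.get(variant, 0) / total_err
        exp_share = exposed / total_exp
        lift[variant] = err_share / exp_share
    return lift
```

With small user counts the ratio is noisy, so report it alongside the raw counts and keep the conclusion at hypothesis strength.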
Use the $session_id from the sample event in step 2 to pull the activity
surrounding the exception. Three sources stack on each other; run the ones
that make sense for the SDK that captured the error.
This mirrors the ET frontend session timeline. It pulls custom events, page views, and other exceptions captured under the same session within a ±1h window:
posthog:execute-sql
```sql
SELECT
    uuid,
    event,
    timestamp,
    properties.$lib AS lib,
    properties.$current_url AS url
FROM events
WHERE $session_id = '<session_id_from_step_2>'
    AND (event = '$exception' OR event = '$pageview' OR left(event, 1) != '$')
    AND timestamp >= toDateTime('<error_timestamp>', 'UTC') - INTERVAL 1 HOUR
    AND timestamp <= toDateTime('<error_timestamp>', 'UTC') + INTERVAL 1 HOUR
ORDER BY timestamp ASC
LIMIT 100
```
The left(event, 1) != '$' clause drops PostHog autocapture / system events
while keeping every custom event. The two explicit conditions,
event = '$exception' and event = '$pageview', re-add the system events worth
seeing on the timeline. This is the same filter the ET UI uses.
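The same predicate is easy to restate outside SQL, for example when filtering events that were already fetched. A small sketch, with `on_timeline` as a hypothetical helper name:

```python
def on_timeline(event: str) -> bool:
    """Mirror the WHERE clause: keep $exception, $pageview, and custom events."""
    # Custom events are the ones whose name does not start with "$".
    return event in {"$exception", "$pageview"} or not event.startswith("$")


events = ["$pageview", "$autocapture", "checkout_started", "$exception", "$set"]
kept = [name for name in events if on_timeline(name)]
```

Here `kept` holds the page view, the custom event, and the exception, in their original order.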
Mixed $lib values in the output are a feature, not noise. When a server SDK
propagates $session_id from the client request (PostHog's own backend does
this), the timeline shows server-side activity inline with the browser side —
"both SDKs when available" for free. Skim the lib column to see how each row
was produced.
The skill defaults to a ±1h window because that's what the UI uses; widen it when an issue's actions are slow (long batch jobs, background workers) or tighten it when only the seconds right before the throw matter.
When session replay is enabled, the replay pipeline emits console.* calls
into the log_entries table tagged with the same session id. Pull them with
the matching window:
posthog:execute-sql
```sql
SELECT timestamp, level, message
FROM log_entries
WHERE log_source = 'session_replay'
    AND log_source_id = '<session_id_from_step_2>'
    AND timestamp >= toDateTime('<error_timestamp>', 'UTC') - INTERVAL 1 HOUR
    AND timestamp <= toDateTime('<error_timestamp>', 'UTC') + INTERVAL 1 HOUR
ORDER BY timestamp ASC
LIMIT 200
```
log_source = 'session_replay' is the discriminator — log_entries is shared
with other sources. Empty results are common: either replay isn't enabled, or
this specific session wasn't recorded. Mention that in the synthesis rather
than treating it as a failure.
For server-side exceptions, correlate the exception timestamp with OTEL log
entries the customer ingests. Many projects don't ingest logs at all — if
query-logs returns nothing or errors, say so and move on. Discover available
services first with logs-attribute-values-list when you don't know which
service produced the error.
posthog:query-logs
```json
{
  "query": {
    "dateRange": {
      "date_from": "<error_timestamp minus 5 minutes>",
      "date_to": "<error_timestamp plus 5 minutes>"
    },
    "severityLevels": ["error", "warn"],
    "serviceNames": ["<service.name if known>"],
    "limit": 50,
    "orderBy": "earliest"
  }
}
```
Caveats worth knowing before relying on this output:
- trace_id / span_id come back zero-padded ("00000000...") when not set. Trace-based correlation only works for explicitly instrumented requests, not for every event.
- service.name is a resource attribute. Narrow with serviceNames rather than a free-text searchTerm when you know the producer.

Hand off to finding-replay-for-issue when picking the best session matters:
popular issues link hundreds of recordings, mostly short crash fragments or
idle-tab sessions, and that skill applies the duration / active-time / recency
ranking that finds the one most likely to show the cause. Hand off too when the
user asks for "a replay" without specifying which.
Skip the hand-off and pull a recording inline via query-session-recordings-list
with session_ids from the sample exception events you already fetched in step 2
when only a handful of sessions are linked, the user already named a specific
session, or any working example will do (e.g. proving the error reproduces).
If neither path returns a recording, mention that session replay may not be enabled for the affected users — useful context, not a failure.
Present in this order:
- … first_seen, plus the release / version that introduced it if a breakdown found one

Keep the synthesis tight. The user wants the answer, not a tour of the data.

- Filter on the issue_id virtual field, with properties.$exception_issue_id as fallback; see Step 3 for the reason and the build_issue_where pattern.
- … $app_version (the user's deployed app version, auto-captured on iOS / React Native and manually set on web / server) or $exception_releases when populated. Avoid $lib_version for this question: it's the PostHog SDK library version, not the user's app. See the "Picking the right version property" subsection in Step 3.
- … first_seen event off the breakdown.
- If query-error-tracking-issue returns an external_issues array, the issue is already linked to a Linear / Jira / GitHub ticket. Mention the link in the synthesis so the user doesn't open a duplicate.