skills/configuring-experiment-analytics/SKILL.md
Configures the analytics side of a PostHog experiment — exposure criteria (default `$feature_flag_called` vs custom exposure events), primary and secondary metrics, the supported metric types (count, sum, ratio with `math` and `math_property`, retention with `retention_window_start` and `start_handling`), multivariate user handling ("Exclude" vs "First seen variant"), and how to read results once the experiment is live. Use when the user adds or edits a primary or secondary metric (e.g. "add a secondary metric tracking 'downloaded_file' per user"), sets up a ratio metric (e.g. "revenue from purchase_completed / pageviews"), sets up a retention metric (e.g. "$pageview → uploaded_file, 7-day window"), configures custom exposure (e.g. "only count users who hit /checkout"), changes multivariate handling, or asks "who is in the analysis?", "how do I measure impact?", "is this winning?", "what's the confidence level?", or "should I ship?".
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
`npx skillsauth add posthog/ai-plugin configuring-experiment-analytics`
This skill answers two questions: "Who is included in the analysis?" and "How do I measure impact?"
Exposure criteria determine which users are counted in the experiment analysis.
Two options:
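- Default: a user is included when the `$feature_flag_called` event fires for the experiment's flag. This is the standard approach: a user is included only when they actually encounter the feature flag in your code.
- Custom exposure event: a user is included when they send an event you choose (e.g. "only count users who hit /checkout").

When a user is exposed to multiple variants (e.g., due to flag changes or race conditions), multivariate handling decides what happens to them:
- "Exclude": those users are dropped from the analysis.
- "First seen variant": those users are kept and attributed to the first variant they saw.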
Bias risk on uneven splits. "Exclude multivariate users" combined with an uneven variant split can introduce bias — multi-variant users are dropped asymmetrically and the smaller variant loses a larger fraction of its assignments. If those users behave differently from the rest, the smaller variant's metrics will be skewed.
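The asymmetry can be seen with a little arithmetic. The sketch below is illustrative only: the experiment size, the 90/10 split, and the number of multi-variant users are made-up values, and it assumes every multi-variant user was counted once in each variant before being excluded.

```python
def fraction_dropped(assigned: int, multi_variant_users: int) -> float:
    """Share of a variant's assigned users removed by the "Exclude" setting."""
    return multi_variant_users / assigned


# Hypothetical numbers, chosen only to illustrate the effect.
total_users = 10_000
split = {"control": 0.9, "test": 0.1}
multi_variant_users = 200  # users who saw both variants

loss = {
    name: fraction_dropped(round(total_users * share), multi_variant_users)
    for name, share in split.items()
}

for name, value in loss.items():
    print(f"{name}: {value:.1%} of assignments dropped")
```

With these numbers the same 200 users are a small slice of the larger variant and a much bigger slice of the smaller one, which is where the skew comes from.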
The right mitigation depends on experiment state; see the configuring-experiment-rollout skill.

`exposure_criteria.filterTestAccounts` (default: `true`) excludes internal/test users from the analysis.
Metric changes require an experiment ID. If the user refers to an experiment by name
or description (e.g. "add metrics to the checkout test"), load the finding-experiments
skill to resolve it to a concrete ID before proceeding.
Metrics are added via experiment-update after creation. The metrics array replaces the entire list, so always get the current experiment first via experiment-get to preserve existing metrics.
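Because the `metrics` array is replaced wholesale, the update is a read-modify-write. A minimal sketch of the merge step follows, assuming the experiment-get response is available as a dict with a `metrics` list. The helper name `with_added_metric` and the sample experiment are hypothetical, and the actual experiment-get and experiment-update tool calls are not shown.

```python
from copy import deepcopy


def with_added_metric(current_experiment: dict, new_metric: dict) -> dict:
    """Build an update payload whose `metrics` list keeps every existing metric."""
    existing = deepcopy(current_experiment.get("metrics") or [])
    return {"metrics": existing + [new_metric]}


# `current` stands in for the experiment-get response (hypothetical shape).
current = {
    "id": 42,
    "metrics": [{"kind": "ExperimentMetric", "metric_type": "funnel"}],
}
payload = with_added_metric(
    current, {"kind": "ExperimentMetric", "metric_type": "mean"}
)
print([m["metric_type"] for m in payload["metrics"]])  # ['funnel', 'mean']
```

Sending only the new metric would silently delete the existing funnel metric, which is why the payload is built from the current list.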
Before suggesting or configuring ANY metric, you MUST call read-data-schema to discover
what events actually exist in the project. Do NOT skip this step. Do NOT suggest event names
based on what you think the project might track — only use events you have confirmed exist.
This applies even when:
Workflow:
1. Call read-data-schema to get the project's events.

Legitimate exception: `allow_unknown_events: true`
Pass this on experiment-create / experiment-update only when the user is intentionally instrumenting an event that hasn't been ingested yet (e.g. setting up the experiment before the code change ships). Confirm this with the user — never use it as a workaround for "the event lookup didn't return what I expected".
Example:
User: "Let's add some metrics for the checkout experiment"
WRONG: "I'd suggest using purchase_completed as the primary metric..."
(hallucinated event name — never seen the project's actual events)
RIGHT: *calls read-data-schema* → "Here are the events in your project
related to checkout: `checkout_step_completed`, `payment_processed`,
`order_confirmed`. Which of these represents a successful checkout?"
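The rule above amounts to a guard that runs before any metric is proposed. The sketch below is one possible shape for it, not part of any PostHog tool: `unknown_events` is a hypothetical helper, and `known` stands in for the event names returned by read-data-schema.

```python
def unknown_events(
    requested: list[str],
    known_events: set[str],
    allow_unknown_events: bool = False,
) -> list[str]:
    """Return requested event names that are missing from the project's schema.

    Raises ValueError unless the caller explicitly opted in, mirroring the
    "confirm with the user first" rule for allow_unknown_events.
    """
    unknown = [name for name in requested if name not in known_events]
    if unknown and not allow_unknown_events:
        raise ValueError(f"Events not found in project schema: {unknown}")
    return unknown


# Stand-in for the read-data-schema result (event names from the example above).
known = {"checkout_step_completed", "payment_processed", "order_confirmed"}

print(unknown_events(["order_confirmed"], known))  # []
print(unknown_events(["purchase_completed"], known, allow_unknown_events=True))
```

Calling it with `purchase_completed` and without the opt-in raises, which is the behaviour the WRONG case in the example should have triggered.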
There are four metric types. Each has kind: "ExperimentMetric":
| metric_type | When to use | Key fields |
| ------------- | -------------------------------------------------------------------------------------- | -------------------------------------------------- |
| "mean" | Average of a numeric property per user (revenue, session duration, pageviews per user) | source EventsNode |
| "funnel" | Conversion rate from exposure through one or more ordered actions | series array of EventsNode steps (1 or more) |
| "ratio" | Rate of one event relative to another | numerator, denominator EventsNode |
| "retention" | Do users come back after exposure? | start_event, completion_event, window config |
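The table can be turned into a quick completeness check on a draft metric before it is sent. This is a sketch built only from the key fields listed above. The retention window fields are left out because the table does not name them, and the inner `{"kind": "EventsNode", "event": ...}` shape in the example is an assumption; references/metric-configuration.md remains the authority on the full JSON.

```python
# Key fields per metric_type, copied from the table above.
REQUIRED_FIELDS = {
    "mean": ["source"],
    "funnel": ["series"],
    "ratio": ["numerator", "denominator"],
    "retention": ["start_event", "completion_event"],
}


def missing_fields(metric: dict) -> list[str]:
    """List the key fields a draft metric still lacks for its metric_type."""
    metric_type = metric.get("metric_type")
    if metric_type not in REQUIRED_FIELDS:
        raise ValueError(f"Unknown metric_type: {metric_type!r}")
    return [field for field in REQUIRED_FIELDS[metric_type] if field not in metric]


# Draft ratio metric with only the numerator filled in (assumed node shape).
draft = {
    "kind": "ExperimentMetric",
    "metric_type": "ratio",
    "numerator": {"kind": "EventsNode", "event": "order_confirmed"},
}
print(missing_fields(draft))  # ['denominator']
```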
Funnel metrics and the implicit exposure step
Funnel metrics automatically prepend the experiment's exposure event as step_0.
So a funnel with 1 step in series is a valid 2-step funnel: exposure → action.
This is the correct choice for measuring "what percentage of exposed users did X?"
Examples:
- exposure → `$pageview` filtered to /login
- exposure → `checkout_completed`

Mean vs funnel for the same event
Both can reference the same event — the difference is whether you care about count/magnitude (mean) or yes/no conversion (funnel).
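A toy calculation makes the distinction concrete. The sketch below is not how PostHog computes results; it only contrasts an average count per exposed user with a yes/no conversion rate over the same events. The user IDs and the helper name `mean_and_conversion` are hypothetical.

```python
from collections import Counter


def mean_and_conversion(
    exposed_users: list[str], event_user_ids: list[str]
) -> tuple[float, float]:
    """Return (events per exposed user, share of exposed users with >= 1 event)."""
    exposed = set(exposed_users)
    if not exposed:
        raise ValueError("No exposed users")
    # Count events only for users who were actually exposed.
    counts = Counter(uid for uid in event_user_ids if uid in exposed)
    mean_per_user = sum(counts.values()) / len(exposed)
    conversion_rate = len(counts) / len(exposed)
    return mean_per_user, conversion_rate


exposed = ["u1", "u2", "u3", "u4"]
events = ["u1", "u1", "u1", "u2"]  # one user ID per occurrence of the event
print(mean_and_conversion(exposed, events))  # (1.0, 0.5)
```

Here one heavy user lifts the mean to 1.0 events per user, while only half of the exposed users converted. Which number matters depends on whether the goal is magnitude or reach.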
See references/metric-configuration.md for detailed JSON examples of each type.
See references/interpreting-results.md for guidance on reading experiment results, statistical significance, and when to ship vs end.