skills/analyzing-experiment-session-replays/SKILL.md
Analyze session replay patterns across experiment variants to understand user behavior differences. Use when the user wants to see how users interact with different experiment variants, identify usability issues, compare behavior patterns between control and test groups, or get qualitative insights to complement quantitative experiment results.
npx skillsauth add posthog/ai-plugin analyzing-experiment-session-replays
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
This skill guides you through analyzing session recordings for experiment variants to understand behavioral differences between control and test groups.
Use this skill when:
Before analyzing session replays:
First, retrieve the experiment information and the feature flag variants (source of truth).
Step 1a: Get experiment metadata
You can either:
- Use the experiment_results_summary tool if you already have the experiment ID from context
- Query the experiments table directly via execute_sql (HogQL):
SELECT
id,
name,
feature_flag_key,
start_date,
end_date
FROM system.experiments
WHERE id = <experiment_id>
AND team_id = <team_id>
From the experiment data, extract:
- feature_flag_key: The feature flag controlling the experiment
- start_date and end_date: The experiment's time range
Step 1b: Get variants from the feature flag
IMPORTANT: Always get variants from the feature flag, NOT from experiment.parameters.feature_flag_variants.
The parameters can be out of sync or deprecated. The feature flag is the source of truth.
Query the feature flag to get the current variants:
SELECT
key,
filters
FROM system.feature_flags
WHERE key = '<feature_flag_key>'
AND team_id = <team_id>
Extract the variant keys from the filters.multivariate.variants array.
Example structure: [{"key": "control", "name": "Control", "rollout_percentage": 50}, {"key": "test", ...}]
The variant key values (e.g., "control", "test", "variant_a") are what you'll use to filter session recordings.
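The extraction step above can be sketched in Python. This is a minimal sketch, assuming the filters column comes back either as a dict or as a JSON string shaped like the example structure; the helper name variant_keys_from_flag_filters is hypothetical and not part of any PostHog tool.

```python
import json
from typing import Any


def variant_keys_from_flag_filters(filters: Any) -> list[str]:
    """Return variant keys from a feature flag's filters.multivariate.variants.

    Assumption: `filters` may arrive as a dict or as a JSON string,
    depending on how the query result is returned.
    """
    if isinstance(filters, str):
        filters = json.loads(filters)
    # A flag without multivariate variants yields an empty list.
    multivariate = (filters or {}).get("multivariate") or {}
    variants = multivariate.get("variants") or []
    return [v["key"] for v in variants if "key" in v]
```

An empty result suggests the flag has no multivariate variants, which is worth confirming before building any per-variant filters.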
For each variant in the experiment, construct recording filters that match users exposed to that variant.
Filter structure for a variant:
{
"date_from": "<experiment.start_date>",
"date_to": "<experiment.end_date or current time>",
"filter_test_accounts": true,
"events": [
{
"id": "$feature_flag_called",
"type": "events",
"properties": [
{
"key": "$feature_flag",
"value": ["<feature_flag_key>"],
"operator": "exact",
"type": "event"
},
{
"key": "$feature/<feature_flag_key>",
"value": ["<variant_key>"],
"operator": "exact",
"type": "event"
}
]
}
]
}
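The filter structure above can be generated once per variant instead of being written by hand. This is a sketch under stated assumptions: build_variant_filters is a hypothetical helper, the plain "YYYY-MM-DD" dates are borrowed from the worked example in this skill, and whether filter_session_recordings prefers plain dates or full ISO timestamps should be checked against the tool itself.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def build_variant_filters(
    flag_key: str,
    variant_key: str,
    date_from: str,
    date_to: Optional[str] = None,
) -> dict[str, Any]:
    """Build recording filters matching users exposed to one variant.

    Mirrors the filter structure above. If date_to is None (the experiment
    is still running), the current UTC time is used as the end of the range.
    """
    if date_to is None:
        date_to = datetime.now(timezone.utc).isoformat()
    return {
        "date_from": date_from,
        "date_to": date_to,
        "filter_test_accounts": True,  # exclude internal/test users
        "events": [
            {
                "id": "$feature_flag_called",
                "type": "events",
                "properties": [
                    # Only flag calls for the experiment's flag...
                    {"key": "$feature_flag", "value": [flag_key], "operator": "exact", "type": "event"},
                    # ...that resolved to this specific variant.
                    {"key": f"$feature/{flag_key}", "value": [variant_key], "operator": "exact", "type": "event"},
                ],
            }
        ],
    }


# One filter set per variant; each is passed to its own filter_session_recordings call.
filters_by_variant = {
    v: build_variant_filters("checkout-flow-test", v, "2025-01-01", "2025-01-31")
    for v in ["control", "new-checkout"]
}
```

Generating the dict keeps the $feature_flag and $feature/<flag_key> properties in sync, so a variant filter cannot accidentally point at a different flag than the one it names.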
Key points:
- Use $feature_flag_called events where the flag matches the experiment's feature flag
- Use the $feature/<flag_key> property to filter for the specific variant value
- Set filter_test_accounts: true to exclude test users
Use the filter_session_recordings tool with the filters constructed in step 2.
Call the tool once per variant to get recordings for each group:
The tool returns a list of recordings with metadata including:
Compare the recordings between variants by looking for:
Quantitative patterns:
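For the quantitative side of the comparison, per-variant averages can be computed from the returned recordings. This is a minimal sketch: the metadata field names duration (in seconds) and console_error_count are hypothetical placeholders, to be mapped onto whatever fields filter_session_recordings actually returns. "Percentage with errors" here means the share of sessions with at least one console error, which is one reasonable reading of a figure like "12% console errors".

```python
from statistics import mean
from typing import Any


def summarize_variant(recordings: list[dict[str, Any]]) -> dict[str, float]:
    """Aggregate quantitative signals for one variant's recordings.

    Hypothetical field names: `duration` (seconds) and
    `console_error_count`; adapt them to the real recording metadata.
    """
    if not recordings:
        return {"count": 0, "avg_duration_s": 0.0, "pct_with_errors": 0.0}
    durations = [r.get("duration", 0) for r in recordings]
    # Count sessions that logged at least one console error.
    with_errors = sum(1 for r in recordings if r.get("console_error_count", 0) > 0)
    return {
        "count": len(recordings),
        "avg_duration_s": mean(durations),
        "pct_with_errors": 100.0 * with_errors / len(recordings),
    }
```

Running this once per variant gives directly comparable numbers; recording counts should be reported alongside them, since small samples make averages unstable.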
Qualitative insights:
Summarize the behavioral differences between variants, highlighting:
User: "How are users behaving in my checkout experiment?"
Agent steps:
1. Query experiment details (ID: 123, feature_flag_key: "checkout-flow-test", date range: 2025-01-01 to 2025-01-31)
2. Query feature flag "checkout-flow-test" to get variants from filters.multivariate.variants
3. Extract variant keys: "control" and "new-checkout"
4. Build filters for control variant:
- Events: $feature_flag_called with $feature_flag="checkout-flow-test" and $feature/checkout-flow-test="control"
- Date range: 2025-01-01 to 2025-01-31
5. Call filter_session_recordings with control filters → 147 recordings found
6. Build filters for new-checkout variant and call filter_session_recordings → 152 recordings found
7. Compare patterns:
- Control: Average 3m 45s session duration, 12% of sessions with console errors
- New-checkout: Average 2m 30s session duration, 5% of sessions with console errors
8. Present findings:
"I analyzed session replays for your checkout experiment. The new checkout flow shows:
- 33% shorter average session duration (2m 30s vs 3m 45s)
- 58% fewer sessions with console errors (5% vs 12%)
- Users in the new variant navigate directly to payment, while control users often backtrack to review their cart
- Recommendation: The new checkout flow reduces friction and errors"
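The percentages in this example are relative reductions against control: (225 - 150) / 225 ≈ 33% for session duration (3m 45s = 225 s, 2m 30s = 150 s) and (12 - 5) / 12 ≈ 58% for the error rate. Note that the error figure is a relative drop; the absolute difference is 7 percentage points. A small sketch to reproduce the arithmetic, with hypothetical helper names:

```python
def relative_reduction(control: float, variant: float) -> float:
    """Percentage by which `variant` is lower than `control` (negative if higher)."""
    if control == 0:
        raise ValueError("control value must be non-zero")
    return 100.0 * (control - variant) / control


def mmss_to_seconds(text: str) -> int:
    """Parse a duration written like '3m 45s' into seconds."""
    minutes, seconds = text.replace("m", "").replace("s", "").split()
    return int(minutes) * 60 + int(seconds)


# Session duration: 225 s (control) vs 150 s (new-checkout)
print(round(relative_reduction(mmss_to_seconds("3m 45s"), mmss_to_seconds("2m 30s"))))  # 33
# Share of sessions with console errors: 12% vs 5%
print(round(relative_reduction(12, 5)))  # 58
```

Stating which baseline a percentage is relative to avoids presenting a 7-point drop as if it were 58 points.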
Do not make assumptions:
Filter construction:
- The $feature/<flag_key> property is how PostHog tracks which variant a user saw
- Use $feature_flag_called as the event type
- Both the $feature_flag and $feature/<flag_key> properties are required
Error handling:
- filter_session_recordings: Core tool for retrieving session recordings with filters
- experiment_results_summary: Get experiment metadata and statistical results
- execute_sql: Query the experiments table for details via HogQL