skills/configuring-experiment-rollout/SKILL.md
Configures the rollout shape of a PostHog experiment — the variant split (50/50, 80/20, A/B/C ratios), the overall rollout percentage that gates how many users enter the experiment, and the disambiguation when a percentage like "roll out to 25%" could mean either. Use when the user mentions a rollout percentage, variant split, or traffic distribution; gives a ratio like 60/40, 70/30, or 80/20; asks "who sees the test variant?"; wants to increase, decrease, or change the rollout or split on a draft or running experiment; weighs equal vs uneven splits; or proposes a mid-experiment split change (often an anti-pattern that needs reset or end-and-restart).
This skill answers: Who sees what variant?
In most cases, experiments work best with an equal split. If you want to limit exposure to the test variant, adjust the rollout percentage instead.
Why equal splits are better:
Always default to an equal split unless the user explicitly requests otherwise.
Uneven splits combined with the default "Exclude multivariate users" handling can introduce bias. If the experiment observes multi-variant users (users exposed to more than one variant) then those are dropped asymmetrically — the smaller variant loses a larger fraction of its assignments. If those users behave differently from the rest, the smaller variant's metrics will be skewed.
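The asymmetry can be seen with a little arithmetic. The sketch below is illustrative only: it assumes (as a simplification, not a statement about PostHog's implementation) that users who were exposed to more than one variant are removed from every arm they appeared in, so each arm loses the same absolute number of users. The user counts are made up.

```python
def excluded_fraction(assigned: int, multi_variant_users: int) -> float:
    """Fraction of one arm's assignments lost when multi-variant users are dropped."""
    if assigned <= 0:
        raise ValueError("assigned must be positive")
    return multi_variant_users / assigned


total_users = 10_000        # hypothetical experiment size
multi_variant_users = 200   # hypothetical: users seen in both arms, dropped from each

for label, control_share in (("50/50", 0.5), ("80/20", 0.8)):
    control = round(total_users * control_share)
    test = total_users - control
    print(
        label,
        f"control loses {excluded_fraction(control, multi_variant_users):.1%},",
        f"test loses {excluded_fraction(test, multi_variant_users):.1%}",
    )
```

Under these assumptions an equal split loses the same fraction from both arms, while the 80/20 split loses four times as large a fraction from the smaller arm.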
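The right mitigation depends on experiment state: before launch, prefer an equal split with a reduced rollout; on a running experiment, switch multivariate handling to "First seen variant" rather than changing the split. See configuring-experiment-analytics for how to set this. Note that "first seen" handling can introduce other biases, but it's preferable to mid-run reassignment.

There are two separate controls that determine who sees what. Both are set via `parameters`.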
Variant split (`parameters.feature_flag_variants`)

How users inside the experiment are distributed across variants.

- Each variant is `{key, name, split_percent}`; percentages must sum to 100
- `"control"` is the baseline variant

If the user says "A/B/C test", map the baseline to `"control"` and create additional variants for the others.
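A minimal sketch of building and checking a variant list before sending it. The field names `key`, `name`, and `split_percent` come from the text above; the helper `build_variants`, the integer percentages, the derived display names, and placing `"control"` first are assumptions made for this example, not documented API behavior.

```python
from typing import TypedDict


class Variant(TypedDict):
    key: str
    name: str
    split_percent: int  # assumption: whole-number percentages


def build_variants(splits: dict[str, int]) -> list[Variant]:
    """Validate a {variant_key: percent} mapping and return variant dicts."""
    if "control" not in splits:
        raise ValueError('a "control" variant is required as the baseline')
    total = sum(splits.values())
    if total != 100:
        raise ValueError(f"split percentages must sum to 100, got {total}")
    # Assumption for this sketch: list the baseline first.
    ordered = ["control"] + [key for key in splits if key != "control"]
    return [
        {"key": key, "name": key.replace("_", " ").title(), "split_percent": splits[key]}
        for key in ordered
    ]


# "A/B/C test": the baseline maps to "control", the others become extra variants.
variants = build_variants({"control": 34, "test_a": 33, "test_b": 33})
print(variants)
```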
Overall rollout (`parameters.rollout_percentage`)

What percentage of all users enter the experiment at all. Default: 100%.
Users not included are excluded entirely — they don't see any variant and are not part of the analysis.
These two controls multiply:
| Overall rollout | Variant split      | % seeing test | % in analysis |
| --------------- | ------------------ | ------------- | ------------- |
| 100%            | 50/50              | 50%           | 100%          |
| 100%            | 75/25 control/test | 25%           | 100%          |
| 50%             | 50/50              | 25%           | 50%           |
| 25%             | 50/50              | 12.5%         | 25%           |
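The multiplication behind the table can be sketched in a few lines. The function name `exposure` is hypothetical; it only reproduces the arithmetic, with the rollout percentage also being the share of users in the analysis.

```python
def exposure(rollout_percent: float, test_split_percent: float) -> tuple[float, float]:
    """Return (% of all users seeing the test variant, % of all users in the analysis)."""
    seeing_test = rollout_percent * test_split_percent / 100
    in_analysis = rollout_percent
    return seeing_test, in_analysis


# (overall rollout %, test variant's share of the split %), one pair per table row
rows = [(100, 50), (100, 25), (50, 50), (25, 50)]
for rollout, test_split in rows:
    seeing_test, in_analysis = exposure(rollout, test_split)
    print(f"rollout={rollout}% test_split={test_split}% "
          f"seeing_test={seeing_test}% in_analysis={in_analysis}%")
```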
CRITICAL: If the user requests an uneven variant split (e.g. "60/40", "70/20/10") or mentions a specific percentage that could refer to either the split or the rollout (e.g. "roll out to 25%"), you MUST clarify before proceeding. This covers two cases:
The percentage is ambiguous — it could mean a variant split or a rollout change. Ask:
There are two ways to get 25% of users seeing the test variant:
- Reduced rollout with equal split (recommended): reduce the overall rollout to 50% and split variants 50/50. Only half of users enter the experiment, and of those, each variant gets the same share, so 25% of all users see the test variant. Equal splits maximize statistical power and avoid bias.
- Asymmetric split: keep 100% rollout but give the test variant only 25%. All users enter the experiment, but the uneven split reduces power on the smaller variant and risks bias.
Which approach do you prefer?
Adjust the numbers to match whatever percentage the user requested.
The ratio looks like an explicit variant split, but a reduced rollout with an equal split is almost always better. Explain the trade-off and recommend the alternative:
An uneven variant split works, but an equal split with reduced rollout is recommended:
- Equal split + reduced rollout (recommended): reduce the overall rollout so that the same fraction of users sees the test variant, but split variants equally within the experiment. Equal splits maximize statistical power and avoid bias from asymmetric multivariate exclusion.
- Uneven split: achieves the same user-facing outcome, but reduces power on the smaller variant and risks bias.
Would you like the equal split approach, or do you have a specific reason for the uneven split?
Adjust the numbers to match the ratio. For experiments with more than two variants, "equal" means each variant gets the same share (e.g. 34/33/33 for three variants). If the user confirms they want the uneven split after seeing the trade-off, proceed — but DO NOT skip the next section.
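When adjusting the numbers, the equal-split alternative can be derived from the share of users who should see a test variant. This is a sketch under the assumption that every variant, including control, gets the same share; `equal_split_rollout` is a hypothetical helper, not part of any PostHog API.

```python
def equal_split_rollout(target_test_percent: float, n_variants: int = 2) -> float:
    """Overall rollout % so that each variant reaches target_test_percent of all users."""
    if n_variants < 2:
        raise ValueError("an experiment needs at least two variants")
    rollout = target_test_percent * n_variants
    if rollout > 100:
        # No equal-split equivalent exists: the target share exceeds 100 / n_variants.
        raise ValueError("target share is too large for an equal split")
    return rollout


print(equal_split_rollout(25))      # "roll out to 25%" as an equal two-variant split
print(equal_split_rollout(20))      # user asked for 80/20 control/test
print(equal_split_rollout(10, 3))   # three variants, 10% of users per variant
```

A request whose test share is above the equal share (for example 40/60 control/test) has no reduced-rollout equivalent, which is why the helper raises instead of returning a value above 100.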
If the user proceeds with an uneven split (option 2 in either case above), you MUST surface the multivariate-handling implication BEFORE creating or updating the experiment. The user has chosen the riskier rollout path and needs to make an informed choice about how to mitigate.
Ask:
One more thing — with an uneven split, the default "Exclude multivariate users" handling drops users exposed to multiple variants asymmetrically. The smaller variant loses a larger fraction of its assignments, which can skew its metrics if those users behave differently from the rest.
Two options:
- Switch multivariate handling to "First seen variant" (recommended for uneven splits) — keeps all users in the analysis and avoids asymmetric exclusion. Has its own caveats (other biases can creep in) but is preferable to the default for uneven splits.
- Keep the default "Exclude" handling and accept the bias risk.
Which would you like?
See configuring-experiment-analytics for how to set the multivariate handling. Apply the choice
as part of the same operation (creation or update) — do not leave the user with an uneven split
under default handling without an explicit, informed decision.
This option (`ensure_experience_continuity` on the feature flag) is only relevant when the experiment spans pre- and post-authentication experiences.
This is not compatible with all setups. Learn more: https://posthog.com/docs/feature-flags/creating-feature-flags#persisting-feature-flags-across-authentication-steps
Only mention this to the user if their use case involves pre/post-authentication experiences.
Rollout changes require an experiment ID. If the user refers to an experiment by name
or description (e.g. "change rollout on my signup test"), load the finding-experiments
skill to resolve it to a concrete ID before proceeding.
Any change to rollout or variant split on a running experiment affects both user experience and statistical validity. You MUST warn the user and get explicit confirmation before making the change.
Do NOT silently apply the change, even if the user asked for it directly. Present a warning covering both perspectives: the effect on user experience and the effect on statistical validity.
Exception: Increasing rollout (without changing the split) is generally safe — no users switch variants, more users are added cleanly.
Mid-experiment fix for uneven-split bias: switching multivariate handling from "Exclude" to "First
seen variant" is the recommended mitigation for already-launched experiments — no users switch variants
and all collected data stays in the analysis. Changing the split to be even is an anti-pattern mid-run
(typically requires resetting or ending the experiment) and is only preferred if the experiment hasn't
been exposed to many users yet. See configuring-experiment-analytics for how to change the handling.
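The rule and its exception can be expressed as a small check. This is one possible reading of the guidance above, not code from the skill: `Distribution` and `needs_warning` are hypothetical names, and treating a rollout-only increase as exempt from the full warning is an interpretation of "generally safe".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Distribution:
    rollout_percent: float
    splits: dict[str, float]  # variant key -> split percent


def needs_warning(is_running: bool, current: Distribution, proposed: Distribution) -> bool:
    """True when a change should be confirmed by the user before it is applied."""
    if not is_running or current == proposed:
        return False  # drafts and no-op changes are outside the running-experiment rule
    split_unchanged = current.splits == proposed.splits
    rollout_increased = proposed.rollout_percent > current.rollout_percent
    # Exception: raising rollout without touching the split adds users cleanly.
    return not (split_unchanged and rollout_increased)


even = {"control": 50, "test": 50}
now = Distribution(rollout_percent=50, splits=even)
print(needs_warning(True, now, Distribution(100, even)))                         # rollout increase only
print(needs_warning(True, now, Distribution(25, even)))                          # rollout decrease
print(needs_warning(True, now, Distribution(50, {"control": 80, "test": 20})))   # split change
```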
See references/changing-distribution-after-launch.md for detailed warnings, what to tell the user, and when to recommend alternatives.