skills/creating-experiments/SKILL.md
Guides agents through the 3-step experiment creation flow: defining the hypothesis, configuring rollout, and setting up analytics. Delegates rollout decisions to configuring-experiment-rollout and metric setup to configuring-experiment-analytics. TRIGGER when: user asks to create a new experiment or A/B test, OR when you are about to call experiment-create. DO NOT TRIGGER when: user is updating an existing experiment, managing lifecycle, or only browsing experiments.
npx skillsauth add posthog/ai-plugin creating-experiments
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
This skill walks through the 3-step flow for creating a new A/B test experiment.
Create the experiment as a draft quickly, then iterate on metrics and configuration. The user gets a tangible draft immediately and can refine it.
Gather these before calling experiment-create:
description)
"product". The "web" value is reserved for no-code experiments configured visually with the PostHog toolbar in a browser; it cannot be meaningfully driven via MCP. If a user asks for a no-code/toolbar experiment, point them to the PostHog UI instead of creating one here.)
If the user gives enough context to infer these, don't ask; just proceed.
This is about rollout configuration.
Before asking any rollout question, load configuring-experiment-rollout. The disambiguation wording, recommendations, and post-answer branches live there — do not formulate rollout questions yourself, and do not assume an example you remember covers the user's path.
Key decision points (covered in detail by configuring-experiment-rollout):
If the user doesn't mention rollout specifics, use defaults: 50/50 control/test, 100% rollout.
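A minimal sketch of these defaults, assuming the field names from the `experiment-create` payload in this skill. The helper name `with_rollout_defaults` is hypothetical and not part of any PostHog API; it only shows how user-supplied rollout settings could override the 50/50, 100% baseline.

```python
from typing import Any, Dict, Optional


def with_rollout_defaults(
    user_parameters: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Return experiment parameters, filling in the default rollout settings.

    Hypothetical helper: defaults are 50/50 control/test and 100% rollout.
    Any key present in user_parameters replaces the default wholesale.
    """
    parameters: Dict[str, Any] = {
        "feature_flag_variants": [
            {"key": "control", "name": "Control", "split_percent": 50},
            {"key": "test", "name": "Test", "split_percent": 50},
        ],
        "rollout_percentage": 100,
    }
    parameters.update(user_parameters or {})
    return parameters
```

Calling `with_rollout_defaults()` with no arguments yields the default configuration; passing `{"rollout_percentage": 20}` keeps the 50/50 split and changes only the share of users who enter the experiment.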
This is about analytics and metrics. Load the configuring-experiment-analytics skill for guidance.
Do NOT configure metrics on creation. Metrics are not passed to experiment-create — they are added
afterwards via experiment-update. This keeps the creation call lightweight.
When the user specifies metrics upfront, acknowledge them and add them immediately after creation. When they don't, create the draft and then guide them through metric setup as a follow-up.
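The ordering described here (create the draft first, add metrics afterwards) can be sketched as a small planning function. This is an illustration only: `plan_tool_calls` is a hypothetical name, and the `{"metrics": ...}` argument shape for `experiment-update` is a placeholder assumption, since the real schema is covered by configuring-experiment-analytics.

```python
from typing import Any, Dict, List, Optional, Tuple

ToolCall = Tuple[str, Dict[str, Any]]


def plan_tool_calls(
    create_args: Dict[str, Any],
    metrics: Optional[List[Dict[str, Any]]] = None,
) -> List[ToolCall]:
    """Order the calls: create the draft first, add metrics afterwards."""
    # Assumption: a top-level "metrics" key is how metrics would be
    # (wrongly) smuggled into the creation call.
    if "metrics" in create_args:
        raise ValueError("metrics must not be passed to experiment-create")

    calls: List[ToolCall] = [("experiment-create", create_args)]
    if metrics:
        # Placeholder argument shape; the real experiment-update schema
        # is not defined by this sketch.
        calls.append(("experiment-update", {"metrics": metrics}))
    return calls
```

When the user names metrics upfront, the plan contains both calls in order; when they don't, it contains only the creation call and metric setup becomes a follow-up conversation.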
Call experiment-create with:
{
  "name": "Descriptive experiment name",
  "feature_flag_key": "kebab-case-key",
  "description": "Hypothesis: [what you expect to happen]",
  "parameters": {
    "feature_flag_variants": [
      { "key": "control", "name": "Control", "split_percent": 50 },
      { "key": "test", "name": "Test", "split_percent": 50 }
    ],
    "rollout_percentage": 100
  }
}
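The constraints this skill states for `parameters` (2 to 20 variants, `split_percent` values summing to 100, `rollout_percentage` between 0 and 100) can be checked before the call. The sketch below is a hypothetical pre-flight check, not something the `experiment-create` tool provides, and it assumes whole-number percentages.

```python
from typing import Any, Dict, List


def check_parameters(parameters: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems: List[str] = []

    variants = parameters.get("feature_flag_variants", [])
    if not 2 <= len(variants) <= 20:
        problems.append(f"expected 2 to 20 variants, got {len(variants)}")

    # split_percent: how users inside the experiment are divided.
    total = sum(v.get("split_percent", 0) for v in variants)
    if total != 100:
        problems.append(f"split_percent values sum to {total}, expected 100")

    # rollout_percentage: share of all users who enter the experiment.
    # It defaults to 100 when omitted.
    rollout = parameters.get("rollout_percentage", 100)
    if not 0 <= rollout <= 100:
        problems.append(f"rollout_percentage {rollout} is outside 0-100")

    return problems
```

A typical mix-up this catches is a request for "20% of users" being written into a variant's `split_percent` rather than into `rollout_percentage`, which leaves the variant splits summing to something other than 100.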
Two different percentages — do NOT mix them up:
- feature_flag_variants[].split_percent: how users inside the experiment are split across variants (must sum to 100, recommended to have an even split).
- parameters.rollout_percentage: what fraction of all users enter the experiment at all (0-100, defaults to 100).
Key details:
- "control". Minimum 2, maximum 20 variants.
- rollout_percentage defaults to 100 if omitted.
- stats_config if the user requests Frequentist.
Always show the experiment URL. The experiment-create response includes _posthogUrl; always display this link so the user can view and configure the experiment in the UI.
Remind the user to implement the feature flag in code. Link to the experiment page and say "implement the flag as shown here" — the experiment detail page shows implementation snippets for the user's SDK.
Guide through metrics if not yet configured — load the configuring-experiment-analytics skill.
Launch when ready — use the experiment-launch tool.
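The follow-up steps above can be sketched as a message builder. It assumes only that the `experiment-create` response is a dictionary containing `_posthogUrl`, as stated in this skill; the function name, its `metrics_configured` flag, and the exact wording of the message are illustrative.

```python
from typing import Any, Dict


def post_creation_message(
    response: Dict[str, Any], metrics_configured: bool
) -> str:
    """Build the follow-up message shown after experiment-create returns."""
    url = response["_posthogUrl"]  # always surface this link to the user
    lines = [
        f"Experiment created: {url}",
        f"Implement the feature flag in code as shown here: {url}",
    ]
    if not metrics_configured:
        lines.append(
            "Next: configure metrics (see configuring-experiment-analytics)."
        )
    lines.append("When ready, launch with the experiment-launch tool.")
    return "\n".join(lines)
```

The URL appears twice on purpose: once as the experiment link and once as the pointer to the implementation snippets on the experiment detail page.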