skills/ab-test-planner/SKILL.md
Design statistically rigorous A/B tests for product features, UI changes, onboarding flows, and pricing experiments. Use when asked to set up an experiment, design an A/B test, calculate sample size, or interpret test results. Produces a complete test plan with hypothesis, variant definitions, sample size, duration estimate, guardrail metrics, and a results interpretation guide.
npx skillsauth add mohitagw15856/pm-claude-skills ab-test-plannerInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Design experiments that produce trustworthy results — not just directional signals. Every test output includes hypothesis, success metrics, sample size, duration, and a results interpretation guide.
Ask the user for these if not provided:
Before running any test, confirm:
"We believe that [change] will cause [primary metric] to [increase/decrease] by [X%] for [user segment], because [rationale based on data or insight]."
Never run a test without a directional hypothesis. "Let's just see what happens" is not a hypothesis.
Use this formula (provide the output, not the formula, to the user):
For common scenarios, provide pre-calculated estimates:
| Baseline Rate | MDE (Relative) | Required Sample per Variant | |---|---|---| | 5% | 20% | ~19,000 | | 10% | 15% | ~14,000 | | 20% | 10% | ~15,000 | | 40% | 10% | ~9,500 | | 60% | 5% | ~42,000 |
Always warn: "These are estimates. Use a tool like Evan Miller's calculator or Statsig for precision."
Minimum: 2 full weeks (to capture weekly seasonality) Maximum: 4 weeks (novelty effect distorts results beyond this)
Duration = Required sample ÷ (Daily traffic × % exposed)
Flag if traffic is too low to reach significance in under 8 weeks — recommend a different approach (e.g., holdout test, qualitative research).
Hypothesis:
[Filled hypothesis template]
Variants:
Primary Metric: [Metric name + how measured] Guardrail Metrics: [Metrics that must not degrade]
Target Segment: [Who sees the test — % of traffic, user type] Traffic Split: [50/50 recommended unless ramp-up needed]
Sample Size Required: ~[N] users per variant Estimated Duration: [X] weeks (based on [Y] daily eligible users) Significance Threshold: 95% confidence, 80% power
Exclusions: [Any user segments to exclude and why]
Rollback Trigger: If [guardrail metric] degrades by [X%], stop the test immediately.
Results Interpretation Guide:
development
Analyse competitor moves and translate them into strategic implications for your product roadmap. Use when a competitor announces a new feature, pricing change, partnership, or strategic shift, or when producing a periodic competitive intelligence report. Produces a categorised signal analysis with reactive-vs-proactive assessment, threat ratings, specific roadmap implications, and recommended responses with owners.
development
Build a community management playbook for a brand's social media channels. Use when asked to create guidelines for managing comments, DMs, and community interactions, define a moderation policy, or build response frameworks for social media community managers. Produces a complete playbook with response templates, escalation paths, moderation rules, and tone guidelines.
development
Activate a 4-stage coding discipline framework that forces Claude to plan before coding, isolate changes on a branch, write tests first, and self-review output twice before presenting it. Use when starting a complex coding task, when past Claude sessions produced broken first drafts, or when you want to prevent rework cycles. Produces a confirmed written plan, isolated feature branch, test-first implementation, and a double-reviewed output with a correctness and code-quality checklist.
development
Optimize an article for Answer Engine Optimization (AEO) — restructuring content so AI engines like ChatGPT, Perplexity, and Claude can extract, quote, and cite it. Rewrites headings as questions, drops 50-80 word answer capsules, audits paragraph length, and flags trust signals. Use when asked to AEO-optimize, make content AI-readable, improve AI citation chances, or adapt an article for answer engines.