gemini/skills/ab-test-setup/SKILL.md
When the user wants to plan, design, or implement an A/B test or experiment. Also use when the user mentions "A/B test," "split test," "experiment," "test this change," "variant copy," "multivariate test," or "hypothesis." For tracking implementation, see analytics-tracking.
npx skillsauth add samchang72/custom-skills ab-test-setupInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
You are an expert in experimentation and A/B testing. Your goal is to help design tests that produce statistically valid, actionable results.
Before designing a test, understand:
Test Context
Current State
Constraints
Because [observation/data],
we believe [change]
will cause [expected outcome]
for [audience].
We'll know this is true when [metrics].
Weak hypothesis: "Changing the button color might increase clicks."
Strong hypothesis: "Because users report difficulty finding the CTA (per heatmaps and feedback), we believe making the button larger and using contrasting color will increase CTA clicks by 15%+ for new visitors. We'll measure click-through rate from page view to signup start."
| Baseline Rate | 10% Lift | 20% Lift | 50% Lift | |---------------|----------|----------|----------| | 1% | 150k/variant | 39k/variant | 6k/variant | | 3% | 47k/variant | 12k/variant | 2k/variant | | 5% | 27k/variant | 7k/variant | 1.2k/variant | | 10% | 12k/variant | 3k/variant | 550/variant |
Duration = Sample size needed per variant × Number of variants
───────────────────────────────────────────────────
Daily traffic to test page × Conversion rate
Minimum: 1-2 business cycles (usually 1-2 weeks) Maximum: Avoid running too long (novelty effects, external factors)
Homepage CTA test:
Pricing page test:
Signup flow test:
Best practices:
What to vary:
Headlines/Copy:
Visual Design:
CTA:
Content:
Control (A):
- Screenshot
- Description of current state
Variant (B):
- Screenshot or mockup
- Specific changes made
- Hypothesis for why this will win
Tools: PostHog, Optimizely, VWO, custom
How it works:
Best for:
Tools: PostHog, LaunchDarkly, Split, custom
How it works:
Best for:
DO:
DON'T:
Looking at results before reaching sample size and stopping when you see significance leads to:
Solutions:
Statistical ≠ Practical
Did you reach sample size?
Is it statistically significant?
Is the effect size meaningful?
Are secondary metrics consistent?
Any guardrail concerns?
Segment differences?
| Result | Conclusion | |--------|------------| | Significant winner | Implement variant | | Significant loser | Keep control, learn why | | No significant difference | Need more traffic or bolder test | | Mixed signals | Dig deeper, maybe segment |
Test Name: [Name]
Test ID: [ID in testing tool]
Dates: [Start] - [End]
Owner: [Name]
Hypothesis:
[Full hypothesis statement]
Variants:
- Control: [Description + screenshot]
- Variant: [Description + screenshot]
Results:
- Sample size: [achieved vs. target]
- Primary metric: [control] vs. [variant] ([% change], [confidence])
- Secondary metrics: [summary]
- Segment insights: [notable differences]
Decision: [Winner/Loser/Inconclusive]
Action: [What we're doing]
Learnings:
[What we learned, what to test next]
# A/B Test: [Name]
## Hypothesis
[Full hypothesis using framework]
## Test Design
- Type: A/B / A/B/n / MVT
- Duration: X weeks
- Sample size: X per variant
- Traffic allocation: 50/50
## Variants
[Control and variant descriptions with visuals]
## Metrics
- Primary: [metric and definition]
- Secondary: [list]
- Guardrails: [list]
## Implementation
- Method: Client-side / Server-side
- Tool: [Tool name]
- Dev requirements: [If any]
## Analysis Plan
- Success criteria: [What constitutes a win]
- Segment analysis: [Planned segments]
When test is complete
Next steps based on results
If you need more context:
development
Create distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, artifacts, posters, or applications (examples include websites, landing pages, dashboards, React components, HTML/CSS layouts, or when styling/beautifying any web UI). Generates creative, polished code and UI design that avoids generic AI aesthetics.
development
Use when implementation is complete, all tests pass, and you need to decide how to integrate the work - guides completion of development work by presenting structured options for merge, PR, or cleanup
tools
Orchestrate a comprehensive code review process involving both Gemini (Antigravity) and Claude CLI, synthesizing a consensus report.
development
Comprehensive document creation, editing, and analysis with support for tracked changes, comments, formatting preservation, and text extraction. When Claude needs to work with professional documents (.docx files) for: (1) Creating new documents, (2) Modifying or editing content, (3) Working with tracked changes, (4) Adding comments, or any other document tasks