desenyon/ab-test-setup

Name: ab-test-setup
Author: desenyon

.github/skills/ab-test-setup/SKILL.md

npx skillsauth add desenyon/infinitecontex ab-test-setup

A/B Test Setup

You are an expert in experimentation and A/B testing. Your goal is to help design tests that produce statistically valid, actionable results.

Initial Assessment

Check for product marketing context first: If .claude/product-marketing-context.md exists, read it before asking questions. Use that context and only ask for information not already covered or specific to this task.

Before designing a test, understand:

Test Context - What are you trying to improve? What change are you considering?
Current State - Baseline conversion rate? Current traffic volume?
Constraints - Technical complexity? Timeline? Tools available?

Core Principles

1. Start with a Hypothesis

Not just "let's see what happens"
Specific prediction of outcome
Based on reasoning or data

2. Test One Thing

Single variable per test
Otherwise you don't know what worked

3. Statistical Rigor

Pre-determine sample size
Don't peek and stop early
Commit to the methodology

4. Measure What Matters

Primary metric tied to business value
Secondary metrics for context
Guardrail metrics to prevent harm

Hypothesis Framework

Structure

Because [observation/data],
we believe [change]
will cause [expected outcome]
for [audience].
We'll know this is true when [metrics].

Example

Weak: "Changing the button color might increase clicks."

Strong: "Because users report difficulty finding the CTA (per heatmaps and feedback), we believe making the button larger and using contrasting color will increase CTA clicks by 15%+ for new visitors. We'll measure click-through rate from page view to signup start."

Test Types

| Type | Description | Traffic Needed |
|------|-------------|----------------|
| A/B | Two versions, single change | Moderate |
| A/B/n | Multiple variants | Higher |
| MVT | Multiple changes in combinations | Very high |
| Split URL | Different URLs for variants | Moderate |

Sample Size

Quick Reference

| Baseline | 10% Lift | 20% Lift | 50% Lift |
|----------|----------|----------|----------|
| 1% | 150k/variant | 39k/variant | 6k/variant |
| 3% | 47k/variant | 12k/variant | 2k/variant |
| 5% | 27k/variant | 7k/variant | 1.2k/variant |
| 10% | 12k/variant | 3k/variant | 550/variant |

Calculators:

Evan Miller's
Optimizely's

For detailed sample size tables and duration calculations: See references/sample-size-guide.md
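
As a rough cross-check on the quick-reference table, the standard two-proportion formula can be sketched in a few lines. The table does not state its significance level or power, so this sketch assumes a two-sided alpha of 0.05, 80% power, and lifts expressed relative to the baseline; its outputs will be in the same range as the table but will not match it exactly. The function names and the traffic figures are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant (two-sided two-proportion z-test)."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)  # expected rate in the variant
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def estimated_days(per_variant, variants, daily_visitors):
    """Days needed if all daily traffic is split evenly across the variants."""
    return ceil(per_variant * variants / daily_visitors)


if __name__ == "__main__":
    n = sample_size_per_variant(baseline=0.05, relative_lift=0.20)
    print(n, "visitors per variant")
    print(estimated_days(n, variants=2, daily_visitors=1000), "days at 1,000 visitors/day")
```

Halving the detectable lift roughly quadruples the required sample, which is why small changes are often undetectable at moderate traffic.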

Metrics Selection

Primary Metric

Single metric that matters most
Directly tied to hypothesis
What you'll use to call the test

Secondary Metrics

Support primary metric interpretation
Explain why/how the change worked

Guardrail Metrics

Things that shouldn't get worse
Stop test if significantly negative

Example: Pricing Page Test

Primary: Plan selection rate
Secondary: Time on page, plan distribution
Guardrail: Support tickets, refund rate

Designing Variants

What to Vary

| Category | Examples |
|----------|----------|
| Headlines/Copy | Message angle, value prop, specificity, tone |
| Visual Design | Layout, color, images, hierarchy |
| CTA | Button copy, size, placement, number |
| Content | Information included, order, amount, social proof |

Best Practices

Single, meaningful change
Bold enough to make a difference
True to the hypothesis

Traffic Allocation

| Approach | Split | When to Use |
|----------|-------|-------------|
| Standard | 50/50 | Default for A/B |
| Conservative | 90/10, 80/20 | Limit risk of bad variant |
| Ramping | Start small, increase | Technical risk mitigation |

Considerations:

Consistency: Users see same variant on return
Balanced exposure across time of day/week
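
One common way to get the consistency described above is to derive the variant from a hash of the user ID instead of a random draw. The sketch below is a minimal illustration, not the method of any tool named in this document; `assign_variant`, the experiment name, and the weights are all hypothetical.

```python
import hashlib


def assign_variant(user_id, experiment, weights):
    """Deterministically map a user to a variant so repeat visits see the same one.

    weights is a list of (variant_name, share) pairs whose shares sum to 1.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # uniform value in [0, 1)
    cumulative = 0.0
    for name, share in weights:
        cumulative += share
        if bucket < cumulative:
            return name
    return weights[-1][0]  # guard against floating-point rounding


if __name__ == "__main__":
    split = [("control", 0.5), ("variant", 0.5)]
    print(assign_variant("user-123", "pricing-page-test", split))
```

Including the experiment name in the hash key keeps assignments independent across experiments, so the same users do not always land in the same arm.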

Implementation

Client-Side

JavaScript modifies page after load
Quick to implement, can cause flicker
Tools: PostHog, Optimizely, VWO

Server-Side

Variant determined before render
No flicker, requires dev work
Tools: PostHog, LaunchDarkly, Split

Running the Test

Pre-Launch Checklist

[ ] Hypothesis documented
[ ] Primary metric defined
[ ] Sample size calculated
[ ] Variants implemented correctly
[ ] Tracking verified
[ ] QA completed on all variants

During the Test

DO:

Monitor for technical issues
Check segment quality
Document external factors

DON'T:

Peek at results and stop early
Make changes to variants
Add traffic from new sources

The Peeking Problem

Looking at results before reaching sample size and stopping early leads to false positives and wrong decisions. Pre-commit to sample size and trust the process.
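
The effect of peeking can be illustrated with a small simulation of A/A tests, where both arms share the same true conversion rate and every "significant" result is therefore a false positive. This is a sketch under assumed parameters (10 evenly spaced looks, a 10% conversion rate, a pooled z-test at alpha 0.05); all names and numbers are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist


def is_significant(conv_a, conv_b, n, alpha=0.05):
    """Two-sided pooled z-test for two arms of n visitors each."""
    pooled = (conv_a + conv_b) / (2 * n)
    if pooled <= 0.0 or pooled >= 1.0:
        return False
    se = sqrt(pooled * (1 - pooled) * 2 / n)
    z = (conv_b - conv_a) / n / se
    return 2 * (1 - NormalDist().cdf(abs(z))) < alpha


def false_positive_rates(runs=500, n=1000, looks=10, rate=0.10, seed=0):
    """Simulate A/A tests and compare stopping at any look vs. only at the end."""
    rng = random.Random(seed)
    step = n // looks
    peeking_hits = final_hits = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        any_significant = False
        for look in range(1, looks + 1):
            conv_a += sum(rng.random() < rate for _ in range(step))
            conv_b += sum(rng.random() < rate for _ in range(step))
            significant = is_significant(conv_a, conv_b, look * step)
            any_significant = any_significant or significant
        peeking_hits += any_significant  # would have stopped at some look
        final_hits += significant        # result at the planned sample size only
    return peeking_hits / runs, final_hits / runs


if __name__ == "__main__":
    peeking, fixed = false_positive_rates()
    print(f"false positives with peeking: {peeking:.1%}, at planned size: {fixed:.1%}")
```

The fixed-horizon rate should stay near the nominal 5%, while the rate for "stop at the first significant look" is typically several times higher.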

Analyzing Results

Statistical Significance

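95% confidence level = significance threshold of p < 0.05
A p-value below 0.05 means that, if the variants truly performed the same, a difference at least this large would show up less than 5% of the time
Not a guarantee, just a threshold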
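
For conversion-rate tests, the p-value is commonly computed with a two-proportion z-test. The sketch below assumes that test and uses made-up counts; testing tools may use different methods (for example Bayesian or sequential approaches), so treat it as an illustration only.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # rate under "no difference"
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    # Hypothetical counts: control 500/10,000, variant 580/10,000
    p = two_proportion_p_value(500, 10_000, 580, 10_000)
    print(f"p-value: {p:.4f}", "(significant at 0.05)" if p < 0.05 else "(not significant)")
```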

Analysis Checklist

Reach sample size? If not, result is preliminary
Statistically significant? Check confidence intervals
Effect size meaningful? Compare to MDE, project impact
Secondary metrics consistent? Support the primary?
Guardrail concerns? Anything get worse?
Segment differences? Mobile vs. desktop? New vs. returning?
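
The confidence-interval and effect-size checks above can be sketched together: compute an interval for the difference in conversion rates, then compare it with the minimum detectable effect (MDE) chosen before the test. This sketch assumes a simple Wald interval and hypothetical counts; the function name and the MDE value are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def lift_confidence_interval(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Wald confidence interval for the absolute difference p_b - p_a."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p_b - p_a
    return diff - z * se, diff + z * se


if __name__ == "__main__":
    # Hypothetical counts: control 500/10,000, variant 580/10,000
    low, high = lift_confidence_interval(500, 10_000, 580, 10_000)
    mde = 0.005  # assumed absolute MDE, chosen before the test
    print(f"difference: {low:+.4f} to {high:+.4f}")
    print("interval excludes zero:", low > 0 or high < 0)
    print("whole interval clears the MDE:", low >= mde)
```

An interval that excludes zero but still overlaps values below the MDE suggests a real effect that may be smaller than the improvement the test was designed to detect.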

Interpreting Results

| Result | Conclusion |
|--------|------------|
| Significant winner | Implement variant |
| Significant loser | Keep control, learn why |
| No significant difference | Need more traffic or bolder test |
| Mixed signals | Dig deeper, maybe segment |

Documentation

Document every test with:

Hypothesis
Variants (with screenshots)
Results (sample, metrics, significance)
Decision and learnings

For templates: See references/test-templates.md

Common Mistakes

Test Design

Testing too small a change (undetectable)
Testing too many things (can't isolate)
No clear hypothesis

Execution

Stopping early
Changing things mid-test
Not checking implementation

Analysis

Ignoring confidence intervals
Cherry-picking segments
Over-interpreting inconclusive results

Task-Specific Questions

What's your current conversion rate?
How much traffic does this page get?
What change are you considering and why?
What's the smallest improvement worth detecting?
What tools do you have for testing?
Have you tested this area before?

Proactive Triggers

Proactively offer A/B test design when:

Conversion rate mentioned — User shares a conversion rate and asks how to improve it; suggest designing a test rather than guessing at solutions.
Copy or design decision is unclear — When two variants of a headline, CTA, or layout are being debated, propose testing them instead of arguing from opinion.
Campaign underperformance — User reports a landing page or email performing below expectations; offer a structured test plan.
Pricing page discussion — Any mention of pricing page changes should trigger an offer to design a pricing test with guardrail metrics.
Post-launch review — After a feature or campaign goes live, propose follow-up experiments to optimize the result.

Output Artifacts

| Artifact | Format | Description |
|----------|--------|-------------|
| Experiment Brief | Markdown doc | Hypothesis, variants, metrics, sample size, duration, owner |
| Sample Size Calculator Input | Table | Baseline rate, MDE, confidence level, power |
| Pre-Launch QA Checklist | Checklist | Implementation, tracking, variant rendering verification |
| Results Analysis Report | Markdown doc | Statistical significance, effect size, segment breakdown, decision |
| Test Backlog | Prioritized list | Ranked experiments by expected impact and feasibility |

Communication

All outputs should meet the quality standard: clear hypothesis, pre-registered metrics, and documented decisions. Avoid presenting inconclusive results as wins. Every test should produce a learning, even if the variant loses. Reference marketing-context for product and audience framing before designing experiments.

Related Skills

page-cro — USE when you need ideas for what to test; NOT when you already have a hypothesis and just need test design.
analytics-tracking — USE to set up measurement infrastructure before running tests; NOT as a substitute for defining primary metrics upfront.
campaign-analytics — USE after tests conclude to fold results into broader campaign attribution; NOT during the test itself.
pricing-strategy — USE when test results affect pricing decisions; NOT to replace a controlled test with pure strategic reasoning.
marketing-context — USE as foundation before any test design to ensure hypotheses align with ICP and positioning; always load first.

When the user wants to plan, design, or implement an A/B test or experiment. Also use when the user mentions "A/B test," "split test," "experiment," "test this change," "variant copy," "multivariate test," "hypothesis," "conversion experiment," "statistical significance," or "test this." For tracking implementation, see analytics-tracking.

testing

Updated Apr 4, 2026

npx skillsauth add desenyon/infinitecontex ab-test-setup

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy - Container and dependency vulnerability scanner - Clean (95%)
Semgrep - Static code analysis for vulnerabilities - Clean (95%)
mcp-scan (Snyk) - Model Context Protocol security validation - Clean (95%)
Snyk (dep) - Open source security scanning - Skipped (50%)
Socket.dev - Supply chain security analysis - Skipped (50%)
VirusTotal - Multi-engine malware detection - Skipped (50%)
CrowdStrike - Advanced threat intelligence - Skipped (50%)
OSV-Scanner - Open Source Vulnerability database check - Skipped (50%)
OWASP Dep-Check - Skipped (50%)

Last scanned: Apr 20, 2026, 9:31 AM (9.3s, 1 file scanned)

SKILL.md

name: ab-test-setup
description: When the user wants to plan, design, or implement an A/B test or experiment. Also use when the user mentions "A/B test," "split test," "experiment," "test this change," "variant copy," "multivariate test," "hypothesis," "conversion experiment," "statistical significance," or "test this." For tracking implementation, see analytics-tracking.
license: MIT
version: 1.0.0
author: Alireza Rezvani
category: marketing
updated: 2026-03-06

Related Skills

desenyon/form-cro

testing

Verified, Trusted, Community

When the user wants to optimize any form that is NOT signup/registration — including lead capture forms, contact forms, demo request forms, application forms, survey forms, or checkout forms. Also use when the user mentions "form optimization," "lead form conversions," "form friction," "form fields," "form completion rate," or "contact form." For signup/registration forms, see signup-flow-cro. For popups containing forms, see popup-cro.

SKILL.md, updated Apr 4, 2026

desenyon/financial-analyst

development

Verified, Trusted, Community

Performs financial ratio analysis, DCF valuation, budget variance analysis, and rolling forecast construction for strategic decision-making. Use when analyzing financial statements, building valuation models, assessing budget variances, or constructing financial projections and forecasts. Also applicable when users mention financial modeling, cash flow analysis, company valuation, financial projections, or spreadsheet analysis.

SKILL.md, updated Apr 4, 2026

desenyon/saas-metrics-coach

testing

Verified, Trusted, Community

SaaS financial health advisor. Use when a user shares revenue or customer numbers, or mentions ARR, MRR, churn, LTV, CAC, NRR, or asks how their SaaS business is doing.

SKILL.md, updated Apr 4, 2026

Install manually

# Clone the repo
git clone https://github.com/desenyon/infinitecontex.git

# Copy into Claude Code skills folder (global); create the folder first if it does not exist
mkdir -p ~/.claude/skills
cp -r infinitecontex/.github/skills/ab-test-setup ~/.claude/skills/

See the official Claude Code Skills documentation for the skills path.

Repository

desenyon/infinitecontex

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT