dotfiles/dot_config/skillshare/skills/ab-test-analysis/SKILL.md
Analyze A/B test results with statistical significance, sample size validation, confidence intervals, and ship/extend/stop recommendations. Use when evaluating experiment results, checking if a test reached significance, interpreting split test data, or deciding whether to ship a variant.
Install this skill globally with one command: `npx skillsauth add pkking/dotfiles ab-test-analysis`. Works with Claude Code, Cursor, and Windsurf.
Evaluate A/B test results with statistical rigor and translate findings into clear product decisions.
You are analyzing A/B test results for $ARGUMENTS.
If the user provides data files (CSV, Excel, or analytics exports), read and analyze them directly. Generate Python scripts for statistical calculations when needed.
Understand the experiment:
Validate the test setup:
Calculate statistical significance:
If the user provides raw data, generate and run a Python script to calculate these.
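
As a minimal sketch of the kind of script this step might produce, assuming the primary metric is a conversion rate and the raw data reduces to conversions and visitors per arm, a two-sided two-proportion z-test with a Wald confidence interval could look like the following. The function name `two_proportion_test` and its parameters are illustrative, not part of the skill; other metric types (revenue, time on page) would need a different test.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_test(conv_c, n_c, conv_v, n_v, alpha=0.05):
    """Two-sided z-test for a difference in conversion rates.

    Hypothetical helper: assumes binary outcomes, independent users,
    a non-zero control rate, and samples large enough for the normal
    approximation to hold.
    """
    p_c = conv_c / n_c
    p_v = conv_v / n_v
    diff = p_v - p_c

    # Pooled standard error under the null hypothesis of equal rates.
    p_pool = (conv_c + conv_v) / (n_c + n_v)
    se_pool = sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_v))
    z = diff / se_pool
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error for the confidence interval on the difference.
    se = sqrt(p_c * (1 - p_c) / n_c + p_v * (1 - p_v) / n_v)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)

    return {
        "control_rate": p_c,
        "variant_rate": p_v,
        "absolute_lift": diff,
        "relative_lift": diff / p_c,
        "z": z,
        "p_value": p_value,
        "ci": ci,
        "significant": p_value < alpha,
    }


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(two_proportion_test(conv_c=100, n_c=1000, conv_v=130, n_v=1000))
```

The sketch uses only the standard library so it runs without extra installs; a generated script could equally rely on SciPy or statsmodels if those are available.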
Check guardrail metrics:
Interpret results:
| Outcome | Recommendation |
|---|---|
| Significant positive lift, no guardrail issues | Ship it: roll out to 100% |
| Significant positive lift, guardrail concerns | Investigate: understand trade-offs before shipping |
| Not significant, positive trend | Extend the test: need more data or larger effect |
| Not significant, flat | Stop the test: no meaningful difference detected |
| Significant negative lift | Don't ship: revert to control, analyze why |
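
The outcome-to-recommendation mapping above can be sketched as a small function. This is an illustration under stated assumptions, not the skill's own logic: the name `recommend`, the `flat_band` threshold that separates a "positive trend" from "flat", and the choice to treat a non-significant negative trend as "Stop" are all hypothetical, since the table does not define them.

```python
def recommend(p_value, relative_lift, guardrails_ok, alpha=0.05, flat_band=0.01):
    """Map a test outcome to a recommendation.

    Hypothetical helper. `flat_band` is an assumed cutoff: a non-significant
    relative lift at or below it is treated as flat.
    """
    significant = p_value < alpha

    if significant and relative_lift > 0:
        # Positive and significant: guardrail metrics decide the outcome.
        return "Ship" if guardrails_ok else "Investigate"
    if significant and relative_lift < 0:
        return "Don't ship"
    if relative_lift > flat_band:
        # Not significant, but trending positive.
        return "Extend"
    # Not significant and flat. A non-significant negative trend also
    # lands here, which is an assumption beyond the table.
    return "Stop"


if __name__ == "__main__":
    print(recommend(p_value=0.03, relative_lift=0.08, guardrails_ok=True))
```

A rule like this only covers the mechanical part of the decision; the reasoning and next steps in the summary still need judgment about trade-offs and context.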
Provide the analysis summary:
## A/B Test Results: [Test Name]
**Hypothesis**: [What we expected]
**Duration**: [X days] | **Sample**: [N control / M variant]
| Metric | Control | Variant | Lift | p-value | Significant? |
|---|---|---|---|---|---|
| [Primary] | X% | Y% | +Z% | 0.0X | Yes/No |
| [Guardrail] | ... | ... | ... | ... | ... |
**Recommendation**: [Ship / Extend / Stop / Investigate]
**Reasoning**: [Why]
**Next steps**: [What to do]
Think step by step. Save as markdown. Generate Python scripts for calculations if raw data is provided.
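
For the sample size validation mentioned in the skill description, one possible calculation script is sketched below. It assumes a conversion-rate metric, a two-sided test, and equal allocation between arms, and uses the standard normal-approximation formula for comparing two proportions. The names `required_sample_per_arm` and `is_adequately_powered`, and the defaults of 5% significance and 80% power, are assumptions chosen for illustration.

```python
from math import ceil
from statistics import NormalDist


def required_sample_per_arm(baseline_rate, min_relative_lift, alpha=0.05, power=0.80):
    """Approximate users needed per arm to detect a given relative lift.

    Hypothetical helper: assumes a two-sided test on two proportions
    with equal-sized arms.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + min_relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def is_adequately_powered(n_control, n_variant, required):
    """True when the smaller arm meets the required sample size."""
    return min(n_control, n_variant) >= required


if __name__ == "__main__":
    # Made-up inputs: 10% baseline rate, 10% minimum relative lift.
    needed = required_sample_per_arm(baseline_rate=0.10, min_relative_lift=0.10)
    print(needed, is_adequately_powered(12000, 12100, needed))
```

If the observed sample falls short of the requirement, a non-significant result is better read as "underpowered" than as "no effect", which is the case the Extend recommendation is meant to cover.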