skills/ab-testing/SKILL.md
Designs and reviews A/B tests with explicit hypothesis, primary metric, guardrail metrics, variants, sample-size assumptions, duration, stopping rules, instrumentation checks, and decision criteria. [EXPLICIT] Trigger: "ab testing, a/b test, experiment design, split test, hypothesis formulation, statistical significance, sample size calculation, test duration"
Install this skill globally with one command: `npx skillsauth add JaviMontano/jm-adk-alfa ab-testing`. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
"Method over hacks. Evidence over assumption."
Designs or audits an A/B test so a team can decide whether to run, fix, stop, or interpret an experiment without confusing speed with evidence. [EXPLICIT] The skill must make the hypothesis, metric contract, assumptions, sample-size needs, duration, instrumentation, risks, and decision rule explicit. [EXPLICIT]
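As an illustration of making sample-size needs and duration explicit, here is a minimal sketch that assumes a binary conversion metric, a two-sided two-proportion z-test, and equal traffic per variant. The function names, the default `alpha` and `power`, and the example inputs are assumptions for illustration only; they are not part of the skill.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, mde_abs, alpha=0.05, power=0.80):
    """Approximate users needed per variant to detect an absolute lift of mde_abs."""
    p1 = baseline_rate
    p2 = baseline_rate + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    variance = p1 * (1 - p1) + p2 * (1 - p2)       # sum of per-arm Bernoulli variances
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)


def duration_days(n_per_variant, variants, daily_eligible_users):
    """Days of traffic needed, rounded up to whole weeks to cover weekly cycles."""
    days = math.ceil(n_per_variant * variants / daily_eligible_users)
    return math.ceil(days / 7) * 7


# Hypothetical inputs: 10% baseline, 2-point absolute MDE, 1,000 eligible users per day.
n = sample_size_per_variant(baseline_rate=0.10, mde_abs=0.02)
print(n, duration_days(n, variants=2, daily_eligible_users=1000))
```

Rounding the duration up to whole weeks is one common convention, not a requirement. The point is that the baseline rate, MDE, alpha, power, and traffic estimate are written down before launch.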
| Anti-Pattern | Why It's Bad | Do This Instead |
|-------------|-------------|-----------------|
| Testing without a decision rule | Produces data but no decision | Define win, loss, inconclusive, and guardrail-failure actions before launch |
| Optimizing many primary metrics | Inflates false positives and weakens accountability | Choose one primary metric and separate guardrails |
| Peeking and stopping early | Makes confidence claims unreliable | Define monitoring and stopping policy before launch |
| Missing instrumentation checks | Invalidates results after traffic is spent | Verify events, exposure logging, and sample ratio before analysis |
| Treating significance as business value | A statistically detectable lift may be too small to matter | Include MDE and practical impact threshold |
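Two rows of the table above, the sample ratio check and the pre-registered decision rule, can be sketched together. This is a hedged example, not the skill's own implementation: `srm_p_value`, `decide`, the `srm_alpha=0.001` cutoff, and the outcome labels are hypothetical names and thresholds chosen for illustration, and the rule assumes a confidence interval for the primary-metric lift has already been computed.

```python
import math


def srm_p_value(control_n, treatment_n, expected_control_share=0.5):
    """Chi-square goodness-of-fit p-value (1 degree of freedom) for a two-arm split."""
    total = control_n + treatment_n
    exp_c = total * expected_control_share
    exp_t = total - exp_c
    chi2 = (control_n - exp_c) ** 2 / exp_c + (treatment_n - exp_t) ** 2 / exp_t
    # With 1 degree of freedom, the chi-square survival function is erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def decide(srm_p, guardrail_breached, ci_low, ci_high, practical_threshold,
           srm_alpha=0.001):
    """Map pre-registered criteria to an action; checks run in a fixed order."""
    if srm_p < srm_alpha:
        return "invalid: fix instrumentation"  # sample ratio mismatch, do not interpret
    if guardrail_breached:
        return "guardrail failure"
    if ci_low >= practical_threshold:
        return "win"   # the whole interval clears the practical impact threshold
    if ci_high <= 0:
        return "loss"  # the whole interval is at or below zero
    return "inconclusive"


# Hypothetical readout: balanced arms, lift interval of [+0.5, +2.5] points,
# and a practical threshold of +1.0 point.
p = srm_p_value(control_n=5000, treatment_n=5000)
print(decide(p, guardrail_breached=False, ci_low=0.005, ci_high=0.025,
             practical_threshold=0.01))
```

Checking the sample ratio first means a broken assignment or logging pipeline blocks interpretation before any lift is read. Comparing the interval's lower bound to a practical threshold keeps a statistically detectable but trivial lift from being declared a win.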
analytics-events, funnel-analytics, conversion-optimization, data-validation, experimentation-strategy

Example invocations:
| Scenario | Handling |
|----------|----------|
| Empty or minimal input | Request clarification before proceeding |
| Conflicting requirements | Flag conflicts explicitly, propose resolution |
| Out-of-scope request | Redirect to appropriate skill or escalate |