skills/ab-testing/SKILL.md
Designs and reviews A/B tests with explicit hypothesis, primary metric, guardrail metrics, variants, sample-size assumptions, duration, stopping rules, instrumentation checks, and decision criteria. [EXPLICIT] Trigger: "ab testing, a/b test, experiment design, split test, hypothesis formulation, statistical significance, sample size calculation, test duration"
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

`npx skillsauth add javimontano/jm-agentic-development-kit-alfa ab-testing`
"Method over hacks. Evidence over assumption."
Designs or audits an A/B test so a team can decide whether to run, fix, stop, or interpret an experiment without confusing speed with evidence. [EXPLICIT] The skill must make the hypothesis, metric contract, assumptions, sample-size needs, duration, instrumentation, risks, and decision rule explicit. [EXPLICIT]
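The sample-size and duration items above can be made concrete with a short calculation. The following is a minimal sketch, not part of the skill itself: it assumes a binary conversion metric, a two-sided two-proportion z-test, equal traffic per variant, and the common normal-approximation formula. The function names, the default `alpha` and `power`, and the example inputs are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, mde_abs, alpha=0.05, power=0.80):
    """Approximate users needed per variant (assumed two-proportion z-test).

    baseline_rate: expected conversion rate of the control (0..1).
    mde_abs: minimum detectable effect as an absolute difference (e.g. 0.02).
    """
    p1 = baseline_rate
    p2 = baseline_rate + mde_abs
    if mde_abs == 0 or not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("rates must lie in (0, 1) and the MDE must be non-zero")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired power
    p_bar = (p1 + p2) / 2                          # pooled rate under H0

    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


def duration_days(n_per_variant, daily_eligible_users, variants=2):
    """Days of traffic needed, assuming all eligible users enter the test."""
    if daily_eligible_users <= 0:
        raise ValueError("daily_eligible_users must be positive")
    return math.ceil(n_per_variant * variants / daily_eligible_users)


if __name__ == "__main__":
    # Hypothetical inputs: 10% baseline, +2 points absolute MDE, 1,000 users/day.
    n = sample_size_per_variant(baseline_rate=0.10, mde_abs=0.02)
    print("users per variant:", n)
    print("minimum duration (days):", duration_days(n, daily_eligible_users=1000))
```

In practice the computed duration is usually rounded up to whole weeks so that every day of the week is represented; that is a convention, not something this sketch enforces.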
| Anti-Pattern | Why It's Bad | Do This Instead |
|-------------|-------------|-----------------|
| Testing without a decision rule | Produces data but no decision | Define win, loss, inconclusive, and guardrail-failure actions before launch |
| Optimizing many primary metrics | Inflates false positives and weakens accountability | Choose one primary metric and separate guardrails |
| Peeking and stopping early | Makes confidence claims unreliable | Define monitoring and stopping policy before launch |
| Missing instrumentation checks | Invalidates results after traffic is spent | Verify events, exposure logging, and sample ratio before analysis |
| Treating significance as business value | A statistically detectable lift may be too small to matter | Include MDE and practical impact threshold |
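Two of the remedies in the table, verifying the sample ratio and fixing a decision rule before launch, lend themselves to a small illustration. This is a hedged sketch under stated assumptions rather than the skill's own procedure: the sample ratio check is a chi-square goodness-of-fit test with one degree of freedom, the `srm_alpha` default of 0.001 is an assumed alert threshold, and `decide` with its outcome labels is a hypothetical helper that takes a precomputed confidence interval for the lift on the primary metric.

```python
import math


def srm_p_value(control_n, treatment_n, expected_control_share=0.5):
    """P-value of a chi-square test (1 dof) for sample ratio mismatch."""
    total = control_n + treatment_n
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = (
        (control_n - expected_control) ** 2 / expected_control
        + (treatment_n - expected_treatment) ** 2 / expected_treatment
    )
    # For one degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def decide(ci_low, ci_high, practical_threshold, guardrail_breached,
           srm_p, srm_alpha=0.001):
    """Map pre-registered criteria to an action (hypothetical rule).

    ci_low / ci_high: confidence interval for the lift on the primary metric.
    practical_threshold: smallest lift that matters to the business.
    """
    if srm_p < srm_alpha:
        return "invalid: fix instrumentation before interpreting"
    if guardrail_breached:
        return "guardrail-failure"
    if ci_low >= practical_threshold:
        return "win"
    if ci_high < practical_threshold:
        return "loss"  # any lift is too small to matter, or negative
    return "inconclusive"


if __name__ == "__main__":
    # Hypothetical exposure counts for a planned 50/50 split.
    p = srm_p_value(control_n=5000, treatment_n=5400)
    print(decide(ci_low=0.012, ci_high=0.030, practical_threshold=0.010,
                 guardrail_breached=False, srm_p=p))
```

The ordering of the checks is the design choice here: instrumentation validity and guardrails are evaluated before the primary metric, so a large apparent lift cannot override a broken split or a harmed guardrail.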
analytics-events, funnel-analytics, conversion-optimization, data-validation, experimentation-strategy

Example invocations:
| Scenario | Handling |
|----------|----------|
| Empty or minimal input | Request clarification before proceeding |
| Conflicting requirements | Flag conflicts explicitly, propose resolution |
| Out-of-scope request | Redirect to appropriate skill or escalate |