JaviMontano/ab-testing

Name: ab-testing
Author: JaviMontano

skills/ab-testing/SKILL.md

npx skillsauth add JaviMontano/jm-adk-alfa ab-testing

A/B Testing

"Method over hacks. Evidence over assumption."

TL;DR

Designs or audits an A/B test so a team can decide whether to run, fix, stop, or interpret an experiment without confusing speed with evidence. [EXPLICIT] The skill must make the hypothesis, metric contract, assumptions, sample-size needs, duration, instrumentation, risks, and decision rule explicit. [EXPLICIT]

Procedure

Step 1: Discover

Identify the experiment goal, decision owner, user segment, traffic source, current baseline, candidate variant, and business constraint. [EXPLICIT]
Capture the primary metric, guardrail metrics, minimum detectable effect (MDE), desired power, significance threshold, and acceptable runtime. [EXPLICIT]
Inspect existing analytics, event names, funnel definitions, docs, or code when available. If they are missing, mark the gap instead of inventing metrics. [EXPLICIT]
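One way to keep Step 1 honest is to record the inputs in a structure that reports its own gaps instead of defaulting them. The Python below is a minimal sketch, not part of the skill. The following are illustrative assumptions:

- the class name ExperimentInputs and its field names;
- the event name checkout_completed;
- the rule that an empty guardrail list counts as a gap.

```python
from dataclasses import dataclass, fields
from typing import Optional, Tuple


@dataclass
class ExperimentInputs:
    """Hypothetical record of the Step 1 discovery inputs.

    None means "not provided yet"; gaps() reports it instead of guessing a value.
    """
    goal: Optional[str] = None
    decision_owner: Optional[str] = None
    segment: Optional[str] = None
    primary_metric: Optional[str] = None
    baseline_rate: Optional[float] = None   # e.g. 0.10 for a 10% conversion rate
    mde_relative: Optional[float] = None    # e.g. 0.10 for a +10% relative lift
    alpha: Optional[float] = None           # significance threshold
    power: Optional[float] = None           # desired statistical power
    daily_traffic: Optional[int] = None     # eligible users per day, all arms combined
    guardrail_metrics: Tuple[str, ...] = ()

    def gaps(self) -> list:
        """Names of required inputs that are still missing, in field order."""
        missing = [f.name for f in fields(self)
                   if f.name != "guardrail_metrics" and getattr(self, f.name) is None]
        if not self.guardrail_metrics:
            # Assumption: at least one guardrail metric is required.
            missing.append("guardrail_metrics")
        return missing


inputs = ExperimentInputs(goal="Increase checkout completion",
                          primary_metric="checkout_completed", baseline_rate=0.10)
print(inputs.gaps())
# ['decision_owner', 'segment', 'mde_relative', 'alpha', 'power', 'daily_traffic', 'guardrail_metrics']
```

The missing names can be copied straight into the brief as open data requirements.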

Step 2: Analyze

Convert the idea into a falsifiable hypothesis: "If we change X for audience Y, metric Z will move by N because R." [EXPLICIT]
Check whether an A/B test is appropriate or whether discovery, analytics cleanup, usability testing, or a feature flag rollout is safer first. [EXPLICIT]
Estimate sample size and duration qualitatively or quantitatively from the provided baseline, traffic, variance, MDE, power, and significance inputs. If any required input is absent, return a requirements gap and a formula-ready checklist. [EXPLICIT]
Identify validity threats: novelty effects, peeking, seasonality, sample ratio mismatch, overlapping experiments, instrumentation drift, and segment bias. [EXPLICIT]
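For a conversion-rate primary metric, the quantitative path can be sketched with the usual normal-approximation formula for a two-sided, two-proportion test. This Python sketch rests on several assumptions:

- the metric is binary;
- traffic is split 50/50;
- all eligible daily traffic is enrolled;
- there is no correction for multiple arms or interim looks.

The function names plan and sample_size_per_arm are hypothetical. When a required input is missing, the sketch returns the gap list instead of a number, in line with the requirements-gap rule above.

```python
import math
from statistics import NormalDist
from typing import Optional


def sample_size_per_arm(baseline: float, mde_relative: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Users per arm for a two-sided two-proportion z-test (normal approximation).

    n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / (p2 - p1)^2
    """
    p1 = baseline
    p2 = baseline * (1 + mde_relative)        # treatment rate if the MDE is exactly achieved
    if p2 == p1:
        raise ValueError("MDE must be non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)  # unpooled variance of the two rates
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


def plan(baseline: Optional[float], mde_relative: Optional[float],
         daily_traffic: Optional[int], alpha: Optional[float] = 0.05,
         power: Optional[float] = 0.80) -> dict:
    """Sample-size and duration estimate, or the list of inputs blocking one."""
    inputs = {"baseline": baseline, "mde_relative": mde_relative,
              "daily_traffic": daily_traffic, "alpha": alpha, "power": power}
    missing = [name for name, value in inputs.items() if value is None]
    if missing:
        return {"status": "requirements_gap", "missing": missing}
    n = sample_size_per_arm(baseline, mde_relative, alpha, power)
    days = math.ceil(2 * n / daily_traffic)   # two arms share the daily eligible traffic 50/50
    return {"status": "estimated", "n_per_arm": n, "days": days}


print(plan(baseline=0.10, mde_relative=0.10, daily_traffic=2000))
# {'status': 'estimated', 'n_per_arm': 14749, 'days': 15}
print(plan(baseline=None, mde_relative=0.10, daily_traffic=None))
# {'status': 'requirements_gap', 'missing': ['baseline', 'daily_traffic']}
```

For a binary metric the variance follows from the baseline rate. A continuous metric such as revenue per user needs its own variance estimate and a different formula. Many teams also round the duration up to whole weeks so each arm sees every weekday. That is a judgment call the formula does not make.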

Step 3: Execute

Produce an experiment brief with hypothesis, variants, metric contract, sample-size assumptions, duration recommendation, launch checklist, monitoring plan, and decision rule. [EXPLICIT]
If asked to review an existing test, classify it as ready, blocked, risky, or invalid, and name the blocking evidence. [EXPLICIT]
Keep implementation recommendations scoped to the experiment; route broader funnel or analytics work to related skills when needed. [EXPLICIT]
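A review classification needs concrete evidence behind each label. The Python sketch below shows one possible triage, assuming the test has already collected traffic. It runs a chi-square sample ratio mismatch (SRM) check on the observed split, then applies an assumed precedence of invalid, blocked, risky, ready. The function names, that precedence, and the 0.001 SRM threshold are illustrative choices, not rules stated by the skill.

```python
import math
from typing import List, Sequence, Tuple


def srm_p_value(control_users: int, treatment_users: int,
                expected_treatment_share: float = 0.5) -> float:
    """Chi-square goodness-of-fit p-value (1 degree of freedom) for the observed split."""
    total = control_users + treatment_users
    if total == 0:
        raise ValueError("no users observed yet")
    expected = (total * (1 - expected_treatment_share), total * expected_treatment_share)
    observed = (control_users, treatment_users)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def classify_review(missing_inputs: Sequence[str], threats: Sequence[str],
                    control_users: int, treatment_users: int,
                    exposure_logging_ok: bool,
                    srm_alpha: float = 0.001) -> Tuple[str, List[str]]:
    """Hypothetical triage into ready / blocked / risky / invalid, with the evidence."""
    evidence: List[str] = []
    if not exposure_logging_ok:
        evidence.append("exposure logging failed verification")
    p = srm_p_value(control_users, treatment_users)
    if p < srm_alpha:
        evidence.append(f"sample ratio mismatch (p = {p:.2g})")
    if evidence:
        return "invalid", evidence
    if missing_inputs:
        return "blocked", [f"missing input: {name}" for name in missing_inputs]
    if threats:
        return "risky", [f"validity threat: {t}" for t in threats]
    return "ready", []


print(classify_review([], ["novelty effect"], 50_000, 52_000, exposure_logging_ok=True))
print(classify_review([], [], 50_000, 50_100, exposure_logging_ok=True))  # ('ready', [])
```

SRM outranks everything else here because a broken split undermines every downstream number, including the evidence for "risky" or "ready".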

Step 4: Validate

Verify that the primary metric has one owner and one definition. [EXPLICIT]
Verify that every recommendation is tied to provided evidence, an explicit assumption, or an open data requirement. [EXPLICIT]
Verify that the decision rule says what happens for win, loss, inconclusive, harmed guardrail, and instrumentation failure outcomes. [EXPLICIT]
Do not claim statistical significance, lift, ROI, or causality unless the required data and method are available. [EXPLICIT]
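Once the data and method are actually available, the five-outcome decision rule can be written down before launch and applied mechanically. This Python sketch assumes a binary primary metric and uses a Wald confidence interval for the absolute lift. It counts a lift as a win only when the interval excludes zero and the point estimate clears a practical threshold. The function names, the outcome labels, and that win criterion are hypothetical choices to adapt.

```python
import math
from statistics import NormalDist
from typing import Tuple


def lift_interval(conv_a: int, n_a: int, conv_b: int, n_b: int,
                  alpha: float = 0.05) -> Tuple[float, Tuple[float, float]]:
    """Absolute lift of B over A and its Wald (normal-approximation) confidence interval."""
    if min(n_a, n_b) <= 0:
        raise ValueError("each arm needs at least one exposed user")
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return diff, (diff - z * se, diff + z * se)


def decide(diff: float, ci: Tuple[float, float], min_practical_lift: float,
           guardrail_harmed: bool, instrumentation_ok: bool) -> str:
    """Hypothetical pre-registered decision rule; checks run in priority order."""
    low, high = ci
    if not instrumentation_ok:
        return "instrumentation_failure"   # fix logging and rerun; do not read the metric
    if guardrail_harmed:
        return "harmed_guardrail"          # do not ship, even if the primary metric won
    if low > 0 and diff >= min_practical_lift:
        return "win"                       # significant and large enough to matter
    if high < 0:
        return "loss"                      # significantly worse than control
    return "inconclusive"                  # includes significant-but-too-small lifts


diff, ci = lift_interval(conv_a=1500, n_a=15000, conv_b=1650, n_b=15000)
print(decide(diff, ci, min_practical_lift=0.005, guardrail_harmed=False, instrumentation_ok=True))  # win
print(decide(diff, ci, min_practical_lift=0.02, guardrail_harmed=False, instrumentation_ok=True))   # inconclusive
```

Instrumentation and guardrail checks come first on purpose. A primary-metric win is not readable if logging is broken, and it is not shippable if a guardrail was harmed.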

Quality Criteria

[ ] Hypothesis is falsifiable and names change, audience, metric, expected movement, and rationale.
[ ] Primary metric, guardrail metrics, event names, and data source are defined or explicitly marked as missing.
[ ] Sample size, MDE, power, significance, and duration assumptions are stated; absent inputs are listed as blocking requirements.
[ ] Launch, monitoring, stopping, and decision rules are actionable.
[ ] Risks include at least peeking, seasonality, overlapping experiments, sample ratio mismatch, and instrumentation drift when relevant.
[ ] Claims use evidence tags or are marked as assumptions/open questions.
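The first criterion, together with the Step 2 template "If we change X for audience Y, metric Z will move by N because R.", can be enforced with a small structured type. The type refuses to build a hypothesis with an empty slot. This is a minimal Python sketch. The class name Hypothesis and its field names are illustrative. A filled-in slot is necessary but not sufficient for falsifiability.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Hypothesis:
    """Hypothetical structured form of the Step 2 hypothesis template."""
    change: str     # X: what we change
    audience: str   # Y: who is exposed
    metric: str     # Z: the primary metric
    movement: str   # N: expected movement, e.g. "+1 percentage point"
    rationale: str  # R: why we expect it

    def __post_init__(self) -> None:
        # A hypothesis with an empty slot cannot be falsified, so reject it early.
        empty = [f.name for f in fields(self) if not getattr(self, f.name).strip()]
        if empty:
            raise ValueError("hypothesis is missing: " + ", ".join(empty))

    def render(self) -> str:
        return (f"If we change {self.change} for {self.audience}, "
                f"{self.metric} will move by {self.movement} because {self.rationale}.")


h = Hypothesis("the checkout button copy", "first-time mobile visitors",
               "checkout completion rate", "+1 percentage point",
               "the new copy states the return policy up front")
print(h.render())
```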

Anti-Patterns

| Anti-Pattern | Why It's Bad | Do This Instead |
|-------------|-------------|-----------------|
| Testing without a decision rule | Produces data but no decision | Define win, loss, inconclusive, and guardrail-failure actions before launch |
| Optimizing many primary metrics | Inflates false positives and weakens accountability | Choose one primary metric and separate guardrails |
| Peeking and stopping early | Makes confidence claims unreliable | Define monitoring and stopping policy before launch |
| Missing instrumentation checks | Invalidates results after traffic is spent | Verify events, exposure logging, and sample ratio before analysis |
| Treating significance as business value | A statistically detectable lift may be too small to matter | Include MDE and practical impact threshold |
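The "Peeking and stopping early" row can be checked with a quick A/A simulation. Both arms share the same true rate, so every declared winner is a false positive. This Python sketch compares stopping at the first look with p < alpha against a single pre-planned look at the end. The following are arbitrary assumptions:

- the simulation sizes;
- the conversion rate;
- the random seed;
- an uncorrected pooled z-test at every look.

With ten looks the peeking rate typically lands well above alpha, while the single look stays close to it.

```python
import math
import random
from statistics import NormalDist


def z_test_p(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rates(sims: int = 500, looks: int = 10, users_per_look: int = 100,
                         rate: float = 0.10, alpha: float = 0.05, seed: int = 7):
    """A/A simulation: share of runs called significant with peeking vs one final look."""
    rng = random.Random(seed)
    peeking_hits = final_hits = 0
    for _ in range(sims):
        conv_a = conv_b = n = 0
        flagged = False
        for _ in range(looks):
            conv_a += sum(rng.random() < rate for _ in range(users_per_look))
            conv_b += sum(rng.random() < rate for _ in range(users_per_look))
            n += users_per_look
            if not flagged and z_test_p(conv_a, n, conv_b, n) < alpha:
                flagged = True  # a peeker would stop here and call a winner
        peeking_hits += flagged
        final_hits += z_test_p(conv_a, n, conv_b, n) < alpha  # single pre-planned look
    return peeking_hits / sims, final_hits / sims


peek, final = false_positive_rates()
print(f"false positives with peeking: {peek:.3f}, with one final look: {final:.3f}")
```

When interim looks are required, group-sequential designs with alpha spending or always-valid sequential tests are the usual remedies. Both are outside this sketch.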

Related Skills

analytics-events
funnel-analytics
conversion-optimization
data-validation
experimentation-strategy

Usage

Example invocations:

"/ab-testing": run the full A/B testing workflow
"ab testing on this project": apply the workflow to the current context

Assumptions & Limits

Assumes access to project artifacts (code, docs, configs) [EXPLICIT]
Does not replace domain expert judgment for final decisions [EXPLICIT]
If baseline conversion, traffic, variance, or MDE is missing, this skill can produce a readiness brief but not a reliable sample-size claim. [EXPLICIT]

Edge Cases

| Scenario | Handling |
|----------|----------|
| Empty or minimal input | Request clarification before proceeding |
| Conflicting requirements | Flag conflicts explicitly, propose resolution |
| Out-of-scope request | Redirect to appropriate skill or escalate |

JaviMontano/ab-testing

skills/ab-testing/SKILL.md

Designs and reviews A/B tests with explicit hypothesis, primary metric, guardrail metrics, variants, sample-size assumptions, duration, stopping rules, instrumentation checks, and decision criteria. [EXPLICIT] Trigger: "ab testing, a/b test, experiment design, split test, hypothesis formulation, statistical significance, sample size calculation, test duration"

1 star

testing

Updated May 30, 2026

npx skillsauth add JaviMontano/jm-adk-alfa ab-testing

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status; review each row below.

| Scanner | Purpose | Status | Score |
|---------|---------|--------|-------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: May 30, 2026, 4:48 AM (61.2 s, 19 files scanned)

SKILL.md

name: ab-testing
author: JM Labs (Javier Montaño)
version: 1.0.0
description: >
Trigger: ab testing, a/b test, experiment design, split test, hypothesis formulation, statistical significance, sample size calculation, test duration

Related Skills

JaviMontano/analytics-engineering

development

Verified · Trusted · Community

This skill should be used when the user asks to "design analytics models", "set up a dbt project", "plan data transformations", "define data contracts", or "model a star schema", or mentions staging models, marts, incremental strategies, or materializations. It produces analytics pipeline designs with dbt-style transformations, data modeling patterns, testing strategies, and documentation plans. [EXPLICIT] Use this skill whenever the user needs source-to-target mapping, materialization decisions, or transformation framework architecture, even if they don't explicitly ask for "analytics engineering". [EXPLICIT]

1 SKILL.md · Updated May 30, 2026

JaviMontano/alerting-strategy

testing

Verified · Trusted · Community

Alert fatigue prevention, escalation rules, severity classification. [EXPLICIT] Trigger: "alerting strategy"

1 SKILL.md · Updated May 30, 2026

JaviMontano/ai-workflow-automation

tools

Verified · Trusted · Community

LLM-in-the-loop workflows, human-AI handoff, approval gates. [EXPLICIT] Trigger: "ai workflow automation"

1 SKILL.md · Updated May 30, 2026

JaviMontano/ai-testing-strategy

tools

Verified · Trusted · Community

Comprehensive testing strategy for AI systems: testing scope matrix (6 types x 6 layers), model prediction testing, data quality testing, compliance and fairness testing, integration approaches, and CI/CD test automation. This skill should be used when the user asks to "define AI testing strategy", "test ML models", "design data quality tests", "plan fairness testing", "test AI pipelines", "design integration tests for ML", or mentions adversarial testing, drift simulation, model regression testing, bias testing, explainability testing, or AI test automation. [EXPLICIT]

1 SKILL.md · Updated May 30, 2026

Install manually

# Clone the repo
git clone https://github.com/JaviMontano/jm-adk-alfa.git

# Copy into Claude Code skills folder (global)
cp -r jm-adk-alfa/skills/ab-testing ~/.claude/skills/

See the Claude Code Skills documentation for the official skills path.

Repository

JaviMontano/jm-adk-alfa

1 star

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT