skills/ab-test-setup/SKILL.md
Structured guide for setting up A/B tests with mandatory gates for hypothesis, metrics, and execution readiness.
npx skillsauth add benjaminastera/antigravity-awesome-skills ab-test-setupInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Ensure every A/B test is valid, rigorous, and safe before a single line of code is written.
You must have:
A valid hypothesis includes:
Before designing variants or metrics, you MUST:
Ask explicitly:
“Is this the final hypothesis we are committing to for this test?”
Do NOT proceed until confirmed.
Explicitly list assumptions about:
If assumptions are weak or violated:
Choose the simplest valid test:
Default to A/B unless there is a clear reason otherwise.
Define upfront:
Estimate:
Do NOT proceed without a realistic sample size estimate.
You may proceed to implementation only if all are true:
If any item is missing, stop and resolve it.
DO:
DO NOT:
When interpreting results:
| Result | Action | | -------------------- | -------------------------------------- | | Significant positive | Consider rollout | | Significant negative | Reject variant, document learning | | Inconclusive | Consider more traffic or bolder change | | Guardrail failure | Do not ship, even if primary wins |
Document:
Store records in a shared, searchable location to avoid repeated failures.
Refuse to proceed if:
Explain why and recommend next steps.
A/B testing is not about proving ideas right. It is about learning the truth with confidence.
If you feel tempted to rush, simplify, or “just try it” — that is the signal to slow down and re-check the design.
This skill is applicable to execute the workflow or actions described in the overview.
development
Multi-agent autonomous startup system for Claude Code. Triggers on "Loki Mode". Orchestrates 100+ specialized agents across engineering, QA, DevOps, security, data/ML, business operations,...
testing
Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or ...
development
You are an expert prompt engineer specializing in crafting effective prompts for LLMs through advanced techniques including constitutional AI, chain-of-thought reasoning, and model-specific optimizati
development
You are an expert LangChain agent developer specializing in production-grade AI systems using LangChain 0.1+ and LangGraph.