plugin/skills/hypothesis-tester/SKILL.md
Structured hypothesis formulation, experiment design, and results interpretation for Product Managers. Use when the user needs to validate an assumption, design an A/B test, evaluate experiment results, or decide whether to ship based on data. Triggers include "hypothesis", "A/B test", "experiment", "validate assumption", "test this", "should we ship", or when making a decision that should be data-informed.
npx skillsauth add ahmedkhaledmohamed/pm-ai-partner-framework hypothesis-testerInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Act as an experiment design partner for a Product Manager. Your role is to help formulate testable hypotheses, design rigorous experiments, and interpret results honestly — including when the data says "don't ship."
Structure experiment work as:
User: We want to test whether adding a "remind me" button to push notifications improves engagement.
Good Response:
Let me help you design this properly.
Hypothesis
Users who see a "Remind me" action on push notifications will have a higher notification action rate than users without it, because re-scheduling gives users control over timing rather than forcing an immediate decision.
Falsifiable prediction: Treatment group notification action rate > Control group by ≥3 percentage points within 14 days.
Design
| Element | Detail | |---------|--------| | Type | A/B test, user-level randomization | | Treatment | Push notifications include "Remind me" quick action (1hr / Tonight / Tomorrow) | | Control | Current notifications (no remind action) | | Split | 50/50 | | Duration | 14 days minimum (to capture novelty decay) | | Targeting | All users on iOS 16+ and Android 13+ (quick actions require these OS versions) |
Metrics
| Type | Metric | Current | Target | |------|--------|---------|--------| | Primary | Notification action rate (any positive action) | 12% | 15% (+3pp) | | Secondary | Reminded notification action rate | n/a | 40% | | Secondary | Notification dismissal rate | 35% | ≤35% | | Guardrail | App open rate | 45% | ≥43% (no more than 2pp drop) | | Guardrail | Notification opt-out rate | 2.1% | ≤2.3% |
Sample Size
To detect a 3pp lift (12% → 15%) at 95% confidence and 80% power:
Pre-commitment
Risks to watch:
Want me to draft the experiment spec for engineering?
User: Our A/B test on the new onboarding flow is done. Treatment group had 38.7% D7 retention vs. 34.2% for control. Should we ship?
Good Response:
Let me look at this carefully before recommending.
What the headline says: +4.5pp retention lift (13% relative improvement). Sounds great.
What I need to check before recommending ship:
Statistical significance?
Novelty decay?
Segment consistency?
Guardrail metrics?
Cohort effects?
Interpretation:
The retention lift is real and statistically significant, but there are two yellow flags:
Recommendation: Ship with monitoring.
The core signal is strong (+4.5pp, significant, consistent across platforms). The decay and marketing concerns don't invalidate the result — they mean D30 data matters.
Do NOT declare "13% retention improvement" to leadership. Say: "4.5pp D7 lift, monitoring for D30 confirmation. Early signs positive but effect may moderate."
testing
Professional writing assistant for PM documents. Use when the user needs to write, draft, or polish documents like briefs, updates, emails, or presentations. Triggers include "write", "draft", "document", "help me write", "create a brief", "polish this", or when producing any written deliverable.
testing
Collaborative thinking partner for exploring ideas, challenges, and decisions. Use when the user says "think through", "explore", "brainstorm", "help me figure out", asks open-ended questions about strategy or priorities, or needs to work through a problem without a clear solution yet.
development
Technical analysis translator for Product Managers. Use when the user needs to understand a system, codebase, API, or technical concept in PM-friendly terms. Triggers include "understand system", "explain code", "technical analysis", "how does X work", "what does this service do", or when exploring unfamiliar technical territory.
documentation
Guided workflow for establishing team identity, boundaries, and strategic clarity. Use when starting a new role, inheriting ambiguity, when a team lacks clear identity, or when you need to define "what we own" vs "what we don't". Triggers include "strategic clarity", "team identity", "new role", "inherited ambiguity", "what does my team own", or "define our boundaries".