Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

product-on-purpose/measure-experiment-results

Name: measure-experiment-results
Author: product-on-purpose

skills/measure-experiment-results/SKILL.md

npx skillsauth add product-on-purpose/pm-skills measure-experiment-results

Experiment Results

An experiment results document captures what happened when you tested a hypothesis, including statistical outcomes, segment analysis, learnings, and clear recommendations. Good results documentation turns individual experiments into organizational knowledge that improves future decision-making.

When to Use

After an A/B test or experiment reaches statistical significance
When an experiment is ended early (for any reason)
To communicate findings to stakeholders who weren't involved
During decision-making about whether to ship, iterate, or kill a feature
To build a repository of learnings that inform future experiments

When NOT to Use

The experiment is not designed or run yet -> use measure-experiment-design
The results demand a direction decision -> use iterate-pivot-decision; this skill reports the evidence, that one decides
You want the transferable learning banked for the organization -> follow up with iterate-lessons-log
Your data is survey responses, not a controlled experiment -> use measure-survey-analysis

Instructions

When asked to document experiment results, follow these steps:

1. Summarize the Experiment. Provide context: what was tested, when it ran, and how much traffic it received. Link to the original experiment design document if one exists.
2. Restate the Hypothesis. Remind readers what you believed would happen and why. This frames the interpretation of the results.
3. Present Primary Results. Show the primary metric outcome clearly: what were the values for control and treatment? Include statistical significance (p-value), confidence intervals, and sample sizes. Be honest about whether results are conclusive.
4. Analyze Secondary Metrics. Present guardrail metrics that ensure you didn't cause unintended harm. Note any secondary metrics that moved unexpectedly, both positive and negative.
5. Segment the Data. Look for differential effects across user segments (platform, tenure, plan type, etc.). Sometimes overall results mask important segment-level insights.
6. Extract Learnings. What did you learn beyond the numbers? Include surprising findings, questions raised, and implications for the product hypothesis. Negative results are valuable learnings.
7. Make a Recommendation. Be clear: should we ship, iterate, or kill? Support the recommendation with the evidence. If the decision is nuanced, explain the trade-offs.
8. Define Next Steps. Specify what happens now: engineering work to ship, follow-up experiments, metrics to continue monitoring, or documentation to update.
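
As a rough illustration of the "Present Primary Results" step, the sketch below computes the numbers that step asks for (rates for control and treatment, a p-value, and a confidence interval) for a conversion-rate test. It assumes a binary conversion metric and a two-sided two-proportion z-test, which the skill itself does not prescribe; the function name, field names, and sample counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_readout(conv_c, n_c, conv_t, n_t, alpha=0.05):
    """Summarize a conversion A/B test with a two-sided two-proportion z-test.

    conv_c / n_c: conversions and users in control (hypothetical inputs).
    conv_t / n_t: conversions and users in treatment.
    """
    if n_c <= 0 or n_t <= 0:
        raise ValueError("sample sizes must be positive")

    p_c = conv_c / n_c
    p_t = conv_t / n_t
    diff = p_t - p_c

    # Pooled standard error: used for the test, which assumes no true difference.
    pooled = (conv_c + conv_t) / (n_c + n_t)
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_t))
    if se_pooled == 0:
        raise ValueError("no variation in the data; the test is undefined")
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error: used for the interval around the observed lift.
    se_diff = sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    margin = NormalDist().inv_cdf(1 - alpha / 2) * se_diff

    return {
        "control_rate": p_c,
        "treatment_rate": p_t,
        "absolute_lift": diff,
        "p_value": p_value,
        "ci": (diff - margin, diff + margin),
        "sample_sizes": (n_c, n_t),
        "significant": p_value < alpha,
    }


# Made-up counts, for illustration only.
readout = two_proportion_readout(500, 10_000, 560, 10_000)
print(readout)
```

The same function could be run once per segment for the "Segment the Data" step, with the caveat that checking many segments raises the chance of a spurious significant result.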

Output Format

Use the template in references/TEMPLATE.md to structure the output. A complete readout fills every template section: Summary; Hypothesis Recap; Results; Segment Analysis; Visualization; Learnings; Recommendation; Next Steps; and Appendix.
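
One way to check that a draft readout fills every section is a small script like the sketch below. It assumes the template marks sections with Markdown-style headings whose text matches the names listed above exactly; references/TEMPLATE.md is not shown here, so that layout and the helper name `missing_sections` are assumptions.

```python
# Section names as listed in the Output Format paragraph.
REQUIRED_SECTIONS = [
    "Summary",
    "Hypothesis Recap",
    "Results",
    "Segment Analysis",
    "Visualization",
    "Learnings",
    "Recommendation",
    "Next Steps",
    "Appendix",
]


def missing_sections(readout_text):
    """Return the required section names that never appear as a heading line."""
    headings = set()
    for line in readout_text.splitlines():
        stripped = line.strip()
        # Assumption: a section starts with a Markdown heading such as "## Results".
        if stripped.startswith("#"):
            headings.add(stripped.lstrip("#").strip().lower())
    return [name for name in REQUIRED_SECTIONS if name.lower() not in headings]


# Hypothetical draft with only two sections filled in.
draft = "# Summary\nTested a new checkout button.\n\n## Results\nNo clear lift.\n"
print(missing_sections(draft))
```

An empty list means every section heading is present; it does not say whether the content under each heading is adequate.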

Quality Checklist

Before finalizing, verify:

[ ] Statistical methods and significance are clearly stated
[ ] Confidence intervals are included (not just p-values)
[ ] Segment analysis checked for differential effects
[ ] Secondary/guardrail metrics are reported
[ ] Learnings go beyond just the numbers
[ ] Recommendation is clear and actionable
[ ] Negative or inconclusive results are reported honestly

Examples

See references/EXAMPLE.md for a completed example.

Documents the results of a completed experiment or A/B test with statistical analysis, learnings, and recommendations. Use after experiments conclude to communicate findings, inform decisions, and build organizational knowledge.

287 stars

development

Updated Jun 11, 2026

Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):

npx skillsauth add product-on-purpose/pm-skills measure-experiment-results

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy (container and dependency vulnerability scanner): Clean, 95%
Semgrep (static code analysis for vulnerabilities): Clean, 95%
mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
VirusTotal (multi-engine malware detection): Error, 70%
Snyk (dep) (open source security scanning): Skipped, 50%
Socket.dev (supply chain security analysis): Skipped, 50%
CrowdStrike (advanced threat intelligence): Skipped, 50%
OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: Jun 11, 2026, 5:07 AM (120.2 s, 4 files scanned)

SKILL.md

name: measure-experiment-results
description: Documents the results of a completed experiment or A/B test with statistical analysis, learnings, and recommendations. Use after experiments conclude to communicate findings, inform decisions, and build organizational knowledge.
license: Apache-2.0
phase: measure
version: 2.1.0
updated: 2026-06-10
category: reflection
frameworks: [triple-diamond, lean-startup, design-thinking]
author: product-on-purpose


Related Skills

product-on-purpose/utility-pm-workflow-builder

tools

Guides a contributor from a workflow idea to a complete Workflow Implementation Packet (draft workflow file, draft workflow command, cross-cutting update checklist) in a staging area for review. Runs overlap analysis against the existing workflows with a Why Gate, then helps select and sequence skills with authored handoffs. Use when creating a new multi-skill workflow or promoting a repeated ad-hoc chain into a durable one. To build a single skill instead, use utility-pm-skill-builder; to run a sequence without authoring anything, use the chain command or utility-pm-workflow-orchestrator.

287 stars, SKILL.md, updated Jun 11, 2026

product-on-purpose/utility-pm-workflow-orchestrator

tools

Run an ordered sequence of pm-skills against one input, pausing for go/no-go and stopping on a failed or empty step. Accepts a saved prioritized action plan (Mode A) or an ad-hoc named chain (Mode B; the chain command routes here). Explicit invocation only; run --dry-run first while the native path is EXPERIMENTAL. To author a durable workflow instead, use utility-pm-workflow-builder.

287 stars, SKILL.md, updated Jun 4, 2026

product-on-purpose/utility-pm-skill-auditor

tools

Run a repo-wide cross-cutting governance audit via the pm-skill-auditor sub-agent. Aggregates the enforcing validator suite, re-derives aggregate counters, and surfaces cross-cutting issues no single validator catches, graded P0/P1/P2/P3 with a machine-readable status. Use for pre-release readiness checks or a periodic repo health audit.

287 stars, SKILL.md, updated May 18, 2026

product-on-purpose/utility-pm-release-conductor

tools

Walk the guided 6-gate release runbook (G0 readiness, G1 adversarial review, G2 version bump and CHANGELOG, G2.5 commit and re-verify, G3 tag and push, G4 post-tag hygiene) via the pm-release-conductor sub-agent. Refuses gate bypasses and tags only the re-verified SHA. Use when cutting a pm-skills release.

287 stars, SKILL.md, updated May 18, 2026

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Install manually

# Clone the repo
git clone https://github.com/product-on-purpose/pm-skills.git

# Copy into the Claude Code skills folder (global), creating it if needed
mkdir -p ~/.claude/skills
cp -r pm-skills/skills/measure-experiment-results ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

product-on-purpose/pm-skills

287 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT