skills/statistical-analysis-planner/SKILL.md
Plan and report statistical rigor for ML experiment results. Use when significance testing, effect size reporting, confidence intervals, seed variance analysis, or multiple-comparison corrections are needed before including results in a paper or rebuttal.
npx skillsauth add a-green-hand-jack/ml-research-skills statistical-analysis-planner

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Design the statistical analysis before running, and report it correctly after results exist. This skill prevents underpowered claims, misleading averages-without-variance, and significance theater in ML papers.
Use this skill when:
Do not use this skill to run the experiments — use run-experiment. Do not use this skill to interpret surprising results scientifically — use result-diagnosis. Use this skill after results exist (or in planning mode before deciding how many seeds to run).
Pair this skill with:
- experiment-design-planner to plan the number of seeds, runs, and controls before running
- result-diagnosis when the statistical analysis reveals that a result is not reliable
- paper-evidence-board to update evidence slots with confidence-annotated claims
- table-results-review to ensure result tables report variance and pass statistical requirements

<installed-skill-dir>/
├── SKILL.md
└── references/
    └── test-selection.md
- references/test-selection.md when choosing a statistical test or confidence interval method.
- memory/claim-board.md and memory/evidence-board.md to understand what claims need statistical backing.

For each result that will appear in the paper, record:
Classify each result as:
- requires-analysis: main claim or primary comparison
- supporting-analysis: ablation or secondary result
- descriptive-only: mean reported, no significance claim
- single-run: only one run exists, limitations must be acknowledged

Read references/test-selection.md.
For each requires-analysis result:
Result: <claim or comparison>
Metric: <metric name>
N seeds / runs: <count>
Distribution assumption: normal / non-normal / unknown
Test: <paired t-test / Wilcoxon / bootstrap CI / permutation test / McNemar>
Significance threshold: α = 0.05 (or 0.01 for primary claim)
Effect size measure: Cohen's d / Cliff's delta / relative improvement %
Multiple comparison correction: <Bonferroni / Holm / Benjamini-Hochberg / none>
Report format: mean ± std / 95% CI / p-value + effect size
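
As a rough illustration of filling in the "Test" and "Effect size measure" fields above, the following sketch runs an exact paired permutation (sign-flip) test and computes a standardized paired effect size using only the Python standard library. The function names and the per-seed scores are hypothetical and exist only for this example. The effect size shown is the paired variant of Cohen's d (mean difference divided by the standard deviation of the differences), which is one of several possible conventions.

```python
import itertools
import statistics


def paired_sign_flip_test(a, b):
    """Exact two-sided paired permutation (sign-flip) test on per-seed differences."""
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    hits = 0
    total = 0
    # Under the null hypothesis, each paired difference is equally likely
    # to carry either sign, so enumerate all 2^N sign assignments.
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        total += 1
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        if stat >= observed - 1e-12:  # small tolerance for float round-off
            hits += 1
    return hits / total


def paired_effect_size(a, b):
    """Mean paired difference divided by the std of the differences (paired d)."""
    diffs = [x - y for x, y in zip(a, b)]
    return statistics.mean(diffs) / statistics.stdev(diffs)


# Hypothetical per-seed accuracies, matched by seed (illustrative numbers only).
method_a = [82.1, 83.4, 81.0, 84.0, 81.5]
baseline_b = [80.0, 81.9, 80.2, 81.5, 80.1]

p_value = paired_sign_flip_test(method_a, baseline_b)
d_paired = paired_effect_size(method_a, baseline_b)
print(f"p = {p_value:.4f}, paired d = {d_paired:.2f}, N = {len(method_a)} seeds")
```

In this example Method A wins on every seed, yet the p-value is 2/32 = 0.0625. That is the smallest two-sided p-value this exact test can produce with five paired runs, so a threshold of α = 0.05 cannot be met at that seed count with this test. Checking that kind of floor is one reason to fix N and the test before running.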
For seed variance analysis, plan:
For results that already exist, compute:
For compute-limited settings (1–3 seeds):
For main result tables:
Method A: 82.3 ± 1.2 (mean ± std, N=5 seeds)
[80.4, 84.1] 95% CI
p < 0.05 vs Baseline B (paired t-test, Bonferroni-corrected)
Effect size: d = 0.83 (large)
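
A table entry in roughly this shape could be generated as sketched below. This is a minimal example under stated assumptions: the helper names and scores are hypothetical, and the interval is a percentile bootstrap rather than a t-based interval, chosen here only because the Python standard library has no t-distribution quantiles. With very few seeds a bootstrap interval tends to be too narrow, so treat the interval as indicative.

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=10_000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_resamples))
    tail = (1 - level) / 2
    lo_idx = round(tail * n_resamples)
    hi_idx = round((1 - tail) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


def format_result(name, values):
    """Format one result as mean ± std plus a bootstrap interval."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample std (n - 1 denominator)
    lo, hi = bootstrap_ci(values)
    return (
        f"{name}: {mean:.1f} ± {std:.1f} (mean ± std, N={len(values)} seeds)\n"
        f"[{lo:.1f}, {hi:.1f}] 95% CI (percentile bootstrap)"
    )


scores = [82.1, 83.4, 81.0, 84.0, 81.5]  # hypothetical per-seed scores
report = format_result("Method A", scores)
print(report)
```

Stating the interval method and the std convention in the table caption keeps the numbers reproducible by a reader.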
For text claims:
For low-seed settings:
If the paper reports more than 3 comparisons on the same held-out set:
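
For the multiple-comparison case, Holm is one of the corrections named in the plan template. A minimal sketch of Holm's step-down adjustment follows; the function name and the raw p-values are hypothetical and serve only as an illustration.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by (m - k + 1),
        # capped at 1, and forced to be non-decreasing across ranks.
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values for four comparisons on one held-out set.
raw = [0.010, 0.040, 0.030, 0.005]
adjusted = holm_adjust(raw)
significant = [p <= 0.05 for p in adjusted]
print(adjusted, significant)
```

All four raw p-values here fall below 0.05, but only two comparisons remain significant after adjustment, which is the kind of change that should be reflected in the claim wording.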
- memory/evidence-board.md when statistical analysis changes the confidence level of a claim
- memory/claim-board.md to reflect corrected or strengthened claim wording
- memory/risk-board.md when low seed count or failed significance is a reviewer risk

Before finalizing: