skills/biostatistics/statistical-analysis/SKILL.md
Guided statistical analysis: test choice, assumption checks, effect sizes, power, APA reporting. Pick tests, verify assumptions, or format results for publication. Covers frequentist (t-test, ANOVA, chi-square, regression, correlation, survival, count, reliability) and Bayesian. Use statsmodels or pymc-bayesian-modeling to fit.
Install this skill globally with one command: `npx skillsauth add jaechang-hits/sciagent-skills statistical-analysis`. Works with Claude Code, Cursor, and Windsurf.
Statistical analysis is the systematic process of selecting appropriate tests, verifying assumptions, quantifying effect magnitudes, and reporting results. This knowhow guides test selection, assumption diagnostics, and APA-style reporting for frequentist and Bayesian analyses in academic research.
| Aspect | Frequentist | Bayesian |
|--------|-------------|----------|
| Core output | p-value, confidence interval | Posterior distribution, credible interval |
| Interpretation | "How likely is this data if H0 is true?" | "How likely is H1 given the data?" |
| Null support | Cannot support H0 (only fail to reject) | Can quantify evidence for H0 via Bayes Factor |
| Prior info | Not used | Incorporated via prior distributions |
| Sample size | Requires adequate power | Works with any sample size |
| Best for | Standard analyses, large samples | Small samples, prior info, complex models |
A statistically significant result (p < .05) may be trivially small in practice. Always report an effect size alongside the p-value, interpreted against conventional benchmarks such as:
| Test | Effect Size | Small | Medium | Large |
|------|-------------|-------|--------|-------|
| t-test | Cohen's d | 0.20 | 0.50 | 0.80 |
| t-test (small n) | Hedges' g | 0.20 | 0.50 | 0.80 |
| ANOVA | partial eta-squared | 0.01 | 0.06 | 0.14 |
| ANOVA | omega-squared | 0.01 | 0.06 | 0.14 |
| Correlation | r | 0.10 | 0.30 | 0.50 |
| Regression | R-squared | 0.02 | 0.13 | 0.26 |
| Regression | f-squared | 0.02 | 0.15 | 0.35 |
| Chi-square | Cramer's V (when min(rows, cols) - 1 = 2) | 0.07 | 0.21 | 0.35 |
| Chi-square 2x2 | phi coefficient | 0.10 | 0.30 | 0.50 |
Cohen's benchmarks are guidelines, not rigid thresholds -- domain context always matters.
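As a minimal sketch of how the first two rows of the benchmark table might be computed, the snippet below calculates Cohen's d with a pooled standard deviation and Hedges' g with the usual approximate small-sample correction. The function names, the "negligible" label for values under 0.20, and the simulated data are illustrative assumptions, not part of the skill.

```python
import numpy as np


def cohens_d(x, y):
    """Cohen's d for two independent samples, using the pooled SD."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    return (x.mean() - y.mean()) / np.sqrt(pooled_var)


def hedges_g(x, y):
    """Hedges' g: Cohen's d times the approximate correction 1 - 3 / (4 * df - 1)."""
    df = len(x) + len(y) - 2
    return cohens_d(x, y) * (1 - 3 / (4 * df - 1))


def label_d(d):
    """Map |d| onto Cohen's conventional benchmarks (labels are a convenience only)."""
    size = abs(d)
    if size < 0.20:
        return "negligible"  # assumption: the table does not name this band
    if size < 0.50:
        return "small"
    if size < 0.80:
        return "medium"
    return "large"


rng = np.random.default_rng(0)  # simulated data, for illustration only
treatment = rng.normal(loc=0.5, scale=1.0, size=40)
control = rng.normal(loc=0.0, scale=1.0, size=40)
d = cohens_d(treatment, control)
print(f"d = {d:.2f}, g = {hedges_g(treatment, control):.2f} ({label_d(d)})")
```

The label is only a starting point for interpretation; the domain context should decide whether an effect of that size matters.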
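Most parametric tests require independent observations, approximately normal data or residuals, and homogeneity of variance across groups.
When assumptions are violated, use Welch's corrections, non-parametric alternatives, or transformations, as detailed for each test family below.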
T-test assumptions: (1) Check normality per group with Shapiro-Wilk + Q-Q plots. (2) Check homogeneity with Levene's test. (3) If normality violated: Mann-Whitney U (independent) or Wilcoxon signed-rank (paired). If variance heterogeneity: use Welch's t-test.
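The t-test workflow above could be sketched with `scipy.stats` roughly as follows, for the independent-groups case. The function name and the use of a single `alpha` as the gate for both checks are assumptions made for illustration; in practice the Q-Q plots mentioned above should inform the decision as well, since formal normality tests are sensitive to sample size.

```python
import numpy as np
from scipy import stats


def compare_two_groups(a, b, alpha=0.05):
    """Choose an independent two-group test from simple assumption checks.

    Hypothetical helper: returns (test name, statistic, p-value).
    """
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)

    # Step 1: normality per group (Shapiro-Wilk).
    normal = all(stats.shapiro(g).pvalue > alpha for g in (a, b))
    if not normal:
        res = stats.mannwhitneyu(a, b, alternative="two-sided")
        return "Mann-Whitney U", res.statistic, res.pvalue

    # Step 2: homogeneity of variance (Levene); fall back to Welch if violated.
    equal_var = stats.levene(a, b).pvalue > alpha
    res = stats.ttest_ind(a, b, equal_var=equal_var)
    name = "Student's t-test" if equal_var else "Welch's t-test"
    return name, res.statistic, res.pvalue


rng = np.random.default_rng(1)  # simulated data, for illustration only
name, stat, p = compare_two_groups(rng.normal(0, 1, 30), rng.normal(0.8, 1, 30))
print(f"{name}: statistic = {stat:.2f}, p = {p:.4f}")
```

For paired designs the same pattern would apply to the within-pair differences, with `stats.ttest_rel` and `stats.wilcoxon` as the two branches.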
ANOVA assumptions: (1) Normality per group. (2) Homogeneity via Levene's test. (3) For repeated measures: check sphericity (Mauchly's test); if violated, apply Greenhouse-Geisser (epsilon < 0.75) or Huynh-Feldt (epsilon > 0.75) correction. (4) If normality violated: Kruskal-Wallis (independent) or Friedman (repeated).
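For the independent-groups branch of the ANOVA workflow, a sketch along these lines might be used. The helper name and returned dictionary keys are hypothetical; eta-squared is computed by hand as SS_between / SS_total, and Levene's p-value is reported rather than acted on, since the text above does not prescribe a specific remedy for unequal variances in ANOVA. Repeated-measures designs and sphericity corrections are not covered here.

```python
import numpy as np
from scipy import stats


def one_way_comparison(*groups, alpha=0.05):
    """One-way ANOVA with a Kruskal-Wallis fallback when normality fails."""
    groups = [np.asarray(g, dtype=float) for g in groups]

    # Normality per group; any violation switches to the rank-based test.
    normal = all(stats.shapiro(g).pvalue > alpha for g in groups)
    if not normal:
        res = stats.kruskal(*groups)
        return {"test": "Kruskal-Wallis", "statistic": res.statistic, "p": res.pvalue}

    res = stats.f_oneway(*groups)

    # Eta-squared = between-group sum of squares / total sum of squares.
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_total = ((all_values - grand_mean) ** 2).sum()

    return {
        "test": "One-way ANOVA",
        "statistic": res.statistic,
        "p": res.pvalue,
        "eta_squared": ss_between / ss_total,
        "levene_p": stats.levene(*groups).pvalue,  # homogeneity check, for review
    }


rng = np.random.default_rng(2)  # simulated data, for illustration only
print(one_way_comparison(rng.normal(0, 1, 25), rng.normal(0.5, 1, 25), rng.normal(1, 1, 25)))
```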
Linear regression assumptions: (1) Linearity via residuals-vs-fitted plot. (2) Independence via Durbin-Watson test (1.5-2.5 acceptable). (3) Homoscedasticity via Breusch-Pagan test + scale-location plot. (4) Normality of residuals via Q-Q plot + Shapiro-Wilk. (5) Multicollinearity via VIF (>10 = severe, >5 = moderate).
Logistic regression assumptions: (1) Independence. (2) Linearity of log-odds with continuous predictors (Box-Tidwell test). (3) No perfect multicollinearity (VIF). (4) Adequate sample size (10-20 events per predictor minimum).
Beyond the main decision flowchart, several specialized test families address specific data types:
Survival / time-to-event analysis:
Count outcome models:
Agreement and reliability:
Categorical data extensions:
What is your research question?
|
+-- Comparing GROUPS on a continuous outcome?
|   |
|   +-- How many groups?
|   |   +-- 2 groups
|   |   |   +-- Independent -> Independent t-test (or Mann-Whitney U)
|   |   |   +-- Paired/repeated -> Paired t-test (or Wilcoxon signed-rank)
|   |   +-- 3+ groups
|   |       +-- Independent -> One-way ANOVA (or Kruskal-Wallis)
|   |       +-- Repeated -> Repeated-measures ANOVA (or Friedman)
|   |
|   +-- Multiple factors? -> Factorial ANOVA / Mixed ANOVA
|   +-- With covariates? -> ANCOVA
|
+-- Testing a RELATIONSHIP between variables?
|   |
|   +-- Both continuous?
|   |   +-- Normal -> Pearson correlation
|   |   +-- Non-normal or ordinal -> Spearman correlation
|   |
|   +-- Predicting continuous outcome?
|   |   +-- 1 predictor -> Simple linear regression
|   |   +-- Multiple predictors -> Multiple linear regression
|   |
|   +-- Predicting categorical outcome?
|   |   +-- Binary -> Logistic regression
|   |   +-- Ordinal -> Ordinal logistic regression
|   |
|   +-- Predicting count outcome?
|   |   +-- Equidispersed -> Poisson regression
|   |   +-- Overdispersed -> Negative binomial regression
|   |   +-- Excess zeros -> Zero-inflated Poisson/NB
|   |
|   +-- Time-to-event outcome?
|       +-- Compare survival curves -> Log-rank test
|       +-- With covariates -> Cox proportional hazards
|
+-- Testing ASSOCIATION between categorical variables?
|   +-- Expected cell count >= 5 -> Chi-square test
|   +-- Expected cell count < 5 -> Fisher's exact test
|   +-- Ordered categories -> Cochran-Armitage trend test
|   +-- Paired categories -> McNemar's test
|
+-- Assessing AGREEMENT / RELIABILITY?
    +-- Categorical, 2 raters -> Cohen's kappa
    +-- Categorical, >2 raters -> Fleiss' kappa
    +-- Continuous ratings -> ICC
    +-- Two measurement methods -> Bland-Altman analysis
    +-- Internal consistency -> Cronbach's alpha
| Research Question | Data Type | Normal? | Test | Non-parametric Alternative |
|-------------------|-----------|---------|------|---------------------------|
| 2 independent groups | Continuous | Yes | Independent t-test | Mann-Whitney U |
| 2 paired groups | Continuous | Yes | Paired t-test | Wilcoxon signed-rank |
| 3+ independent groups | Continuous | Yes | One-way ANOVA | Kruskal-Wallis |
| 3+ repeated groups | Continuous | Yes | Repeated-measures ANOVA | Friedman test |
| 2 variables | Continuous | Yes | Pearson r | Spearman rho |
| Predict continuous | Mixed | -- | Linear regression | -- |
| Predict binary | Mixed | -- | Logistic regression | -- |
| Predict counts | Count | -- | Poisson / Negative binomial | -- |
| Time-to-event | Survival | -- | Cox PH / Log-rank | -- |
| 2 categorical | Categorical | -- | Chi-square / Fisher's exact | -- |
| Rater agreement | Categorical | -- | Cohen's kappa / Fleiss' kappa | -- |
| Method agreement | Continuous | -- | Bland-Altman / ICC | -- |
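The expected-cell-count rule for categorical association could be automated along these lines. This is a sketch, not the skill's own implementation: `association_test` is a hypothetical helper, the Fisher branch is limited to 2x2 tables, and the chi-square statistic is computed without Yates' continuity correction so that Cramer's V follows directly from it.

```python
import numpy as np
from scipy import stats


def association_test(table):
    """Chi-square test, or Fisher's exact test when any expected count is < 5."""
    table = np.asarray(table)
    chi2, p, dof, expected = stats.chi2_contingency(table, correction=False)

    if expected.min() < 5:
        if table.shape != (2, 2):
            raise ValueError("this sketch only handles the 2x2 case for Fisher's exact test")
        _, p_exact = stats.fisher_exact(table)
        return {"test": "Fisher's exact", "p": p_exact}

    # Cramer's V = sqrt(chi2 / (n * (min(rows, cols) - 1))).
    n = table.sum()
    cramers_v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))
    return {"test": "Chi-square", "chi2": chi2, "dof": dof, "p": p, "cramers_v": cramers_v}


print(association_test([[30, 10], [10, 30]]))  # expected counts are all 20
print(association_test([[3, 1], [1, 3]]))      # expected counts are all 2
```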
Misinterpreting p-values as probability of the hypothesis being true. p-values measure P(data | H0), not P(H0 | data). How to avoid: Use precise language: "If the null hypothesis were true, the probability of observing data this extreme is p = ..."
Confusing statistical significance with practical importance. A large sample can make trivially small effects significant. How to avoid: Always report and interpret effect sizes alongside p-values
Running post-hoc power analysis after a non-significant result. Post-hoc power is a mathematical function of the p-value and adds no new information. How to avoid: Use sensitivity analysis instead -- determine what effect size the study could detect at 80% power
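A sensitivity analysis of the kind recommended here might look like the following sketch, which uses `TTestIndPower` from `statsmodels.stats.power` for a two-sided independent-samples t-test with equal group sizes. The wrapper function names and the example sample size are illustrative assumptions.

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()


def minimum_detectable_d(n_per_group, alpha=0.05, power=0.80):
    """Sensitivity analysis: smallest Cohen's d detectable with the given n."""
    return float(analysis.solve_power(
        effect_size=None, nobs1=n_per_group, alpha=alpha,
        power=power, ratio=1.0, alternative="two-sided",
    ))


def required_n_per_group(d, alpha=0.05, power=0.80):
    """A priori counterpart: per-group n needed to detect a given d."""
    return float(analysis.solve_power(
        effect_size=d, nobs1=None, alpha=alpha,
        power=power, ratio=1.0, alternative="two-sided",
    ))


print(f"Detectable d with 50 per group: {minimum_detectable_d(50):.2f}")
print(f"n per group needed for d = 0.5: {required_n_per_group(0.5):.1f}")
```

Unlike post-hoc power, neither calculation uses the observed effect or p-value, so the result says something about the design rather than restating the outcome.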
Ignoring assumption violations and proceeding with parametric tests. How to avoid: Run assumption checks systematically. Use Welch's corrections, non-parametric alternatives, or transformations when violated
Multiple comparisons without correction. Running 20 tests at alpha = .05 gives ~64% chance of at least one false positive. How to avoid: Apply Bonferroni, Holm, or FDR correction. Report both corrected and uncorrected p-values
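The ~64% figure follows from 1 - (1 - alpha)^m under the assumption that the tests are independent. The sketch below reproduces it and adds a hand-written Holm step-down adjustment in plain Python; the function names are hypothetical, and in a real analysis `statsmodels.stats.multitest.multipletests` offers Bonferroni, Holm, and FDR methods ready-made.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """P(at least one false positive) across independent tests at level alpha."""
    return 1 - (1 - alpha) ** n_tests


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Multiply the k-th smallest p by (m - k + 1), cap at 1,
        # and enforce monotonicity with a running maximum.
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


print(f"{familywise_error_rate(20):.3f}")  # 0.642
raw = [0.01, 0.04, 0.03]
for p_raw, p_adj in zip(raw, holm_adjust(raw)):
    print(f"uncorrected p = {p_raw:.3f}, Holm-corrected p = {p_adj:.3f}")
```

Printing both columns side by side matches the advice to report corrected and uncorrected p-values together.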
Treating ordinal data as continuous. Likert scales are ordinal -- means and standard deviations assume equal intervals. How to avoid: Use non-parametric tests (Mann-Whitney, Kruskal-Wallis) or ordinal regression
Ignoring missing data patterns. Listwise deletion assumes MCAR, which is rarely true. How to avoid: Assess missingness mechanism (MCAR, MAR, MNAR). Use multiple imputation for MAR data
Confusing correlation with causation. Observational studies cannot establish causal relationships regardless of effect size. How to avoid: Use causal language only for experimental designs with random assignment
Not reporting non-significant results. Publication bias and file-drawer effect distort the literature. How to avoid: Report all pre-registered analyses. Consider registered reports
Using one-tailed tests to "improve" significance. One-tailed tests should be pre-specified based on strong directional hypotheses. How to avoid: Default to two-tailed. Only use one-tailed when justified a priori
1. Define research question and hypotheses
2. Select statistical test (use Decision Framework above)
3. Conduct a priori power analysis
   - statsmodels.stats.power, pingouin
4. Inspect and clean data
   - assumption_checks.py script provides automated normality, homogeneity, and outlier detection with visualization
5. Check assumptions (see Test-Specific Assumption Workflows above)
6. Run primary analysis
   - (see references/bayesian_statistics.md)
7. Conduct post-hoc and secondary analyses
8. Report results in APA format
   - references/reporting_standards.md for templates

references/effect_sizes_and_power.md -- Detailed guide to calculating, interpreting, and reporting effect sizes (Cohen's d, Hedges' g, Glass's delta, eta-squared, omega-squared, partial eta-squared, phi coefficient, standardized beta, f-squared, Cramer's V, odds ratio); a priori, sensitivity, and correlation power analysis with code examples. Condensed from 582-line original.
references/bayesian_statistics.md -- Comprehensive Bayesian analysis guide: Bayes' theorem, prior specification, ROPE (Region of Practical Equivalence), prior sensitivity analysis, Bayesian t-test/ANOVA/correlation/regression, hierarchical models, model comparison (WAIC/LOO), convergence diagnostics. Condensed from 662-line original.
references/reporting_standards.md -- APA-style reporting templates for t-tests, ANOVA, regression, correlation, chi-square, non-parametric, and Bayesian analyses; pre-registration guidance; methods section templates (participants, design, measures); null results reporting; reporting checklist. Condensed from 470-line original.
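To illustrate the kind of output an APA-style template produces, here is a small formatting sketch for a t-test result. It is not taken from references/reporting_standards.md; the function names are hypothetical, and it assumes the common APA conventions of dropping the leading zero for p-values, reporting p < .001 as a floor, and keeping the leading zero for Cohen's d because d can exceed 1.

```python
def format_p(p):
    """APA-style p-value: three decimals, no leading zero, floor at '< .001'."""
    if p < 0.001:
        return "p < .001"
    return "p = " + f"{p:.3f}".lstrip("0")


def apa_t_test(t, df, p, d):
    """One-line APA-style summary of a t-test with Cohen's d."""
    # ':g' keeps integer df clean (38) while allowing fractional Welch df (37.4).
    return f"t({df:g}) = {t:.2f}, {format_p(p)}, d = {d:.2f}"


print(apa_t_test(2.314, 38, 0.0262, 0.73))   # t(38) = 2.31, p = .026, d = 0.73
print(apa_t_test(5.102, 37.4, 0.00001, 1.21))  # t(37.4) = 5.10, p < .001, d = 1.21
```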
test_selection_guide.md (130 lines original) -- Fully consolidated into Decision Framework (flowchart + Quick Reference Table) and Specialized Test Categories subsection in Key Concepts. Combined coverage: flowchart (~35 lines) + Quick Reference Table (~15 lines) + Specialized Test Categories (~35 lines) = ~85 lines covering all original capabilities. Original content on sample size considerations, multiple comparisons, and missing data was consolidated into Best Practices and Common Pitfalls. Omitted: study design considerations (RCTs, observational, clustered data) -- general guidance covered by statsmodels-statistical-modeling skill.
assumptions_and_diagnostics.md (370 lines original) -- Fully consolidated into Key Concepts (Assumptions Overview + Test-Specific Assumption Workflows) and Workflow Steps 4-5. Combined coverage: Assumptions Overview (~12 lines) + Test-Specific Assumption Workflows (~20 lines) + Workflow Steps 4-5 (~16 lines) = ~48 lines. The original contained detailed code blocks for each assumption check; since this is Knowhow (not Skill), code is referenced rather than reproduced. Key diagnostic thresholds preserved (VIF > 10, Durbin-Watson 1.5-2.5, variance ratio < 2-3). Omitted: extensive Python code blocks for individual checks (normality, homogeneity, linearity, logistic regression diagnostics) -- available in scipy.stats and pingouin documentation. Sample size rules of thumb covered in Workflow Step 3.
assumption_checks.py (540 lines) -- Contains 6 functions: check_normality(), check_normality_per_group(), check_homogeneity_of_variance(), check_linearity(), detect_outliers(), comprehensive_assumption_check(). As Knowhow entry, script functions are referenced in Workflow Step 4 rather than reproduced inline. Key capabilities (Shapiro-Wilk, Levene's, IQR/z-score outlier detection, Q-Q plots) are described in Assumptions Overview and Test-Specific Assumption Workflows. Users needing automated checking should use scipy.stats and pingouin directly following the patterns described.