skills/statistical-analysis/SKILL.md
Guided statistical analysis with test selection and reporting. Use when you need help choosing appropriate tests for your data, assumption checking, power analysis, and APA-formatted results. Best for academic research reporting, test selection guidance. For implementing specific models programmatically use statsmodels.
npx skillsauth add tusosos/manus-knowledge-base statistical-analysisInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Statistical analysis is a systematic process for testing hypotheses and quantifying relationships. Conduct hypothesis tests (t-test, ANOVA, chi-square), regression, correlation, and Bayesian analyses with assumption checks and APA reporting. Apply this skill for academic research.
This skill should be used when:
Use this decision tree to determine your analysis path:
START
│
├─ Need to SELECT a statistical test?
│ └─ YES → See "Test Selection Guide"
│ └─ NO → Continue
│
├─ Ready to check ASSUMPTIONS?
│ └─ YES → See "Assumption Checking"
│ └─ NO → Continue
│
├─ Ready to run ANALYSIS?
│ └─ YES → See "Running Statistical Tests"
│ └─ NO → Continue
│
└─ Need to REPORT results?
└─ YES → See "Reporting Results"
Use references/test_selection_guide.md for comprehensive guidance. Quick reference:
Comparing Two Groups:
Comparing 3+ Groups:
Relationships:
Bayesian Alternatives: All tests have Bayesian versions that provide:
references/bayesian_statistics.mdALWAYS check assumptions before interpreting test results.
Use the provided scripts/assumption_checks.py module for automated checking:
from scripts.assumption_checks import comprehensive_assumption_check
# Comprehensive check with visualizations
results = comprehensive_assumption_check(
data=df,
value_col='score',
group_col='group', # Optional: for group comparisons
alpha=0.05
)
This performs:
For targeted checks, use individual functions:
from scripts.assumption_checks import (
check_normality,
check_normality_per_group,
check_homogeneity_of_variance,
check_linearity,
detect_outliers
)
# Example: Check normality with visualization
result = check_normality(
data=df['score'],
name='Test Score',
alpha=0.05,
plot=True
)
print(result['interpretation'])
print(result['recommendation'])
Normality violated:
Homogeneity of variance violated:
Linearity violated (regression):
See references/assumptions_and_diagnostics.md for comprehensive guidance.
Primary libraries for statistical analysis:
import pingouin as pg
import numpy as np
# Run independent t-test
result = pg.ttest(group_a, group_b, correction='auto')
# Extract results
t_stat = result['T'].values[0]
df = result['dof'].values[0]
p_value = result['p-val'].values[0]
cohens_d = result['cohen-d'].values[0]
ci_lower = result['CI95%'].values[0][0]
ci_upper = result['CI95%'].values[0][1]
# Report
print(f"t({df:.0f}) = {t_stat:.2f}, p = {p_value:.3f}")
print(f"Cohen's d = {cohens_d:.2f}, 95% CI [{ci_lower:.2f}, {ci_upper:.2f}]")
import pingouin as pg
# One-way ANOVA
aov = pg.anova(dv='score', between='group', data=df, detailed=True)
print(aov)
# If significant, conduct post-hoc tests
if aov['p-unc'].values[0] < 0.05:
posthoc = pg.pairwise_tukey(dv='score', between='group', data=df)
print(posthoc)
# Effect size
eta_squared = aov['np2'].values[0] # Partial eta-squared
print(f"Partial η² = {eta_squared:.3f}")
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
# Fit model
X = sm.add_constant(X_predictors) # Add intercept
model = sm.OLS(y, X).fit()
# Summary
print(model.summary())
# Check multicollinearity (VIF)
vif_data = pd.DataFrame()
vif_data["Variable"] = X.columns
vif_data["VIF"] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
print(vif_data)
# Check assumptions
residuals = model.resid
fitted = model.fittedvalues
# Residual plots
import matplotlib.pyplot as plt
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
# Residuals vs fitted
axes[0, 0].scatter(fitted, residuals, alpha=0.6)
axes[0, 0].axhline(y=0, color='r', linestyle='--')
axes[0, 0].set_xlabel('Fitted values')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].set_title('Residuals vs Fitted')
# Q-Q plot
from scipy import stats
stats.probplot(residuals, dist="norm", plot=axes[0, 1])
axes[0, 1].set_title('Normal Q-Q')
# Scale-Location
axes[1, 0].scatter(fitted, np.sqrt(np.abs(residuals / residuals.std())), alpha=0.6)
axes[1, 0].set_xlabel('Fitted values')
axes[1, 0].set_ylabel('√|Standardized residuals|')
axes[1, 0].set_title('Scale-Location')
# Residuals histogram
axes[1, 1].hist(residuals, bins=20, edgecolor='black', alpha=0.7)
axes[1, 1].set_xlabel('Residuals')
axes[1, 1].set_ylabel('Frequency')
axes[1, 1].set_title('Histogram of Residuals')
plt.tight_layout()
plt.show()
import pymc as pm
import arviz as az
import numpy as np
with pm.Model() as model:
# Priors
mu1 = pm.Normal('mu_group1', mu=0, sigma=10)
mu2 = pm.Normal('mu_group2', mu=0, sigma=10)
sigma = pm.HalfNormal('sigma', sigma=10)
# Likelihood
y1 = pm.Normal('y1', mu=mu1, sigma=sigma, observed=group_a)
y2 = pm.Normal('y2', mu=mu2, sigma=sigma, observed=group_b)
# Derived quantity
diff = pm.Deterministic('difference', mu1 - mu2)
# Sample
trace = pm.sample(2000, tune=1000, return_inferencedata=True)
# Summarize
print(az.summary(trace, var_names=['difference']))
# Probability that group1 > group2
prob_greater = np.mean(trace.posterior['difference'].values > 0)
print(f"P(μ₁ > μ₂ | data) = {prob_greater:.3f}")
# Plot posterior
az.plot_posterior(trace, var_names=['difference'], ref_val=0)
Effect sizes quantify magnitude, while p-values only indicate existence of an effect.
See references/effect_sizes_and_power.md for comprehensive guidance.
| Test | Effect Size | Small | Medium | Large | |------|-------------|-------|--------|-------| | T-test | Cohen's d | 0.20 | 0.50 | 0.80 | | ANOVA | η²_p | 0.01 | 0.06 | 0.14 | | Correlation | r | 0.10 | 0.30 | 0.50 | | Regression | R² | 0.02 | 0.13 | 0.26 | | Chi-square | Cramér's V | 0.07 | 0.21 | 0.35 |
Important: Benchmarks are guidelines. Context matters!
Most effect sizes are automatically calculated by pingouin:
# T-test returns Cohen's d
result = pg.ttest(x, y)
d = result['cohen-d'].values[0]
# ANOVA returns partial eta-squared
aov = pg.anova(dv='score', between='group', data=df)
eta_p2 = aov['np2'].values[0]
# Correlation: r is already an effect size
corr = pg.corr(x, y)
r = corr['r'].values[0]
Always report CIs to show precision:
from pingouin import compute_effsize_from_t
# For t-test
d, ci = compute_effsize_from_t(
t_statistic,
nx=len(group1),
ny=len(group2),
eftype='cohen'
)
print(f"d = {d:.2f}, 95% CI [{ci[0]:.2f}, {ci[1]:.2f}]")
Determine required sample size before data collection:
from statsmodels.stats.power import (
tt_ind_solve_power,
FTestAnovaPower
)
# T-test: What n is needed to detect d = 0.5?
n_required = tt_ind_solve_power(
effect_size=0.5,
alpha=0.05,
power=0.80,
ratio=1.0,
alternative='two-sided'
)
print(f"Required n per group: {n_required:.0f}")
# ANOVA: What n is needed to detect f = 0.25?
anova_power = FTestAnovaPower()
n_per_group = anova_power.solve_power(
effect_size=0.25,
ngroups=3,
alpha=0.05,
power=0.80
)
print(f"Required n per group: {n_per_group:.0f}")
Determine what effect size you could detect:
# With n=50 per group, what effect could we detect?
detectable_d = tt_ind_solve_power(
effect_size=None, # Solve for this
nobs1=50,
alpha=0.05,
power=0.80,
ratio=1.0,
alternative='two-sided'
)
print(f"Study could detect d ≥ {detectable_d:.2f}")
Note: Post-hoc power analysis (calculating power after study) is generally not recommended. Use sensitivity analysis instead.
See references/effect_sizes_and_power.md for detailed guidance.
Follow guidelines in references/reporting_standards.md.
Group A (n = 48, M = 75.2, SD = 8.5) scored significantly higher than
Group B (n = 52, M = 68.3, SD = 9.2), t(98) = 3.82, p < .001, d = 0.77,
95% CI [0.36, 1.18], two-tailed. Assumptions of normality (Shapiro-Wilk:
Group A W = 0.97, p = .18; Group B W = 0.96, p = .12) and homogeneity
of variance (Levene's F(1, 98) = 1.23, p = .27) were satisfied.
A one-way ANOVA revealed a significant main effect of treatment condition
on test scores, F(2, 147) = 8.45, p < .001, η²_p = .10. Post hoc
comparisons using Tukey's HSD indicated that Condition A (M = 78.2,
SD = 7.3) scored significantly higher than Condition B (M = 71.5,
SD = 8.1, p = .002, d = 0.87) and Condition C (M = 70.1, SD = 7.9,
p < .001, d = 1.07). Conditions B and C did not differ significantly
(p = .52, d = 0.18).
Multiple linear regression was conducted to predict exam scores from
study hours, prior GPA, and attendance. The overall model was significant,
F(3, 146) = 45.2, p < .001, R² = .48, adjusted R² = .47. Study hours
(B = 1.80, SE = 0.31, β = .35, t = 5.78, p < .001, 95% CI [1.18, 2.42])
and prior GPA (B = 8.52, SE = 1.95, β = .28, t = 4.37, p < .001,
95% CI [4.66, 12.38]) were significant predictors, while attendance was
not (B = 0.15, SE = 0.12, β = .08, t = 1.25, p = .21, 95% CI [-0.09, 0.39]).
Multicollinearity was not a concern (all VIF < 1.5).
A Bayesian independent samples t-test was conducted using weakly
informative priors (Normal(0, 1) for mean difference). The posterior
distribution indicated that Group A scored higher than Group B
(M_diff = 6.8, 95% credible interval [3.2, 10.4]). The Bayes Factor
BF₁₀ = 45.3 provided very strong evidence for a difference between
groups, with a 99.8% posterior probability that Group A's mean exceeded
Group B's mean. Convergence diagnostics were satisfactory (all R̂ < 1.01,
ESS > 1000).
Consider Bayesian approaches when:
See references/bayesian_statistics.md for comprehensive guidance on:
This skill includes comprehensive reference materials:
comprehensive_assumption_check(): Complete workflowcheck_normality(): Normality testing with Q-Q plotscheck_homogeneity_of_variance(): Levene's test with box plotscheck_linearity(): Regression linearity checksdetect_outliers(): IQR and z-score outlier detectionWhen beginning a statistical analysis:
For questions about:
Key textbooks:
Online resources:
tools
Download video and audio from YouTube and other platforms with yt-dlp. Use when a user asks to download YouTube videos, extract audio from videos, download playlists, get subtitles, download specific formats or qualities, batch download, archive channels, extract metadata, embed thumbnails, download from social media platforms (Twitter, Instagram, TikTok), or build media ingestion pipelines. Covers format selection, audio extraction, playlists, subtitles, metadata, and automation.
development
Download YouTube videos with customizable quality and format options. Use this skill when the user asks to download, save, or grab YouTube videos. Supports various quality settings (best, 1080p, 720p, 480p, 360p), multiple formats (mp4, webm, mkv), and audio-only downloads as MP3.
development
Use this skill any time a spreadsheet file is the primary input or output. This means any task where the user wants to: open, read, edit, or fix an existing .xlsx, .xlsm, .csv, or .tsv file (e.g., adding columns, computing formulas, formatting, charting, cleaning messy data); create a new spreadsheet from scratch or from other data sources; or convert between tabular file formats. Trigger especially when the user references a spreadsheet file by name or path — even casually (like "the xlsx in my downloads") — and wants something done to it or produced from it. Also trigger for cleaning or restructuring messy tabular data files (malformed rows, misplaced headers, junk data) into proper spreadsheets. The deliverable must be a spreadsheet file. Do NOT trigger when the primary deliverable is a Word document, HTML report, standalone Python script, database pipeline, or Google Sheets API integration, even if tabular data is involved.
development
Use when you have a spec or requirements for a multi-step task, before touching code