skills/cogsci-statistics/SKILL.md
Domain-specific statistical modeling guidance for cognitive science and neuroscience, encoding when and how to apply mixed models, correction methods, Bayesian approaches, and effect size reporting
This skill encodes domain-specific statistical knowledge for cognitive science and neuroscience research. It addresses the modeling decisions, correction strategies, and reporting conventions that a general-purpose statistician or programmer would get wrong without training in the field. For concrete analysis recipes with code, see references/common-analyses.md.
Before executing the domain-specific steps below, you MUST:
For detailed methodology guidance, see the research-literacy skill.
This skill was generated by AI from academic literature. All parameters, thresholds, and citations require independent verification before use in research. If you find errors, please open an issue.
Critical domain knowledge: Clark (1973) demonstrated that failing to treat items as random effects inflates Type I error. This remains one of the most common statistical errors in cognitive science. If your stimuli are sampled from a larger population (e.g., words, faces, scenes), you must account for item variability.
    Are your stimuli sampled from a larger population?
     |
     +-- YES --> Mixed-effects model with crossed random effects
     |           (subjects and items)
     |
     +-- NO (e.g., fixed set of 4 task conditions) -->
          |
          +-- Any missing data, unbalanced cells, or continuous predictors?
          |    |
          |    +-- YES --> Mixed-effects model (subjects as random effect)
          |    |
          |    +-- NO --> Repeated-measures ANOVA is acceptable
          |
          +-- Need trial-level analysis (e.g., RT distributions)?
               |
               +-- YES --> Mixed-effects model (operates on individual trials)
               +-- NO --> Repeated-measures ANOVA on condition means
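The decision tree above can be sketched as a small helper, which may be useful when the choice has to be made programmatically. The function name, argument names, and returned labels are hypothetical, and treating the two sub-questions under the NO branch as sequential checks is an assumption.

```python
def choose_model(stimuli_sampled, missing_or_unbalanced=False,
                 continuous_predictors=False, trial_level=False):
    """Return the analysis suggested by the decision tree (illustrative only)."""
    if stimuli_sampled:
        # Items are a sample from a population: model item variability too.
        return "mixed-effects model with crossed random effects (subjects and items)"
    if missing_or_unbalanced or continuous_predictors:
        return "mixed-effects model (subjects as random effect)"
    if trial_level:
        return "mixed-effects model on individual trials"
    return "repeated-measures ANOVA on condition means"


# Lexical decision with words sampled from the lexicon.
print(choose_model(stimuli_sampled=True))
# Fixed conditions, balanced data, analysis of condition means.
print(choose_model(stimuli_sampled=False))
```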
Barr et al. (2013) recommend fitting the maximal random effects structure justified by the design to minimize Type I error. This means including random intercepts and slopes for all within-unit factors.
For a typical 2x2 design with factors A (within-subjects, within-items) and B (within-subjects, between-items):
    # Maximal structure (Barr et al., 2013)
    library(lme4)
    m_max <- lmer(RT ~ A * B + (1 + A * B | Subject) + (1 + A | Item), data = d)
Convergence failures are common with complex random effects. Use this hierarchy (Barr et al., 2013; Matuschek et al., 2017):
- Remove correlations between random effects (`||` in lme4)

Do NOT simply drop all random slopes to achieve convergence. This inflates Type I error and undermines the purpose of mixed-effects modeling (Barr et al., 2013).
| Design | Random Effects | Rationale |
|--------|---------------|-----------|
| Lexical decision (words as items) | (1 + condition | subj) + (1 + condition | item) | Subjects and words are crossed |
| Stroop task (fixed conditions) | (1 + congruency | subj) | Conditions are fixed, not sampled |
| Picture naming (pictures as items) | (1 + SOA | subj) + (1 | item) | Items may not vary on within-item factors |
| Multi-site study | (1 + condition | subj) + (1 | site) | Site is an additional clustering factor |
RT data in cognitive experiments are positively skewed, bounded below by physiological limits, and often contaminated by outliers. The approach matters.
Apply these criteria before modeling (Ratcliff, 1993; Luce, 1986):
| Criterion | Threshold | Source |
|-----------|-----------|--------|
| Fast outliers (anticipatory) | < 200 ms | Whelan, 2008; Ratcliff, 1993 |
| Slow absolute cutoff | > 2000-3000 ms (task-dependent) | Ratcliff, 1993 |
| Within-subject SD trimming | > 3 SD from participant's condition mean | Van Selst & Jolicoeur, 1994 |
| Within-subject MAD trimming | > 3 MAD from participant's condition median | Leys et al., 2013 (more robust to skew) |
Task-specific note: For simple RT tasks (e.g., detection), use 100 ms as the fast cutoff (Whelan, 2008). For choice RT tasks (e.g., lexical decision), use 200 ms (Ratcliff, 1993). Always report exclusion rates.
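As a rough illustration of the criteria above, the following sketch applies the absolute cutoffs and then MAD-based trimming to one participant's RTs in one condition, and returns the exclusion rate for reporting. The function name and the 2500 ms slow cutoff are hypothetical choices within the task-dependent range, and scaling the MAD by 1.4826 (so that it matches the SD under normality) is an assumption not stated in the table.

```python
import statistics


def trim_rts(rts, fast_ms=200, slow_ms=2500, n_mad=3.0):
    """Return (kept_rts, exclusion_rate) for one participant x condition cell."""
    # Step 1: absolute cutoffs for anticipations and very slow responses.
    in_range = [rt for rt in rts if fast_ms <= rt <= slow_ms]
    if len(in_range) < 2:
        return in_range, 1 - len(in_range) / len(rts)
    # Step 2: trim values more than n_mad scaled MADs from the median.
    med = statistics.median(in_range)
    mad = 1.4826 * statistics.median([abs(rt - med) for rt in in_range])
    if mad == 0:
        kept = in_range  # no spread to trim against
    else:
        kept = [rt for rt in in_range if abs(rt - med) <= n_mad * mad]
    return kept, 1 - len(kept) / len(rts)


# Hypothetical RTs in ms: one anticipation, one slow outlier, one timeout.
rts = [150, 480, 510, 530, 495, 520, 505, 1900, 3200]
kept, exclusion_rate = trim_rts(rts)
print(kept, f"excluded {exclusion_rate:.1%} of trials")
```

For a simple detection task, `fast_ms=100` would follow the task-specific note above.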
    Is your primary interest in RT distributions (not just means)?
     |
     +-- YES --> Drift Diffusion Model or ex-Gaussian fitting
     |
     +-- NO --> Choose a modeling approach:
          |
          +-- Option 1: Log-transform RT, then fit LMM (Gaussian)
          |    - Pro: Simple, widely understood
          |    - Con: Back-transformation of means is biased;
          |           changes the hypothesis being tested
          |           (Lo & Andrews, 2015)
          |
          +-- Option 2: Inverse-transform RT (1/RT = speed), then LMM
          |    - Pro: Often achieves better normality than log
          |    - Con: Same back-transformation issues as log
          |           (Ratcliff, 1993)
          |
          +-- Option 3 (Recommended): Generalized LMM with
               Gamma family + identity link
               - Pro: Models RT in original units; handles skew
                      directly; avoids transformation issues
                      (Lo & Andrews, 2015)
               - Con: Computationally slower; may have convergence
                      issues with complex random effects
Recommended default: Gamma GLMM with identity link (Lo & Andrews, 2015). Report results on the original millisecond scale.
    # Recommended RT model (Lo & Andrews, 2015)
    library(lme4)
    m_rt <- glmer(RT ~ condition * group + (1 + condition | subj) + (1 | item),
                  family = Gamma(link = "identity"), data = d)
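The back-transformation concern raised for log-transformed RTs can be seen in a small numeric example. Exponentiating the mean of log RTs yields the geometric mean, which falls below the arithmetic mean for skewed data, so a model on log RT is not estimating the mean in milliseconds. The RT values below are hypothetical.

```python
import math
import statistics

# Hypothetical positively skewed RTs in ms.
rts = [420, 450, 480, 510, 560, 640, 900, 1500]

arithmetic_mean = statistics.fmean(rts)
# Mean on the log scale, then back-transformed to ms.
back_transformed = math.exp(statistics.fmean([math.log(rt) for rt in rts]))

print(f"arithmetic mean:       {arithmetic_mean:.1f} ms")
print(f"back-transformed mean: {back_transformed:.1f} ms")
```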
| Scenario | Method | Rationale | Source |
|----------|--------|-----------|--------|
| Small number of planned contrasts (< 5) | No correction or Holm | Planned contrasts based on a priori hypotheses do not require correction if specified before data collection | Rubin, 2021 |
| All pairwise comparisons after ANOVA | Tukey HSD | Controls family-wise error for all pairwise comparisons; assumes equal variance | Tukey, 1953 |
| Many tests, correlated (e.g., EEG channels) | Cluster-based permutation | Respects spatial/temporal correlation structure | Maris & Oostenveld, 2007 |
| Many tests, independent | Bonferroni-Holm | More powerful than Bonferroni; step-down procedure | Holm, 1979 |
| Large-scale testing (fMRI voxels, genomics) | FDR (Benjamini-Hochberg) | Controls false discovery rate rather than family-wise error; appropriate when some false positives are tolerable | Benjamini & Hochberg, 1995 |
| Exploratory whole-brain fMRI | Cluster-level FWE (with cluster-forming threshold p < 0.001) | Eklund et al. (2016) showed that p < 0.01 cluster-forming threshold inflates false positive rates to ~70% | Eklund et al., 2016 |
| Confirmatory ROI analysis in fMRI | Small volume correction (SVC) with FWE | Restricts search space to a priori ROI | Worsley et al., 1996 |
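To make the difference between the Holm and Benjamini-Hochberg rows concrete, here is a minimal standard-library sketch of both adjustments. The function names are hypothetical and the p-values are made up. In practice an established routine such as R's `p.adjust` or `multipletests` in statsmodels is likely the safer choice.

```python
def holm(pvals):
    """Holm (1979) step-down adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # ascending p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Multiply the k-th smallest p by (m - k + 1); enforce monotonicity.
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg (1995) adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)  # descending p
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank in ascending order
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]  # made-up p-values
print("Holm:", holm(pvals))
print("BH:  ", benjamini_hochberg(pvals))
```

With these inputs, Holm leaves two of the four tests below 0.05 while Benjamini-Hochberg leaves all four, reflecting the weaker error criterion that FDR control targets.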
| BF10 Range | Evidence Category | Source |
|------------|------------------|--------|
| < 1/10 | Strong evidence for H0 | Jeffreys, 1961; Lee & Wagenmakers, 2013 |
| 1/10 to 1/3 | Moderate evidence for H0 | Lee & Wagenmakers, 2013 |
| 1/3 to 3 | Anecdotal / inconclusive | Lee & Wagenmakers, 2013 |
| 3 to 10 | Moderate evidence for H1 | Lee & Wagenmakers, 2013 |
| > 10 | Strong evidence for H1 | Lee & Wagenmakers, 2013 |
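The evidence categories above can be encoded as a small lookup so that a verbal label always accompanies, and never replaces, the exact Bayes factor. The function name is hypothetical, and the handling of values that fall exactly on a boundary (1/10, 1/3, 3, 10) is an assumption, since the table does not specify it.

```python
def classify_bf10(bf10):
    """Map a Bayes factor BF10 to the evidence category in the table above."""
    if bf10 <= 0:
        raise ValueError("BF10 must be positive")
    if bf10 < 1 / 10:
        return "strong evidence for H0"
    if bf10 < 1 / 3:
        return "moderate evidence for H0"
    if bf10 <= 3:
        return "anecdotal / inconclusive"
    if bf10 <= 10:
        return "moderate evidence for H1"
    return "strong evidence for H1"


bf = 5.3  # hypothetical result
print(f"BF10 = {bf} ({classify_bf10(bf)})")
```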
| Tool | Use Case | Language |
|------|----------|----------|
| BayesFactor | Standard designs (t-test, ANOVA, correlation, regression) | R |
| brms | Complex models (multilevel, non-Gaussian, multivariate) | R (Stan backend) |
| JASP | GUI-based Bayesian analysis for standard tests | Standalone |
| PyMC | Custom Bayesian models | Python |
Report the exact BF, not just the category (Wagenmakers et al., 2018):
"A Bayesian paired-samples t-test indicated moderate evidence for a difference between conditions, BF10 = 5.3 (default Cauchy prior, r = 0.707)."
Always specify:
APA 7th edition (2020, Section 6.6) requires reporting effect sizes for all primary analyses. The specific measure depends on the test:
| Test | Effect Size | Interpretation Benchmarks | Source |
|------|------------|--------------------------|--------|
| t-test (between groups) | Cohen's d | 0.2 small, 0.5 medium, 0.8 large | Cohen, 1988 |
| t-test (within subjects) | Cohen's d_z or d_av | d_z uses SD of difference scores | Lakens, 2013 |
| One-way ANOVA | eta-squared or omega-squared | 0.01 small, 0.06 medium, 0.14 large | Cohen, 1988 |
| Factorial ANOVA | partial eta-squared | 0.01 small, 0.06 medium, 0.14 large | Cohen, 1988; Richardson, 2011 |
| Mixed-effects model | semi-partial R-squared | No universal benchmarks; report CI | Rights & Sterba, 2019 |
| Correlation | r | 0.1 small, 0.3 medium, 0.5 large | Cohen, 1988 |
| Chi-square | Cramer's V or phi | Depends on df | Cohen, 1988 |
Domain note: Always report confidence intervals around effect sizes (APA 7th, 2020). Use `effectsize` (R) or `statsmodels` (Python) for computation. The benchmarks above are Cohen's generic guidelines; paradigm-specific benchmarks are more informative (see `../cogsci-power-analysis/references/effect-sizes.md`).
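The distinction between the between-groups d and the within-subjects d_z in the table can be sketched in a few lines. This is a minimal illustration with hypothetical function names and made-up data; it uses the pooled SD for d and the SD of the difference scores for d_z, and it does not compute the confidence intervals that should accompany either value.

```python
import math
import statistics


def cohens_d(group1, group2):
    """Between-groups Cohen's d using the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (statistics.fmean(group1) - statistics.fmean(group2)) / pooled_sd


def cohens_dz(cond1, cond2):
    """Within-subjects d_z: mean difference over SD of the difference scores."""
    diffs = [a - b for a, b in zip(cond1, cond2)]
    return statistics.fmean(diffs) / statistics.stdev(diffs)


# Made-up mean RTs (ms) for five participants in two conditions.
incongruent = [520, 540, 560, 580, 600]
congruent = [500, 530, 535, 570, 565]

print(f"d   (treating conditions as independent groups): {cohens_d(incongruent, congruent):.2f}")
print(f"d_z (paired scores):                             {cohens_dz(incongruent, congruent):.2f}")
```

The two values differ substantially for the same data because d_z depends on the correlation between conditions, which is one reason to state which variant is reported.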
Traditional effect sizes are not straightforward for mixed models. Options:
- `r2glmm` or `effectsize` package (Rights & Sterba, 2019)
- `MuMIn::r.squaredGLMM()` (Nakagawa & Schielzeth, 2013)

Problem: Analyzing condition means averaged over items, ignoring item variability, fails to generalize beyond the specific stimuli used (Clark, 1973).
Fix: Use mixed-effects models with crossed random effects for subjects and items.
Problem: Selecting voxels/channels/time-windows based on the effect of interest, then testing that same effect (Kriegeskorte et al., 2009). Inflates effect sizes by 2x or more (Vul et al., 2009).
Fix: Use independent localizer, leave-one-out cross-validation, or whole-brain corrected analysis.
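A small simulation can illustrate why selecting and testing on the same data is circular. In the sketch below every "voxel" has a true effect of zero, yet the voxels selected for having the largest observed effect still show a clearly positive mean in the selection data. The same voxels evaluated in independent data do not. All sizes and the seed are arbitrary, hypothetical choices.

```python
import random
import statistics

random.seed(1)
n_voxels, n_subjects, n_selected = 1000, 20, 10


def simulate_null():
    """Pure-noise data: the true effect is zero in every voxel."""
    return [[random.gauss(0, 1) for _ in range(n_subjects)]
            for _ in range(n_voxels)]


selection_data = simulate_null()
independent_data = simulate_null()

# Select the voxels with the largest mean effect in the selection data.
means = [statistics.fmean(voxel) for voxel in selection_data]
top = sorted(range(n_voxels), key=lambda i: means[i], reverse=True)[:n_selected]

# Circular: estimate the effect in the same data used for selection.
circular_estimate = statistics.fmean([means[i] for i in top])
# Non-circular: estimate the effect for the same voxels in independent data.
independent_estimate = statistics.fmean(
    [statistics.fmean(independent_data[i]) for i in top])

print(f"circular estimate:    {circular_estimate:.2f}")
print(f"independent estimate: {independent_estimate:.2f}")
```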
Problem: ANOVA on proportion correct violates normality and homogeneity assumptions, especially at ceiling (> 90%) or floor (< 10%) (Jaeger, 2008; Dixon, 2008).
Fix: Use logistic mixed-effects model on binary (correct/incorrect) trial-level data.
Problem: Removing "outlier" participants based on the dependent variable (e.g., excluding subjects whose effects go in the wrong direction) without a priori criteria.
Fix: Define exclusion criteria before data collection. Base exclusions on performance metrics (accuracy below chance, excessive RTs), not on the effect of interest.
Problem: ANOVA on raw RT means violates normality. Condition means conceal distributional differences (Ratcliff, 1993).
Fix: Use Gamma GLMM (Lo & Andrews, 2015) or transform RTs, and supplement with distributional analysis if warranted.
Problem: Cluster-based inference with cluster-forming thresholds more lenient than p < 0.001 (uncorrected) produces unacceptable false positive rates up to 70% (Eklund et al., 2016).
Fix: Use voxel-level threshold of p < 0.001 (uncorrected) as minimum cluster-forming threshold, or use voxel-level FWE/FDR correction.
Problem: A "significant" correlation of r = 0.30 with N = 50 has a 95% CI of [0.02, 0.53] -- the true effect could be near zero (Cumming, 2014).
Fix: Always report a bootstrap 95% CI for correlations. Use 10,000 bootstrap samples (Efron & Tibshirani, 1993).
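The interval quoted in the problem statement can be reproduced with the Fisher z approximation, and a percentile bootstrap can be sketched alongside it. Both functions below are illustrative and their names are hypothetical. The percentile method is only one of several bootstrap interval types, and its use here is an assumption, since the text does not specify which variant to report.

```python
import math
import random


def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a Pearson r via the Fisher z-transform."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_ci(x, y, n_boot=10_000, seed=0):
    """Percentile bootstrap 95% CI, resampling (x, y) pairs with replacement."""
    rng = random.Random(seed)
    n = len(x)
    rs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            rs.append(pearson_r([x[i] for i in idx], [y[i] for i in idx]))
        except ZeroDivisionError:
            continue  # skip a degenerate resample with zero variance
    rs.sort()
    return rs[int(0.025 * len(rs))], rs[int(0.975 * len(rs)) - 1]


lo, hi = fisher_ci(0.30, 50)
print(f"Fisher z 95% CI for r = 0.30, N = 50: [{lo:.2f}, {hi:.2f}]")
```

`bootstrap_ci(x, y)` takes the paired observations themselves, so it needs the raw data and cannot be computed from r and N alone.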
Based on APA 7th edition (2020) and Appelbaum et al. (2018):
See references/common-analyses.md for concrete analysis recipes with code patterns.