Psychological research methods, experimental design, and analysis
Install globally (works with Claude Code, Cursor, and Windsurf): `npx skillsauth add brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research psychology-research-guide`
Psychology is the scientific study of mind and behavior, spanning cognitive processes, social influence, developmental trajectories, clinical disorders, and neuroscience. The field has undergone a methodological revolution since the replication crisis of the 2010s, with new standards for statistical rigor, pre-registration, transparency, and open science fundamentally reshaping how research is conducted and evaluated.
This guide covers the practical aspects of conducting psychology research in the post-replication-crisis era: experimental design with adequate power, pre-registration, appropriate statistical analysis, effect size reporting, and the tools and platforms that support reproducible psychological science. The focus is on what reviewers and editors at top journals now expect.
Whether you are designing a behavioral experiment, analyzing survey data, conducting a psychometric validation, or reviewing a manuscript, these patterns reflect current best practices in the field.
| Design | Advantages | Disadvantages | When to Use |
|--------|-----------|---------------|-------------|
| Between-subjects | No carryover effects; simpler analysis | Requires more participants; individual differences add error variance | Deception studies, one-shot manipulations |
| Within-subjects | More power with fewer participants | Order and carryover effects, demand characteristics | Perception, memory, reaction time |
| Mixed | Combines the strengths of both designs | More complex analysis | Treatment × time (pre/post) or treatment × individual difference |
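The power advantage of within-subjects designs can be made concrete with statsmodels. This is a minimal sketch that assumes equal variances in both conditions and a hypothetical correlation of rho = 0.5 between each participant's two scores. The paired test works on difference scores, so its effect size is d_z = d / sqrt(2(1 - rho)), and the required N depends heavily on rho.

```python
import numpy as np
from statsmodels.stats.power import TTestIndPower, TTestPower

d = 0.5    # standardized difference between conditions (Cohen's d)
rho = 0.5  # ASSUMED correlation between a participant's scores in the two conditions

# Between-subjects: independent groups, solve for N per group
n_between = TTestIndPower().solve_power(effect_size=d, alpha=0.05, power=0.80)

# Within-subjects: paired test on difference scores, effect size d_z
d_z = d / np.sqrt(2 * (1 - rho))
n_within = TTestPower().solve_power(effect_size=d_z, alpha=0.05, power=0.80)

print(f"Between-subjects, total N: {2 * int(np.ceil(n_between))}")
print(f"Within-subjects, total N:  {int(np.ceil(n_within))}")
```

Under these assumptions, the within-subjects design needs roughly a quarter of the participants. A higher rho widens the gap and a lower rho narrows it, so base rho on pilot data or prior studies.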
```python
import numpy as np
from statsmodels.stats.power import TTestIndPower, FTestAnovaPower

# Two-sample t-test power analysis
analysis = TTestIndPower()

# Question: "How many participants per group for d = 0.5, power = 0.80?"
n_per_group = analysis.solve_power(
    effect_size=0.5,  # Cohen's d (medium effect)
    alpha=0.05,
    power=0.80,
    alternative="two-sided",
)
print(f"Required N per group: {int(np.ceil(n_per_group))}")  # 64

# Small effects (d = 0.2) are typical of findings that survive replication
n_small = analysis.solve_power(effect_size=0.2, alpha=0.05, power=0.80)
print(f"Required N per group for d=0.2: {int(np.ceil(n_small))}")  # 394

# One-way ANOVA (3 groups). FTestAnovaPower solves for the TOTAL N across
# all groups, so divide by k_groups to get the per-group N.
anova_analysis = FTestAnovaPower()
n_total = anova_analysis.solve_power(
    effect_size=0.25,  # Cohen's f (medium)
    alpha=0.05,
    power=0.80,
    k_groups=3,
)
print(f"Required total N (ANOVA): {int(np.ceil(n_total))}")
print(f"Required N per group (ANOVA): {int(np.ceil(n_total / 3))}")  # 53
```
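Analytic power formulas assume normally distributed outcomes with equal variances. A Monte Carlo simulation is a common way to cross-check them, or to estimate power for designs with no closed-form solution. The sketch below simulates many two-group studies and counts how often the t-test rejects H0. The function name and simulation settings are illustrative, not part of the original workflow. For d = 0.5 with 64 per group, the estimate should land close to the analytic 0.80.

```python
import numpy as np
from scipy import stats

def simulated_power(d, n_per_group, alpha=0.05, n_sims=5000, seed=1):
    """Estimate two-sample t-test power by simulating normal data with effect size d."""
    rng = np.random.default_rng(seed)
    # Each row is one simulated study; SD = 1 in both groups, so the mean shift equals d
    treatment = rng.normal(loc=d, scale=1.0, size=(n_sims, n_per_group))
    control = rng.normal(loc=0.0, scale=1.0, size=(n_sims, n_per_group))
    p_values = stats.ttest_ind(treatment, control, axis=1).pvalue
    # Power = proportion of simulated studies that reject H0
    return float(np.mean(p_values < alpha))

power_est = simulated_power(d=0.5, n_per_group=64)
print(f"Simulated power for d = 0.5, n = 64 per group: {power_est:.3f}")
```

To model skewed outcomes or unequal variances, replace the `rng.normal` calls with a more realistic data-generating process.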
| Measure | Small | Medium | Large | Use For |
|---------|-------|--------|-------|---------|
| Cohen's d | 0.2 | 0.5 | 0.8 | Group differences |
| Pearson r | 0.1 | 0.3 | 0.5 | Correlations |
| Cohen's f | 0.1 | 0.25 | 0.4 | ANOVA effects |
| eta-squared | 0.01 | 0.06 | 0.14 | ANOVA variance explained |
| Odds ratio | 1.5 | 2.5 | 4.0 | Binary outcomes |
| Cohen's w | 0.1 | 0.3 | 0.5 | Chi-squared tests |
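Several rows of this table are linked by simple formulas. This helps when a power tool asks for one metric but the literature reports another. For two equal-sized groups, f = d / 2. Cohen's f and eta-squared are related by eta² = f² / (1 + f²). The following minimal sketch uses hypothetical helper names:

```python
import math

def eta_sq_from_f(f):
    """Proportion of variance explained implied by Cohen's f."""
    return f**2 / (1 + f**2)

def f_from_eta_sq(eta_sq):
    """Cohen's f implied by eta-squared."""
    return math.sqrt(eta_sq / (1 - eta_sq))

def f_from_d(d):
    """Cohen's f for two equal-sized groups with standardized difference d."""
    return d / 2

for label, d, f in [("small", 0.2, 0.10), ("medium", 0.5, 0.25), ("large", 0.8, 0.40)]:
    print(f"{label}: d = {d} -> f = {f_from_d(d):.2f}; f = {f} -> eta^2 = {eta_sq_from_f(f):.3f}")
```

Running the loop shows that the d, f, and eta-squared benchmarks in the table agree with one another after rounding.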
Important: Large-scale replication projects suggest that most real effects in psychology are small (d = 0.2-0.4). Design for small effects unless you have strong prior evidence for larger ones.
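When the sample size is fixed in advance (for example, by budget or the participant pool), a sensitivity analysis asks the reverse question: what is the smallest effect the design can reliably detect? The following minimal sketch uses statsmodels and assumes a hypothetical 200 participants per group. Passing `effect_size=None` tells `solve_power` to solve for d.

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = 200  # hypothetical fixed sample per group

# Smallest detectable Cohen's d at each target power (two-sided, alpha = .05)
detectable = {
    target: analysis.solve_power(
        effect_size=None, nobs1=n_per_group, alpha=0.05, power=target
    )
    for target in (0.80, 0.90)
}
for target, d_min in detectable.items():
    print(f"n = {n_per_group}/group detects d >= {d_min:.2f} with power {target:.2f}")
```

If the detectable effect is larger than your smallest effect size of interest, the design is underpowered for that question.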
Pre-registration template (loosely modeled on the AsPredicted.org questions):

```text
1. HYPOTHESES
   H1: Participants in the gratitude condition will report higher
   life satisfaction (SWLS scores) than those in the control
   condition (d >= 0.3).

2. DESIGN
   - 2 (gratitude vs. control) between-subjects
   - Random assignment via Qualtrics randomizer

3. PLANNED SAMPLE
   - N = 200 per condition (400 total)
   - Power ≈ 0.85 for d = 0.3 at alpha = .05, two-sided
     (about 235 per condition would be needed for 0.90)
   - Recruitment: Prolific, US residents, 18-65

4. EXCLUSION CRITERIA (stated before data collection)
   - Failed attention check (embedded in survey)
   - Completion time < 3 minutes or > 30 minutes
   - Duplicate IP addresses

5. MEASURED VARIABLES
   - DV: Satisfaction With Life Scale (SWLS; Diener et al., 1985)
   - Manipulation check: "How grateful do you feel right now?" (1-7)
   - Covariates: Age, gender, baseline mood (PANAS)
   - Moderator: Trait gratitude (GQ-6)

6. ANALYSIS PLAN
   - Primary: Independent samples t-test on SWLS scores
   - Secondary: ANCOVA controlling for baseline PANAS-PA
   - Exploratory: Moderation by trait gratitude (GQ-6)

7. ANYTHING ELSE
   - All deviations from this plan will be labeled as exploratory
   - We will report all conditions and all measures
```
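Pre-registered exclusion rules are easiest to defend when they are applied by a script that logs how many responses each rule removes. The following is a minimal pandas sketch. The column names `attention_check_passed` (boolean), `duration_sec`, and `ip_address` are hypothetical. Two choices in it are assumptions that should be written into the pre-registration itself: the rules are applied sequentially, and every response that shares an IP is dropped rather than keeping the first.

```python
import pandas as pd

def apply_exclusions(df: pd.DataFrame) -> pd.DataFrame:
    """Apply pre-registered exclusions in a fixed order and log how many rows remain."""
    counts = {"start": len(df)}
    # 1. Failed attention check (column assumed to be boolean)
    df = df[df["attention_check_passed"]]
    counts["attention"] = len(df)
    # 2. Completion time < 3 min or > 30 min; keep 180-1800 s inclusive
    df = df[df["duration_sec"].between(3 * 60, 30 * 60)]
    counts["duration"] = len(df)
    # 3. Duplicate IPs: drop every response sharing an IP (stricter than keep="first")
    df = df[~df["ip_address"].duplicated(keep=False)]
    counts["duplicate_ip"] = len(df)
    print(counts)
    return df

# Toy data for illustration only
raw = pd.DataFrame({
    "attention_check_passed": [True, False, True, True, True],
    "duration_sec": [600, 700, 100, 900, 1000],
    "ip_address": ["a", "b", "c", "d", "d"],
})
clean = apply_exclusions(raw)
```

Report the logged counts in the participant flow section so that readers can see the effect of each exclusion rule.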
| Platform | Strengths | Journal Integration |
|----------|-----------|---------------------|
| OSF Registries | Most widely used, free, flexible | Registered Reports at 300+ journals |
| AsPredicted.org | Simple; private until you choose to share | Widely accepted |
| ClinicalTrials.gov | Required for many clinical trials | Mandated by U.S. law (FDAAA 801) for applicable trials; required by ICMJE journals |
| EGAP | Political science, field experiments | APSR, AJPS |
```python
import pandas as pd
import pingouin as pg
from scipy import stats

# Load data (expects columns "condition" and "dv")
df = pd.read_csv("experiment_data.csv")
treatment = df.loc[df["condition"] == "treatment", "dv"]
control = df.loc[df["condition"] == "control", "dv"]

# Step 1: Descriptive statistics by condition
descriptives = df.groupby("condition").agg(
    n=("dv", "count"),
    mean=("dv", "mean"),
    sd=("dv", "std"),
    median=("dv", "median"),
).round(3)
print(descriptives)

# Step 2: Check assumptions
# Normality (Shapiro-Wilk flags trivial departures in large samples; also inspect Q-Q plots)
for condition, subset in df.groupby("condition")["dv"]:
    stat, p = stats.shapiro(subset)
    print(f"{condition}: Shapiro-Wilk W={stat:.3f}, p={p:.3f}")

# Homogeneity of variance
levene_stat, levene_p = stats.levene(treatment, control)
print(f"Levene's test: statistic={levene_stat:.3f}, p={levene_p:.3f}")

# Step 3: Primary analysis with effect size
# "CI95%" is the confidence interval of the mean difference, not of Cohen's d
result = pg.ttest(treatment, control, paired=False, alternative="two-sided")
print(result[["T", "dof", "p-val", "cohen-d", "CI95%", "BF10"]])

# Step 4: Bayes factor (pingouin's default JZS prior, r = 0.707); increasingly requested
bf10 = float(result["BF10"].values[0])
print(f"Bayes Factor BF10 = {bf10:.2f}")
# Conventional evidence categories (adapted from Jeffreys)
if bf10 > 10:
    print("Strong evidence for H1")
elif bf10 > 3:
    print("Moderate evidence for H1")
elif bf10 > 1:
    print("Anecdotal evidence for H1")
elif bf10 > 1 / 3:
    print("Anecdotal evidence for H0")
elif bf10 > 1 / 10:
    print("Moderate evidence for H0")
else:
    print("Strong evidence for H0")
```
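As far as we know, pingouin's `ttest` defaults to `correction="auto"`, which switches to Welch's unequal-variance test when the group sizes differ. Check the documentation for your installed version. Welch's test is a robust default regardless of what Levene's test says. The following minimal sketch spells out the Welch statistic and the Welch-Satterthwaite degrees of freedom, then cross-checks both against SciPy. The data are simulated and purely hypothetical.

```python
import numpy as np
from scipy import stats

def welch_t(a, b):
    """Welch's t statistic, Welch-Satterthwaite df, and two-sided p-value."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    va = a.var(ddof=1) / len(a)  # squared standard error of mean a
    vb = b.var(ddof=1) / len(b)  # squared standard error of mean b
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    p = 2 * stats.t.sf(abs(t), df)
    return t, df, p

# Hypothetical simulated data with unequal group sizes and variances
rng = np.random.default_rng(2024)
treatment = rng.normal(loc=5.2, scale=1.4, size=180)
control = rng.normal(loc=4.8, scale=0.9, size=220)

t, df_w, p = welch_t(treatment, control)
print(f"Welch t({df_w:.1f}) = {t:.2f}, p = {p:.4f}")

# Cross-check against SciPy's built-in Welch test
scipy_res = stats.ttest_ind(treatment, control, equal_var=False)
print(f"SciPy: t = {scipy_res.statistic:.2f}, p = {scipy_res.pvalue:.4f}")
```

Report the fractional Welch df (for example, t(350.2)) rather than n1 + n2 - 2.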
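```python
# One-way ANOVA (assumes a DataFrame with columns "score" and "group")
aov = pg.anova(dv="score", between="group", data=df, detailed=True)
print(aov)

# Effect sizes: in a one-way design, partial eta-squared (np2) equals eta-squared
ss_between, df_between = aov.loc[0, "SS"], aov.loc[0, "DF"]
ss_within, ms_within = aov.loc[1, "SS"], aov.loc[1, "MS"]
omega_sq = (ss_between - df_between * ms_within) / (ss_between + ss_within + ms_within)
print(f"Eta-squared: {aov.loc[0, 'np2']:.3f}, omega-squared: {omega_sq:.3f}")

# Post-hoc pairwise comparisons (Tukey HSD controls the familywise error rate)
posthoc = pg.pairwise_tukey(dv="score", between="group", data=df)
print(posthoc)

# Mixed ANOVA (between + within); df_long has one row per participant x time point
mixed = pg.mixed_anova(
    dv="score", between="group", within="time",
    subject="participant_id", data=df_long,
)
print(mixed)
```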
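`mixed_anova` expects long-format data: one row per participant per time point. Survey exports are usually wide, with one column per time point. The following minimal sketch builds `df_long` with `pandas.melt`. The column names (`score_pre`, `score_post`) and the toy values are hypothetical.

```python
import pandas as pd

# Hypothetical wide-format export: one row per participant
wide = pd.DataFrame({
    "participant_id": [1, 2, 3, 4],
    "group": ["treatment", "treatment", "control", "control"],
    "score_pre": [3.1, 2.8, 3.0, 3.3],
    "score_post": [4.0, 3.9, 3.2, 3.4],
})

# Stack the time-point columns into rows
df_long = wide.melt(
    id_vars=["participant_id", "group"],
    value_vars=["score_pre", "score_post"],
    var_name="time",
    value_name="score",
)
# Turn "score_pre"/"score_post" into clean level labels "pre"/"post"
df_long["time"] = df_long["time"].str.replace("score_", "", regex=False)
print(df_long)
```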
```python
from pingouin import cronbach_alpha
from semopy import Model, calc_stats

# Scale reliability
items = df[["item1", "item2", "item3", "item4", "item5"]]
alpha, ci = cronbach_alpha(items)
print(f"Cronbach's alpha = {alpha:.3f}, 95% CI = [{ci[0]:.3f}, {ci[1]:.3f}]")

# Confirmatory Factor Analysis (using semopy)
model_spec = """
factor1 =~ item1 + item2 + item3
factor2 =~ item4 + item5 + item6
"""
model = Model(model_spec)
model.fit(df)
print(model.inspect())  # loadings, variances, covariances

# Fit indices: calc_stats returns one row ("Value") with one column per index,
# so transpose it to look indices up by name
fit = calc_stats(model).T
print(f"CFI = {fit.loc['CFI', 'Value']:.3f}")
print(f"TLI = {fit.loc['TLI', 'Value']:.3f}")
print(f"RMSEA = {fit.loc['RMSEA', 'Value']:.3f}")
# SRMR is not part of calc_stats' output; compute it separately if a journal requires it
```
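Cronbach's alpha is a simple function of item and total-score variances: alpha = k / (k - 1) × (1 - Σ var(item) / var(total)), where k is the number of items. The following hand-rolled version (a hypothetical helper that assumes complete data, with no missing values) is useful for checking output or for teaching. pingouin's `cronbach_alpha` additionally handles missing values and returns a confidence interval.

```python
import numpy as np

def cronbach_alpha_manual(items):
    """Cronbach's alpha from a respondents x items array (no missing data)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)      # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of summed scale scores
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)

# Toy data: rows are respondents, columns are items
parallel = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]  # items move in lockstep
noisy = [[1, 1], [2, 3], [3, 2]]
print(f"Parallel items: {cronbach_alpha_manual(parallel):.3f}")
print(f"Noisy items:    {cronbach_alpha_manual(noisy):.3f}")
```

Perfectly consistent items give alpha = 1. The noisier two-item example gives about 0.67.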
Standard reporting patterns:
t-test:
"Participants in the gratitude condition (M = 5.21, SD = 1.12) reported
significantly higher life satisfaction than those in the control condition
(M = 4.67, SD = 1.08), t(398) = 4.91, p < .001, d = 0.49, 95% CI [0.29, 0.69]."
ANOVA:
"There was a significant main effect of group on performance,
F(2, 297) = 8.43, p < .001, ηp² = .054."
Correlation:
"Life satisfaction was positively correlated with gratitude,
r(198) = .42, p < .001, 95% CI [.30, .53]."
Always include: the test statistic, degrees of freedom, exact p-value (or p < .001), effect size, and a confidence interval.
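Results strings like these can be generated from summary statistics. Doing so also catches mismatches among the reported means, SDs, t, and d before submission. The following is a minimal sketch with hypothetical helper names. It assumes Student's equal-variance t and uses a large-sample normal approximation for the CI of d, so exact noncentral-t intervals will differ slightly. The correlation CI uses the Fisher z transformation.

```python
import math
from scipy import stats

def no_leading_zero(x):
    """APA style drops the leading zero for values bounded by 1 (r, p)."""
    return f"{x:.2f}".replace("0.", ".", 1)

def apa_p(p):
    """Format a p-value APA-style with a 'p < .001' floor."""
    return "p < .001" if p < 0.001 else f"p = {p:.3f}".replace("0.", ".", 1)

def report_t_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Student's t, Cohen's d, and an approximate 95% CI for d from group summaries."""
    t, p = stats.ttest_ind_from_stats(m1, sd1, n1, m2, sd2, n2, equal_var=True)
    df = n1 + n2 - 2
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d = (m1 - m2) / sd_pooled
    se_d = math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))  # large-sample SE
    lo, hi = d - 1.96 * se_d, d + 1.96 * se_d
    return f"t({df}) = {t:.2f}, {apa_p(p)}, d = {d:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]"

def report_correlation(r, n):
    """Pearson r with its t-based p-value and a Fisher-z 95% CI."""
    df = n - 2
    t = r * math.sqrt(df / (1 - r**2))
    p = 2 * stats.t.sf(abs(t), df)
    z, se = math.atanh(r), 1 / math.sqrt(n - 3)
    lo, hi = math.tanh(z - 1.96 * se), math.tanh(z + 1.96 * se)
    return (f"r({df}) = {no_leading_zero(r)}, {apa_p(p)}, "
            f"95% CI [{no_leading_zero(lo)}, {no_leading_zero(hi)}]")

print(report_t_from_summary(5.0, 1.0, 50, 4.5, 1.0, 50))  # hypothetical groups
print(report_correlation(0.42, 200))
```

For r = .42 with N = 200, `report_correlation` reproduces the CI [.30, .53] shown in the correlation template.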