Structural equation modeling with latent variables guide
Build, estimate, and evaluate structural equation models (SEM) with latent variables using Python (semopy) and R (lavaan), including confirmatory factor analysis and path analysis.
Structural Equation Modeling is a multivariate statistical framework that combines factor analysis and path analysis to test complex theoretical models involving:
| Component | Description | Diagram Symbol |
|-----------|-------------|----------------|
| Observed variable | Measured directly | Rectangle |
| Latent variable | Inferred from indicators | Oval/circle |
| Regression path | Directional relationship | Single-headed arrow |
| Covariance | Non-directional association | Double-headed arrow |
| Error/residual | Unexplained variance | Small circle with arrow |
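To make the link between latent variables and their indicators concrete, here is a minimal sketch of the covariance matrix a one-factor model implies: each off-diagonal entry is the product of two loadings and the factor variance, and residual variances add to the diagonal only. The function name `implied_covariance` and all numeric values are hypothetical, chosen purely for illustration.

```python
# One-factor model: indicator_i = loading_i * factor + residual_i.
# All numbers below are hypothetical illustration values, not estimates.

def implied_covariance(loadings, factor_variance, residual_variances):
    """Return the model-implied covariance matrix as a list of rows."""
    p = len(loadings)
    sigma = [
        [loadings[i] * loadings[j] * factor_variance for j in range(p)]
        for i in range(p)
    ]
    for i in range(p):
        # Residuals are assumed uncorrelated, so they affect the diagonal only.
        sigma[i][i] += residual_variances[i]
    return sigma

loadings = [1.0, 0.8, 0.6]            # first loading fixed to 1 to set the scale
factor_variance = 2.0
residual_variances = [0.5, 0.5, 0.5]

sigma = implied_covariance(loadings, factor_variance, residual_variances)
for row in sigma:
    print([round(value, 3) for value in row])
```

Estimation amounts to choosing loadings and variances so that this implied matrix is as close as possible to the sample covariance matrix.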
CFA tests whether observed variables load onto hypothesized latent factors.
library(lavaan)
# Define the measurement model
# =~ means "is measured by"
cfa_model <- '
# Latent variable definitions
Motivation =~ mot1 + mot2 + mot3 + mot4
SelfEfficacy =~ se1 + se2 + se3
Performance =~ perf1 + perf2 + perf3 + perf4
# Covariances between latent variables (estimated by default in CFA)
'
# Fit the model
fit <- cfa(cfa_model, data = mydata, estimator = "MLR")
# View results
summary(fit, fit.measures = TRUE, standardized = TRUE)
# Key output to examine:
# - Factor loadings (standardized > 0.5 is desirable)
# - Model fit indices (see table below)
# - Modification indices (for model improvement)
modindices(fit, sort = TRUE, minimum.value = 10)
import semopy
import pandas as pd
# Define model in lavaan-like syntax
model_spec = """
Motivation =~ mot1 + mot2 + mot3 + mot4
SelfEfficacy =~ se1 + se2 + se3
Performance =~ perf1 + perf2 + perf3 + perf4
"""
# Fit the model
# `data` is assumed to be a pandas DataFrame with one column per indicator (mot1 ... perf4)
model = semopy.Model(model_spec)
result = model.fit(data)
# View parameter estimates
print(model.inspect())
# Get fit statistics
stats = semopy.calc_stats(model)
print(stats.T)
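Before reading the fit statistics, it can help to check the degrees of freedom you expect. The sketch below counts them for a simple-structure CFA such as the three-factor model above (4, 3, and 4 indicators). It assumes marker-variable identification (first loading of each factor fixed to 1), no cross-loadings or residual covariances, and a covariance structure only. The helper name `cfa_degrees_of_freedom` is hypothetical.

```python
def cfa_degrees_of_freedom(indicators_per_factor):
    """Count unique moments, free parameters, and df for a simple-structure CFA."""
    n_factors = len(indicators_per_factor)
    p = sum(indicators_per_factor)                # number of observed variables

    unique_moments = p * (p + 1) // 2             # variances + covariances
    free_loadings = p - n_factors                 # one loading per factor is fixed
    residual_variances = p
    factor_variances = n_factors
    factor_covariances = n_factors * (n_factors - 1) // 2

    free_parameters = (free_loadings + residual_variances
                       + factor_variances + factor_covariances)
    return unique_moments, free_parameters, unique_moments - free_parameters

moments, params, df = cfa_degrees_of_freedom([4, 3, 4])
print(f"moments={moments}, free parameters={params}, df={df}")
```

If the software reports a different df, the fitted model probably differs from the one you intended (for example, an extra constraint or a dropped indicator).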
After confirming the measurement model, add structural (regression) paths.
sem_model <- '
# Measurement model
Motivation =~ mot1 + mot2 + mot3 + mot4
SelfEfficacy =~ se1 + se2 + se3
Performance =~ perf1 + perf2 + perf3 + perf4
# Structural model (regressions)
# ~ means "is regressed on"
Performance ~ Motivation + SelfEfficacy
SelfEfficacy ~ Motivation
# Optional: define indirect effect
# indirect := a * b
'
fit <- sem(sem_model, data = mydata, estimator = "MLR")
summary(fit, fit.measures = TRUE, standardized = TRUE, rsquare = TRUE)
mediation_model <- '
# Measurement model
X =~ x1 + x2 + x3
M =~ m1 + m2 + m3
Y =~ y1 + y2 + y3
# Structural model
M ~ a*X # a path
Y ~ b*M + c*X # b path + direct effect c
# Define indirect and total effects
indirect := a * b
total := c + a * b
'
fit <- sem(mediation_model, data = mydata, se = "bootstrap", bootstrap = 1000)
summary(fit, standardized = TRUE)
# Bootstrap confidence intervals for indirect effect
parameterEstimates(fit, boot.ci.type = "bca.simple", standardized = TRUE)
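The logic of the bootstrap for `indirect := a * b` can be sketched in plain Python. This is a deliberately simplified illustration, not what lavaan does internally: it uses observed variables instead of latent ones, ordinary least squares instead of ML, a percentile interval instead of the BCa interval, and simulated data with assumed true paths a = 0.5, b = 0.4, c = 0.2. The helpers `cov` and `mediation_paths` are hypothetical names.

```python
import random

def cov(u, v):
    """Sample covariance of two equal-length lists."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    return sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v)) / (len(u) - 1)

def mediation_paths(x, m, y):
    """OLS estimates: a from M ~ X, then b and c from Y ~ M + X."""
    sxx, smm = cov(x, x), cov(m, m)
    sxm, sxy, smy = cov(x, m), cov(x, y), cov(m, y)
    a = sxm / sxx
    det = smm * sxx - sxm ** 2
    b = (smy * sxx - sxm * sxy) / det
    c = (sxy * smm - sxm * smy) / det
    return a, b, c

# Simulated data (assumption: true a = 0.5, b = 0.4, c = 0.2).
rng = random.Random(42)
n = 300
x = [rng.gauss(0, 1) for _ in range(n)]
m = [0.5 * xi + rng.gauss(0, 1) for xi in x]
y = [0.4 * mi + 0.2 * xi + rng.gauss(0, 1) for mi, xi in zip(m, x)]

a, b, c = mediation_paths(x, m, y)
indirect = a * b
total = c + a * b

# Percentile bootstrap: resample cases with replacement, re-estimate a * b.
boot = []
for _ in range(1000):
    idx = [rng.randrange(n) for _ in range(n)]
    a_s, b_s, _c_s = mediation_paths([x[i] for i in idx],
                                     [m[i] for i in idx],
                                     [y[i] for i in idx])
    boot.append(a_s * b_s)
boot.sort()
lower, upper = boot[int(0.025 * len(boot))], boot[int(0.975 * len(boot)) - 1]

print(f"indirect = {indirect:.3f}, 95% percentile CI [{lower:.3f}, {upper:.3f}]")
print(f"total = {total:.3f}")
```

The indirect effect is judged by whether the interval excludes zero. The bootstrap is preferred here because the product of two estimates is generally not normally distributed.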
| Index | Good Fit | Acceptable | What It Measures |
|-------|----------|------------|------------------|
| Chi-square (p) | p > 0.05 | Sensitive to N; use with other indices | Exact fit test |
| Chi-square/df | < 2 | < 3 | Parsimony-adjusted exact fit |
| CFI | > 0.95 | > 0.90 | Comparative fit vs. null model |
| TLI | > 0.95 | > 0.90 | CFI adjusted for parsimony |
| RMSEA | < 0.06 | < 0.08 | Approximate fit per df |
| SRMR | < 0.08 | < 0.10 | Average residual correlation |
| AIC/BIC | Lower = better | -- | Model comparison (not absolute) |
# Extract fit measures in lavaan
fitMeasures(fit, c("chisq", "df", "pvalue", "cfi", "tli", "rmsea",
"rmsea.ci.lower", "rmsea.ci.upper", "srmr"))
Reporting template:
The structural equation model demonstrated adequate fit to the data:
chi-square(df) = X.XX, p = .XXX; CFI = .XX; TLI = .XX; RMSEA = .XXX
[90% CI: .XXX, .XXX]; SRMR = .XXX.
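To see where RMSEA, CFI, and TLI come from, the following sketch computes them from the model and baseline (null model) chi-square values using the common textbook formulas. The input numbers are hypothetical. Software may differ in details (for example, N versus N - 1 in RMSEA, or robust corrections under MLR), so values reported by lavaan or semopy can deviate slightly from this hand calculation.

```python
import math

def rmsea(chisq, df, n):
    """Root mean square error of approximation (N - 1 convention)."""
    return math.sqrt(max(chisq - df, 0.0) / (df * (n - 1)))

def cfi(chisq, df, chisq_base, df_base):
    """Comparative fit index relative to the baseline (null) model."""
    d_model = max(chisq - df, 0.0)
    d_base = max(chisq_base - df_base, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base

def tli(chisq, df, chisq_base, df_base):
    """Tucker-Lewis index; penalizes model complexity via chi-square/df ratios."""
    ratio_base = chisq_base / df_base
    ratio_model = chisq / df
    return (ratio_base - ratio_model) / (ratio_base - 1.0)

# Hypothetical values: model chi-square 85 on 41 df, baseline 1500 on 55 df, N = 300.
chisq, df, chisq_base, df_base, n = 85.0, 41, 1500.0, 55, 300

fit = {
    "chisq/df": chisq / df,
    "RMSEA": rmsea(chisq, df, n),
    "CFI": cfi(chisq, df, chisq_base, df_base),
    "TLI": tli(chisq, df, chisq_base, df_base),
}
for name, value in fit.items():
    print(f"{name}: {value:.3f}")
```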
# Show top modification indices
mi <- modindices(fit, sort = TRUE)
head(mi, 10)
# Common modifications:
# - Allow error covariances between similarly-worded items
# - Add cross-loadings (if theoretically justified)
# - Remove non-significant paths
# Compare nested models using chi-square difference test
fit1 <- sem(model1, data = mydata) # More constrained
fit2 <- sem(model2, data = mydata) # Less constrained
anova(fit1, fit2) # Chi-square difference test
# For non-nested models, compare AIC/BIC
fitMeasures(fit1, c("aic", "bic"))
fitMeasures(fit2, c("aic", "bic"))
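The chi-square difference test behind `anova(fit1, fit2)` can be sketched by hand for nested models fitted with plain ML. The chi-square values below are hypothetical, and the critical values are the standard .05 cutoffs of the chi-square distribution for 1 to 5 degrees of freedom. This naive difference is not valid for robust test statistics such as those from MLR, which require a scaled difference test.

```python
# Critical values of the chi-square distribution at alpha = .05.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

def chisq_difference(chisq_constrained, df_constrained, chisq_free, df_free):
    """Compare a more constrained model against a less constrained nested model."""
    delta_chisq = chisq_constrained - chisq_free
    delta_df = df_constrained - df_free
    # A significant difference means the constraints worsen fit.
    significant = delta_chisq > CRITICAL_05[delta_df]
    return delta_chisq, delta_df, significant

# Hypothetical fits: constrained model 98.4 on 43 df, freer model 85.0 on 41 df.
delta, delta_df, constraints_rejected = chisq_difference(98.4, 43, 85.0, 41)
print(f"delta chi-square = {delta:.1f} on {delta_df} df; "
      f"constraints rejected at .05: {constraints_rejected}")
```

When the difference is not significant, the more constrained (more parsimonious) model is usually retained.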
| Issue | Problem | Solution |
|-------|---------|----------|
| Small sample size | Unstable estimates, poor fit | Minimum N = 200, or 10-20 per parameter |
| Too many parameters | Overfitting, non-convergence | Simplify model, use parceling |
| Non-normal data | Biased standard errors | Use MLR estimator or bootstrapping |
| Ignoring missing data | Biased results | Use FIML (full information maximum likelihood) |
| Data-driven respecification | Capitalizing on chance | Cross-validate with holdout sample |
| Conflating fit with truth | Good fit does not mean correct model | Consider equivalent/alternative models |
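The sample-size heuristics in the table are easy to check before fitting. The sketch below reports the N = 200 floor and the cases-per-parameter ratio side by side. The helper name `sample_size_report` and the example of N = 300 with 25 free parameters are assumptions for illustration. These are rules of thumb, not a substitute for a power analysis.

```python
def sample_size_report(n, n_free_parameters):
    """Check a sample against the N >= 200 and 10-20 cases-per-parameter heuristics."""
    ratio = n / n_free_parameters
    return {
        "n": n,
        "cases_per_parameter": ratio,
        "meets_n_200": n >= 200,
        "meets_10_per_parameter": ratio >= 10,
        "meets_20_per_parameter": ratio >= 20,
    }

# Hypothetical study: 300 respondents, model with 25 free parameters.
report = sample_size_report(300, 25)
for key, value in report.items():
    print(f"{key}: {value}")
```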
# Check multivariate normality
library(MVN)
mvn(mydata[, c("mot1", "mot2", "mot3", "se1", "se2", "se3")],
mvnTest = "mardia")
# Use robust estimation if non-normal
fit_robust <- sem(sem_model, data = mydata, estimator = "MLR")