Version Compatibility

Reference examples tested with: GenomicSEM 0.0.5+ (GitHub GenomicSEM/GenomicSEM), lavaan 0.6-17+, LDSC v1.0.1+ (Python 3; prefer abdenlab/ldsc-python3 v2.0.0 -- belowlab/ldsc v3.0.1 README states the CLI is broken; Docker jtb114/ldsc:latest is the belowlab fallback), baselineLD_v2.2 annotations (alkesgroup.broadinstitute.org/LDSCORE), MTAG 1.0.8+ (Python; JonJala/mtag), R 4.4+.

Before using code patterns, verify installed versions match. If versions differ:

R: packageVersion('GenomicSEM') then ?ldsc, ?commonfactor, ?usermodel, ?commonfactorGWAS
Python (LDSC, MTAG): <tool>.py -h and inspect the source under ldsc/ or mtag/

GenomicSEM is GitHub-only (never on CRAN). If ldsc() or usermodel() throws an error about lavaan syntax or non-positive-definite covariance, introspect the installed API (getMethod('ldsc')) and adapt rather than retrying.

Genomic SEM

"Model the latent genetic architecture across several correlated GWAS" -> Treat each GWAS as a measured indicator of one or more latent genetic factors and fit a structural equation model to the LDSC-derived genetic covariance matrix S and its sampling covariance V (Grotzinger 2019 Nat Hum Behav 3:513). The framework extends naturally to a multivariate GWAS in which a SNP is regressed on a latent factor (common-factor GWAS), with Q_SNP testing whether the SNP effect is homogeneous across factor loadings. Sample overlap between input GWAS is absorbed by the off-diagonals of V; ignoring V inflates Type-I.

R: GenomicSEM::ldsc() produces the (S, V) covariance pair from munged sumstats
R: GenomicSEM::commonfactor() fits a single-factor CFA across all traits in S
R: GenomicSEM::usermodel() fits an arbitrary lavaan-syntax model
R: GenomicSEM::commonfactorGWAS() runs SNP -> factor multivariate GWAS with Q_SNP
R: GenomicSEM::userGWAS() runs arbitrary multivariate SNP regression with per-path Q_SNP
Python (alternative): mtag.py --sumstats t1,t2,t3 --out mtag_out (multi-trait power boost on individual traits)

Statistical Model Taxonomy

| Method | Latent structure | Min traits | SNP-level test | Strength | Fails when | |--------|------------------|-----------|----------------|----------|------------| | Common-factor CFA (Grotzinger 2019) | Single F loading all traits | 3 | None (model-fit only) | Tests whether shared variance is unidimensional | Heterogeneous architecture; CFI < 0.9; near-zero loadings | | User-specified CFA (usermodel) | Pre-specified lavaan syntax | 3 | None | Confirmatory; arbitrary structure | Misspecified model; identification under-determined | | ESEM | Exploratory rotation; cross-loadings allowed | 6+ | None | When factor count and structure unknown | Few traits; collinear traits; rotation arbitrary | | Common-factor GWAS (commonfactorGWAS) | SNP -> F -> trait1..k | 3 | Wald on F + Q_SNP heterogeneity | Discovers SNPs acting via the common factor; flags Q_SNP outliers | Q_SNP-significant SNPs not interpretable as factor SNPs | | User GWAS (userGWAS) | Arbitrary SNP-path lavaan | 3 | Wald per path + Q_SNP | Tests SNP on any specified path | Highly parameterized models lose power | | Multivariate Wald test | Joint test across SNP -> trait paths | 2+ | Joint chi-square | Boost power when SNP affects multiple traits | Heterogeneous SNP effects collapse joint test | | Stratified GenomicSEM (Grotzinger AD et al 2022 Nat Genet 54:548) | Factor model with sLDSC-partitioned annotations | 3 | Per-annotation factor tau | Localizes heritability of the factor to functional categories | Same sLDSC failure modes (small annotation, collinearity) | | MTAG (Turley 2018 Nat Genet 50:229) | Empirical-Bayes shrinkage across correlated traits | 2 | Per-trait shrunk z-score | Boosts marginal power for any input trait | MaxFDR > 5% indicates heterogeneity violates MTAG assumption |

Methodology evolves; verify the current Grotzinger 2023+ tutorials at github.com/GenomicSEM/GenomicSEM/wiki before locking a method. ESEM rotation choice (geomin vs target rotation) is an active area; report sensitivity to rotation.

MTAG vs GenomicSEM Common-Factor GWAS

Both methods exploit genetic correlation among input GWAS, but their goals and outputs differ.

| Property | MTAG | GenomicSEM commonfactorGWAS | |----------|------|------------------------------| | Output | Per-trait shrunk z-scores | SNP effect on latent factor | | Sample-overlap handling | Bivariate LDSC intercept | Full LDSC sampling-covariance matrix V | | Heterogeneity diagnostic | MaxFDR (Turley 2018) | Q_SNP (Grotzinger 2019) | | Interpretation | "Boosted power for trait k" | "Effect on what the traits share" | | Min traits | 2 | 3 (otherwise factor not identified) | | Best when | Power-boost an individual trait | Common factor hypothesized |

Both depend on accurate sampling covariance. MTAG fails (MaxFDR > 5%) under the same heterogeneity that produces large Q_SNP in GenomicSEM. The two methods should be reported together when the prior on a common factor is non-trivial; agreement increases confidence, disagreement points to architecture-specific SNPs.

Decision Tree by Scenario

| Scenario | Recommended | Why | |----------|-------------|-----| | Multi-trait GWAS power boost for one focal trait | MTAG | Optimized for per-trait marginal power | | Common-factor architecture hypothesized | GenomicSEM commonfactorGWAS | Tests SNP -> factor; reports Q_SNP heterogeneity | | Heterogeneous architecture (>1 latent factor) | ESEM, then confirmatory usermodel | Exploratory first, then confirm | | Confirming a pre-specified factor structure | usermodel with lavaan syntax | Confirmatory factor analysis | | Partition heritability of factor across annotations | Stratified GenomicSEM | Combines sLDSC + factor model | | Mediation in a SEM framework | usermodel with indirect path | Path coefficients + delta-method SE | | Sample overlap unknown or any-overlap suspected | Always use ldsc() output as input | V matrix off-diagonals absorb overlap | | Cross-ancestry common-factor analysis | Run per-ancestry, compare loadings; no published cross-ancestry SEM as of 2026 | Method not yet validated for mixed-ancestry V | | Single biobank for all traits (e.g., UKB only) | GenomicSEM with ldsc(); the V matrix will reflect overlap | Equivalent to one-sample MR -- the V matrix is the correction | | Comparing GenomicSEM and MTAG on the same traits | Run both; compare top hits + heterogeneity | Concordance increases confidence; divergence flags heterogeneity |

Per-Method Failure Modes

Heywood case (negative residual variance)

Trigger: A residual variance estimate is < 0, or a standardized loading exceeds 1.

Mechanism: Empirical underidentification; the genetic covariance matrix S is near-singular OR a trait has near-zero specific variance under the model. The maximum-likelihood / DWLS estimator runs past the boundary of the parameter space.

Symptom: lavaan warning "some estimated lv variances are negative" or "covariance matrix is not positive definite"; standardized loading > 1; non-convergence.

Fix: First, inspect the LDSC S matrix for genetic correlations near 1 (multicollinearity). Drop or merge near-identical traits. Second, constrain the offending residual variance to be non-negative in the lavaan syntax (trait1 ~~ a*trait1; a > 0). Third, verify the V matrix is positive definite via chol(V_LD); if not, the bivariate LDSC inputs disagree on intercept signs and need re-munging. Never re-fit without diagnosing the cause.

Sample overlap mis-specified

Trigger: Using LDSC intercept manually or supplying covariance from non-ldsc() source.

Mechanism: GenomicSEM's ldsc() function returns a list with S (genetic covariance) AND V (sampling covariance of the lower-triangle of S). The V off-diagonals capture sample overlap via cross-trait LDSC intercept. Skipping V and supplying only S treats all inputs as independent samples; Type-I error inflates because the sampling distribution under H0 is wrong.

Symptom: SE on factor loadings far too small; many SNPs significant in common-factor GWAS that don't replicate; comparison to MTAG shows disagreement consistent with overlap.

Fix: Always pass the full output of ldsc() -- both S and V -- to commonfactor(), usermodel(), and commonfactorGWAS(). Never construct S manually from rg estimates.

Q_SNP not reported in commonfactorGWAS

Trigger: Running commonfactorGWAS() and reporting only the factor p-value per SNP.

Mechanism: Q_SNP tests heterogeneity of the SNP's effect across factor loadings (Grotzinger 2019 supplement). A SNP with significant Q_SNP violates the common-factor assumption: its effect is NOT mediated by the factor, and the factor estimate is meaningless for that SNP.

Symptom: Top "common-factor SNPs" are dominated by trait-specific effects; replication in independent cohorts is poor for SNPs with high Q_SNP.

Fix: Always report Q_SNP p-value alongside the factor p-value. Flag SNPs with Q_SNP p < 5e-8 / N_factor_SNPs (Bonferroni for the discovered set) as architecture-violating and exclude from "common-factor SNP" claims. Re-fit those SNPs in userGWAS() with separate paths to each trait.

Trait inclusion under heterogeneous factor structure

Trigger: Forcing a common-factor model on traits that don't share a single latent factor.

Mechanism: When two or more traits load on a different factor than the rest, the single-factor model misfits. lavaan still returns parameter estimates but model fit is poor.

Symptom: CFI < 0.9; RMSEA > 0.08; some standardized loadings near 0 while others near 1; chi-square highly significant even after accounting for N.

Fix: Run ESEM first (commonfactor then usermodel with cross-loadings allowed) to discover structure. If two factors emerge, fit a two-factor usermodel. Drop traits with near-zero loadings on all factors. Document the model search.

MTAG MaxFDR > 5%

Trigger: Running MTAG on traits with low pairwise genetic correlation or with one trait that has a very different architecture.

Mechanism: MTAG assumes a homogeneous variance-covariance structure across SNPs. When heterogeneity dominates, the empirical-Bayes shrinkage can over-claim SNPs in the focal trait. Turley 2018 defines MaxFDR as the maximum estimated false discovery rate under worst-case heterogeneity; > 5% invalidates the published trait-specific summary statistics.

Symptom: MTAG output file reports maxFDR > 0.05; per-trait MTAG hits don't replicate in independent cohorts.

Fix: Check pairwise rg via LDSC; if any pair is < 0.7, MTAG is risky. Drop the most heterogeneous trait and re-run. Alternatively, switch to GenomicSEM's commonfactorGWAS which models heterogeneity explicitly via Q_SNP.

Non-positive-definite V_LD matrix

Trigger: LDSC inputs from different ancestry GWAS, or one trait with very low mean chi-square (< 1.02).

Mechanism: V is the sampling covariance of vech(S); when individual entries of S have huge SE relative to off-diagonal covariance, the resulting V is not positive definite (negative eigenvalues).

Symptom: commonfactor() errors with "matrix is not positive definite" before fitting; eigen(LDSCoutput$V)$values shows negative values.

Fix: Verify per-trait mean chi-square via ldsc() log; below 1.02, exclude that trait. Verify all GWAS are EUR ancestry (or match ancestry of LD scores). Apply nearest-PD smoothing via Matrix::nearPD(V)$mat ONLY as a last resort and document the approximation in methods.

Model Fit Diagnostics

| Index | Acceptable | Good | Source | |-------|-----------|------|--------| | CFI | >= 0.90 | >= 0.95 | Hu & Bentler 1999 Struct Equ Model 6:1 | | TLI / NNFI | >= 0.90 | >= 0.95 | Hu & Bentler 1999 | | RMSEA | <= 0.08 | <= 0.05 | Hu & Bentler 1999 | | SRMR | <= 0.08 | <= 0.05 | Hu & Bentler 1999 | | chi-square p-value | (less informative at large N) | n/a | Penalize for N inflation |

AIC / BIC are used for nested-model comparison (lower is better); only compare nested models fit on the same S.

Quantitative Thresholds

| Threshold | Source | Rationale | |-----------|--------|-----------| | CFI >= 0.95 | Hu & Bentler 1999 | Conventional good fit; SEM literature default | | RMSEA <= 0.05 | Hu & Bentler 1999 | Conventional good fit | | RMSEA 0.05 - 0.08 | Hu & Bentler 1999 | Acceptable; flag as "adequate" not "good" | | Q_SNP p < 0.05 / N_SNP_factor | Grotzinger 2019 Nat Hum Behav 3:513 | Bonferroni for heterogeneity among factor-significant SNPs | | MTAG MaxFDR < 5% | Turley 2018 Nat Genet 50:229 | Above this, MTAG marginal trait results invalid | | Per-trait LDSC mean chi-square > 1.02 | LDSC documentation | Below this, V entries too noisy; factor SE inflated | | Standardized loading 0.3 - 0.9 typical | SEM conventions | < 0.3 trait loads weakly; > 0.95 may indicate over-fit / collinearity | | Min 3 traits for common factor | SEM identification | Single factor with k traits has k(k+1)/2 moments; needs k>=3 to identify |

Standard Workflow

Goal: Fit a common-factor model across correlated GWAS and run a multivariate GWAS on the factor with Q_SNP.

Approach: Munge sumstats -> LDSC for (S, V) -> common-factor CFA -> inspect fit -> prepare SNPs -> common-factor GWAS -> report factor effects with Q_SNP flags.

library(GenomicSEM)

# Step 1: Munge sumstats (one-time; produces .sumstats.gz files)
files <- c('raw/trait1.txt', 'raw/trait2.txt', 'raw/trait3.txt')
hm3 <- 'w_hm3.snplist'  # HapMap3 SNP list
trait_names <- c('trait1', 'trait2', 'trait3')
N <- c(150000, 200000, 175000)
munge(files = files, hm3 = hm3, trait.names = trait_names, N = N)

# Step 2: LDSC produces both S (genetic covariance) and V (sampling covariance)
traits <- c('trait1.sumstats.gz', 'trait2.sumstats.gz', 'trait3.sumstats.gz')
ldsc_results <- ldsc(
    traits = traits,
    sample.prev = c(0.5, 0.5, NA),    # case prevalence; NA for continuous
    population.prev = c(0.05, 0.05, NA),
    ld = 'eur_w_ld_chr/',
    wld = 'eur_w_ld_chr/',
    trait.names = trait_names
)
# ldsc_results$S = genetic covariance; ldsc_results$V = sampling covariance

# Step 3a: Common-factor CFA via DWLS
cf_fit <- commonfactor(covstruc = ldsc_results, estimation = 'DWLS')
print(cf_fit$modelfit)  # CFI, RMSEA, SRMR, chi-square
print(cf_fit$results)   # loadings + SEs

# Step 3b: Alternative -- user-specified two-factor model (>=3 indicators per factor)
# Identification rule: each factor needs >= 3 indicators OR one anchor loading fixed
# to 1 plus factor variance free. A factor with a single indicator is NOT identified.
model_syntax <- '
    F1 =~ NA*trait1 + trait2 + trait3
    F2 =~ NA*trait4 + trait5 + trait6
    F1 ~~ 1*F1
    F2 ~~ 1*F2
    F1 ~~ F2
'
user_fit <- usermodel(covstruc = ldsc_results, model = model_syntax, estimation = 'DWLS')

estimation = 'DWLS' (diagonally weighted least squares) is the default and is required when V is large; 'ML' is faster but assumes a known V and can produce wrong SE under sample overlap.

ESEM (Exploratory Factor Structure)

When the factor structure is unknown, fit usermodel() with all loadings free across all factors, then apply a rotation post-fit. Rotation choices: geomin oblique (default; allows factor correlation), target rotation (Browne 2001 Multivariate Behav Res 36:111; uses a hypothesized loading template), quartimin (orthogonal; assumes factors are uncorrelated).

model_esem <- '
    F1 =~ NA*trait1 + trait2 + trait3 + trait4
    F2 =~ NA*trait1 + trait2 + trait3 + trait4
    F1 ~~ 1*F1
    F2 ~~ 1*F2
    F1 ~~ F2
'
esem_fit <- usermodel(covstruc = ldsc_results, model = model_esem, estimation = 'DWLS')
# Rotate post-fit via GPArotation::GPForth/GPFoblq or lavaan::rotate()

Decision: ESEM for K-factor exploration when structure is unknown; CFA via usermodel() once a structure is confirmed. Report rotation sensitivity (geomin vs target vs quartimin) and treat as exploratory.

Cross-loadings. Cross-loadings (one trait loads on > 1 factor) are common in psychiatric and behavioral GWAS. Brown 2015 Confirmatory Factor Analysis for Applied Research recommends allowing cross-loadings first and using modification indices to guide simplification. Allow a cross-loading when constraining residual variance otherwise forces a Heywood case. Constrain when CFI < 0.9 and modification indices instead suggest a correlated residual between two indicators (which is the more parsimonious fix).

userGWAS for Custom Path Models

userGWAS() fits arbitrary lavaan-syntax SNP regressions and is the right tool when the SNP needs to be tested on multiple paths simultaneously (e.g. factor-mediated effect AND a direct effect on one indicator).

# Test SNP -> F path + SNP -> trait1 direct path simultaneously
model <- '
    F =~ NA*trait1 + trait2 + trait3
    F ~~ 1*F
    F ~ SNP
    trait1 ~ SNP    # direct effect on trait1, partialed out of F
'
user_results <- userGWAS(covstruc = ldsc_results,
                         SNPs = ss,
                         estimation = 'DWLS',
                         model = model,
                         sub = c('F~SNP', 'trait1~SNP'),
                         parallel = TRUE,
                         cores = 8)
# Output columns include: lhs, op, rhs, est, SE, Z, Pvalue, Q_pval, Q_df.
# Q_pval per SNP measures heterogeneity across loadings AFTER conditioning on
# the direct path; remaining Q_SNP signal indicates a third path is needed.

Higher-Order / Bifactor / p-Factor Models

Use case: psychiatric genetics p-factor (Caspi 2014 Clin Psychol Sci 2:119; Grotzinger 2022 Nat Genet 54:548 cross-disorder), cognitive g-factor (de la Fuente 2021 Nat Hum Behav 5:49).

Hierarchical template -- first-order factors load on a single second-order p-factor:

model_pfactor <- '
    # First-order factors
    INT =~ NA*trait_anx + trait_dep + trait_neuro       # internalizing
    EXT =~ NA*trait_adhd + trait_alc + trait_subst       # externalizing
    THT =~ NA*trait_scz + trait_bp                       # thought-disorder
    # Second-order p-factor
    p =~ NA*INT + EXT + THT
    INT ~~ 1*INT
    EXT ~~ 1*EXT
    THT ~~ 1*THT
    p ~~ 1*p
'

Bifactor alternative: p =~ all traits directly, with INT/EXT/THT as orthogonal residual factors. Bifactor typically gives tighter CFI/RMSEA but the substantive interpretation of the residual factors is harder; bifactor is also prone to over-fitting at modest trait counts (Bonifay W & Cai L 2017 Multivariate Behav Res 52:465). Cite Grotzinger 2022 Nat Genet 54:548 and Karlsson Linner R, Mallard TT et al 2021 Nat Neurosci 24:1367 for the canonical psychiatric implementations.

Common-Factor GWAS with Q_SNP

Goal: Identify SNPs that affect the latent factor and flag SNPs whose effect is heterogeneous across loadings.

Approach: Build SNP-by-trait effect-and-SE matrix via sumstats(), then fit the SNP-augmented model genome-wide via commonfactorGWAS(). Report both factor p-value and Q_SNP p-value per SNP.

sumstats() flips effect signs based on the reference panel A1/A2 to enforce consistent allele coding across input GWAS. Required input columns (case-sensitive) are SNP, A1, A2, BETA/Z/OR, SE, P, N, MAF; some are conditional on se.logit / OLS. Silent failures are almost always column-name mismatches (e.g. EA/NEA instead of A1/A2 -> 0% SNPs retained) or a missing N column -> SNPs dropped. Always run head(read.table(file, header=TRUE, nrow=2)) per input before the sumstats() call.

# Prepare per-SNP betas and SEs across all input GWAS
ss <- sumstats(
    files = c('raw/trait1.txt', 'raw/trait2.txt', 'raw/trait3.txt'),
    ref = 'reference.1000G.maf.0.005.txt',
    trait.names = trait_names,
    se.logit = c(TRUE, TRUE, FALSE),     # TRUE if trait is logistic-scale (case-control); FALSE if continuous
    OLS = c(FALSE, FALSE, TRUE),
    linprob = c(FALSE, FALSE, FALSE),
    N = N,
    info.filter = 0.9,                   # standard imputation INFO threshold
    maf.filter = 0.01
)

# GenomicSEM internally detects the OS via Sys.info()[['sysname']] and chooses
# PSOCK (Windows) vs FORK (Linux/Mac) clusters automatically; there is no user
# `Operating=` argument. MPI=TRUE switches to an mpirun-based strategy for
# cluster job submission. parallel=TRUE is the default.
cfgwas <- commonfactorGWAS(
    covstruc = ldsc_results,
    SNPs = ss,
    estimation = 'DWLS',
    parallel = TRUE,
    cores = 8,
    MPI = FALSE
)

# cfgwas columns include rsID/chr/BP/MAF/A1/A2/est/se_c/Z_Estimate/Pval_Estimate (factor effect)
# plus heterogeneity columns Q / Q_df / Q_pval. Q_pval IS the per-SNP heterogeneity
# test (often referred to as Q_SNP in the literature; the column name in the data.frame is Q_pval).
cfgwas$factor_sig <- cfgwas$Pval_Estimate < 5e-08
cfgwas$qsnp_sig <- cfgwas$Q_pval < (0.05 / sum(cfgwas$factor_sig))
cfgwas$factor_only <- cfgwas$factor_sig & !cfgwas$qsnp_sig

The "factor-only" subset (factor-significant AND Q_SNP non-significant) is the publication-grade set of common-factor SNPs.

MTAG Comparison

Goal: Cross-check GenomicSEM common-factor results against MTAG per-trait shrunk z-scores.

Approach: Run MTAG CLI on the same input sumstats; compare top hits with GenomicSEM factor hits. Report MaxFDR.

# MTAG CLI (Python)
python mtag.py \
    --sumstats trait1.txt,trait2.txt,trait3.txt \
    --n_min 0 \
    --out mtag_results
# MTAG uses the signed Z by default; --use_beta_se was disabled upstream (raises a
# RuntimeError since Dec 2021 due to beta-se bugs), so supply a Z column and omit it.

# Check MaxFDR per trait
grep -iE 'max ?fdr' mtag_results.log   # matches both the section header and the 'Max FDR of Trait' value lines
# Each per-trait MTAG file: mtag_results_trait_<k>.txt

If MaxFDR > 0.05 for any trait, MTAG results for that trait are unreliable; GenomicSEM with Q_SNP filtering is the more defensible report.

Stratified GenomicSEM (Partitioned Heritability of Factor)

For partitioning the heritability of the latent factor across functional annotations, use s_ldsc() (stratified LDSC inside GenomicSEM) and pass the multi-annotation output to a stratified model fit.

# Stratified LDSC across baseline + custom annotations
s_results <- s_ldsc(
    traits = traits,
    sample.prev = c(0.5, 0.5, NA),
    population.prev = c(0.05, 0.05, NA),
    ld = 'baselineLD_v2.2.',
    wld = 'weights.hm3_noMHC.',
    frq = '1000G.EUR.QC.',
    trait.names = trait_names
)

# enrich() inventory:
#   params: lavaan syntax of the parameter under enrichment (loading, residual var, or F~~F latent var)
#   fix='regressions': hold regression paths fixed at the genome-wide estimate during stratified fit
#   std.lv=FALSE: do not standardize the latent variance
#   rm_flank=TRUE: drop flanking-window contributions (default)
#   tau=FALSE: use the baseline-annotation S/V matrices (TRUE switches to the V_Tau/S_Tau tau parametrization)
#   base=TRUE: include baseline annotation contributions in the partition
#   toler=NULL: matrix-inversion tolerance (let GenomicSEM choose; supply a small value when S is near-singular)
strat_factor <- enrich(s_covstruc = s_results,
                       model = '',
                       params = 'F =~ trait1',
                       fix = 'regressions',
                       std.lv = FALSE,
                       rm_flank = TRUE,
                       tau = FALSE,
                       base = TRUE,
                       toler = NULL)

The output gives per-annotation enrichment of the factor h2 -- the analog of cell-type S-LDSC for the latent factor (Grotzinger AD et al 2022 Nat Genet 54:548).

Computational Footprint

| Step | Runtime | Hardware | |------|---------|----------| | ldsc() multi-trait sampling covariance | minutes | laptop | | commonfactor() / usermodel() (no SNP loop) | seconds | laptop | | commonfactorGWAS() over 6-8M SNPs | 4-24h depending on cores | cluster recommended | | userGWAS() over 6-8M SNPs with complex path model | 8-48h | cluster recommended | | Stratified GenomicSEM with 50+ annotations | 1-3 days | cluster | | MTAG over 6-8M SNPs | 1-2h | laptop or cluster |

Cluster runs of commonfactorGWAS() / userGWAS() should use MPI=TRUE when submitting via mpirun; GenomicSEM detects the OS internally (via Sys.info()[['sysname']]) and selects FORK (Linux/Mac) vs PSOCK (Windows) cluster types automatically -- there is no Operating= user argument. On a Mac/Windows workstation, reduce cores to the physical-core count to avoid PSOCK fork failures.

Reconciliation: When GenomicSEM and MTAG Disagree

| Pattern | Likely cause | Action | |---------|--------------|--------| | GenomicSEM factor SNP sig, MTAG sig for all traits | Genuine common-factor SNP | Report; high confidence | | GenomicSEM factor SNP sig, MTAG sig in only 1 trait | Q_SNP heterogeneity likely; one-trait-dominant | Check Q_SNP; if sig, this is NOT a factor SNP | | MTAG sig, GenomicSEM factor null, Q_SNP sig | Trait-specific SNP captured by MTAG shrinkage | Report as trait-specific, not common-factor | | Both null but per-trait univariate sig | Power loss from multivariate parameterization | Re-check sample overlap V matrix | | GenomicSEM and MTAG both sig but opposite direction | Sample-overlap mis-specification OR sign error in munging | Re-munge with same allele convention; re-run ldsc() | | MTAG MaxFDR > 5%, GenomicSEM with Q_SNP works | MTAG assumption violated | Prefer GenomicSEM as primary | | One-trait GWAS sig but common-factor not | Trait-specific architecture | Don't force into common-factor frame |

Operational rule for publication: A common-factor SNP claim requires (1) factor p < 5e-8, (2) Q_SNP p > 0.05 / N_factor_SNPs (non-heterogeneous), and (3) replication in an independent set of traits or cohorts. Trait-specific SNPs from MTAG require MaxFDR < 5% for the trait. Reporting only the factor effect without Q_SNP is the most common reviewer-flagged error.

Common Errors

| Error / symptom | Cause | Solution | |-----------------|-------|----------| | commonfactor() complains "S not positive definite" | Genetic correlations near +/-1 among inputs | Drop redundant traits; verify rg < 0.95 pairwise | | Standardized loading > 1 | Heywood case; under-identification | Constrain residual variance >= 0; inspect S for collinearity | | Factor p-value reported, Q_SNP not reported | Default focus is on factor effect | Always report Q_SNP from commonfactorGWAS output | | ldsc() fails with "category not found" | Wrong LD score column names (legacy format) | Use Python 3 LDSC fork; download eur_w_ld_chr/ from alkesgroup | | lavaan says "model not identified" | Too few traits for too many parameters | Need >= 3 traits per factor; constrain factor variance to 1 | | MTAG MaxFDR not in log | Older MTAG version (< 1.0.7) | Update MTAG; MaxFDR reporting added late 2019 | | Singular V matrix on smooth nearPD | One trait has near-zero h2 or mean chi-square < 1.02 | Drop the trait; do not smooth as a fix | | usermodel() slow or fails | Complex syntax + many traits | Simplify model; estimate with estimation = 'DWLS', not 'ML', when V is informative | | Sumstats output has zero overlap with reference | Allele coding mismatch in sumstats() | Check se.logit and OLS settings per trait; align A1/A2 | | GenomicSEM and TwoSampleMR give different rg | TwoSampleMR uses bivariate LDSC; GenomicSEM uses the same S | Match the underlying LDSC reference panel and weights |

Anticipated Reviewer Pushback

| Pushback | Standard response | |----------|-------------------| | "Q_SNP reported?" | Q_pval column reported alongside SNP->factor effect; Bonferroni threshold 5e-8 / N_factor_SNPs applied | | "MaxFDR > 5%?" | MTAG maxFDR reported per trait; > 5% invalidates MTAG for that trait -> GenomicSEM common-factor used instead | | "Sample overlap absorbed?" | Full V matrix from ldsc() is the input to all model fits; S is never constructed manually from pairwise rg estimates | | "Model fit?" | CFI >= 0.95, RMSEA <= 0.08, SRMR <= 0.08 reported; AIC / BIC for nested comparison; chi-square reported but treated as inflated at large N | | "Why a factor model and not MTAG?" | Factor structure tested first; if traits load on a common factor with CFI > 0.95, common-factor GWAS preferred -- explicitly models heterogeneity via Q_SNP | | "Heywood case?" | Negative residual variance constrained >= 0; OR the offending indicator dropped and the model re-specified; the choice is documented in methods | | "Cross-ancestry?" | Run per-ancestry; no validated cross-ancestry V matrix as of 2026; loadings compared qualitatively | | "Why DWLS and not ML?" | ML assumes V is known; DWLS uses the empirical V from ldsc() and is the appropriate estimator under sample overlap |

Tool Installation

# GenomicSEM is GitHub-only
remotes::install_github('GenomicSEM/GenomicSEM')

# Dependencies
install.packages(c('lavaan', 'Matrix', 'gdata'))

# Optional companions
remotes::install_github('MRCIEU/TwoSampleMR')  # for downstream MR using factor GWAS as exposure

For Python tools:

# LDSC python3 fork (GenomicSEM input format). belowlab/ldsc v3.0.1 broke the
# --h2/--rg/--h2-cts CLI per its README; use abdenlab/ldsc-python3 (v2.0.0)
# for a working CLI. Docker jtb114/ldsc:latest is the belowlab fallback.
git clone https://github.com/abdenlab/ldsc-python3.git
cd ldsc-python3 && pip install .   # Poetry project (pyproject.toml); no requirements.txt

# MTAG
git clone https://github.com/JonJala/mtag.git
cd mtag && pip install -r requirements.txt

Pre-downloaded reference files: eur_w_ld_chr/, baselineLD_v2.2.*, w_hm3.snplist, and 1000G allele-frequency files are hosted at alkesgroup.broadinstitute.org/LDSCORE/.

References

Grotzinger AD et al 2019 Nat Hum Behav 3:513 (GenomicSEM, common-factor GWAS, Q_SNP)
Grotzinger AD et al 2022 Nat Genet 54:548 (Stratified GenomicSEM)
Turley P et al 2018 Nat Genet 50:229 (MTAG; MaxFDR)
Bulik-Sullivan B et al 2015 Nat Genet 47:291 (LDSC for genetic covariance)
Bulik-Sullivan B et al 2015 Nat Genet 47:1236 (bivariate LDSC, sample overlap)
Rosseel Y 2012 J Stat Softw 48:1-36 (lavaan package)
Hu LT & Bentler PM 1999 Struct Equ Model 6:1 (CFI / RMSEA cutoffs)
Asparouhov T & Muthen B 2009 Struct Equ Model 16:397 (ESEM framework)
Finucane HK et al 2015 Nat Genet 47:1228 (S-LDSC, foundation for stratified GenomicSEM)
Gazal S et al 2017 Nat Genet 49:1421 (baseline-LD annotations)
Demange PA et al 2021 Nat Genet 53:35 (GenomicSEM GWAS-by-subtraction for noncognitive skills; Q_SNP in practice)
Karlsson Linner R, Mallard TT et al 2021 Nat Neurosci 24:1367 (multivariate externalizing GWAS via GenomicSEM)
de la Fuente J et al 2021 Nat Hum Behav 5:49 (GenomicSEM for cognitive g factor)
Skrivankova VW et al 2021 JAMA 326:1614 (STROBE-MR; relevant when downstream MR uses factor GWAS)

Related Skills

causal-genomics/mendelian-randomization - Use factor-GWAS effect sizes as MR exposure
causal-genomics/genetic-correlation - Bivariate LDSC produces the off-diagonals of the S matrix; GenomicSEM is the multi-trait extension
causal-genomics/heritability-partitioning - LDSC and S-LDSC foundations for stratified GenomicSEM
causal-genomics/colocalization-analysis - Cross-trait colocalization at common-factor loci
causal-genomics/pleiotropy-detection - Q_SNP is a per-SNP pleiotropy diagnostic; sibling concept
causal-genomics/fine-mapping - Resolve factor-significant loci to credible sets
causal-genomics/mediation-analysis - SEM mediation paths overlap with usermodel indirect effects
causal-genomics/transcriptome-wide-association - TWAS on factor sumstats from commonfactorGWAS
population-genetics/association-testing - GWAS sumstats are the input format

Version Compatibility

Before using code patterns, verify installed versions match. If versions differ:

R: packageVersion('GenomicSEM') then ?ldsc, ?commonfactor, ?usermodel, ?commonfactorGWAS
Python (LDSC, MTAG): <tool>.py -h and inspect the source under ldsc/ or mtag/

Genomic SEM

R: GenomicSEM::ldsc() produces the (S, V) covariance pair from munged sumstats
R: GenomicSEM::commonfactor() fits a single-factor CFA across all traits in S
R: GenomicSEM::usermodel() fits an arbitrary lavaan-syntax model
R: GenomicSEM::commonfactorGWAS() runs SNP -> factor multivariate GWAS with Q_SNP
R: GenomicSEM::userGWAS() runs arbitrary multivariate SNP regression with per-path Q_SNP
Python (alternative): mtag.py --sumstats t1,t2,t3 --out mtag_out (multi-trait power boost on individual traits)

Statistical Model Taxonomy

MTAG vs GenomicSEM Common-Factor GWAS

Both methods exploit genetic correlation among input GWAS, but their goals and outputs differ.

Decision Tree by Scenario

Per-Method Failure Modes

Heywood case (negative residual variance)

Trigger: A residual variance estimate is < 0, or a standardized loading exceeds 1.

Symptom: lavaan warning "some estimated lv variances are negative" or "covariance matrix is not positive definite"; standardized loading > 1; non-convergence.

Sample overlap mis-specified

Trigger: Using LDSC intercept manually or supplying covariance from non-ldsc() source.

Symptom: SE on factor loadings far too small; many SNPs significant in common-factor GWAS that don't replicate; comparison to MTAG shows disagreement consistent with overlap.

Fix: Always pass the full output of ldsc() -- both S and V -- to commonfactor(), usermodel(), and commonfactorGWAS(). Never construct S manually from rg estimates.

Q_SNP not reported in commonfactorGWAS

Trigger: Running commonfactorGWAS() and reporting only the factor p-value per SNP.

Symptom: Top "common-factor SNPs" are dominated by trait-specific effects; replication in independent cohorts is poor for SNPs with high Q_SNP.

Trait inclusion under heterogeneous factor structure

Trigger: Forcing a common-factor model on traits that don't share a single latent factor.

Mechanism: When two or more traits load on a different factor than the rest, the single-factor model misfits. lavaan still returns parameter estimates but model fit is poor.

Symptom: CFI < 0.9; RMSEA > 0.08; some standardized loadings near 0 while others near 1; chi-square highly significant even after accounting for N.

MTAG MaxFDR > 5%

Trigger: Running MTAG on traits with low pairwise genetic correlation or with one trait that has a very different architecture.

Symptom: MTAG output file reports maxFDR > 0.05; per-trait MTAG hits don't replicate in independent cohorts.

Non-positive-definite V_LD matrix

Trigger: LDSC inputs from different ancestry GWAS, or one trait with very low mean chi-square (< 1.02).

Mechanism: V is the sampling covariance of vech(S); when individual entries of S have huge SE relative to off-diagonal covariance, the resulting V is not positive definite (negative eigenvalues).

Symptom: commonfactor() errors with "matrix is not positive definite" before fitting; eigen(LDSCoutput$V)$values shows negative values.

Model Fit Diagnostics

AIC / BIC are used for nested-model comparison (lower is better); only compare nested models fit on the same S.

Quantitative Thresholds

Standard Workflow

Goal: Fit a common-factor model across correlated GWAS and run a multivariate GWAS on the factor with Q_SNP.

Approach: Munge sumstats -> LDSC for (S, V) -> common-factor CFA -> inspect fit -> prepare SNPs -> common-factor GWAS -> report factor effects with Q_SNP flags.

library(GenomicSEM)

# Step 1: Munge sumstats (one-time; produces .sumstats.gz files)
files <- c('raw/trait1.txt', 'raw/trait2.txt', 'raw/trait3.txt')
hm3 <- 'w_hm3.snplist'  # HapMap3 SNP list
trait_names <- c('trait1', 'trait2', 'trait3')
N <- c(150000, 200000, 175000)
munge(files = files, hm3 = hm3, trait.names = trait_names, N = N)

# Step 2: LDSC produces both S (genetic covariance) and V (sampling covariance)
traits <- c('trait1.sumstats.gz', 'trait2.sumstats.gz', 'trait3.sumstats.gz')
ldsc_results <- ldsc(
    traits = traits,
    sample.prev = c(0.5, 0.5, NA),    # case prevalence; NA for continuous
    population.prev = c(0.05, 0.05, NA),
    ld = 'eur_w_ld_chr/',
    wld = 'eur_w_ld_chr/',
    trait.names = trait_names
)
# ldsc_results$S = genetic covariance; ldsc_results$V = sampling covariance

# Step 3a: Common-factor CFA via DWLS
cf_fit <- commonfactor(covstruc = ldsc_results, estimation = 'DWLS')
print(cf_fit$modelfit)  # CFI, RMSEA, SRMR, chi-square
print(cf_fit$results)   # loadings + SEs

# Step 3b: Alternative -- user-specified two-factor model (>=3 indicators per factor)
# Identification rule: each factor needs >= 3 indicators OR one anchor loading fixed
# to 1 plus factor variance free. A factor with a single indicator is NOT identified.
model_syntax <- '
    F1 =~ NA*trait1 + trait2 + trait3
    F2 =~ NA*trait4 + trait5 + trait6
    F1 ~~ 1*F1
    F2 ~~ 1*F2
    F1 ~~ F2
'
user_fit <- usermodel(covstruc = ldsc_results, model = model_syntax, estimation = 'DWLS')

estimation = 'DWLS' (diagonally weighted least squares) is the default and is required when V is large; 'ML' is faster but assumes a known V and can produce wrong SE under sample overlap.

ESEM (Exploratory Factor Structure)

model_esem <- '
    F1 =~ NA*trait1 + trait2 + trait3 + trait4
    F2 =~ NA*trait1 + trait2 + trait3 + trait4
    F1 ~~ 1*F1
    F2 ~~ 1*F2
    F1 ~~ F2
'
esem_fit <- usermodel(covstruc = ldsc_results, model = model_esem, estimation = 'DWLS')
# Rotate post-fit via GPArotation::GPForth/GPFoblq or lavaan::rotate()

userGWAS for Custom Path Models

# Test SNP -> F path + SNP -> trait1 direct path simultaneously
model <- '
    F =~ NA*trait1 + trait2 + trait3
    F ~~ 1*F
    F ~ SNP
    trait1 ~ SNP    # direct effect on trait1, partialed out of F
'
user_results <- userGWAS(covstruc = ldsc_results,
                         SNPs = ss,
                         estimation = 'DWLS',
                         model = model,
                         sub = c('F~SNP', 'trait1~SNP'),
                         parallel = TRUE,
                         cores = 8)
# Output columns include: lhs, op, rhs, est, SE, Z, Pvalue, Q_pval, Q_df.
# Q_pval per SNP measures heterogeneity across loadings AFTER conditioning on
# the direct path; remaining Q_SNP signal indicates a third path is needed.

Higher-Order / Bifactor / p-Factor Models

Use case: psychiatric genetics p-factor (Caspi 2014 Clin Psychol Sci 2:119; Grotzinger 2022 Nat Genet 54:548 cross-disorder), cognitive g-factor (de la Fuente 2021 Nat Hum Behav 5:49).

Hierarchical template -- first-order factors load on a single second-order p-factor:

model_pfactor <- '
    # First-order factors
    INT =~ NA*trait_anx + trait_dep + trait_neuro       # internalizing
    EXT =~ NA*trait_adhd + trait_alc + trait_subst       # externalizing
    THT =~ NA*trait_scz + trait_bp                       # thought-disorder
    # Second-order p-factor
    p =~ NA*INT + EXT + THT
    INT ~~ 1*INT
    EXT ~~ 1*EXT
    THT ~~ 1*THT
    p ~~ 1*p
'

Common-Factor GWAS with Q_SNP

Goal: Identify SNPs that affect the latent factor and flag SNPs whose effect is heterogeneous across loadings.

Approach: Build SNP-by-trait effect-and-SE matrix via sumstats(), then fit the SNP-augmented model genome-wide via commonfactorGWAS(). Report both factor p-value and Q_SNP p-value per SNP.

# Prepare per-SNP betas and SEs across all input GWAS
ss <- sumstats(
    files = c('raw/trait1.txt', 'raw/trait2.txt', 'raw/trait3.txt'),
    ref = 'reference.1000G.maf.0.005.txt',
    trait.names = trait_names,
    se.logit = c(TRUE, TRUE, FALSE),     # TRUE if trait is logistic-scale (case-control); FALSE if continuous
    OLS = c(FALSE, FALSE, TRUE),
    linprob = c(FALSE, FALSE, FALSE),
    N = N,
    info.filter = 0.9,                   # standard imputation INFO threshold
    maf.filter = 0.01
)

# GenomicSEM internally detects the OS via Sys.info()[['sysname']] and chooses
# PSOCK (Windows) vs FORK (Linux/Mac) clusters automatically; there is no user
# `Operating=` argument. MPI=TRUE switches to an mpirun-based strategy for
# cluster job submission. parallel=TRUE is the default.
cfgwas <- commonfactorGWAS(
    covstruc = ldsc_results,
    SNPs = ss,
    estimation = 'DWLS',
    parallel = TRUE,
    cores = 8,
    MPI = FALSE
)

# cfgwas columns include rsID/chr/BP/MAF/A1/A2/est/se_c/Z_Estimate/Pval_Estimate (factor effect)
# plus heterogeneity columns Q / Q_df / Q_pval. Q_pval IS the per-SNP heterogeneity
# test (often referred to as Q_SNP in the literature; the column name in the data.frame is Q_pval).
cfgwas$factor_sig <- cfgwas$Pval_Estimate < 5e-08
cfgwas$qsnp_sig <- cfgwas$Q_pval < (0.05 / sum(cfgwas$factor_sig))
cfgwas$factor_only <- cfgwas$factor_sig & !cfgwas$qsnp_sig

The "factor-only" subset (factor-significant AND Q_SNP non-significant) is the publication-grade set of common-factor SNPs.

MTAG Comparison

Goal: Cross-check GenomicSEM common-factor results against MTAG per-trait shrunk z-scores.

Approach: Run MTAG CLI on the same input sumstats; compare top hits with GenomicSEM factor hits. Report MaxFDR.

# MTAG CLI (Python)
python mtag.py \
    --sumstats trait1.txt,trait2.txt,trait3.txt \
    --n_min 0 \
    --out mtag_results
# MTAG uses the signed Z by default; --use_beta_se was disabled upstream (raises a
# RuntimeError since Dec 2021 due to beta-se bugs), so supply a Z column and omit it.

# Check MaxFDR per trait
grep -iE 'max ?fdr' mtag_results.log   # matches both the section header and the 'Max FDR of Trait' value lines
# Each per-trait MTAG file: mtag_results_trait_<k>.txt

If MaxFDR > 0.05 for any trait, MTAG results for that trait are unreliable; GenomicSEM with Q_SNP filtering is the more defensible report.

Stratified GenomicSEM (Partitioned Heritability of Factor)

# Stratified LDSC across baseline + custom annotations
s_results <- s_ldsc(
    traits = traits,
    sample.prev = c(0.5, 0.5, NA),
    population.prev = c(0.05, 0.05, NA),
    ld = 'baselineLD_v2.2.',
    wld = 'weights.hm3_noMHC.',
    frq = '1000G.EUR.QC.',
    trait.names = trait_names
)

# enrich() inventory:
#   params: lavaan syntax of the parameter under enrichment (loading, residual var, or F~~F latent var)
#   fix='regressions': hold regression paths fixed at the genome-wide estimate during stratified fit
#   std.lv=FALSE: do not standardize the latent variance
#   rm_flank=TRUE: drop flanking-window contributions (default)
#   tau=FALSE: use the baseline-annotation S/V matrices (TRUE switches to the V_Tau/S_Tau tau parametrization)
#   base=TRUE: include baseline annotation contributions in the partition
#   toler=NULL: matrix-inversion tolerance (let GenomicSEM choose; supply a small value when S is near-singular)
strat_factor <- enrich(s_covstruc = s_results,
                       model = '',
                       params = 'F =~ trait1',
                       fix = 'regressions',
                       std.lv = FALSE,
                       rm_flank = TRUE,
                       tau = FALSE,
                       base = TRUE,
                       toler = NULL)

The output gives per-annotation enrichment of the factor h2 -- the analog of cell-type S-LDSC for the latent factor (Grotzinger AD et al 2022 Nat Genet 54:548).

Computational Footprint

Reconciliation: When GenomicSEM and MTAG Disagree

Common Errors

Anticipated Reviewer Pushback

Tool Installation

# GenomicSEM is GitHub-only
remotes::install_github('GenomicSEM/GenomicSEM')

# Dependencies
install.packages(c('lavaan', 'Matrix', 'gdata'))

# Optional companions
remotes::install_github('MRCIEU/TwoSampleMR')  # for downstream MR using factor GWAS as exposure

For Python tools:

# LDSC python3 fork (GenomicSEM input format). belowlab/ldsc v3.0.1 broke the
# --h2/--rg/--h2-cts CLI per its README; use abdenlab/ldsc-python3 (v2.0.0)
# for a working CLI. Docker jtb114/ldsc:latest is the belowlab fallback.
git clone https://github.com/abdenlab/ldsc-python3.git
cd ldsc-python3 && pip install .   # Poetry project (pyproject.toml); no requirements.txt

# MTAG
git clone https://github.com/JonJala/mtag.git
cd mtag && pip install -r requirements.txt

Pre-downloaded reference files: eur_w_ld_chr/, baselineLD_v2.2.*, w_hm3.snplist, and 1000G allele-frequency files are hosted at alkesgroup.broadinstitute.org/LDSCORE/.

References

Grotzinger AD et al 2019 Nat Hum Behav 3:513 (GenomicSEM, common-factor GWAS, Q_SNP)
Grotzinger AD et al 2022 Nat Genet 54:548 (Stratified GenomicSEM)
Turley P et al 2018 Nat Genet 50:229 (MTAG; MaxFDR)
Bulik-Sullivan B et al 2015 Nat Genet 47:291 (LDSC for genetic covariance)
Bulik-Sullivan B et al 2015 Nat Genet 47:1236 (bivariate LDSC, sample overlap)
Rosseel Y 2012 J Stat Softw 48:1-36 (lavaan package)
Hu LT & Bentler PM 1999 Struct Equ Model 6:1 (CFI / RMSEA cutoffs)
Asparouhov T & Muthen B 2009 Struct Equ Model 16:397 (ESEM framework)
Finucane HK et al 2015 Nat Genet 47:1228 (S-LDSC, foundation for stratified GenomicSEM)
Gazal S et al 2017 Nat Genet 49:1421 (baseline-LD annotations)
Demange PA et al 2021 Nat Genet 53:35 (GenomicSEM GWAS-by-subtraction for noncognitive skills; Q_SNP in practice)
Karlsson Linner R, Mallard TT et al 2021 Nat Neurosci 24:1367 (multivariate externalizing GWAS via GenomicSEM)
de la Fuente J et al 2021 Nat Hum Behav 5:49 (GenomicSEM for cognitive g factor)
Skrivankova VW et al 2021 JAMA 326:1614 (STROBE-MR; relevant when downstream MR uses factor GWAS)

Related Skills

causal-genomics/mendelian-randomization - Use factor-GWAS effect sizes as MR exposure
causal-genomics/genetic-correlation - Bivariate LDSC produces the off-diagonals of the S matrix; GenomicSEM is the multi-trait extension
causal-genomics/heritability-partitioning - LDSC and S-LDSC foundations for stratified GenomicSEM
causal-genomics/colocalization-analysis - Cross-trait colocalization at common-factor loci
causal-genomics/pleiotropy-detection - Q_SNP is a per-SNP pleiotropy diagnostic; sibling concept
causal-genomics/fine-mapping - Resolve factor-significant loci to credible sets
causal-genomics/mediation-analysis - SEM mediation paths overlap with usermodel indirect effects
causal-genomics/transcriptome-wide-association - TWAS on factor sumstats from commonfactorGWAS
population-genetics/association-testing - GWAS sumstats are the input format

Adoption

GPTomics/bio-causal-genomics-genomic-sem

$ install --global

Security Scan Results

SKILL.md

Version Compatibility

Genomic SEM

Statistical Model Taxonomy

MTAG vs GenomicSEM Common-Factor GWAS

Decision Tree by Scenario

Per-Method Failure Modes

Heywood case (negative residual variance)

Sample overlap mis-specified

Q_SNP not reported in commonfactorGWAS

Trait inclusion under heterogeneous factor structure

MTAG MaxFDR > 5%

Non-positive-definite V_LD matrix

Model Fit Diagnostics

Quantitative Thresholds

Standard Workflow

ESEM (Exploratory Factor Structure)

userGWAS for Custom Path Models

Higher-Order / Bifactor / p-Factor Models

Common-Factor GWAS with Q_SNP

MTAG Comparison

Stratified GenomicSEM (Partitioned Heritability of Factor)

Computational Footprint

Reconciliation: When GenomicSEM and MTAG Disagree

Common Errors

Anticipated Reviewer Pushback

Tool Installation

References

Related Skills

Related Skills

GPTomics/bio-workflows-clip-pipeline

GPTomics/bio-comparative-genomics-whole-genome-duplication

GPTomics/bio-comparative-genomics-whole-genome-alignment

GPTomics/bio-comparative-genomics-synteny-analysis

GPTomics/bio-causal-genomics-genomic-sem

$ install --global

Security Scan Results

SKILL.md

Version Compatibility

Genomic SEM

Statistical Model Taxonomy

MTAG vs GenomicSEM Common-Factor GWAS

Decision Tree by Scenario

Per-Method Failure Modes

Heywood case (negative residual variance)

Sample overlap mis-specified

Q_SNP not reported in commonfactorGWAS

Trait inclusion under heterogeneous factor structure

MTAG MaxFDR > 5%

Non-positive-definite V_LD matrix

Model Fit Diagnostics

Quantitative Thresholds

Standard Workflow

ESEM (Exploratory Factor Structure)

userGWAS for Custom Path Models

Higher-Order / Bifactor / p-Factor Models

Common-Factor GWAS with Q_SNP

MTAG Comparison

Stratified GenomicSEM (Partitioned Heritability of Factor)

Computational Footprint

Reconciliation: When GenomicSEM and MTAG Disagree

Common Errors

Anticipated Reviewer Pushback

Tool Installation

References

Related Skills

Related Skills

GPTomics/bio-workflows-clip-pipeline

GPTomics/bio-comparative-genomics-whole-genome-duplication

GPTomics/bio-comparative-genomics-whole-genome-alignment

GPTomics/bio-comparative-genomics-synteny-analysis