causal-genomics/mediation-analysis/SKILL.md
Decompose total effects into direct and indirect paths through mediators using mediation, CMAverse 4-way, HIMA/HIMA2 high-dimensional, BAMA, two-step / MVMR mediation, or double-ML medDML. Use when testing whether a molecular phenotype (expression, methylation, protein) mediates a treatment-outcome relationship, decomposing exposure-mediator interaction via VanderWeele 4-way, screening high-dimensional EWAS mediators, or running MR-based mediation when sequential ignorability is implausible.
npx skillsauth add GPTomics/bioSkills bio-causal-genomics-mediation-analysisInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Reference examples tested with: R 4.3+, mediation 4.5.0+, CMAverse 0.1.0+ (GitHub BS1125/CMAverse), HIMA >= 2.3.0 (CRAN), bama 1.3+, causalweight 1.0.5+ (medDML), MVMR 0.4+, TwoSampleMR 0.6+, EValue 4.1+, gesttools 1.3+, ipw 1.0.11+.
Before using code patterns, verify installed versions match. If versions differ:
packageVersion('<pkg>') then ?function_name to verify parameters>= 2.3.0 for the code patterns below; the formula interface hima(formula, data.pheno, data.M, mediator.type, penalty, ...) was introduced in 2.3.0. Users on HIMA 2.2.x must instead call hima_classic() with positional arguments (X=, Y=, M=, COV.XM=, Y.family=) and lose the formula interface; HIMA 2.2.x is NOT API-compatible with the examples here.hima_classic() is still available in 2.3+ for the original Zhang 2016 SIS+MCP pipeline; use it only to reproduce 2016-2021 papers.If code throws an error, introspect the installed package (?hima, args(cmest)) and adapt the example to match the actual API rather than retrying.
"Does expression of GENE_X mediate the SNP-to-disease effect?" -> Decompose the total effect of a treatment (genotype, exposure) on an outcome into direct and indirect paths through one or more mediators, with explicit handling of exposure-mediator interaction, sensitivity to unmeasured confounding, and high-dimensional mediator screening.
mediation::mediate(med_model, out_model, treat='X', mediator='M', boot=TRUE, sims=5000)CMAverse::cmest(...EMint=TRUE, estimation='paramfunc', inference='bootstrap', nboot=1000)HIMA::hima(Y ~ X + covariates, data.pheno, data.M, mediator.type='gaussian', penalty='DBlasso') (modern v2.3+ formula interface)TwoSampleMR with independent instruments OR MVMR::ivw_mvmr for joint direct effectcausalweight::medDML(y, d, m, x)Sequential ignorability (no unmeasured confounder of treatment-mediator, mediator-outcome, treatment-outcome) is the single load-bearing assumption of observational mediation and is fundamentally untestable. Every report should include a sensitivity result (Imai's rho via medsens() or a mediational E-value).
| Method | Framework | Handles E-M interaction | High-D mediators | Min n | Fails when |
|--------|-----------|--------------------------|------------------|-------|------------|
| Baron-Kenny (1986) | Additive regression-based product/difference | No | No | ~100 | Any non-linearity, interaction, or binary outcome; deprecated for causal inference |
| Imai mediation R (Imai 2010 Psychol Methods) | Counterfactual ACME/ADE with bootstrap | Yes (via interaction term in outcome model) | No | ~200 | Sequential ignorability violated; exposure-induced M-Y confounder; rare binary outcome with logistic outcome model |
| VanderWeele 4-way (VanderWeele 2014 Epidemiology 25:749; 2015 OUP) | CDE + PIE + INTref + INTmed decomposition | Native | No | ~300 | Without interaction term reduces to standard mediation; binary outcome needs rare-disease assumption |
| CMAverse (Shi 2021 Epidemiology 32:e20) | 6 estimators: regression (rb), weighting (wb), IORW (iorw), natural effect models (ne), MSM (msm), g-formula (gformula) | Yes | No (single M, or M-vector) | ~300 | Estimator-specific; wb fails with rare exposure; msm needs censoring weights for survival |
| HIMA1 / hima_classic (Zhang 2016 Bioinformatics 32:3150) | SIS screen by beta (M->Y) + MCP penalty | No | Yes (up to ~10k) | ~150 + p>>n | Misses mediators with strong alpha and weak beta; screening-step false-negatives |
| HIMA2 / hima (Perera 2022 BMC Bioinformatics 23:296) | SIS screen by alpha*beta (indirect effect) + MCP | No (linear by default) | Yes | ~150 | Outcome family limited (gaussian/binomial); HIMA-Cox for survival; HIMA-Pois for count |
| HILAMA (Zhang 2025) | High-D mediation with latent confounders | No | Yes (>= 100k) | ~500 | Newer; benchmarks evolving; requires latent-factor specification |
| BAMA (Song 2020 Biostatistics) | Bayesian high-D continuous shrinkage | No | Yes (~5k) | ~200 | Slow MCMC; prior sensitivity for very weak mediators |
| Two-step MR / network MR (Burgess 2017 Eur J Epidemiol 32:377) | IV-based at each step with INDEPENDENT instruments | Implicit (no interaction modeling) | One mediator at a time | Large summary-stat samples | Same SNP used for E and M (violates exclusion); horizontal pleiotropy; Steiger reversal of M->E direction |
| MVMR-mediation (Carter & Sanderson 2021 Eur J Epidemiol 36:465-478) | Total minus direct via MVMR | Implicit | Single mediator | Large GWAS samples for both E and M | Conditional F < 10 for either exposure; correlated instruments |
| medDML (Farbmacher 2022 Econometrics J 25:277) | Double-debiased ML, doubly-robust | Limited (depends on learner) | Moderate (sparsity-friendly) | ~500 | Severe overlap violations; cross-fitting variance with small n |
Methodology evolves; verify against the current CMAverse vignette and the Steen / Vansteelandt natural-effects-model literature before locking analytic choices. Difference-in-coefficients and product-of-coefficients give identical estimates in fully linear-Gaussian models but DIVERGE for any non-linear outcome model (logistic, Cox, Poisson); the counterfactual ACME from mediation::mediate() is the correct quantity for non-linear outcomes.
VanderWeele 4-way explicitly separates effects from exposure-mediator interaction:
Total Effect = CDE + INTref + INTmed + PIE
| Component | Meaning | Active when | |-----------|---------|-------------| | CDE | Controlled direct effect (with mediator fixed at reference level) | E directly affects Y | | INTref | Interaction-reference -- needs interaction AND exposure | E*M interaction with mediator at reference | | INTmed | Mediated interaction -- needs interaction AND exposure AND mediation | E shifts M which then interacts with E | | PIE | Pure indirect effect (older "mediation" quantity) | E shifts M which shifts Y additively |
Without an exposure-mediator interaction term, INTref = INTmed = 0 and the decomposition collapses to CDE + PIE (= ADE + ACME). With interaction present, traditional ACME mixes PIE and INTmed; the 4-way separation is the only framework that disentangles them. Most epidemiology applications include the interaction term and report all four components (Valeri & VanderWeele 2013 Psychol Methods).
| Scenario | Recommended pipeline |
|----------|---------------------|
| Observational, single measured mediator, no plausible E-M interaction, continuous outcome | mediation::mediate() with boot=TRUE, sims=5000; always run medsens() |
| Observational, single mediator, suspected E-M interaction, any outcome family | CMAverse::cmest(..., EMint=TRUE) -> read CDE, PIE, INTref, INTmed |
| Observational, BINARY outcome, rare disease (< 10%) | cmest(yreg='logistic', EMint=TRUE, casecontrol=FALSE) -- OR-based 4-way decomposition is valid under rare-disease |
| Observational, survival outcome | cmest(yreg='coxph') OR HIMA::hima_cox for high-D; report HRs |
| High-D mediators (EWAS, transcriptome-wide), continuous outcome | HIMA::hima(formula, data.pheno, data.M, mediator.type='gaussian', penalty='DBlasso'); report sigcut (FDR threshold, default 0.05) |
| High-D mediators with latent confounding (very-high-D EWAS) | HILAMA (2025) |
| High-D mediators with Bayesian shrinkage (small n, ~5k features) | bama::bama() |
| Strong genetic IVs for exposure available, single mediator with own IVs | Two-step MR with independent instruments + Steiger filter on mediator |
| Both E and M have IVs but instruments are weak / correlated | MVMR-mediation with conditional F > 10 each |
| Observational with rich confounder set, want doubly-robust estimate | causalweight::medDML (double-debiased ML) |
| Longitudinal with time-varying confounding | g-formula via CMAverse::cmest(estimation='gformula') OR gfoRmula package |
| Exposure-induced confounder of M-Y exists | Interventional indirect effects (Vansteelandt & Daniel 2017); CMAverse::cmest(estimation='msm') |
Observational mediation requires three no-unmeasured-confounding assumptions. The third (M-Y unmeasured confounder, after conditioning on E) is the most common violator in genomic mediation because biological confounders (cell composition, batch effects, technical mediators) frequently affect both M and Y.
Trigger: Always, by design.
Mechanism: No statistical test can detect an unmeasured confounder of M-Y. Bootstrap CIs assume the assumption holds; they do NOT propagate uncertainty about it.
Symptom: Significant ACME with no sensitivity reported -> reviewer rejects.
Fix: Report at least one of:
medsens(med_result, rho.by=0.05, sims=1000); the critical rho where ACME crosses 0; |rho_crit| > 0.3 is "reasonably robust" (Imai 2010), |rho_crit| < 0.1 is highly sensitive.EValue::evalues.OLS() for linear outcomes or by-hand from ACME risk ratio bounds.Template sentence for the methods write-up: "We assumed sequential ignorability conditional on {age, sex, ancestry PCs, cell composition, batch, smoking}. Robustness was assessed via Imai rho_crit at the ACME contrast (medsens, sims = 1000) and the mediational E-value on the risk-ratio scale (Smith & VanderWeele 2019 Epidemiology 30:835)."
Quantitative interpretation thresholds:
For high-stakes claims (clinical, drug-target, regulatory submissions) report BOTH rho_crit and the mediational E-value; for exploratory work either alone suffices.
Trigger: A covariate L sits between E and Y, AND is affected by E, AND confounds M-Y.
Mechanism: Standard regression-based mediation cannot adjust for L without blocking part of the indirect effect (collider stratification bias). Adjusting biases CDE; not adjusting biases ACME.
Symptom: Sensitivity to confounder set; ACME flips sign when L is added vs removed.
Operational identification: From the DAG, L is a covariate of M and Y that is also affected by E. Vansteelandt 2009 (Stat Sci 24:471) gives the criterion: if L is adjusted, part of the indirect E -> L -> M -> Y pathway is blocked; if L is not adjusted, L confounds the M-Y leg. Both are wrong under natural-effects; the natural indirect effect is simply not identified.
Fix: Switch to interventional indirect effects (Vansteelandt & Daniel 2017 Epidemiology 28:258), NOT natural indirect effects. Use CMAverse::cmest(estimation='msm') with stabilized inverse-probability weights (yields the randomized-interventional analogue), gfoRmula (parametric g-formula), or randomized indirect effects (Lin 2017 Biometrics 73:1109). The interventional indirect is identified under weaker assumptions than the natural indirect.
Trigger: data.pheno contains factor columns with NA, or formula references columns missing from data.pheno.
Mechanism: HIMA v2.3+ uses a formula interface and constructs the design matrix internally from data.pheno; missing values or unparseable formulas surface as cryptic glmnet errors.
Symptom: Pipeline fails inside hima() with a non-obvious storage.mode or model.matrix error.
Fix: Pre-clean data.pheno (drop NA rows for the variables in the formula; convert factors with factor(); ensure all RHS variables exist as columns). Example:
dat <- na.omit(dat[, c('outcome', 'exposure', 'age', 'sex', 'batch', 'pc1', 'pc2')])
dat$batch <- factor(dat$batch)
result <- hima(outcome ~ exposure + age + sex + batch + pc1 + pc2,
data.pheno=dat, data.M=M_matrix, mediator.type='gaussian')
Trigger: Survival outcome (Surv() on LHS) with mediator.type='compositional' chosen against text mediator panel; or count mediators passed as 'gaussian'.
Mechanism: HIMA v2.3+ auto-detects outcome family from the LHS of formula (continuous, binary, survival, count); mediator.type is set for the mediator data only ('gaussian', 'negbin', 'compositional'). Mismatching mediator-type to the actual mediator distribution biases the screening step.
Symptom: Hazard / rate ratios for indirect effects look implausible; many "significant" mediators fail replication.
Fix: Set mediator.type='gaussian' for continuous (e.g., methylation beta, log-CPM expression), 'negbin' for raw count (RNA-seq), 'compositional' for relative-abundance microbiome. Verify by ?hima in the installed version since the catalogue of mediator types has expanded across releases.
Trigger: sims=100 or sims=500 in early exploration left in for the final report.
Mechanism: ACME CIs from bootstrap have Monte-Carlo error that scales as 1/sqrt(sims); at sims=500 the 95% CI bounds have ~5% MC noise, enough to flip the conclusion at the boundary.
Symptom: Re-running mediate() with a different set.seed() gives substantially different CI bounds.
Fix: sims=1000 minimum for any reported result; sims=5000 for publication; sims=10000 if proximity to zero matters. BCa CIs (boot.ci.type='bca' in mediate()) are slightly more accurate than percentile CIs near zero but require more sims for stability.
Trigger: Same set of SNPs used as instruments for E in step 1 and for M in step 2.
Mechanism: If a SNP affects both E and M, the M-instrument violates exclusion restriction (the SNP-Y association is not exclusively through M). Estimates are biased toward the direct effect.
Symptom: Two-step MR shows large indirect effect; replacing M-instruments with non-overlapping SNPs makes it vanish.
Fix: Apply Steiger filter on the mediator: keep only SNPs where the SNP-M F-statistic exceeds SNP-E F-statistic (or where SNP explains more variance in M than E). For MVMR-mediation, require conditional F > 10 for each exposure independently (Sanderson 2019 IJE 48:713).
Trigger: Binary or survival outcome modeled with logistic / Cox.
Mechanism: Difference = total - direct; product = alpha * beta. Equivalent under linear-Gaussian; diverge under any link function. Counterfactual ACME from mediation::mediate() is the correct quantity; hand-computed product-of-coefficients on logistic output is biased except under rare-disease.
Fix: Report only counterfactual ACME (Imai or CMAverse). For OR-based 4-way decomposition on rare outcomes (<= 10%), Valeri & VanderWeele 2013 formulas apply; for common outcomes use risk-ratio scale or marginal effects rather than ORs.
| Component | Required | |-----------|----------| | ACME estimate + 95% CI (BCa preferred) | Yes | | ADE + 95% CI | Yes | | Total effect | Yes | | Proportion mediated | Yes when total > effect-size threshold | | Bootstrap method + sims | percentile / BCa; min 1000, recommend 5000 | | Sequential ignorability sensitivity | rho_crit (medsens) OR mediational E-value | | Exposure-mediator interaction test | Coefficient + p; 4-way decomposition if significant | | Confounder set justification | DAG description | | Mediator measurement reliability | Cite | | Sample size + missing-data handling | Yes | | Mediator / exposure scale | Standardized? log? raw? |
Reference: AGReMA-Mediation guideline (Lee 2021 BMJ 372:n122) and MacKinnon 2008 Introduction to Statistical Mediation Analysis.
| Pattern | Likely cause | Action |
|---------|--------------|--------|
| Observational ACME significant; MR-mediation null | Unmeasured M-Y confounding inflated observational estimate; OR weak IVs in MR | Re-run observational with medsens(); if rho_crit < 0.1, trust MR null |
| Observational ACME null; MR-mediation significant | Measurement error in M attenuated observational estimate | Trust MR (regression-dilution-free) IF instruments pass Steiger and pleiotropy tests (MR-Egger intercept, MR-PRESSO) |
| Both significant with same sign | Convergent evidence | High-confidence mediation; report effect size from the more-conservative estimate |
| Both significant with opposite signs | At least one is biased; revisit confounder structure and IV assumptions | Do not pool; investigate via cross-method sensitivity |
Operational rule for high-stakes claims (clinical / drug-target mediation): Require (1) significant observational ACME, (2) Imai rho_crit > 0.2 OR mediational E-value > 1.5, (3) directionally consistent MR-mediation result OR documented absence of valid instruments. Single-method mediation claims should be reported as exploratory.
| Pushback | Standard response |
|----------|-------------------|
| "Sequential ignorability?" | Imai rho_crit reported via medsens; mediational E-value reported on the risk-ratio scale |
| "Exposure-induced confounder of M-Y?" | DAG drawn; if L present, switch to CMAverse::cmest(estimation='msm') for interventional indirect effect (Vansteelandt & Daniel 2017) |
| "Why this bootstrap method?" | BCa with sims=5000 for publication; percentile fallback when BCa fails to converge (acceleration estimate unstable at boundary) |
| "Why was MR-mediation not done?" | If valid IVs for E and M exist: two-step MR or MVMR-mediation done (see code below); if not, documented absence of trans-instruments |
| "Mediator measured with error?" | Regression calibration (Carroll 2006 Measurement Error in Nonlinear Models) OR sensitivity analysis assuming reliability r = 0.7 (Valeri & VanderWeele 2014) |
| "Why HIMA2 not BAMA?" | HIMA2 = frequentist + FDR control + faster; BAMA = Bayesian when prior information is available; sample-size justification given against simulation rule-of-thumb |
| "Proportion mediated unstable?" | When |total| < 2*SE(total), proportion-mediated CI is unreliable (denominator near zero); report indirect effect alone with absolute effect size |
| Quantity | Threshold | Source / Rationale |
|----------|-----------|--------------------|
| sims (bootstrap iterations) | >= 1000 exploratory, >= 5000 publication | Imai 2010; MC error scales 1/sqrt(sims) |
| Proportion mediated -- meaningful | > 0.2 | Convention; weak guideline only -- effect size in absolute terms matters more (MacKinnon 2008) |
| Proportion mediated -- "most of the effect" | > 0.5-0.8 | Convention |
| Imai rho_crit -- robust | > 0.3 | Imai 2010 Psychol Methods 15:309 |
| Imai rho_crit -- sensitive | < 0.1 | Same |
| Mediational E-value -- robust | > 2.0 (working convention; the original Smith & VanderWeele 2019 E-value framework does not prescribe a specific cutoff -- magnitude is context-dependent) | Smith & VanderWeele 2019 Epidemiology 30:835 |
| HIMA FDR cutoff | BH FDR < 0.05 | Default; report q-values not raw p |
| MVMR conditional F per exposure | > 10 each | Sanderson 2019 IJE 48:713 |
| Two-step MR -- F for both stages | > 10 each | Burgess weak-instrument convention |
| Sample size -- single-mediator (Imai) | >= 200 for stable bootstrap | Simulation rule-of-thumb |
| Sample size -- HIMA EWAS | >= 150 with p_mediators up to ~10k | Zhang 2016 simulations |
| Rare-outcome cutoff for OR-based 4-way | outcome prevalence <= 10% | Valeri & VanderWeele 2013 |
Goal: Decompose a genotype-disease effect via measured molecular mediator with explicit sensitivity to unmeasured M-Y confounding.
Approach: Fit mediator and outcome models, bootstrap ACME/ADE, run medsens() for Imai rho-based sensitivity.
library(mediation)
med_model <- lm(expression ~ genotype + age + sex + pc1 + pc2 + pc3, data=dat)
out_model <- glm(disease ~ genotype + expression + age + sex + pc1 + pc2 + pc3,
data=dat, family=binomial)
med_result <- mediate(med_model, out_model,
treat='genotype', mediator='expression',
boot=TRUE, sims=5000, boot.ci.type='bca')
summary(med_result)
sens <- medsens(med_result, rho.by=0.05, effect.type='indirect', sims=1000)
summary(sens)
d0, z0, n0, tau.coef slots return ACME, ADE, proportion mediated, total effect.
Goal: Separate CDE, PIE, INTref, INTmed when exposure-mediator interaction is biologically plausible (e.g., gene-environment interaction modifying mediator effect).
Approach: Use CMAverse regression-based estimator with EMint=TRUE; bootstrap CIs.
library(CMAverse)
result_4way <- cmest(
data=dat, model='rb',
outcome='disease', exposure='genotype', mediator='expression',
basec=c('age','sex','pc1','pc2'),
EMint=TRUE,
mreg=list('linear'), yreg='logistic',
astar=0, a=1, mval=list(0),
estimation='paramfunc', inference='bootstrap', nboot=1000
)
summary(result_4way)
CMAverse reports the 4-way decomposition (Vanderweele 2014): for continuous outcomes the components are cde, intref, intmed, pnie (or pie), te, pm; for non-continuous outcomes (logistic / Cox / Poisson) the ratio versions Rcde, Rpnde, Rtnde, Rpnie, Rtnie are reported. When EMint=TRUE, additional proportion-attributable-to-interaction terms (int, pe) are included. Verify column names with summary(result)$results in the installed CMAverse version, since naming has evolved.
Goal: Among thousands of candidate CpG mediators, identify those mediating an exposure-outcome effect with FDR control.
Approach: HIMA v2.3+ uses a formula interface and auto-detects outcome family (Gaussian / binomial / Cox / Poisson). Screening + MCP/DBlasso penalisation + joint significance with BH; the sigcut argument controls the FDR threshold (default 0.05).
library(HIMA)
dat <- na.omit(dat[, c('outcome', 'exposure', 'age', 'sex', 'cell_pc1', 'cell_pc2')])
M_matrix <- as.matrix(beta_values)
result <- hima(
outcome ~ exposure + age + sex + cell_pc1 + cell_pc2,
data.pheno=dat,
data.M=M_matrix,
mediator.type='gaussian', # 'negbin' for count, 'compositional' for microbiome
penalty='DBlasso', # default; alternatives 'MCP', 'SCAD', 'lasso'
scale=TRUE,
sigcut=0.05,
parallel=TRUE, ncore=8, verbose=TRUE
)
# result is a data.frame of significant mediators below sigcut
For survival outcomes wrap the LHS as Surv(time, status); HIMA auto-routes to Cox. The old hima_classic() (Zhang 2016 original) is still exported but screens by beta only and misses mediators with strong alpha + weak beta -- prefer the wrapper hima() unless reproducing a 2016-2021 paper.
For highly-correlated mediators (CpG-island clusters, gene-module co-expression): HIMA uses joint significance with BH-FDR on max(p_alpha, p_beta) and handles correlation only weakly. Within hima(), set penalty='MCP' for stronger correlation handling; alternatively pre-reduce the mediator panel by principal components or by clustering correlated mediators and screening the cluster centroid (VanderWeele & Vansteelandt 2014 Epidemiol Methods 2:95).
When the mediator is measured at multiple timepoints (or exposure varies over time), natural-effects estimands are not identified; switch to one of:
gfoRmula::gformula_continuous_eof(); CMAverse::cmest(estimation='gformula')gesttools::gestSingle() / gestMultiple()ipw::ipwtm() followed by glm(..., weights=sw)Choose based on the experimental structure:
= 3 timepoints required for g-methods to identify time-varying indirect effects
Decision tree:
Two-step code sketch:
library(TwoSampleMR)
exp_E <- extract_instruments('ieu-a-2', clump=TRUE)
m_E <- extract_outcome_data(exp_E$SNP, 'ieu-b-30')
dat_EM <- harmonise_data(exp_E, m_E)
mr_EM <- mr(dat_EM)
exp_M <- extract_instruments('ieu-b-30', clump=TRUE)
exp_M_indep <- exp_M[!exp_M$SNP %in% exp_E$SNP, ]
exp_M_indep <- steiger_filtering(exp_M_indep)
out_M <- extract_outcome_data(exp_M_indep$SNP, 'ieu-a-7')
dat_MY <- harmonise_data(exp_M_indep, out_M)
mr_MY <- mr(dat_MY)
Indirect effect = beta_EM * beta_MY (product of coefficients). CI via delta method or parametric bootstrap of the joint (beta_EM, beta_MY) distribution. Steiger filter on M-instruments is mandatory to ensure the M -> Y direction (not Y -> M).
Goal: Estimate the proportion of a genetic-instrument-identified causal effect that flows through a mediator, using independent IVs for E and (E + M).
Approach: Univariable MR for total E->Y; MVMR for direct E->Y conditional on M; indirect = total - direct via delta-method CI.
library(TwoSampleMR); library(MVMR)
total <- mr_ivw(harmonised_data_E_Y)
mvmr_dat <- format_mvmr(BXGs=cbind(beta_E, beta_M),
BYG=beta_Y, seBXGs=cbind(se_E, se_M), seBYG=se_Y, RSID=snps)
fstat <- strength_mvmr(mvmr_dat, gencov=0)
mvmr_fit <- ivw_mvmr(mvmr_dat)
direct <- mvmr_fit$coef[1, 'Estimate']
direct_se <- mvmr_fit$coef[1, 'Std. Error']
indirect <- total$b - direct
indirect_se <- sqrt(total$se^2 + direct_se^2)
indirect_ci <- indirect + c(-1.96, 1.96) * indirect_se
Require fstat conditional F > 10 for both E and M independently. If fstat < 10, use Q-statistic-adjusted IVW (qhet_mvmr) or report the result as weak-instrument-limited.
Goal: Avoid model misspecification of both mediator and outcome models via cross-fitted ML nuisance estimators.
Approach: causalweight::medDML uses random forests (or other learners) with sample splitting to estimate nuisance parameters; final estimator is doubly robust.
library(causalweight)
result_dml <- medDML(
y=dat$outcome, d=dat$treatment, m=dat$mediator,
x=as.matrix(dat[, covariates]),
trim=0.05, order=1
)
Reports direct, indirect (via mediator), and total effects with influence-function-based standard errors. Robust to non-linearity and interactions; assumes sequential ignorability still.
Goal: Report the minimum strength of an unmeasured M-Y confounder required to nullify the observed indirect effect.
Approach: Convert ACME and its CI to a risk-ratio scale, then apply VanderWeele E-value formula.
library(EValue)
acme_rr <- exp(med_result$d0)
acme_lower_rr <- exp(med_result$d0.ci[1])
evalues.RR(acme_rr, lo=acme_lower_rr, hi=NULL)
For binary outcomes, convert ACME on probability scale to RR; for continuous, use evalues.OLS() with the standardized indirect effect. E-value > 2 indicates a confounder would need >2-fold associations with both M and Y to nullify the indirect effect (Smith & VanderWeele 2019).
| Package | Source | Notes |
|---------|--------|-------|
| mediation | CRAN | install.packages('mediation'); actively maintained (Imai group) |
| CMAverse | GitHub | remotes::install_github('BS1125/CMAverse'); NOT on CRAN; 6 estimators in one interface |
| HIMA | CRAN | install.packages('HIMA'); v2.x renamed hima() to HIMA2 -- verify with ?hima |
| bama | CRAN | install.packages('bama'); Bayesian; slow MCMC |
| causalweight | CRAN | install.packages('causalweight'); medDML for double-ML mediation |
| EValue | CRAN | install.packages('EValue'); for mediational E-values |
| TwoSampleMR | r-universe | See causal-genomics/mendelian-randomization for setup |
| MVMR | r-universe | remotes::install_github('WSpiller/MVMR'); for MVMR-mediation |
| gfoRmula | CRAN | For longitudinal / time-varying confounders |
| Error / symptom | Cause | Solution |
|-----------------|-------|----------|
| Error in storage.mode(x) <- "double" inside hima() | NA in data.pheno columns referenced by formula, or unconverted factors | na.omit(data.pheno) first; ensure all RHS vars in formula are numeric or factor |
| ACME significant, ADE significant, total NOT significant | Suppression / inconsistent mediation | Report transparently; effect partitioning can exceed total in suppression |
| medsens() errors on glm outcome | medsens requires linear OR probit (not logit) outcome | Refit outcome as glm(..., family=binomial(link='probit')) |
| mediate() runs forever with binary outcome | sims=5000 with bootstrap and small n | Use sims=1000 exploratory; verify model converges first; consider parallel via parallel='multicore' |
| CMAverse cmest() reports NaN for pm | Total effect crosses zero -> proportion ill-defined | Report ACME and TE separately; pm is unstable when |TE| is small |
| Different ACME between mediation and CMAverse rb | Default astar/a levels differ; binary mediator handled differently | Set astar=0, a=1 explicitly; for binary mediator pass mval=list(0) |
| HIMA returns zero significant mediators | Screening too aggressive; or no true mediators | Try topN=2*sqrt(n) instead of default; verify with permutation null |
| Two-step MR shows indirect > total | Steiger reversal: M actually causes E; or pleiotropic SNPs | Run MR-Steiger filter; use MR-PRESSO for pleiotropy |
| medDML trim removes most data | Severe positivity violation -- few units with overlapping treatment/mediator distributions | Tighten covariate set; check propensity score distributions |
tools
--- name: bio-phasing-imputation-foundations description: Frames the phasing/imputation pipeline before any tool runs: phasing and imputation are one Li-Stephens copying HMM (recombination is the transition, mutation the emission, the genetic map and Ne set the rates), imputation's honest output is a dosage with a self-estimated quality (INFO/R2/DR2) not a hard genotype, and the stages are ordered and each fails silently (QC, align build and strand to the panel, phase, impute per chromosome, fil
tools
Chooses the enrichment generation before any tool runs, mapping the input shape to a method class - a pre-selected gene list plus a background to over-representation analysis (ORA, hypergeometric), a ranked statistic for all genes to gene set enrichment (GSEA), a signed signaling topology to pathway-topology (SPIA) - then making the null explicit (competitive vs self-contained, gene vs subject sampling) and running a trustworthiness checklist (testable-gene universe, FDR, redundancy collapse, leading-edge check, version reporting). Covers why every clusterProfiler GSEA is the inter-gene-correlation-uncorrected competitive null, why the background not the gene list decides ORA significance, and why no method is universally best. Use when deciding ORA vs GSEA vs topology, which gene-set DB, whether a result is trustworthy, or which null a tool computes. For ORA see go-enrichment, GSEA see gsea, databases kegg-pathways/reactome-pathways/wikipathways; the ranking comes from differential-expression/de-results.
testing
End-to-end GWAS workflow from VCF to association results. Covers PLINK QC, population structure correction, and association testing for case-control or quantitative traits. Use when running genome-wide association studies.
development
Orchestrates the full path from differential expression results to redundancy-collapsed functional enrichment: choose ORA vs GSEA, convert gene IDs per method, run enrichGO/enrichKEGG/enrichPathway/enrichWP or gseGO/gseKEGG (clusterProfiler, ReactomePA, rWikiPathways), and visualize. Routes the ORA-vs-GSEA generation fork and the null/universe/reproducibility theory to pathway-analysis/enrichment-foundations. Use when a DESeq2/edgeR/limma result must become enriched GO terms, KEGG/Reactome/WikiPathways pathways, or a GSEA leading edge; when deciding whether a ranking exists for all genes (GSEA, named decreasing vector) or only a pre-selected list (ORA plus a defensible background universe); or when assembling DE-to-pathway end to end. The DE list and ranking statistic come from differential-expression/de-results; per-method nuance lives in the pathway-analysis skills.