causal-genomics/mendelian-randomization/SKILL.md
Estimate causal effects of an exposure on an outcome from GWAS summary statistics using genetic instruments. Implements IVW (fixed/random), MR-Egger, weighted median/mode, MR-RAPS, CAUSE, GSMR-HEIDI, MR-PRESSO, MVMR, MR-Clust, LCV, and LHC-MR via TwoSampleMR, MendelianRandomization, MR-PRESSO, cause, and lhcMR. Use when testing causal direction between traits, evaluating drug-target effects via cis-pQTL/cis-eQTL, performing multivariable mediation MR, distinguishing causation from correlated horizontal pleiotropy, or producing STROBE-MR-compliant sensitivity batteries.
npx skillsauth add GPTomics/bioSkills bio-causal-genomics-mendelian-randomizationInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Reference examples tested with: TwoSampleMR 0.6.0+, MendelianRandomization 0.10+, MR-PRESSO 1.0+, cause 1.2+, MVMR 0.4+, ieugwasr 1.0+, MRlap 0.0.3.2+, coloc 5.2+, mrclust 0.1+, lhcMR 0.0.1+, R 4.4+. Both TwoSampleMR 0.6.0 and ieugwasr 1.0 are the JWT-transition versions; older versions still expect deprecated OAuth.
Before using code patterns, verify installed versions match. If versions differ:
packageVersion('<pkg>') then ?function_name to verify parameters<tool> --version then <tool> --helpIf code throws an error referencing a function that has moved (e.g. ieugwasr::ld_clump vs TwoSampleMR::clump_data) or an OAuth token failure, introspect the installed API and adapt the example rather than retrying.
"Test whether trait X causally affects trait Y from GWAS summary statistics" -> Use genetic variants as instrumental variables (IVs) that satisfy three assumptions (relevance, independence, exclusion restriction) to estimate beta_causal = beta_outcome / beta_exposure under the IV framework (Davey Smith & Ebrahim 2003 IJE 32:1; Burgess Thompson 2017 SAGE textbook). Tool choice is a decision about the regime (one-sample vs two-sample, sparse vs polygenic, drug-target vs polygenic exposure) and the pleiotropy model (balanced, directional InSIDE, correlated horizontal). Wrong tool inflates Type-I error or attenuates true effects in a direction predictable from the bias structure.
TwoSampleMR::mr() orchestrates IVW + Egger + weighted median + weighted mode in one callMendelianRandomization::mr_ivw / mr_egger / mr_median / mr_mbe / mr_conmix per-method API (S4 objects; MR-RAPS is NOT in this package -- use TwoSampleMR::mr_raps() which wraps the GitHub mr.raps)MRPRESSO::mr_presso() global / outlier / distortion testscause::cause() correlated horizontal pleiotropy mixtureMVMR::strength_mvmr() + MVMR::ivw_mvmr() multivariable conditional-F + IVW| Method | Pleiotropy assumption | Min instruments | Strength | Fails when |
|--------|------------------------|-----------------|----------|------------|
| IVW (fixed) | All IVs valid | 2 | Most efficient under no pleiotropy | Any directional or balanced pleiotropy inflates Type-I |
| IVW (random effects) | Balanced + InSIDE | 3 | Standard primary; absorbs heterogeneity into wider SE | Directional pleiotropy biases the point estimate |
| MR-Egger | Directional pleiotropy + InSIDE | 10+ for power | Detects + corrects directional pleiotropy via intercept (Bowden 2015 IJE 44:512) | NOME violated (I^2_GX < 0.9); SIMEX correction required; underpowered <10 SNPs |
| Weighted median | Up to 50% invalid IVs | 3 | Robust to a minority of bad instruments (Bowden 2016 Genet Epidemiol 40:304) | >50% invalid IVs |
| Weighted mode | Zero modal pleiotropy (ZEMPA) | 3 | Robust if the modal estimate is unbiased (Hartwig 2017 IJE 46:1985) | Bimodal pleiotropy; small numbers |
| MR-RAPS | Balanced pleiotropy + weak instruments | 10+ | Profile-score robust to weak-IV + balanced horizontal pleiotropy (Zhao 2020 Ann Stat 48:1742) | Strong directional pleiotropy; CRAN-archived 2025-03-01 |
| CAUSE | Correlated horizontal pleiotropy (CHP) | 100+ sig SNPs | Explicit shared-factor mixture; protects against CHP-driven false positives (Morrison 2020 Nat Genet 52:740) | Sparse polygenic exposures; <100 sig SNPs |
| GSMR + HEIDI-outlier | Outlier removal under InSIDE | 10+ | Alternative outlier detection; integrates with LD reference (Zhu 2018 Nat Commun 9:224) | Requires individual-level LD; HEIDI conservative |
| MR-PRESSO | Outlier-driven horizontal pleiotropy | 4+ | Global / outlier / distortion three-step (Verbanck 2018 Nat Genet 50:693) | Blind to CHP; computationally heavy at large NbDistribution |
| MVMR (IVW) | Conditional independence after measured pleiotropy | 1+ per exposure | Accounts for measured horizontal pleiotropy via multivariable regression (Sanderson 2019 IJE 48:713) | Conditional F < 10 on any exposure |
| MR-Clust | Heterogeneous causal effects (multiple mechanisms) | 30+ | Clusters SNPs by their causal-effect estimate (Foley 2020 Bioinformatics 37:531) | Single causal mechanism; small instrument sets |
| Contamination mixture | Mixture of valid + invalid IVs | 10+ | Profile-likelihood mixture (Burgess 2020 Nat Commun 11:376) | Sparse signal |
| LCV | Genome-wide; distinguishes causation vs genetic correlation | All SNPs | Tests gcp parameter using LDSC-style block jackknife (O'Connor & Price 2018 Nat Genet 50:1728) | Two-trait covariance dominated by a third confounder |
| LHC-MR | Bidirectional + heritable confounder | All SNPs | Joint likelihood over genome-wide markers; estimates both directions + confounder (Darrous 2021 Nat Commun 12:7274) | Computationally heavy; rare-variant trait |
| MRlap | Sample overlap + winner's curse + weak-IV jointly | Genome-wide sumstats | LDSC-scaffolded joint correction (Mounier & Kutalik 2023 Genet Epidemiol 47:314) | LDSC intercept poorly estimated (h^2 < 0.05); non-EUR without matched LD scores |
| Doubly-Ranked MR (DRMR) | Non-linear, non-parametric | 5+ strata | Non-parametric stratification (Tian 2023 PLoS Genet 19:e1010823); replaces residual stratification when linearity fails | Continuous exposures only; needs individual-level data; Hamilton 2023 medRxiv 23293658 shows stratum-specific bias from age/sex |
Methodology evolves; benchmark consensus shifts every 2-3 years. Verify against the current Slob & Burgess 2020 Genet Epidemiol, Burgess 2023 Wellcome Open Res "Guidelines for performing Mendelian randomization" (v3+), and STROBE-MR 2021 reporting standards before locking a method as primary.
| Scenario | Primary method | Sensitivity battery | Why |
|----------|----------------|----------------------|-----|
| Standard two-sample, independent cohorts, polygenic exposure | IVW (random) | Egger + weighted median + MR-PRESSO + MR-RAPS + Steiger | Default; covers balanced, directional, outlier, weak-IV regimes |
| One-sample (e.g. UK Biobank both ends) | IVW with weak-IV-aware (MR-RAPS) | Egger + LCV + jackknife SE | One-sample bias formulas: Bowden 2019 IJE 48:728; F-stat floor shifts to F >= 20; jackknife SE preferred over analytic at one-sample scale; do NOT run exposure GWAS and outcome GWAS on the same individuals then claim two-sample (Hartwig 2021 collider bias); within-stratum MR (e.g. "MR among smokers") risks collider bias from the stratification variable |
| Partial sample overlap (UKB exposure + UKB outcome) | MR-RAPS with overlap correction | Sample-overlap-adjusted IVW (Burgess 2016 Genet Epidemiol 40:597) | Bias is intermediate, proportional to z-score correlation |
| Drug-target / cis-MR (cis-pQTL, cis-eQTL) | IVW restricted to cis window | Colocalization PP.H4 + LD-prune within window | Exclusion restriction relaxed because the protein/transcript directly mediates effect (Schmidt 2020 Nat Commun 11:3255) |
| MVMR for measured pleiotropy (e.g. LDL adjusted for HDL/TG) | MVMR::ivw_mvmr | Conditional F + Q_A heterogeneity | Required when exposures correlate via shared SNPs |
| Mediation MR (X -> M -> Y) | MVMR difference of total vs direct | Two-step MR + product-of-coefficients (Carter 2021 Eur J Epidemiol 36:465) | Network MR; quantifies indirect effect |
| Polygenic exposure with potential CHP (e.g. BMI -> CHD) | CAUSE (primary) + IVW (secondary) | Egger + MR-PRESSO + LCV | CAUSE explicitly models CHP via shared-factor; needs >=100 sig SNPs |
| Binary outcome (e.g. T2D) on linear scale | IVW on log-OR with log-additive coding | All sensitivity on log-OR; report exp(beta) | Linearity of MR estimating equation holds on log-OR not OR |
| Time-to-event (Cox) outcome | IVW on log-HR | Burgess & Labrecque 2018 Eur J Epidemiol 33:947 framework | Non-collapsibility caveats apply |
| Non-linear MR (e.g. alcohol J-curve) | DRMR (Tian 2023) + residual stratification side-by-side | Negative-control outcomes (genotype-vs-sex within strata); Hamilton 2023 limitation cited | Both methods produce stratum-specific bias from age/sex effects (Hamilton 2023 medRxiv 23293658); pre-specify the non-linear hypothesis, do not data-snoop the J-curve, report negative-control sanity checks |
| Single-patient rare disease | Not MR -- use FRASER/DROP outlier framework | See alternative-splicing/outlier-splicing-detection | MR requires summary stats; n=1 is wrong regime |
| Design | Weak-IV bias direction | Reason | |--------|------------------------|--------| | One-sample, F<10 | Toward confounded observational estimate (overestimates causal effect if confounding is in same direction) | Sample correlation between IV-X and IV-Y residuals | | Two-sample non-overlapping, F<10 | Toward null | Independent samples decouple residuals (Burgess 2011 IJE 40:755) | | Two-sample with partial overlap | Intermediate; proportional to overlap fraction and z-score correlation | Burgess 2016 Genet Epidemiol 40:597; correction available |
Operational rule: Whenever both GWAS came from UK Biobank (or any single biobank), treat the analysis as one-sample-equivalent and prefer MR-RAPS as primary. Treating it as "two-sample because separate GWAS files" is a common error and produces overestimates.
MRlap (Mounier & Kutalik 2023 Genet Epidemiol 47:314) jointly corrects three biases that previously required three separate tools: sample overlap, winner's curse, and weak-instrument bias. It builds on an LDSC scaffold (cross-trait LD-score regression intercept estimates the overlap-induced covariance) and reweights the IVW estimate against the analytical bias-correction formula.
remotes::install_github('n-mounier/MRlap') # never on CRAN; bioconductor unsuitable
library(MRlap)
fit <- MRlap(
exposure = gwas_X_df, exposure_name = 'BMI',
outcome = gwas_Y_df, outcome_name = 'T2D',
ld = 'eur_w_ld_chr/', hm3 = 'w_hm3.snplist', # LDSC reference files
MR_threshold = 5e-8, MR_pruning_dist = 500, MR_pruning_LD = 0.05
)
fit$MRcorrection$corrected_effect # overlap + winner's curse + weak-IV corrected
fit$MRcorrection$corrected_effect_se
fit$LDSC$h2_LDSC # heritability sanity check
fit$LDSC$lambda # sample-overlap-induced lambda; ~0 means no overlap
Decision rule -- prefer MRlap when: (a) any sample overlap is suspected, (b) only sumstats are available (no individual-level data for re-running GWAS on disjoint samples), (c) exposure discovery and outcome were both run inside the same biobank (UKB-on-UKB, FinnGen-on-FinnGen). MRlap returns NA / unstable estimates when h^2 < 0.05; in that regime, fall back to Burgess 2016 overlap-corrected IVW plus MR-RAPS for the weak-IV component.
cis-MR restricts instruments to the cis-regulatory window of the gene encoding the protein/transcript exposure (Schmidt 2020 Nat Commun 11:3255), relaxing the exclusion-restriction assumption because the protein product directly mediates the SNP's effect on the outcome. Operational core: extract cis-pQTL/cis-eQTL within +/-500 kb of the gene; clump at r2 < 0.1 (looser than polygenic MR to retain power within a narrow window); require colocalization PP.H4 >= 0.7; flag protein-altering variants (PAV) which can break SomaScan/Olink aptamer/antibody binding rather than reflect biology.
Full drug-target cis-MR workflow including UKB-PPP / deCODE / Fenland pQTL panels, PAV flagging, Olink vs SomaScan replication (~15-30% cross-platform disagreement), and the operational claim ladder lives in causal-genomics/proteome-mr-drug-target. Use that skill for any drug-target nomination.
MR with logistic-GWAS sumstats returns per-allele log-OR on the population-averaged (marginal) scale, NOT the conditional log-OR (Burgess & Labrecque 2018 Eur J Epidemiol 33:947; Burgess 2017 Stat Methods Med Res 26:2333). For rare disease (prevalence < 10%), OR ~= RR ~= HR and the distinction is harmless. For common disease, OR diverges from RR/HR and the MR estimate cannot be back-converted to a conditional effect without strong assumptions; report as "per 1-SD increase in genetically-predicted X, OR for Y = ..." rather than implying an individual-level intervention effect.
Collider bias in case-only or disease-progression cohorts (Hu 2022 IJE 51:1289): conditioning the outcome on disease status (e.g. studying progression among diabetics) opens a collider path between any cause of disease and any cause of progression. MR within affected subsets without explicit adjustment for selection probability is fragile; weight by inverse probability of selection or restrict claims to the unconditioned population.
Trigger: Several SNPs affect the outcome through pathways not via the exposure, in a consistent direction.
Mechanism: IVW is a weighted regression through origin; non-zero mean pleiotropy shifts the slope.
Symptom: Egger intercept p < 0.05 with non-zero estimate; IVW differs from weighted median; MR-PRESSO global test p < 0.05.
Fix: Use Egger (if I^2_GX >= 0.9 -- otherwise SIMEX-correct via the simex package applied to the Egger fit, treating se.exposure as measurement error in beta.exposure); cross-check with weighted median, MR-PRESSO, and CAUSE; report IVW only as one of a panel, never alone. The MendelianRandomization::mr_egger() function accepts distribution='normal' and reports NOME-corrected estimates internally but does NOT expose a SIMEX wrapper.
Trigger: Mean per-instrument F-statistic < 10, or several individual F < 10.
Mechanism: Weak IVs amplify finite-sample correlation between IV-X and IV-Y errors; bias direction depends on overlap regime (see table above).
Symptom: Estimates shift markedly when removing the weakest instruments; one-sample MR estimates much larger than two-sample.
Fix: Compute F per instrument from the EXPOSURE GWAS, not the outcome; exclude F < 10; use MR-RAPS (handles weak IVs by design); for two-sample, also report unweighted IVW (less weak-IV-bias-inflated than weighted in some regimes).
Trigger: Discovery GWAS is the source of both instrument selection and effect-size estimates.
Mechanism: SNPs that just cross 5e-8 in discovery have over-estimated effect sizes (regression toward the mean in independent replication); MR uses inflated beta_X, biasing causal estimate.
Symptom: MR effect shrinks substantially when using effect sizes from an independent replication GWAS.
Fix: (1) Three-sample design (discovery / replication-for-instrument-effect / outcome) where feasible. (2) When sumstats-only: MRlap (Mounier 2023), MR-SimSS (sample-splitting from sumstats), or RIVW (Ma 2023 JASA -- rerandomized IVW) jointly correct winner's curse + weak IVs + overlap. (3) Sadreev 2024 IJE 52:1209 empirical magnitude: 5-15% bias inflation at p < 5e-8; larger at relaxed thresholds.
Trigger: Running MR-Egger with I^2_GX < 0.9.
Mechanism: Egger assumes NO Measurement Error in exposure effect sizes (NOME); when violated, Egger slope is attenuated toward null with reciprocal bias on the intercept.
Symptom: mr_pleiotropy_test() Egger estimate disagrees with weighted median in magnitude but agrees in direction; Isq() function returns <0.9.
Fix: Compute Isq(beta_X, se_X) (Bowden 2016 IJE 45:1961); if <0.9, apply SIMEX correction via simex package or report Egger as exploratory only. The MendelianRandomization package's mr_egger() returns a NOME-aware corrected estimate.
Trigger: Applying steiger_filtering() on traits with unmeasured shared confounders (e.g. SES).
Mechanism: Steiger compares variance explained in exposure vs outcome per SNP; an unmeasured confounder upstream of both produces SNPs that explain more variance in the outcome than the exposure, falsely flagging "reverse causation" (Hemani Tilling 2022 Wellcome Open Res 7:14).
Symptom: Many SNPs flagged as wrong-direction yet biology and prior MR support forward causation.
Fix: Treat Steiger as a heuristic, not gospel; cross-validate direction with bidirectional MR (forward + reverse with independent instrument sets); for known-confounder-rich domains (psychiatric traits, SES proxies) use LCV or LHC-MR instead, which jointly model confounders.
Trigger: SNPs with alleles A/T or C/G near MAF 0.5.
Mechanism: Strand orientation is ambiguous for palindromic SNPs when allele frequencies are intermediate; flipping introduces sign errors that look like pleiotropy.
Symptom: harmonise_data() reports many palindromic SNPs dropped; remaining SNPs show heterogeneity from a handful.
Fix: Default action = 2 (infer from allele frequencies) drops MAF~0.5 palindromes; action = 3 drops ALL palindromes (most conservative); never use action = 1 (assumes forward strand) unless both GWAS are guaranteed to use the same strand convention. Document choice in methods.
| Threshold | Source | Rationale |
|-----------|--------|-----------|
| F-statistic > 10 per instrument | Staiger & Stock 1997 (linear IV) | Heuristic; debated (Burgess 2011 IJE; Zhao 2020 argues 10 is too low for one-sample) |
| Conditional F > 10 per exposure (MVMR) | Sanderson 2019 IJE 48:713 | Total F can be high while conditional F low; per-exposure F is what matters |
| I^2_GX >= 0.9 for Egger | Bowden 2016 IJE 45:1961 | NOME assumption; below this, SIMEX correction required |
| CAUSE >= 100 significant SNPs | Morrison 2020 Nat Genet 52:740 | Mixture model needs signal density for shared-factor estimation |
| Egger >= 10 instruments | Bowden 2015 IJE 44:512 | Power for slope test in weighted regression |
| Sample-overlap z-score correlation | Burgess 2016 Genet Epidemiol 40:597 | Use LDSC bivariate intercept as proxy; correct IVW SE accordingly |
| Clumping r2 < 0.001, 10 Mb window | TwoSampleMR default; matches GWAS LD norms | Polygenic MR; cis-MR uses r2 < 0.1 within window |
| Steiger p < 0.05 | Hemani 2017 PLoS Genet 13:e1007081 | Heuristic; subject to confounder caveat (Hemani Tilling 2022) |
| MR-PRESSO NbDistribution | 1000 exploratory; >= 5000 publication; >= 10000 stringent | Verbanck 2018 Nat Genet 50:693; precision of global p-value scales with NbDistribution |
| STROBE-MR all 20 items | Skrivankova 2021 JAMA 326:1614; BMJ 375:n2233 | Required since 2022 by most epidemiology journals |
| Bonferroni for pheWAS-MR | Standard | Many outcomes; FDR if exploratory |
Goal: Produce a defensible primary IVW estimate plus a full sensitivity battery from two-sample summary statistics.
Approach: Extract genome-wide significant instruments -> clump (local plink preferred) -> extract outcome -> harmonise -> mr -> pleiotropy + heterogeneity + leave-one-out -> Steiger -> MR-PRESSO -> report.
library(TwoSampleMR)
library(ieugwasr)
exposure_raw <- read_exposure_data(
filename = 'exposure_gwas.tsv', sep = '\t',
snp_col = 'SNP', beta_col = 'BETA', se_col = 'SE',
effect_allele_col = 'A1', other_allele_col = 'A2',
eaf_col = 'EAF', pval_col = 'P'
)
exposure_sig <- subset(exposure_raw, pval.exposure < 5e-08) # genome-wide significance
# F-statistic computed from EXPOSURE (Burgess 2011); ratio of squared effect to its variance
exposure_sig$f_stat <- (exposure_sig$beta.exposure / exposure_sig$se.exposure)^2
exposure_sig <- subset(exposure_sig, f_stat >= 10) # Staiger-Stock 1997 weak-IV heuristic
clumped <- ld_clump(
data.frame(rsid = exposure_sig$SNP, pval = exposure_sig$pval.exposure),
clump_r2 = 0.001, clump_kb = 10000, # polygenic MR convention
plink_bin = genetics.binaRies::get_plink_binary(),
bfile = '1kg_EUR/EUR'
)
exposure_dat <- subset(exposure_sig, SNP %in% clumped$rsid)
outcome_dat <- read_outcome_data(
filename = 'outcome_gwas.tsv', snps = exposure_dat$SNP, sep = '\t',
snp_col = 'SNP', beta_col = 'BETA', se_col = 'SE',
effect_allele_col = 'A1', other_allele_col = 'A2',
eaf_col = 'EAF', pval_col = 'P'
)
dat <- harmonise_data(exposure_dat, outcome_dat, action = 2) # infer from EAF; drops MAF~0.5 palindromes
primary <- mr(dat, method_list = c('mr_ivw', 'mr_egger_regression',
'mr_weighted_median', 'mr_weighted_mode'))
heterogeneity <- mr_heterogeneity(dat) # Cochran Q
pleiotropy <- mr_pleiotropy_test(dat) # Egger intercept
loo <- mr_leaveoneout(dat) # influential-SNP check
steiger <- directionality_test(dat) # variance-explained direction
Goal: Detect horizontal-pleiotropy outliers, remove them, and test whether the corrected estimate differs from the uncorrected one (distortion test).
Approach: Three-step framework: global test (presence of pleiotropy), outlier test (per-SNP), distortion test (effect change after outlier removal).
library(MRPRESSO)
presso <- mr_presso(
BetaOutcome = 'beta.outcome', BetaExposure = 'beta.exposure',
SdOutcome = 'se.outcome', SdExposure = 'se.exposure',
OUTLIERtest = TRUE, DISTORTIONtest = TRUE,
data = dat, NbDistribution = 10000, # >= 10000 for publication-grade p-value precision
SignifThreshold = 0.05
)
print(presso$`MR-PRESSO results`$`Global Test`) # any pleiotropy
print(presso$`MR-PRESSO results`$`Distortion Test`) # change after outlier removal
outlier_snps <- which(presso$`MR-PRESSO results`$`Outlier Test`$Pvalue < 0.05 / nrow(dat))
CAUSE (Morrison 2020 Nat Genet 52:740) fits a shared-factor mixture to genome-wide sumstats and compares causal vs sharing-only models by delta-ELPD. Workflow: gwas_merge() -> sample ~1M variants for est_cause_params() -> filter sig SNPs (P < 1e-3) and optionally LD-prune -> cause(X, variants, param_ests). Needs >= 100 sig SNPs. Full annotated example and ELPD interpretation in causal-genomics/pleiotropy-detection.
Goal: Estimate the causal effect of exposure X1 on Y, adjusting for measured pleiotropy via X2.
Approach: Format exposures + outcome into MVMR object; compute conditional F per exposure (>10 required); run multivariable IVW; report Q_A heterogeneity.
library(MVMR)
mvmr_dat <- format_mvmr(
BXGs = cbind(dat$beta.x1, dat$beta.x2),
BYG = dat$beta.y,
seBXGs = cbind(dat$se.x1, dat$se.x2),
seBYG = dat$se.y,
RSID = dat$SNP
)
condF <- strength_mvmr(r_input = mvmr_dat, gencov = 0) # per-exposure conditional F
# condF must be > 10 for EACH exposure (Sanderson 2019); total F is misleading
mv_ivw <- ivw_mvmr(r_input = mvmr_dat)
mv_qa <- qhet_mvmr(r_input = mvmr_dat, pcor = 0) # Q_A heterogeneity
gencov = 0 is valid ONLY if the exposure GWAS samples don't overlap; for overlapping exposures use the bivariate LDSC intercept matrix as gencov. If any conditional F < 10, the IVW point estimate is weak-IV-biased; switch to the Q-minimization estimator: qhet_mvmr(r_input, pcor, CI = TRUE, iterations = 1000) (Sanderson 2021 Stat Med 40:5434), which minimizes Q-statistic heterogeneity rather than weighting by inverse variance and is robust to weak conditional instruments.
exposure_rev <- format_data(outcome_raw, type = 'exposure') # treat former outcome as exposure
outcome_rev <- format_data(exposure_raw, type = 'outcome')
dat_rev <- harmonise_data(exposure_rev, outcome_rev, action = 2)
results_rev <- mr(dat_rev, method_list = 'mr_ivw')
dat_filt <- steiger_filtering(dat) # per-SNP; flags SNPs where variance(Y) > variance(X)
dir_test <- directionality_test(dat) # global; correct_causal_direction == TRUE if forward
Report direction operationally: forward p < 5e-8 with reverse p > 0.05; directionality_test() correct_causal_direction == TRUE with Steiger p < 0.05; point estimate in reverse direction has |effect| substantially smaller than forward (reflecting reverse-instrument-strength asymmetry). Example sentence: "Forward MR showed BMI -> T2D (IVW beta = 0.85, p = 2e-15); reverse MR was null (IVW beta = 0.02, p = 0.61); Steiger directionality test favored the forward direction (p = 3e-7)."
| Pattern | Likely cause | Action |
|---------|--------------|--------|
| IVW sig, Egger null with non-zero intercept | Directional pleiotropy | Report Egger as primary if I^2_GX >= 0.9; otherwise SIMEX-correct |
| IVW sig, weighted median sig, mode null | Mode underpowered or bimodal pleiotropy | Trust the agreement of IVW + median |
| MR-PRESSO distortion test sig | Outliers materially shift estimate | Report PRESSO-adjusted estimate as primary |
| CAUSE sig, IVW sig, same direction | High-confidence causal claim | Report both; emphasize CAUSE rules out CHP |
| CAUSE null, IVW sig, same direction | CHP indistinguishable from causation | Downgrade to "consistent with causation but not separable from CHP" |
| Forward MR sig, reverse MR also sig | Bidirectional causation OR confounded | Use LHC-MR or LCV for genome-wide resolution |
| IVW sig but mean F << 10 in one-sample design | Weak-IV bias toward observational | Re-run with MR-RAPS; report adjusted estimate |
Operational rule for publication: Primary IVW + concordant Egger (or weighted median if NOME violated) + non-significant MR-PRESSO global test + Steiger correct direction = publication-ready evidence. CAUSE concordance is required when the exposure is polygenic and the prior on CHP is high (e.g. BMI -> outcome, lipids -> outcome). Single-method "significant IVW" claims should be downgraded to exploratory.
| Pushback | Standard response | |----------|-------------------| | "Sample overlap between exposure and outcome GWAS?" | LDSC bivariate intercept reported; MRlap or sample-overlap-corrected IVW applied; document overlap fraction | | "Weak instruments (F < 10)?" | F computed from EXPOSURE; per-instrument and mean F reported; MR-RAPS used as sensitivity if mean F borderline | | "Horizontal pleiotropy?" | IVW + Egger + weighted median + weighted mode + MR-PRESSO; if rg > 0.3 also CAUSE (see pleiotropy-detection) | | "Reverse causation?" | Steiger filter applied; bidirectional MR ran; LCV gcp reported if rg > 0.3 | | "Pre-registered?" | OSF protocol filed; STROBE-MR all 20 items reported | | "InSIDE assumption?" | INstrument Strength Independent of Direct Effect -- pleiotropic effects alpha uncorrelated with instrument-exposure effects gamma. Tested via Egger intercept + CHP-aware sensitivity (CAUSE) |
| Error / symptom | Cause | Solution |
|-----------------|-------|----------|
| F-statistic computed from outcome | Reading off beta.outcome / se.outcome | Compute from EXPOSURE; outcome F is meaningless for IV strength |
| All instruments dropped at harmonise | All SNPs palindromic; or allele coding mismatch | Verify A1/A2 conventions match; try action = 3 to drop, not assume |
| Unauthorized / 403 from OpenGWAS | OAuth deprecated May 2024; JWT token required | Generate at api.opengwas.io -> set OPENGWAS_JWT=<token> in ~/.Renviron -> restart R -> verify with ieugwasr::get_opengwas_jwt(). For production, skip the API: use ld_clump_local() with local 1KG bfile (saves rate-limit + auth headaches). |
| Conditional F vs total F confusion in MVMR | Reporting mean(F) not strength_mvmr() per-exposure | Always use MVMR::strength_mvmr() and report each column |
| install.packages("mr.raps") fails | CRAN-archived 2025-03-01 | remotes::install_github('qingyuanzhao/mr.raps'); TwoSampleMR's mr_raps() wrapper internally calls this package |
| MR-PRESSO returns NA p-value | NbDistribution too small; signal too thin | Increase to >= 10000; check that >= 4 SNPs remain after harmonization |
| Egger intercept "highly significant" with 5 SNPs | Underpowered Egger over-fits the slope | Egger needs >= 10 SNPs; below that, intercept is unreliable |
| Sample-overlap correction ignored | Treating UKB-on-UKB as two-sample | Apply Burgess 2016 correction; or use MR-RAPS |
| cause() runs forever | Default model fit on too many SNPs | Filter to sig SNPs (P < 1e-3) before cause(); est_cause_params uses the random subset |
| MAF column missing -> harmonise action=2 silently downgrades | EAF unavailable | Provide EAF or use action = 3 and document the loss |
# CRAN-stable
install.packages(c('remotes', 'MendelianRandomization', 'MVMR', 'coloc'))
# GitHub-only or recently archived
remotes::install_github('MRCIEU/TwoSampleMR') # primary orchestrator
remotes::install_github('MRCIEU/ieugwasr') # OpenGWAS client + local clumping
remotes::install_github('rondolab/MR-PRESSO') # never on CRAN
remotes::install_github('qingyuanzhao/mr.raps') # CRAN-archived 2025-03-01
remotes::install_github('jean997/cause') # depends on mixsqp; suggests Rfast
remotes::install_github('cnfoley/mrclust') # heterogeneity clusters
remotes::install_github('LizaDarrous/lhcMR') # bidirectional + heritable confounder
remotes::install_github('HDTian/DRMR') # doubly-ranked stratification
remotes::install_github('n-mounier/MRlap') # joint overlap + winner's-curse + weak-IV correction
TwoSampleMR::mr_raps() is a thin wrapper that calls mr.raps::mr.raps() under the hood; the GitHub mr.raps install above is therefore required. The MendelianRandomization package does NOT export mr_raps() (verify with ls('package:MendelianRandomization')); only TwoSampleMR offers a MR-RAPS entry point. For local clumping, install plink2 and download a 1KG EUR (or matched-ancestry) reference bfile.
STROBE-MR (Skrivankova 2021 JAMA 326:1614; BMJ 375:n2233): 20-item checklist required by Eur J Epi / Nat Genet / JAMA / Diabetologia since 2022. See causal-genomics/pleiotropy-detection for the per-item table -- that skill is the consolidated reporting reference for the full MR + sensitivity battery.
development
Find restriction enzyme cut sites in DNA sequences using Biopython Bio.Restriction. Search with single enzymes, batches of enzymes, or commercially available enzyme sets. Returns cut positions for linear or circular DNA. Use when finding restriction enzyme cut sites in sequences.
development
Create restriction maps showing enzyme cut positions on DNA sequences using Biopython Bio.Restriction. Visualize cut sites, calculate distances between sites, and generate text or graphical maps. Use when creating or analyzing restriction maps.
development
Analyze restriction digest fragments using Biopython Bio.Restriction. Predict fragment sizes, get fragment sequences, simulate gel electrophoresis patterns, and perform double digests. Use when analyzing restriction digest fragment patterns.
development
Select restriction enzymes by criteria using Biopython Bio.Restriction. Find enzymes that cut once, don't cut, produce specific overhangs, are commercially available, or have compatible ends for cloning. Use when selecting restriction enzymes for cloning or analysis.