causal-genomics/fine-mapping/SKILL.md
Resolves GWAS associations to candidate causal variants and credible sets via SuSiE, susie_rss, FINEMAP, CAVIAR, DAP-G, PAINTOR, PolyFun, SuSiEx, MultiSuSiE, and FOCUS. Use when narrowing a GWAS lead SNP to a 95 percent credible set, choosing between in-sample and reference LD, calibrating non-sparse loci with SuSiE-inf or FINEMAP-inf, integrating functional priors via PolyFun, fine-mapping across ancestries with SuSiEx, diagnosing LD mismatch via estimate_s_rss and kriging_rss, handling HLA or long-range LD, or feeding credible sets into coloc.susie for colocalization.
npx skillsauth add GPTomics/bioSkills bio-causal-genomics-fine-mappingInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Reference examples tested with: susieR 0.12.27+, coloc 5.2.3+, FINEMAP 1.4.2+, PolyFun (head of omerwe/polyfun 2024), PAINTOR V3.0, SuSiEx (head of getian107/SuSiEx), DAP-G (head of xqwen/dap), pyfocus 0.8+, R 4.3+, PLINK 1.9 / 2.0.
Before using code patterns, verify installed versions match. If versions differ:
packageVersion('susieR') then ?susie_rss to confirm argument names (e.g., prior_weights vs prior_variance semantics)finemap --help, SuSiEx --help, PAINTOR --help, dap-g --help to confirm flagspolyfun.py --helpIf a call throws an error about an argument that no longer exists, introspect the installed function and adapt rather than retrying.
"Narrow my GWAS locus to the variants likely to be causal" -> Fit a sparse Bayesian regression that propagates LD into posterior inclusion probabilities (PIPs) and credible sets, then validate that credible sets correspond to physically reasonable haplotypes given the LD reference.
susieR::susie_rss(z, R, n, L=10) + estimate_s_rss LD diagnosticsusieR::susie(X, y, L=10)finemap --sss --in-files master.z --n-causal-snps 5 --sss-tol 0.001SuSiEx --sst_file=eur.sst,eas.sst --n_gwas=N1,N2 --ref_file=eur.bim,eas.bim --ld_file=eur_ld,eas_ld --pop=EUR,EAS --chr_col=1,1 --snp_col=2,2 --bp_col=3,3 --a1_col=4,4 --a2_col=5,5 --eff_col=6,6 --se_col=7,7 --pval_col=8,8 --chr=<chr> --bp=<start,end> --out_dir=<dir> --out_name=<name> (column-number flags and --ld_file are required; see SuSiEx --help)polyfun.py --compute-h2-L2 -> per-SNP priors -> susie_rss with prior_weights=focus finemap on gene-level Z-scoresFine-mapping is a Bayesian model selection problem; LD is not noise but structured prior information. Most failure modes trace back to one of three issues: (a) LD reference mismatched to the GWAS sample; (b) the sparse-effects prior being wrong for the locus (polygenic background); or (c) too small an L cap. The estimate_s_rss() lambda and kriging_rss() per-SNP diagnostic catch (a) before downstream credible sets are reported.
| Tool | Model | Input | Strength | Fails when | |------|-------|-------|----------|------------| | SuSiE / susie_rss (Wang 2020 JRSSB 82:1273; Zou 2022 PLoS Genet) | Iterative Bayesian sum-of-single-effects (IBSS), variational | Individual-level (X, y) or (z, R, n) | Fast; native PIP + credible sets; pluggable priors; default in modern pipelines | Reference LD mismatched to GWAS sample; locus dominated by polygenic background; >L true effects | | SuSiE-inf / FINEMAP-inf (Cui 2024 Nat Genet 56:162) | SuSiE + infinitesimal random-effect component | (z, R, n) | Calibrated credible sets when locus is non-sparse (polygenic shoulder around a sparse causal); recommended for biobank-scale GWAS | Very small loci with truly sparse architecture (over-conservative); slower convergence | | FINEMAP (Benner 2016 Bioinformatics 32:1493) | Shotgun stochastic search over causal configurations | .z + .ld + .master files | Exact Bayes factors at small k; widely cited | Slow at L > 5; binary install only (christianbenner.com); same LD-mismatch fragility as SuSiE | | CAVIAR / CAVIARBF (Hormozdiari 2014 Genetics 198:497) | Exhaustive enumeration up to k causals | (z, R) | Exact posterior at small k | Combinatorial explosion beyond k=6; legacy method largely superseded by SuSiE | | DAP-G (Wen 2016 AJHG 98:1114) | Deterministic posterior approximation with adaptive scan | SBAMS format; TORUS for enrichment priors | Fast at QTL scale (whole-transcriptome); pairs with TORUS hierarchical priors | SBAMS format is awkward; less ubiquitous tooling | | PAINTOR (Kichaev 2014 PLoS Genet 10:e1004722) | EM with binary functional annotations | (z, R, A) per locus | Locus-level functional priors; multi-trait variant | Single-trait mode often matched by PolyFun + SuSiE; slower than SuSiE | | PolyFun + SuSiE/FINEMAP (Weissbrod 2020 Nat Genet 52:1355) | Stratified LDSC genome-wide -> per-SNP prior_weights | GWAS sumstats + pre-baked baseline-LF | Most powerful single-trait functional prior; PIPs sharpen 1.5-3x | Requires matched-ancestry baseline-LF; runs in two stages | | SuSiEx (Yuan 2024 Nat Genet 56:1841) | Joint cross-ancestry SuSiE; shared causal, population-specific LD | Per-pop sumstats + per-pop LD reference | Smaller credible sets than per-ancestry meta or marginal fine-mapping; principled when causal variants are shared | Trans-ethnic heterogeneity violated (population-specific causals); ancestry must be cleanly assigned | | MultiSuSiE (Tashman 2024 medRxiv / 2025 Nat Genet) | Cross-ancestry SuSiE variant; flexible heterogeneity | Per-pop sumstats + per-pop LD | Similar to SuSiEx; alternative implementation | Same as SuSiEx; newer, less battle-tested | | FOCUS / MA-FOCUS (Mancuso 2019 Nat Genet 51:675) | Probabilistic TWAS fine-mapping over gene models | TWAS Z-scores + gene LD (predicted expression) | Identifies likely causal gene among co-regulated TWAS hits; cross-ancestry MA-FOCUS variant | Requires pre-computed expression weights (e.g., FUSION/PrediXcan); gene-level rather than variant-level inference |
Methodology evolves; verify the latest susieR vignette and the SuSiE-inf paper before locking on a single method. Wang Lab maintains susieR; the IBSS algorithm is stable but argument semantics (e.g., prior_weights vs prior_variance) have changed across versions.
| Scenario | Recommended workflow | Why |
|----------|---------------------|-----|
| Individual-level genotypes available (UKB, in-house cohort) | susie(X, y, L=10) | In-sample LD is exact; no mismatch fragility |
| Summary statistics only, ancestry matches reference panel | susie_rss(z, R, n, L=10) + estimate_s_rss diagnostic | Standard external-LD pattern; verify lambda < 0.05 |
| Single-locus EUR GWAS, sparse architecture | susie_rss with L=10, baseline functional priors optional | Most-common setting; SuSiE default works |
| Locus with strong polygenic shoulder (biobank scale) | SuSiE-inf (Cui 2024) | Adds infinitesimal component; calibrates non-sparse PIPs |
| Multi-ancestry GWAS (EUR + EAS + AFR) | SuSiEx with per-pop sumstats and LD | Joint inference shrinks credible sets; per-ancestry meta loses LD information |
| Locus with > 5 expected independent signals (HLA, lipid loci) | susie_rss with L=20-30 | Default L=10 caps signal count; HLA needs extension |
| TWAS hits with co-regulated genes | FOCUS / MA-FOCUS | Variant-level fine-mapping cannot distinguish co-regulated gene candidates |
| Want functional priors (coding, conserved, regulatory) | PolyFun -> susie_rss with prior_weights | Genome-wide SLDSC priors sharpen PIPs more than locus-level annotations |
| QTL fine-mapping (eQTL, sQTL, caQTL) at transcriptome scale | DAP-G + TORUS OR susie_rss per gene | DAP-G is built for QTL throughput; SuSiE works per gene |
| Low-N QTL (GTEx tissue panel, N < 1000) | susie_rss with coverage = 0.9 (or 0.8); document choice | Default 0.95 returns very wide credible sets at low power; report the relaxed coverage explicitly in methods |
| HLA region (chr6:28-34 Mb) or chr8 inversion | Specialized workflow: stratify haplotypes; consider HLA-specific imputation; or exclude | LD structure is too complex; standard methods unreliable |
| Cross-feed into colocalization | susie_rss -> coloc.susie() | Modern coloc operates on credible sets, not single SNPs |
Goal: Detect LD reference mismatch before reporting credible sets.
Approach: estimate_s_rss() quantifies the global Z-score / LD inconsistency as a scalar; kriging_rss() identifies individual SNPs whose Z-scores are inconsistent with the LD reference (typically genotyping errors, strand flips, or wrong reference panel).
library(susieR)
s_hat <- estimate_s_rss(z = z_scores, R = ld_matrix, n = N)
# s_hat is the inferred scale of LD inconsistency.
# Source: susieR vignette "Diagnostic for summary statistic"; Zou 2022 PLoS Genet.
# Rule of thumb: s_hat < 0.05 acceptable; 0.05-0.10 marginal; > 0.10 refit or change LD reference.
cond_z <- kriging_rss(z = z_scores, R = ld_matrix, n = N)
# cond_z$conditional_dist returns per-SNP expected vs observed z; flag |z_obs - z_exp| > 3
# Common cause: strand flip, allele coding mismatch, or single-SNP imputation error.
# If diagnostic fails: refit with explicit scale parameter to absorb LD mismatch
fit <- susie_rss(z = z_scores, R = ld_matrix, n = N, L = 10, estimate_residual_variance = TRUE)
Skipping this block is the dominant cause of irreproducible fine-mapping. Always run before reporting credible sets.
Trigger: External LD matrix from 1000 Genomes / UK Biobank reference used for a GWAS conducted on a different cohort or ancestry mix.
Mechanism: Z-scores reflect the GWAS sample's LD; the reference R does not. The susie_rss likelihood depends on z' R^{-1} z being consistent with the modeled effects, and inconsistency manifests as spurious credible sets containing tag SNPs from the reference but not from the discovery cohort.
Symptom: estimate_s_rss() lambda > 0.05; kriging_rss() flags many SNPs with |z_obs - z_exp| > 3; credible sets contain physically distant SNPs (anti-correlated in LD with the lead) or include all SNPs at the locus.
Fix: Use in-sample LD whenever the cohort genotypes are accessible (compute with plink --r2 square on the GWAS samples themselves). When only summary statistics are available, ancestry-stratify the LD reference exactly (e.g., 1000G EUR FIN+CEU+GBR+IBS+TSI for a Northern European GWAS, not full EUR). For mixed-ancestry GWAS, fine-map per ancestry then meta-analyze, or move to SuSiEx.
Trigger: Locus with one strong signal plus hundreds of weakly associated SNPs (polygenic shoulder); typical at biobank scale.
Mechanism: Vanilla SuSiE assumes a sparse sum-of-single-effects prior. With polygenic background, the model misallocates effects, producing inflated credible sets or many small spurious ones. Cui 2024 (Nat Genet 56:162) showed PIPs from SuSiE in this regime are systematically miscalibrated.
Symptom: Many small credible sets (5-15 per locus); replication in independent cohorts fails for non-lead credible sets; PIP distribution has a heavy tail.
Fix: Use SuSiE-inf or FINEMAP-inf (Cui 2024). These augment the sum-of-single-effects with an infinitesimal random-effect component that absorbs polygenic background. Source: github.com/FinucaneLab/fine-mapping-inf.
Trigger: Locus with > 5 independent signals (HLA region, APOC1/APOE, LPA, IL6R region for some traits).
Mechanism: SuSiE assumes at most L independent effects. When the true number exceeds L, some signals are absorbed into existing components, distorting PIPs and credible sets for the captured signals.
Symptom: length(fit$sets$cs) equals L (all L slots used); credible set purity for higher-indexed sets is low (fit$sets$purity[,'min.abs.corr'] < 0.5); fits with larger L change top-PIP variants.
Fix: Increase L iteratively (L=10 -> 20 -> 30) until length(fit$sets$cs) < L (susieR auto-prunes unsupported effects so the returned CS count is the effective L). For HLA, start at L=30. The cost is mostly computational, not statistical: SuSiE prunes unused slots, so L=30 is safe when L=10 was right.
Trigger: Passing PolyFun output to susie_rss with prior_variance=polyfun_priors (wrong argument).
Mechanism: prior_variance in susie_rss is a single scalar (or vector of length L) for the per-effect variance, NOT a per-SNP probability. prior_weights is the per-SNP causal probability vector (sums to ~1). Passing PolyFun's per-SNP prior to prior_variance is silently accepted but applies a numerically nonsensical per-effect variance.
Symptom: PIPs nearly identical to the uniform-prior fit; functional annotations appear to have no effect.
Fix: Use prior_weights = polyfun_priors$snpvar. Verify with ?susie_rss in the installed version. Reference: github.com/omerwe/polyfun README, Weissbrod 2020 supplementary methods.
Trigger: Reporting per-variant PIP without distinguishing "in credible set" from "high PIP".
Mechanism: The 95 percent credible-set guarantee is P(causal variant in set) >= 0.95. Per-variant PIPs within a set do not necessarily sum to 1 across all variants, and PIPs across overlapping sets can double-count posterior mass.
Symptom: Reporting "the top PIP variant" when the credible set is wide (size > 50); claiming a single variant is causal when the set contains 30 high-LD SNPs.
Fix: Always report (a) number of credible sets, (b) size of each set, (c) purity (fit$sets$purity[,'min.abs.corr']), (d) the top PIP variant within the set as the candidate lead. The credible set is the unit of inference; the top PIP variant is a candidate, not a conclusion.
Trigger: Multi-ancestry meta-analyzed GWAS, then susie_rss with EUR LD.
Mechanism: Meta-analysis z-scores reflect a weighted mix of population LD structures; no single-population LD matrix matches.
Fix: Move to SuSiEx (joint cross-ancestry SuSiE; Yuan 2024). Per-ancestry fine-mapping followed by manual merging loses the shared-causal-variant information that SuSiEx exploits.
Trigger: Passing n = N_total to susie_rss() for case-control GWAS derived from logistic regression.
Mechanism: susie_rss expects the effective sample size that determined the standard errors. For case-control logistic regression, Neff = 4 / (1/Ncase + 1/Ncontrol); when cases are rare, total N can exceed Neff by 25x or more. Passing Ntotal rescales z-scores into a regime SuSiE never sees and makes the implied prior variance wrong.
Symptom: PIPs systematically biased; credible sets either too narrow (PIPs collapse to a single SNP that is not robust) or too wide (PIPs flatten); replication poor; sometimes z-score scale warnings from susieR.
Fix: Neff = 4 / (1/Ncase + 1/Ncontrol). Example: Ncase=5000, Ncontrol=495000 -> Neff ~= 19,800 (NOT 500,000). For quantitative traits from linear regression, n = N_total is correct. Reference: Privé 2022 HGG (bigsnpr documents Neff handling); Willer 2010 Bioinformatics (METAL Neff convention).
Trigger: Effect allele in GWAS sumstats differs from the coding/A1 allele in the LD reference panel; or palindromic SNPs (A/T, C/G) carried without strand resolution.
Mechanism: susie_rss treats z and R as defined on the same allele coding. If the effect allele is swapped relative to the LD-reference A1, the sign of z is wrong and the LD row/column for that SNP is implicitly flipped. SNPs matching by rsID can silently swap alleles between sumstats and reference, breaking the z' R z consistency the model relies on.
Symptom: estimate_s_rss lambda inflated despite ancestry-matched panel; kriging_rss flags many SNPs with |z_obs - z_exp| > 3 clustered at SNPs where reference A1 != GWAS effect allele; credible sets pick up tag-only SNPs anti-correlated with the lead.
Fix: Harmonize before fitting:
harmonize_z_to_ref <- function(z, gwas_a1, gwas_a2, ref_a1, ref_a2) {
palindromic <- (gwas_a1 == 'A' & gwas_a2 == 'T') | (gwas_a1 == 'T' & gwas_a2 == 'A') |
(gwas_a1 == 'C' & gwas_a2 == 'G') | (gwas_a1 == 'G' & gwas_a2 == 'C')
flip <- (gwas_a1 == ref_a2) & (gwas_a2 == ref_a1)
z[flip] <- -z[flip]
drop <- palindromic | !((gwas_a1 == ref_a1 & gwas_a2 == ref_a2) | flip)
list(z = z, keep = !drop)
}
Drop palindromic SNPs at MAF > 0.42 (ambiguous strand); or resolve via external strand info (TopMed, 1000G strand files). TwoSampleMR::harmonise_data() offers an alternative implementation. See causal-genomics/colocalization-analysis for an equivalent harmonize helper used downstream.
| Pattern | Likely cause | Action |
|---------|--------------|--------|
| SuSiE finds 3 credible sets, FINEMAP finds 1 | FINEMAP's stochastic search did not converge OR SuSiE absorbed background into spurious sets | Increase FINEMAP --n-iter; check SuSiE purity (sets with purity < 0.5 are spurious) |
| SuSiE PIPs much sharper than FINEMAP | susie_rss assumes single residual variance; FINEMAP marginalizes over noise | Both can be correct; report the intersection of high-PIP variants from both as primary candidates |
| PolyFun + SuSiE collapses 10-variant credible set to 1 | Functional priors are doing real work (coding variant in set) | Verify with prior_weights plot; if priors are coding-specific the result is interpretable |
| SuSiEx credible set excludes the EUR top-PIP variant | EUR signal is tag, true causal shared across ancestries lies elsewhere | Trust SuSiEx if both populations have well-powered GWAS; verify with conditional analysis |
| HLA gives 50-variant credible set in every method | HLA LD structure cannot be fine-mapped by linear methods | Use HLA-specific imputation (HIBAG, SNP2HLA) and haplotype-level analysis |
Operational rule: For high-confidence reporting, require that (a) estimate_s_rss() lambda < 0.05; (b) at least one credible set has purity > 0.5 (min_abs_corr >= 0.5, equivalent to r2 >= 0.25); (c) the lead PIP variant within that set is reproduced by an independent method (FINEMAP, SuSiEx, or in-sample SuSiE if reference-LD was used). Anything failing these three is exploratory.
| Quantity | Threshold | Source / Rationale |
|----------|-----------|-------------------|
| Credible set coverage (well-powered GWAS) | 0.95 (default) | Wang 2020 JRSSB; standard convention |
| Credible set coverage (low-N eQTL, GTEx tissue) | 0.9 or 0.8 | At N < 1000, default 0.95 returns very wide CS; document choice in methods |
| Credible set purity (rare-variant fine-mapping) | min_abs_corr >= 0.1 | LD genuinely sparse; relax to retain signal |
| Credible set purity (default common-variant) | min_abs_corr >= 0.5 (r2 >= 0.25) | susieR default; below this the set is LD-confounded |
| Credible set purity (publication-strict) | min_abs_corr >= 0.7 | Stringent claim; rare in practice |
| PIP suggestive | > 0.5 | Convention; "more likely than not causal among set" |
| PIP strong | > 0.9 | Convention; high-confidence single candidate |
| PIP very strong | > 0.95 | Convention; near-certain candidate within credible set |
| L (default cap) | 10 | susieR default; sufficient for most non-HLA loci |
| L (HLA / complex loci) | 20-30 | Empirical; HLA hosts > 10 independent signals for many traits |
| n for case-control susie_rss | Neff = 4/(1/Ncase + 1/Ncontrol), NOT Ntotal | Privé 2022 HGG; matches the SE scale of logistic-regression sumstats |
| estimate_s_rss lambda acceptable | < 0.05 | susieR vignette; > 0.10 indicates serious LD mismatch |
| kriging_rss per-SNP flag | |z_obs - z_exp| > 3 | susieR vignette; flag for manual review |
| Locus window (default) | +/- 500 kb from sentinel | Conventional; covers most LD blocks |
| Locus window (conditional-p floor) | Extend until conditional -log10(p) < 4 | Avoids truncating a secondary signal whose conditional evidence leaks into the window edge |
| Locus window (long-range LD) | 5+ Mb or stratify | HLA chr6:25-35Mb, chr8 inversion chr8:8.1-11.9Mb hg38, chr17 H1/H2 inversion |
| FINEMAP --n-causal-snps | 5 | Default; raise for HLA |
| FINEMAP --sss-tol | 0.001 | Convergence tolerance; rarely needs change |
Goal: Use genome-wide stratified LDSC heritability to weight per-SNP causal priors, sharpening PIPs at coding, conserved, and regulatory variants.
Approach: Run PolyFun once genome-wide to estimate per-SNP h2 from the baseline-LF annotation set; extract per-SNP causal prior; pass to susie_rss as prior_weights.
# Stage 1: compute per-SNP h2 contributions genome-wide (one-time)
polyfun.py --compute-h2-L2 \
--output-prefix polyfun_h2 \
--sumstats gwas_munged.sumstats \
--ref-ld-chr UKB_baseline_LF/baselineLF2.2.UKB. \
--w-ld-chr UKB_baseline_LF/weights.UKB.
# Stage 2: extract per-SNP prior for one locus
polyfun.py --compute-h2-bins \
--output-prefix polyfun_priors \
--sumstats gwas_munged.sumstats \
--ref-ld-chr polyfun_h2
library(susieR)
priors <- read.table('polyfun_priors.6.snpvar.gz', header = TRUE)
priors <- priors[match(gwas_df$SNP, priors$SNP), ]
prior_w <- priors$SNPVAR / sum(priors$SNPVAR, na.rm = TRUE)
fit <- susie_rss(z = z_scores, R = ld_matrix, n = N, L = 10,
prior_weights = prior_w)
UKB baseline-LF priors are pre-computed EUR-only at data.broadinstitute.org/alkesgroup/UKBB_LD/ for hg19 and hg38. For EAS, AFR, or SAS GWAS, the EUR weights are NOT valid: use PolyFun-pop (Weissbrod 2022 Nat Genet 54:450) which extends PolyFun via cross-population annotation weighting, OR the cross-ancestry baseline-LF tables from Marquez-Luna 2025. For ancestries lacking matched baseline-LF (admixed, under-represented), accept reduced power and run uniform-prior susie_rss; applying EUR weights to non-EUR sumstats produces miscalibrated PIPs that look sharper than reality.
For postdocs without PolyFun infrastructure or with single-locus inputs, manual annotation-based priors are a reasonable approximation (Hutchinson 2020 Genet Epidemiol 44:313). Coding variants get ~10x uniform weight; broadly conserved variants ~5x (binned by CADD-PHRED quantile).
build_manual_priors <- function(vep_df, cadd) {
w <- rep(1, nrow(vep_df))
w[vep_df$Consequence %in% c('missense_variant', 'stop_gained', 'splice_donor_variant',
'splice_acceptor_variant', 'frameshift_variant')] <- 10
w[cadd >= quantile(cadd, 0.95, na.rm = TRUE)] <- pmax(w[cadd >= quantile(cadd, 0.95, na.rm = TRUE)], 5)
w / sum(w)
}
fit <- susie_rss(z = z_scores, R = ld_matrix, n = Neff, L = 10, prior_weights = build_manual_priors(vep, cadd))
Report the prior construction explicitly; reviewers will ask whether the prior was tuned post hoc.
Goal: Jointly fine-map a locus across multiple ancestries assuming shared causal variants but population-specific LD.
Approach: Per-ancestry summary statistics + per-ancestry LD reference; SuSiEx runs a joint SuSiE model with population-specific R matrices.
SuSiEx \
--sst_file=eur_sumstats.txt,eas_sumstats.txt,afr_sumstats.txt \
--n_gwas=500000,200000,80000 \
--ref_file=1000G_EUR,1000G_EAS,1000G_AFR \
--ld_file=eur_ld,eas_ld,afr_ld \
--out_dir=susiex_out \
--out_name=locus1 \
--pop=EUR,EAS,AFR \
--chr=6 --bp=30000000,31000000 \
--chr_col=1,1,1 --snp_col=2,2,2 --bp_col=3,3,3 \
--a1_col=4,4,4 --a2_col=5,5,5 --eff_col=6,6,6 \
--se_col=7,7,7 --pval_col=8,8,8 \
--level=0.95
The output includes per-population PIPs and a joint credible set. Credible sets from SuSiEx are typically 2-5x smaller than EUR-only susie_rss when AFR is included, because AFR shorter LD blocks resolve EUR-tagged regions.
Goal: Independent confirmation via shotgun stochastic search.
Approach: Build .z, .ld, and master files; run FINEMAP with --sss and parse the .snp and .cred outputs.
# .z file format: snp chromosome position allele1 allele2 maf beta se
# .ld file: square LD matrix, space-separated, no header
cat > locus.master <<'EOF'
z;ld;snp;config;cred;log;n_samples
locus.z;locus.ld;locus.snp;locus.config;locus.cred;locus.log;500000
EOF
finemap --sss \
--in-files locus.master \
--n-causal-snps 5 \
--sss-tol 0.001 \
--n-iter 100000 \
--n-conv-sss 5000
# Parse:
# locus.snp -> per-variant prob (PIP), log10bf
# locus.cred -> credible sets at increasing causal counts
# locus.config -> top configurations
FINEMAP and SuSiE agree when sparsity holds; disagreement often reveals non-sparse loci that need SuSiE-inf.
Goal: Test colocalization between two traits using credible sets, not single SNPs.
Approach: Fit susie_rss separately per trait; pass both susie objects to coloc.susie; per-credible-set colocalization probabilities are returned.
library(coloc)
fit_trait1 <- susie_rss(z = z1, R = ld_matrix, n = N1, L = 10)
fit_trait2 <- susie_rss(z = z2, R = ld_matrix, n = N2, L = 10)
coloc_res <- coloc.susie(fit_trait1, fit_trait2)
# coloc_res$summary: per-credible-set PP.H4 (shared causal probability)
print(coloc_res$summary)
PP.H4 > 0.8 per credible set is the conventional shared-causal threshold; weaker thresholds suggest distinct or conditional signals. See causal-genomics/colocalization-analysis.
The HLA region (chr6:28-34 Mb), chromosome 8 inversion (chr8:8-12 Mb), and a handful of other extended LD blocks violate the assumptions of every fine-mapping method.
Symptoms of irrecoverable LD structure: Credible sets contain 30+ SNPs at low purity even with L=30; SuSiE-inf credible sets remain wide; kriging_rss flags hundreds of SNPs.
Options:
Document the caveat in any methods section; standard PIPs at HLA are not interpretable as causality estimates.
Variant-level fine-mapping cannot distinguish causal genes among co-regulated TWAS hits. FOCUS / MA-FOCUS extend fine-mapping to the predicted-expression level; see causal-genomics/transcriptome-wide-association for the full FOCUS workflow and reconciliation with variant-level credible sets.
Every locus reported should carry these columns; missing fields are the most common reviewer complaint.
| Column | Description |
|--------|-------------|
| locus_id | Locus identifier (chr:start-end or sentinel rsID) |
| method | susie_rss / FINEMAP / PAINTOR / SuSiEx / SuSiE-inf |
| L_used | sum(!fit$sets$pruned) (effective L; not just the cap passed in) |
| n_credible_sets | Number of returned credible sets at the chosen coverage |
| cs_size | Variants per credible set |
| cs_purity_min / cs_purity_mean | min and mean fit$sets$purity[,'min.abs.corr'] per set |
| top_pip_snp | Lead variant in each credible set |
| top_pip | Posterior inclusion probability of top_pip_snp |
| lambda_s | estimate_s_rss diagnostic for the locus |
| kriging_outlier_count | Count of SNPs with |z_obs - z_exp| > 3 |
| ld_panel | 1KG-EUR / UKB-EUR / in-sample / TopMed |
| prior_source | uniform / PolyFun-EUR / PolyFun-pop / manual coding-variant |
| coverage | 0.95 default; 0.9 or 0.8 documented for low-N |
| n_effective | Sample size passed to susie_rss (Neff for case-control) |
| Pushback | Standard response |
|----------|-------------------|
| "In-sample vs reference LD?" | In-sample preferred when cohort genotypes available; if reference, report estimate_s_rss lambda < 0.05 plus kriging_rss outlier count |
| "Credible-set purity?" | min_abs_corr >= 0.5 (r2 >= 0.25) default; reported per set; relaxed only with explicit rationale for rare-variant fine-mapping |
| "Is L set high enough?" | If returned CS count < L cap: OK (susieR auto-prunes); otherwise raise L. HLA needs L=20-30 |
| "Why not SuSiE-inf?" | Polygenic-shoulder test: count SNPs with marginal -log10(p) > 4 outside the lead credible set; > 50 indicates a polygenic shoulder and SuSiE-inf (Cui 2024) should be used |
| "Why no functional priors?" | PolyFun applied (or manual coding-variant prior used) and reported; if uniform, justify (low-N, mismatched-ancestry baseline-LF) |
| "Credible set has 50 SNPs -- is that fine-mapping?" | Acknowledged as imprecise; reported alongside diagnostics; cross-trait colocalization or functional fine-mapping (PolyFun, MPRA, allelic series) recommended for resolution |
| "Was Neff used for case-control?" | Yes: Neff = 4/(1/Ncase + 1/Ncontrol); report the value used |
| "Allele harmonization?" | Yes: flipped z when GWAS effect allele differs from reference A1; palindromic SNPs at MAF > 0.42 dropped |
| Error / symptom | Cause | Solution |
|-----------------|-------|----------|
| IBSS algorithm did not converge warning | L too small OR LD mismatch | Increase L; run estimate_s_rss; check ancestry match |
| Credible set contains all SNPs at locus | LD reference matches discovery poorly; s_hat > 0.1 | Switch to in-sample LD or stratify reference ancestry |
| PIPs identical to GWAS p-value rank | Effectively no LD information used; check LD matrix orientation | Verify SNP order in z and R match exactly; check for transposed R |
| Negative eigenvalues in LD matrix | Numerical PSD violation from finite-precision storage | Add small ridge: R <- R + diag(1e-4, nrow(R)); or use Matrix::nearPD |
| pip all ~ 1/p (uniform) | Convergence failure OR all effects pruned | Check fit$converged; raise L; check Z scale |
| FINEMAP Error: SNP names do not match | .z and .ld SNP order differ | Ensure both are sorted identically; pass matched .snp file |
| Coloc.susie returns NULL | One trait has zero credible sets | Verify both fits succeeded; lower coverage to 0.9 if signal is weak |
| SuSiEx output empty | Population label mismatch with reference panel | Verify --pop matches reference filename convention; check --bp window |
| PolyFun priors do not change PIPs | Passed to prior_variance instead of prior_weights | Read susieR docs; use prior_weights= |
testing
Analyze multi-modal single-cell data (CITE-seq, Multiome, spatial). Use when working with data that measures multiple modalities per cell like RNA + protein or RNA + ATAC. Use when analyzing CITE-seq, Multiome, or other multi-modal single-cell data.
data-ai
Analyze metabolite-mediated cell-cell communication using MeboCost for metabolic signaling inference between cell types. Predict metabolite secretion and sensing patterns from scRNA-seq data. Use when studying metabolic crosstalk between cell populations or metabolite-receptor interactions.
development
Find marker genes and annotate cell types in single-cell RNA-seq using Seurat (R) and Scanpy (Python). Use for differential expression between clusters, identifying cluster-specific markers, scoring gene sets, and assigning cell type labels. Use when finding marker genes and annotating clusters.
development
Reconstruct cell lineage trees from CRISPR barcode tracing or mitochondrial mutations. Use when studying clonal dynamics, cell fate decisions, or developmental trajectories.