skills/gwas-finemapping/SKILL.md
ToolUniverse workflow: GWAS Fine-Mapping
Identify and prioritize causal variants at GWAS loci using statistical fine-mapping and locus-to-gene predictions.
Genome-wide association studies (GWAS) identify genomic regions associated with traits, but linkage disequilibrium (LD) makes it difficult to pinpoint the causal variant. Fine-mapping uses Bayesian statistical methods to compute the posterior probability that each variant is causal, given the GWAS summary statistics.
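To make the idea of a posterior probability computed from summary statistics concrete, here is a minimal sketch using Wakefield's approximate Bayes factor (ABF). This is not necessarily what the skill or its upstream data sources use. It assumes exactly one causal variant at the locus and equal prior probability for every variant, which is simpler than the multi-signal methods listed later. The function names, the prior variance of 0.04, and the input values are illustrative assumptions.

```python
import math


def log_abf(beta, se, prior_variance=0.04):
    """Log of Wakefield's approximate Bayes factor in favor of association.

    prior_variance is the assumed variance of the true effect size;
    0.04 is an illustrative default, not a value taken from this skill.
    """
    variance = se ** 2
    z = beta / se
    shrinkage = prior_variance / (variance + prior_variance)
    return 0.5 * math.log(1.0 - shrinkage) + 0.5 * z * z * shrinkage


def single_causal_pips(betas, ses, prior_variance=0.04):
    """Posterior probabilities assuming exactly one causal variant
    and equal priors across variants."""
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and equal length")
    logs = [log_abf(b, s, prior_variance) for b, s in zip(betas, ses)]
    # Subtract the maximum before exponentiating for numerical stability.
    largest = max(logs)
    weights = [math.exp(value - largest) for value in logs]
    total = sum(weights)
    return [w / total for w in weights]


# Made-up effect sizes and standard errors for three variants in one locus.
example_pips = single_causal_pips([0.10, 0.09, 0.02], [0.015, 0.015, 0.015])
print([round(p, 4) for p in example_pips])
```

Because the probabilities are normalized within the locus, they always sum to 1, and the variant with the strongest evidence receives the largest share.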
This skill provides tools to:
A credible set is the smallest set of variants that contains the causal variant with a stated probability (typically 95% or 99%). Each variant in the set has a posterior probability of being causal, computed using methods such as SuSiE and FINEMAP.
The posterior probability is the probability that a specific variant is causal, given the GWAS data and LD structure. A higher posterior probability means the variant is more likely to be causal.
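The relationship between posterior probabilities and a credible set can be sketched as follows: rank variants by posterior probability and keep adding them until the cumulative probability reaches the target coverage. This is a simplified illustration under a single-causal-variant assumption, not the skill's own implementation. The function name, variant IDs, and probabilities are hypothetical.

```python
def build_credible_set(pips, coverage=0.95):
    """Return the smallest list of (variant, pip) pairs whose cumulative
    posterior probability reaches `coverage`.

    pips: dict mapping a variant ID to its posterior probability.
    """
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    ranked = sorted(pips.items(), key=lambda item: item[1], reverse=True)
    selected = []
    cumulative = 0.0
    for variant, pip in ranked:
        selected.append((variant, pip))
        cumulative += pip
        if cumulative >= coverage:
            break
    return selected


# Hypothetical posterior probabilities for five variants at one locus.
example = {"rsA": 0.62, "rsB": 0.25, "rsC": 0.09, "rsD": 0.03, "rsE": 0.01}
for variant, pip in build_credible_set(example, coverage=0.95):
    print(variant, pip)
```

A small credible set with one dominant variant is easier to follow up experimentally than a large set in which the probability is spread across many variants in high LD.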
L2G scores integrate multiple data types to predict which gene is affected by a variant:
L2G scores range from 0 to 1, with higher scores indicating stronger gene-variant links.
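As a small illustration of working with scores on this 0 to 1 scale, the sketch below ranks candidate genes and keeps those above a cutoff. The helper name, the 0.5 cutoff, and the example scores are assumptions made for illustration; they are not values defined by this skill.

```python
def top_l2g_genes(l2g_scores, min_score=0.5, limit=3):
    """Rank genes by L2G score (highest first) and keep at most `limit`
    genes whose score is at least `min_score`.

    l2g_scores: dict mapping a gene symbol to its L2G score in [0, 1].
    """
    ranked = sorted(l2g_scores.items(), key=lambda item: item[1], reverse=True)
    return [(gene, score) for gene, score in ranked if score >= min_score][:limit]


# Illustrative scores only; real values come from the L2G predictions.
scores = {"TCF7L2": 0.86, "VTI1A": 0.12, "HABP2": 0.04}
print(top_l2g_genes(scores))
```

An empty result means no gene passed the cutoff. In that case it may be worth lowering the cutoff and inspecting the full ranking rather than treating the locus as uninformative.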
Question: "Which variant at the TCF7L2 locus is likely causal for type 2 diabetes?"
from python_implementation import prioritize_causal_variants
# Prioritize variants in TCF7L2 for diabetes
result = prioritize_causal_variants("TCF7L2", "type 2 diabetes")
print(result.get_summary())
# Output shows:
# - Credible sets containing TCF7L2 variants
# - Posterior probabilities (via fine-mapping methods)
# - Top L2G genes (which genes are likely affected)
# - Associated traits
Question: "What do we know about rs429358 (APOE4) from fine-mapping?"
from python_implementation import prioritize_causal_variants
# Fine-map a specific variant
result = prioritize_causal_variants("rs429358")
# Check which credible sets contain this variant
for cs in result.credible_sets:
    print(f"Trait: {cs.trait}")
    print(f"Fine-mapping method: {cs.finemapping_method}")
    print(f"Top gene: {cs.l2g_genes[0] if cs.l2g_genes else 'N/A'}")
    print(f"Confidence: {cs.confidence}")
Question: "What are all the causal loci from the recent T2D meta-analysis?"
from python_implementation import get_credible_sets_for_study
# Get all fine-mapped loci from a study
credible_sets = get_credible_sets_for_study("GCST90029024") # T2D GWAS
print(f"Found {len(credible_sets)} independent loci")
# Examine each locus
for cs in credible_sets:
    print(f"\nRegion: {cs.region}")
    print(f"Lead variant: {cs.lead_variant.rs_ids[0] if cs.lead_variant else 'N/A'}")
    if cs.l2g_genes:
        top_gene = cs.l2g_genes[0]
        print(f"Most likely causal gene: {top_gene.gene_symbol} (L2G: {top_gene.l2g_score:.3f})")
Question: "What GWAS studies exist for Alzheimer's disease?"
from python_implementation import search_gwas_studies_for_disease
# Search by disease name
studies = search_gwas_studies_for_disease("Alzheimer's disease")
for study in studies[:5]:
    print(f"{study['id']}: {study.get('nSamples', 'N/A')} samples")
    print(f" Author: {study.get('publicationFirstAuthor', 'N/A')}")
    print(f" Has summary stats: {study.get('hasSumstats', False)}")
# Or use precise disease ontology IDs
studies = search_gwas_studies_for_disease(
    "Alzheimer's disease",
    disease_id="EFO_0000249"  # EFO ID for Alzheimer's
)
Question: "How should we validate the top causal variant?"
from python_implementation import prioritize_causal_variants
result = prioritize_causal_variants("APOE", "alzheimer")
# Get experimental validation suggestions
suggestions = result.get_validation_suggestions()
for suggestion in suggestions:
    print(suggestion)
# Output includes:
# - CRISPR knock-in experiments
# - Reporter assays
# - eQTL analysis
# - Colocalization studies
from python_implementation import (
    prioritize_causal_variants,
    search_gwas_studies_for_disease,
    get_credible_sets_for_study,
)
# Step 1: Find relevant GWAS studies
print("Step 1: Finding T2D GWAS studies...")
studies = search_gwas_studies_for_disease("type 2 diabetes", "MONDO_0005148")
largest_study = max(studies, key=lambda s: s.get('nSamples', 0) or 0)
print(f"Largest study: {largest_study['id']} ({largest_study.get('nSamples', 'N/A')} samples)")
# Step 2: Get all fine-mapped loci from the study
print("\nStep 2: Getting fine-mapped loci...")
credible_sets = get_credible_sets_for_study(largest_study['id'], max_sets=100)
print(f"Found {len(credible_sets)} credible sets")
# Step 3: Find loci near genes of interest
print("\nStep 3: Finding TCF7L2 loci...")
tcf7l2_loci = [
    cs for cs in credible_sets
    if any(gene.gene_symbol == "TCF7L2" for gene in cs.l2g_genes)
]
print(f"TCF7L2 appears in {len(tcf7l2_loci)} loci")
# Step 4: Prioritize variants at TCF7L2
print("\nStep 4: Prioritizing TCF7L2 variants...")
result = prioritize_causal_variants("TCF7L2", "type 2 diabetes")
# Step 5: Print summary and validation plan
print("\n" + "="*60)
print("FINE-MAPPING SUMMARY")
print("="*60)
print(result.get_summary())
print("\n" + "="*60)
print("VALIDATION STRATEGY")
print("="*60)
suggestions = result.get_validation_suggestions()
for suggestion in suggestions:
    print(suggestion)
FineMappingResult

Main result object containing:

- query_variant: Variant annotation
- query_gene: Gene symbol (if queried by gene)
- credible_sets: List of fine-mapped loci
- associated_traits: All associated traits
- top_causal_genes: L2G genes ranked by score

Methods:

- get_summary(): Human-readable summary
- get_validation_suggestions(): Experimental validation strategies

CredibleSet

Represents a fine-mapped locus:

- study_locus_id: Unique identifier
- region: Genomic region (e.g., "10:112861809-113404438")
- lead_variant: Top variant by posterior probability
- finemapping_method: Statistical method used (SuSiE, FINEMAP, etc.)
- l2g_genes: Locus-to-gene predictions
- confidence: Credible set confidence (95%, 99%)

L2GGene

Locus-to-gene prediction:

- gene_symbol: Gene name (e.g., "TCF7L2")
- gene_id: Ensembl gene ID
- l2g_score: Probability score (0-1)

VariantAnnotation

Functional annotation for a variant:
- variant_id: Open Targets format (chr_pos_ref_alt)
- rs_ids: dbSNP identifiers
- chromosome, position: Genomic coordinates
- most_severe_consequence: Functional impact
- allele_frequencies: Population-specific MAFs

Tools with the OpenTargets_ prefix:

- OpenTargets_get_variant_info: Variant details and allele frequencies
- OpenTargets_get_variant_credible_sets: Credible sets containing a variant
- OpenTargets_get_credible_set_detail: Detailed credible set information
- OpenTargets_get_study_credible_sets: All loci from a GWAS study
- OpenTargets_search_gwas_studies_by_disease: Find studies by disease

Tools with the gwas_ prefix:

- gwas_search_snps: Find SNPs by gene or rsID
- gwas_get_snp_by_id: Detailed SNP information
- gwas_get_associations_for_snp: All trait associations for a variant
- gwas_search_studies: Find studies by disease/trait

| Method | Approach | Strengths | Use Case |
|--------|----------|-----------|----------|
| SuSiE | Sum of Single Effects | Handles multiple causal variants | Multi-signal loci |
| FINEMAP | Bayesian shotgun stochastic search | Fast, scalable | Large studies |
| PAINTOR | Functional annotations | Integrates epigenomics | Regulatory variants |
| CAVIAR / eCAVIAR | Probabilistic search over causal configurations; eCAVIAR extends it to colocalization | Allows multiple causal variants; eCAVIAR finds causal variants shared between GWAS and eQTL signals | Multi-signal loci; eQTL overlap |
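The data structures above describe two string formats: chr_pos_ref_alt for variant_id and chromosome:start-end for region. The sketch below shows one way to parse them and check whether a variant falls inside a credible set's region. The helper names are hypothetical and are not part of python_implementation. The variant ID is an illustrative value, and the sketch assumes both strings use the same genome build.

```python
from typing import NamedTuple


class ParsedVariant(NamedTuple):
    chromosome: str
    position: int
    ref: str
    alt: str


def parse_variant_id(variant_id):
    """Split a chr_pos_ref_alt identifier into its four parts."""
    parts = variant_id.split("_")
    if len(parts) != 4:
        raise ValueError(f"expected chr_pos_ref_alt, got {variant_id!r}")
    chromosome, position, ref, alt = parts
    return ParsedVariant(chromosome, int(position), ref, alt)


def parse_region(region):
    """Split a chromosome:start-end string into (chromosome, start, end)."""
    chromosome, span = region.split(":")
    start, end = span.split("-")
    return chromosome, int(start), int(end)


def variant_in_region(variant_id, region):
    """True if the variant lies within the region (inclusive bounds)."""
    variant = parse_variant_id(variant_id)
    chromosome, start, end = parse_region(region)
    return variant.chromosome == chromosome and start <= variant.position <= end


# Illustrative variant ID checked against the example region shown above.
print(variant_in_region("10_112998590_C_T", "10:112861809-113404438"))
```

Keeping the chromosome as a string avoids special-casing X, Y, and MT.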
Q: Why don't all variants have credible sets? A: Fine-mapping requires:
Q: Can a variant be in multiple credible sets? A: Yes! A variant can be causal for multiple traits (pleiotropy) or appear in different studies for the same trait.
Q: What if the top L2G gene is far from the variant? A: This suggests regulatory effects (enhancers, promoters). Check:
Q: How do I choose between variants in a credible set? A: Prioritize by: