skills/tooluniverse-gwas-finemapping/SKILL.md
Statistical fine-mapping of GWAS loci using credible sets (SuSiE, FINEMAP) and locus-to-gene scoring (Open Targets L2G). Identifies likely causal variants and target genes, as opposed to positional 'nearest gene' assignment, which is often wrong. Use for prioritizing causal variants at GWAS hits, comparing fine-mapping methods, and converting lead SNPs to target genes.
npx skillsauth add mims-harvard/tooluniverse tooluniverse-gwas-finemapping
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.
Identify and prioritize causal variants at GWAS loci using statistical fine-mapping and locus-to-gene predictions.
Genome-wide association studies (GWAS) identify genomic regions associated with traits, but linkage disequilibrium (LD) makes it difficult to pinpoint the causal variant. Fine-mapping uses Bayesian statistical methods to compute the posterior probability that each variant is causal, given the GWAS summary statistics.
REASONING STRATEGY — Start Here: Fine-mapping asks: which variant at this locus is CAUSAL? Work through this chain:
This skill provides tools to:
A credible set is a minimal set of variants that contains the causal variant with high confidence (typically 95% or 99%). Each variant in the set has a posterior probability of being causal, computed using methods such as SuSiE or FINEMAP.
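Posterior probability (often called the posterior inclusion probability, PIP): the probability that a specific variant is causal, given the GWAS data and LD structure. A higher posterior probability means the variant is more likely to be causal.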
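As a minimal sketch of how posterior probabilities and a 95% credible set can be derived from summary statistics, the code below uses Wakefield's approximate Bayes factor under the simplifying assumption of exactly one causal variant per locus and a uniform prior across variants. This is not SuSiE or FINEMAP, which allow multiple causal variants and use an LD matrix. The prior effect standard deviation (`prior_sd`), the helper names, and the toy betas/SEs are illustrative assumptions.

```python
import numpy as np


def log_abf(beta, se, prior_sd=0.15):
    """Wakefield approximate Bayes factor (log scale), association vs. no association.

    prior_sd is an assumed prior standard deviation of the true effect size;
    0.15 is illustrative, not a recommended default.
    """
    beta = np.asarray(beta, dtype=float)
    se = np.asarray(se, dtype=float)
    v = se ** 2          # sampling variance of each effect estimate
    w = prior_sd ** 2    # prior variance of the true effect
    r = w / (w + v)      # shrinkage ratio
    z = beta / se        # Wald z-score
    return 0.5 * (np.log(1.0 - r) + r * z ** 2)


def pips_single_causal(beta, se, prior_sd=0.15):
    """Posterior probability of being causal, assuming exactly one causal variant."""
    labf = log_abf(beta, se, prior_sd)
    # Normalize in log space (subtract the max) to avoid overflow for large z.
    weights = np.exp(labf - labf.max())
    return weights / weights.sum()


def credible_set(variant_ids, pips, coverage=0.95):
    """Smallest set, taken in order of decreasing PIP, whose PIPs sum to >= coverage."""
    pips = np.asarray(pips, dtype=float)
    order = np.argsort(pips)[::-1]
    cumulative = np.cumsum(pips[order])
    # First position at which the cumulative PIP reaches the requested coverage.
    n_keep = int(np.searchsorted(cumulative, coverage)) + 1
    return [variant_ids[i] for i in order[:n_keep]]


# Toy summary statistics for four variants at one locus (illustrative values only).
ids = ["rsA", "rsB", "rsC", "rsD"]
beta = [0.12, 0.11, 0.05, 0.01]
se = [0.02, 0.02, 0.02, 0.02]
pips = pips_single_causal(beta, se)
print({k: round(float(p), 3) for k, p in zip(ids, pips)})
print(credible_set(ids, pips))  # -> ['rsA', 'rsB']
```

Here rsB enters the 95% credible set despite a small PIP, because the top variant alone falls just short of 95% coverage.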
L2G scores integrate multiple data types to predict which gene is affected by a variant:
L2G scores range from 0 to 1, with higher scores indicating stronger gene-variant links.
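Comparing the top L2G gene with the nearest gene can be sketched as a simple check. The gene names, scores, and distances below are hypothetical placeholders (not Open Targets output), `compare_l2g_with_nearest` is an illustrative helper, and the 0.5 cutoff for a "confident" L2G call is an assumed threshold, not one stated in this skill.

```python
from dataclasses import dataclass


@dataclass
class GeneCandidate:
    gene: str           # hypothetical gene symbol
    l2g_score: float    # L2G score in [0, 1]
    distance_bp: int    # distance from the lead variant to the gene (assumed precomputed)


def compare_l2g_with_nearest(candidates, min_score=0.5):
    """Report the top-L2G gene, the nearest gene, and whether they disagree."""
    top = max(candidates, key=lambda c: c.l2g_score)
    nearest = min(candidates, key=lambda c: c.distance_bp)
    return {
        "top_l2g_gene": top.gene,
        "top_l2g_score": top.l2g_score,
        "nearest_gene": nearest.gene,
        "disagree": top.gene != nearest.gene,
        "confident": top.l2g_score >= min_score,  # assumed cutoff
    }


candidates = [
    GeneCandidate("GENE_A", 0.12, 5_000),
    GeneCandidate("GENE_B", 0.71, 180_000),
    GeneCandidate("GENE_C", 0.05, 40_000),
]
print(compare_l2g_with_nearest(candidates))
```

A disagreement like this one is the situation in which positional "nearest gene" assignment would point to the wrong gene.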
LOOK UP DON'T GUESS -- never assume a lead SNP is the causal variant. Always check LD structure, credible sets, and functional annotations via the tools below.
The lead SNP (smallest p-value) is often NOT the causal variant. It is often just the variant that best tags the causal signal among the genotyped or imputed variants. The causal variant may be:
Action: Always call OpenTargets_get_variant_credible_sets for the lead SNP. If the posterior probability is < 0.5, the lead SNP is likely NOT causal -- examine other variants in the credible set.
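The lead-SNP check in this action can be sketched as follows. The input shape (a list of dicts with `variant_id` and `pip` keys) and the helper `review_lead_snp` are hypothetical; the actual response structure of OpenTargets_get_variant_credible_sets is not described here, so its fields would need to be mapped onto this shape first.

```python
def review_lead_snp(lead_id, credible_set, threshold=0.5):
    """Check whether the lead SNP's posterior probability clears `threshold`.

    credible_set: list of {"variant_id": str, "pip": float} (hypothetical shape).
    """
    pips = {v["variant_id"]: v["pip"] for v in credible_set}
    # A lead SNP absent from the credible set has a small PIP; treat it as 0.0 here.
    lead_pip = pips.get(lead_id, 0.0)
    # Other variants in the set, most probable first.
    alternatives = sorted(
        ((vid, p) for vid, p in pips.items() if vid != lead_id),
        key=lambda item: item[1],
        reverse=True,
    )
    return {
        "lead_pip": lead_pip,
        "lead_likely_causal": lead_pip >= threshold,
        "ranked_alternatives": alternatives,
    }


cs = [
    {"variant_id": "rs_lead", "pip": 0.31},
    {"variant_id": "rs_x", "pip": 0.52},
    {"variant_id": "rs_y", "pip": 0.17},
]
print(review_lead_snp("rs_lead", cs))
```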
LD blocks define the resolution limit of fine-mapping:
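To make the resolution limit concrete, the sketch below computes genotype-based r² (the squared Pearson correlation of 0/1/2 allele dosages) with NumPy and lists variants in high LD with a lead variant. The helper names, the toy genotype matrix, and the 0.8 r² cutoff are illustrative assumptions; real analyses take LD from in-sample data or an ancestry-matched reference panel.

```python
import numpy as np


def ld_r2(dosage_a, dosage_b):
    """Genotype-based LD r^2: squared Pearson correlation of allele dosages (0/1/2)."""
    r = np.corrcoef(np.asarray(dosage_a, float), np.asarray(dosage_b, float))[0, 1]
    return float(r ** 2)


def ld_proxies(dosages, lead_index, min_r2=0.8):
    """Column indices of variants with r^2 >= min_r2 to the lead variant.

    dosages: array of shape (n_samples, n_variants).
    """
    X = np.asarray(dosages, dtype=float)
    r = np.corrcoef(X, rowvar=False)[lead_index]
    return [j for j in range(X.shape[1]) if j != lead_index and r[j] ** 2 >= min_r2]


# Toy genotypes for 8 individuals (rows) at 4 variants (columns); illustrative only.
v0 = np.array([0, 1, 2, 1, 0, 2, 1, 0])
X = np.column_stack([
    v0,
    v0,                                  # perfect LD with v0 (r^2 = 1)
    2 - v0,                              # perfect LD, opposite allele coding (r = -1)
    np.array([1, 0, 1, 0, 1, 0, 1, 0]),  # weakly correlated with v0
])
print(ld_r2(X[:, 0], X[:, 3]))
print(ld_proxies(X, lead_index=0))  # -> [1, 2]
```

Variants 1 and 2 would yield the same association statistics as the lead (up to sign), so no fine-mapping method can separate them using summary statistics alone; their posterior probability is shared among them.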
When interpreting a credible set:
Colocalization asks: do two association signals (e.g., GWAS + eQTL) share the SAME causal variant?
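Colocalization is commonly tested with the approximate Bayes factor approach of the coloc R package (`coloc.abf`). Below is a minimal Python sketch of that calculation. It assumes at most one causal variant per trait in the region, assumes both traits' per-variant log approximate Bayes factors (for example, Wakefield ABFs) are aligned over the same variants, and uses priors p1 = p2 = 1e-4 and p12 = 1e-5, the values commonly cited as coloc's defaults. It returns posterior probabilities for H0 (no association), H1 (trait 1 only), H2 (trait 2 only), H3 (both traits, distinct causal variants), and H4 (both traits, one shared causal variant). The helper names and toy inputs are hypothetical.

```python
import numpy as np


def _logsumexp(x):
    """Numerically stable log(sum(exp(x)))."""
    x = np.asarray(x, dtype=float)
    m = x.max()
    return m + np.log(np.exp(x - m).sum())


def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0..H4 from per-variant log ABFs of two traits."""
    labf1 = np.asarray(labf1, dtype=float)
    labf2 = np.asarray(labf2, dtype=float)
    l1 = _logsumexp(labf1)            # log sum_i ABF1_i
    l2 = _logsumexp(labf2)            # log sum_j ABF2_j
    l12 = _logsumexp(labf1 + labf2)   # log sum_i ABF1_i * ABF2_i (same variant)
    lh = np.array([
        0.0,                          # H0: no association
        np.log(p1) + l1,              # H1: trait 1 only
        np.log(p2) + l2,              # H2: trait 2 only
        # H3: all variant pairs minus same-variant pairs, i.e. i != j
        np.log(p1) + np.log(p2) + l1 + l2 + np.log1p(-np.exp(l12 - (l1 + l2))),
        np.log(p12) + l12,            # H4: one shared causal variant
    ])
    return np.exp(lh - _logsumexp(lh))


# Toy log ABFs over the same four variants (illustrative values only).
shared = coloc_posteriors([10, 2, 0, 0], [8, 1, 0, 0])
distinct = coloc_posteriors([20, 0, 0, 0], [0, 0, 0, 20])
print("shared:  ", shared.round(3))
print("distinct:", distinct.round(3))
```

When the two signals peak at the same variant, H4 dominates; when they peak at different variants, H3 dominates even though both traits are strongly associated with the region.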
When multiple variants have similar posterior probabilities:
OpenTargets_get_variant_credible_sets or gwas_search_snps with gene=TCF7L2
OpenTargets_get_variant_info then OpenTargets_get_variant_credible_sets
OpenTargets_get_study_credible_sets
OpenTargets_search_gwas_studies_by_disease or gwas_search_studies
OpenTargets_get_variant_info: Variant details and allele frequencies
OpenTargets_get_variant_credible_sets: Credible sets containing a variant
OpenTargets_get_credible_set_detail: Detailed credible set information
OpenTargets_get_study_credible_sets: All loci from a GWAS study
OpenTargets_search_gwas_studies_by_disease: Find studies by disease
gwas_search_snps: Find SNPs by gene or rsID
gwas_get_snp_by_id: Detailed SNP information
gwas_get_associations_for_snp: All trait associations for a variant
gwas_search_studies: Find studies by disease/trait
Q: Why don't all variants have credible sets? A: Fine-mapping requires:
Q: Can a variant be in multiple credible sets? A: Yes! A variant can be causal for multiple traits (pleiotropy) or appear in different studies for the same trait.
Q: What if the top L2G gene is far from the variant? A: This suggests regulatory effects (enhancers, promoters). Check:
Q: How do I choose between variants in a credible set? A: Prioritize by: