Discover causal genes for diseases/traits from GWAS data using Open Targets L2G (locus-to-gene) scoring — integrates eQTL, chromatin interaction, and distance evidence. Use for trait-to-gene mapping, drug-target hypothesis generation from GWAS, and replacing the 'nearest gene' heuristic with multi-evidence L2G scores.
Nearest gene is often wrong. Use L2G (locus-to-gene) scores from Open Targets which integrate eQTL, chromatin interaction, and distance data. L2G > 0.5 is a strong prediction; positional mapping alone should not be used to claim a causal gene. A single GWAS study with p < 5e-8 is suggestive — replication across independent cohorts is required for high confidence. GWAS hits are associations in the studied population; effect sizes and even the implicated gene can differ across ancestries due to differing LD patterns. Treat gene lists from GWAS as ranked candidates for validation, not confirmed causal genes.
LOOK UP DON'T GUESS: never assume trait-to-gene mappings or L2G scores — always call gwas_search_associations and OpenTargets_get_study_credible_sets to retrieve current data; associations are updated as new GWAS are published.
Discover genes associated with diseases and traits using genome-wide association studies (GWAS)
This skill enables systematic discovery of genes linked to diseases/traits by analyzing GWAS data from two major resources: the NHGRI-EBI GWAS Catalog and Open Targets Genetics.
Clinical Research
Drug Target Discovery
Functional Genomics
1. Trait Search → Search GWAS Catalog by disease/trait name
↓
2. SNP Aggregation → Collect genome-wide significant SNPs (p < 5e-8)
↓
3. Gene Mapping → Extract mapped genes from associations
↓
4. Evidence Ranking → Score by p-value, replication, fine-mapping
↓
5. Annotation (Optional) → Add L2G predictions from Open Targets
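Steps 2 to 4 of the workflow can be sketched in plain Python. This is a minimal illustration, not the skill's implementation. It assumes each association has already been flattened into a dict with hypothetical keys `gene`, `snp`, `study` and `p`, and it counts distinct study accessions as evidence, which only approximates independent replication (two accessions can share a cohort). Step 5 (L2G annotation) is not covered.

```python
from collections import defaultdict

GENOME_WIDE_P = 5e-8  # conventional genome-wide significance threshold


def aggregate_by_gene(associations, p_threshold=GENOME_WIDE_P):
    """Collapse association records into one ranked summary per mapped gene.

    Each record is assumed (hypothetically) to be a dict with keys
    'gene', 'snp', 'study' and 'p'.
    """
    genes = defaultdict(lambda: {"min_p_value": 1.0, "snps": set(), "studies": set()})
    for rec in associations:
        if rec["p"] >= p_threshold:
            continue  # step 2: keep only genome-wide significant hits
        g = genes[rec["gene"]]  # step 3: group by mapped gene
        g["min_p_value"] = min(g["min_p_value"], rec["p"])
        g["snps"].add(rec["snp"])
        g["studies"].add(rec["study"])

    ranked = [
        {
            "symbol": symbol,
            "min_p_value": g["min_p_value"],
            "evidence_count": len(g["studies"]),  # distinct study accessions
            "snps": sorted(g["snps"]),
            "studies": sorted(g["studies"]),
        }
        for symbol, g in genes.items()
    ]
    # step 4: more supporting studies first, then the stronger p-value
    ranked.sort(key=lambda r: (-r["evidence_count"], r["min_p_value"]))
    return ranked
```

Sorting by study count before p-value reflects the emphasis on replication over a single very small p-value; swap the key order if you prefer pure p-value ranking.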
Genome-wide Significance
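The 5e-8 cut-off used throughout is conventionally derived as a Bonferroni correction of a family-wise α of 0.05 for roughly one million independent common-variant tests, an estimate based on LD structure in European-ancestry genomes:

```latex
\alpha_{\mathrm{GW}} = \frac{\alpha}{M_{\mathrm{eff}}} = \frac{0.05}{10^{6}} = 5 \times 10^{-8}
```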
Gene Mapping Methods
Evidence Confidence Levels
gwas_get_associations_for_trait - Get all associations for a trait (sorted by p-value). NOTE: This tool is BROKEN -- use gwas_search_associations(query=trait) as a working alternative
gwas_search_snps - Search SNPs by gene mapping
gwas_get_snp_by_id - Get SNP details (MAF, consequence, location)
gwas_get_study_by_id - Get study metadata
gwas_search_associations - Search associations with filters (RECOMMENDED for trait lookups)
gwas_search_studies - Search studies by trait/cohort
gwas_get_associations_for_snp - Get all associations for a SNP
gwas_get_variants_for_trait - Get variants for a trait. Supports p_value_threshold parameter for server-side filtering (see notes below)
gwas_get_studies_for_trait - Get studies for a trait
gwas_get_snps_for_gene - Get SNPs mapped to a gene. Parameter is gene_symbol (NOT mapped_gene)
gwas_get_associations_for_study - Get associations from a study
OpenTargets_search_gwas_studies_by_disease - Search studies by disease ontology
OpenTargets_get_study_credible_sets - Get fine-mapped loci for a study
OpenTargets_get_variant_credible_sets - Get credible sets for a variant
OpenTargets_get_variant_info - Get variant annotation (frequencies, consequences)
OpenTargets_get_gwas_study - Get study metadata
OpenTargets_get_credible_set_detail - Get detailed credible set information
Required
trait - Disease/trait name (e.g., "type 2 diabetes", "coronary artery disease")
Optional
p_value_threshold - Significance threshold (default: 5e-8)
min_evidence_count - Minimum number of studies (default: 1)
max_results - Maximum genes to return (default: 100)
use_fine_mapping - Include L2G predictions (default: true)
disease_ontology_id - Disease ontology ID for Open Targets (e.g., "MONDO_0005148")
Output
{
"genes": [
{
"symbol": str, # Gene symbol (e.g., "TCF7L2")
"min_p_value": float, # Most significant p-value
"evidence_count": int, # Number of independent studies
"snps": [str], # Associated SNP rs IDs
"studies": [str], # GWAS study accessions
"l2g_score": float | null, # Locus-to-gene score (0-1)
"credible_sets": int, # Number of credible sets
"confidence_level": str # "High", "Medium", or "Low"
}
],
"summary": {
"trait": str,
"total_associations": int,
"significant_genes": int,
"data_sources": ["GWAS Catalog", "Open Targets"]
}
}
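The schema stores `min_p_value` as a float, but the smallest positive IEEE 754 double is about 4.9e-324, so a value like the APOE p=1.0e-450 in the examples below silently becomes 0.0 once parsed. A minimal workaround, sketched here with the standard-library `decimal` module, is to keep p-values as the source strings and rank on -log10(p) instead; the function name is hypothetical.

```python
from decimal import Decimal


def neg_log10_p(p_text: str) -> float:
    """Return -log10(p) from a p-value string, avoiding float underflow.

    Decimal keeps the full exponent range, so '1.0e-450' is handled exactly
    even though float('1.0e-450') underflows to 0.0.
    """
    d = Decimal(p_text)
    if d <= 0 or d > 1:
        raise ValueError(f"p-value out of range: {p_text}")
    return float(-d.log10())


# float("1.0e-450") == 0.0, but the -log10 score stays finite and comparable
apoe_score = neg_log10_p("1.0e-450")
tcf7l2_score = neg_log10_p("1.2e-98")
```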
Type 2 Diabetes
TCF7L2: p=1.2e-98, 15 studies, L2G=0.82 → High confidence
KCNJ11: p=3.4e-67, 12 studies, L2G=0.76 → High confidence
PPARG: p=2.1e-45, 8 studies, L2G=0.71 → High confidence
FTO: p=5.6e-42, 10 studies, L2G=0.68 → High confidence
IRS1: p=8.9e-38, 6 studies, L2G=0.54 → High confidence
Alzheimer's Disease
APOE: p=1.0e-450, 25 studies, L2G=0.95 → High confidence
BIN1: p=2.3e-89, 18 studies, L2G=0.88 → High confidence
CLU: p=4.5e-67, 16 studies, L2G=0.82 → High confidence
ABCA7: p=6.7e-54, 14 studies, L2G=0.79 → High confidence
CR1: p=8.9e-52, 13 studies, L2G=0.75 → High confidence
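The page does not spell out how "High", "Medium" and "Low" are assigned. The function below is a hypothetical tiering rule built only from criteria stated earlier (genome-wide significance, L2G > 0.5 as a strong prediction, replication across studies). The cut-offs, including treating two or more studies as replicated, are assumptions to adjust, not the skill's actual scoring.

```python
def confidence_level(l2g_score, evidence_count, min_p_value, p_threshold=5e-8):
    """Hypothetical confidence tiering; NOT the skill's documented criteria."""
    significant = min_p_value < p_threshold
    strong_l2g = l2g_score is not None and l2g_score > 0.5  # L2G > 0.5 = strong
    replicated = evidence_count >= 2  # assumption: 2+ studies counts as replication

    if significant and strong_l2g and replicated:
        return "High"
    if significant and (strong_l2g or replicated):
        return "Medium"
    return "Low"
```

Every example listed above has L2G > 0.5 and six or more studies, so this rule would also label them "High".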
1. Use Disease Ontology IDs for Precision
# Instead of:
discover_gwas_genes("diabetes") # Ambiguous
# Use:
discover_gwas_genes(
"type 2 diabetes",
disease_ontology_id="MONDO_0005148" # Specific
)
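Open Targets writes ontology IDs with an underscore (as in `MONDO_0005148` above), while ontology files and many papers use the colon CURIE form `MONDO:0005148`. A small hypothetical helper can normalize user input before it is passed as `disease_ontology_id`. It only swaps the separator and keeps the prefix casing, since some prefixes (for example `Orphanet`) are mixed case.

```python
import re

# PREFIX, then ':' or '_', then the numeric local ID
_ONTOLOGY_ID = re.compile(r"^([A-Za-z]+)[:_](\d+)$")


def to_open_targets_id(ontology_id: str) -> str:
    """Normalize 'MONDO:0005148' or 'MONDO_0005148' to 'MONDO_0005148'."""
    m = _ONTOLOGY_ID.match(ontology_id.strip())
    if not m:
        raise ValueError(f"Unrecognized ontology ID: {ontology_id!r}")
    prefix, local_id = m.groups()
    return f"{prefix}_{local_id}"
```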
2. Filter by Evidence Strength
# For drug targets, require strong evidence:
discover_gwas_genes(
"coronary artery disease",
p_value_threshold=5e-10, # Stricter than GWAS threshold
min_evidence_count=3, # Multiple independent studies
use_fine_mapping=True # Include L2G predictions
)
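If a result was fetched with lenient settings, the same filters can be re-applied locally to the `genes` list from the output schema. This sketch is hypothetical: `min_l2g` is an extra filter added here to reflect the L2G > 0.5 guidance, and genes without an L2G score are dropped rather than assumed strong.

```python
def strong_candidates(genes, p_value_threshold=5e-10, min_evidence_count=3, min_l2g=0.5):
    """Keep gene records (output-schema dicts) that pass every evidence filter."""
    return [
        g
        for g in genes
        if g["min_p_value"] < p_value_threshold        # stricter than 5e-8
        and g["evidence_count"] >= min_evidence_count  # multiple studies
        and g.get("l2g_score") is not None             # fine-mapping available
        and g["l2g_score"] > min_l2g                   # strong L2G prediction
    ]
```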
3. Interpret Results Carefully
gwas_get_variants_for_trait -- p-value Filtering
This tool now accepts an optional p_value_threshold parameter for server-side p-value filtering. When provided, the GWAS Catalog API filters variants to only return those below the specified threshold.
# Server-side filtering (preferred -- reduces data transfer)
result = tu.tools.gwas_get_variants_for_trait(
trait="type 2 diabetes",
p_value_threshold=5e-8
)
Client-side fallback: When the API returns unfiltered results (some trait queries ignore the threshold parameter), the tool also applies client-side p-value filtering. This means you may see fewer results than expected if the API returned pre-filtered data and the client filter applies again. Always check the actual p-values in the returned data.
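Because filtering may happen on either side, it is worth re-checking p-values yourself. The sketch below assumes each variant record is a dict whose p-value sits under a key named `pValue`; that key name is a guess, so pass `key=` to match the payload you actually receive. Records with a missing or unparseable p-value go to the failing list so they get inspected rather than silently kept.

```python
def check_p_values(records, threshold=5e-8, key="pValue"):
    """Split records into (passing, failing) by their actual p-value.

    'pValue' is an assumed field name; override key= for the real payload.
    """
    passing, failing = [], []
    for rec in records:
        try:
            p = float(rec[key])
        except (KeyError, TypeError, ValueError):
            failing.append(rec)  # missing or non-numeric p-value
            continue
        (passing if p < threshold else failing).append(rec)
    return passing, failing
```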
gwas_get_associations_for_trait -- BROKEN
This tool returns errors for most queries. Use gwas_search_associations(query=<trait>) as a reliable alternative. The response format is {data: [{...}], metadata: {...}}.
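A small defensive wrapper for the `{data: [...], metadata: {...}}` shape described above makes an error payload fail loudly instead of looking like an empty result. The function name is hypothetical, and the shape check relies only on the two top-level keys stated here.

```python
def unpack_associations(response):
    """Return (rows, metadata) from a {data: [...], metadata: {...}} response.

    Raises ValueError if 'data' is absent, so an error payload is not
    mistaken for 'no associations found'.
    """
    if not isinstance(response, dict) or "data" not in response:
        raise ValueError(f"Unexpected response shape: {response!r:.200}")
    rows = response["data"] or []
    metadata = response.get("metadata") or {}
    return rows, metadata


# Intended use with the tool named above (not executed here):
# rows, meta = unpack_associations(
#     tu.tools.gwas_search_associations(query="type 2 diabetes"))
```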
gwas_get_snps_for_gene -- Parameter Rename
The parameter was renamed from mapped_gene to gene_symbol for clarity. Use:
result = tu.tools.gwas_get_snps_for_gene(gene_symbol="TCF7L2")
When ToolUniverse tools return limited results or you need the full GWAS Catalog:
import requests
import pandas as pd

# Download full GWAS Catalog (all associations, ~37MB TSV)
url = "https://www.ebi.ac.uk/gwas/api/search/downloads/alternative"
df = pd.read_csv(url, sep="\t", low_memory=False)  # parse whole file at once to avoid mixed-dtype warnings
# Filter locally by trait or gene
hits = df[df["DISEASE/TRAIT"].str.contains("type 2 diabetes", case=False, na=False)]
gene_hits = df[df["MAPPED_GENE"].str.contains("TCF7L2", na=False)]
# Per-study associations via REST
study_id = "GCST001234"
resp = requests.get(f"https://www.ebi.ac.uk/gwas/rest/api/studies/{study_id}/associations", timeout=60)
resp.raise_for_status()  # surface HTTP errors instead of parsing an error page
assocs = resp.json()
# Summary statistics (when available)
# Check study page for fullPvalueSet=true, then download from linked FTP
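The summary-statistics check can be scripted against the same REST API used above. This stdlib-only sketch assumes the study record exposes the `fullPvalueSet` flag mentioned in the comment; the helper names are hypothetical, and no FTP download is attempted.

```python
import json
import urllib.request

# GWAS Catalog REST base URL, as used in the per-study example above
GWAS_REST = "https://www.ebi.ac.uk/gwas/rest/api"


def fetch_study(study_id: str, timeout: float = 60.0) -> dict:
    """Fetch one study record (network call); raises HTTPError on 4xx/5xx."""
    url = f"{GWAS_REST}/studies/{study_id}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def has_full_summary_stats(study: dict) -> bool:
    """True only if the record sets fullPvalueSet to true (assumed field name)."""
    return study.get("fullPvalueSet") is True


# Example (network required):
# study = fetch_study("GCST001234")
# if has_full_summary_stats(study):
#     print("Full summary statistics available; follow the study's FTP link.")
```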
See tooluniverse-data-wrangling skill for pagination, bulk download, and format parsing patterns.
Gene Mapping Uncertainty
Population Bias
Sample Size Dependence
Validation Bug
validate=False parameter if needed
GWAS Catalog
Open Targets Genetics
If you use this skill in research, please cite:
Buniello A, et al. (2019) The NHGRI-EBI GWAS Catalog of published genome-wide
association studies. Nucleic Acids Research, 47(D1):D1005-D1012.
Mountjoy E, et al. (2021) An open approach to systematically prioritize causal
variants and genes at all published human GWAS trait-associated loci.
Nature Genetics, 53:1527-1533.