skills/tooluniverse-gwas-study-explorer/SKILL.md
Compare GWAS studies, perform meta-analyses across cohorts, and assess signal replication. Uses GWAS Catalog metadata, study-level statistics, and cross-cohort comparison. Use for evaluating GWAS reproducibility for a trait, meta-analysis sample size and effect-size aggregation, and detecting study heterogeneity (population, design, ancestry).
npx skillsauth add mims-harvard/tooluniverse tooluniverse-gwas-study-explorer
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Compare GWAS studies, perform meta-analyses, and assess replication across cohorts
The GWAS Study Deep Dive & Meta-Analysis skill enables comprehensive comparison of genome-wide association studies (GWAS) for the same trait, meta-analysis of genetic loci across studies, and systematic assessment of replication and study quality. It integrates data from the NHGRI-EBI GWAS Catalog and Open Targets Genetics to provide a complete picture of the genetic architecture of complex traits.
When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.
When comparing GWAS studies for the same trait, ask: do they replicate? The same lead SNPs appearing in independent studies is strong evidence of a true association. Different lead SNPs at the same locus may reflect LD differences between populations — they may tag the same causal variant. Different loci entirely may reflect different study designs, phenotype definitions, or population ancestry. Before concluding that a finding failed to replicate, check whether the SNP was even genotyped or imputed in the replication cohort.
LOOK UP DON'T GUESS: effect sizes, p-values, allele frequencies, and LD structure for specific loci. Do not assume a SNP present in one study is present in another — use gwas_get_associations_for_snp to retrieve cross-study data. Do not infer LD blocks from genomic proximity; use credible sets from Open Targets for fine-mapping results.
Scenario: "I want to understand all available GWAS data for type 2 diabetes"
Workflow:
Outcome: Complete landscape of T2D genetics with replicated findings and population-specific signals
Scenario: "Is the TCF7L2 association with T2D consistent across all studies?"
Workflow:
Outcome: Quantitative assessment of effect size consistency with heterogeneity interpretation
Honesty rule (important): A real inverse-variance meta-analysis needs each study's beta + 95% CI.
`python_implementation.py` parses these from the GWAS Catalog `beta` / `or_value` + `range` fields and only then pools effect sizes and computes Cochran's Q and I². When the matched associations don't report usable effect sizes (common), it returns `method="descriptive"`, `combined_beta=None`, `heterogeneity_i2=None`, and `combined_p_value` = the smallest reported p (not a pooled p). Do NOT present a descriptive result as a formal meta-analysis or invent an I².
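The rule above can be sketched as follows. This is a minimal illustration, not the code in `python_implementation.py`: the input keys (`beta`, `ci_lower`, `ci_upper`, `p_value`), the function names, and the `"fixed_effects"` label for the pooled branch are all assumptions made for the example. Only the descriptive return fields come from the text above. The standard error is recovered from the 95% CI, and odds ratios are moved to the log scale first.

```python
import math

Z95 = 1.959964  # two-sided 95% quantile of the standard normal

def se_from_ci(lower, upper, is_ratio=False):
    """Standard error implied by a 95% CI; ratio CIs (e.g. OR) are logged first."""
    if is_ratio:
        lower, upper = math.log(lower), math.log(upper)
    return (upper - lower) / (2 * Z95)

def pool_or_describe(records):
    """Pool by inverse variance if possible, otherwise fall back to a descriptive result.

    `records` is a list of dicts with hypothetical keys:
    'beta', 'ci_lower', 'ci_upper', 'p_value' (any may be None or missing).
    """
    usable = [
        r for r in records
        if r.get("beta") is not None
        and r.get("ci_lower") is not None
        and r.get("ci_upper") is not None
    ]
    if len(usable) < 2:
        # No formal pooling: report the smallest p as-is and no I².
        p_values = [r["p_value"] for r in records if r.get("p_value") is not None]
        return {
            "method": "descriptive",
            "combined_beta": None,
            "heterogeneity_i2": None,
            "combined_p_value": min(p_values) if p_values else None,
        }
    weights = [1.0 / se_from_ci(r["ci_lower"], r["ci_upper"]) ** 2 for r in usable]
    w_sum = sum(weights)
    combined_beta = sum(w * r["beta"] for w, r in zip(weights, usable)) / w_sum
    return {
        "method": "fixed_effects",  # label assumed for this sketch
        "combined_beta": combined_beta,
        "combined_se": math.sqrt(1.0 / w_sum),
    }
```

Requiring at least two usable records before pooling is a choice made for this sketch; the point is that a missing beta or CI leads to the descriptive branch, never to an imputed effect size.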
Scenario: "Which findings from the discovery cohort replicated in the independent sample?"
Workflow:
Outcome: Systematic replication report with success rates and failed findings
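A replication report of this kind might be assembled as in the sketch below. The data layout (rsID mapped to a `(beta, p_value)` pair), the function name, and the nominal `alpha=0.05` threshold are assumptions for illustration, and the betas are assumed to be aligned to the same effect allele in both cohorts. SNPs absent from the replication cohort are reported as not tested rather than failed.

```python
def replication_report(discovery, replication, alpha=0.05):
    """Classify each discovery SNP as replicated, failed, or not_tested.

    Both inputs map rsID -> (beta, p_value); this layout is hypothetical.
    A SNP counts as replicated when the effect direction matches and the
    replication p-value is below `alpha`.
    """
    rows = []
    for rsid, (d_beta, _d_p) in discovery.items():
        if rsid not in replication:
            # Not genotyped or imputed in the replication cohort.
            status = "not_tested"
        else:
            r_beta, r_p = replication[rsid]
            same_direction = (d_beta > 0) == (r_beta > 0)
            status = "replicated" if same_direction and r_p < alpha else "failed"
        rows.append((rsid, status))
    tested = [status for _, status in rows if status != "not_tested"]
    rate = tested.count("replicated") / len(tested) if tested else None
    return rows, rate
```

The success rate is computed over tested SNPs only, so missing coverage in the replication cohort does not count against the discovery findings.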
Scenario: "Are T2D loci consistent across European and East Asian populations?"
Workflow:
Outcome: Ancestry-specific genetic architecture with transferability assessment
This skill implements standard GWAS meta-analysis methods:
Fixed-Effects Model:
Random-Effects Model (recommended when I² > 50%):
Heterogeneity Assessment:
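The I² statistic measures the percentage of total variation in effect estimates across studies that is due to between-study heterogeneity rather than sampling error:
I² = max(0, (Q - df) / Q) × 100%
where Q = Cochran's Q statistic
df = degrees of freedom (n_studies - 1)
I² is set to 0 when Q ≤ df.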
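The heterogeneity formula can be computed from per-study betas and standard errors as sketched below. The function name and return keys are illustrative, and the DerSimonian-Laird estimator for the between-study variance is an assumption: the text names a random-effects model but not a specific estimator.

```python
def heterogeneity(betas, ses):
    """Cochran's Q, I² (percent), and a DerSimonian-Laird random-effects estimate."""
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    fixed_beta = sum(w * b for w, b in zip(weights, betas)) / w_sum

    # Cochran's Q: weighted squared deviations from the fixed-effects estimate.
    q = sum(w * (b - fixed_beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # DerSimonian-Laird between-study variance, truncated at zero.
    c = w_sum - sum(w ** 2 for w in weights) / w_sum
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau² to each study's sampling variance.
    re_weights = [1.0 / (se ** 2 + tau2) for se in ses]
    random_beta = sum(w * b for w, b in zip(re_weights, betas)) / sum(re_weights)

    return {"q": q, "df": df, "i2": i2, "tau2": tau2,
            "fixed_beta": fixed_beta, "random_beta": random_beta}
```

With two studies of equal precision reporting betas of 0.1 and 0.5 (SE 0.1 each), Q is 8 on 1 degree of freedom, which gives I² = 87.5%.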
Interpretation Guidelines:
Common reasons for high I²:
Recommendations:
The skill evaluates studies based on:
1. Sample Size:
2. Ancestry Diversity:
3. Data Availability:
4. Genotyping Quality:
5. Statistical Rigor:
Tier 1 (High Quality):
Tier 2 (Moderate Quality):
Tier 3 (Limited):
❌ Don't:
✅ Do:
When I² > 75%:
When Studies Conflict:
gwas_search_studies: Find studies by trait
gwas_get_study_by_id: Get detailed study metadata
gwas_get_associations_for_study: Retrieve study associations
gwas_get_associations_for_snp: Get SNP associations across studies
gwas_search_associations: Search associations by trait
OpenTargets_search_gwas_studies_by_disease: Disease-based study search
OpenTargets_get_gwas_study: Detailed study information with LD populations
OpenTargets_get_variant_credible_sets: Fine-mapped loci for variant
OpenTargets_get_study_credible_sets: All credible sets for study
OpenTargets_get_variant_info: Variant annotation and allele frequencies
Credible Set: Set of variants likely to contain the causal variant (from fine-mapping)
L2G (Locus-to-Gene): Score predicting which gene is affected by a GWAS locus
License: Open source (MIT)