skills/tooluniverse-gene-disease-association/SKILL.md
Gene-disease association analysis across DisGeNET, OpenTargets, Monarch, OMIM, GenCC, Orphanet. Cross-references multiple sources for evidence-graded association reports with concordance scoring (5/5 sources agree → strong, 1/5 → weak). Use for 'which diseases is gene X associated with' or 'which genes cause disease Y' queries with quantitative confidence.
npx skillsauth add mims-harvard/tooluniverse tooluniverse-gene-disease-association
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Systematically query and compare gene-disease associations across 6+ databases to produce a unified, evidence-graded report. Cross-references DisGeNET scores, OpenTargets evidence, Monarch Initiative cross-species data, OMIM Mendelian mappings, GenCC curated validity, and Orphanet rare disease links.
IMPORTANT: Always use English gene names and disease terms in tool calls. Respond in the user's language.
When uncertain about any scientific fact, SEARCH databases first (PubMed, UniProt, ChEMBL, ClinVar, etc.) rather than reasoning from memory. A database-verified answer is always more reliable than a guess.
Phase 1: Gene/Disease Identification & ID Resolution
Resolve gene symbol to Ensembl ID, HGNC CURIE, MIM number
OR resolve disease name to UMLS CUI, EFO ID, MONDO ID, ORPHA code
|
Phase 2: DisGeNET Associations (scored, multi-evidence)
Gene-disease association scores with evidence type filtering
|
Phase 3: OpenTargets Associations (integrated evidence)
Disease phenotypes and genetic associations from OpenTargets
|
Phase 4: Monarch Initiative (cross-species evidence)
Gene-disease associations integrating OMIM, ClinVar, model organisms
|
Phase 5: Mendelian Disease Evidence (curated)
OMIM gene-disease map, GenCC validity classifications, Orphanet rare diseases
|
Phase 6: Variant-Disease Associations (optional, if gene query)
DisGeNET variant-disease links, ClinVar pathogenic variants
|
Phase 7: Evidence Synthesis
Unified table, concordance scoring, confidence levels, final report
from tooluniverse import ToolUniverse
tu = ToolUniverse()
tu.load_tools()
# Gene query: resolve IDs
gene_info = tu.tools.MyGene_query_genes(query=f"symbol:{gene_symbol}", species="human",
    fields="symbol,ensembl.gene,entrezgene,name", size=5)  # -> ensembl_id
monarch_gene = tu.tools.MonarchV3_search(query=gene_symbol, category="biolink:Gene", limit=5)  # -> HGNC CURIE
omim_result = tu.tools.OMIM_search(query=gene_symbol, limit=5) # -> MIM number
gene_summary = tu.tools.Harmonizome_get_gene(gene_symbol=gene_symbol)
# Disease query: resolve IDs
monarch_disease = tu.tools.MonarchV3_search(query=disease_name, category="biolink:Disease", limit=5) # -> MONDO CURIE
mappings = tu.tools.MonarchV3_get_mappings(entity_id=mondo_id, limit=20) # -> OMIM, ICD10, SNOMED, Orphanet
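The `# -> ensembl_id` step above still needs the ID pulled out of the response. Below is a minimal sketch of that extraction, assuming the tool returns MyGene.info-style JSON with a `hits` list (the wrapper's actual return shape is not confirmed here). The helper name `first_ensembl_gene_id` is hypothetical. MyGene.info itself returns `ensembl` as a dict for a single mapping and as a list for several, so both cases are handled.

```python
from typing import Any, Optional


def first_ensembl_gene_id(response: dict[str, Any], symbol: str) -> Optional[str]:
    """Return the Ensembl gene ID of the first hit whose symbol matches exactly."""
    for hit in response.get("hits", []):
        # Skip near-matches such as pseudogenes that share a symbol prefix.
        if str(hit.get("symbol", "")).upper() != symbol.upper():
            continue
        ensembl = hit.get("ensembl")
        if isinstance(ensembl, dict):  # single mapping
            ensembl = [ensembl]
        for entry in ensembl or []:  # list of mappings, or missing
            gene_id = entry.get("gene")
            if gene_id:
                return gene_id
    return None


# Illustrative response; the first hit's ID is a placeholder, not real data.
example = {
    "hits": [
        {"symbol": "BRCA1P1", "ensembl": {"gene": "ENSG_PLACEHOLDER"}},
        {"symbol": "BRCA1", "ensembl": [{"gene": "ENSG00000012048"}]},
    ]
}
ensembl_id = first_ensembl_gene_id(example, "BRCA1")
```

Returning `None` instead of raising lets the caller decide whether to fall back to another resolver when no exact symbol match exists.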
API KEY REQUIRED: DisGeNET tools require the DISGENET_API_KEY environment variable. Without it, all DisGeNET calls will fail. Register at https://www.disgenet.org/api/#/Authorization for a free academic key. Fallback if no key: skip this phase and rely on OpenTargets (Phase 3) and Monarch (Phase 4), which are free and cover much of the same data.
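The key check and fallback can be decided before any DisGeNET call is made. The following is a minimal sketch; the function name `plan_association_sources` is hypothetical, and the only fact taken from this document is that `DISGENET_API_KEY` gates DisGeNET while OpenTargets and Monarch need no key.

```python
import os
from typing import Mapping, Optional


def plan_association_sources(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """List the association sources to query, skipping DisGeNET without a key."""
    env = os.environ if env is None else env
    sources = []
    # An empty string counts as "no key", so use truthiness rather than `in`.
    if env.get("DISGENET_API_KEY"):
        sources.append("DisGeNET")
    sources += ["OpenTargets", "Monarch"]
    return sources


sources_to_query = plan_association_sources()
```

Passing the environment as an argument keeps the function easy to test without touching real environment variables.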
# Gene -> diseases
disgenet_diseases = tu.tools.DisGeNET_search_gene(gene=gene_symbol, limit=20)
disgenet_gda = tu.tools.DisGeNET_get_gda(gene=gene_symbol, source="CURATED", min_score=0.3, limit=25)
# Disease -> genes (accepts name or UMLS CUI like "C0006142")
disgenet_genes = tu.tools.DisGeNET_search_disease(disease=disease_name, limit=20)
disgenet_ranked = tu.tools.DisGeNET_get_disease_genes(disease=disease_name, min_score=0.3, limit=50)
Interpreting DisGeNET scores: Higher scores reflect more evidence sources and stronger curation. Rather than memorizing cutoffs, ask: is this score driven by curated sources or text-mining? Use source="CURATED" to distinguish.
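One way to apply the curated-versus-text-mining question is to compare an unfiltered result list against the `source="CURATED"` list and flag each association accordingly. This is a sketch only: the record keys `disease` and `score` are assumed for illustration and may not match the real DisGeNET tool output, and the example rows are illustrative values.

```python
def tag_curation(all_hits: list[dict], curated_hits: list[dict]) -> list[dict]:
    """Mark each association as curated-backed or not (hypothetical record shape)."""
    curated = {hit["disease"] for hit in curated_hits}
    return [{**hit, "curated": hit["disease"] in curated} for hit in all_hits]


# Illustrative inputs, not real query results.
all_hits = [
    {"disease": "Breast cancer", "score": 0.82},
    {"disease": "Pancreatic cancer", "score": 0.35},
]
curated_hits = [{"disease": "Breast cancer", "score": 0.82}]
tagged = tag_curation(all_hits, curated_hits)
```

Associations left with `curated` set to `False` are the ones whose score deserves a closer look before they enter the final report.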
ot_diseases = tu.tools.OpenTargets_get_diseases_phenotypes_by_target_ensembl(ensemblId=ensembl_id)
ot_evidence = tu.tools.OpenTargets_target_disease_evidence(ensemblId=ensembl_id, efoId=efo_id)
# Both require pre-resolved Ensembl/EFO IDs. Use OpenTargets_multi_entity_search_by_query_string to discover IDs.
# Gene -> diseases (integrates OMIM, ClinVar, Orphanet, model organisms)
monarch_diseases = tu.tools.MonarchV3_get_associations(
    subject=hgnc_curie, category="biolink:CausalGeneToDiseaseAssociation", limit=20)
# Disease -> genes
monarch_genes = tu.tools.MonarchV3_get_associations(
    subject=mondo_id, category="biolink:CorrelatedGeneToDiseaseAssociation", limit=20)
histopheno = tu.tools.MonarchV3_get_histopheno(entity_id=mondo_id) # phenotypes by body system
entity = tu.tools.MonarchV3_get_entity(entity_id=hgnc_curie) # details, synonyms, xrefs
API KEY REQUIRED: OMIM tools require OMIM_API_KEY. Register at https://omim.org/api for academic access. Fallback if no key: use Monarch Initiative (biolink:CausalGeneToDiseaseAssociation from Phase 4), which includes OMIM data without requiring a key. Also use GenCC (below), which is fully open.
# OMIM: Mendelian gene-disease mapping (use gene MIM number, not phenotype MIM)
omim_entry = tu.tools.OMIM_get_entry(mim_number=mim_number)
omim_gene_map = tu.tools.OMIM_get_gene_map(mim_number=mim_number)
omim_clinical = tu.tools.OMIM_get_clinical_synopsis(mim_number=phenotype_mim)
# GenCC: curated validity (Definitive/Strong/Moderate/Limited/Disputed/Refuted)
gencc_result = tu.tools.GenCC_search_gene(gene_symbol=gene_symbol) # handles gene renames
gencc_disease = tu.tools.GenCC_search_disease(disease="Marfan syndrome") # word-tokenized matching
gencc_classifications = tu.tools.GenCC_get_classifications(gene_symbol="BRCA1", disease="breast cancer")
# Orphanet: rare disease associations (filter results by exact gene.symbol match)
orphanet_result = tu.tools.Orphanet_get_gene_diseases(gene_name=gene_symbol)
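The exact-symbol filter mentioned for Orphanet could look like the sketch below. It assumes each record carries a nested `gene` object with a `symbol` key, inferred from the `gene.symbol` wording above; the real response layout may differ. The helper name is hypothetical, and the comparison ignores case as a convenience.

```python
def filter_exact_symbol(records: list[dict], gene_symbol: str) -> list[dict]:
    """Keep only records whose gene.symbol equals the query symbol."""
    wanted = gene_symbol.upper()
    return [
        record
        for record in records
        # `or {}` guards against records where "gene" is missing or None.
        if str((record.get("gene") or {}).get("symbol", "")).upper() == wanted
    ]


# Illustrative records with placeholder disease names.
records = [
    {"gene": {"symbol": "BRCA1"}, "disease": "disease A"},
    {"gene": {"symbol": "BRCA1P1"}, "disease": "disease B"},
    {"disease": "disease C"},
]
exact = filter_exact_symbol(records, "BRCA1")
```

An equality test rather than a substring test is what keeps a symbol such as BRCA1P1 out of a BRCA1 report.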
Run when the query is gene-based and variant-level evidence adds value.
vda_result = tu.tools.DisGeNET_get_vda(gene=gene_symbol, limit=25) # variant-disease links
clinvar_result = tu.tools.ClinVar_search_variants(gene=gene_symbol, max_results=20)
clinvar_detail = tu.tools.ClinVar_get_variant_details(variant_id="12345") # detailed variant info
Compile all results into a single table with one row per gene-disease pair:
## Gene-Disease Associations for BRCA1
| Disease | DisGeNET Score | OpenTargets Score | Monarch | OMIM | GenCC | Orphanet | Sources |
|---------|---------------|-------------------|---------|------|-------|----------|---------|
| Breast cancer | 0.82 | 0.95 | Yes | #114480 | Definitive | ORPHA:227535 | 6/6 |
| Ovarian cancer | 0.78 | 0.91 | Yes | #604370 | Definitive | ORPHA:213500 | 6/6 |
| Pancreatic cancer | 0.35 | 0.42 | Yes | - | Moderate | - | 3/6 |
| Fanconi anemia | 0.45 | 0.38 | Yes | #605724 | Strong | ORPHA:84 | 5/6 |
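
The Sources column can be derived mechanically from the other columns. The sketch below rests on two assumptions that are not stated in this document: a scored source counts as supporting only at a score of 0.4 or higher (a cutoff chosen because it reproduces the Sources values in the example table), and the strong/moderate/weak thresholds are illustrative.

```python
SOURCES = ("DisGeNET", "OpenTargets", "Monarch", "OMIM", "GenCC", "Orphanet")
SCORED_SOURCES = {"DisGeNET", "OpenTargets"}
SCORE_CUTOFF = 0.4  # assumption: reproduces the example table, not an official cutoff


def supports(source: str, value) -> bool:
    """Decide whether one table cell counts as support from that source."""
    if value in (None, "", "-"):
        return False
    if source in SCORED_SOURCES:
        return float(value) >= SCORE_CUTOFF
    return True


def concordance(row: dict) -> tuple[str, str]:
    """Return the 'n/total' string and a confidence label for one table row."""
    n = sum(supports(source, row.get(source)) for source in SOURCES)
    if n >= 5:  # assumed threshold
        label = "strong"
    elif n >= 3:  # assumed threshold
        label = "moderate"
    else:
        label = "weak"
    return f"{n}/{len(SOURCES)}", label


# Rows copied from the example table above.
pancreatic = {"DisGeNET": 0.35, "OpenTargets": 0.42, "Monarch": "Yes",
              "OMIM": "-", "GenCC": "Moderate", "Orphanet": "-"}
fanconi = {"DisGeNET": 0.45, "OpenTargets": 0.38, "Monarch": "Yes",
           "OMIM": "#605724", "GenCC": "Strong", "Orphanet": "ORPHA:84"}
pancreatic_result = concordance(pancreatic)
fanconi_result = concordance(fanconi)
```

Keeping the cutoff and thresholds as named constants makes it easy to adjust them if the report uses a different convention.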
Evidence strength reasoning: A gene-disease association supported by multiple independent lines of evidence (genetic, functional, model organism) is stronger than one supported by a single study. Ask: how many independent sources support this link? Do they converge on the same mechanism?
Genetic evidence hierarchy: Mendelian segregation (gene mutation causes disease in family) > GWAS (statistical association in population) > candidate gene study (hypothesis-driven). The first establishes causation within a family, the second shows population-level correlation, and the third tests a prior hypothesis. OMIM/GenCC "Definitive" entries represent the top of this hierarchy; DisGeNET text-mining hits represent the bottom.
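The hierarchy can be encoded as an ordered enum so that the strongest evidence type for an association is easy to pick out. This is a sketch; the numeric ranks are arbitrary, and placing text-mining below candidate gene studies is an interpretation of the "bottom" remark above rather than a stated rule.

```python
from enum import IntEnum


class GeneticEvidence(IntEnum):
    """Evidence types ordered from weakest to strongest (ranks are arbitrary)."""
    TEXT_MINING = 0  # assumption: ranked below candidate gene studies
    CANDIDATE_GENE = 1
    GWAS = 2
    MENDELIAN_SEGREGATION = 3


def strongest(evidence: list[GeneticEvidence]) -> GeneticEvidence:
    """Return the highest-ranked evidence type seen for an association."""
    return max(evidence)


best = strongest([GeneticEvidence.GWAS, GeneticEvidence.TEXT_MINING])
```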
Cross-database concordance: If DisGeNET, OpenTargets, AND OMIM all link gene X to disease Y, that's strong concordance. If only one database shows the link, check why -- is it a single study indexed by that database? Concordance across databases does not equal independent evidence if they all cite the same primary study. Count the number of databases supporting each association, but reason about whether they represent truly independent evidence.
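The distinction between database count and independent evidence can be approximated by grouping databases that cite at least one common primary study. The sketch below assumes the cited PMIDs per database have already been collected; how each tool exposes citations is not covered here, and the PMIDs shown are placeholders.

```python
def independent_evidence_groups(citations: dict[str, set[str]]) -> list[set[str]]:
    """Group databases that share any cited PMID, directly or transitively."""
    groups: list[tuple[set[str], set[str]]] = []  # (databases, pmids) per group
    for database, pmids in citations.items():
        merged_dbs, merged_pmids = {database}, set(pmids)
        remaining = []
        for dbs, seen in groups:
            if seen & merged_pmids:  # shares a study: fold into the new group
                merged_dbs |= dbs
                merged_pmids |= seen
            else:
                remaining.append((dbs, seen))
        remaining.append((merged_dbs, merged_pmids))
        groups = remaining
    return [dbs for dbs, _ in groups]


# Placeholder PMIDs, not real citations.
citations = {
    "DisGeNET": {"PMID:1", "PMID:2"},
    "OpenTargets": {"PMID:2"},
    "OMIM": {"PMID:3"},
}
groups = independent_evidence_groups(citations)
```

Here three databases collapse into two independent groups, because DisGeNET and OpenTargets share a study. A database with no recorded citations forms its own group, so treat such cases as unknown rather than as confirmed independence.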
Mechanism reasoning: Knowing the gene's function helps evaluate the association. A gene encoding a liver enzyme being linked to liver disease is mechanistically plausible. The same gene being linked to a psychiatric disorder needs stronger evidence because the mechanism is less obvious. Use Harmonizome gene summaries and Monarch phenotype profiles to assess mechanistic plausibility.
_gene_matches(). Other tools require current HGNC symbol from MyGene_query_genes.fields="ensembl.gene".
For comprehensive disease reports: tooluniverse-disease-research
For rare disease diagnosis: tooluniverse-rare-disease-diagnosis
For variant interpretation: tooluniverse-variant-interpretation
For drug-target validation: tooluniverse-drug-target-validation