skills/tooluniverse-disease-research/SKILL.md
Generate comprehensive disease research reports covering genetics (causal genes, GWAS, OMIM), pathways (Reactome, KEGG), drugs (existing therapies, repurposing candidates), clinical trials, epidemiology (prevalence, incidence), and phenotypes (HPO). Use for full disease overviews, comprehensive disease characterization, and orphan/rare-disease profiling.
Install this skill globally with one command: `npx skillsauth add mims-harvard/tooluniverse tooluniverse-disease-research`. Works with Claude Code, Cursor, and Windsurf.
Generate a comprehensive disease research report with full source citations. The report is created as a markdown file and progressively updated during research.
IMPORTANT: Always use English disease names and search terms in tool calls. Respond in the user's language.
When asked about a disease, query Orphanet/OMIM/DisGeNET FIRST. Don't rely on memory for prevalence, genetics, or treatment — these change over time. When you're not sure about a fact, your first instinct should be to SEARCH for it using tools, not to reason harder from memory.
DO NOT show the search process to the user. Instead:
`{disease_name}_research_report.md`

When synthesizing disease etiology, trace the full pathogenic cascade:
This chain structures the Genetic & Molecular Basis (Section 3) and Biological Pathways (Section 5) sections.
| Dim | Section | Key Tools |
|-----|---------|-----------|
| 1 | Identity & Classification | OSL_get_efo_id_by_disease_name, ols_search_efo_terms, ols_get_efo_term, umls_search_concepts, icd_search_codes, snomed_search_concepts |
| 2 | Clinical Presentation | OpenTargets phenotypes, HPO lookup, MedlinePlus |
| 3 | Genetic & Molecular Basis | OpenTargets targets, ClinVar variants, GWAS associations, gnomAD |
| 4 | Treatment Landscape | OpenTargets drugs, clinical trials, GtoPdb |
| 5 | Biological Pathways | Reactome pathways, humanbase_ppi_analysis, GTEx expression, HPA |
| 6 | Epidemiology & Literature | PubMed, OpenAlex, Europe PMC, Semantic Scholar |
| 7 | Similar Diseases | OpenTargets similar entities |
| 8 | Cancer-Specific (if applicable) | CIViC genes/variants/therapies |
| 9 | Pharmacology | GtoPdb targets/interactions/ligands |
| 10 | Drug Safety | OpenTargets warnings, clinical trial AEs, FAERS |
See: tool_usage_details.md for complete tool calls per section.
Create this file structure at the start:
# Disease Research Report: {Disease Name}
**Report Generated**: {date}
**Disease Identifiers**: (to be filled)
---
## Executive Summary
(Brief 3-5 sentence overview - fill after all research complete)
---
## 1. Disease Identity & Classification
### Ontology Identifiers
| System | ID | Source |
|--------|----|--------|
### Synonyms & Alternative Names
### Disease Hierarchy
---
## 2. Clinical Presentation
### Phenotypes (HPO)
| HPO ID | Phenotype | Description | Source |
|--------|-----------|-------------|--------|
### Symptoms & Signs
### Diagnostic Criteria
---
## 3. Genetic & Molecular Basis
### Associated Genes
| Gene | Score | Ensembl ID | Evidence | Source |
|------|-------|------------|----------|--------|
### GWAS Associations
| SNP | P-value | Odds Ratio | Study | Source |
|-----|---------|------------|-------|--------|
### Pathogenic Variants (ClinVar)
---
## 4. Treatment Landscape
### Approved Drugs
| Drug | ChEMBL ID | Mechanism | Phase | Target | Source |
|------|-----------|-----------|-------|--------|--------|
### Clinical Trials
| NCT ID | Title | Phase | Status | Source |
|--------|-------|-------|--------|--------|
---
## 5. Biological Pathways & Mechanisms
## 6. Epidemiology & Risk Factors
## 7. Literature & Research Activity
## 8. Similar Diseases & Comorbidities
## 9. Cancer-Specific Information (if applicable)
## 10. Drug Safety & Adverse Events
---
## References
### Tools Used
| # | Tool | Parameters | Section | Items Retrieved |
|---|------|------------|---------|-----------------|
Every piece of data MUST include its source:
- In tables: add a `Source` column with the tool name
- In lists: `- Finding [Source: tool_name]`
- In prose: `(Source: tool_name, query: "...")`
- References section: complete tool usage log with parameters
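
The three citation formats above can be sketched as small helpers. This is only an illustration: the function names are hypothetical and not part of the skill or of ToolUniverse, and any tool name passed in is whatever tool actually produced the data.

```python
def table_row(cells, tool_name):
    """Render one markdown table row, appending the tool name as the Source column."""
    return "| " + " | ".join([*cells, tool_name]) + " |"


def list_item(finding, tool_name):
    """Render a list entry in the form: - Finding [Source: tool_name]."""
    return f"- {finding} [Source: {tool_name}]"


def prose_citation(tool_name, query):
    """Render an inline prose citation that records the tool and the query used."""
    return f'(Source: {tool_name}, query: "{query}")'
```

Keeping the formatting in one place makes it harder to emit a row or bullet without a source.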
After each dimension's research:

1. Read the current report.
2. Replace the placeholder with formatted content.
3. Write back immediately.
4. Continue to the next dimension.
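
The read, replace, write-back cycle described above might look like the following minimal sketch. It assumes the report follows the template's layout, where each section starts with a `## ` heading and sections may be separated by `---` lines; `update_section` is a hypothetical helper name, not something the skill ships.

```python
from pathlib import Path


def update_section(report_path, heading, new_body):
    """Replace the text under `heading` and write the report back immediately.

    The section body is assumed to run from the heading to the next
    '## ' heading or '---' separator, whichever comes first.
    """
    path = Path(report_path)
    lines = path.read_text(encoding="utf-8").splitlines()

    start = lines.index(heading)  # raises ValueError if the heading is missing
    end = len(lines)
    for i in range(start + 1, len(lines)):
        if lines[i].startswith("## ") or lines[i].strip() == "---":
            end = i
            break

    updated = lines[: start + 1] + new_body.splitlines() + lines[end:]
    path.write_text("\n".join(updated) + "\n", encoding="utf-8")
```

Calling this once per dimension, right after that dimension's tool calls finish, keeps the file on disk in step with the research rather than deferring all writes to the end.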
Every finding in the report should be graded:
| Grade | Criteria | Example |
|-------|---------|---------|
| T1 (Strong) | Replicated genetic evidence (GWAS, rare variants), FDA-approved therapy | BRCA1 → breast cancer; trastuzumab for HER2+ |
| T2 (Moderate) | Single genetic study, phase II+ trial data, strong biological evidence | FOXO3 → longevity (centenarian studies) |
| T3 (Association) | Observational data, gene expression changes, pathway membership | IL-6 elevated in Alzheimer's CSF |
| T4 (Computational) | Network proximity, text mining, predicted associations | DisGeNET text-mined gene-disease link |
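
One way to apply the grading table consistently is a lookup from evidence type to tier. The sketch below is an assumption-laden illustration: the evidence-type labels are invented here to mirror the table's criteria, and taking the strongest tier when a finding has several kinds of evidence is a design choice of this example, not a rule stated by the skill.

```python
# Hypothetical evidence-type labels, one per criterion in the grading table.
GRADE_BY_EVIDENCE = {
    "replicated_genetic": "T1",
    "fda_approved_therapy": "T1",
    "single_genetic_study": "T2",
    "phase2_plus_trial": "T2",
    "strong_biological": "T2",
    "observational": "T3",
    "gene_expression": "T3",
    "pathway_membership": "T3",
    "network_proximity": "T4",
    "text_mining": "T4",
    "predicted_association": "T4",
}


def grade_finding(evidence_types):
    """Return the strongest tier supported by any of the given evidence types."""
    if not evidence_types:
        raise ValueError("a finding needs at least one evidence type")
    grades = [GRADE_BY_EVIDENCE[e] for e in evidence_types]
    # "T1" sorts before "T4", so min() picks the strongest tier.
    return min(grades)
```

A finding backed only by text mining stays at T4, while adding observational support lifts it to T3.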
After collecting data from all 10 dimensions, the report MUST answer:
When multiple databases provide different data for the same disease:
| Conflict | Resolution |
|----------|-----------|
| Different prevalence estimates across sources | Report range; note the most recent/largest study |
| Drug approved in one country but not another | Note regulatory status per region |
| Gene-disease association in one DB but absent in another | Grade by evidence type; text-mining alone is T4 |
| Clinical trial results contradict label indications | The trial result is newer evidence; note both |
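
The first resolution rule (report the range, note the most recent and largest study) can be sketched as below. The input shape is an assumption made for illustration: each estimate is a dict with `value`, `year`, `sample_size`, and `source` keys, and all values are expected to share the same unit.

```python
def summarize_prevalence(estimates):
    """Summarize conflicting prevalence estimates without discarding any of them.

    Returns the (low, high) range plus the sources of the most recent
    and the largest study, so the report can cite both.
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    values = [e["value"] for e in estimates]
    most_recent = max(estimates, key=lambda e: e["year"])
    largest = max(estimates, key=lambda e: e["sample_size"])
    return {
        "range": (min(values), max(values)),
        "most_recent_source": most_recent["source"],
        "largest_study_source": largest["source"],
    }
```

Returning both sources rather than picking a single winner matches the table's guidance to note the conflict instead of silently resolving it.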
For a well-studied disease (e.g., Alzheimer's), the final report should include:
Total: 500+ individual data points, each with source citation.
For rare disease differential diagnosis, run: `python3 skills/tooluniverse-rare-disease-diagnosis/scripts/clinical_patterns.py --type differential --symptoms 'symptom1,symptom2'`