skills/tooluniverse-infectious-disease/SKILL.md
Rapid pathogen characterization and drug repurposing for outbreaks. Combines pathogen genomics (NCBI, BVBRC), host immune response (IEDB), drug-target databases (ChEMBL, DGIdb), and literature surveillance (PubMed/EuropePMC). Use for emerging-pathogen profiling, antiviral candidate identification, and outbreak intelligence reporting.
`npx skillsauth add mims-harvard/tooluniverse tooluniverse-infectious-disease`
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.
Rapid response system for emerging pathogens using taxonomy analysis, target identification, structure prediction, and computational drug repurposing.
KEY PRINCIPLES:
REASONING STRATEGY — Start Here: Start with pathogen identification: What type of organism? (virus, bacteria, fungus, parasite). Then ask:
LOOK UP DON'T GUESS: Never assume a pathogen's taxonomy, genome size, or protein function. Always call BVBRC_search_taxonomy or UniProt_search first. Even well-known pathogens have strains with different drug susceptibility profiles — look up the specific strain when known.
Apply when user asks:
- Create `[PATHOGEN]_outbreak_intelligence.md` FIRST with section headers
- Data files: `[PATHOGEN]_drug_candidates.csv`, `[PATHOGEN]_target_proteins.csv`

Every finding must have inline source attribution:
### Target: RNA-dependent RNA polymerase (RdRp)
- **UniProt**: P0DTD1 (NSP12)
- **Essentiality**: Required for replication
*Source: UniProt via `UniProt_search`, literature review*
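The attribution rule can be enforced mechanically. The sketch below is one possible helper, not part of the skill: `format_finding` is a hypothetical name, and it simply refuses to render a finding that has no source.

```python
def format_finding(title, facts, sources):
    """Render one finding as markdown; fail loudly if it has no source."""
    if not sources:
        raise ValueError(f"finding {title!r} has no source attribution")
    lines = [f"### {title}"]
    # One bullet per fact, in the order given.
    lines += [f"- **{key}**: {value}" for key, value in facts.items()]
    lines.append(f"*Source: {', '.join(sources)}*")
    return "\n".join(lines)


block = format_finding(
    "Target: RNA-dependent RNA polymerase (RdRp)",
    {"UniProt": "P0DTD1 (NSP12)", "Essentiality": "Required for replication"},
    ["UniProt via `UniProt_search`", "literature review"],
)
print(block)
```

Raising on a missing source, rather than emitting a blank attribution line, keeps unattributed findings out of the report.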
| Tool | WRONG Parameter | CORRECT Parameter |
|------|-----------------|-------------------|
| NCBIDatasets_get_taxonomy | name | tax_id (integer) or use BVBRC_search_taxonomy for keyword search |
| UniProt_search | name | query |
| ChEMBL_search_targets | query, target | pref_name__contains (substring match) |
| get_diffdock_info | protein_file | protein (content) |
| drugbank_full_search | (may fail) | Use drugbank_vocab_search as primary DrugBank lookup |
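The renames in this table can be applied before a call is dispatched. The following is a minimal sketch under stated assumptions: `PARAM_FIXES` and `fix_arguments` are hypothetical names, only the wrong/correct pairs come from the table, and the `NCBIDatasets_get_taxonomy` case is rejected rather than renamed because a name string cannot be turned into an integer `tax_id`.

```python
# Wrong -> correct parameter names, copied from the table above.
PARAM_FIXES = {
    "UniProt_search": {"name": "query"},
    "ChEMBL_search_targets": {
        "query": "pref_name__contains",
        "target": "pref_name__contains",
    },
    "get_diffdock_info": {"protein_file": "protein"},
}


def fix_arguments(tool_name, arguments):
    """Return a copy of `arguments` with known wrong parameter names renamed."""
    if tool_name == "NCBIDatasets_get_taxonomy" and "name" in arguments:
        raise ValueError(
            "NCBIDatasets_get_taxonomy needs an integer tax_id; "
            "use BVBRC_search_taxonomy for keyword search"
        )
    fixes = PARAM_FIXES.get(tool_name, {})
    fixed = {}
    for key, value in arguments.items():
        new_key = fixes.get(key, key)
        if new_key in fixed:
            raise ValueError(f"duplicate parameter after renaming: {new_key}")
        fixed[new_key] = value
    return fixed


print(fix_arguments("UniProt_search", {"name": "RNA-dependent RNA polymerase"}))
```

Note that `get_diffdock_info` expects file content under `protein`, so renaming the key is only half the fix: the caller still has to pass the structure text, not a path.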
PubMed tip: Use `sort="relevance"` (default), not `sort="pub_date"`; date-sorted queries can return empty for narrow topics. Tool name: `PubMed_search_articles`.

FDA labels: Use `FDA_get_drug_label_info_by_field_value` with targeted `return_fields` to avoid oversized responses from `OpenFDA_search_drug_labels`.
Phase 1: Pathogen Identification
├── Taxonomic classification (NCBI Taxonomy)
├── Closest relatives (for knowledge transfer)
├── Genome/proteome availability
└── OUTPUT: Pathogen profile
|
Phase 2: Target Identification
├── Essential genes/proteins (UniProt)
├── Conservation across strains
├── Druggability assessment (ChEMBL)
└── OUTPUT: Prioritized target list (scored by essentiality/conservation/druggability/precedent)
|
Phase 3: Structure Prediction (NvidiaNIM)
├── AlphaFold2/ESMFold for targets
├── Binding site identification
├── Quality assessment (pLDDT)
└── OUTPUT: Target structures (docking-ready if pLDDT > 70)
|
Phase 4: Drug Repurposing Screen
├── Approved drugs for related pathogens (ChEMBL)
├── Broad-spectrum antivirals/antibiotics
├── Docking screen (get_diffdock_info)
└── OUTPUT: Ranked candidate drugs
|
Phase 4.5: Pathway Analysis
├── KEGG: Pathogen metabolism pathways
├── Essential metabolic targets
├── Host-pathogen interaction pathways
└── OUTPUT: Pathway-based drug targets
|
Phase 5: Literature Intelligence
├── PubMed: Published outbreak reports
├── BioRxiv/MedRxiv: Recent preprints (CRITICAL for outbreaks)
├── ArXiv: Computational/ML preprints
├── OpenAlex: Citation tracking
├── ClinicalTrials.gov: Active trials
└── OUTPUT: Evidence synthesis
|
Phase 6: Report Synthesis
├── Top drug candidates with evidence grades
├── Clinical trial opportunities
├── Recommended immediate actions
└── OUTPUT: Final report
Phase 1 (Pathogen Identification): Classify via NCBI Taxonomy (query param). Identify related pathogens with existing drugs for knowledge transfer. Determine genome/proteome availability.
Genome assembly availability and QC: After classifying the pathogen, use NCBIDatasets_list_genomes_by_taxon (params taxon as tax_id, limit, reference_only) to find the reference genome, NCBIDatasets_get_genome_assembly (param accession, e.g. "GCF_000005845.2") for assembly metrics (length, N50, GC%, contig/chromosome counts), and NCBIDatasets_get_sequence_reports (param accession) to map replicons (chromosomes/plasmids with RefSeq/GenBank accessions). For the full assembly-QC-to-characterization workflow, see the tooluniverse-microbial-genome-characterization skill.
Open pathogen genomic surveillance: For the priority pathogens covered by Pathoplexus (west-nile, ebola-zaire, ebola-sudan, cchf, mpox), use Pathoplexus_count_sequences (params organism, group_by e.g. geoLocCountry or lineage) to gauge sequencing volume and geographic/lineage spread, and Pathoplexus_get_mutations (params organism, min_proportion e.g. 0.95) to pull characteristic high-prevalence mutations for the circulating population. Use early to quantify outbreak footprint and flag conserved mutations before target selection.
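As a sketch of the two surveillance calls described above, the helper below validates the organism against the listed Pathoplexus pathogens and builds both requests. The tool names and parameters come from the text; the `{"name": ..., "arguments": ...}` payload shape and the function name `surveillance_calls` are assumptions made for illustration.

```python
# Organisms the text lists as covered by Pathoplexus.
PATHOPLEXUS_ORGANISMS = {"west-nile", "ebola-zaire", "ebola-sudan", "cchf", "mpox"}


def surveillance_calls(organism, group_by="geoLocCountry", min_proportion=0.95):
    """Build the count and mutation requests for one covered pathogen."""
    if organism not in PATHOPLEXUS_ORGANISMS:
        raise ValueError(f"{organism!r} is not covered by Pathoplexus")
    if not 0 < min_proportion <= 1:
        raise ValueError("min_proportion must be in (0, 1]")
    return [
        {
            "name": "Pathoplexus_count_sequences",
            "arguments": {"organism": organism, "group_by": group_by},
        },
        {
            "name": "Pathoplexus_get_mutations",
            "arguments": {"organism": organism, "min_proportion": min_proportion},
        },
    ]


for call in surveillance_calls("mpox", group_by="lineage"):
    print(call)
```

Checking the organism up front avoids spending a tool call on a pathogen that the surveillance source does not cover.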
Knowledge transfer principle: Drugs effective against related pathogens are the highest-priority repurposing candidates. A protease inhibitor for SARS-CoV-1 is immediately relevant to SARS-CoV-2. Look up the related pathogen's approved drugs in ChEMBL before generating candidates from first principles.
Phase 2 (Target Identification): Search UniProt for pathogen proteins (reviewed). Check ChEMBL for drug precedent. Score targets by: Essentiality (30%), Conservation (25%), Druggability (25%), Drug precedent (20%). Aim for 5+ targets.
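The weighted score can be computed as below. This is a minimal sketch: the weights are the ones stated above, while the 0 to 1 scale for each component, the function names, and the two example targets (`target_A`, `target_B`, with made-up component values) are assumptions for illustration only.

```python
# Weights from the scoring rule above; they sum to 1.0.
WEIGHTS = {
    "essentiality": 0.30,
    "conservation": 0.25,
    "druggability": 0.25,
    "precedent": 0.20,
}


def target_score(components):
    """Weighted sum of component scores, each assumed to lie in [0, 1]."""
    missing = set(WEIGHTS) - set(components)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= components[name] <= 1.0:
            raise ValueError(f"{name} must be in [0, 1]")
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


def prioritize(targets):
    """Sort targets from highest to lowest weighted score."""
    return sorted(targets, key=lambda t: target_score(t["components"]), reverse=True)


# Hypothetical targets with illustrative values, not real assessments.
targets = [
    {"name": "target_A", "components": {
        "essentiality": 0.9, "conservation": 0.8, "druggability": 0.5, "precedent": 0.2}},
    {"name": "target_B", "components": {
        "essentiality": 0.6, "conservation": 0.9, "druggability": 0.9, "precedent": 1.0}},
]
for t in prioritize(targets):
    print(t["name"], f"{target_score(t['components']):.3f}")
```

In this example the less essential target ranks first because it scores higher on the other three components, which is the trade-off the weighting is meant to expose.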
Phase 3 (Structure Prediction): Use NvidiaNIM AlphaFold2 for top 3 targets. Assess pLDDT confidence. Only dock structures with pLDDT > 70 (active site > 90 preferred). Fallback: alphafold_get_prediction or ESMFold_predict_structure.
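The pLDDT gate can be sketched as follows. The thresholds are the ones given above; summarizing a structure by the mean of its per-residue pLDDT values is an assumption (the text does not say how the score is aggregated), and `assess_structure` is a hypothetical helper name.

```python
from statistics import mean

DOCKING_MIN_PLDDT = 70.0            # dock only above this value
ACTIVE_SITE_PREFERRED_PLDDT = 90.0  # preferred confidence at the active site


def assess_structure(plddt, active_site_indices=()):
    """Summarize per-residue pLDDT and apply the docking thresholds.

    Assumes `plddt` is a list of per-residue scores on the 0-100 scale and
    that the mean is an acceptable summary of each region.
    """
    if not plddt:
        raise ValueError("no pLDDT values supplied")
    overall = mean(plddt)
    site = None
    if active_site_indices:
        site = mean([plddt[i] for i in active_site_indices])
    return {
        "mean_plddt": overall,
        "docking_ready": overall > DOCKING_MIN_PLDDT,
        "active_site_plddt": site,
        "active_site_preferred": site is not None and site > ACTIVE_SITE_PREFERRED_PLDDT,
    }


# Illustrative scores only: residues 1 and 2 stand in for an active site.
print(assess_structure([80.0, 90.0, 95.0, 60.0], active_site_indices=[1, 2]))
```

Both comparisons are strict, matching the "> 70" and "> 90" wording, so a structure that averages exactly 70 is not treated as docking-ready.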
Phase 4 (Drug Repurposing Screen): Source candidates from: related pathogen drugs, broad-spectrum antivirals, target class drugs (DGIdb). Dock top 20+ candidates via get_diffdock_info. Rank by docking score and evidence tier.
Phase 4.5 (Pathway Analysis): Use KEGG to identify essential metabolic pathways. Map host-pathogen interaction points. Identify pathway-based drug targets beyond direct protein inhibition.
Phase 5 (Literature Intelligence): Search PubMed (peer-reviewed), BioRxiv/MedRxiv (preprints, critical for outbreaks), ArXiv (computational), ClinicalTrials.gov (active trials). Track citations via OpenAlex. Note: preprints are NOT peer-reviewed.
Phase 6 (Report Synthesis): Aggregate all findings into final report. Grade every candidate. Provide 3+ immediate actions, clinical trial opportunities, and research priorities.
| Tier | Symbol | Criteria | Example |
|------|--------|----------|---------|
| T1 | [T1] | FDA approved for this pathogen | Remdesivir for COVID |
| T2 | [T2] | Clinical trial evidence OR approved for related pathogen | Favipiravir |
| T3 | [T3] | In vitro activity OR strong docking + mechanism | Sofosbuvir |
| T4 | [T4] | Computational prediction only | Novel docking hits |
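These tiers can drive the Phase 4 ranking. The sketch below orders candidates by evidence tier first and docking score second. That precedence is an assumption, since the text only says to rank by both, and so is treating a higher docking score as better. The candidate names and scores are placeholders.

```python
# Lower number = stronger evidence, following the tier table above.
TIER_ORDER = {"T1": 1, "T2": 2, "T3": 3, "T4": 4}


def rank_candidates(candidates):
    """Sort by evidence tier, then by docking score (higher assumed better)."""
    return sorted(
        candidates,
        key=lambda c: (TIER_ORDER[c["tier"]], -c["docking_score"]),
    )


# Placeholder candidates with illustrative scores, not real results.
candidates = [
    {"drug": "drug_a", "tier": "T4", "docking_score": 0.95},
    {"drug": "drug_b", "tier": "T2", "docking_score": 0.40},
    {"drug": "drug_c", "tier": "T2", "docking_score": 0.70},
]
for c in rank_candidates(candidates):
    print(f"[{c['tier']}] {c['drug']} (docking score {c['docking_score']})")
```

Putting the tier first means a purely computational hit never outranks a candidate with clinical evidence, however strong its docking score.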
| Primary Tool | Fallback 1 | Fallback 2 |
|--------------|------------|------------|
| NvidiaNIM_alphafold2 (requires NVIDIA_API_KEY env var; free key at build.nvidia.com) | alphafold_get_prediction (AlphaFold DB by UniProt) | ESMFold_predict_structure |
| get_diffdock_info | NvidiaNIM_boltz2 (requires NVIDIA_API_KEY env var; free key at build.nvidia.com) | Manual docking |
| NCBIDatasets_suggest_taxonomy | UniProtTaxonomy_get_taxon | Manual classification |
| ChEMBL_search_drugs | drugbank_vocab_search | PubChem bioassays |
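Each row of this table is an ordered fallback chain, which can be run as in the sketch below. `run_with_fallbacks` is a hypothetical helper, and the two stub functions stand in for real tool calls so that the example is self-contained. They do not reflect the tools' actual signatures or return values.

```python
def run_with_fallbacks(steps, **kwargs):
    """Try each (name, callable) in order; return the first non-empty result."""
    errors = []
    for name, func in steps:
        try:
            result = func(**kwargs)
        except Exception as exc:  # record the failure and move to the next tool
            errors.append(f"{name}: {exc}")
            continue
        if result is not None:
            return name, result
        errors.append(f"{name}: empty result")
    raise RuntimeError("all tools failed: " + "; ".join(errors))


# Stubs standing in for the structure-prediction row of the table.
def nim_alphafold2_stub(uniprot_id):
    raise RuntimeError("NVIDIA_API_KEY is not set")


def alphafold_db_stub(uniprot_id):
    return {"uniprot_id": uniprot_id, "source": "AlphaFold DB"}


steps = [
    ("NvidiaNIM_alphafold2", nim_alphafold2_stub),
    ("alphafold_get_prediction", alphafold_db_stub),
]
used, structure = run_with_fallbacks(steps, uniprot_id="P0DTD1")
print(used, structure)
```

Returning the name of the tool that succeeded lets the report attribute the structure to its actual source.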
| File | Contents |
|------|----------|
| TOOLS_REFERENCE.md | Complete tool documentation |
| phase_details.md | Detailed code examples and procedures for each phase |
| report_template.md | Report template with section headers, checklist, and evidence grading |
| CHECKLIST.md | Pre-delivery verification checklist (quality, citations, docking) |
| EXAMPLES.md | Full worked examples (coronavirus, CRKP, limited-info scenarios) |