Microbiome research using MGnify, GTDB, ENA, OLS (ENVO biomes), and EuropePMC. Covers study discovery, taxonomic profiling, host-microbe interaction analysis, and biome-by-condition queries. Use for microbiome study selection, organism-environment associations, and clinical-microbiome literature review. Distinct from the analytical workflow skill (use tooluniverse-metagenomics-analysis for that).
Comprehensive microbiome analysis using MGnify (EBI metagenomics), GTDB (genome taxonomy), ENA (sequencing data), OLS (ontology lookup for ENVO biomes), and EuropePMC (literature).
| Tool | Purpose | Auth |
|------|---------|------|
| MGnify_search_studies | Find metagenomics studies by biome/keyword | None |
| MGnify_get_study_detail | Study metadata, abstract, sample counts | None |
| MGnify_list_analyses | List taxonomic/functional analysis outputs for a study | None |
| MGnify_get_taxonomy | Taxonomic composition from an analysis | None |
| MGnify_get_go_terms | GO functional annotations from an analysis | None |
| MGnify_get_interpro | InterPro protein domain annotations | None |
| MGnify_list_biomes | Browse MGnify biome hierarchy | None |
| MGnify_search_genomes | Search metagenome-assembled genomes (MAGs) | None |
| MGnify_get_genome | Genome quality metrics (completeness, contamination) | None |
| GTDB_search_genomes | Search bacterial/archaeal genomes by taxonomy | None |
| GTDB_get_species | Species cluster details from GTDB | None |
| GTDB_get_taxon_info | Taxonomic rank info in GTDB hierarchy | None |
| GTDB_search_taxon | Search taxa by partial name across all ranks | None |
| ENAPortal_search_studies | Find sequencing studies in ENA. Query format: description="keyword" | None |
| ENAPortal_search_samples | Find samples with environmental metadata | None |
| ols_search_terms | Search ENVO ontology for biome/environment terms | None |
| EuropePMC_search_articles | Find microbiome publications | None |
| PubMed_search_articles | Literature search (different coverage than EuropePMC) | None |
For drug-microbiome studies, also use:

- PubChem_get_CID_by_compound_name / PubChem_get_compound_properties_by_CID: drug identity
- CTD_get_chemical_gene_interactions: drug-gene interactions (e.g., metformin affects 1,175+ genes)
- kegg_search_pathway / kegg_get_pathway_info: microbial metabolic pathways (butanoate, propanoate)
- ReactomeAnalysis_pathway_enrichment: host pathway enrichment for drug-affected genes
- drugbank_vocab_search: drug mechanism and targets

MGnify tip: use concise single-keyword searches (e.g., "metformin"). Multi-word queries may time out, and the MGnify API can be slow for broad searches.
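The MGnify tip above can be turned into a small retry strategy. This is a minimal sketch, not part of ToolUniverse. `search_studies_with_fallback` is a hypothetical helper, and it assumes a failed or timed-out call either raises an exception or returns `None` or a dict with an `"error"` key. That is an assumption about how errors surface, not documented behaviour.

```python
from typing import Any, Callable, Optional

def _failed(result: Any) -> bool:
    # Assumption: a failed call returns None or a dict carrying an "error" key.
    return result is None or (isinstance(result, dict) and "error" in result)

def search_studies_with_fallback(
    run: Callable[[dict], Any], query: str, size: int = 5
) -> Optional[Any]:
    """Try the full query, then each single word from it (longest first)."""
    candidates = [query]
    words = list(dict.fromkeys(query.split()))  # de-duplicate, keep order
    if len(words) > 1:
        # Longer words are usually more specific, e.g. "metformin" over "gut".
        candidates += sorted(words, key=len, reverse=True)
    for term in candidates:
        try:
            result = run({
                "name": "MGnify_search_studies",
                "arguments": {"search": term, "size": size},
            })
        except Exception:  # e.g. a client-side timeout
            continue
        if not _failed(result):
            return result
    return None
```

Usage would be `search_studies_with_fallback(tu.run_one_function, "gut metformin")`. The runner is passed in as a callable so the fallback logic can be exercised without network access.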
```python
from tooluniverse import ToolUniverse

tu = ToolUniverse()
tu.load_tools()

# 1. Search for gut microbiome studies
#    (if this multi-word query times out, retry with a single keyword such as 'gut')
studies = tu.run_one_function({
    'name': 'MGnify_search_studies',
    'arguments': {'search': 'gut microbiome', 'size': 5}
})

# 2. Get study details
detail = tu.run_one_function({
    'name': 'MGnify_get_study_detail',
    'arguments': {'study_accession': 'MGYS00006860'}
})

# 3. List analyses for a study
analyses = tu.run_one_function({
    'name': 'MGnify_list_analyses',
    'arguments': {'study_accession': 'MGYS00006860', 'size': 5}
})

# 4. Get taxonomic profile from an analysis
taxonomy = tu.run_one_function({
    'name': 'MGnify_get_taxonomy',
    'arguments': {'analysis_accession': 'MGYA00612683'}
})

# 5. Get functional annotations
go_terms = tu.run_one_function({
    'name': 'MGnify_get_go_terms',
    'arguments': {'analysis_accession': 'MGYA00612683'}
})
```
Find studies for a specific biome using MGnify's biome hierarchy:
```python
# Browse biome hierarchy
biomes = tu.run_one_function({
    'name': 'MGnify_list_biomes',
    'arguments': {'lineage': 'root:Host-associated:Human', 'depth': 3}
})

# Search studies in a specific biome
studies = tu.run_one_function({
    'name': 'MGnify_search_studies',
    'arguments': {'biome': 'root:Host-associated:Human:Digestive system', 'size': 10}
})

# Look up ENVO ontology terms for environment metadata
envo = tu.run_one_function({
    'name': 'ols_search_terms',
    'arguments': {'query': 'human gut', 'ontology': 'envo', 'rows': 5}
})
```
Get the microbial composition of a metagenomics sample:
```python
# Get analyses for a study
analyses = tu.run_one_function({
    'name': 'MGnify_list_analyses',
    'arguments': {'study_accession': 'MGYS00006860', 'size': 3}
})

# Get taxonomy for a specific analysis
taxonomy = tu.run_one_function({
    'name': 'MGnify_get_taxonomy',
    'arguments': {'analysis_accession': 'MGYA00612683'}
})
# Returns organisms with lineage, abundance counts, and taxonomy rank
```
Evaluate metagenome-assembled genomes (MAGs):
```python
# Search for genomes from a specific taxon
genomes = tu.run_one_function({
    'name': 'MGnify_search_genomes',
    'arguments': {'search': 'Faecalibacterium prausnitzii', 'size': 5}
})

# Get quality metrics for a genome
genome = tu.run_one_function({
    'name': 'MGnify_get_genome',
    'arguments': {'genome_accession': 'MGYG000000001'}
})
# Returns completeness, contamination, N50, genome length, taxonomy

# Cross-reference with GTDB taxonomy
gtdb = tu.run_one_function({
    'name': 'GTDB_search_genomes',
    'arguments': {'operation': 'search_genomes', 'query': 'Faecalibacterium', 'items_per_page': 5}
})
```
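One widely used way to grade the completeness and contamination values returned for a MAG is the Genomic Standards Consortium's MIMAG thresholds. The sketch below applies only the completeness/contamination cut-offs. It does not check the rRNA/tRNA requirement of the full high-quality tier. It also does not parse the `MGnify_get_genome` response, because the field names in that response are not shown here, so you pass the two percentages in yourself.

```python
def mimag_tier(completeness: float, contamination: float) -> str:
    """Grade a MAG by MIMAG completeness/contamination cut-offs (percentages).

    The full MIMAG high-quality tier also requires 23S, 16S and 5S rRNA genes
    and at least 18 tRNAs; that is NOT checked here, so "high" means
    "high by completeness and contamination only".
    """
    if contamination >= 10:
        return "unclassified (contamination >= 10%)"
    if completeness > 90 and contamination < 5:
        return "high"
    if completeness >= 50:
        return "medium"
    return "low"
```

For example, `mimag_tier(95.0, 6.0)` returns `"medium"`. Contamination at or above 5% keeps a genome out of the high tier even when it is nearly complete.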
Discover functional potential of a metagenome:
```python
# GO terms from an analysis
go_terms = tu.run_one_function({
    'name': 'MGnify_get_go_terms',
    'arguments': {'analysis_accession': 'MGYA00612683'}
})

# InterPro domains
interpro = tu.run_one_function({
    'name': 'MGnify_get_interpro',
    'arguments': {'analysis_accession': 'MGYA00612683'}
})
```
Combine metagenomics data with published research:
```python
# Find relevant publications
papers = tu.run_one_function({
    'name': 'EuropePMC_search_articles',
    'arguments': {'query': 'gut microbiome AND Faecalibacterium AND (IBD OR "Crohn")', 'limit': 10}
})

# Find sequencing data in ENA
ena_studies = tu.run_one_function({
    'name': 'ENAPortal_search_studies',
    'arguments': {'query': 'description="gut microbiome 16S"', 'limit': 5}
})
```
Key biome lineages (use MGnify_list_biomes to discover others):
- root:Host-associated:Human:Digestive system
- root:Host-associated:Human:Oral / root:Host-associated:Human:Skin
- root:Environmental:Terrestrial:Soil
- root:Environmental:Aquatic:Marine / root:Environmental:Aquatic:Freshwater
- root:Engineered:Wastewater

Accession prefixes: MGnify studies=MGYS\*, analyses=MGYA\*, genomes=MGYG\*. ENA studies=PRJEB\*. GTDB genomes=GCA_\*. ENVO terms=ENVO:\* (e.g. ENVO:00002041).
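Because the prefix determines which tool to call, a small router can build the `run_one_function` payload directly from an accession. This is a sketch. `route_accession` is a hypothetical helper, and the tool and argument names are taken from the MGnify calls in this skill. MGYA accessions are sent to taxonomy by default, which is a choice made here rather than a rule.

```python
# Prefix -> (tool, argument name), following the MGnify calls in this skill.
# MGYA goes to taxonomy by default; use MGnify_get_go_terms or
# MGnify_get_interpro instead for functional profiles.
_ROUTES = {
    "MGYS": ("MGnify_get_study_detail", "study_accession"),
    "MGYA": ("MGnify_get_taxonomy", "analysis_accession"),
    "MGYG": ("MGnify_get_genome", "genome_accession"),
}

def route_accession(accession: str) -> dict:
    """Build a tu.run_one_function payload for an MGnify accession."""
    acc = accession.strip().upper()
    for prefix, (tool, arg) in _ROUTES.items():
        if acc.startswith(prefix):
            return {"name": tool, "arguments": {arg: acc}}
    raise ValueError(
        f"{accession!r} is not an MGnify accession; PRJEB* studies live in ENA, "
        "GCA_* genomes in GTDB, ENVO:* terms in OLS"
    )
```

Typical use is `tu.run_one_function(route_accession("MGYS00006860"))`.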
Microbiome analysis starts with a clear question. LOOK UP DON'T GUESS: always check the study type and sequencing method before interpreting results.
Decision tree for data type: before calling any tool, determine which data type the user has via MGnify_get_study_detail. The pipeline type (amplicon vs shotgun) determines which analyses are valid. Do not apply 16S diversity metrics to metagenomic data or vice versa.
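The amplicon-versus-shotgun check can be made explicit before any downstream call. This is a sketch under stated assumptions. The experiment-type labels below are assumed to match what MGnify reports for analyses, so verify them against actual `MGnify_get_study_detail` / `MGnify_list_analyses` output. The mapping reflects that amplicon runs yield taxonomy only, while GO and InterPro annotations come from shotgun reads or assemblies.

```python
# Assumption: experiment-type labels as MGnify reports them for analyses.
# Amplicon (16S/18S/ITS) runs yield taxonomy only; GO and InterPro
# annotations come from shotgun reads or assemblies.
_TAXONOMY = {"MGnify_get_taxonomy"}
_FUNCTIONAL = {"MGnify_get_go_terms", "MGnify_get_interpro"}

VALID_TOOLS = {
    "amplicon": _TAXONOMY,
    "metagenomic": _TAXONOMY | _FUNCTIONAL,
    "metatranscriptomic": _TAXONOMY | _FUNCTIONAL,
    "assembly": _TAXONOMY | _FUNCTIONAL,
}

def is_valid_analysis(experiment_type: str, tool: str) -> bool:
    """Return True if `tool` produces meaningful output for this data type."""
    kind = experiment_type.strip().lower()
    if kind not in VALID_TOOLS:
        raise ValueError(
            f"unknown experiment type {experiment_type!r}; "
            "check MGnify_get_study_detail before proceeding"
        )
    return tool in VALID_TOOLS[kind]
```

The helper raises an error on an unknown label instead of guessing. This follows the LOOK UP DON'T GUESS rule: an unrecognised pipeline type should send you back to the study metadata.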
Dysbiosis (microbial imbalance) is context-dependent: there is no universal "healthy" microbiome. LOOK UP DON'T GUESS: compare to study-matched controls, not general population references.
- MGnify_get_taxonomy to get community profiles, then assess richness and evenness.
- GTDB_get_species and literature via EuropePMC_search_articles.
- MGnify_get_go_terms and MGnify_get_interpro for the affected samples.
- MGnify_get_taxonomy + GTDB_search_genomes.
- MGnify_get_go_terms + MGnify_get_interpro + kegg_search_pathway.

Evidence tiers for grading findings:

| Tier | Description | Example |
|------|-------------|---------|
| T1 | Replicated finding across multiple cohorts with consistent effect | Reduced Faecalibacterium in IBD (>10 independent studies) |
| T2 | Single well-powered study (n > 100) with appropriate controls | Metformin-associated Akkermansia enrichment in a controlled trial |
| T3 | Pilot study or observational association, small sample size | Taxonomic shift in n=15 case-control, no validation cohort |
| T4 | Computational prediction or single-sample observation | Novel MAG with predicted function, no culture confirmation |
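The tier table can be encoded as a heuristic so that grading stays consistent across a report. This is a sketch only. The `Evidence` fields are hypothetical summaries you would fill in from the literature. Reading "multiple cohorts" as at least two consistent cohorts is an assumption, not a threshold stated in the table.

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    consistent_cohorts: int    # independent cohorts reporting the same direction of effect
    largest_n: int             # sample size of the largest single study
    matched_controls: bool     # appropriate / study-matched controls present
    computational_only: bool   # prediction only, no cohort or experimental support

def evidence_tier(ev: Evidence) -> str:
    """Map evidence to the T1-T4 tiers (heuristic reading of the table)."""
    if ev.computational_only or ev.largest_n <= 1:
        return "T4"  # computational prediction or single-sample observation
    if ev.consistent_cohorts >= 2:
        return "T1"  # replicated across cohorts with a consistent effect
    if ev.largest_n > 100 and ev.matched_controls:
        return "T2"  # single well-powered, controlled study
    return "T3"      # pilot or small observational association
```

The checks run from the weakest tier to the strongest, and T4 is tested first. A purely computational claim therefore cannot be promoted, however many predictions agree.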
Alpha diversity (within-sample): the Shannon index combines richness and evenness. Higher Shannon (>3.0 for gut) suggests a stable community, although the numeric threshold depends on the logarithm base used. Reduced alpha diversity is associated with dysbiosis (IBD, antibiotics). Always compare to study-matched controls, because diversity varies by body site, sequencing depth, and geography.
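To make the richness/evenness distinction concrete, here is a minimal pure-Python sketch of the Shannon index H' = -Σ pᵢ log pᵢ and Pielou's evenness J' = H' / log S, where S is the number of observed taxa. The toy counts are hypothetical, and nothing here parses MGnify output.

```python
import math
from typing import Mapping

def shannon(counts: Mapping[str, float], base: float = math.e) -> float:
    """Shannon index H' = -sum(p_i * log(p_i)) over taxa with non-zero counts."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("sample has no counts")
    h = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total
            h -= p * math.log(p, base)
    return h

def pielou_evenness(counts: Mapping[str, float], base: float = math.e) -> float:
    """Pielou's J' = H' / log(S), where S is observed richness (non-zero taxa)."""
    s = sum(1 for c in counts.values() if c > 0)
    if s < 2:
        return float("nan")  # evenness is undefined for a single-taxon sample
    return shannon(counts, base) / math.log(s, base)

# Toy counts (hypothetical), e.g. reads per genus in one sample.
sample = {"Bacteroides": 400, "Faecalibacterium": 250, "Prevotella": 200, "Akkermansia": 150}
print(f"H' (ln)   = {shannon(sample):.3f}")
print(f"H' (log2) = {shannon(sample, base=2):.3f}")
print(f"J'        = {pielou_evenness(sample):.3f}")
```

H' in log2 units equals H' in natural-log units divided by ln 2 (about 0.693). A cut-off such as >3.0 is therefore only meaningful once the base is fixed. Pielou's J' is the same in any base.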
Beta diversity (between-sample): Bray-Curtis (abundance-based) or UniFrac (phylogenetic). PERMANOVA p < 0.05 with R-squared > 0.05 indicates condition-driven clustering. Low R-squared (<0.02) even with significant p suggests the effect is small relative to inter-individual variation. Choose weighted UniFrac when abundant taxa matter most; unweighted when rare taxa are important.
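The sketch below shows how Bray-Curtis dissimilarity and a one-way PERMANOVA (pseudo-F, R² and a permutation p-value) fit together. It is a pure-Python teaching version that follows the standard sums-of-squares formulation on a precomputed distance matrix. For real data a tested implementation such as scikit-bio's `permanova` or R vegan's `adonis2` is preferable. UniFrac needs a phylogenetic tree and is not covered here.

```python
import random
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

def bray_curtis(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i) over all taxa."""
    taxa = set(a) | set(b)
    num = sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in taxa)
    den = sum(a.get(t, 0.0) + b.get(t, 0.0) for t in taxa)
    return num / den if den else 0.0

def permanova(
    dist: List[List[float]],
    groups: Sequence[str],
    permutations: int = 999,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """One-way PERMANOVA on a square distance matrix.

    Returns (pseudo-F, R^2, permutation p-value), with
    SS_total  = (1/N) * sum_{i<j} d_ij^2 and
    SS_within = sum over groups of (1/n_g) * sum_{i<j in g} d_ij^2.
    """
    n = len(groups)
    labels = sorted(set(groups))
    k = len(labels)
    if k < 2 or k >= n:
        raise ValueError("need at least 2 groups and more samples than groups")
    sq = [[d * d for d in row] for row in dist]
    ss_total = sum(sq[i][j] for i, j in combinations(range(n), 2)) / n
    if ss_total == 0:
        raise ValueError("all distances are zero")

    def f_and_r2(g: Sequence[str]) -> Tuple[float, float]:
        ss_within = 0.0
        for lab in labels:
            idx = [i for i in range(n) if g[i] == lab]
            ss_within += sum(sq[i][j] for i, j in combinations(idx, 2)) / len(idx)
        ss_between = ss_total - ss_within
        r2 = ss_between / ss_total
        if ss_within == 0:
            return float("inf"), r2
        return (ss_between / (k - 1)) / (ss_within / (n - k)), r2

    f_obs, r2 = f_and_r2(groups)
    rng = random.Random(seed)
    perm = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(perm)  # shuffle labels, keeping group sizes fixed
        if f_and_r2(perm)[0] >= f_obs:
            hits += 1
    return f_obs, r2, (hits + 1) / (permutations + 1)
```

R² is the share of total variation explained by the grouping, so it can be read directly against the >0.05 and <0.02 rules of thumb above. The p-value comes only from label permutations. With four samples in two pairs, the observed split recurs in about a third of random shuffles, so p cannot fall much below 1/3 however clean the separation.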
Taxonomic composition: Relative abundance at phylum level (Firmicutes/Bacteroidetes ratio) is a coarse indicator; genus- or species-level resolution is preferred. A taxon present at >1% relative abundance in multiple samples is reliably detected. Taxa at <0.1% may be noise or sequencing artifacts. GTDB taxonomy may reclassify NCBI names (e.g., Firmicutes split into multiple phyla).
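The detection rules of thumb above (>1% in multiple samples is reliable, <0.1% may be noise) can be applied mechanically once counts are converted to relative abundance. This sketch assumes per-sample taxon counts are already in plain dicts. Treating "multiple samples" as at least two is a hypothetical default.

```python
from typing import Dict, List

def relative_abundance(counts: Dict[str, float]) -> Dict[str, float]:
    """Convert raw counts to fractions that sum to 1 (empty dict if no counts)."""
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}

def classify_taxa(
    samples: List[Dict[str, float]],
    reliable: float = 0.01,   # >1% relative abundance
    noise: float = 0.001,     # <0.1% relative abundance
    min_samples: int = 2,     # hypothetical reading of "multiple samples"
) -> Dict[str, str]:
    """Label each taxon 'reliable', 'possible noise', or 'uncertain'."""
    rel = [relative_abundance(s) for s in samples]
    taxa = sorted(set().union(*rel))
    labels: Dict[str, str] = {}
    for t in taxa:
        values = [r.get(t, 0.0) for r in rel]
        if sum(v > reliable for v in values) >= min_samples:
            labels[t] = "reliable"
        elif max(values) < noise:
            labels[t] = "possible noise"
        else:
            labels[t] = "uncertain"
    return labels
```

Taxa that fall between the two thresholds are labelled "uncertain" rather than forced into either class. These are the taxa where the genus- or species-level checks and GTDB name reconciliation described above matter most.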
Functional profiling: GO terms and InterPro domains from MGnify reflect the metabolic potential (not necessarily activity) of the community. Enrichment of specific pathways (e.g., butyrate production, LPS biosynthesis) should be interpreted alongside taxonomic data to identify which organisms contribute the functions.
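One simple way to connect functions back to organisms is to weight each taxon's gene copies by its abundance and report, per function, each taxon's share. This is a toy sketch, not what MGnify computes. It assumes per-taxon copy numbers are available (for example from MAG annotations), and `function_contributions` is a hypothetical helper. The result describes potential, not activity, as noted above.

```python
from collections import defaultdict
from typing import Dict

def function_contributions(
    taxon_abundance: Dict[str, float],
    taxon_functions: Dict[str, Dict[str, int]],
) -> Dict[str, Dict[str, float]]:
    """Share of each function's potential attributable to each taxon.

    potential(f, t) = abundance(t) * copies of f in t; shares are normalised
    per function so that they sum to 1.
    """
    raw: Dict[str, Dict[str, float]] = defaultdict(dict)
    for taxon, abundance in taxon_abundance.items():
        for func, copies in taxon_functions.get(taxon, {}).items():
            if copies > 0 and abundance > 0:
                raw[func][taxon] = abundance * copies
    shares: Dict[str, Dict[str, float]] = {}
    for func, per_taxon in raw.items():
        total = sum(per_taxon.values())
        shares[func] = {t: v / total for t, v in per_taxon.items()}
    return shares
```

With toy inputs in which a low-abundance taxon carries two copies of an LPS gene and a more abundant taxon carries one, the abundant taxon still dominates the LPS share. This illustrates why enrichment should be read alongside taxonomic composition.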
A complete microbiome report should answer:
- MGnify accessions: studies use MGYS, analyses MGYA, genomes MGYG.
- Use MGnify_list_biomes first to find the correct biome lineage string.
- MGnify_get_taxonomy returns phylum-level to species-level composition.
- The size parameter in MGnify tools controls results per page (max 100).