skills/tooluniverse-sdk/SKILL.md
Build AI scientist systems with the ToolUniverse Python SDK for scientific research. Covers the 3 calling patterns (`tu.run` portable dict API, `tu.tools.X` function API, direct class instantiation), tool loading, batch execution, MCP server integration, and embedding-based tool search. Use for SDK programming, custom tool composition, benchmarking pipelines, and integrating ToolUniverse into research workflows.
npx skillsauth add mims-harvard/tooluniverse tooluniverse-sdkInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
3 calling patterns -- start with pattern 1:
tu.run({"name": ..., "arguments": ...}) -- single tool call, dict API (most portable)tu.tools.ToolName(param=value) -- function API (recommended for interactive use)pip install tooluniverse # Standard
pip install tooluniverse[embedding] # Embedding search (GPU)
pip install tooluniverse[all] # All features
export OPENAI_API_KEY="sk-..." # Required for LLM tool search
export NCBI_API_KEY="..." # Optional
from tooluniverse import ToolUniverse
tu = ToolUniverse()
tu.load_tools() # REQUIRED before any tool call
# Find tools
tools = tu.run({"name": "Tool_Finder_Keyword", "arguments": {"description": "protein structure", "limit": 10}})
# Execute (dict API)
result = tu.run({"name": "UniProt_get_entry_by_accession", "arguments": {"accession": "P05067"}})
# Execute (function API)
result = tu.tools.UniProt_get_entry_by_accession(accession="P05067")
calls = [
{"name": "UniProt_get_entry_by_accession", "arguments": {"accession": "P05067"}},
{"name": "UniProt_get_entry_by_accession", "arguments": {"accession": "P12345"}},
]
results = tu.run_batch(calls)
def drug_discovery_pipeline(disease_id):
tu = ToolUniverse(use_cache=True)
tu.load_tools()
try:
targets = tu.tools.OpenTargets_get_associated_targets_by_disease_efoId(efoId=disease_id)
compound_calls = [
{"name": "ChEMBL_search_molecule_by_target",
"arguments": {"target_id": t['id'], "limit": 10}}
for t in targets['data'][:5]
]
compounds = tu.run_batch(compound_calls)
return {"targets": targets, "compounds": compounds}
finally:
tu.close()
# Caching
tu = ToolUniverse(use_cache=True)
stats = tu.get_cache_stats()
tu.clear_cache()
# Hooks (auto-summarization of large outputs)
tu = ToolUniverse(hooks_enabled=True)
# Load specific categories
tu.load_tools(categories=["proteins", "drugs"])
load_tools() before using any toolstools['tools'] after isinstance(tools, dict) checkUniProt_get_entry_by_accession not uniprot_get_...tu.all_tool_dict["ToolName"]['parameter'].get('required', [])from tooluniverse.exceptions import ToolError, ToolUnavailableError, ToolValidationError
try:
result = tu.tools.some_tool(param="value")
except ToolUnavailableError:
... # Tool service down
except ToolValidationError as e:
tool_info = tu.all_tool_dict["some_tool"]
print(f"Required: {tool_info['parameter'].get('required', [])}")
| Category | Tools | Use Cases | |----------|-------|-----------| | Proteins | UniProt, RCSB PDB, AlphaFold | Protein analysis, structure | | Drugs | DrugBank, ChEMBL, PubChem | Drug discovery, compounds | | Genomics | Ensembl, NCBI Gene, gnomAD | Gene analysis, variants | | Diseases | OpenTargets, ClinVar | Disease-target associations | | Literature | PubMed, Europe PMC | Literature search | | ML Models | ADMET-AI, AlphaFold | Predictions, modeling | | Pathways | KEGG, Reactome | Pathway analysis |
tools
Post-market safety surveillance and recall/adverse-event RETRIEVAL across the full spectrum of FDA-regulated products that are NOT covered by the drug-AE signal skills: medical devices, food / dietary supplements / cosmetics, veterinary drugs, and drug supply (shortages). Orchestrates openFDA endpoints (MAUDE device adverse events + device recalls + 510(k), CAERS food/supplement/ cosmetic adverse events, veterinary adverse events, drug shortages, and cross-product enforcement/recall reports). USE WHEN the user asks: "are there adverse events for [device / pacemaker / infusion pump / insulin pump]", "device recalls for [firm/product]", "supplement / vitamin / cosmetic adverse reactions", "is [drug] in shortage", "what injectables are on shortage", "veterinary / animal adverse events for [drug] in [dog/cat/horse]", "food recall for listeria", "MAUDE report for [device]", "CAERS reactions for [brand]". DO NOT USE for drug adverse-event SIGNAL detection or disproportionality (PRR / ROR / IC) or drug-AE association scoring — that is `tooluniverse-pharmacovigilance` / `tooluniverse-adverse-event-detection`. This skill is multi-product surveillance and retrieval, not drug-AE statistical signal mining.
tools
--- name: tooluniverse-phewas description: Cross-ancestry / cross-biobank phenome-wide association (PheWAS) and replication. Given ONE variant (rsID) or ONE gene, look up every phenotype it associates with across European/UK (UKB-TOPMed), Finnish (FinnGen), Japanese (BioBank Japan), and Taiwanese (TPMI) biobanks, plus exome-wide gene-burden PheWAS (Genebass), then judge whether an association replicates across ancestries or is population-specific. Use whenever the user asks "what else is this va
tools
Dereplicate a putative natural product and assign its chemical taxonomy. Use to answer "is [compound] a known natural product", "what microbe/organism produces [compound]", "what chemical class is [compound]", "dereplicate this metabolite (by formula/exact mass/InChIKey/SMILES)", or "classify this molecule into ChemOnt". Searches NPAtlas for known microbial natural products (producing organism + literature reference), assigns the ChemOnt kingdom→superclass→class→subclass hierarchy via ClassyFire, resolves systematic IUPAC names to structure via OPSIN, and cross-references identity in PubChem. NOT for general drug/compound identity or ADMET (use tooluniverse-chemical-compound-retrieval / tooluniverse-small-molecule-discovery) and NOT for metabolomics pathway/enrichment analysis (use tooluniverse-metabolomics skills).
tools
Genome-ASSEMBLY discovery, QC, and replicon mapping for any organism (bacteria, archaea, fungi, and beyond) using NCBI Datasets. Resolves an organism name or taxid to assemblies, picks the reference/representative or best-quality assembly, pulls assembly QC metrics (total length, contig/scaffold N50, contig count, GC%, assembly level, RefSeq category), enumerates chromosomes and plasmids via per-replicon sequence reports, and compares candidate assemblies on quality. Use for "what genomes are available for [organism]", "assembly stats / N50 / GC content for [GCF_/GCA_ accession]", "how many plasmids does [strain] have", "compare assemblies for [species]", "find the reference genome for [taxon]", "is this assembly Complete Genome or just contigs". NOT for gene-level orthology/synteny (use tooluniverse-comparative-genomics), plant gene structure (use tooluniverse-plant-genomics), de novo assembly from raw reads (no tool exists), or taxonomy-only name/lineage lookups.