
Diagnostic test / biomarker accuracy — sensitivity, specificity, PPV, NPV, likelihood ratios, accuracy from a 2x2 table; ROC curve, AUC, and the optimal cutoff (Youden) for a continuous biomarker; and post-test probability via Bayes. Use when you have test results vs a gold standard (binary 2x2, or a continuous score + true labels) and need to judge how good the test is, pick a threshold, or compute the probability of disease given a result. Emphasizes the prevalence-dependence of PPV/NPV.
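
The 2x2 metrics and the Bayes step named in this entry can be sketched in a few lines. This is a minimal illustration, not the skill's actual implementation; the function names and the example counts are hypothetical.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Accuracy metrics from a 2x2 table (test result vs gold standard)."""
    sens = tp / (tp + fn)          # true-positive rate
    spec = tn / (tn + fp)          # true-negative rate
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),     # depends on prevalence in the sample
        "npv": tn / (tn + fn),     # depends on prevalence in the sample
        "lr_pos": sens / (1 - spec),
        "lr_neg": (1 - sens) / spec,
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "youden_j": sens + spec - 1,
    }


def post_test_probability(pre_test_prob, likelihood_ratio):
    """Bayes on the odds scale: post-test odds = pre-test odds * LR."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Hypothetical table: 100 diseased, 900 healthy (prevalence 10%).
m = diagnostic_metrics(tp=90, fp=50, fn=10, tn=850)
at_10_percent = post_test_probability(0.10, m["lr_pos"])
at_1_percent = post_test_probability(0.01, m["lr_pos"])
```

With the table's own prevalence (10%) the post-test probability of a positive result reproduces the PPV; at 1% prevalence the same test gives a much lower value, which is the prevalence-dependence the entry emphasizes.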
Statistical modeling — linear/logistic/ordinal/Poisson regression, ANOVA, Kruskal-Wallis, chi-square, Mann-Whitney, Cox survival, spline fits (R `ns()`), odds ratios, Cohen's d, F-statistic, p-value computation. Specializes in clinical-trial AE analysis (SDTM DM/AE), severity ordinal regression, and per-feature stat workflows.
Pharmacokinetic (PK) analysis of concentration-time data — non-compartmental analysis (NCA) for Cmax, Tmax, AUC (0-t and 0-∞), terminal half-life, clearance (CL), volume of distribution (Vd), MRT, and absolute bioavailability (F). Also one-compartment fitting. Use when you have plasma/serum drug concentrations over time after a dose and need PK parameters, or to compute bioavailability from IV + oral AUCs. NOT for ADMET property prediction from structure (use tooluniverse-admet-prediction).
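
A minimal sketch of the NCA quantities listed in this entry, assuming linear-trapezoid AUC and a terminal slope taken from the last two samples only (a real analysis would typically regress log-concentration over three or more terminal points). Function names and data are hypothetical.

```python
import math


def nca_summary(times, concs):
    """Cmax, Tmax, AUC(0-t), AUC(0-inf), and a rough terminal half-life."""
    cmax = max(concs)
    tmax = times[concs.index(cmax)]
    # Linear trapezoidal rule over consecutive sampling intervals.
    auc_0_t = sum(
        (times[i + 1] - times[i]) * (concs[i] + concs[i + 1]) / 2
        for i in range(len(times) - 1)
    )
    # Simplification: terminal rate constant from the last two points only.
    lambda_z = (math.log(concs[-2]) - math.log(concs[-1])) / (times[-1] - times[-2])
    return {
        "cmax": cmax,
        "tmax": tmax,
        "auc_0_t": auc_0_t,
        "auc_0_inf": auc_0_t + concs[-1] / lambda_z,  # extrapolated tail
        "half_life": math.log(2) / lambda_z,
    }


def absolute_bioavailability(auc_oral, dose_oral, auc_iv, dose_iv):
    """F = dose-normalized oral AUC divided by dose-normalized IV AUC."""
    return (auc_oral / dose_oral) / (auc_iv / dose_iv)


pk = nca_summary(times=[0, 1, 2, 4], concs=[0, 10, 8, 4])
f_abs = absolute_bioavailability(auc_oral=26, dose_oral=100, auc_iv=52, dose_iv=100)
```

In this toy profile the concentration halves between 2 h and 4 h, so the estimated half-life is 2 h, and the oral AUC is half the IV AUC at equal dose, so F is 0.5.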
Enzyme kinetics — Michaelis-Menten Km, Vmax, kcat (turnover), and kcat/Km (catalytic efficiency / specificity constant) from substrate-velocity data, plus inhibition-mechanism analysis (competitive / uncompetitive / non-competitive, Ki). Fits the MM equation by nonlinear regression (and reports Lineweaver-Burk for reference). Use when you have substrate concentrations and initial reaction velocities and need kinetic parameters or to classify an inhibitor. NOT for BRENDA database lookups of published constants (use the BRENDA tools).
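
As a reference for the Lineweaver-Burk step mentioned in this entry, the double-reciprocal transform can be sketched with a plain least-squares line. This is only the reference linearization, not the nonlinear fit the skill uses for its primary estimates, and it weights low-substrate points heavily when data are noisy. The function name and data are hypothetical.

```python
def lineweaver_burk(substrate, velocity):
    """Estimate Km and Vmax from a straight-line fit of 1/v against 1/[S].

    Michaelis-Menten: v = Vmax * S / (Km + S)
    Double reciprocal: 1/v = (Km / Vmax) * (1/S) + 1/Vmax
    """
    x = [1 / s for s in substrate]
    y = [1 / v for v in velocity]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
        (xi - mean_x) ** 2 for xi in x
    )
    intercept = mean_y - slope * mean_x
    vmax = 1 / intercept
    km = slope * vmax
    return km, vmax


# Noise-free synthetic data generated with Km = 5 and Vmax = 100.
s = [1, 2, 5, 10, 20]
v = [100 * si / (5 + si) for si in s]
km, vmax = lineweaver_burk(s, v)

# kcat follows from Vmax and total enzyme concentration (same units assumed).
enzyme_total = 0.5
kcat = vmax / enzyme_total
catalytic_efficiency = kcat / km
```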
ToolUniverse plugin router. STEP 1 BEFORE ANY ANALYSIS: if the data folder contains `*_executed.ipynb`, run `tu run read_executed_notebook '{"data_folder":"<path>","search":"<keyword>"}'` to extract its cell outputs and apply EVERY filter/sample-exclusion the notebook used — even when the question says 'Using DESeq2/Run X/Compute Y' (this describes the METHOD the notebook used, not a request to rerun). The notebook's cell outputs are the only published authoritative answers; reimplementing or reading stale pre-computed CSVs in the data folder produces different numbers because of outlier-sample removal, library version, and filter steps you don't see by skimming. STEP 2 routing — pick a sub-skill name from this exact list (never invent): tooluniverse-rnaseq-deseq2 (RNA/miRNA-seq DE, correlation, PCA, clustering, dispersion), tooluniverse-gene-enrichment (GO/KEGG/Reactome/GSEA/pathway enrichment), tooluniverse-statistical-modeling (regression, ANOVA, ordinal/logistic, chi-square, correlation, power), tooluniverse-image-analysis (microscopy, colony, fluorescence, dose-response, .tif — including ANOVA / Dunnett / power-analysis on image-derived measurements), tooluniverse-epigenomics (DNA methylation, CpG, m6A, MeRIP-seq, bisulfite, ChIP-seq, chromatin), tooluniverse-sequence-analysis (FASTQ, Trimmomatic, BWA, samtools, coverage), tooluniverse-variant-analysis (VCF, VAF, SNP, mutation), tooluniverse-phylogenetics (treeness, PhyKIT, parsimony), tooluniverse-single-cell (scRNA, h5ad, scanpy), tooluniverse-crispr-screen-analysis (MAGeCK, sgRNA), tooluniverse-proteomics-analysis (mass spec, TMT). Use for CSV/Excel/VCF/FASTA/h5ad and any biology/chemistry/medicine analysis question.
Molecular cloning assembly design — Gibson Assembly (overlap design for seamless multi-fragment joining) and Golden Gate Assembly (Type IIS / BsaI / BbsI design with unique 4-bp fusion overhangs). Use when you need to plan how to join DNA fragments into a construct, design assembly overlaps/overhangs, or decide between cloning methods. Covers the domestication (internal-site removal), overhang-uniqueness, and overlap-Tm rules. For PCR primers to generate the fragments, see tooluniverse-primer-design.
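
The overhang-uniqueness rule for Golden Gate can be illustrated with a small checker. The specific checks here (4-bp length, no palindromes, no duplicates, no reverse-complement clashes) are commonly used design rules and are an assumption about what the skill enforces; the function names are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def check_overhangs(overhangs):
    """Return a list of problems found in a set of 4-bp fusion overhangs."""
    problems = []
    seen = set()
    for raw in overhangs:
        oh = raw.upper()
        rc = reverse_complement(oh)
        if len(oh) != 4:
            problems.append(f"{oh}: not 4 bp")
        if oh == rc:
            # A palindromic overhang can ligate to another copy of itself.
            problems.append(f"{oh}: palindromic")
        if oh in seen:
            problems.append(f"{oh}: duplicate")
        elif rc in seen:
            # Matches the complementary strand of an earlier junction.
            problems.append(f"{oh}: reverse complement of another overhang")
        seen.add(oh)
    return problems


ok_set = check_overhangs(["AATG", "GCTT", "TACA"])
clash = check_overhangs(["AATG", "CATT"])
palindrome = check_overhangs(["ACGT"])
```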
Drug-combination synergy analysis — quantify whether two drugs together are synergistic, additive, or antagonistic using the standard reference models (Bliss independence, HSA / highest single agent, Loewe additivity, ZIP, and the Chou-Talalay Combination Index). Use when you have measured single-drug and combination effects (inhibition/viability) and need a synergy score. Explains which model to use, what data each one needs, and how to read the score. NOT for looking up pre-computed synergy in a database (use the SYNERGxDB tool / cell-line-profiling skill).
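
Two of the reference models named in this entry reduce to one-line formulas when effects are expressed as fractional inhibition (0 to 1). The sketch below covers only Bliss and HSA; Loewe, ZIP, and the Combination Index need dose-response curves and are not shown. Function names and effect values are hypothetical.

```python
def bliss_excess(effect_a, effect_b, effect_combo):
    """Observed minus Bliss-expected effect (fractional inhibition, 0-1).

    Bliss independence expects E_ab = E_a + E_b - E_a * E_b.
    Positive excess suggests synergy, negative suggests antagonism.
    """
    expected = effect_a + effect_b - effect_a * effect_b
    return effect_combo - expected


def hsa_excess(effect_a, effect_b, effect_combo):
    """Observed effect minus the highest single-agent effect."""
    return effect_combo - max(effect_a, effect_b)


# Hypothetical measurements: 40% and 30% inhibition alone, 70% combined.
bliss = bliss_excess(0.40, 0.30, 0.70)
hsa = hsa_excess(0.40, 0.30, 0.70)
```

Here the Bliss expectation is 0.58, so the combination exceeds it by 0.12, while the HSA excess is 0.30. HSA is the more permissive reference, which is one reason the choice of model matters.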
Meta-analysis / evidence synthesis — pool effect sizes across studies (odds ratios, risk ratios, hazard ratios, mean differences, correlations, GWAS betas) with fixed- or random-effects models, quantify heterogeneity (Q, I², τ²), and build a forest plot. Use when you have results from MULTIPLE studies and need a single pooled estimate, or to synthesize evidence from a systematic review / multiple GWAS / replicated experiments. Handles the error-prone effect-size + standard-error preparation (converting OR/HR/CI, two-group means±SD, proportions, and correlations into the (effect, SE) the pooling step needs).
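
The effect-size preparation and pooling steps described in this entry can be sketched as follows, assuming inverse-variance weights and the DerSimonian-Laird estimator for τ² (one common choice; the entry does not say which estimator the skill uses). Function names and inputs are hypothetical.

```python
import math


def se_from_ratio_ci(lower, upper, z=1.959964):
    """SE of a log ratio (OR/RR/HR) recovered from its 95% CI bounds."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def pool(effects, ses):
    """Fixed- and random-effects pooling with Q, I^2, and tau^2."""
    w = [1 / se ** 2 for se in ses]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                      # DerSimonian-Laird
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1 / (se ** 2 + tau2) for se in ses]
    random = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return {
        "fixed": fixed,
        "se_fixed": math.sqrt(1 / sum(w)),
        "random": random,
        "se_random": math.sqrt(1 / sum(w_re)),
        "q": q,
        "i2": i2,
        "tau2": tau2,
    }


# Ratios must be pooled on the log scale: effect = log(OR), SE from the CI.
se_example = se_from_ratio_ci(1.0, 4.0)
homogeneous = pool([0.5, 0.5, 0.5], [0.1, 0.2, 0.3])
heterogeneous = pool([0.1, 0.5, 0.9], [0.1, 0.1, 0.1])
```

When all studies report the same effect, Q is zero and the fixed- and random-effects estimates coincide; when effects disagree more than their standard errors allow, τ² becomes positive and the random-effects standard error widens.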
Dose-response / concentration-response curve fitting — IC50, EC50, Hill slope, Emax/Emin efficacy, and relative potency from paired concentration vs response data (enzyme/cell assays, drug screening, agonist/antagonist pharmacology). Fits the 4-parameter logistic (Hill sigmoidal) model. Use when you have concentrations + responses and need a potency value, to compare two compounds' potency, or to judge curve quality. NOT for image-derived dose-response (use tooluniverse-image-analysis) and NOT for survival/regression (use tooluniverse-statistical-modeling).
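
For reference, one common parametrization of the 4-parameter logistic model is sketched below (inhibition form, response falling from `top` to `bottom` as concentration rises). Parameter conventions vary between tools, so the exact form used by the skill is an assumption here, and the fitting step itself (nonlinear least squares) is not shown. Names and values are hypothetical.

```python
def four_pl(conc, bottom, top, ic50, hill):
    """4-parameter logistic (Hill) response at a given concentration.

    With hill > 0 the response decreases from `top` (conc -> 0)
    to `bottom` (conc -> infinity), passing the midpoint at conc = ic50.
    """
    return bottom + (top - bottom) / (1 + (conc / ic50) ** hill)


def relative_potency(ic50_reference, ic50_test):
    """Values above 1 mean the test compound is more potent than the reference."""
    return ic50_reference / ic50_test


midpoint = four_pl(1.0, bottom=0.0, top=100.0, ic50=1.0, hill=1.5)
at_zero = four_pl(0.0, bottom=0.0, top=100.0, ic50=1.0, hill=1.5)
at_high = four_pl(1000.0, bottom=0.0, top=100.0, ic50=1.0, hill=1.5)
potency = relative_potency(ic50_reference=2.0, ic50_test=0.5)
```

The midpoint check (response halfway between top and bottom at the IC50) is a quick sanity test for any fitted curve.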
Universal data access patterns for downloading and parsing scientific data when ToolUniverse tools don't cover the source, only return metadata, or you need bulk records. Use for VCF/h5ad/BAM/SDF/GCT parsing, multi-step API workflows (search to filter to download to parse), thousands of records at once, or sources with no dedicated tool. Write Python code via Bash for every step.
PCR / qPCR primer and oligo design — design forward/reverse primers for a target region (SantaLucia nearest-neighbor thermodynamics), compute melting temperature (Tm) and annealing temperature (Ta), check GC content, and screen an oligo for hairpins and primer-dimers. Use when you need primers for a sequence, want to QC an existing primer pair, or need the Tm of an oligo. Covers the primer-design rules (Tm matching, GC clamp, 3'-end, length) and the tools' constraint quirks.
Mendelian randomization (MR) causal inference — does an exposure, risk factor, or biomarker CAUSALLY affect a disease/outcome, using genetic variants as instrumental variables (IEU OpenGWAS / EpiGraphDB MR-EvE). Use this whenever the user asks if X causes Y, whether an observational association is actually causal or just correlation, if a biomarker/trait is a causal risk factor, wants to triangulate epidemiology against genetic evidence, or mentions Mendelian randomization, instrumental-variable analysis, two-sample MR, or genetic causal evidence — even if they never say "MR" (e.g. "is LDL cholesterol actually causal for heart disease?", "does BMI cause type 2 diabetes or just correlate?", "is CRP a causal driver of stroke?"). Covers trait-label resolution, MR effect direction/magnitude, instrument quality (MOE score), method agreement (IVW vs MR-Egger vs weighted median), bidirectional MR for reverse causation, and distinguishing causation from genetic correlation. Not for plain GWAS association lookups (use the GWAS skills) or fitting your own instruments from raw summary statistics.
Deep literature review — PubMed, EuropePMC, bioRxiv preprints, citation networks, evidence synthesis. Disambiguates queries, runs collision-aware searches, grades evidence T1-T4, and produces structured reports. Use for systematic literature review, meta-analysis evidence collection, and detailed answer-with-citations workflows.
Install the ToolUniverse Claude Code plugin in one step — provides MCP server with 1000+ scientific tools, 120+ research skills, slash commands, hooks, and the research agent. Use for first-time plugin install, troubleshooting plugin not loading, verifying MCP server connection, listing API key requirements, or configuring auto-update.
Clinical variant interpretation from raw variant calls to ACMG-classified recommendations with structural impact analysis. Use for VUS classification, pathogenicity assessment with cited criteria, structure-based variant impact (AlphaFold/PDB), and producing clinical-grade variant reports for return of results or molecular tumor boards.
Functional annotation of protein variants — ProtVar structural/functional context, ClinVar clinical classifications, gnomAD population frequencies, CADD deleteriousness, ClinGen gene-disease validity. Use for variant annotation pipelines, missense effect prediction, and protein-level variant interpretation with functional context.
Comprehensive drug-target intelligence — tissue expression (GTEx, HPA), pathways, protein interactions (STRING), variant landscape (ClinVar, gnomAD), druggability (DGIdb, ChEMBL approved drugs). 9 parallel research paths with citations. Use for full target profile reports, target characterization for drug discovery, and 'tell me about target X' queries.
Computational vaccine candidate design: peptide/subunit vaccines via MHC-I/MHC-II epitope prediction (IEDB), population HLA coverage optimization, B-cell epitope identification, and cross-strain conservation analysis. Use for vaccine epitope prediction, HLA allele coverage, multi-epitope construct design, and immunogenicity assessment. Combines predicted MHC binding with experimentally validated IEDB epitopes for higher-confidence designs.
Structural variant (SV) clinical interpretation: deletions, duplications, inversions, translocations, complex rearrangements. Applies ACMG-adapted criteria with ClinGen HI/TS dosage scores, gnomAD frequencies, and ClinVar evidence. Produces 5-tier classification with explicit per-criterion evidence. Use for clinical genomics SV review, dosage-sensitivity assessment, breakpoint analysis, and CNV pathogenicity calls. Gene-dosage-driven reasoning.
Structural biology plus proteomics integration for drug target validation. Combines PDB experimental structures, AlphaFold predictions, GPCRdb, SAbDab antibody structures, ProteinsPlus binding-site prediction, and BindingDB ligand-affinity data. Use for druggability assessment, binding-site characterization, ligand-pocket analysis, structural-confidence scoring (resolution, pLDDT), and antibody-target interface analysis.
Validate a variant-effect predictor (AlphaMissense, ESM-C SAE, ESM logits, EVE, conservation scores, or any per-variant numeric score) against experimental deep mutational scanning (DMS) data. Computes per-variant predictor scores, splits variants into neutral vs disruptive groups by DMS effect, runs a Mann-Whitney U test on the predictor scores, and sweeps the stratification thresholds for robustness. Use when you need to know whether a predictor's scores track real functional disruption on a specific protein.
End-to-end variant-to-mechanism analysis — trace a variant (rsID/coordinates) through regulatory context, target gene(s), molecular pathway(s), and phenotypic consequences. Integrates 7+ databases across 3 evidence layers (regulatory, molecular, disease) for a mechanistic model. Use for GWAS-hit-to-mechanism, eQTL-causal-gene tracing, and full causal-chain reports.
Stem cell, iPSC, and organoid research — pluripotency markers, differentiation protocol pathways, lineage commitment factors, organoid model selection. Use for iPSC characterization, differentiation protocol design via developmental-pathway recapitulation, and organoid-model selection for disease modeling.
Drug and chemical toxicity assessment via adverse outcome pathways (AOPs), real-world FAERS adverse event signals, FDA labels, and toxicogenomic associations. Triangulates molecular initiating event to cellular outcome to organ-level toxicity to clinical adverse event. Use for hepatotoxicity/cardiotoxicity/nephrotoxicity prediction and toxicology reports.
VCF and variant analysis — parsing, annotation, classification (synonymous, missense, frameshift, stop_gained), VAF filtering, coding vs non-coding categorization, multi-condition variant comparison. Use for VCF parsing, variant fraction calculations (denominator = coding subset only, NOT all variants), and per-sample mutation profiling.
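
The denominator rule in this entry (fractions are taken over the coding subset, not over all variants) is easy to get wrong, so a small sketch may help. The coding classes come from the entry itself; the non-coding labels and the function name are hypothetical.

```python
# Coding consequence classes named in the entry above.
CODING_CLASSES = {"synonymous", "missense", "frameshift", "stop_gained"}


def coding_fraction(consequences, target):
    """Fraction of CODING variants with the target consequence.

    Non-coding variants are dropped from the denominator.
    Returns None when there are no coding variants at all.
    """
    coding = [c for c in consequences if c in CODING_CLASSES]
    if not coding:
        return None
    return sum(c == target for c in coding) / len(coding)


calls = ["missense", "missense", "synonymous", "intron", "intergenic", "stop_gained"]
missense_fraction = coding_fraction(calls, "missense")
```

For these six calls the missense fraction is 2/4 = 0.5, not 2/6, because the two non-coding calls are excluded before dividing.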
Systems biology and pathway analysis integrating Reactome, KEGG, WikiPathways, BioCarta, NCI-Nature Pathway Interaction Database. Multi-database pathway enrichment, protein-pathway relationships, network reasoning. Use for pathway analysis on a gene list, multi-source pathway concordance, and systems-level interpretation across databases.
Biological sequence analysis — gene/protein sequence retrieval (NCBI, Ensembl, UniProt), nucleotide/protein search, ortholog discovery, and FASTQ QC + alignment workflows (Trimmomatic, BWA, samtools, coverage depth). Use for sequence retrieval, sequence comparison, FASTQ QC analysis, and read alignment pre-processing.
Microbiome and metagenomics analysis using MGnify, GTDB taxonomy, ENA sequencing data, and EuropePMC literature. Covers taxonomic classification, genome quality assessment, biome-clinical phenotype linkage, and pathway interpretation. Use for amplicon/shotgun metagenomics study analysis.
Microbiome research using MGnify, GTDB, ENA, OLS (ENVO biomes), and EuropePMC. Covers study discovery, taxonomic profiling, host-microbe interaction analysis, and biome-by-condition queries. Use for microbiome study selection, organism-environment associations, and clinical-microbiome literature review. Distinct from the analytical workflow (use tooluniverse-metagenomics-analysis for that).
Cross-species genetic analysis using model organism databases (MGI mouse, ZFIN zebrafish, FlyBase fruit fly, WormBase worm, SGD yeast, RGD rat, GBIF taxonomy). Maps human genes to orthologs, retrieves phenotype/expression/functional data, assesses gene function conservation, and identifies the best animal model for studying a human gene or disease.
Multi-omics integration — orchestrate per-layer analysis (transcriptomics, proteomics, epigenomics, genomics, metabolomics) then perform cross-omics correlation, multi-omics clustering, and pathway-level integration. Use for integrative systems-biology analysis, multi-modal disease characterization, and cross-omics biomarker discovery.
Metabolomics research — metabolite identification, study analysis, and database searches across HMDB, MetaboLights, Metabolomics Workbench, KEGG. Use for annotating mass-spec features to known metabolites, finding metabolomics studies of a disease, and structured metabolomics research reports with metabolite-pathway mapping.
Neuroscience research workflows: neuroanatomy, neural circuits, neurotransmitter biology, neurological/psychiatric disease genetics, neural-protein function. Uses Allen Brain Atlas, WormBase (C. elegans connectome), UniProt for neural proteins, PubMed for primary literature. Use for brain-region biology, neural development, neurodegeneration mechanisms (Alzheimer's, Parkinson's, ALS), and synaptic-protein characterization.
Non-coding RNA analysis — miRNAs (miRBase, miRDB targets), lncRNAs (LNCipedia, RNAcentral), circRNAs, snoRNAs, and other ncRNA classes. Distinct mechanisms per class — miRNAs repress mRNA; lncRNAs scaffold/decoy/enhance. Use for ncRNA function prediction, miRNA-target prediction, lncRNA functional annotation, and ncRNA-disease association queries.
Compound-target-disease network construction and analysis for drug repurposing, polypharmacology discovery, and multi-target drug design. Uses STRING, BioGRID, ChEMBL, DGIdb, OMIM, OpenTargets. Use for off-target effect prediction, network-based drug repurposing, and identifying molecules with a desired multi-target profile.
Comprehensive disease characterization across genomics, transcriptomics, proteomics, and pathways for systems-level understanding. Identifies therapeutic opportunities and biomarker candidates by integrating multi-layer molecular data. Use for full-omics disease deep-dive reports, mechanism mapping, and biomarker-and-target identification from multi-omics data.
Connect GWAS variants to biological pathways and druggable targets. Maps GWAS hits to causal genes (via fine-mapping/eQTL), then to pathways (Reactome, KEGG, WikiPathways), then to existing drugs hitting those pathways. Use for pathway-level disease mechanisms, druggable-pathway prioritization from GWAS, SNP-to-pathway-to-target tracing, and tissue-specific eQTL evidence for drug target hypotheses.
Organic chemistry reasoning guide for reaction product prediction, mechanism analysis (electrophilic/nucleophilic substitution, addition, elimination, pericyclic, radical), and spectroscopy interpretation (1H/13C NMR, IR, MS). Reasons from first principles (electron flow, kinetic vs thermodynamic) rather than pattern-matching named reactions. Use for organic synthesis problems and mechanism explanations.
Pharmacogenomics (PGx) research — drug-gene interactions (CPIC, PharmGKB), CPIC dosing guidelines, variant-drug-response associations, ethnic-allele-frequency considerations, and metabolizer-status scoring. Use for PGx-informed dosing recommendations, CYP/HLA pharmacogenomic allele interpretation, and clinically-actionable PGx report generation.
Plant genomics and biology research — PlantReactome pathways, Ensembl Plants gene structure, POWO species taxonomy, UniProt annotation, KEGG plant pathways. Handles polyploidy (wheat hexaploidy etc.) and homeologous gene copies. Use for crop-gene annotation, plant secondary metabolism queries, and plant-disease/stress-response biology.
Drug safety and adverse event analysis — FAERS spontaneous-report mining, FDA black-box warnings, signal detection (PRR, ROR, IC), risk factors by demographic/comorbidity, and label change tracking. Use for post-market safety surveillance, AE signal investigation, drug-AE association strength scoring, and pharmacovigilance reports.
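
Two of the disproportionality measures named in this entry (PRR and ROR) come straight from a 2x2 table of report counts. The sketch below shows point estimates only, without confidence intervals or the IC; the function name and counts are hypothetical.

```python
def disproportionality(a, b, c, d):
    """PRR and ROR from spontaneous-report counts.

    a: target drug, target event      b: target drug, other events
    c: other drugs, target event      d: other drugs, other events
    """
    prr = (a / (a + b)) / (c / (c + d))
    ror = (a * d) / (b * c)
    return prr, ror


prr, ror = disproportionality(a=20, b=80, c=100, d=9800)
```

In this example the event makes up 20% of reports for the drug versus about 1% for all other drugs, giving a PRR of 19.8 and an ROR of 24.5.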
Phylogenetic analysis — tree analysis, treeness, saturation (PhyKIT), parsimony-informative sites, alignment gap analysis, MAFFT alignment, DVMC, long-branch detection, BUSCO orthologs. Uses PhyKIT, Biopython, DendroPy. Use for phylogenetic tree QC, multi-gene phylogenomics, evolutionary-rate analysis, and comparative-genomics studies.
Population genetics using the 1000 Genomes Project (IGSR) — superpopulation/population search, sample metadata, variant frequencies across AFR/AMR/EAS/EUR/SAS, ancestry-specific analyses. Use for ancestry comparison, population-aware allele frequency lookups, and 1000-Genomes-cohort-specific analyses (distinct from gnomAD which has different sample composition).
Build and interpret polygenic risk scores (PRS) for complex diseases using GWAS summary statistics. Covers PRS construction (clumping/thresholding, PRS-CS), validation in independent cohorts, ancestry-aware adjustment, and clinical interpretation (population-relative risk, not absolute prediction). Use for PRS-based risk stratification.
Cancer treatment recommendations from molecular profile (mutations + cancer type + biomarkers) — FDA-approved + investigational therapies, resistance mechanisms, matching clinical trials, prognosis. Uses CIViC, ClinVar, OpenTargets, ClinicalTrials.gov. Use for tumor-board treatment recommendations, evidence-tiered actionability assessment, and FDA-precedent-driven therapy selection.
Population genetics analysis — allele frequencies (gnomAD, 1000 Genomes), Hardy-Weinberg equilibrium testing, Fst between populations, GWAS associations, evolutionary constraint scores. Use for cross-population variant comparison, ancestry-aware allele frequency lookups, and population-level evolutionary analysis.
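
The Hardy-Weinberg test mentioned in this entry can be sketched as a chi-square goodness-of-fit on genotype counts for a biallelic locus. This assumes both alleles are present (otherwise an expected count is zero) and uses the 1-degree-of-freedom critical value 3.841 for the 0.05 level; the function name and counts are hypothetical.

```python
def hwe_chi_square(n_aa_hom_ref, n_het, n_aa_hom_alt):
    """Chi-square statistic for deviation from Hardy-Weinberg proportions."""
    n = n_aa_hom_ref + n_het + n_aa_hom_alt
    p = (2 * n_aa_hom_ref + n_het) / (2 * n)   # reference allele frequency
    q = 1 - p
    expected = [p * p * n, 2 * p * q * n, q * q * n]
    observed = [n_aa_hom_ref, n_het, n_aa_hom_alt]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


CRITICAL_1DF_05 = 3.841  # chi-square critical value, 1 df, alpha = 0.05

in_equilibrium = hwe_chi_square(360, 480, 160)
het_deficit = hwe_chi_square(400, 400, 200)
rejects_hwe = het_deficit > CRITICAL_1DF_05
```

Both examples have an allele frequency of 0.6; the first matches the expected 360/480/160 split, while the second has too few heterozygotes and a large statistic.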
Patient stratification for precision medicine — integrate genomic, clinical, and therapeutic data to split patients into responder/non-responder groups, risk tiers, or treatment-decision groups. Use for stratification-by-biomarker, treatment-selection logic, and personalized therapeutic strategy reports per patient subgroup.
Propose the mechanism by which a missense variant causes loss-of-function (LoF), synthesizing evidence from 5 independent layers: AlphaMissense pathogenicity, AlphaFold structural context, ESMC sequence likelihood, SAE feature disruption, and DynaMut2 stability ΔΔG. Distinguishes 'structural stability LoF' (mis-folding) from 'direct functional disruption' (catalytic / binding / PTM site damage). Use for coding missense variants where you need a mechanistic causal model, not just a pathogenicity score.
Given a PDB structure, produce a per-residue annotation table: which residues sit at a binding interface (vs a partner chain), which line a ligand pocket, which are buried (core) vs solvent-exposed (surface), and optionally secondary structure. This is the structural track drawn under a DMS heatmap and the structural prior SAE feature drops are read against. Use when you need to anchor a variant-interpretation or DMS analysis to the protein's actual physical context.
Protein 3D structure prediction from sequence — ESMFold de novo prediction, AlphaFold database retrieval, experimental structures from RCSB, ProtVar variant impact assessment, ProtParam sequence properties. Use for structure prediction when no experimental structure exists, fold-confidence scoring, and structure-guided variant interpretation.
Post-translational modification (PTM) analysis — phosphorylation, ubiquitination, acetylation, glycosylation, methylation. Uses iPTMnet (sites + enzymes), ProtVar (functional consequences), UniProt (baseline), STRING, ELM (linear motifs), MassIVE/ProteomeXchange (experimental). Use for PTM site annotation, kinase-substrate identification, and PTM-disease associations.
Interpret a missense variant via ESMC-6B Sparse Autoencoder (SAE) feature activations. For a given protein + variant, computes which interpretable SAE features (catalytic, ligand-binding, PTM, structural motif, domain, etc.) are lost or gained at the mutation site. Use when standard pathogenicity scores (AlphaMissense, ClinVar) say a variant is damaging but you need a MECHANISTIC explanation — e.g. 'why is this variant LoF?' Complements (does not replace) variant-interpretation and variant-to-mechanism skills, which focus on ACMG classification or regulatory mechanism.
Protein structure retrieval from RCSB PDB, PDBe, and AlphaFold with disambiguation, quality assessment (resolution, R-factor, pLDDT), and metadata. Distinguishes high-quality experimental (X-ray under 2 Angstrom) vs predicted vs medium-quality structures. Use for fetching protein structures, structure-quality comparison, and selecting structures for drug design or modeling.
Protein-protein interaction (PPI) network analysis — STRING (predicted + experimental), BioGRID (curated), SASBDB (small-angle scattering). Distinguishes physical interactions (binding) from functional associations (co-expression, co-regulation). Use for interactome queries, complex partner identification, and pathway-level interaction analysis.
Mass-spec proteomics analysis — protein identification, quantification (LFQ, TMT, iTRAQ), differential expression (tumor vs normal, treatment vs control), PTM identification, and pathway enrichment on protein lists. Use when you have proteomics MS output, are asking about protein abundance differences, or are doing systems-level proteomic interpretation.
AI-guided de novo protein design — RFdiffusion backbone generation, ProteinMPNN sequence design, structure validation (pLDDT, pTM, MPNN scores). Use for designing therapeutic protein binders, novel scaffolds, enzyme variants, and miniprotein/protein-interface design before experimental validation.
Find and retrieve proteomics datasets from MassIVE and ProteomeXchange. Search by species, keyword, or accession; retrieve detailed metadata (instruments, publications, species, PTMs studied). Use for locating public proteomics datasets to reanalyze, comparing instrument/protocol coverage across studies, and pre-download dataset evaluation.
Rare disease differential diagnosis from patient phenotype — HPO term matching to candidate diseases (Orphanet, OMIM), gene panel prioritization, ACMG variant interpretation, and structure-based variant analysis. Use for diagnostic odyssey assistance, phenotype-to-disease ranking, and genetic-counseling differential generation.
Rare disease genomics — disease identification (Orphanet), causative gene discovery, gene-disease validity (GenCC), variant interpretation (ClinVar), and translational research (ClinicalTrials.gov, drug repurposing for orphans). Use for rare-disease-gene curation, novel-gene-discovery analysis, and rare-disease drug-development support.
Transcription factor binding, cis-regulatory elements (cCREs), chromatin accessibility, and regulatory annotation using JASPAR (motifs), ENCODE (cCREs, ChIP-seq), RegulomeDB (regulatory variant scoring), UCSC. Use for regulatory element annotation, TF-binding-site prediction, and regulatory-region functional impact assessment.
Non-coding/regulatory variant interpretation — GWAS association lookup, eQTL evidence (GTEx), chromatin state (ENCODE), regulatory variant scoring (RegulomeDB, CADD), and TF-binding disruption. Use for non-coding GWAS hit interpretation, eQTL-based gene assignment, and regulatory mechanism reasoning. Distinct from coding-variant tools.
Given a set of residues in a protein, explain WHY they are functionally critical by combining structural context (binding interface, ligand pocket, core, secondary structure), UniProt features (active sites, binding sites, PTM sites, disulfides), optional SAE feature evidence, and optional DMS data. Accepts residues from any source: DMS hotspots (top-K by max effect), ClinVar recurrent variants, literature-reported hot regions, evolutionarily conserved positions, or user-curated lists. Returns a per-cluster mechanism call: catalytic / ligand-binding / interface / structural-core / PTM / regulatory / unknown.
Build AI scientist systems with the ToolUniverse Python SDK for scientific research. Covers the 3 calling patterns (`tu.run` portable dict API, `tu.tools.X` function API, direct class instantiation), tool loading, batch execution, MCP server integration, and embedding-based tool search. Use for SDK programming, custom tool composition, benchmarking pipelines, and integrating ToolUniverse into research workflows.
Retrieve DNA/RNA/protein sequences from NCBI and ENA with disambiguation. Quality hierarchy: RefSeq (NM_/NP_) > RefSeq predicted (XM_/XP_) > GenBank submissions. Use for fetching specific sequences by accession, gene-symbol-to-sequence lookup, transcript-isoform retrieval, and curated-vs-raw-submission preference.
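The quality hierarchy in the entry above can be sketched as a simple prefix ranking. This is a minimal illustration, not the skill's actual implementation: the function names `accession_tier` and `rank_accessions` are hypothetical, the accession strings are illustrative only, and only the prefixes named in the description are handled (other RefSeq prefixes fall into the last tier here).

```python
def accession_tier(accession: str) -> int:
    """Return 0 for curated RefSeq, 1 for predicted RefSeq, 2 for anything else."""
    prefix = accession.strip().upper()[:3]
    if prefix in ("NM_", "NP_"):
        return 0
    if prefix in ("XM_", "XP_"):
        return 1
    return 2  # GenBank submissions and any prefix not named in the description


def rank_accessions(accessions):
    """Sort accessions best-first; ties keep input order because sorted() is stable."""
    return sorted(accessions, key=accession_tier)


# Illustrative accession strings only.
candidates = ["XM_011541469.3", "U12345.1", "NM_000546.6"]
ranked = rank_accessions(candidates)
print(ranked)
```

Ranking by a small integer tier keeps the preference order explicit and easy to extend if further prefixes need their own tier.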
Small molecule identification, characterization, and procurement — PubChem, ChEMBL, BindingDB, ADMET-AI, SwissADME, eMolecules, Enamine. Covers compound name to structure to activity to ADMET properties to commercial sourcing. Use for chemical biology, lead identification, probe selection, and the full small-molecule discovery pipeline.
Spatial multi-omics interpretation pipeline. Transforms spatially variable genes (SVGs), domain annotations, and tissue context into biological insights via domain-by-domain characterization, cell-type composition, spatial gene expression patterns, RNA+protein+metabolite integration. Use for Visium, MERFISH, seqFISH, Slide-seq, spatial proteomics, and spatial multi-omics interpretation. Goes beyond statistics to disease mechanisms and therapeutic opportunities.
RNA-seq differential expression analysis with DESeq2: DEG lists, fold changes, dispersion estimation, design formulas including covariates, multi-condition contrasts, and Venn-set operations across groups. Use when you have a count matrix + metadata, want to find DEGs, or need dispersion/PCA/clustering analysis. Includes RULE ZERO precedence (read `*_executed.ipynb` if present).
Single-cell RNA-seq analysis with scanpy/anndata — h5ad data loading, QC (mitochondrial fraction, doublets), normalization, dimensionality reduction (PCA, UMAP, t-SNE), clustering (Leiden, Louvain), marker gene identification, cell-type annotation, pseudotime/trajectory analysis. Use for any scRNA-seq workflow.
Spatial transcriptomics analysis — Visium, MERFISH, seqFISH, Slide-seq. Maps gene expression to tissue architecture, identifies spatially variable genes (SVGs), tissue-domain segmentation, and cell-cell interaction inference. Use for spatial gene-expression questions, tissue architecture analysis, and SVG identification.
Find commercial sources for chemical compounds — PubChem/ChEMBL identity resolution then vendor catalog search across ZINC, Enamine, eMolecules, Mcule. Compares pricing, availability, and identifies purchasable analogs when an exact compound is not in stock. Use for chemical procurement, virtual library curation, and 'where can I buy X' questions for synthesis planning.
Cancer cell-line selection and profiling for experimental model choice. Cross-references DepMap, Cellosaurus, COSMIC, PharmacoDB to deliver identity verification, mutation/CNV profile, gene dependencies, drug sensitivities, and druggable targets. Use to answer 'which cell line should I use for studying gene X?' or 'is this cell line a good model for cancer Y?'. Outputs ranked recommendations with rationale, growth characteristics, and known pitfalls.
Analyze metabolomics data end-to-end — metabolite identification, quantification (TIC normalization, batch correction), differential analysis, and pathway interpretation. Use for processing mass-spec metabolomics output, normalization choice, untargeted metabolomics workflows, and integrating with other omics layers.
Integrate computed statistical results (DEGs, GWAS hits, associations) with biological context from ToolUniverse databases (UniProt, GO, Reactome, ClinVar, OpenTargets). Use for adding gene function/pathway/disease annotations to a result list, building biological narrative around statistical findings, and going beyond p-values to mechanism.
Search and retrieve clinical practice guidelines from 12+ authoritative sources — NICE, WHO, NCCN, AHA, ADA, SIGN, USPSTF, IDSA, NIH consensus, ESMO/ESC/EASL European societies, and US specialty associations. Use for evidence-graded treatment recommendations, dosing protocols, screening guidance, and authoritative-source-prioritized clinical guidance (NICE/WHO ranked above society guidelines).
End-to-end drug safety review integrating FDA labels, FAERS adverse event reports, PRR/ROR disproportionality, pharmacogenomic biomarkers, clinical trial data, and published literature. Use for regulatory drug safety reviews, comprehensive pharmacovigilance reports, label-vs-real-world AE comparison, and clinical decision support for drug safety.
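The PRR/ROR disproportionality measures named above come from a 2x2 table of spontaneous reports. The sketch below uses the standard definitions; the function names and the counts are made up for illustration and are not taken from FAERS or from the skill itself.

```python
def prr(a: int, b: int, c: int, d: int) -> float:
    """Proportional reporting ratio.

    a: reports with the drug and the event
    b: reports with the drug, without the event
    c: reports with other drugs and the event
    d: reports with other drugs, without the event
    """
    return (a / (a + b)) / (c / (c + d))


def ror(a: int, b: int, c: int, d: int) -> float:
    """Reporting odds ratio, (a/b) / (c/d)."""
    return (a * d) / (b * c)


# Toy counts, for illustration only.
a, b, c, d = 20, 80, 100, 9800
print(f"PRR = {prr(a, b, c, d):.2f}, ROR = {ror(a, b, c, d):.2f}")
```

Both functions raise `ZeroDivisionError` on empty cells; a real workflow would need a policy for sparse tables and would normally report confidence intervals alongside the point estimates.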
Statistical fine-mapping of GWAS loci using credible sets (SuSiE, FINEMAP) and locus-to-gene scoring (Open Targets L2G). Identifies likely causal variants and target genes — distinct from positional 'nearest gene' which is often wrong. Use for prioritizing causal variants at GWAS hits, comparing fine-mapping methods, and converting lead SNPs to target genes.
Cross-species gene comparison and ortholog analysis. Integrates Ensembl Compara orthologs, NCBI Gene, UniProt, OLS, Monarch, and OpenTargets to identify orthologs, paralogs, sequence conservation, functional conservation across species, and lineage-specific gene gains/losses. Use for phylogenetic gene tracing, model-organism mapping, and evolutionary-genomics queries.
Solve quantitative problems in biophysics — pharmacokinetics (PK volume of distribution, clearance, half-life), epidemiology (R0, attack rate), toxicology (LD50, NOAEL), population genetics (Hardy-Weinberg, Fst), enzyme kinetics (Michaelis-Menten), thermodynamics. Use for first-principles quantitative biology calculations, dose calculations, exposure assessment, and biophysical-property estimation.
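Two of the first-principles calculations listed above can be written in a few lines. This sketch assumes a one-compartment model with first-order elimination for the half-life, and a biallelic locus for Hardy-Weinberg; the function names and input values are illustrative assumptions.

```python
import math


def hardy_weinberg(q: float):
    """Expected genotype frequencies (p^2, 2pq, q^2) for minor-allele frequency q."""
    p = 1.0 - q
    return p * p, 2.0 * p * q, q * q


def half_life(vd_l: float, cl_l_per_h: float) -> float:
    """Elimination half-life in hours: t1/2 = ln(2) * Vd / CL."""
    return math.log(2) * vd_l / cl_l_per_h


# Illustrative inputs only.
aa, ab, bb = hardy_weinberg(0.1)
t_half = half_life(vd_l=40.0, cl_l_per_h=4.0)
print(f"genotypes: {aa:.2f}, {ab:.2f}, {bb:.2f}; half-life: {t_half:.2f} h")
```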
Analyze CRISPR-Cas9 genetic screens — MAGeCK gene-level scores, sgRNA count QC, replicate correlation, hit prioritization, and pathway GSEA on screen output. Use for genome-wide essentiality screens, synthetic-lethality discovery, dropout vs positive-selection screen analysis, target identification, and resistance-screen interpretation. Includes screen-QC and statistical thresholds.
Add custom local tools to ToolUniverse alongside the 1000+ built-in tools. Covers JSON-config tools (simplest, no code), Python class tools (REST/SOAP/GraphQL APIs, computational logic), and best-practices for return schemas. Use for wrapping new APIs, adding domain-specific computations, or contributing tools to the registry.
Universal data access patterns for downloading and parsing scientific data when ToolUniverse tools don't cover the source, only return metadata, or you need bulk records. Use for VCF/h5ad/BAM/SDF/GCT parsing, multi-step API workflows (search to filter to download to parse), thousands of records at once, or sources with no dedicated tool. Write Python code via Bash for every step.
Assess drug-drug interactions — CYP metabolic interactions (substrate/inhibitor/inducer), transporter (P-gp, BCRP, OATP) effects, pharmacodynamic synergy/antagonism, clinical significance scoring, and management recommendations. Use for polypharmacy review, prescribing decision support, and safety analysis when adding or switching drugs.
Trace drug mechanism of action — primary target → downstream signaling → pathway perturbation → tissue/organ effect → clinical outcome. Uses DrugBank, ChEMBL, KEGG, Reactome, STRING. Use for understanding how a drug works, identifying off-target effects, mechanism-based combination therapy design, and writing mechanism sections of reports.
Comprehensive drug profiling — mechanism, primary/secondary targets, drug interactions, clinical-trial status, adverse events (FAERS), pharmacogenomics, and approval history. Use for full drug investigation reports, 'tell me about drug X' queries, and assembling drug profiles for clinicians, researchers, or regulatory work.
Search and analyze electron microscopy data — cryo-EM density maps (EMDB), fitted atomic models (PDB), raw micrograph datasets (EMPIAR), and cryo-electron tomography volumes (CryoET Data Portal). Use for finding 3D structural data on a protein/complex, comparing experimental EM resolution to AlphaFold confidence, and accessing raw EM data for re-processing.
Drug regulatory and approval research — FDA substance registry, ATC/EPC classification, EMA decisions, generic-drug status, FDA Orange Book exclusivity, NDA/BLA pathways. Use for jurisdiction-aware approval status (FDA vs EMA), generic vs brand availability, exclusivity expiry tracking, and regulatory pathway selection. Always specifies the market when reporting status.
Identify drug repurposing candidates via target-based, compound-based, and disease-based strategies. Combines drug-target-disease network reasoning with mechanism rationale, clinical-trial precedent, and patent/regulatory feasibility. Use for hypothesis-generating repurposing for orphan diseases, finding existing drugs for new indications, and prioritizing candidates by evidence and feasibility.
Ecology, biodiversity, and conservation biology research — species identification (GBIF, NCBI Taxonomy), invasive species impact, ecosystem dynamics, conservation status (IUCN), niche ecology. Use for biodiversity questions, species comparison, invasion biology, conservation prioritization, and ecology-related literature search.
End-to-end observational epidemiology analysis — from research question (PECO Population/Exposure/Comparator/Outcome) to publication-ready statistical report. Covers cohort/case-control/cross-sectional design, regression with confounders, propensity scoring, sensitivity analysis. Writes Python code for every step. Use for epidemiology study analysis, NHANES/UK-Biobank-style analyses.
Genomics and epigenomics analysis: DNA methylation (CpG, 5mC, 5hmC, bisulfite, RRBS), m6A RNA modification (MeRIP-seq), ChIP-seq peaks, ATAC-seq accessibility, histone modifications, chromatin state, multi-omics integration. Combines pandas/scipy/pysam computation with ToolUniverse annotation tools. Use for genome-wide epigenomic statistics, methylation analysis, and chromatin-genome integration.
Retrieve gene expression and omics datasets from ArrayExpress and BioStudies with gene disambiguation and quality assessment. Use for finding RNA-seq/microarray datasets by organism/tissue/condition, comparing across studies (case-control, time-series, dose-response), and assessing dataset suitability before downloading. Always uses English search terms.
Interpret hits from CRISPR-KO/CRISPRi/shRNA screens by integrating DepMap essentiality, gnomAD constraint scores, pathway context (Reactome, STRING), druggability (DGIdb), and clinical evidence (CIViC, COSMIC). Use for screen-hit prioritization, essentiality ranking, and turning a list of screen hits into a prioritized target shortlist.
Gene-disease association analysis across DisGeNET, OpenTargets, Monarch, OMIM, GenCC, Orphanet. Cross-references multiple sources for evidence-graded association reports with concordance scoring (5/5 sources agree → strong, 1/5 → weak). Use for 'which diseases is gene X associated with' or 'which genes cause disease Y' queries with quantitative confidence.
Chemical safety and toxicology assessment integrating ADMET-AI predictions, CTD toxicogenomics, PubChemTox experimental data, GHS/IARC hazard classification, and exposure-context analysis. Use for chemical hazard identification, occupational/consumer-product toxicity, dose-response evaluation, and acute (LD50) vs chronic toxicity assessment. Distinguishes drug toxicity from environmental chemical toxicity.
Gene regulatory network analysis — TF-target inference (JASPAR motifs, ChIP-seq), motif scanning, eQTL integration, perturbation evidence (knockout/overexpression). Use for 'which TF regulates gene X', 'which genes does TF Y target', regulatory pathway reconstruction. Distinguishes direct (binding) vs indirect (co-expression) regulatory evidence.
GPCR receptor pharmacology — agonist/antagonist/inverse-agonist/biased-agonist classification, GPCRdb structural data, receptor-ligand binding analysis, antibody-target interface (SAbDab). Use for GPCR drug discovery, biased-agonism analysis, receptor subtype selectivity questions, and orthosteric vs allosteric pocket characterization.
KEGG-based disease-drug-variant network research. Connects diseases to causal genes, drugs to molecular targets, and variants to pathways using KEGG's editorially curated databases (KEGG Disease, Drug, Network, Variant, Pathway). Use for drug repurposing via shared pathways, mechanistic disease-gene-drug networks, and pathway-based target discovery. Distinguishes direct (binding) vs indirect (pathway co-membership) drug-target relationships.
HLA gene-family analysis and MHC-peptide binding for transplant compatibility, vaccine epitope coverage, and cancer immunotherapy. Uses IMGT (HLA polymorphism), IEDB (epitope-MHC binding), UniProt (annotation), DGIdb (druggability). Use for HLA typing/imputation review, vaccine HLA coverage, and immunotherapy prediction biomarkers (HLA-LOH, neoantigen presentation).
Interpret a single GWAS SNP across multiple databases — GWAS Catalog hits, LD/haplotype context, eQTL evidence, regulatory annotation, ClinVar pathogenicity, gnomAD frequency. Use for 'what does this SNP do', SNP-to-mechanism tracing, and resolving lead-SNP-vs-causal-variant ambiguity. Always considers LD structure before claiming a SNP is mechanistically responsible.
Discover causal genes for diseases/traits from GWAS data using Open Targets L2G (locus-to-gene) scoring — integrates eQTL, chromatin interaction, and distance evidence. Use for trait-to-gene mapping, drug-target hypothesis generation from GWAS, and replacing the 'nearest gene' heuristic with multi-evidence L2G scores.
AI-driven patient-to-trial matching for precision oncology and rare-disease care. Transforms a patient's molecular profile (mutations, biomarkers, expression) and clinical state into ranked clinical-trial recommendations with evidence tiers. Searches ClinicalTrials.gov plus cross-references CIViC, OpenTargets, ChEMBL, and FDA labels. Use for matching patients to trials by genotype, biomarker-driven trial selection, and trial-eligibility scoring.
TCR/BCR repertoire analysis — V(D)J segment usage, CDR3 sequence diversity, clonality scoring, antigen specificity matching to IEDB, public-clone identification. Use for adaptive immune response characterization, post-treatment immune monitoring, antigen-specific clone tracking, and clonal-expansion analysis in immunotherapy or vaccination studies.
Immunology research workflows: antibody-antigen interactions, T/B cell repertoire, MHC/HLA binding prediction, autoimmune disease genetics, vaccine epitope mapping. Uses IEDB, IMGT, SAbDab, UniProt. Use for adaptive immunity questions, immune response analysis, antibody/TCR/BCR characterization, immunogenicity prediction, and immune-pathway-to-disease mapping.
Detect and auto-install missing ToolUniverse research skills. Checks common Claude Code/Cursor/Codex skill directories for the canary file and installs the skills if it is not found. Use when the plugin's research skills aren't loading, when migrating between clients, or when verifying a skill installation.
Predict patient response to immune checkpoint inhibitors (ICIs) by integrating tumor mutational burden (TMB), microsatellite instability (MSI), PD-L1 expression, HLA status, and immune-related gene expression. Outputs ICI Response Score with drug-specific recommendations and resistance-risk assessment. Use for melanoma/NSCLC/RCC immunotherapy decision support.
Rapid pathogen characterization and drug repurposing for outbreaks. Combines pathogen genomics (NCBI, BVBRC), host immune response (IEDB), drug-target databases (ChEMBL, DGIdb), and literature surveillance (PubMed/EuropePMC). Use for emerging-pathogen profiling, antiviral candidate identification, and outbreak intelligence reporting.
Metabolomics pathway analysis — metabolite identification (HMDB, KEGG, ChEBI), pathway mapping (Reactome, KEGG, MetaCyc), disease associations, enzyme/gene linkage. Use for metabolite-to-pathway-to-disease connections, BridgeDb-based ID conversion, and integrating metabolomics with gene-level pathway analyses.
Microscopy and quantitative imaging analysis — colony morphometry, fluorescence intensity quantification, cell-count statistics, dose-response curves, and ANOVA/Dunnett on image-derived measurements. Uses pandas/numpy/scipy/scikit-image. Use for analyzing tabular outputs from CellProfiler/ImageJ, image-derived measurement statistics, and image-based assay quantification.
Gene-set enrichment analysis — GO (Biological Process, Molecular Function, Cellular Component), KEGG, Reactome pathway enrichment via clusterProfiler, gseapy, ORA, GSEA. Use for interpreting DEG lists, screen hit lists, or any gene-list-to-pathways query. Includes simplify-cutoff handling and union-vs-total denominator conventions for percent-DE questions.
Retrieve chemical compound data from PubChem and ChEMBL with disambiguation, cross-referencing, and stereochemistry handling. Use for resolving compound names to SMILES/InChI/CID/ChEMBL IDs, fetching molecular properties, distinguishing isomers/stereo forms, and cross-validating identity across databases. Always uses English compound names and flags ambiguous queries (e.g., Vitamin D has multiple forms).
Inorganic chemistry, physical chemistry, and materials science — crystal structures, coordination chemistry, lattice parameters, thermodynamic properties, electronic structure. Use for unit cell volume calculations, coordination geometry, materials property estimation, and inorganic-mechanism reasoning. Complementary to tooluniverse-organic-chemistry.
Lipid analysis and lipid-disease associations using LIPID MAPS classification, HMDB metabolite data, KEGG/Reactome lipid pathways (sphingolipid, eicosanoid, steroid, fatty acid), and PubChem chemical info. Use for lipid identification, lipid metabolism pathway mapping, and lipid-associated disease analysis (cardiovascular, diabetes, NAFLD).
Compare GWAS studies, perform meta-analyses across cohorts, and assess signal replication. Uses GWAS Catalog metadata, study-level statistics, and cross-cohort comparison. Use for evaluating GWAS reproducibility for a trait, meta-analysis sample size and effect-size aggregation, and detecting study heterogeneity (population, design, ancestry).
Strategic clinical trial design feasibility assessment. Analyzes 6 dimensions (endpoint, population, comparator, effect size, duration, regulatory pathway) using precedent trials and FDA guidance. Produces enrollment projections, endpoint recommendations, and approval-pathway analysis. Use for trial-protocol design, power/sample-size estimation, comparator selection, and FDA submission strategy. Driven by precedent-based reasoning rather than first-principles math.
Generate comprehensive disease research reports covering genetics (causal genes, GWAS, OMIM), pathways (Reactome, KEGG), drugs (existing therapies, repurposing candidates), clinical trials, epidemiology (prevalence, incidence), and phenotypes (HPO). Use for full disease overviews, comprehensive disease characterization, and orphan/rare-disease profiling.
Find and evaluate research datasets for any scientific question. Maps research questions to required study designs (longitudinal vs cross-sectional, observational vs experimental, single-cohort vs multi-cohort). Use when the user asks 'find data about X', 'where can I get data on Y', or needs a specific cohort/survey/repository. Covers GEO, ArrayExpress, dbGaP, NHANES, UK Biobank, ClinicalTrials.gov, GWAS Catalog, and 30+ scientific repositories.
Quantitative drug-target validation pipeline. Scores druggability, selectivity, safety profile, ADMET feasibility, and structural tractability with a composite Target Validation Score (0-100) and GO/NO-GO recommendation. Use for go/no-go decisions on a target before commit-to-medchem, target prioritization across a list, and target-deselection rationale.
Histone-modification ChIP-seq, ATAC-seq accessibility, chromatin state, and TF binding analysis from ENCODE, Roadmap Epigenomics, ChIP-Atlas. Use for chromatin-state-by-tissue queries, TF-binding-by-region, regulatory landscape mapping, and ENCODE-cCRE annotations. For DNA methylation use tooluniverse-epigenomics; for RNA-seq use tooluniverse-rnaseq-deseq2.
Transform GWAS signals into drug targets and repurposing opportunities. Connects GWAS-significant loci to causal genes via fine-mapping/eQTL, then to druggable proteins via DGIdb/OpenTargets, then to existing drugs via ChEMBL. Use for GWAS-to-target hypothesis generation, druggable-fraction analysis of disease loci, and human-genetics-validated drug-repurposing prioritization.
TOP PRIORITY skill — find and immediately fix or remove every piece of wrong, outdated, or redundant information in ToolUniverse docs. Wrong code, broken links, incorrect counts, and overlapping instructions must be fixed or removed — never left in place. Runs five phases: (D) static method scan, (C) live code execution, (A) automated validation, (B) ToolUniverse audit, (E) less-is-more simplification. Core philosophy: each concept appears exactly once; remove don't add; no emojis; single setup entry point. Use when reviewing docs, before releases, after API changes, or when asked to audit, fix, or simplify documentation.
Aging biology, cellular senescence, and longevity research. Covers senescence markers (p16/CDKN2A, SASP, SA-beta-gal), aging hallmarks, senolytic drug discovery (dasatinib+quercetin, fisetin, navitoclax), epigenetic clocks, telomere biology, and longevity GWAS. Use for senescence-pathway analysis, age-related disease genetics, senolytic-target discovery, and centenarian-genetics queries. Distinguishes correlative vs causal evidence (knockout, intervention).
Clinical interpretation of somatic cancer mutations for precision oncology. Transforms a gene + variant + cancer-type input into an actionable report: clinical evidence tier (CIViC, OncoKB), therapeutic options (FDA-approved + investigational), resistance mechanisms, prognosis, and matching clinical trials. Use for tumor-board variant calls, somatic-mutation actionability assessment, and treatment selection. Always cancer-type-specific.
Structural variant (SV) clinical interpretation: deletions, duplications, inversions, translocations, complex rearrangements. Applies ACMG-adapted criteria with ClinGen HI/TS dosage scores, gnomAD frequencies, and ClinVar evidence. Produces 5-tier classification with explicit per-criterion evidence. Use for clinical genomics SV review, dosage-sensitivity assessment, breakpoint analysis, and CNV pathogenicity calls. Gene-dosage-driven reasoning.
VCF and variant analysis — parsing, annotation, classification (synonymous, missense, frameshift, stop_gained), VAF filtering, coding vs non-coding categorization, multi-condition variant comparison. Use for VCF parsing, variant fraction calculations (denominator = coding subset only, NOT all variants), and per-sample mutation profiling.
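The denominator rule above (coding subset only, not all variants) is easy to get wrong, so a small sketch may help. The consequence labels follow the ones named in the description; the set `CODING`, the function name, and the example calls are assumptions for illustration.

```python
# Consequence classes treated as coding, per the description above.
CODING = {"synonymous", "missense", "frameshift", "stop_gained"}


def coding_fraction(consequences, target: str) -> float:
    """Fraction of coding variants with the given consequence.

    The denominator is the coding subset only, not every variant in the input.
    """
    coding = [c for c in consequences if c in CODING]
    if not coding:
        raise ValueError("no coding variants in input")
    return sum(c == target for c in coding) / len(coding)


# Toy annotations, for illustration only.
calls = ["missense", "synonymous", "intron", "missense", "stop_gained", "intergenic"]
fraction = coding_fraction(calls, "missense")
naive = calls.count("missense") / len(calls)
print(f"coding denominator: {fraction:.3f}; all-variant denominator: {naive:.3f}")
```

With these toy calls the two denominators give 2/4 versus 2/6, which is the kind of discrepancy the rule is meant to prevent.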
Computational vaccine candidate design: peptide/subunit vaccines via MHC-I/MHC-II epitope prediction (IEDB), population HLA coverage optimization, B-cell epitope identification, and cross-strain conservation analysis. Use for vaccine epitope prediction, HLA allele coverage, multi-epitope construct design, and immunogenicity assessment. Combines predicted MHC binding with experimentally validated IEDB epitopes for higher-confidence designs.
Structural biology plus proteomics integration for drug target validation. Combines PDB experimental structures, AlphaFold predictions, GPCRdb, SAbDab antibody structures, ProteinsPlus binding-site prediction, and BindingDB ligand-affinity data. Use for druggability assessment, binding-site characterization, ligand-pocket analysis, structural-confidence scoring (resolution, pLDDT), and antibody-target interface analysis.
Create high-quality ToolUniverse skills following test-driven, implementation-agnostic methodology.
Automatically discover life science APIs online, create ToolUniverse tools, validate them, and prepare integration PRs. Performs gap analysis to identify missing tool categories, web searches for APIs, automated tool creation using devtu-create-tool patterns, validation with devtu-fix-tool, and git workflow management. Use when expanding ToolUniverse coverage, adding new API integrations, or systematically discovering scientific resources.
Create new scientific tools for ToolUniverse framework with proper structure, validation, and testing. Use when users need to add tools to ToolUniverse, implement new API integrations, create tool wrappers for scientific databases/services, expand ToolUniverse capabilities, or follow ToolUniverse contribution guidelines. Supports creating tool classes, JSON configurations, validation, error handling, and test examples.
Validate a variant-effect predictor (AlphaMissense, ESM-C SAE, ESM logits, EVE, conservation scores, or any per-variant numeric score) against experimental deep mutational scanning (DMS) data. Computes per-variant predictor scores, splits variants into neutral vs disruptive groups by DMS effect, runs a Mann-Whitney U test on the predictor scores, and sweeps the stratification thresholds for robustness. Use when you need to know whether a predictor's scores track real functional disruption on a specific protein.
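The validation loop described above can be sketched with the standard library alone. This is a simplified illustration under stated assumptions: lower DMS effect means more disruptive, higher predictor score means more damaging, the function names are hypothetical, and the data are toy values. It computes the U statistic and the equivalent AUC (U divided by the number of pairs) but no p-value; in practice a routine such as `scipy.stats.mannwhitneyu` would supply that.

```python
def mann_whitney_u(x, y) -> float:
    """U statistic for x versus y: count of pairs with x_i > y_j, ties counted as 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def sweep_thresholds(dms, scores, neutral_cut, disruptive_cuts):
    """For each disruptive cutoff, compare predictor scores of disruptive vs neutral variants.

    Returns tuples of (cutoff, n_disruptive, n_neutral, auc).
    """
    neutral = [s for d, s in zip(dms, scores) if d >= neutral_cut]
    results = []
    for cut in disruptive_cuts:
        disruptive = [s for d, s in zip(dms, scores) if d <= cut]
        if not neutral or not disruptive:
            continue  # skip cutoffs that leave a group empty
        u = mann_whitney_u(disruptive, neutral)
        results.append((cut, len(disruptive), len(neutral), u / (len(disruptive) * len(neutral))))
    return results


# Toy data, for illustration only.
dms_effect = [-2.0, -1.5, -1.0, -0.2, 0.0, 0.1]
predictor = [0.9, 0.8, 0.4, 0.3, 0.2, 0.5]
results = sweep_thresholds(dms_effect, predictor, neutral_cut=-0.3, disruptive_cuts=[-1.5, -1.0])
for cut, n_dis, n_neu, auc in results:
    print(f"cutoff {cut}: {n_dis} disruptive vs {n_neu} neutral, AUC = {auc:.3f}")
```

An AUC that stays well above 0.5 across the swept cutoffs suggests the separation is not an artifact of one particular stratification.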
--- name: tooluniverse-[domain-name] description: [Complete description of what the skill does, which databases it uses, and when to use it. Include specific trigger phrases like "analyze [domain]", "find [data type]", etc. This description is the primary way Claude determines when to use your skill.] --- # [Domain Name] Analysis [One paragraph overview describing what this skill does, what problems it solves, and what outputs it provides.] ## When to Use This Skill **Triggers**: - "[Trigger
GitHub workflow for ToolUniverse - push code safely by moving temp files, activating pre-commit hooks, running tests, and cleaning staged files. Use when pushing to GitHub, fixing CI failures, or cleaning up before commits.
Fix failing ToolUniverse tools by diagnosing test failures, identifying root causes, implementing fixes, and validating solutions. Use when ToolUniverse tools fail tests, return errors, have schema validation issues, or when asked to debug or fix tools in the ToolUniverse framework.
Optimize ToolUniverse skills for better report quality, evidence handling, and user experience. Apply patterns like tool verification, foundation data layers, disambiguation-first, evidence grading, quantified completeness, and report-only output. Use when reviewing skills, improving existing skills, or creating new ToolUniverse research skills.
Optimize tool descriptions in ToolUniverse JSON configs for clarity and usability. Reviews descriptions for missing prerequisites, unexpanded abbreviations, unclear parameters, and missing usage guidance. Use when reviewing tool descriptions, improving API documentation, or when user asks to check if tools are easy to understand.
Systematic ACMG/AMP germline variant classification with all 28 criteria (PVS1, PS1-4, PM1-6, PP1-5, BA1, BS1-4, BP1-7) for clinical significance. Produces 5-tier verdict (Pathogenic / Likely Pathogenic / VUS / Likely Benign / Benign) with cited evidence per criterion. Use for variant interpretation, VUS resolution, and pathogenicity assessment. Combines ClinVar, gnomAD, computational predictors, and gene-mechanism context.
Translate free-text tumor descriptions to OncoTree codes and resolve cancer subtypes/tissue hierarchy. Cross-references UMLS/NCI vocabularies. Use for standardizing cancer-type nomenclature in EHR free-text, building cohorts in OncoKB or GDC, mapping tumor-board notes to ontology codes, and ensuring consistent terminology across cancer-genomics pipelines.
Comprehensive ADMET (Absorption, Distribution, Metabolism, Excretion, Toxicity) profiling for drug candidates. Integrates ADMET-AI predictions, SwissADME drug-likeness, PubChemTox experimental toxicity, ChEMBL clinical data, Lipinski rule-of-five, and CYP interaction data. Use for drug-likeness assessment, BBB penetration, bioavailability, hepatotoxicity prediction, ADME/PK profiling, or screening compound libraries before lab testing.
Map environmental and industrial chemicals to adverse outcome pathways (AOPs) — molecular initiating event to organ-level toxicity. Uses AOPWiki, GHS classification, IARC carcinogen status, and LD50 data. Use for environmental/industrial chemical risk assessment, regulatory-grade hazard characterization, and AOP stressor mapping. Distinct from drug-safety analysis (use tooluniverse-pharmacovigilance for drugs).
Therapeutic antibody engineering and optimization, lead-to-clinical-candidate. Covers sequence humanization (germline alignment, framework retention), affinity maturation, developability (aggregation, stability, PTMs), structure modeling (AlphaFold/PDB CDR analysis), immunogenicity prediction, and manufacturing feasibility. Use for biologic-drug optimization, mAb design review, biosimilar engineering, and clinical-precedent comparison.
Discover novel small-molecule binders for protein targets using structure-based and ligand-based screening. Covers druggability assessment, known-ligand mining (ChEMBL, BindingDB), similarity expansion, ADMET filtering, and synthesis feasibility. Use for hit identification, virtual screening, target-to-compounds workflows, and lead-finding before commit-to-medchem.
TCGA/GDC cancer genomics analysis — cohort construction, clinical metadata retrieval, somatic mutation frequencies, survival analysis, and multi-omics integration. Use for TCGA-BRCA-style cohort studies, mutation prevalence by cancer type, survival-by-mutation analysis, and pan-cancer driver discovery. Always cancer-type-specific (don't use pan-cancer counts without cohort context).
Install and configure ToolUniverse for any use case — MCP server (chat-based), CLI (command line with 9 subcommands), or Python SDK (Coding API with 3 calling patterns). Covers uv/uvx setup, MCP configuration for 12+ AI clients (Cursor, Claude Desktop, Windsurf, VS Code, Codex, Gemini CLI, Trae, Cline, etc.), full CLI reference (tu list/grep/find/info/run/test/status/build/serve), Coding API quickstart, agentic tools, code executor, API key walkthrough, skill installation, and upgrading. Use when user asks how to set up ToolUniverse, which access mode to use (MCP vs CLI vs SDK), configuring MCP servers, using the CLI, troubleshooting installation, upgrading, or mentions installing ToolUniverse or setting up scientific tools. Also triggers for "how do I use ToolUniverse", "what's the best way to access tools", "command line", "tu command", "coding API", "tu build".
Continuous improvement system for ToolUniverse tools, skills, and plugin. Run benchmarks, diagnose failures, route fixes to devtu skills, retest. Use after skill optimization, tool additions, or as regression check.
Orchestrate the full ToolUniverse self-improvement cycle: discover APIs, create tools, test with researcher personas, fix issues, optimize skills, and push via git. References and dispatches to all other devtu skills. Use when asked to: run the self-improvement loop, do a debug/test round, expand tool coverage, improve tool quality, or evolve ToolUniverse.
Functional annotation of protein variants — ProtVar structural/functional context, ClinVar clinical classifications, gnomAD population frequencies, CADD deleteriousness, ClinGen gene-disease validity. Use for variant annotation pipelines, missense effect prediction, and protein-level variant interpretation with functional context.
Comprehensive drug-target intelligence — tissue expression (GTEx, HPA), pathways, protein interactions (STRING), variant landscape (ClinVar, gnomAD), druggability (DGIdb, ChEMBL approved drugs). 9 parallel research paths with citations. Use for full target profile reports, target characterization for drug discovery, and 'tell me about target X' queries.
Clinical variant interpretation from raw variant calls to ACMG-classified recommendations with structural impact analysis. Use for VUS classification, pathogenicity assessment with cited criteria, structure-based variant impact (AlphaFold/PDB), and producing clinical-grade variant reports for return of results or molecular tumor boards.
Systems biology and pathway analysis integrating Reactome, KEGG, WikiPathways, BioCarta, NCI-Nature Pathway Interaction Database. Multi-database pathway enrichment, protein-pathway relationships, network reasoning. Use for pathway analysis on a gene list, multi-source pathway concordance, and systems-level interpretation across databases.
Stem cell, iPSC, and organoid research — pluripotency markers, differentiation protocol pathways, lineage commitment factors, organoid model selection. Use for iPSC characterization, differentiation protocol design via developmental-pathway recapitulation, and organoid-model selection for disease modeling.
End-to-end variant-to-mechanism analysis — trace a variant (rsID/coordinates) through regulatory context, target gene(s), molecular pathway(s), and phenotypic consequences. Integrates 7+ databases across 3 evidence layers (regulatory, molecular, disease) for a mechanistic model. Use for GWAS-hit-to-mechanism, eQTL-causal-gene tracing, and full causal-chain reports.
Drug and chemical toxicity assessment via adverse outcome pathways (AOPs), real-world FAERS adverse event signals, FDA labels, and toxicogenomic associations. Triangulates molecular initiating event to cellular outcome to organ-level toxicity to clinical adverse event. Use for hepatotoxicity/cardiotoxicity/nephrotoxicity prediction and toxicology reports.
Code quality patterns and guidelines for ToolUniverse tool development. Apply when writing, fixing, or refactoring tool Python code in the ToolUniverse project. Encodes lessons from 80+ debug rounds. Use alongside devtu-fix-tool and devtu-self-evolve. Triggers: implementing tool fixes, writing new tool classes, reviewing tool code quality, checking schema correctness, looking up API-specific bug fixes.
Add custom local tools to ToolUniverse alongside the 1000+ built-in tools. Covers JSON-config tools (simplest, no code), Python class tools (REST/SOAP/GraphQL APIs, computational logic), and best-practices for return schemas. Use for wrapping new APIs, adding domain-specific computations, or contributing tools to the registry.
Detect and analyze adverse drug event signals using FDA FAERS reports, drug labels, and disproportionality statistics (PRR, ROR, IC). Generates quantitative safety signal scores (0-100) with evidence grading. Use for post-market surveillance, pharmacovigilance, drug safety assessment, regulatory submissions, and detecting rare AE signals not visible in clinical trials.
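As a rough sketch of the disproportionality statistics named above, PRR and ROR can be computed from a 2x2 table of report counts using their standard definitions. The function name and the example counts are hypothetical; the IC statistic and the 0-100 signal score are not reproduced here because the entry does not say how they are derived.

```python
import math


def disproportionality(a, b, c, d):
    """PRR and ROR from a 2x2 table of spontaneous-report counts.

    a: reports with the drug and the event
    b: reports with the drug, without the event
    c: reports with other drugs and the event
    d: reports with other drugs, without the event
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this sketch")
    prr = (a / (a + b)) / (c / (c + d))
    ror = (a * d) / (b * c)
    # Standard error of log(ROR), used for an approximate 95% interval.
    se_log_ror = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci95 = (
        math.exp(math.log(ror) - 1.96 * se_log_ror),
        math.exp(math.log(ror) + 1.96 * se_log_ror),
    )
    return {"prr": prr, "ror": ror, "ror_ci95": ci95}


# Hypothetical counts, for illustration only.
result = disproportionality(a=20, b=980, c=100, d=98900)
print(result)
```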
Spatial transcriptomics analysis — Visium, MERFISH, seqFISH, Slide-seq. Maps gene expression to tissue architecture, identifies spatially variable genes (SVGs), tissue-domain segmentation, and cell-cell interaction inference. Use for spatial gene-expression questions, tissue architecture analysis, and SVG identification.
Retrieve chemical compound data from PubChem and ChEMBL with disambiguation, cross-referencing, and stereochemistry handling. Use for resolving compound names to SMILES/InChI/CID/ChEMBL IDs, fetching molecular properties, distinguishing isomers/stereo forms, and cross-validating identity across databases. Always use English compound names; flags ambiguous queries (e.g., Vitamin D has multiple forms).
Quantitative drug-target validation pipeline. Scores druggability, selectivity, safety profile, ADMET feasibility, and structural tractability with a composite Target Validation Score (0-100) and GO/NO-GO recommendation. Use for go/no-go decisions on a target before commit-to-medchem, target prioritization across a list, and target-deselection rationale.
Gene-disease association analysis across DisGeNET, OpenTargets, Monarch, OMIM, GenCC, Orphanet. Cross-references multiple sources for evidence-graded association reports with concordance scoring (5/5 sources agree → strong, 1/5 → weak). Use for 'which diseases is gene X associated with' or 'which genes cause disease Y' queries with quantitative confidence.
Lipid analysis and lipid-disease associations using LIPID MAPS classification, HMDB metabolite data, KEGG/Reactome lipid pathways (sphingolipid, eicosanoid, steroid, fatty acid), and PubChem chemical info. Use for lipid identification, lipid metabolism pathway mapping, and lipid-associated disease analysis (cardiovascular, diabetes, NAFLD).
Microbiome and metagenomics analysis using MGnify, GTDB taxonomy, ENA sequencing data, and EuropePMC literature. Covers taxonomic classification, genome quality assessment, biome-clinical phenotype linkage, and pathway interpretation. Use for amplicon/shotgun metagenomics study analysis.
Metabolomics research — metabolite identification, study analysis, and database searches across HMDB, MetaboLights, Metabolomics Workbench, KEGG. Use for annotating mass-spec features to known metabolites, finding metabolomics studies of a disease, and structured metabolomics research reports with metabolite-pathway mapping.
Cancer treatment recommendations from molecular profile (mutations + cancer type + biomarkers) — FDA-approved + investigational therapies, resistance mechanisms, matching clinical trials, prognosis. Uses CIViC, ClinVar, OpenTargets, ClinicalTrials.gov. Use for tumor-board treatment recommendations, evidence-tiered actionability assessment, and FDA-precedent-driven therapy selection.
Analyze metabolomics data end-to-end — metabolite identification, quantification (TIC normalization, batch correction), differential analysis, and pathway interpretation. Use for processing mass-spec metabolomics output, normalization choice, untargeted metabolomics workflows, and integrating with other omics layers.
Comprehensive disease characterization across genomics, transcriptomics, proteomics, and pathways for systems-level understanding. Identifies therapeutic opportunities and biomarker candidates by integrating multi-layer molecular data. Use for full-omics disease deep-dive reports, mechanism mapping, and biomarker-and-target identification from multi-omics data.
Cross-species genetic analysis using model organism databases (MGI mouse, ZFIN zebrafish, FlyBase fruit fly, WormBase worm, SGD yeast, RGD rat, GBIF taxonomy). Maps human genes to orthologs, retrieves phenotype/expression/functional data, assesses gene function conservation, and identifies the best animal model for studying a human gene or disease.
Compound-target-disease network construction and analysis for drug repurposing, polypharmacology discovery, and multi-target drug design. Uses STRING, BioGRID, ChEMBL, DGIdb, OMIM, OpenTargets. Use for off-target effect prediction, network-based drug repurposing, and identifying molecules with desired multi-target profile.
Microbiome research using MGnify, GTDB, ENA, OLS (ENVO biomes), and EuropePMC. Covers study discovery, taxonomic profiling, host-microbe interaction analysis, and biome-by-condition queries. Use for microbiome study selection, organism-environment associations, and clinical-microbiome literature review. Distinct from analytical workflow (use tooluniverse-metagenomics-analysis for that).
Organic chemistry reasoning guide for reaction product prediction, mechanism analysis (electrophilic/nucleophilic substitution, addition, elimination, pericyclic, radical), and spectroscopy interpretation (1H/13C NMR, IR, MS). Reasons from first principles (electron flow, kinetic vs thermodynamic) rather than pattern-matching named reactions. Use for organic synthesis problems and mechanism explanations.
Pharmacogenomics (PGx) research — drug-gene interactions (CPIC, PharmGKB), CPIC dosing guidelines, variant-drug-response associations, ethnic-allele-frequency considerations, and metabolizer-status scoring. Use for PGx-informed dosing recommendations, CYP/HLA pharmacogenomic allele interpretation, and clinically-actionable PGx report generation.
Multi-omics integration — orchestrate per-layer analysis (transcriptomics, proteomics, epigenomics, genomics, metabolomics) then perform cross-omics correlation, multi-omics clustering, and pathway-level integration. Use for integrative systems-biology analysis, multi-modal disease characterization, and cross-omics biomarker discovery.
Drug safety and adverse event analysis — FAERS spontaneous-report mining, FDA black-box warnings, signal detection (PRR, ROR, IC), risk factors by demographic/comorbidity, and label change tracking. Use for post-market safety surveillance, AE signal investigation, drug-AE association strength scoring, and pharmacovigilance reports.
Connect GWAS variants to biological pathways and druggable targets. Maps GWAS hits to causal genes (via fine-mapping/eQTL), then to pathways (Reactome, KEGG, WikiPathways), then to existing drugs hitting those pathways. Use for pathway-level disease mechanisms, druggable-pathway prioritization from GWAS, SNP-to-pathway-to-target tracing, and tissue-specific eQTL evidence for drug target hypotheses.
Non-coding RNA analysis — miRNAs (miRBase, miRDB targets), lncRNAs (LNCipedia, RNAcentral), circRNAs, snoRNAs, and other ncRNA classes. Distinct mechanisms per class — miRNAs repress mRNA; lncRNAs scaffold/decoy/enhance. Use for ncRNA function prediction, miRNA-target prediction, lncRNA functional annotation, and ncRNA-disease association queries.
Plant genomics and biology research — PlantReactome pathways, Ensembl Plants gene structure, POWO species taxonomy, UniProt annotation, KEGG plant pathways. Handles polyploidy (wheat hexaploidy etc.) and homeologous gene copies. Use for crop-gene annotation, plant secondary metabolism queries, and plant-disease/stress-response biology.
Population genetics using the 1000 Genomes Project (IGSR) — superpopulation/population search, sample metadata, variant frequencies across AFR/AMR/EAS/EUR/SAS, ancestry-specific analyses. Use for ancestry comparison, population-aware allele frequency lookups, and 1000-Genomes-cohort-specific analyses (distinct from gnomAD which has different sample composition).
Post-translational modification (PTM) analysis — phosphorylation, ubiquitination, acetylation, glycosylation, methylation. Uses iPTMnet (sites + enzymes), ProtVar (functional consequences), UniProt (baseline), STRING, ELM (linear motifs), MassIVE/ProteomeXchange (experimental). Use for PTM site annotation, kinase-substrate identification, and PTM-disease associations.
Phylogenetic analysis — tree analysis, treeness, saturation (PhyKIT), parsimony-informative sites, alignment gap analysis, MAFFT alignment, DVMC, long-branch detection, BUSCO orthologs. Uses PhyKIT, Biopython, DendroPy. Use for phylogenetic tree QC, multi-gene phylogenomics, evolutionary-rate analysis, and comparative-genomics studies.
Protein-protein interaction (PPI) network analysis — STRING (predicted + experimental), BioGRID (curated), SASBDB (small-angle scattering). Distinguishes physical interactions (binding) from functional associations (co-expression, co-regulation). Use for interactome queries, complex partner identification, and pathway-level interaction analysis.
Patient stratification for precision medicine — integrate genomic, clinical, and therapeutic data to split patients into responder/non-responder groups, risk tiers, or treatment-decision groups. Use for stratification-by-biomarker, treatment-selection logic, and personalized therapeutic strategy reports per patient subgroup.
Build and interpret polygenic risk scores (PRS) for complex diseases using GWAS summary statistics. Covers PRS construction (clumping/thresholding, PRS-CS), validation in independent cohorts, ancestry-aware adjustment, and clinical interpretation (population-relative risk, not absolute prediction). Use for PRS-based risk stratification.
Propose the mechanism by which a missense variant causes loss-of-function (LoF), synthesizing evidence from 5 independent layers: AlphaMissense pathogenicity, AlphaFold structural context, ESMC sequence likelihood, SAE feature disruption, and DynaMut2 stability ΔΔG. Distinguishes 'structural stability LoF' (mis-folding) from 'direct functional disruption' (catalytic / binding / PTM site damage). Use for coding missense variants where you need a mechanistic causal model, not just a pathogenicity score.
Given a PDB structure, produce a per-residue annotation table: which residues sit at a binding interface (vs a partner chain), which line a ligand pocket, which are buried (core) vs solvent-exposed (surface), and optionally secondary structure. This is the structural track drawn under a DMS heatmap and the structural prior against which SAE feature drops are read. Use when you need to anchor a variant-interpretation or DMS analysis to the protein's actual physical context.
Protein structure retrieval from RCSB PDB, PDBe, and AlphaFold with disambiguation, quality assessment (resolution, R-factor, pLDDT), and metadata. Distinguishes high-quality experimental (X-ray under 2 Angstrom) vs predicted vs medium-quality structures. Use for fetching protein structures, structure-quality comparison, and selecting structures for drug design or modeling.
Interpret a missense variant via ESMC-6B Sparse Autoencoder (SAE) feature activations. For a given protein + variant, computes which interpretable SAE features (catalytic, ligand-binding, PTM, structural motif, domain, etc.) are lost or gained at the mutation site. Use when standard pathogenicity scores (AlphaMissense, ClinVar) say a variant is damaging but you need a MECHANISTIC explanation — e.g. 'why is this variant LoF?' Complements (does not replace) variant-interpretation and variant-to-mechanism skills, which focus on ACMG classification or regulatory mechanism.
AI-guided de novo protein design — RFdiffusion backbone generation, ProteinMPNN sequence design, structure validation (pLDDT, pTM, MPNN scores). Use for designing therapeutic protein binders, novel scaffolds, enzyme variants, and miniprotein/protein-interface design before experimental validation.
Mass-spec proteomics analysis — protein identification, quantification (LFQ, TMT, iTRAQ), differential expression (tumor vs normal, treatment vs control), PTM identification, and pathway enrichment on protein lists. Use when you have proteomics MS output, are asking about protein abundance differences, or are doing systems-level proteomic interpretation.
Protein 3D structure prediction from sequence — ESMFold de novo prediction, AlphaFold database retrieval, experimental structures from RCSB, ProtVar variant impact assessment, ProtParam sequence properties. Use for structure prediction when no experimental structure exists, fold-confidence scoring, and structure-guided variant interpretation.
Find and retrieve proteomics datasets from MassIVE and ProteomeXchange. Search by species, keyword, or accession; retrieve detailed metadata (instruments, publications, species, PTMs studied). Use for locating public proteomics datasets to reanalyze, comparing instrument/protocol coverage across studies, and pre-download dataset evaluation.
Rare disease differential diagnosis from patient phenotype — HPO term matching to candidate diseases (Orphanet, OMIM), gene panel prioritization, ACMG variant interpretation, and structure-based variant analysis. Use for diagnostic odyssey assistance, phenotype-to-disease ranking, and genetic-counseling differential generation.
Non-coding/regulatory variant interpretation — GWAS association lookup, eQTL evidence (GTEx), chromatin state (ENCODE), regulatory variant scoring (RegulomeDB, CADD), and TF-binding disruption. Use for non-coding GWAS hit interpretation, eQTL-based gene assignment, and regulatory mechanism reasoning. Distinct from coding-variant tools.
RNA-seq differential expression analysis with DESeq2 — DEG lists, fold changes, dispersion estimation, design formulas including covariates, multi-condition contrasts, and Venn-set operations across groups. Use when you have a count matrix + metadata, want to find DEGs, or need dispersion/PCA/clustering analysis. Includes RULE ZERO precedence (read `*_executed.ipynb` if present).
Given a set of residues in a protein, explain WHY they are functionally critical by combining structural context (binding interface, ligand pocket, core, secondary structure), UniProt features (active sites, binding sites, PTM sites, disulfides), optional SAE feature evidence, and optional DMS data. Accepts residues from any source: DMS hotspots (top-K by max effect), ClinVar recurrent variants, literature-reported hot regions, evolutionarily conserved positions, or user-curated lists. Returns a per-cluster mechanism call: catalytic / ligand-binding / interface / structural-core / PTM / regulatory / unknown.
Small molecule identification, characterization, and procurement — PubChem, ChEMBL, BindingDB, ADMET-AI, SwissADME, eMolecules, Enamine. Covers compound name to structure to activity to ADMET properties to commercial sourcing. Use for chemical biology, lead identification, probe selection, and the full small-molecule discovery pipeline.
Build AI scientist systems with the ToolUniverse Python SDK for scientific research. Covers the 3 calling patterns (`tu.run` portable dict API, `tu.tools.X` function API, direct class instantiation), tool loading, batch execution, MCP server integration, and embedding-based tool search. Use for SDK programming, custom tool composition, benchmarking pipelines, and integrating ToolUniverse into research workflows.
Single-cell RNA-seq analysis with scanpy/anndata — h5ad data loading, QC (mitochondrial fraction, doublets), normalization, dimensionality reduction (PCA, UMAP, t-SNE), clustering (Leiden, Louvain), marker gene identification, cell-type annotation, pseudotime/trajectory analysis. Use for any scRNA-seq workflow.
Retrieve DNA/RNA/protein sequences from NCBI and ENA with disambiguation. Quality hierarchy: RefSeq (NM_/NP_) > RefSeq predicted (XM_/XP_) > GenBank submissions. Use for fetching specific sequences by accession, gene-symbol-to-sequence lookup, transcript-isoform retrieval, and curated-vs-raw-submission preference.
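The quality hierarchy above amounts to ranking accessions by prefix. A minimal sketch follows, with a hypothetical function name and placeholder accessions. It only knows the four prefixes the entry names, so any other accession (including other RefSeq prefixes) falls into the lowest tier, which is a simplification.

```python
def accession_rank(accession):
    """Lower rank means higher preference under the stated hierarchy."""
    prefix = accession[:3].upper()
    if prefix in ("NM_", "NP_"):
        return 0  # curated RefSeq
    if prefix in ("XM_", "XP_"):
        return 1  # predicted RefSeq
    return 2      # treated as a GenBank submission in this sketch


# Placeholder accessions, for illustration only.
candidates = ["AB000000.1", "XM_000000.1", "NM_000000.1"]
ranked = sorted(candidates, key=accession_rank)
print(ranked)
```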
Spatial multi-omics interpretation pipeline. Transforms spatially variable genes (SVGs), domain annotations, and tissue context into biological insights via domain-by-domain characterization, cell-type composition, spatial gene expression patterns, RNA+protein+metabolite integration. Use for Visium, MERFISH, seqFISH, Slide-seq, spatial proteomics, and spatial multi-omics interpretation. Goes beyond statistics to disease mechanisms and therapeutic opportunities.
Transcription factor binding, cis-regulatory elements (cCREs), chromatin accessibility, and regulatory annotation using JASPAR (motifs), ENCODE (cCREs, ChIP-seq), RegulomeDB (regulatory variant scoring), UCSC. Use for regulatory element annotation, TF-binding-site prediction, and regulatory-region functional impact assessment.
Rare disease genomics — disease identification (Orphanet), causative gene discovery, gene-disease validity (GenCC), variant interpretation (ClinVar), and translational research (ClinicalTrials.gov, drug repurposing for orphans). Use for rare-disease-gene curation, novel-gene-discovery analysis, and rare-disease drug-development support.
Rapid pathogen characterization and drug repurposing for outbreaks. Combines pathogen genomics (NCBI, BVBRC), host immune response (IEDB), drug-target databases (ChEMBL, DGIdb), and literature surveillance (PubMed/EuropePMC). Use for emerging-pathogen profiling, antiviral candidate identification, and outbreak intelligence reporting.
KEGG-based disease-drug-variant network research. Connects diseases to causal genes, drugs to molecular targets, and variants to pathways using KEGG's editorially curated databases (KEGG Disease, Drug, Network, Variant, Pathway). Use for drug repurposing via shared pathways, mechanistic disease-gene-drug networks, and pathway-based target discovery. Distinguishes direct (binding) vs indirect (pathway co-membership) drug-target relationships.
Systematic ACMG/AMP germline variant classification with all 28 criteria (PVS1, PS1-4, PM1-6, PP1-5, BA1, BS1-4, BP1-7) for clinical significance. Produces 5-tier verdict (Pathogenic / Likely Pathogenic / VUS / Likely Benign / Benign) with cited evidence per criterion. Use for variant interpretation, VUS resolution, and pathogenicity assessment. Combines ClinVar, gnomAD, computational predictors, and gene-mechanism context.
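The criterion ranges listed above do add up to 28, which a short expansion makes easy to verify. The dictionary below simply restates the ranges from the entry; the variable names are illustrative.

```python
# Number of criteria per code family, as listed in the entry above.
criteria_counts = {"PVS": 1, "PS": 4, "PM": 6, "PP": 5, "BA": 1, "BS": 4, "BP": 7}

# Expand each family into its individual codes, e.g. PS -> PS1..PS4.
criteria_codes = [
    f"{family}{index}"
    for family, count in criteria_counts.items()
    for index in range(1, count + 1)
]

print(len(criteria_codes), criteria_codes)
```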
Population genetics analysis — allele frequencies (gnomAD, 1000 Genomes), Hardy-Weinberg equilibrium testing, Fst between populations, GWAS associations, evolutionary constraint scores. Use for cross-population variant comparison, ancestry-aware allele frequency lookups, and population-level evolutionary analysis.
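For the Hardy-Weinberg equilibrium test mentioned above, a minimal sketch for a biallelic locus uses the textbook chi-square goodness-of-fit test with one degree of freedom. The function name and the genotype counts are hypothetical, and exact tests, which are often preferred for small counts, are not covered.

```python
import math


def hardy_weinberg_chi2(n_aa, n_ab, n_bb):
    """Chi-square test of HWE from genotype counts (AA, AB, BB)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p <= 0.0 or q <= 0.0:
        raise ValueError("locus is monomorphic; the test is undefined")
    observed = (n_aa, n_ab, n_bb)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts, for illustration only.
chi2, p_value = hardy_weinberg_chi2(360, 480, 160)
print(chi2, p_value)
```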
Inorganic chemistry, physical chemistry, and materials science — crystal structures, coordination chemistry, lattice parameters, thermodynamic properties, electronic structure. Use for unit cell volume calculations, coordination geometry, materials property estimation, and inorganic-mechanism reasoning. Complementary to tooluniverse-organic-chemistry.
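The unit cell volume calculation mentioned above follows the general triclinic formula V = abc * sqrt(1 - cos²α - cos²β - cos²γ + 2 cosα cosβ cosγ), which reduces to abc when all angles are 90°. A minimal sketch, with a hypothetical function name and illustrative lattice parameters:

```python
import math


def unit_cell_volume(a, b, c, alpha=90.0, beta=90.0, gamma=90.0):
    """Volume of a unit cell from edge lengths and angles in degrees.

    The result is in cubic units of the edge lengths (e.g. cubic angstroms).
    """
    cos_a = math.cos(math.radians(alpha))
    cos_b = math.cos(math.radians(beta))
    cos_g = math.cos(math.radians(gamma))
    factor = 1.0 - cos_a**2 - cos_b**2 - cos_g**2 + 2.0 * cos_a * cos_b * cos_g
    if factor <= 0.0:
        raise ValueError("angles do not describe a valid unit cell")
    return a * b * c * math.sqrt(factor)


# Illustrative lattice parameters, not taken from any specific material.
cubic = unit_cell_volume(4.0, 4.0, 4.0)
hexagonal = unit_cell_volume(3.0, 3.0, 5.0, gamma=120.0)
print(cubic, hexagonal)
```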
Detect and auto-install missing ToolUniverse research skills. Checks common Claude Code/Cursor/Codex skill directories for the canary file and, if none is found, installs the missing skills. Use when the plugin's research skills aren't loading, when migrating between clients, or when verifying a skill installation.
Metabolomics pathway analysis — metabolite identification (HMDB, KEGG, ChEBI), pathway mapping (Reactome, KEGG, MetaCyc), disease associations, enzyme/gene linkage. Use for metabolite-to-pathway-to-disease connections, BridgeDb-based ID conversion, and integrating metabolomics with gene-level pathway analyses.
Neuroscience research workflows: neuroanatomy, neural circuits, neurotransmitter biology, neurological/psychiatric disease genetics, neural-protein function. Uses Allen Brain Atlas, WormBase (C. elegans connectome), UniProt for neural proteins, PubMed for primary literature. Use for brain-region biology, neural development, neurodegeneration mechanisms (Alzheimer's, Parkinson's, ALS), and synaptic-protein characterization.
Biological sequence analysis — gene/protein sequence retrieval (NCBI, Ensembl, UniProt), nucleotide/protein search, ortholog discovery, and FASTQ QC + alignment workflows (Trimmomatic, BWA, samtools, coverage depth). Use for sequence retrieval, sequence comparison, FASTQ QC analysis, and read alignment pre-processing.
Clinical interpretation of somatic cancer mutations for precision oncology. Transforms a gene + variant + cancer-type input into an actionable report: clinical evidence tier (CIViC, OncoKB), therapeutic options (FDA-approved + investigational), resistance mechanisms, prognosis, and matching clinical trials. Use for tumor-board variant calls, somatic-mutation actionability assessment, and treatment selection. Always cancer-type-specific.
Find commercial sources for chemical compounds — PubChem/ChEMBL identity resolution then vendor catalog search across ZINC, Enamine, eMolecules, Mcule. Compares pricing, availability, and identifies purchasable analogs when an exact compound is not in stock. Use for chemical procurement, virtual library curation, and 'where can I buy X' questions for synthesis planning.
Gene regulatory network analysis — TF-target inference (JASPAR motifs, ChIP-seq), motif scanning, eQTL integration, perturbation evidence (knockout/overexpression). Use for 'which TF regulates gene X', 'which genes does TF Y target', regulatory pathway reconstruction. Distinguishes direct (binding) vs indirect (co-expression) regulatory evidence.
Immunology research workflows: antibody-antigen interactions, T/B cell repertoire, MHC/HLA binding prediction, autoimmune disease genetics, vaccine epitope mapping. Uses IEDB, IMGT, SAbDab, UniProt. Use for adaptive immunity questions, immune response analysis, antibody/TCR/BCR characterization, immunogenicity prediction, and immune-pathway-to-disease mapping.
Predict patient response to immune checkpoint inhibitors (ICIs) by integrating tumor mutational burden (TMB), microsatellite instability (MSI), PD-L1 expression, HLA status, and immune-related gene expression. Outputs ICI Response Score with drug-specific recommendations and resistance-risk assessment. Use for melanoma/NSCLC/RCC immunotherapy decision support.
TCR/BCR repertoire analysis — V(D)J segment usage, CDR3 sequence diversity, clonality scoring, antigen specificity matching to IEDB, public-clone identification. Use for adaptive immune response characterization, post-treatment immune monitoring, antigen-specific clone tracking, and clonal-expansion analysis in immunotherapy or vaccination studies.
HLA gene-family analysis and MHC-peptide binding for transplant compatibility, vaccine epitope coverage, and cancer immunotherapy. Uses IMGT (HLA polymorphism), IEDB (epitope-MHC binding), UniProt (annotation), DGIdb (druggability). Use for HLA typing/imputation review, vaccine HLA coverage, and immunotherapy prediction biomarkers (HLA-LOH, neoantigen presentation).
Statistical fine-mapping of GWAS loci using credible sets (SuSiE, FINEMAP) and locus-to-gene scoring (Open Targets L2G). Identifies likely causal variants and target genes — distinct from positional 'nearest gene' which is often wrong. Use for prioritizing causal variants at GWAS hits, comparing fine-mapping methods, and converting lead SNPs to target genes.
Interpret a single GWAS SNP across multiple databases — GWAS Catalog hits, LD/haplotype context, eQTL evidence, regulatory annotation, ClinVar pathogenicity, gnomAD frequency. Use for 'what does this SNP do', SNP-to-mechanism tracing, and resolving lead-SNP-vs-causal-variant ambiguity. Always considers LD structure before claiming a SNP is mechanistically responsible.
Transform GWAS signals into drug targets and repurposing opportunities. Connects GWAS-significant loci to causal genes via fine-mapping/eQTL, then to druggable proteins via DGIdb/OpenTargets, then to existing drugs via ChEMBL. Use for GWAS-to-target hypothesis generation, druggable-fraction analysis of disease loci, and human-genetics-validated drug-repurposing prioritization.
Compare GWAS studies, perform meta-analyses across cohorts, and assess signal replication. Uses GWAS Catalog metadata, study-level statistics, and cross-cohort comparison. Use for evaluating GWAS reproducibility for a trait, meta-analysis sample size and effect-size aggregation, and detecting study heterogeneity (population, design, ancestry).
Trace drug mechanism of action — primary target → downstream signaling → pathway perturbation → tissue/organ effect → clinical outcome. Uses DrugBank, ChEMBL, KEGG, Reactome, STRING. Use for understanding how a drug works, identifying off-target effects, mechanism-based combination therapy design, and writing mechanism sections of reports.
Retrieve gene expression and omics datasets from ArrayExpress and BioStudies with gene disambiguation and quality assessment. Use for finding RNA-seq/microarray datasets by organism/tissue/condition, comparing across studies (case-control, time-series, dose-response), and assessing dataset suitability before downloading. Always uses English search terms.
End-to-end drug safety review integrating FDA labels, FAERS adverse event reports, PRR/ROR disproportionality, pharmacogenomic biomarkers, clinical trial data, and published literature. Use for regulatory drug safety reviews, comprehensive pharmacovigilance reports, label-vs-real-world AE comparison, and clinical decision support for drug safety.
GPCR receptor pharmacology — agonist/antagonist/inverse-agonist/biased-agonist classification, GPCRdb structural data, receptor-ligand binding analysis, antibody-target interface (SAbDab). Use for GPCR drug discovery, biased-agonism analysis, receptor subtype selectivity questions, and orthosteric vs allosteric pocket characterization.
Discover causal genes for diseases/traits from GWAS data using Open Targets L2G (locus-to-gene) scoring — integrates eQTL, chromatin interaction, and distance evidence. Use for trait-to-gene mapping, drug-target hypothesis generation from GWAS, and replacing the 'nearest gene' heuristic with multi-evidence L2G scores.
Detect and analyze adverse drug event signals using FDA FAERS reports, drug labels, and disproportionality statistics (PRR, ROR, IC). Generates quantitative safety signal scores (0-100) with evidence grading. Use for post-market surveillance, pharmacovigilance, drug safety assessment, regulatory submissions, and detecting rare AE signals not visible in clinical trials.
Histone-modification ChIP-seq, ATAC-seq accessibility, chromatin state, and TF binding analysis from ENCODE, Roadmap Epigenomics, ChIP-Atlas. Use for chromatin-state-by-tissue queries, TF-binding-by-region, regulatory landscape mapping, and ENCODE-cCRE annotations. For DNA methylation use tooluniverse-epigenomics; for RNA-seq use tooluniverse-rnaseq-deseq2.
Gene-set enrichment analysis — GO (Biological Process, Molecular Function, Cellular Component), KEGG, Reactome pathway enrichment via clusterProfiler, gseapy, ORA, GSEA. Use for interpreting DEG lists, screen hit lists, or any gene-list-to-pathways query. Includes simplify-cutoff handling and union-vs-total denominator conventions for percent-DE questions.
Interpret hits from CRISPR-KO/CRISPRi/shRNA screens by integrating DepMap essentiality, gnomAD constraint scores, pathway context (Reactome, STRING), druggability (DGIdb), and clinical evidence (CIViC, COSMIC). Use for screen-hit prioritization, essentiality ranking, and turning a list of screen hits into a prioritized target shortlist.
Identify drug repurposing candidates via target-based, compound-based, and disease-based strategies. Combines drug-target-disease network reasoning with mechanism rationale, clinical-trial precedent, and patent/regulatory feasibility. Use for hypothesis-generating repurposing for orphan diseases, finding existing drugs for new indications, and prioritizing candidates by evidence and feasibility.
Genomics and epigenomics analysis: DNA methylation (CpG, 5mC, 5hmC, bisulfite, RRBS), m6A RNA modification (MeRIP-seq), ChIP-seq peaks, ATAC-seq accessibility, histone modifications, chromatin state, multi-omics integration. Combines pandas/scipy/pysam computation with ToolUniverse annotation tools. Use for genome-wide epigenomic statistics, methylation analysis, and chromatin-genome integration.
End-to-end observational epidemiology analysis, from research question (PECO: Population/Exposure/Comparator/Outcome) to publication-ready statistical report. Covers cohort/case-control/cross-sectional design, confounder-adjusted regression, propensity scoring, and sensitivity analysis. Writes Python code for every step. Use for epidemiology study analysis and NHANES/UK-Biobank-style analyses.
Search and analyze electron microscopy data — cryo-EM density maps (EMDB), fitted atomic models (PDB), raw micrograph datasets (EMPIAR), and cryo-electron tomography volumes (CryoET Data Portal). Use for finding 3D structural data on a protein/complex, comparing experimental EM resolution to AlphaFold confidence, and accessing raw EM data for re-processing.
Ecology, biodiversity, and conservation biology research — species identification (GBIF, NCBI Taxonomy), invasive species impact, ecosystem dynamics, conservation status (IUCN), niche ecology. Use for biodiversity questions, species comparison, invasion biology, conservation prioritization, and ecology-related literature search.
Drug regulatory and approval research — FDA substance registry, ATC/EPC classification, EMA decisions, generic-drug status, FDA Orange Book exclusivity, NDA/BLA pathways. Use for jurisdiction-aware approval status (FDA vs EMA), generic vs brand availability, exclusivity expiry tracking, and regulatory pathway selection. Always specifies the market when reporting status.
Generate comprehensive disease research reports covering genetics (causal genes, GWAS, OMIM), pathways (Reactome, KEGG), drugs (existing therapies, repurposing candidates), clinical trials, epidemiology (prevalence, incidence), and phenotypes (HPO). Use for full disease overviews, comprehensive disease characterization, and orphan/rare-disease profiling.
Comprehensive drug profiling — mechanism, primary/secondary targets, drug interactions, clinical-trial status, adverse events (FAERS), pharmacogenomics, and approval history. Use for full drug investigation reports, 'tell me about drug X' queries, and assembling drug profiles for clinicians, researchers, or regulatory work.
Solve quantitative problems in biophysics and quantitative biology: pharmacokinetics (PK: volume of distribution, clearance, half-life), epidemiology (R0, attack rate), toxicology (LD50, NOAEL), population genetics (Hardy-Weinberg, Fst), enzyme kinetics (Michaelis-Menten), and thermodynamics. Use for first-principles quantitative biology calculations, dose calculations, exposure assessment, and biophysical-property estimation.
AI-driven patient-to-trial matching for precision oncology and rare-disease care. Transforms a patient's molecular profile (mutations, biomarkers, expression) and clinical state into ranked clinical-trial recommendations with evidence tiers. Searches ClinicalTrials.gov and cross-references CIViC, OpenTargets, ChEMBL, and FDA labels. Use for matching patients to trials by genotype, biomarker-driven trial selection, and trial-eligibility scoring.
Analyze CRISPR-Cas9 genetic screens — MAGeCK gene-level scores, sgRNA count QC, replicate correlation, hit prioritization, and pathway GSEA on screen output. Use for genome-wide essentiality screens, synthetic-lethality discovery, dropout vs positive-selection screen analysis, target identification, and resistance-screen interpretation. Includes screen-QC and statistical thresholds.
Cross-species gene comparison and ortholog analysis. Integrates Ensembl Compara orthologs, NCBI Gene, UniProt, OLS, Monarch, and OpenTargets to identify orthologs, paralogs, sequence conservation, functional conservation across species, and lineage-specific gene gains/losses. Use for phylogenetic gene tracing, model-organism mapping, and evolutionary-genomics queries.
Find and evaluate research datasets for any scientific question. Maps research questions to required study designs (longitudinal vs cross-sectional, observational vs experimental, single-cohort vs multi-cohort). Use when the user asks 'find data about X', 'where can I get data on Y', or needs a specific cohort/survey/repository. Covers GEO, ArrayExpress, dbGaP, NHANES, UK Biobank, ClinicalTrials.gov, GWAS Catalog, and 30+ scientific repositories.
Assess drug-drug interactions — CYP metabolic interactions (substrate/inhibitor/inducer), transporter (P-gp, BCRP, OATP) effects, pharmacodynamic synergy/antagonism, clinical significance scoring, and management recommendations. Use for polypharmacy review, prescribing decision support, and safety analysis when adding or switching drugs.
Integrate computed statistical results (DEGs, GWAS hits, associations) with biological context from ToolUniverse databases (UniProt, GO, Reactome, ClinVar, OpenTargets). Use for adding gene function/pathway/disease annotations to a result list, building biological narrative around statistical findings, and going beyond p-values to mechanism.
Install and configure ToolUniverse for any use case — MCP server (chat-based), CLI (command line with 9 subcommands), or Python SDK (Coding API with 3 calling patterns). Covers uv/uvx setup, MCP configuration for 12+ AI clients (Cursor, Claude Desktop, Windsurf, VS Code, Codex, Gemini CLI, Trae, Cline, etc.), full CLI reference (tu list/grep/find/info/run/test/status/build/serve), Coding API quickstart, agentic tools, code executor, API key walkthrough, skill installation, and upgrading. Use when user asks how to set up ToolUniverse, which access mode to use (MCP vs CLI vs SDK), configuring MCP servers, using the CLI, troubleshooting installation, upgrading, or mentions installing ToolUniverse or setting up scientific tools. Also triggers for "how do I use ToolUniverse", "what's the best way to access tools", "command line", "tu command", "coding API", "tu build".
Comprehensive ADMET (Absorption, Distribution, Metabolism, Excretion, Toxicity) profiling for drug candidates. Integrates ADMET-AI predictions, SwissADME drug-likeness, PubChemTox experimental toxicity, ChEMBL clinical data, Lipinski rule-of-five, and CYP interaction data. Use for drug-likeness assessment, BBB penetration, bioavailability, hepatotoxicity prediction, ADME/PK profiling, or screening compound libraries before lab testing.
Strategic clinical trial design feasibility assessment. Analyzes 6 dimensions (endpoint, population, comparator, effect size, duration, regulatory pathway) using precedent trials and FDA guidance. Produces enrollment projections, endpoint recommendations, and approval-pathway analysis. Use for trial-protocol design, power/sample-size estimation, comparator selection, and FDA submission strategy. Driven by precedent-based reasoning rather than first-principles math.
Cancer cell-line selection and profiling for experimental model choice. Cross-references DepMap, Cellosaurus, COSMIC, PharmacoDB to deliver identity verification, mutation/CNV profile, gene dependencies, drug sensitivities, and druggable targets. Use to answer 'which cell line should I use for studying gene X?' or 'is this cell line a good model for cancer Y?'. Outputs ranked recommendations with rationale, growth characteristics, and known pitfalls.
Aging biology, cellular senescence, and longevity research. Covers senescence markers (p16/CDKN2A, SASP, SA-beta-gal), aging hallmarks, senolytic drug discovery (dasatinib+quercetin, fisetin, navitoclax), epigenetic clocks, telomere biology, and longevity GWAS. Use for senescence-pathway analysis, age-related disease genetics, senolytic-target discovery, and centenarian-genetics queries. Distinguishes correlative vs causal evidence (knockout, intervention).
Search and retrieve clinical practice guidelines from 12+ authoritative sources — NICE, WHO, NCCN, AHA, ADA, SIGN, USPSTF, IDSA, NIH consensus, ESMO/ESC/EASL European societies, and US specialty associations. Use for evidence-graded treatment recommendations, dosing protocols, screening guidance, and authoritative-source-prioritized clinical guidance (NICE/WHO ranked above society guidelines).
Chemical safety and toxicology assessment integrating ADMET-AI predictions, CTD toxicogenomics, PubChemTox experimental data, GHS/IARC hazard classification, and exposure-context analysis. Use for chemical hazard identification, occupational/consumer-product toxicity, dose-response evaluation, and acute (LD50) vs chronic toxicity assessment. Distinguishes drug toxicity from environmental chemical toxicity.
Microscopy and quantitative imaging analysis — colony morphometry, fluorescence intensity quantification, cell-count statistics, dose-response curves, and ANOVA/Dunnett on image-derived measurements. Uses pandas/numpy/scipy/scikit-image. Use for analyzing tabular outputs from CellProfiler/ImageJ, image-derived measurement statistics, and image-based assay quantification.
TCGA/GDC cancer genomics analysis — cohort construction, clinical metadata retrieval, somatic mutation frequencies, survival analysis, and multi-omics integration. Use for TCGA-BRCA-style cohort studies, mutation prevalence by cancer type, survival-by-mutation analysis, and pan-cancer driver discovery. Always cancer-type-specific (don't use pan-cancer counts without cohort context).