skills/tooluniverse-model-organism-genetics/SKILL.md
Cross-species genetic analysis using model organism databases (MGI mouse, ZFIN zebrafish, FlyBase fruit fly, WormBase worm, SGD yeast, RGD rat, GBIF taxonomy). Maps human genes to orthologs, retrieves phenotype/expression/functional data, assesses gene function conservation, and identifies the best animal model for studying a human gene or disease.
npx skillsauth add mims-harvard/tooluniverse tooluniverse-model-organism-genetics

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.
Map human genes to model organism orthologs and retrieve phenotype, expression, and functional data across six species. Synthesize cross-species evidence to assess gene function conservation and identify the best animal models for studying human genes and diseases.
Not for: human variant interpretation (tooluniverse-variant-analysis), drug target validation (tooluniverse-drug-target-validation), human disease characterization (tooluniverse-multiomic-disease-characterization).
LOOK UP, DON'T GUESS: When asked about a species' taxonomy, ecology, or biology, search GBIF/NCBI Taxonomy first. For GBIF: use GBIF_search_species(query="species name"), then use the nubKey (not key) from the result to call GBIF_get_species(speciesKey=nubKey) for full taxonomy (kingdom, phylum, class, order, family). The nubKey is the GBIF backbone key; the key is dataset-specific and often lacks higher taxonomy.
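The nubKey-versus-key rule above can be sketched in a few lines. This is a minimal illustration, assuming the search tool returns a list of dict-like hits carrying `key` and `nubKey` fields; the hits and key values below are hypothetical placeholders, not real GBIF records.

```python
def pick_backbone_keys(search_hits):
    """Collect unique GBIF backbone keys (nubKey) from search hits, in order.

    Hits without a nubKey are skipped rather than falling back to the
    dataset-specific key, which often lacks higher taxonomy.
    """
    keys = []
    for hit in search_hits:
        nub = hit.get("nubKey")
        if nub is not None and nub not in keys:
            keys.append(nub)
    return keys


# Hypothetical hits (assumed shape; key values are placeholders).
hits = [
    {"key": 900001, "nubKey": 1111, "scientificName": "Example species"},
    {"key": 900002, "scientificName": "Example species"},  # no backbone match
    {"key": 900003, "nubKey": 1111, "scientificName": "Example species"},
]

backbone_keys = pick_backbone_keys(hits)
print(backbone_keys)
```

Each returned value would then be passed as `speciesKey` to GBIF_get_species, as described above.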
Sequence conservation across species usually, but not always, implies functional conservation. A gene that is highly conserved between mouse and human likely has the same function, yet regulatory differences (when and where the gene is expressed) can produce different phenotypes from the same gene. Always check: is the protein domain conserved, or just the raw sequence? Are there known regulatory differences? A 40% identity ortholog with a conserved catalytic domain can be more functionally equivalent than a 90% identity paralog in the same species.
Paralog contamination is a common pitfall. Gene families (e.g., FOXP1/2/3/4, HOX clusters) generate false ortholog hits. Distinguish true orthologs from paralogs by checking synteny (conserved gene neighborhood) and homology type: 1:1 = likely true ortholog; 1:many or many:many = likely paralog expansion. If the target species has a single gene where humans have multiple (e.g., one fly FoxP vs four human FOXPs), it is the co-ortholog of all human paralogs — note this explicitly.
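The homology-type rule above can be written as a small lookup. This is a sketch only: it assumes the orthology tool reports Ensembl Compara style type strings (`ortholog_one2one`, `ortholog_one2many`, `ortholog_many2many`), and the function names are hypothetical helpers, not part of any tool.

```python
def classify_homology(homology_type):
    """Map a homology type string to the interpretation used in this skill."""
    labels = {
        "ortholog_one2one": "likely true ortholog (1:1)",
        "ortholog_one2many": "likely paralog expansion (1:many); check synteny",
        "ortholog_many2many": "likely paralog expansion (many:many); check synteny",
    }
    return labels.get(homology_type, "unrecognized homology type; inspect manually")


def co_ortholog_note(target_gene, human_paralogs):
    """State explicitly when one target gene corresponds to several human genes."""
    if len(human_paralogs) > 1:
        return f"{target_gene} is the co-ortholog of {', '.join(human_paralogs)}"
    return f"{target_gene} is the ortholog of {human_paralogs[0]}"


print(classify_homology("ortholog_one2one"))
print(co_ortholog_note("FoxP", ["FOXP1", "FOXP2", "FOXP3", "FOXP4"]))
```

The classification is a first-pass label; synteny still has to be checked by hand for anything that is not 1:1.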
Choose your model by the question:
Invertebrates (fly, worm, yeast) lack adaptive immunity and many vertebrate-specific organs — if the question involves those systems, they will be uninformative.
A knockout phenotype in mouse does not automatically predict the human phenotype. Ask three questions before inferring cross-species relevance:
When phenotypes differ across species, consider regulatory divergence: the coding sequence may be conserved while the expression pattern has shifted. This can produce organisms with the "same gene" but different tissues of expression and therefore different phenotypes.
- MyGene_query_genes(query="<gene>"): get Ensembl ID, Entrez ID, UniProt, symbol (filter by symbol match; first hit may be a pseudogene)
- ensembl_lookup_gene(gene_id="<ensembl_id>", species="homo_sapiens"): validate
- HPO_search_terms(query="<disease>"): get HPO terms for phenotype matching

Fallback if gene not found: UniProt_search(query="<gene>", organism="9606")
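The "filter by symbol match" step can be sketched as follows. It assumes each hit is a dict with `symbol` and `type_of_gene` fields, as MyGene.info records typically carry; the hits, symbols, and IDs below are hypothetical placeholders.

```python
def pick_gene_hit(hits, symbol):
    """Return the hit whose symbol matches exactly, preferring protein-coding genes.

    Returns None when no hit has the requested symbol, so the caller can
    move on to the fallback search instead of trusting the first hit.
    """
    exact = [h for h in hits if h.get("symbol", "").upper() == symbol.upper()]
    coding = [h for h in exact if h.get("type_of_gene") == "protein-coding"]
    if coding:
        return coding[0]
    return exact[0] if exact else None


# Hypothetical search results: the first hit is a pseudogene with a similar symbol.
hits = [
    {"symbol": "GENE1P1", "type_of_gene": "pseudo", "entrezgene": "0001"},
    {"symbol": "GENE1", "type_of_gene": "protein-coding", "entrezgene": "0002"},
]

chosen = pick_gene_hit(hits, "gene1")
print(chosen)
```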
Output: canonical symbol, Ensembl ID (ENSG), Entrez ID, UniProt accession.
Primary: EnsemblCompara_get_orthologues(gene="<ENSG>", species="human", target_species="<species>")
Accepted target_species values: "mouse", "zebrafish", "drosophila_melanogaster" (NOT "fruitfly" — returns HTTP 400), "caenorhabditis_elegans", "saccharomyces_cerevisiae", "xenopus_tropicalis"
Fallbacks (if Ensembl Compara returns no results):
- PANTHER_ortholog(gene_id="<symbol>", organism=9606, target_organism=<taxon>) (taxon IDs: mouse=10090, fly=7227, worm=6239, zebrafish=7955, yeast=559292, frog=8364)
- NCBIDatasets_get_orthologs(gene_id="<entrez_id>"): broad, all vertebrates
- FlyMine_search(query="<human_gene_symbol>"): text search finds distant orthologs that automated tools miss; confirm with FlyBase_get_gene_orthologs
- WormBase_get_gene(gene_id="<gene_symbol>"): gene record often contains ortholog info

Cross-reference via Monarch:

- Monarch_search_gene(query="<gene_symbol>"): get Monarch gene entity
- MonarchV3_get_associations(subject="HGNC:<id>", category="biolink:GeneHomologAssociation"): all orthologs

Note: "No ortholog found by tools" is not the same as "no ortholog exists." Sequence divergence does not equal functional divergence. Try manual search before concluding absence.
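The primary-then-fallback order can be sketched as a simple chain. This is an illustration under assumptions: each source is represented by a hypothetical callable that takes a gene identifier and returns a (possibly empty) list; the stubs below stand in for the real tool calls and return made-up values.

```python
def find_orthologs(gene, sources):
    """Try each (name, lookup) pair in order and return the first non-empty result."""
    tried = []
    for name, lookup in sources:
        tried.append(name)
        try:
            result = lookup(gene)
        except Exception:
            result = None  # treat a failed call like an empty answer and move on
        if result:
            return {"source": name, "orthologs": result, "tried": tried}
    return {
        "source": None,
        "orthologs": [],
        "tried": tried,
        "note": "No ortholog found by tools; this does not show that no ortholog exists.",
    }


# Hypothetical stand-ins for the real tools.
def compara_stub(gene):
    return []  # primary source returns nothing


def panther_stub(gene):
    return ["hypothetical_ortholog_1"]


outcome = find_orthologs("GENE1", [("EnsemblCompara", compara_stub), ("PANTHER", panther_stub)])
print(outcome)
```

Recording which sources were tried makes it easy to report "not found by tools" honestly rather than as proof of absence.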
- MGI_search_genes(query="<mouse_symbol>"): confirm MGI ID
- MGI_get_gene(gene_id="MGI:XXXXXXX"): full gene details
- MGI_get_phenotypes(gene_id="MGI:XXXXXXX", limit=50): knockout/transgenic phenotypes

Extract: MP ontology terms, allele types (null KO, conditional KO, point mutation), zygosity, lethality, disease model relevance.
Supplement via Monarch:
- MonarchV3_get_associations(subject="MGI:XXXXXXX", category="biolink:GeneToPhenotypicFeatureAssociation")
- MonarchV3_get_associations(subject="MGI:XXXXXXX", category="biolink:GeneToDiseaseAssociation")

Fly (FlyBase):

- FlyBase_get_gene(gene_id="FB:FBgnXXX"): gene details, function summary
- FlyBase_get_gene_alleles(gene_id="FB:FBgnXXX", limit=20): LOF, GOF, RNAi lines
- FlyBase_get_gene_disease_models(gene_id="FB:FBgnXXX"): human disease models in fly
- FlyBase_get_gene_expression(gene_id="FB:FBgnXXX"): tissue/stage expression
- FlyBase_get_gene_interactions(gene_id="FB:FBgnXXX"): genetic and physical interactions

Worm (WormBase):

- WormBase_get_gene(gene_id="WBGene00XXXXXX"): gene details, concise description
- WormBase_get_phenotypes(gene_id="WBGene00XXXXXX"): RNAi and mutant phenotypes
- WormBase_get_expression(gene_id="WBGene00XXXXXX"): expression pattern

Zebrafish (ZFIN):

- ZFIN_get_gene(gene_id="ZFIN:ZDB-GENE-XXXXXX-X")
- ZFIN_get_gene_phenotypes(gene_id="...", limit=30): morpholino/CRISPR/mutant phenotypes
- ZFIN_get_gene_expression(gene_id="..."): spatiotemporal expression

Distinguish: morpholino knockdown (rapid, potential off-target), CRISPR mutant (more reliable), ENU mutant (unbiased forward genetics).
Frog (Xenbase):

- Xenbase_search_genes(query="<gene_symbol>")
- Xenbase_get_gene(gene_id="<xenbase_id>"): gene details, expression, phenotypes

Yeast (SGD):

- SGD_search(query="<gene_symbol>", category="gene")
- SGD_get_gene(sgd_id="<sgd_id>"): function, pathway
- SGD_get_phenotypes(sgd_id="<sgd_id>"): deletion and overexpression phenotypes
- SGD_get_go_annotations(sgd_id="<sgd_id>"): GO terms (often best-characterized for conserved genes)
- SGD_get_interactions(sgd_id="<sgd_id>"): synthetic lethal partners = potential drug targets

Most informative for: cell cycle, DNA repair, protein folding, metabolism, autophagy, secretory pathway, chromatin. Not informative for: multicellular processes (development, immunity, neural function).
This phase transforms per-organism data into biological insight.
Step 1: Build the phenotype matrix
| Feature | Human | Mouse | Fly | Worm | Zebrafish | Yeast |
|---------|-------|-------|-----|------|-----------|-------|
| Ortholog present? | n/a | | | | | |
| LOF lethality | | | | | | |
| Primary phenotype | | | | | | |
| Expression domain | | | | | | |
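One way to fill the matrix programmatically is sketched below, using only the standard library. The row and column names come from the template above; the `records` mapping and its example values are hypothetical, and `?` marks cells that have not been retrieved yet.

```python
FEATURES = ["Ortholog present?", "LOF lethality", "Primary phenotype", "Expression domain"]
SPECIES = ["Human", "Mouse", "Fly", "Worm", "Zebrafish", "Yeast"]


def build_matrix(records, missing="?"):
    """Render a Markdown phenotype matrix from a {(feature, species): value} mapping."""
    header = "| Feature | " + " | ".join(SPECIES) + " |"
    separator = "|" + "---|" * (len(SPECIES) + 1)
    rows = [header, separator]
    for feature in FEATURES:
        cells = [records.get((feature, sp), missing) for sp in SPECIES]
        rows.append("| " + feature + " | " + " | ".join(cells) + " |")
    return "\n".join(rows)


# Hypothetical values for illustration only.
records = {
    ("Ortholog present?", "Human"): "n/a",
    ("Ortholog present?", "Mouse"): "yes",
}

table = build_matrix(records)
print(table)
```

Keeping missing cells visible makes gaps in retrieval obvious before the synthesis steps that follow.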
Step 2: Identify the core/ancestral function

Look for the phenotype that is most consistent across species. Abstract from species-specific terms:
Step 3: Cross-species phenotype mapping
Different species use different ontologies (HPO, MP, FBcv, WBPhenotype, ZP). Use MonarchV3_phenotype_similarity_search to find equivalent phenotypes via the uPheno ontology. When automated mapping fails, use biological reasoning to find conceptual equivalents.
Step 4: Conservation assessment
Step 5: Pathway conservation check
- STRING_get_network(identifiers="<human_gene> <mouse_ortholog> <fly_ortholog>", species=9606): check if interaction partners are also conserved
- ReactomeAnalysis_pathway_enrichment(identifiers="<human_gene> <ortholog1> <ortholog2>"): shared pathway membership

Step 6: Organism recommendation

Recommend which organism(s) to use for further study. Consider: phenotype match to human condition, available genetic tools, complementary models (e.g., mouse for physiology + fly for genetic screens), practical considerations (cost, throughput, imaging).
- OMIM_search(query="<gene_symbol>"): Mendelian disease associations
- ClinVar_search_variants(query="<gene_symbol>"): pathogenic variants
- ClinGen_search_gene_validity(gene="<gene_symbol>"): gene-disease validity (Definitive/Strong/Moderate/Limited)
- HPO_search_terms(query="<disease_name>"): phenotype terms for cross-species comparison

Map HPO terms back to model organism phenotypes (Phase 6) to assess model fidelity.
These problems require computation and logical deduction, not database lookups. Work through the logic step by step.
Time-of-entry mapping: In Hfr x F- crosses, genes transfer in a fixed linear order from the integrated F factor origin. Interrupted mating at different times reveals gene order and map distances (1 minute ~ 1 map unit on the circular E. coli chromosome, ~47 kb).
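The time-of-entry logic reduces to sorting genes by first appearance and taking differences. The sketch below uses hypothetical gene names and entry times, and applies the ~47 kb per minute conversion stated above.

```python
KB_PER_MINUTE = 47  # approximate conversion quoted above


def map_from_entry_times(entry_minutes):
    """Return gene order and adjacent intervals from times of first entry (minutes)."""
    order = sorted(entry_minutes, key=entry_minutes.get)
    intervals = [
        (a, b, entry_minutes[b] - entry_minutes[a])
        for a, b in zip(order, order[1:])
    ]
    return order, intervals


# Hypothetical interrupted-mating result: minutes at which each marker first appears.
entry = {"geneA": 8, "geneB": 25, "geneC": 17}

order, intervals = map_from_entry_times(entry)
print(order)  # ['geneA', 'geneC', 'geneB']
for left, right, minutes in intervals:
    print(f"{left} to {right}: {minutes} min, roughly {minutes * KB_PER_MINUTE} kb")
```

The earliest marker is the one closest to the origin of transfer in that Hfr strain; a different Hfr strain may give a different starting point or direction.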
Key reasoning steps:
lac operon logic (negative inducible):
trp operon attenuation (leader peptide mechanism):
Catabolite repression: Even with inducer present, lac operon requires cAMP-CAP for full expression. High glucose -> low cAMP -> low expression. This is POSITIVE regulation layered on top of the negative repressor system.
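The two regulatory layers combine into a small truth table. This is a simplified sketch: the three output levels are qualitative labels, and the `repressor_functional` flag (modelling a loss-of-function lacI mutant that expresses constitutively) is an addition for illustration.

```python
def lac_expression(inducer_present, glucose_high, repressor_functional=True):
    """Qualitative lac operon output from the negative and positive control layers."""
    # Negative control: a functional repressor blocks transcription unless inducer is bound.
    repressed = repressor_functional and not inducer_present
    if repressed:
        return "off"
    # Positive control: cAMP-CAP is needed for full expression, and glucose lowers cAMP.
    return "low" if glucose_high else "high"


for inducer in (False, True):
    for glucose in (False, True):
        print(f"inducer={inducer}, glucose_high={glucose}: {lac_expression(inducer, glucose)}")
```

Only the combination of inducer present and glucose low gives full expression, which is the point of layering positive regulation on the repressor system.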
Three-point cross (most common exam problem):
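A standard three-point cross calculation can be sketched as follows, using textbook logic rather than anything specific to this skill: the two most frequent classes are parental, the two rarest are double crossovers, and the gene that differs between them is the middle gene. The sketch assumes all eight progeny classes are present; the counts are hypothetical, and each class is a three-character string giving the allele (`+` or `m`) of each gene in listed order.

```python
def three_point_cross(genes, counts):
    """Infer gene order, map distances (cM), and interference from testcross counts."""
    total = sum(counts.values())
    ranked = sorted(counts, key=counts.get, reverse=True)
    parental, dco = ranked[0], ranked[-1]

    # A double crossover differs from one parental type only at the middle gene.
    diff = [i for i in range(3) if parental[i] != dco[i]]
    middle = diff[0] if len(diff) == 1 else ({0, 1, 2} - set(diff)).pop()
    flanks = [i for i in range(3) if i != middle]

    def distance(i, j):
        # Recombinant classes change the parental linkage relationship between genes i and j.
        rec = sum(n for c, n in counts.items()
                  if (c[i] == c[j]) != (parental[i] == parental[j]))
        return 100 * rec / total

    d1, d2 = distance(flanks[0], middle), distance(middle, flanks[1])
    observed_dco = counts[ranked[-1]] + counts[ranked[-2]]
    expected_dco = (d1 / 100) * (d2 / 100) * total
    return {
        "order": [genes[flanks[0]], genes[middle], genes[flanks[1]]],
        "distances_cM": (d1, d2),
        "interference": 1 - observed_dco / expected_dco,
    }


# Hypothetical testcross progeny counts for genes listed as x, y, z.
counts = {"+++": 400, "mmm": 400, "+mm": 60, "m++": 60,
          "+m+": 35, "m+m": 35, "++m": 5, "mm+": 5}
result = three_point_cross(["x", "y", "z"], counts)
print(result)
```

With these counts the inferred order is x, z, y with intervals of 13 and 8 map units, and interference is slightly positive because fewer double crossovers were observed (10) than expected (10.4).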
Cotransduction frequency (phage P1 mapping in bacteria):
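Cotransduction frequency is commonly converted to distance with the Wu formula, c = (1 - d/L)^3, where d is the distance between markers and L is the length of the transduced fragment. The sketch below assumes L is about 2 minutes for P1, a common textbook approximation; treat that default as an assumption to adjust.

```python
def cotransduction_frequency(distance_min, fragment_min=2.0):
    """Wu formula: expected cotransduction frequency for two markers d minutes apart."""
    if distance_min >= fragment_min:
        return 0.0  # markers too far apart to be packaged on one fragment
    return (1 - distance_min / fragment_min) ** 3


def distance_from_cotransduction(frequency, fragment_min=2.0):
    """Invert the Wu formula to estimate distance (minutes) from an observed frequency."""
    return fragment_min * (1 - frequency ** (1 / 3))


freq = cotransduction_frequency(1.0)
print(freq)  # 0.125
print(distance_from_cotransduction(0.125))
```

Higher cotransduction frequency means closer markers; the relationship is steeply nonlinear, so small frequencies carry large uncertainty in distance.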
Before finalizing any report: