skills/tooluniverse-data-integration-analysis/SKILL.md
Integrate computed statistical results (DEGs, GWAS hits, associations) with biological context from ToolUniverse databases (UniProt, GO, Reactome, ClinVar, OpenTargets). Use for adding gene function/pathway/disease annotations to a result list, building biological narrative around statistical findings, and going beyond p-values to mechanism.
npx skillsauth add mims-harvard/tooluniverse tooluniverse-data-integration-analysis
When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do -- execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.
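As a small illustration of running the computation instead of describing it, the sketch below performs a hypergeometric over-representation test for a significant gene list against a few pathways, then applies Benjamini-Hochberg correction. It assumes pathway membership has already been retrieved with ToolUniverse tools. The helper names and all pathway sizes and overlaps are hypothetical. It uses `math.comb` so it has no dependencies; `scipy.stats.hypergeom.sf(overlap - 1, universe, pathway_size, query_size)` computes the same tail probability.

```python
from math import comb


def hypergeom_enrichment_p(overlap, pathway_size, query_size, universe_size):
    """P(X >= overlap) for X ~ Hypergeometric(universe_size, pathway_size, query_size)."""
    upper = min(pathway_size, query_size)
    hits = sum(
        comb(pathway_size, i) * comb(universe_size - pathway_size, query_size - i)
        for i in range(overlap, upper + 1)
    )
    # Exact integer ratio, converted to float only at the end
    return hits / comb(universe_size, query_size)


def benjamini_hochberg(pvalues):
    """Return BH-adjusted q-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


# Hypothetical example: 40 significant genes from a 20,000-gene universe,
# tested against three made-up pathways given as (pathway size, overlap).
pathways = {"pathway_A": (150, 8), "pathway_B": (300, 3), "pathway_C": (60, 1)}
pvals = {name: hypergeom_enrichment_p(k, size, 40, 20000) for name, (size, k) in pathways.items()}
qvals = dict(zip(pvals, benjamini_hochberg(list(pvals.values()))))
for name in pathways:
    print(f"{name}: p={pvals[name]:.2e}, q={qvals[name]:.2e}")
```

In practice, the dedicated enrichment skill handles gene-set retrieval and background choice. The point here is that the reported p-values and q-values come from code that was actually run.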
Bridge the gap between statistical results and biological understanding. After any computational analysis produces significant findings, this skill teaches how to interpret them using ToolUniverse's biological knowledge tools -- the key advantage over platforms that only do data analysis.
IMPORTANT: Always use English terms in tool calls (gene names, pathway names, organism names), even if the user writes in another language. Respond in the user's language.
Apply when:
NOT for (use other skills instead):
- tooluniverse-statistical-modeling or tooluniverse-rnaseq-deseq2
- tooluniverse-gene-enrichment
- tooluniverse-literature-deep-research
- tooluniverse-variant-interpretation

Map each type of significant finding to the right biological question:
| Finding Type | Biological Question | Tool Discovery Query |
|---|---|---|
| Significant gene list | What pathways are enriched? What functions converge? | find_tools("gene enrichment pathway analysis") |
| Significant variant (rsID) | What is the functional impact? Which gene is affected? | find_tools("variant annotation functional impact") |
| Significant exposure/chemical | What is the biological mechanism? Which pathways? | find_tools("chemical gene pathway toxicology") |
| Significant drug association | What is the molecular target? What is the MOA? | find_tools("drug target mechanism action") |
| Significant metabolite | Which metabolic pathway is perturbed? | find_tools("metabolite pathway identification") |
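The mapping above can be encoded as a small lookup, so the discovery query follows mechanically from the finding type. This is a sketch: the type keys, the `discovery_query` helper, and the rsID auto-detection are hypothetical conveniences. The query strings are copied from the table. The resulting string would then be passed to `find_tools`, whose exact call signature is not shown here.

```python
import re

# Tool-discovery queries copied verbatim from the table above
DISCOVERY_QUERIES = {
    "gene_list": "gene enrichment pathway analysis",
    "variant": "variant annotation functional impact",
    "chemical": "chemical gene pathway toxicology",
    "drug": "drug target mechanism action",
    "metabolite": "metabolite pathway identification",
}
RSID = re.compile(r"^rs\d+$", re.IGNORECASE)


def discovery_query(finding, finding_type=None):
    """Return the tool-discovery query for a finding (hypothetical helper)."""
    if finding_type is None:
        # Only two types can be inferred safely from the identifier itself
        if isinstance(finding, (list, tuple, set)):
            finding_type = "gene_list"
        elif isinstance(finding, str) and RSID.match(finding):
            finding_type = "variant"
        else:
            raise ValueError(f"cannot infer finding type for {finding!r}; pass finding_type")
    return DISCOVERY_QUERIES[finding_type]


print(discovery_query("rs7903146"))
print(discovery_query(["TP53", "BRCA1", "EGFR"]))
print(discovery_query("metformin", finding_type="drug"))
```

Chemicals, drugs, and metabolites are deliberately not auto-detected, because a name alone does not reliably indicate which of the three it is.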
Key principle: Do not stop at "gene X is significant." Ask: significant in what context? Through what mechanism? With what downstream consequence?
For each significant finding, query multiple sources and synthesize. The pattern:
Evidence grading (grade each piece of evidence):
| Grade | Source Type | Example |
|---|---|---|
| T1 (Strong) | Randomized clinical trial, Mendelian randomization | "RCT showed drug X reduces outcome Y" |
| T2 (Moderate) | Large cohort study, GWAS with replication | "GWAS meta-analysis in 500k subjects" |
| T3 (Suggestive) | Case-control study, animal model | "Mouse knockout shows phenotype" |
| T4 (Hypothesis) | In silico prediction, pathway inference | "Network analysis suggests involvement" |
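The grading table can be applied per evidence item with an ordered enum. In this sketch, lower values mean stronger evidence. The keyword map and the `grade_finding` helper are hypothetical. Reporting the strongest tier alongside the per-item tiers is one possible summary, not a rule stated above.

```python
from enum import IntEnum


class EvidenceTier(IntEnum):
    """Lower value = stronger evidence, following the T1-T4 table."""
    T1_STRONG = 1
    T2_MODERATE = 2
    T3_SUGGESTIVE = 3
    T4_HYPOTHESIS = 4


# Hypothetical map from a source-type label to its tier
SOURCE_TIER = {
    "randomized clinical trial": EvidenceTier.T1_STRONG,
    "mendelian randomization": EvidenceTier.T1_STRONG,
    "large cohort study": EvidenceTier.T2_MODERATE,
    "gwas with replication": EvidenceTier.T2_MODERATE,
    "case-control study": EvidenceTier.T3_SUGGESTIVE,
    "animal model": EvidenceTier.T3_SUGGESTIVE,
    "in silico prediction": EvidenceTier.T4_HYPOTHESIS,
    "pathway inference": EvidenceTier.T4_HYPOTHESIS,
}


def grade_finding(source_types):
    """Grade each evidence item and return (per-item tiers, strongest tier)."""
    tiers = [SOURCE_TIER[s.lower()] for s in source_types]
    return tiers, min(tiers)


tiers, best = grade_finding(["GWAS with replication", "Animal model", "Pathway inference"])
print([t.name for t in tiers], best.name)
```

Keeping the per-item list matters. A finding whose only support is T4 should not be presented in the same way as one with T1 or T2 evidence behind it.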
Statistical association is not causation. Apply these reasoning frameworks:
DAG construction: Before interpreting, sketch the causal directed acyclic graph (DAG).
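Once the DAG is sketched as a list of cause-to-effect edges, a simple first check is to list the common causes of exposure and outcome. This sketch flags nodes that are ancestors of the exposure and also reach the outcome by a path that does not pass through the exposure. It is a heuristic for spotting candidate confounders, not a full back-door criterion check. The node names are hypothetical.

```python
def ancestors(parents, node, blocked=frozenset()):
    """All nodes with a directed path into `node`, never traversing `blocked` nodes."""
    seen, stack = set(), [node]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen and parent not in blocked:
                seen.add(parent)
                stack.append(parent)
    return seen


def candidate_confounders(edges, exposure, outcome):
    parents = {}
    for cause, effect in edges:
        parents.setdefault(effect, []).append(cause)
    # Causes of the exposure
    exposure_causes = ancestors(parents, exposure)
    # Causes of the outcome along paths that avoid the exposure
    outcome_causes = ancestors(parents, outcome, blocked={exposure})
    return exposure_causes & outcome_causes


# Hypothetical DAG: age confounds; snp is an instrument; mediator lies on the causal path
edges = [
    ("age", "exposure"), ("age", "outcome"),
    ("snp", "exposure"),
    ("exposure", "mediator"), ("mediator", "outcome"),
]
print(candidate_confounders(edges, "exposure", "outcome"))
```

The instrument (`snp`) and the mediator are correctly excluded. Adjusting for either of them would answer a different question than adjusting for `age`.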
Triangulation: The same finding supported by different methods with different biases strengthens causal inference.
Mendelian randomization logic: Genetic variants (instruments) are randomly allocated at conception, so they are largely protected from confounding by lifestyle and from reverse causation. If a genetic variant that increases exposure X also increases disease Y, this supports X causing Y. Check instrument strength (F-statistic > 10), the exclusion restriction (the variant affects Y only through X), and pleiotropy (MR-Egger intercept).
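From per-variant summary statistics (effects on the exposure and on the outcome, with standard errors), the checks above reduce to a few weighted regressions. This is a minimal sketch assuming independent variants and first-order inverse-variance weights. It uses the common approximation F ≈ (beta_x / se_x)^2 per variant. The function name and input numbers are hypothetical, and standard errors and the intercept significance test are omitted. For real analyses, an established implementation such as the R package TwoSampleMR is the safer choice.

```python
import numpy as np


def mr_estimates(beta_x, se_x, beta_y, se_y):
    beta_x, se_x, beta_y, se_y = (np.asarray(a, dtype=float) for a in (beta_x, se_x, beta_y, se_y))
    # Approximate per-variant instrument strength
    f_stats = (beta_x / se_x) ** 2
    # Orient alleles so every variant increases the exposure (needed for MR-Egger)
    sign = np.sign(beta_x)
    bx, by = beta_x * sign, beta_y * sign
    w = 1.0 / se_y ** 2  # first-order inverse-variance weights
    # IVW estimate: weighted regression of by on bx through the origin
    ivw = np.sum(w * bx * by) / np.sum(w * bx ** 2)
    # MR-Egger: same regression with an intercept; a non-zero intercept suggests directional pleiotropy
    X = np.column_stack([np.ones_like(bx), bx])
    XtW = X.T * w
    intercept, slope = np.linalg.solve(XtW @ X, XtW @ by)
    return {"f_stats": f_stats, "ivw": ivw, "egger_intercept": intercept, "egger_slope": slope}


# Hypothetical summary statistics for four independent variants
res = mr_estimates(
    beta_x=[0.10, 0.20, 0.30, -0.15], se_x=[0.01] * 4,
    beta_y=[0.05, 0.10, 0.15, -0.075], se_y=[0.02, 0.03, 0.025, 0.02],
)
print(res["f_stats"].min() > 10, round(res["ivw"], 3), round(res["egger_intercept"], 3))
```

When the IVW and MR-Egger slopes agree and the intercept is near zero, directional pleiotropy is less of a concern. When they diverge, that is a signal to investigate the instruments before interpreting the estimate causally.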
Mediation analysis: If gene G is associated with both exposure and outcome, ask: does the exposure effect on outcome go through G? Use the finding's pathway context (Step 2) to propose mediators, then check if adjusting for the mediator attenuates the effect.
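The "does adjusting attenuate the effect" check can be expressed as a comparison of two regression coefficients, often called the difference method. This sketch uses simulated data with made-up effect sizes: exposure to G of 0.8, G to outcome of 0.5, and a direct effect of 0.2. It assumes linear models, no exposure-mediator interaction, and no unmeasured mediator-outcome confounding. For confidence intervals, bootstrapping or statsmodels' `Mediation` class would be needed.

```python
import numpy as np


def ols_coefs(y, *covariates):
    """Ordinary least squares with an intercept; returns [intercept, coef_1, ...]."""
    X = np.column_stack([np.ones_like(y), *covariates])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs


rng = np.random.default_rng(0)
n = 20_000
exposure = rng.normal(size=n)
mediator = 0.8 * exposure + rng.normal(size=n)                    # exposure -> G
outcome = 0.2 * exposure + 0.5 * mediator + rng.normal(size=n)    # direct effect + path via G

total = ols_coefs(outcome, exposure)[1]              # exposure effect, unadjusted
direct = ols_coefs(outcome, exposure, mediator)[1]   # exposure effect, adjusted for G
prop_mediated = 1 - direct / total
print(f"total={total:.3f} direct={direct:.3f} proportion mediated={prop_mediated:.2f}")
```

With these simulated values, the true total effect is 0.2 + 0.8 × 0.5 = 0.6, so about two thirds of it runs through G. A coefficient that barely moves after adjustment would argue against G as a mediator.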
Before reporting a finding as robust, attempt to falsify it:
Structure the integrated report as follows:
For each significant finding, produce one row:
| Finding | Statistical Evidence | Biological Mechanism | Literature Support | Genetic Support | Evidence Grade |
|---|---|---|---|---|---|
| Gene X upregulated | FDR=0.001, log2FC=2.3 | PI3K/AKT pathway | 12 papers, 2 RCTs | GWAS: rs123 (p=5e-8) | Strong |
| Variant rs456 | OR=1.4, p=2e-6 | Splicing disruption | 3 case reports | eQTL in GTEx | Moderate |
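The row format above can be generated from structured results, so the report stays consistent across findings. This is a sketch: the `IntegratedFinding` dataclass and the `to_markdown` helper are hypothetical. The example values are the illustrative first row of the table, not real results.

```python
from dataclasses import dataclass, astuple

HEADERS = ["Finding", "Statistical Evidence", "Biological Mechanism",
           "Literature Support", "Genetic Support", "Evidence Grade"]


@dataclass
class IntegratedFinding:
    finding: str
    statistical_evidence: str
    biological_mechanism: str
    literature_support: str
    genetic_support: str
    evidence_grade: str


def to_markdown(rows):
    """Render findings as a Markdown table with the columns above."""
    lines = ["| " + " | ".join(HEADERS) + " |", "|" + "---|" * len(HEADERS)]
    for row in rows:
        # Escape pipes so free-text cells cannot break the table layout
        cells = [str(value).replace("|", "\\|") for value in astuple(row)]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


example = IntegratedFinding(
    "Gene X upregulated", "FDR=0.001, log2FC=2.3", "PI3K/AKT pathway",
    "12 papers, 2 RCTs", "GWAS: rs123 (p=5e-8)", "Strong",
)
print(to_markdown([example]))
```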