mims-harvard/tooluniverse-data-integration-analysis

Name: tooluniverse-data-integration-analysis
Author: mims-harvard

skills/tooluniverse-data-integration-analysis/SKILL.md

npx skillsauth add mims-harvard/tooluniverse tooluniverse-data-integration-analysis

COMPUTE, DON'T DESCRIBE

When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do -- execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.
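As a small illustration of computing rather than describing, the sketch below applies a Benjamini-Hochberg correction to a handful of p-values in plain Python. The gene names and p-values are made-up placeholders, not results from any dataset, and the function is a hand-written stand-in: in practice the libraries named above (for example `statsmodels.stats.multitest.multipletests` with `method="fdr_bh"`) would be the more usual route.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so the adjusted
    # values stay monotone in rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical input: placeholder genes with illustrative p-values.
p_values = {"GENE_A": 0.001, "GENE_B": 0.04, "GENE_C": 0.03, "GENE_D": 0.20}
adj = benjamini_hochberg(list(p_values.values()))
for gene, q in zip(p_values, adj):
    print(f"{gene}\tq={q:.4f}")
```

The point is to report the computed q-values themselves, not a description of how they would be obtained.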

Data Integration Analysis

Bridge the gap between statistical results and biological understanding. After any computational analysis produces significant findings, this skill teaches how to interpret them using ToolUniverse's biological knowledge tools -- the key advantage over platforms that only do data analysis.

IMPORTANT: Always use English terms in tool calls (gene names, pathway names, organism names), even if the user writes in another language. Respond in the user's language.

When to Use This Skill

Apply when:

- Statistical analysis produced a list of significant genes, variants, metabolites, or exposures
- Users want to go beyond p-values to understand WHY something is significant
- Combining computational results with published evidence
- Interpreting differential expression, GWAS hits, or association study results biologically
- Users ask "what does this result mean?" after running an analysis

NOT for (use other skills instead):

- Running the statistical analysis itself --> Use tooluniverse-statistical-modeling or tooluniverse-rnaseq-deseq2
- Pure gene enrichment without prior analysis --> Use tooluniverse-gene-enrichment
- Pure literature review --> Use tooluniverse-literature-deep-research
- Single variant interpretation --> Use tooluniverse-variant-interpretation

Step 1: Statistical Results to Biological Questions

Map each type of significant finding to the right biological question:

| Finding Type | Biological Question | Tool Discovery Query |
|---|---|---|
| Significant gene list | What pathways are enriched? What functions converge? | find_tools("gene enrichment pathway analysis") |
| Significant variant (rsID) | What is the functional impact? Which gene is affected? | find_tools("variant annotation functional impact") |
| Significant exposure/chemical | What is the biological mechanism? Which pathways? | find_tools("chemical gene pathway toxicology") |
| Significant drug association | What is the molecular target? What is the MOA? | find_tools("drug target mechanism action") |
| Significant metabolite | Which metabolic pathway is perturbed? | find_tools("metabolite pathway identification") |
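The mapping above can be kept as a simple lookup so that every finding type is routed to the same discovery query each time. This is only a sketch: the dictionary keys (`gene_list`, `variant`, and so on) and the helper `discovery_call` are hypothetical names introduced for illustration, and the function only builds the call text rather than invoking `find_tools`, whose actual signature is not shown in this document.

```python
# Query strings are copied from the table above; the keys are hypothetical labels.
DISCOVERY_QUERIES = {
    "gene_list": "gene enrichment pathway analysis",
    "variant": "variant annotation functional impact",
    "exposure": "chemical gene pathway toxicology",
    "drug": "drug target mechanism action",
    "metabolite": "metabolite pathway identification",
}


def discovery_call(finding_type):
    """Build the find_tools(...) call text for a finding type."""
    try:
        query = DISCOVERY_QUERIES[finding_type]
    except KeyError:
        raise ValueError(f"unknown finding type: {finding_type!r}") from None
    return f'find_tools("{query}")'


print(discovery_call("variant"))
```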

Key principle: Do not stop at "gene X is significant." Ask: significant in what context? Through what mechanism? With what downstream consequence?

Step 2: Multi-Database Evidence Integration

For each significant finding, query multiple sources and synthesize. The pattern:

- Literature evidence: Search PubMed/EuropePMC for published studies linking your finding to the phenotype. Look for meta-analyses and systematic reviews first.
- Genetic association evidence: Query GWAS Catalog or OpenTargets to check whether genetic evidence independently supports the association.
- Pathway context: Query KEGG, Reactome, or WikiPathways to place the finding in a biological pathway. Identify upstream regulators and downstream effectors.
- Interaction networks: Query STRING or BioGRID for protein-protein interactions. Look for whether your significant genes cluster in the same network neighborhood.
- Clinical relevance: Check ClinVar for variant clinical significance, DGIdb or ChEMBL for druggability, or ClinicalTrials.gov for ongoing interventions.

Evidence grading (grade each piece of evidence):

| Grade | Source Type | Example |
|---|---|---|
| T1 (Strong) | Randomized clinical trial, Mendelian randomization | "RCT showed drug X reduces outcome Y" |
| T2 (Moderate) | Large cohort study, GWAS with replication | "GWAS meta-analysis in 500k subjects" |
| T3 (Suggestive) | Case-control study, animal model | "Mouse knockout shows phenotype" |
| T4 (Hypothesis) | In silico prediction, pathway inference | "Network analysis suggests involvement" |
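One way to apply the grading consistently is a lookup from source type to tier. The sketch below takes the tiers from the table above; the function names and the rule that a finding inherits the strongest tier among its evidence are assumptions made for illustration, not something the skill prescribes.

```python
# Source types and tiers follow the grading table; matching is case-insensitive.
GRADE_BY_SOURCE = {
    "randomized clinical trial": "T1",
    "mendelian randomization": "T1",
    "large cohort study": "T2",
    "gwas with replication": "T2",
    "case-control study": "T3",
    "animal model": "T3",
    "in silico prediction": "T4",
    "pathway inference": "T4",
}


def grade_evidence(source_type):
    """Return the tier for one piece of evidence, or 'ungraded' if unknown."""
    return GRADE_BY_SOURCE.get(source_type.strip().lower(), "ungraded")


def best_grade(source_types):
    """Assumed summary rule: report the strongest tier present."""
    grades = [g for g in map(grade_evidence, source_types) if g != "ungraded"]
    # "T1" < "T2" < "T3" < "T4" as strings, so min() picks the strongest tier.
    return min(grades) if grades else "ungraded"


print(best_grade(["Animal model", "GWAS with replication"]))
```

Unknown source types are returned as "ungraded" rather than guessed, so they can be reviewed by hand.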

Step 3: Causal Reasoning

Statistical association is not causation. Apply these reasoning frameworks:

DAG construction: Before interpreting, sketch the causal directed acyclic graph (DAG).

- Identify potential confounders (common causes of exposure and outcome) -- these must be adjusted for.
- Identify potential mediators (on the causal path) -- do NOT adjust for these if estimating total effect.
- Identify colliders (common effects) -- conditioning on colliders introduces bias.
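The three roles above can be read off a sketched DAG mechanically. The following is a simplified illustration, assuming the DAG is given as a list of `(cause, effect)` edges; the node names and the helper functions are hypothetical, and the classification is a heuristic based on ancestry only, not a full back-door analysis.

```python
def ancestors(edges, node):
    """Return every node that has a directed path into `node`."""
    parents = {}
    for src, dst in edges:
        parents.setdefault(dst, set()).add(src)
    seen, stack = set(), [node]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def classify(edges, exposure, outcome):
    """Label each remaining node as confounder, mediator, collider, or other."""
    # Causes of the outcome that do not act through the exposure.
    edges_without_exposure = [e for e in edges if exposure not in e]
    causes_of_exposure = ancestors(edges, exposure)
    causes_of_outcome = ancestors(edges, outcome)
    causes_not_via_exposure = ancestors(edges_without_exposure, outcome)
    roles = {}
    for node in sorted({n for e in edges for n in e} - {exposure, outcome}):
        causes_of_node = ancestors(edges, node)
        if node in causes_of_exposure and node in causes_not_via_exposure:
            roles[node] = "confounder"  # common cause: adjust for it
        elif exposure in causes_of_node and node in causes_of_outcome:
            roles[node] = "mediator"    # on the causal path: leave unadjusted for total effect
        elif exposure in causes_of_node and outcome in causes_of_node:
            roles[node] = "collider"    # common effect: do not condition on it
        else:
            roles[node] = "other"
    return roles


# Hypothetical DAG with one node of each kind.
edges = [
    ("age", "X"), ("age", "Y"),                    # age causes both
    ("X", "M"), ("M", "Y"), ("X", "Y"),            # M lies on the path X -> Y
    ("X", "hospitalized"), ("Y", "hospitalized"),  # common effect
]
roles = classify(edges, exposure="X", outcome="Y")
print(roles)
```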

Triangulation: The same finding supported by different methods with different biases strengthens causal inference.

- Observational association + Mendelian randomization + animal experiment = strong triangulated evidence
- If MR contradicts observational data, suspect confounding in the observational study

Mendelian randomization logic: Genetic variants (instruments) are assigned at conception, so they are not confounded by lifestyle or reverse causation. If a genetic variant that increases exposure X also increases disease Y, this supports X causing Y. Check instrument strength (F-statistic > 10), exclusion restriction (variant affects Y only through X), and pleiotropy (MR-Egger intercept).
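For a single instrument, the instrument-strength check and the causal estimate can be sketched in a few lines. This assumes summary statistics are available (per-allele effect on the exposure and on the outcome, with standard errors); the F-statistic is approximated as (beta / SE)^2, the estimate is the Wald ratio, and its standard error uses a first-order approximation that ignores uncertainty in the exposure estimate. The numbers are illustrative placeholders, and the exclusion restriction and pleiotropy checks (for example MR-Egger) are not covered here.

```python
def f_statistic(beta_exposure, se_exposure):
    """Approximate single-instrument F-statistic: (beta / se) squared."""
    return (beta_exposure / se_exposure) ** 2


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Wald ratio estimate of the exposure effect, with a first-order SE."""
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se


# Illustrative placeholder values, not taken from any study.
beta_x, se_x = 0.20, 0.04   # variant -> exposure
beta_y, se_y = 0.05, 0.02   # variant -> outcome

f_stat = f_statistic(beta_x, se_x)
estimate, se = wald_ratio(beta_x, beta_y, se_y)
print(f"F = {f_stat:.1f} (weak instrument if <= 10)")
print(f"Wald ratio = {estimate:.3f} (SE {se:.3f})")
```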

Mediation analysis: If gene G is associated with both exposure and outcome, ask: does the exposure effect on outcome go through G? Use the finding's pathway context (Step 2) to propose mediators, then check if adjusting for the mediator attenuates the effect.
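The attenuation check can be summarized as a proportion mediated using the difference method: compare the exposure coefficient before and after adjusting for the candidate mediator. This is a minimal sketch that assumes both coefficients come from comparable linear models fitted elsewhere; the function name and the example coefficients are hypothetical.

```python
def proportion_mediated(total_effect, direct_effect):
    """Difference method: share of the total effect removed by adjusting for the mediator."""
    if total_effect == 0:
        raise ValueError("total effect is zero; proportion mediated is undefined")
    return (total_effect - direct_effect) / total_effect


# Illustrative placeholder coefficients, not real results.
total = 0.50    # exposure -> outcome, unadjusted for the mediator
direct = 0.30   # exposure -> outcome, adjusted for the mediator
share = proportion_mediated(total, direct)
print(f"Proportion mediated: {share:.0%}")
```

A share near zero suggests the candidate is not on the causal path; a value outside the 0 to 1 range usually signals suppression or model misspecification and should be investigated rather than reported.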

Step 4: Cross-Validation

Before reporting a finding as robust, attempt to falsify it:

- Replication: Search literature and datasets (DataCite, GEO, ArrayExpress) for independent datasets where the same finding can be tested. A finding that replicates in an independent cohort is much stronger.
- Biological plausibility: Does the mechanism make biological sense? Check if animal or cell models support it (PubMed search for "[gene] knockout [phenotype]" or "[chemical] exposure [cell type]").
- Genetic support: Check if GWAS evidence supports the direction of effect. If your analysis says gene X is protective but GWAS shows risk alleles increase X expression, there is a contradiction to resolve.
- Dose-response: If available, check whether the effect increases with dose. A dose-response relationship strengthens causal inference.
- Negative controls: If possible, test the same analysis on a finding where you expect no association. If the negative control also shows an association, suspect a methodological artifact.
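The genetic-support check is a sign comparison and can be made explicit. In the sketch below, the function and argument names are hypothetical: `expression_effect_on_risk` is the direction your analysis assigns to higher expression (negative means protective), and `risk_allele_effect_on_expression` is the direction in which the GWAS risk allele shifts expression. Because the risk allele raises disease risk by definition, the direction it moves expression is the direction the genetic data imply for expression and risk.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def direction_check(expression_effect_on_risk, risk_allele_effect_on_expression):
    """Compare the analysis direction with the direction implied by GWAS."""
    observed = sign(expression_effect_on_risk)
    # The risk allele increases risk, so its effect on expression gives
    # the implied relationship between expression and risk.
    implied = sign(risk_allele_effect_on_expression)
    if observed == 0 or implied == 0:
        return "inconclusive"
    return "consistent" if observed == implied else "contradiction"


# The case described above: the analysis says protective, but risk alleles raise expression.
print(direction_check(expression_effect_on_risk=-0.8,
                      risk_allele_effect_on_expression=0.3))
```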

Step 5: Actionable Reporting

Structure the integrated report as follows:

Evidence Summary Table

For each significant finding, produce one row:

| Finding | Statistical Evidence | Biological Mechanism | Literature Support | Genetic Support | Evidence Grade |
|---|---|---|---|---|---|
| Gene X upregulated | FDR=0.001, log2FC=2.3 | PI3K/AKT pathway | 12 papers, 2 RCTs | GWAS: rs123 (p=5e-8) | Strong |
| Variant rs456 | OR=1.4, p=2e-6 | Splicing disruption | 3 case reports | eQTL in GTEx | Moderate |
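A table in this shape can be generated from structured records so that every finding gets the same columns. The sketch below is one possible approach: the column names come from the table above, while `render_table` and the dictionary layout are assumptions for illustration. The example row repeats the placeholder values already shown above.

```python
COLUMNS = [
    "Finding",
    "Statistical Evidence",
    "Biological Mechanism",
    "Literature Support",
    "Genetic Support",
    "Evidence Grade",
]


def render_table(rows):
    """Render a list of dicts as a pipe table; missing fields become empty cells."""
    lines = [
        "| " + " | ".join(COLUMNS) + " |",
        "|" + "---|" * len(COLUMNS),
    ]
    for row in rows:
        cells = (str(row.get(col, "")) for col in COLUMNS)
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


rows = [
    {
        "Finding": "Gene X upregulated",
        "Statistical Evidence": "FDR=0.001, log2FC=2.3",
        "Biological Mechanism": "PI3K/AKT pathway",
        "Literature Support": "12 papers, 2 RCTs",
        "Genetic Support": "GWAS: rs123 (p=5e-8)",
        "Evidence Grade": "Strong",
    },
]
table = render_table(rows)
print(table)
```

Leaving a missing field as an empty cell, rather than dropping the column, keeps gaps in the evidence visible in the report.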

Strength Assessment

- Strong: Statistical significance + biological mechanism + independent replication + genetic support
- Moderate: Statistical significance + biological mechanism + partial replication
- Weak: Statistical significance + plausible mechanism but no independent support
- Suggestive: Statistical trend + computational prediction only
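The rubric above can be encoded as a small decision function so that labels are assigned the same way for every finding. This is a sketch under stated assumptions: the function name and arguments are hypothetical, a finding with independent replication but no genetic support is treated as Moderate, and anything lacking either statistical significance or a mechanism falls through to Suggestive. The rubric itself does not spell out those edge cases.

```python
def assess_strength(significant, mechanism, replication="none", genetic_support=False):
    """Map evidence flags to a strength label.

    replication is one of "none", "partial", or "independent".
    """
    if significant and mechanism:
        if replication == "independent" and genetic_support:
            return "Strong"
        if replication in ("independent", "partial"):
            return "Moderate"  # assumption: independent replication alone is Moderate
        return "Weak"
    # Assumption: a trend or prediction without both criteria is Suggestive.
    return "Suggestive"


print(assess_strength(True, True, replication="partial"))
```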

Knowledge Gaps and Next Steps

- Which findings lack replication? Propose specific datasets to test in.
- Which mechanisms are inferred but not experimentally validated? Propose experiments.
- Which findings have conflicting evidence? State the contradiction explicitly.
- Generate testable hypotheses: "If finding X is causal, then [experimental prediction]."

Clinical or Public Health Implications

- State whether the finding is actionable now or requires further validation.
- If druggable, identify existing therapeutics and their development stage.
- If a biomarker, assess sensitivity/specificity for clinical utility.

Integrate computed statistical results (DEGs, GWAS hits, associations) with biological context from ToolUniverse databases (UniProt, GO, Reactome, ClinVar, OpenTargets). Use for adding gene function/pathway/disease annotations to a result list, building biological narrative around statistical findings, and going beyond p-values to mechanism.

1,384 stars

tools

Updated May 28, 2026

npx skillsauth add mims-harvard/tooluniverse tooluniverse-data-integration-analysis

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

- Trivy (Container and dependency vulnerability scanner): Clean, 95%
- Semgrep (Static code analysis for vulnerabilities): Clean, 95%
- mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
- Snyk (dep) (Open source security scanning): Skipped, 50%
- Socket.dev (Supply chain security analysis): Skipped, 50%
- VirusTotal (Multi-engine malware detection): Skipped, 50%
- CrowdStrike (Advanced threat intelligence): Skipped, 50%
- OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
- OWASP Dep-Check: Skipped, 50%

Last scanned: May 22, 2026, 6:16 AM (131.7s, 1 file scanned)

SKILL.md

name: tooluniverse-data-integration-analysis
description: Integrate computed statistical results (DEGs, GWAS hits, associations) with biological context from ToolUniverse databases (UniProt, GO, Reactome, ClinVar, OpenTargets). Use for adding gene function/pathway/disease annotations to a result list, building biological narrative around statistical findings, and going beyond p-values to mechanism.
disable-model-invocation: true

Related Skills

mims-harvard/tooluniverse-self-review

tools

Verified, Trusted, Community

Generate the success criteria for a task or question, then review work against them. Given a task, goal, or open-ended question, decompose it into scenarios, evaluation perspectives, and fine-grained weighted YES/NO criteria using the Recursive Expansion Tree (RET) method; if work is supplied, score it criterion-by-criterion and surface what is missing or could be better. Use when asked to self-review or check your own work, judge whether a task is done well or completely, build a definition-of-done or completeness checklist, create an evaluation rubric or grading criteria, score or grade answers to a question, set up an LLM-as-judge rubric, or when the user mentions self-review, completeness check, success criteria, evaluation criteria, scoring rubric, Qworld, or the RET algorithm.

1,583

SKILL.md

Updated Jul 22, 2026

mims-harvard/tooluniverse-peptide-target-deorphanization

tools

Verified, Trusted, Community

Find the real protein target(s) of a peptide from its sequence — peptide target deorphanization / off-target identification, for ANY target class (GPCR, ion channel, protease, cytokine/growth-factor receptor, enzyme, integrin), not only GPCRs. Use when a peptide has a phenotype but does not bind its hypothesized target, when a peptide binds a target in one species or assay but not another, or to screen candidate targets for an orphan peptide. A target-class router steers a multi-route keyless pipeline (PROSITE/ELM motif, BLAST homology, HGNC/InterPro/GPCRdb/GtoPdb target-family enumeration, OpenTargets phenotype anchor, EnsemblCompara/Alliance cross-species reconciliation) plus optional NVIDIA-NIM co-folding (Boltz2, AlphaFold2-Multimer, OpenFold3) for structural confirmation.

1,583

SKILL.md

Updated Jul 22, 2026

mims-harvard/tooluniverse-cs-setup

tools

Verified, Trusted, Community

Install or update ToolUniverse in Claude Science — create the conda env, install the tooluniverse pip package, and (re)build the tooluniverse-research skill by fetching the current workflow library from GitHub. Use for first-time setup, upgrading the ToolUniverse version, refreshing the bundled workflows after an upstream release, or reinstalling on a new machine.

1,583

SKILL.md

Updated Jul 22, 2026

mims-harvard/tooluniverse-codex-plugin

tools

Verified, Trusted, Community

Install, set up, verify, update, pin, uninstall, or troubleshoot the ToolUniverse plugin on OpenAI Codex. ALWAYS consult this skill for any of those — don't answer from memory, because the exact marketplace name (mims-harvard/ToolUniverse), the "codex plugin marketplace add" then "codex plugin add -m tooluniverse" flow, Codex's startup auto-upgrade behavior, the uvx tooluniverse MCP server, and the API-key env vars are easy to get wrong. Use it whenever someone wants to get ToolUniverse (or "the 1000+ scientific tools" / "the harvard tools") working on Codex, says the Codex plugin or its tools/skills won't load, hits a uvx or MCP-server startup error, asks how Codex updates it, wants to pin or remove it, or finds it running an old tool version — even if they never say the word "plugin". Not for the Claude Code plugin (use tooluniverse-claude-code-plugin), for running research with the tools, or for authoring new tools or skills.

1,583

SKILL.md

Updated Jul 22, 2026

Install manually

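# Clone the repo
git clone https://github.com/mims-harvard/tooluniverse.git

# Create the global Claude Code skills folder if it does not exist yet
mkdir -p ~/.claude/skills

# Copy the skill into the Claude Code skills folder (global)
cp -r tooluniverse/skills/tooluniverse-data-integration-analysis ~/.claude/skills/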

See the official Claude Code Skills documentation for the skills path.

Repository

mims-harvard/tooluniverse


Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT