plugin/skills/tooluniverse-data-integration-analysis/SKILL.md
Integrate computed statistical results (DEGs, GWAS hits, associations) with biological context from ToolUniverse databases (UniProt, GO, Reactome, ClinVar, OpenTargets). Use for adding gene function/pathway/disease annotations to a result list, building biological narrative around statistical findings, and going beyond p-values to mechanism.
`npx skillsauth add mims-harvard/tooluniverse tooluniverse-data-integration-analysis`
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do -- execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.
Bridge the gap between statistical results and biological understanding. After any computational analysis produces significant findings, this skill teaches how to interpret them using ToolUniverse's biological knowledge tools -- the key advantage over platforms that only do data analysis.
IMPORTANT: Always use English terms in tool calls (gene names, pathway names, organism names), even if the user writes in another language. Respond in the user's language.
Apply when:
NOT for (use other skills instead):
- `tooluniverse-statistical-modeling` or `tooluniverse-rnaseq-deseq2`
- `tooluniverse-gene-enrichment`
- `tooluniverse-literature-deep-research`
- `tooluniverse-variant-interpretation`

Map each type of significant finding to the right biological question:
| Finding Type | Biological Question | Tool Discovery Query |
|---|---|---|
| Significant gene list | What pathways are enriched? What functions converge? | find_tools("gene enrichment pathway analysis") |
| Significant variant (rsID) | What is the functional impact? Which gene is affected? | find_tools("variant annotation functional impact") |
| Significant exposure/chemical | What is the biological mechanism? Which pathways? | find_tools("chemical gene pathway toxicology") |
| Significant drug association | What is the molecular target? What is the MOA? | find_tools("drug target mechanism action") |
| Significant metabolite | Which metabolic pathway is perturbed? | find_tools("metabolite pathway identification") |
Key principle: Do not stop at "gene X is significant." Ask: significant in what context? Through what mechanism? With what downstream consequence?
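The finding-type table above can be kept as a plain lookup so that each significant result is routed to its discovery query. This is a minimal sketch: the query strings are copied from the table, while the dictionary keys and the `discovery_query` helper are hypothetical names. It only returns the string to pass to `find_tools` and does not call ToolUniverse itself.

```python
# Discovery queries copied from the finding-type table.
# The keys and the helper name are illustrative, not part of ToolUniverse.
DISCOVERY_QUERIES = {
    "gene_list": "gene enrichment pathway analysis",
    "variant": "variant annotation functional impact",
    "exposure": "chemical gene pathway toxicology",
    "drug": "drug target mechanism action",
    "metabolite": "metabolite pathway identification",
}


def discovery_query(finding_type: str) -> str:
    """Return the tool-discovery query string for a finding type."""
    try:
        return DISCOVERY_QUERIES[finding_type]
    except KeyError:
        raise ValueError(f"Unknown finding type: {finding_type!r}") from None


# Example: route a significant rsID to its discovery query.
print(discovery_query("variant"))
```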
For each significant finding, query multiple sources and synthesize. The pattern:
Evidence grading (grade each piece of evidence):
| Grade | Source Type | Example |
|---|---|---|
| T1 (Strong) | Randomized clinical trial, Mendelian randomization | "RCT showed drug X reduces outcome Y" |
| T2 (Moderate) | Large cohort study, GWAS with replication | "GWAS meta-analysis in 500k subjects" |
| T3 (Suggestive) | Case-control study, animal model | "Mouse knockout shows phenotype" |
| T4 (Hypothesis) | In silico prediction, pathway inference | "Network analysis suggests involvement" |
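
One way to apply the grading table consistently is to encode it as a lookup. The sketch below is an assumption about how that could be done, not something this skill prescribes: the `Grade` enum, the `SOURCE_GRADES` mapping, and `strongest_grade` are hypothetical names. Defaulting unknown source types to T4 and reporting the strongest tier per finding are both illustrative choices.

```python
from enum import IntEnum


class Grade(IntEnum):
    """Evidence tiers from the grading table; a lower value is stronger."""
    T1 = 1  # Strong
    T2 = 2  # Moderate
    T3 = 3  # Suggestive
    T4 = 4  # Hypothesis


# Source types taken from the grading table, lowercased for matching.
SOURCE_GRADES = {
    "randomized clinical trial": Grade.T1,
    "mendelian randomization": Grade.T1,
    "large cohort study": Grade.T2,
    "gwas with replication": Grade.T2,
    "case-control study": Grade.T3,
    "animal model": Grade.T3,
    "in silico prediction": Grade.T4,
    "pathway inference": Grade.T4,
}


def strongest_grade(source_types):
    """Grade each piece of evidence and return the strongest tier found.

    Unknown source types fall back to T4 (an assumption of this sketch).
    """
    grades = [SOURCE_GRADES.get(s.lower(), Grade.T4) for s in source_types]
    if not grades:
        raise ValueError("no evidence supplied")
    return min(grades)


print(strongest_grade(["Animal model", "GWAS with replication"]).name)
```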
Statistical association is not causation. Apply these reasoning frameworks:
DAG construction: Before interpreting, sketch the causal directed acyclic graph (DAG).
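A sketched DAG can also be written down as data, which makes it easy to confirm it is acyclic and to list candidate confounders. The following is a minimal illustration using only the standard library (`graphlib` needs Python 3.9+). The node names form a toy graph, and `check_acyclic`, `descendants`, and `confounder_candidates` are hypothetical helpers. A node is treated as a candidate confounder if it causes the exposure and also reaches the outcome by a path that does not pass through the exposure.

```python
from graphlib import CycleError, TopologicalSorter


def check_acyclic(edges):
    """Raise graphlib.CycleError if the cause -> effects mapping has a cycle."""
    # TopologicalSorter reads the mapping as node -> predecessors; reversing
    # every edge does not change whether a cycle exists.
    TopologicalSorter(edges).prepare()


def descendants(edges, start, blocked=frozenset()):
    """Nodes reachable from `start` along directed paths that avoid `blocked`."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for child in edges.get(node, ()):
            if child not in seen and child not in blocked:
                seen.add(child)
                stack.append(child)
    return seen


def confounder_candidates(edges, exposure, outcome):
    """Common causes of exposure and outcome (outcome path must bypass exposure)."""
    nodes = set(edges) | {c for children in edges.values() for c in children}
    found = []
    for node in nodes - {exposure, outcome}:
        causes_exposure = exposure in descendants(edges, node)
        reaches_outcome = outcome in descendants(edges, node, blocked={exposure})
        if causes_exposure and reaches_outcome:
            found.append(node)
    return sorted(found)


# Toy DAG: a variant acts only through X, age affects both X and Y,
# and M lies on the path from X to Y.
toy_dag = {"variant": ["X"], "age": ["X", "Y"], "X": ["M"], "M": ["Y"]}
check_acyclic(toy_dag)
print(confounder_candidates(toy_dag, "X", "Y"))
```

In this toy graph only `age` is flagged: `variant` reaches the outcome solely through the exposure, and `M` is downstream of the exposure rather than a cause of it.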
Triangulation: The same finding supported by different methods with different biases strengthens causal inference.
Mendelian randomization logic: Genetic variants (instruments) are assigned at conception, so they are not confounded by lifestyle or reverse causation. If a genetic variant that increases exposure X also increases disease Y, this supports X causing Y. Check instrument strength (F-statistic > 10), exclusion restriction (variant affects Y only through X), and pleiotropy (MR-Egger intercept).
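For a single instrument, the strength check and the causal estimate reduce to simple arithmetic on GWAS summary statistics. The sketch below assumes one variant with an effect on the exposure (`beta_exposure`, `se_exposure`) and on the outcome (`beta_outcome`, `se_outcome`). The function names are hypothetical and the numbers in the example are placeholders, not real results. The F-statistic uses the common squared z-score approximation, and the standard error is a first-order approximation that ignores uncertainty in the exposure effect. The MR-Egger intercept requires multiple variants and is not covered here.

```python
def instrument_f_statistic(beta_exposure, se_exposure):
    """Approximate single-instrument F-statistic: (beta / SE) squared."""
    return (beta_exposure / se_exposure) ** 2


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-variant MR estimate: outcome effect divided by exposure effect.

    The standard error is first-order only (treats beta_exposure as fixed).
    """
    if beta_exposure == 0:
        raise ValueError("instrument has no effect on the exposure")
    estimate = beta_outcome / beta_exposure
    std_err = se_outcome / abs(beta_exposure)
    return estimate, std_err


# Placeholder summary statistics for illustration only.
f_stat = instrument_f_statistic(beta_exposure=0.2, se_exposure=0.02)
estimate, std_err = wald_ratio(beta_exposure=0.2, beta_outcome=0.1, se_outcome=0.03)

if f_stat <= 10:
    print("Weak instrument: F-statistic does not exceed 10")
print(f"F={f_stat:.1f}, causal estimate={estimate:.2f} (SE {std_err:.2f})")
```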
Mediation analysis: If gene G is associated with both exposure and outcome, ask: does the exposure effect on outcome go through G? Use the finding's pathway context (Step 2) to propose mediators, then check if adjusting for the mediator attenuates the effect.
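The attenuation check can be illustrated with two regressions: outcome on exposure (total effect), then outcome on exposure plus the proposed mediator (direct effect). This is a sketch on simulated data using NumPy least squares. The helper names, the simulated effect sizes, and the "proportion mediated" difference method are assumptions for illustration; a real analysis would add covariates and confidence intervals.

```python
import numpy as np


def ols_slopes(y, *predictors):
    """Least-squares slopes for y ~ intercept + predictors (intercept dropped)."""
    design = np.column_stack([np.ones(len(y)), *predictors])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs[1:]


def mediation_attenuation(exposure, mediator, outcome):
    """Return total effect, direct effect, and the proportion attenuated."""
    total = ols_slopes(outcome, exposure)[0]
    direct = ols_slopes(outcome, exposure, mediator)[0]
    return total, direct, (total - direct) / total


# Simulated data: most of the exposure effect flows through the mediator.
rng = np.random.default_rng(0)
n = 5000
exposure = rng.normal(size=n)
mediator = 0.8 * exposure + rng.normal(size=n)
outcome = 0.5 * mediator + 0.1 * exposure + rng.normal(size=n)

total, direct, proportion = mediation_attenuation(exposure, mediator, outcome)
print(f"total={total:.2f}, direct={direct:.2f}, attenuated={proportion:.0%}")
```

A large drop from the total to the direct effect is consistent with mediation, but it does not establish it: the same pattern appears if the "mediator" is actually a confounder, which is why the DAG should be settled first.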
Before reporting a finding as robust, attempt to falsify it:
Structure the integrated report as follows:
For each significant finding, produce one row:
| Finding | Statistical Evidence | Biological Mechanism | Literature Support | Genetic Support | Evidence Grade |
|---|---|---|---|---|---|
| Gene X upregulated | FDR=0.001, log2FC=2.3 | PI3K/AKT pathway | 12 papers, 2 RCTs | GWAS: rs123 (p=5e-8) | Strong |
| Variant rs456 | OR=1.4, p=2e-6 | Splicing disruption | 3 case reports | eQTL in GTEx | Moderate |
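
The summary rows can be assembled programmatically once each finding has been annotated. The sketch below is one possible shape, not a required format: `FindingRow` and `to_markdown` are hypothetical names, and the example row reuses the illustrative values from the table above.

```python
from dataclasses import astuple, dataclass

HEADERS = (
    "Finding",
    "Statistical Evidence",
    "Biological Mechanism",
    "Literature Support",
    "Genetic Support",
    "Evidence Grade",
)


@dataclass
class FindingRow:
    """One row of the integrated report, one field per table column."""
    finding: str
    statistical_evidence: str
    biological_mechanism: str
    literature_support: str
    genetic_support: str
    evidence_grade: str


def to_markdown(rows):
    """Render rows as a pipe table using the report's column order."""
    lines = ["| " + " | ".join(HEADERS) + " |", "|" + "---|" * len(HEADERS)]
    for row in rows:
        lines.append("| " + " | ".join(astuple(row)) + " |")
    return "\n".join(lines)


# Example row copied from the illustrative table.
rows = [
    FindingRow(
        "Gene X upregulated",
        "FDR=0.001, log2FC=2.3",
        "PI3K/AKT pathway",
        "12 papers, 2 RCTs",
        "GWAS: rs123 (p=5e-8)",
        "Strong",
    )
]
print(to_markdown(rows))
```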