skills/tooluniverse-multi-omics-integration/SKILL.md
Multi-omics integration — orchestrate per-layer analysis (transcriptomics, proteomics, epigenomics, genomics, metabolomics) then perform cross-omics correlation, multi-omics clustering, and pathway-level integration. Use for integrative systems-biology analysis, multi-modal disease characterization, and cross-omics biomarker discovery.
Coordinate and integrate multiple omics datasets for comprehensive systems biology analysis. Orchestrates specialized ToolUniverse skills to perform cross-omics correlation, multi-omics clustering, pathway-level integration, and unified interpretation.
Multi-omics integration asks whether different molecular layers tell a concordant story. If a gene is upregulated in RNA-seq AND its protein is elevated in proteomics, that is concordant evidence of true biological change. Discordance — high mRNA but low protein, or elevated protein without matching mRNA — may indicate post-transcriptional regulation (miRNA silencing, protein degradation, translational control) and is itself a meaningful finding worth reporting. Not every discordance is noise; some are the most interesting biology.
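The concordance idea can be sketched as follows. This assumes per-gene log2 fold changes for RNA and protein are already available as pandas Series indexed by gene symbol. The function name `classify_concordance` and the `threshold=1.0` cutoff are illustrative assumptions, not part of ToolUniverse.

```python
import pandas as pd

def classify_concordance(rna_lfc, protein_lfc, threshold=1.0):
    """Label each gene by whether RNA and protein log2 fold changes agree.

    threshold: hypothetical |log2FC| cutoff for calling a layer "changed".
    Genes missing from either layer are dropped.
    """
    df = pd.DataFrame({"rna": rna_lfc, "protein": protein_lfc}).dropna()
    rna_up, rna_down = df["rna"] >= threshold, df["rna"] <= -threshold
    prot_up, prot_down = df["protein"] >= threshold, df["protein"] <= -threshold

    labels = pd.Series("unchanged", index=df.index)
    # Later assignments overwrite earlier ones, so "opposite" wins over the one-layer labels
    labels.loc[(rna_up & prot_up) | (rna_down & prot_down)] = "concordant"
    labels.loc[(rna_up & ~prot_up) | (rna_down & ~prot_down)] = "discordant: RNA-only"
    labels.loc[(prot_up & ~rna_up) | (prot_down & ~rna_down)] = "discordant: protein-only"
    labels.loc[(rna_up & prot_down) | (rna_down & prot_up)] = "discordant: opposite"
    return labels
```

An "RNA-only" gene (high mRNA, unchanged protein) is the kind of discordance that may point to translational control or protein degradation, so it is worth keeping in the report rather than filtering out.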
Run `ReactomeAnalysis_pathway_enrichment` or gseapy on the actual gene lists; never list enriched pathways from memory.

Phase 1: Data Loading & QC
Load each omics type, format-specific QC, normalize
Supported: RNA-seq, proteomics, methylation, CNV/SNV, metabolomics
Phase 2: Sample Matching
Harmonize sample IDs, find common samples, handle missing omics
Phase 3: Feature Mapping
Map features to common gene-level identifiers
CpG->gene (promoter), CNV->gene, metabolite->enzyme
Phase 4: Cross-Omics Correlation
RNA vs Protein (translation efficiency)
Methylation vs Expression (epigenetic regulation)
CNV vs Expression (dosage effect)
eQTL variants vs Expression (genetic regulation)
Phase 5: Multi-Omics Clustering
MOFA+, NMF, SNF for patient subtyping
Phase 6: Pathway-Level Integration
Aggregate omics evidence at pathway level
Score pathway dysregulation with combined evidence
Phase 7: Biomarker Discovery
Feature selection across omics, multi-omics classification
Phase 8: Integrated Report
Summary, correlations, clusters, pathways, biomarkers
See: phase_details.md for complete code and implementation details.
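Phase 3's CpG->gene (promoter) mapping could look like the sketch below. It assumes hypothetical input tables (`cpg_pos` with chromosome and position per probe, `genes` with TSS and strand) and an assumed promoter window of 1,500 bp upstream to 500 bp downstream of the TSS; the array manifest's own gene annotations can be used instead when available.

```python
import pandas as pd

def promoter_beta_by_gene(beta, cpg_pos, genes, upstream=1500, downstream=500):
    """Average CpG beta values within each gene's promoter window.

    beta:    CpGs x samples DataFrame of beta values
    cpg_pos: DataFrame indexed by CpG ID with columns 'chrom' and 'pos'
    genes:   DataFrame with columns 'gene', 'chrom', 'tss', 'strand' ('+' or '-')
    upstream/downstream: assumed promoter window around the TSS, in bp
    """
    hits = cpg_pos.rename_axis("cpg").reset_index().merge(genes, on="chrom")
    # Signed distance from the TSS in the gene's direction of transcription
    plus = hits["pos"] - hits["tss"]
    offset = plus.where(hits["strand"] == "+", -plus)
    hits = hits[(offset >= -upstream) & (offset <= downstream)]
    # One row per (CpG, gene) pair, relabelled by gene, then averaged per gene
    mapped = beta.loc[hits["cpg"]].set_axis(hits["gene"].values, axis=0)
    return mapped.groupby(level=0).mean()
```

The per-chromosome merge pairs every CpG with every gene on that chromosome, which is fine for a sketch; genome-wide arrays would call for an interval-based join.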
| Omics | Formats | QC Focus |
|-------|---------|----------|
| Transcriptomics | CSV/TSV, HDF5, h5ad | Low-count filter, normalize (TPM/DESeq2), log-transform |
| Proteomics | MaxQuant, Spectronaut, DIA-NN | Missing value imputation, median/quantile normalization |
| Methylation | IDAT, beta matrices | Failed probes, batch correction, cross-reactive filter |
| Genomics | VCF, SEG (CNV) | Variant QC, CNV segmentation |
| Metabolomics | Peak tables | Missing values, normalization |
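The proteomics QC focus (missing values plus median normalization) can be sketched as below, assuming a proteins x samples matrix of raw intensities with NaN for missing values. The 50% missingness cutoff and the "one log2 unit below the minimum" imputation are illustrative assumptions, not ToolUniverse defaults.

```python
import numpy as np
import pandas as pd

def qc_proteomics(intensity, max_missing=0.5):
    """Basic QC for a proteins x samples matrix of raw intensities (NaN = missing).

    1. Drop proteins missing in more than `max_missing` of samples (assumed cutoff).
    2. log2-transform, then median-center each sample (column).
    3. Impute remaining NaNs one log2 unit below the protein's lowest observed
       value, a simple left-censored assumption for low-abundance dropouts.
    """
    keep = intensity.isna().mean(axis=1) <= max_missing
    logged = np.log2(intensity.loc[keep])
    centered = logged - logged.median(axis=0)
    floor = centered.min(axis=1) - 1.0
    # Transpose so fillna can match the per-protein floor by column label
    return centered.T.fillna(floor).T
```

Left-censored imputation assumes values are missing because they fell below detection; if missingness is unrelated to abundance, a different imputation method is more appropriate.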
```python
def match_samples_across_omics(omics_data_dict):
    """Match samples across multiple omics datasets.

    Each DataFrame is features x samples (columns = sample IDs).
    """
    sample_ids = {k: set(df.columns) for k, df in omics_data_dict.items()}
    common_samples = set.intersection(*sample_ids.values())
    # Same sorted column order in every layer so per-sample vectors line up
    matched_data = {k: df[sorted(common_samples)] for k, df in omics_data_dict.items()}
    return sorted(common_samples), matched_data
```
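Sample matching by exact column name fails when each platform reports its own aliquot barcode, as TCGA does. A minimal harmonization sketch, assuming TCGA-style barcodes where the first 12 characters identify the patient and the first 15 the sample; averaging duplicate aliquots is an assumed policy, and `harmonize_tcga_columns` is a hypothetical helper.

```python
import pandas as pd

def harmonize_tcga_columns(df, level=15):
    """Truncate TCGA aliquot barcodes (e.g. TCGA-02-0001-01C-01D-0182-01) to a
    shared prefix: 12 characters = patient, 15 characters = sample.

    Columns that collapse to the same prefix (multiple aliquots) are averaged.
    Expects a numeric features x samples DataFrame.
    """
    short = df.columns.str[:level]
    return df.T.groupby(short).mean().T
```

Each layer can be harmonized this way before being passed to `match_samples_across_omics`.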
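```python
import pandas as pd
from scipy.stats import spearmanr

# Expected signs: RNA vs protein positive (r ~ 0.4-0.6); methylation vs expression
# negative (promoter repression); CNV vs expression positive (dosage effect).
def cross_omics_correlation(layer_a, layer_b):
    """Per-gene Spearman correlation; inputs are genes x samples with matched columns."""
    rows = []
    for gene in layer_a.index.intersection(layer_b.index):
        r, p = spearmanr(layer_a.loc[gene], layer_b.loc[gene])
        rows.append((gene, r, p))
    return pd.DataFrame(rows, columns=["gene", "r", "p"]).set_index("gene")
```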
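```python
# Per-gene evidence = sum of |effect| across omics; pathway score = mean over its genes.
# Effects are on different scales (log2FC, beta difference, CNV log-ratio); consider z-scoring each column first.
def pathway_score(evidence, pathway_genes):
    """evidence: genes x omics DataFrame with columns such as rna_fc, protein_fc, meth_diff, cnv."""
    genes = evidence.index.intersection(pathway_genes)
    return evidence.loc[genes].abs().sum(axis=1).mean()
```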
See: phase_details.md for full implementations of each operation.
| Method | Description | Best For |
|--------|-------------|----------|
| MOFA+ | Latent factors explaining cross-omics variation | Identifying shared/omics-specific drivers |
| Joint NMF | Shared decomposition across omics | Patient subtype discovery |
| SNF | Similarity network fusion | Integrating heterogeneous data types |
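A simplified joint NMF can be sketched with scikit-learn's `NMF`: concatenate the per-omics blocks column-wise so all layers share one sample-factor matrix W. This is an assumption-laden stand-in (equal-weight blocks after Frobenius scaling), not a dedicated jNMF, MOFA+, or SNF implementation, and `joint_nmf_subtypes` is a hypothetical name.

```python
import numpy as np
from sklearn.decomposition import NMF

def joint_nmf_subtypes(blocks, n_factors=3, seed=0):
    """Simplified joint NMF: samples x features blocks share one sample-factor matrix W.

    Each block must be non-negative (e.g. min-shifted, or split into +/- parts)
    and is scaled by its Frobenius norm so no single omics layer dominates.
    Returns the shared W, per-block loadings H, and argmax cluster labels.
    """
    scaled = [b / np.linalg.norm(b) for b in blocks]
    X = np.hstack(scaled)
    model = NMF(n_components=n_factors, init="nndsvda", random_state=seed, max_iter=1000)
    W = model.fit_transform(X)
    # Split H back into one loading matrix per omics block
    splits = np.cumsum([b.shape[1] for b in blocks])[:-1]
    H_blocks = np.split(model.components_, splits, axis=1)
    return W, H_blocks, W.argmax(axis=1)
```

Assigning each sample to its dominant factor is the simplest subtype call; the per-block H matrices show which features in each omics layer drive a factor.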
| Skill | Used For | Phase |
|-------|----------|-------|
| tooluniverse-rnaseq-deseq2 | RNA-seq analysis | 1, 4 |
| tooluniverse-epigenomics | Methylation, ChIP-seq | 1, 4 |
| tooluniverse-variant-analysis | CNV/SNV processing | 1, 3, 4 |
| tooluniverse-protein-interactions | Protein network context | 6 |
| tooluniverse-gene-enrichment | Pathway enrichment | 6 |
| tooluniverse-expression-data-retrieval | Public data retrieval | 1 |
| tooluniverse-target-research | Gene/protein annotation | 3, 8 |
Integrate TCGA RNA-seq + proteomics + methylation + CNV to identify patient subtypes, cross-omics driver genes, and multi-omics biomarkers.
Identify SNP -> methylation -> expression regulatory chains (mediation analysis).
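The SNP -> methylation -> expression chain can be sketched with the product-of-coefficients approach: `a` is the SNP effect on methylation, `b` the methylation effect on expression adjusting for the SNP, and the indirect effect is `a * b`, with a percentile bootstrap CI. This is one standard mediation technique chosen for illustration; the function names are hypothetical and no covariates are modelled.

```python
import numpy as np

def _ols_coefs(y, *xs):
    """OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones(len(y)), *xs])
    return np.linalg.lstsq(X, y, rcond=None)[0]

def mediation_effect(snp, meth, expr, n_boot=1000, seed=0):
    """Product-of-coefficients mediation for SNP -> methylation -> expression.

    snp, meth, expr: 1-D numpy arrays over the same samples (snp = 0/1/2 dosage).
    """
    def indirect(idx):
        a = _ols_coefs(meth[idx], snp[idx])[1]              # SNP -> methylation
        b = _ols_coefs(expr[idx], meth[idx], snp[idx])[1]   # methylation -> expression | SNP
        return a * b

    n = len(snp)
    rng = np.random.default_rng(seed)
    boots = [indirect(rng.integers(0, n, n)) for _ in range(n_boot)]
    direct = _ols_coefs(expr, meth, snp)[2]                 # SNP -> expression | methylation
    lo, hi = np.percentile(boots, [2.5, 97.5])
    return {"indirect": indirect(np.arange(n)), "direct": direct, "ci": (lo, hi)}
```

A bootstrap CI that excludes zero, together with a small direct effect, is consistent with methylation mediating the variant's effect, though confounding still has to be ruled out separately.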
Predict drug response using baseline multi-omics profiles; identify resistance/sensitivity pathways.
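For drug-response biomarkers, one common choice (assumed here, not prescribed by this skill) is L1-penalized logistic regression over z-scored, concatenated omics blocks, which keeps a sparse set of features from any layer. `select_multiomics_biomarkers` and the default `C=0.1` are illustrative; `C` should be tuned by cross-validation.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

def select_multiomics_biomarkers(blocks, response, C=0.1):
    """L1-penalized logistic regression over concatenated omics blocks.

    blocks:   dict of layer name -> samples x features DataFrame (same sample index)
    response: binary labels per sample (e.g. 1 = sensitive, 0 = resistant)
    C:        inverse regularization strength (assumed; tune by cross-validation)
    Returns non-zero coefficients labelled "layer:feature", largest |coef| first.
    """
    X = pd.concat(blocks, axis=1)
    X.columns = [f"{layer}:{feat}" for layer, feat in X.columns]
    Xs = StandardScaler().fit_transform(X.values)  # z-score so layers are comparable
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    clf.fit(Xs, response)
    coefs = pd.Series(clf.coef_[0], index=X.columns)
    return coefs[coefs != 0].sort_values(key=np.abs, ascending=False)
```

Positive coefficients point toward sensitivity-associated features and negative ones toward resistance; pathway enrichment on the selected genes then links them to resistance/sensitivity pathways.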
See: phase_details.md "Use Cases" for detailed step-by-step workflows.
| Component | Requirement |
|-----------|-------------|
| Omics types | At least 2 datasets |
| Common samples | At least 10 across omics |
| Cross-correlation | Pearson/Spearman computed |
| Clustering | At least one method (MOFA+, NMF, or SNF) |
| Pathway integration | Enrichment with multi-omics evidence scores |
| Report | Summary, correlations, clusters, pathways, biomarkers |
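The first two requirements (omics types and common samples) can be checked up front before any analysis runs. A minimal sketch, assuming each layer is a features x samples DataFrame; `check_integration_requirements` is a hypothetical helper.

```python
import pandas as pd

def check_integration_requirements(omics: dict[str, pd.DataFrame], min_layers=2, min_common=10):
    """Return a list of unmet minimum requirements (empty list = OK).

    omics: layer name -> features x samples DataFrame (columns = sample IDs).
    """
    sample_sets = [set(df.columns) for df in omics.values()]
    common = set.intersection(*sample_sets) if sample_sets else set()
    problems = []
    if len(omics) < min_layers:
        problems.append(f"need at least {min_layers} omics layers, got {len(omics)}")
    if len(common) < min_common:
        problems.append(f"need at least {min_common} common samples, got {len(common)}")
    return problems
```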