skills/tooluniverse-metabolomics-analysis/SKILL.md
Analyze metabolomics data end-to-end — metabolite identification, quantification (TIC normalization, batch correction), differential analysis, and pathway interpretation. Use for processing mass-spec metabolomics output, normalization choice, untargeted metabolomics workflows, and integrating with other omics layers.
Install this skill globally with one command: `npx skillsauth add mims-harvard/tooluniverse tooluniverse-metabolomics-analysis`. Works with Claude Code, Cursor, and Windsurf.
Comprehensive analysis of metabolomics data from metabolite identification through quantification, statistical analysis, pathway interpretation, and integration with other omics layers.
Metabolomics quantification depends critically on normalization. Total ion current (TIC) normalization corrects for sample-loading variation and works well for global abundance changes; internal standard normalization is more accurate for targeted analysis where specific metabolite concentrations matter. Missing values in a peak table may reflect signal below the detection limit — not true absence — and should be imputed or handled explicitly rather than treated as zero. Failing to account for batch effects across instrument runs is a frequent source of spurious differential metabolites.
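As one illustration of handling missing values explicitly, the sketch below replaces missing intensities with half the smallest observed value for that feature. Half-minimum imputation is an assumption chosen for this example, not a method prescribed by the skill, and `impute_half_min` is a hypothetical helper name.

```python
def impute_half_min(feature_values):
    """Replace missing (None) intensities with half the smallest observed value.

    Assumes missingness reflects signal below the detection limit,
    so a small positive value is more plausible than zero.
    """
    observed = [v for v in feature_values if v is not None]
    if not observed:
        # Nothing to anchor the estimate on; such features should be filtered out.
        raise ValueError("feature has no observed values; filter it out instead")
    fill = min(observed) / 2
    return [fill if v is None else v for v in feature_values]


# Illustrative intensities for one feature across four samples.
print(impute_half_min([120.0, None, 80.0, None]))  # [120.0, 40.0, 80.0, 40.0]
```

Other strategies (k-nearest-neighbour or model-based imputation) may suit data where values are missing at random rather than below the detection limit.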
- Use `Metabolite_search` and `Metabolite_get_info` to confirm names, CIDs, and HMDB IDs; never assume identity from m/z alone.
- Check disease associations with `Metabolite_get_diseases`; do not infer clinical relevance without database evidence.

Triggers:
Example Questions:
| Capability | Description |
|-----------|-------------|
| Data Import | LC-MS, GC-MS, NMR, targeted/untargeted platforms |
| Metabolite Identification | Match to HMDB, KEGG, PubChem, spectral libraries |
| Quality Control | Peak quality, blank subtraction, internal standard normalization |
| Normalization | Probabilistic quotient, total ion current, internal standards |
| Statistical Analysis | Univariate and multivariate (PCA, PLS-DA, OPLS-DA) |
| Differential Analysis | Identify significant metabolite changes |
| Pathway Enrichment | KEGG, Reactome, BioCyc metabolic pathway analysis |
| Metabolite-Enzyme Integration | Correlate with expression data |
| Flux Analysis | Metabolic flux balance analysis (FBA) |
| Biomarker Discovery | Multi-metabolite signatures |
Input: Metabolomics Data (Peak Table or Spectra)
|
v
Phase 1: Data Import & Metabolite Identification
|-- Load peak table or process raw spectra
|-- Match features to HMDB, KEGG (accurate mass +/- 5 ppm)
|-- Confidence scoring (Level 1-4)
|
v
Phase 2: Quality Control & Filtering
|-- CV in QC samples (<30%)
|-- Blank subtraction (sample/blank > 3)
|-- Remove features with >50% missing
|
v
Phase 3: Normalization
|-- Sample-wise: TIC, PQN, or internal standards
|-- Transformation: log2, Pareto, or auto-scaling
|-- Batch effect correction (if multi-batch)
|
v
Phase 4: Exploratory Analysis
|-- PCA for sample clustering
|-- PLS-DA for supervised separation
|-- Outlier detection
|
v
Phase 5: Differential Analysis
|-- t-test / ANOVA / Wilcoxon
|-- Fold change + FDR correction
|-- Volcano plots, heatmaps
|
v
Phase 6: Pathway Analysis
|-- Metabolite set enrichment (MSEA)
|-- KEGG/Reactome pathway mapping
|-- Pathway topology (hub/bottleneck metabolites)
|
v
Phase 7: Multi-Omics Integration
|-- Metabolite-enzyme Spearman correlation
|-- Pathway-level concordance scoring
|-- Metabolic flux inference
|
v
Phase 8: Generate Report
|-- Summary statistics, differential metabolites
|-- Pathway diagrams, biomarker panel
Load peak tables (CSV/TSV) or process raw spectra (mzML). Match features to HMDB by accurate mass (+/- 5 ppm). Assign confidence levels: L1 (standard match), L2 (MS/MS), L3 (mass only), L4 (unknown).
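The accurate-mass matching step can be sketched as a ppm-error filter. The function names and the small candidate table below are hypothetical; the m/z values are illustrative [M+H]+ figures, and a real run would take theoretical masses from HMDB or KEGG.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def match_features(observed_mz, candidates, tol_ppm=5.0):
    """Return (name, ppm error) for candidates within the tolerance, best match first.

    candidates: dict mapping metabolite name -> theoretical m/z for the
    same adduct as the observed ion (assumed here to be [M+H]+).
    """
    hits = []
    for name, theoretical in candidates.items():
        err = ppm_error(observed_mz, theoretical)
        if abs(err) <= tol_ppm:
            hits.append((name, round(err, 2)))
    return sorted(hits, key=lambda hit: abs(hit[1]))


# Illustrative [M+H]+ values; glucose and fructose are isomers with the same mass.
candidates = {"glucose": 181.0707, "fructose": 181.0707, "citric acid": 193.0343}
print(match_features(181.0710, candidates))
```

Both hexoses match the same feature, which is why a mass-only hit stays at Level 3 until MS/MS or an authentic standard confirms the identity.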
Assess CV in QC samples (reject >30%), compute blank ratios (keep >3x blank), filter features with >50% missing values. Check internal standard recovery (95-105% acceptable).
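A minimal sketch of the three feature-level filters above (QC CV, blank ratio, missingness), assuming intensities arrive as plain lists with `None` for missing values. `passes_qc` and its parameter names are hypothetical; the default thresholds mirror the ones stated in the text.

```python
import statistics


def cv_percent(values):
    """Coefficient of variation (sample standard deviation / mean) in percent."""
    return statistics.stdev(values) / statistics.mean(values) * 100


def passes_qc(qc_values, sample_values, blank_values,
              max_cv=30.0, min_blank_ratio=3.0, max_missing=0.5):
    """Return True if one feature survives all three filters."""
    # 1. Missingness: drop features missing in more than half of the samples.
    missing_fraction = sum(v is None for v in sample_values) / len(sample_values)
    if missing_fraction > max_missing:
        return False
    # 2. Reproducibility: CV across pooled QC injections must not exceed 30%.
    if cv_percent(qc_values) > max_cv:
        return False
    # 3. Blank subtraction: mean sample signal must exceed 3x the mean blank signal.
    observed = [v for v in sample_values if v is not None]
    blank_mean = statistics.mean(blank_values)
    if blank_mean > 0 and statistics.mean(observed) / blank_mean <= min_blank_ratio:
        return False
    return True


# Illustrative feature: stable in QC, well above blank, one missing value.
print(passes_qc([100, 105, 95, 102], [200, 220, None, 210], [10, 12]))  # True
```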
Three methods are available: TIC (simple, assumes similar total abundance across samples), PQN (robust to large changes in a few metabolites, recommended), and internal standard (most accurate when spiked standards are available). Follow with a log2 transform or Pareto scaling.
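To make the difference between TIC and PQN concrete, here is a small sketch on a list-of-lists intensity matrix (rows are samples, columns are features). The function names are hypothetical, and the PQN reference spectrum is taken as the per-feature median across all samples, which is one common choice rather than something the skill specifies.

```python
import math
import statistics


def tic_normalize(sample):
    """Divide each intensity by the sample's total ion current."""
    total = sum(sample)
    return [v / total for v in sample]


def pqn_normalize(samples):
    """Probabilistic quotient normalization against a median reference spectrum."""
    n_features = len(samples[0])
    reference = [statistics.median(s[j] for s in samples) for j in range(n_features)]
    normalized = []
    for s in samples:
        # The median quotient estimates the sample's overall dilution factor.
        quotients = [s[j] / reference[j] for j in range(n_features) if reference[j] > 0]
        factor = statistics.median(quotients)
        normalized.append([v / factor for v in s])
    return normalized


def log2_transform(sample):
    """Log2 transform; assumes strictly positive intensities."""
    return [math.log2(v) for v in sample]


# Sample 2 is sample 1 at double concentration; sample 3 has one strongly changed feature.
samples = [[10, 20, 30, 40], [20, 40, 60, 80], [10, 20, 30, 400]]
print(pqn_normalize(samples))
```

In this toy matrix PQN rescales the doubled sample back onto the first one and leaves the third sample untouched, whereas TIC would shrink all of sample 3's unchanged features because a single large peak inflates its total.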
PCA reveals sample grouping and batch effects. PLS-DA provides supervised separation (report R2 and Q2 for model quality). Flag and investigate outliers.
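For reference, the two PLS-DA quality metrics are conventionally defined as follows. Here $y_i$ is the response (class label) for sample $i$, $\hat{y}_i$ the fitted value from the full model, and $\hat{y}_{(i)}$ the prediction from a cross-validation fold that excluded sample $i$; this notation is an assumption of the sketch, not taken from the skill.

```latex
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2},
\qquad
Q^2 = 1 - \frac{\sum_i \bigl(y_i - \hat{y}_{(i)}\bigr)^2}{\sum_i (y_i - \bar{y})^2}
```

R2 measures goodness of fit and Q2 measures cross-validated predictive ability, so a Q2 far below R2 is a sign of overfitting.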
Welch's t-test (two groups) or ANOVA (multiple groups) with Benjamini-Hochberg FDR correction. Significance thresholds: adj. p < 0.05 and |log2FC| > 1.0.
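The correction and thresholding step can be sketched as below, assuming raw p-values have already been computed upstream (for example by Welch's t-test). `benjamini_hochberg` and `is_significant` are hypothetical helper names, and the p-values in the usage lines are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_significant(log2fc, adj_p, alpha=0.05, min_abs_log2fc=1.0):
    """Apply both thresholds from the text: adj. p < 0.05 and |log2FC| > 1.0."""
    return adj_p < alpha and abs(log2fc) > min_abs_log2fc


# Made-up raw p-values and log2 fold changes for four metabolites.
raw_p = [0.01, 0.04, 0.03, 0.005]
log2fc = [1.8, 0.4, -1.3, -2.1]
adj_p = benjamini_hochberg(raw_p)
print([is_significant(fc, p) for fc, p in zip(log2fc, adj_p)])
```

Requiring both criteria keeps small but statistically detectable shifts (the second metabolite here) out of the differential list.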
Map differential metabolites to KEGG compound IDs. Perform MSEA for pathway enrichment. Consider topology: metabolites at pathway hubs (high degree/betweenness centrality) have greater impact.
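One common way to implement metabolite set enrichment is over-representation analysis with a hypergeometric test, sketched here. This is an assumption about the statistical form; the skill does not state which MSEA variant it uses, and `enrichment_pvalue` is a hypothetical name.

```python
from math import comb


def enrichment_pvalue(n_background, n_pathway, n_hits, n_overlap):
    """Upper-tail hypergeometric p-value, P(overlap >= n_overlap).

    n_background: metabolites measured and annotated (the universe)
    n_pathway:    background metabolites belonging to the pathway
    n_hits:       differential metabolites
    n_overlap:    differential metabolites that fall in the pathway
    """
    upper = min(n_pathway, n_hits)
    favorable = sum(
        comb(n_pathway, i) * comb(n_background - n_pathway, n_hits - i)
        for i in range(n_overlap, upper + 1)
    )
    return favorable / comb(n_background, n_hits)


# Illustrative counts: 6 of 20 differential metabolites sit in a 12-member pathway.
print(enrichment_pvalue(n_background=200, n_pathway=12, n_hits=20, n_overlap=6))
```

The background should be the set of metabolites the platform could actually detect, not the whole database, or the p-values will be overly optimistic. Pathway p-values from many pathways need the same FDR correction as the metabolite-level tests.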
Correlate metabolite levels with enzyme expression (Spearman). Expected: substrate-enzyme negative correlation (consumption), product-enzyme positive correlation (production). Score pathway dysregulation using combined metabolite + gene evidence.
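The metabolite-enzyme correlation can be sketched with a dependency-free Spearman implementation (Pearson correlation on average ranks). The helper names and the toy values are hypothetical; in practice the two vectors would be matched per sample from the metabolomics and expression tables.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data across five samples: substrate falls and product rises as enzyme increases.
enzyme = [1.0, 2.0, 3.0, 4.0, 10.0]
substrate = [5.0, 4.0, 3.0, 2.0, 1.0]
product = [1.0, 2.0, 3.0, 4.0, 9.0]
print(spearman(substrate, enzyme), spearman(product, enzyme))  # -1.0 1.0
```

Real data rarely show such clean signs; a correlation that contradicts the expected direction may point to regulation at another step rather than an analysis error.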
See report_template.md for full example output.
| Skill | Used For | Phase |
|-------|----------|-------|
| tooluniverse-gene-enrichment | Pathway enrichment | Phase 6 |
| tooluniverse-rnaseq-deseq2 | Enzyme expression for integration | Phase 7 |
| tooluniverse-proteomics-analysis | Protein levels for integration | Phase 7 |
| tooluniverse-multi-omics-integration | Comprehensive integration | Phase 7 |
| Component | Requirement |
|-----------|-------------|
| Metabolites | At least 50 identified metabolites |
| Replicates | At least 3 per condition |
| QC | CV < 30% in QC samples, blank subtraction |
| Statistical test | t-test or Wilcoxon with FDR correction |
| Pathway analysis | MSEA with KEGG or Reactome |
| Report | QC, differential metabolites, pathways, visualizations |
Methods:
Databases: