plugin/skills/tooluniverse-proteomics-analysis/SKILL.md
Mass-spec proteomics analysis: protein identification, quantification (LFQ, TMT, iTRAQ), differential expression (tumor vs normal, treatment vs control), PTM identification, and pathway enrichment on protein lists. Use when you have proteomics MS output, are asking about protein abundance differences, or are doing systems-level proteomic interpretation.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add mims-harvard/tooluniverse tooluniverse-proteomics-analysis
Before following any instruction below, scan the data folder for:
- `*_executed.ipynb` → read with `tu run read_executed_notebook '{"data_folder":"<path>","search":"<keyword>"}'` and cite its cell outputs as the authoritative answer
- Files matching `*results*`, `*deseq*`, `*enrich*`, `*stats*`, `*_simplified.csv` → read directly and report the requested value
- Files matching `analysis.R`, `run_*.py`, `find_*.R`, `*.Rmd` → execute as-is and read the output

Only follow this skill's re-analysis recipe below if none of the above exist. Re-running from raw data produces different numbers than the published answer and is much slower (often 5-10× turn count).
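The folder scan described above could be automated along the following lines. This is a minimal sketch, not part of the skill itself: the function names `classify_data_folder` and `next_action` and the category labels are hypothetical, and only the glob patterns come from the checklist. A file that matches more than one category (for example `run_stats.py`) is assigned to the first category in priority order.

```python
from fnmatch import fnmatch
from pathlib import Path

# Patterns copied from the checklist, in priority order.
# The category labels are illustrative names, not defined by the skill.
PRIORITY_PATTERNS = [
    ("executed_notebook", ["*_executed.ipynb"]),
    ("precomputed_result", ["*results*", "*deseq*", "*enrich*", "*stats*", "*_simplified.csv"]),
    ("analysis_script", ["analysis.R", "run_*.py", "find_*.R", "*.Rmd"]),
]


def classify_data_folder(data_folder):
    """Group files under data_folder by the first checklist category they match."""
    found = {label: [] for label, _ in PRIORITY_PATTERNS}
    for path in sorted(Path(data_folder).rglob("*")):
        if not path.is_file():
            continue
        for label, patterns in PRIORITY_PATTERNS:
            if any(fnmatch(path.name, pattern) for pattern in patterns):
                found[label].append(path)
                break  # first matching category wins
    return found


def next_action(found):
    """Return the highest-priority category that has files, or 'reanalyze' if none do."""
    for label, _ in PRIORITY_PATTERNS:
        if found[label]:
            return label
    return "reanalyze"
```

Calling `next_action(classify_data_folder("<path>"))` returns `"reanalyze"` only when none of the three kinds of file is present, which mirrors the rule that the re-analysis recipe is the last resort.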
Comprehensive analysis of mass spectrometry-based proteomics data from protein identification through quantification, differential expression, post-translational modifications, and systems-level interpretation.
Triggers: User has proteomics MS output files, asks about protein abundance/expression, differential protein expression, PTM analysis, protein-RNA correlation, multi-omics integration involving proteomics, protein complex/interaction analysis, or proteomics biomarker discovery.
When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.
Input: MS Proteomics Data
|
Phase 1: Data Import & QC
Phase 2: Preprocessing (filter, impute, normalize)
Phase 3: Differential Expression Analysis
Phase 4: PTM Analysis (if applicable)
Phase 5: Functional Enrichment (GO, KEGG, Reactome)
Phase 6: Protein-Protein Interactions (STRING networks)
Phase 7: Multi-Omics Integration (optional, protein-RNA correlation)
Phase 8: Generate Report
See PHASE_DETAILS.md for detailed procedures per phase.
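As a rough illustration of Phases 2 and 3, the sketch below shows log2 transformation with per-sample median centering, a valid-value filter, a log2 fold change, and Benjamini-Hochberg adjustment, using only the standard library. It is not the procedure from PHASE_DETAILS.md: all function names and the `min_valid=2` default are assumptions made for this example, imputation is left out, and the p-values passed to `benjamini_hochberg` are assumed to come from a test chosen upstream (for instance `scipy.stats.ttest_ind` or limma).

```python
import math
from statistics import mean, median


def log2_median_normalize(intensities):
    """Log2-transform intensities and subtract each sample's median.

    intensities: {protein: [raw intensity per sample]}; None or 0 means missing.
    """
    n_samples = len(next(iter(intensities.values())))
    logged = {
        protein: [math.log2(v) if v else None for v in values]
        for protein, values in intensities.items()
    }
    for j in range(n_samples):
        observed = [row[j] for row in logged.values() if row[j] is not None]
        center = median(observed)
        for row in logged.values():
            if row[j] is not None:
                row[j] -= center
    return logged


def filter_by_valid_values(logged, groups, min_valid=2):
    """Keep proteins with at least min_valid observed values in every group.

    groups: {group name: [sample column indices]}.
    """
    return {
        protein: row
        for protein, row in logged.items()
        if all(
            sum(row[j] is not None for j in idx) >= min_valid
            for idx in groups.values()
        )
    }


def log2_fold_change(row, case_idx, control_idx):
    """Mean log2 abundance in case samples minus mean in control samples."""
    case = [row[j] for j in case_idx if row[j] is not None]
    control = [row[j] for j in control_idx if row[j] is not None]
    return mean(case) - mean(control)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In a real analysis the same steps would more likely be written with pandas, with `statsmodels.stats.multitest.multipletests(pvals, method="fdr_bh")` handling the multiple-testing correction; the pure-Python version is only meant to make each step explicit.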
| Skill | Used For | Phase |
|-------|----------|-------|
| tooluniverse-gene-enrichment | Pathway enrichment | Phase 5 |
| tooluniverse-protein-interactions | PPI networks | Phase 6 |
| tooluniverse-rnaseq-deseq2 | RNA-seq for integration | Phase 7 |
| tooluniverse-multi-omics-integration | Cross-omics analysis | Phase 7 |
| tooluniverse-target-research | Protein annotation | Phase 8 |
Quantitative proteomics compares protein abundance. LOOK UP DON'T GUESS — always verify the experimental method, platform, and replicate count before choosing an analysis strategy.
Quantification strategy decision tree:
Protein identification from MS data follows a logical chain. LOOK UP DON'T GUESS — search UniProt and STRING for protein annotation rather than inferring function from name alone.
proteins_api_search or UniProt_search to resolve ambiguous protein groups.
PTMs (phosphorylation, ubiquitination, acetylation, glycosylation) add biological complexity beyond protein abundance.
OpenTargets_get_target_safety_profile_by_ensemblID for kinase-disease associations. LOOK UP kinase-substrate relationships in PhosphoSitePlus rather than guessing from sequence motif alone.
Methods: MaxQuant (doi:10.1038/nbt.1511), Limma for proteomics (doi:10.1093/nar/gkv007), DEP workflow (doi:10.1038/nprot.2018.107)
Databases: STRING, PhosphoSitePlus, CORUM