skills/tooluniverse-proteomics-analysis/SKILL.md
Mass-spec proteomics analysis: protein identification, quantification (LFQ, TMT, iTRAQ), differential expression (tumor vs normal, treatment vs control), PTM identification, and pathway enrichment on protein lists. Use when you have proteomics MS output, are asking about protein abundance differences, or are doing systems-level proteomic interpretation.
npx skillsauth add mims-harvard/tooluniverse tooluniverse-proteomics-analysis
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Before following any instruction below, scan the data folder for:

- `*_executed.ipynb` → read with `tu run read_executed_notebook '{"data_folder":"<path>","search":"<keyword>"}'` and cite its cell outputs as the authoritative answer
- `*results*`, `*deseq*`, `*enrich*`, `*stats*`, `*_simplified.csv` → read directly and report the requested value
- `analysis.R`, `run_*.py`, `find_*.R`, `*.Rmd` → execute as-is and read the output

Only follow this skill's re-analysis recipe below if none of the above exist. Re-running from raw data produces different numbers than the published answer and is much slower (often 5-10× turn count).
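The pre-flight check above can be sketched as a small folder scan. This is a minimal illustration, not part of the skill itself: the function names `find_existing_artifacts` and `next_action`, the category labels, and the recursive search are all assumptions; only the filename patterns and their priority order come from the checklist.

```python
from fnmatch import fnmatch
from pathlib import Path

# Filename patterns from the checklist, in priority order.
# The category labels are hypothetical names chosen for this sketch.
PRIORITY_PATTERNS = [
    ("executed_notebook", ["*_executed.ipynb"]),
    ("precomputed_result",
     ["*results*", "*deseq*", "*enrich*", "*stats*", "*_simplified.csv"]),
    ("analysis_script", ["analysis.R", "run_*.py", "find_*.R", "*.Rmd"]),
]


def find_existing_artifacts(data_folder):
    """Return {category: [paths]} for files matching the checklist patterns."""
    found = {category: [] for category, _ in PRIORITY_PATTERNS}
    for path in sorted(Path(data_folder).rglob("*")):
        if not path.is_file():
            continue
        for category, patterns in PRIORITY_PATTERNS:
            if any(fnmatch(path.name, pattern) for pattern in patterns):
                found[category].append(path)
                break  # a file is filed under the first category it matches
    return found


def next_action(found):
    """Pick the highest-priority category that has at least one file."""
    for category, _ in PRIORITY_PATTERNS:
        if found[category]:
            return category
    return "reanalyze_from_raw"
```

The wildcard patterns are broad, so a script such as `run_stats.py` also matches `*stats*` and is filed as a pre-computed result here; it is worth reviewing the matched paths before acting on them.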
Comprehensive analysis of mass spectrometry-based proteomics data from protein identification through quantification, differential expression, post-translational modifications, and systems-level interpretation.
Triggers: User has proteomics MS output files, asks about protein abundance/expression, differential protein expression, PTM analysis, protein-RNA correlation, multi-omics integration involving proteomics, protein complex/interaction analysis, or proteomics biomarker discovery.
When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.
Input: MS Proteomics Data
|
Phase 1: Data Import & QC
Phase 2: Preprocessing (filter, impute, normalize)
Phase 3: Differential Expression Analysis
Phase 4: PTM Analysis (if applicable)
Phase 5: Functional Enrichment (GO, KEGG, Reactome)
Phase 6: Protein-Protein Interactions (STRING networks)
Phase 7: Multi-Omics Integration (optional, protein-RNA correlation)
Phase 8: Generate Report
See PHASE_DETAILS.md for detailed procedures per phase.
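As a rough illustration of Phase 2 (filter, impute, normalize), the sketch below works on a plain dictionary of raw intensities. It is not the procedure from PHASE_DETAILS.md: the function name `preprocess`, the 50% valid-value threshold, global-minimum imputation, and per-sample median centering are all assumptions chosen to keep the example small and dependency-free.

```python
import math
from statistics import median


def preprocess(intensities, min_valid_fraction=0.5):
    """Filter, impute, and normalize raw intensities.

    intensities: {protein: [raw intensity or None, one entry per sample]}
    Returns {protein: [log2 values, imputed and median-centered per sample]}.
    """
    if not intensities:
        return {}
    n_samples = len(next(iter(intensities.values())))

    # log2-transform; None and non-positive intensities count as missing
    logged = {
        protein: [math.log2(v) if v is not None and v > 0 else None
                  for v in values]
        for protein, values in intensities.items()
    }

    # Filter: keep proteins quantified in enough samples (assumed threshold)
    kept = {
        protein: values
        for protein, values in logged.items()
        if sum(v is not None for v in values) / n_samples >= min_valid_fraction
    }
    if not kept:
        return {}

    # Impute: replace missing values with the lowest observed log2 intensity
    # (a simple left-censored assumption; other strategies may fit better)
    floor = min(v for values in kept.values() for v in values if v is not None)
    imputed = {
        protein: [floor if v is None else v for v in values]
        for protein, values in kept.items()
    }

    # Normalize: subtract each sample's median so samples share a center
    sample_medians = [
        median(values[j] for values in imputed.values())
        for j in range(n_samples)
    ]
    return {
        protein: [v - sample_medians[j] for j, v in enumerate(values)]
        for protein, values in imputed.items()
    }
```

With real data the same steps would more likely be written with pandas, as the skill recommends; the dictionary form is used here only so the logic is visible without extra libraries.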
| Skill | Used For | Phase |
|-------|----------|-------|
| tooluniverse-gene-enrichment | Pathway enrichment | Phase 5 |
| tooluniverse-protein-interactions | PPI networks | Phase 6 |
| tooluniverse-rnaseq-deseq2 | RNA-seq for integration | Phase 7 |
| tooluniverse-multi-omics-integration | Cross-omics analysis | Phase 7 |
| tooluniverse-target-research | Protein annotation | Phase 8 |
Quantitative proteomics compares protein abundance. LOOK UP DON'T GUESS — always verify the experimental method, platform, and replicate count before choosing an analysis strategy.
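Whichever quantification strategy applies, the differential expression step (Phase 3) tests many proteins at once, so per-protein p-values are usually corrected for multiple testing. The sketch below is a dependency-free version of the Benjamini-Hochberg adjustment; the function name `benjamini_hochberg` is hypothetical, and in practice `statsmodels.stats.multitest.multipletests(..., method="fdr_bh")` does the same job.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Example: four proteins with raw p-values from a per-protein test
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [q <= 0.05 for q in adjusted]
```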
Quantification strategy decision tree:
Protein identification from MS data follows a logical chain. LOOK UP DON'T GUESS — search UniProt and STRING for protein annotation rather than inferring function from name alone.
Use proteins_api_search or uniprot_search_proteins to resolve ambiguous protein groups.

PTMs (phosphorylation, ubiquitination, acetylation, glycosylation) add biological complexity beyond protein abundance.
Use OpenTargets_get_target_safety_profile_by_ensemblID for kinase-disease associations. LOOK UP kinase-substrate relationships in PhosphoSitePlus rather than guessing from sequence motif alone.

Methods: MaxQuant (doi:10.1038/nbt.1511), limma for proteomics (doi:10.1093/nar/gkv007), DEP workflow (doi:10.1038/nprot.2018.107)
Databases: STRING, PhosphoSitePlus, CORUM