metabolomics/lipidomics/SKILL.md
Assigns honest lipid annotation levels, designs class-based internal-standard quantification, and runs lipid-aware differential and enrichment analysis with lipidr, guarding against in-source-fragment phantoms, sn-position over-claims, and invalid cross-class quantification. Use when naming or canonicalizing lipid species (shorthand separators, Goslin), deciding shotgun vs RP vs HILIC LC-MS, picking internal standards (SPLASH/EquiSPLASH), interpreting MS-DIAL/LipidSearch output, or comparing lipid classes. For general feature detection see metabolomics/xcms-preprocessing and metabolomics/msdial-preprocessing; for non-lipid annotation confidence see metabolomics/metabolite-annotation; for normalization/QC see metabolomics/normalization-qc; for multivariate stats see metabolomics/statistical-analysis.
npx skillsauth add GPTomics/bioSkills bio-metabolomics-lipidomicsInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Reference examples tested with: lipidr 2.16+, pygoslin 2.0+, MS-DIAL 5+
The achievable annotation level is fixed by the acquired evidence, not the software: sn-position and double-bond localization require EAD/OzID/PB/UVPD data that routine CID never produces, and class-resolved quantification requires one isotope-labeled internal standard per class. Verify both before trusting a name or a number.
Before using code patterns, verify installed versions match. If versions differ:
packageVersion('lipidr') then ?function_name to verify parameterspip show pygoslin then help(module.function) to check signaturesIf code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
"Analyze my lipidomics data" -> Canonicalize names to the resolution level the evidence supports, quantify each class against its own standard, then run class/chain-aware differential and enrichment analysis.
lipidr::read_skyline() / as_lipidomics_experiment(), de_analysis(), lsea()pygoslin (Python) or rgoslin (R) for parsing/canonicalizationThe Liebisch/LIPID MAPS shorthand encodes, in its punctuation, exactly how much structure was measured: PC 34:1 (space, sum composition) < PC 16:0_18:1 (underscore, chains known) < PC 16:0/18:1 (slash, sn-resolved) < PC 16:0/18:1(9Z) (double-bond position+geometry). The resolution level is a property of the evidence, not of the string. Tools manufacture overstatement three ways: a formatter that only knows /, an in-silico library entry authored at sn-level that a species-level match inherits, and "annotate to the nearest database structure" silently promoting a sum composition to a full structure. sn-position is almost never genuinely measured under CID, so treat every / as an unproven _ until EAD/UVPD/derivatization evidence is in hand. The default rule is: when in doubt, drop a level.
| Notation | Separator | What was measured | What may NOT be claimed |
|----------|-----------|-------------------|-------------------------|
| PC 34:1 | space | class + total carbons:double-bonds (accurate mass + isotope + class diagnostic) | the two chains; sn; C=C position |
| PC 16:0_18:1 | underscore _ | the two acyl chains (MS/MS acyl losses, RT/ECN-consistent, not an in-source fragment) | which chain is sn-1 vs sn-2 |
| PC 16:0/18:1 | slash / | sn-1/sn-2 assignment (EAD/UVPD/enzymatic - not a CID acyl-loss intensity guess) | C=C position/geometry |
| PC 16:0/18:1(9Z) | parentheses | exact double-bond position + cis/trans (OzID/PB/EAD/UVPD) | (full structure) |
| PC O-34:1 / PC P-34:1 | O- ether / P- plasmalogen | ether vs vinyl-ether linkage (diagnostic ion or acid-lability) | a sum composition alone cannot distinguish P-34:1 from O-34:2 (vinyl ether = ether + one C=C) |
| Cer 18:1;O2/16:0 | ;O2 | sphingoid hydroxyl count (old d18:1) - measured, not assumed | backbone unsaturation if d18:1 was a default rather than fragment-confirmed |
Canonicalize every name through Goslin before merging tables or querying LIPID MAPS; never string-match lipid names by hand. Goslin preserves a false / faithfully - it is necessary but not sufficient.
| Question / situation | Approach | Why |
|----------------------|----------|-----|
| Accurate class-level quantification, high throughput | Shotgun (direct infusion) or HILIC-LC-MS | constant concentration / class bands -> clean ratio to a co-eluting class IS |
| Resolve isobars/isomers, deep low-abundance coverage | RP-LC-MS (± ion mobility) | RT axis adds an identity coordinate; co-elution flags in-source fragments |
| Double-bond position, sn-position, ether/plasmalogen | LC-MS + EAD/OzID/PB/UVPD (± IM) | only these break C=C / glycerol backbone; CID is blind to them |
| Spatial localization | MS-imaging (MS-DIAL 5 spatial mode) | tissue context with predicted-CCS database |
| Need PC acyl chains | negative-mode formate/acetate adduct -> [M-CH3]- | [M+H]+ gives only the m/z 184 head-group ion (class, no chains) |
| Neutral lipids (TG/DG) chains | [M+NH4]+ adduct | drives neutral-loss-of-fatty-acid fragmentation |
| Suspicious elevated LPC / DG / FA pool | RT co-elution test vs the parent class | an LPC eluting at a PC's RT is an in-source fragment, not biology |
| An apparent odd-chain species (PC 33:1) | require MS/MS chain confirmation | usually an in-source fragment or 13C-isotope artifact of an even neighbor |
| Merge names across tools / before a DB lookup | Goslin canonicalization first | abbreviations and separators are tool-specific; hand string-matching corrupts merges |
| Untargeted oxidized-lipid claim | escalate to a targeted, standard-anchored oxylipin panel | untargeted oxidized-lipid IDs are hypotheses; auto-oxidation in the tube fabricates them |
Goal: Import a quantified lipid table, normalize within class, and find lipids that differ between groups with class/chain-aware output.
Approach: Read a Skyline/matrix export into a LipidomicsExperiment, attach sample groups, normalize (PQN or class internal standard), then de_analysis with an explicit contrast; visualize as a class-faceted volcano.
library(lipidr)
# data_normalized ships with lipidr (PQN-normalized, log2); substitute a real import:
# d <- read_skyline(list.files(datadir, 'data.csv', full.names = TRUE))
# d <- add_sample_annotation(d, 'clinical.csv')
# d <- normalize_pqn(d, measure = 'Area', exclude = 'blank', log = TRUE)
data(data_normalized)
# Contrast references sample-group labels directly; group_col defaults to the first annotation
de_results <- de_analysis(data_normalized, HighFat_water - NormalDiet_water, measure = 'Area')
# logFC.cutoff is on the log2 scale used by limma's topTable inside de_analysis
sig <- significant_molecules(de_results, p.cutoff = 0.05, logFC.cutoff = 1)
plot_results_volcano(de_results, show.labels = FALSE)
Goal: Convert per-class signal to comparable abundances without baking in class-dependent ionization error.
Approach: Ratio each species to a stable-isotope-labeled standard of its OWN class, spiked before extraction so it shares the class's recovery loss; never quantify one class with another class's standard.
# normalize_istd divides each lipid by the internal standard of its matched class.
# Requires one labeled IS per class present in the data (e.g. SPLASH/EquiSPLASH covers ~13 classes).
d_istd <- normalize_istd(data_normalized, measure = 'Area', exclude = 'blank', log = TRUE)
# Class-level summary is only valid WITHIN a class unless per-class response factors were calibrated:
# cross-class molar ratios (e.g. 'PE is 3x PC') carry head-group response bias and are not licensed here.
plot_lipidclass(d_istd, 'sd')
Goal: Downgrade any name to the level the evidence supports and verify the claimed level is internally consistent.
Approach: Parse with Goslin, read the perceived level, and re-emit at SPECIES (or MOLECULAR_SPECIES) unless sn/C=C evidence exists.
from pygoslin.parser.Parser import LipidParser
from pygoslin.domain.LipidLevel import LipidLevel
parser = LipidParser()
lipid = parser.parse('PC 16:0/18:1') # a slash-claimed name from a tool export
claimed_level = lipid.lipid.info.level # LipidLevel enum the string asserts
# Without EAD/UVPD evidence, re-emit at the honest molecular-species level (drops the unproven sn):
honest_name = lipid.get_lipid_string(LipidLevel.MOLECULAR_SPECIES) # 'PC 16:0_18:1'
sum_name = lipid.get_lipid_string(LipidLevel.SPECIES) # 'PC 34:1'
/ from CID-only data, or a library back-fills its authored sn arrangement onto a species-level match./-formatted names with no EAD/UVPD/derivatization evidence file attached.MOLECULAR_SPECIES (_); at most state "dominant sn-2 likely X" while reporting _.P- (plasmalogen) from a sum composition.P-34:1 and O-34:2 share elemental composition (vinyl ether = ether + one C=C); mass cannot distinguish them.| Threshold | Source | Rationale | |-----------|--------|-----------| | One isotope-labeled IS per lipid class | Köfeler 2021 (good practice); SPLASH/EquiSPLASH | ESI response is head-group-dominated; one global IS miscalibrates every other class | | EquiSPLASH = 13 deuterated IS at equal 100 µg/mL | Avanti product spec | equimolar comparative use; SPLASH LIPIDOMIX uses unequal physiological concentrations | | Spike IS before extraction | Köfeler 2021 | only a co-extracted IS corrects class-biased recovery (Folch/Bligh-Dyer/MTBE differ for polar minor classes) | | MS-DIAL 5 EAD ~14 eV; 96.4% standards delineated, 78.0% sn/OH/C=C correct >1 µM | Takeda 2024 | structural lipidomics yield even with the modern method is incomplete and concentration-dependent | | ~half of single-software species-level IDs need orthogonal evidence | Köfeler 2021 (Nat Commun) | 510/1108 features, 130/301 PCs & 55/171 TGs violated the ECN/RT model in an audited published set | | LipidSearch grades: keep A/B/C, drop D | LipidSearch grade definitions | D = mass-only; A = class + all chains = molecular-species level, NOT sn/C=C resolved | | Shotgun infusion below the aggregation regime | Han/Gross protocol literature | above it lipids aggregate, ESI response goes nonlinear, the IS-ratio assumption collapses |
| Error / symptom | Cause | Solution |
|-----------------|-------|----------|
| could not find function "read_lipidomes" | non-existent function name | use read_skyline() or as_lipidomics_experiment() |
| plot_enrichment rejects an enrich.results argument | wrong signature | plot_enrichment(de.results, significant.sets, annotation = 'class', measure = 'logFC'); get sets from significant_lipidsets() |
| lsea(type = 'chain') errors | no type argument | lsea tests class/length/unsat sets automatically; rank with rank.by = c('logFC','P.Value','adj.P.Val') |
| de_results$FDR is NULL | wrong column name | de_analysis returns limma columns: adj.P.Val, P.Value, logFC |
| pygoslin LipidLevel.MOLECULAR_SUBSPECIES AttributeError | pre-2.0 enum name | current enum is SPECIES / MOLECULAR_SPECIES / SN_POSITION / STRUCTURE_DEFINED / FULL_STRUCTURE / COMPLETE_STRUCTURE |
| Elevated LPC reported from shotgun data | in-source fragmentation with no RT to flag it | add the in-source-fragment caveat; confirm with LC-MS RT co-elution before claiming lyso biology |
tools
--- name: bio-phasing-imputation-foundations description: Frames the phasing/imputation pipeline before any tool runs: phasing and imputation are one Li-Stephens copying HMM (recombination is the transition, mutation the emission, the genetic map and Ne set the rates), imputation's honest output is a dosage with a self-estimated quality (INFO/R2/DR2) not a hard genotype, and the stages are ordered and each fails silently (QC, align build and strand to the panel, phase, impute per chromosome, fil
tools
Chooses the enrichment generation before any tool runs, mapping the input shape to a method class - a pre-selected gene list plus a background to over-representation analysis (ORA, hypergeometric), a ranked statistic for all genes to gene set enrichment (GSEA), a signed signaling topology to pathway-topology (SPIA) - then making the null explicit (competitive vs self-contained, gene vs subject sampling) and running a trustworthiness checklist (testable-gene universe, FDR, redundancy collapse, leading-edge check, version reporting). Covers why every clusterProfiler GSEA is the inter-gene-correlation-uncorrected competitive null, why the background not the gene list decides ORA significance, and why no method is universally best. Use when deciding ORA vs GSEA vs topology, which gene-set DB, whether a result is trustworthy, or which null a tool computes. For ORA see go-enrichment, GSEA see gsea, databases kegg-pathways/reactome-pathways/wikipathways; the ranking comes from differential-expression/de-results.
testing
End-to-end GWAS workflow from VCF to association results. Covers PLINK QC, population structure correction, and association testing for case-control or quantitative traits. Use when running genome-wide association studies.
development
Orchestrates the full path from differential expression results to redundancy-collapsed functional enrichment: choose ORA vs GSEA, convert gene IDs per method, run enrichGO/enrichKEGG/enrichPathway/enrichWP or gseGO/gseKEGG (clusterProfiler, ReactomePA, rWikiPathways), and visualize. Routes the ORA-vs-GSEA generation fork and the null/universe/reproducibility theory to pathway-analysis/enrichment-foundations. Use when a DESeq2/edgeR/limma result must become enriched GO terms, KEGG/Reactome/WikiPathways pathways, or a GSEA leading edge; when deciding whether a ranking exists for all genes (GSEA, named decreasing vector) or only a pre-selected list (ORA plus a defensible background universe); or when assembling DE-to-pathway end to end. The DE list and ranking statistic come from differential-expression/de-results; per-method nuance lives in the pathway-analysis skills.