metabolomics/xcms-preprocessing/SKILL.md
XCMS3 workflow for LC-MS/MS metabolomics preprocessing. Covers peak detection, retention time alignment, correspondence (grouping), and gap filling. Use when processing raw LC-MS data into a feature table for untargeted metabolomics.
Reference examples tested with: MSnbase 2.28+, xcms 4.0+
Before using code patterns, verify installed versions match. If versions differ:
packageVersion('<pkg>') then ?function_name to verify parameters
If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
Requires Bioconductor 3.18+ with xcms 4.0+ and MSnbase 2.28+.
Goal: Import raw LC-MS files into R for downstream peak detection and alignment.
Approach: Read mzML/mzXML files into an OnDiskMSnExp object using MSnbase for memory-efficient access.
"Process my raw LC-MS data into a feature table" -> Detect chromatographic peaks, align retention times across samples, group corresponding peaks, and fill missing values to produce a sample-by-feature intensity matrix.
library(xcms)
library(MSnbase)
# Read mzML/mzXML files
raw_files <- list.files('raw_data', pattern = '\\.(mzML|mzXML)$', full.names = TRUE)
# Create OnDiskMSnExp object
raw_data <- readMSData(raw_files, mode = 'onDisk')
# Check data
raw_data
table(msLevel(raw_data))
Goal: Attach sample metadata (group labels, injection order) to the raw data object.
Approach: Create a data frame of sample information and assign it to the phenoData slot.
# Sample metadata
sample_info <- data.frame(
sample_name = basename(raw_files),
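sample_group = c(rep('Control', 5), rep('Treatment', 5), rep('QC', 3)), # Must match the number and order of raw_files (13 here)
injection_order = seq_along(raw_files) # Assumes files are listed in injection order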
)
# Assign to phenoData
pData(raw_data) <- sample_info
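The hard-coded group vector above only works when exactly 13 files are present in the expected order. As a minimal sketch, assuming a hypothetical naming convention in which each file name starts with its group label followed by an underscore (for example `Control_01.mzML`), the groups could instead be derived from the file names. The file names and the convention are illustrative assumptions, not part of the original workflow.

```r
# Hypothetical file names; in practice this would be basename(raw_files)
file_names <- c('Control_01.mzML', 'Control_02.mzML',
                'Treatment_01.mzML', 'QC_01.mzML')

# Assumption: the group label is everything before the first underscore
sample_group <- sub('_.*$', '', file_names)

sample_info <- data.frame(
  sample_name = file_names,
  sample_group = sample_group,
  injection_order = seq_along(file_names),
  stringsAsFactors = FALSE
)

# Fail early if a file name yields an unexpected group label
stopifnot(all(sample_info$sample_group %in% c('Control', 'Treatment', 'QC')))
table(sample_info$sample_group)
```

Deriving the labels this way keeps the metadata in step with the file list, so adding or removing a file does not silently shift group assignments.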
Goal: Identify chromatographic peaks in centroided LC-MS data.
Approach: Use the CentWave algorithm which detects peaks by continuous wavelet transform on regions of interest defined by m/z and RT.
# CentWave algorithm for centroided data
cwp <- CentWaveParam(
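peakwidth = c(5, 30), # Expected peak width range in seconds
ppm = 15, # Max m/z deviation between consecutive scans (ppm)
snthresh = 10, # Signal-to-noise threshold
prefilter = c(3, 1000), # Keep ROIs with >= 3 scans at intensity >= 1000
mzdiff = 0.01, # Min m/z difference between peaks overlapping in RT
noise = 1000, # Centroids below this intensity are ignored
integrate = 1 # 1 = peak limits from wavelet-filtered data, 2 = from raw data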
)
# Run peak detection
xdata <- findChromPeaks(raw_data, param = cwp)
# Summary
head(chromPeaks(xdata))
cat('Peaks found:', nrow(chromPeaks(xdata)), '\n')
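Because `ppm` is a relative tolerance, the absolute m/z window it implies grows with m/z. The following sketch shows that arithmetic for the `ppm = 15` setting used above; `ppm_to_da` is a hypothetical helper written for illustration and is not an xcms function.

```r
# Absolute m/z tolerance (in Da) implied by a ppm value at a given m/z
ppm_to_da <- function(mz, ppm) {
  mz * ppm / 1e6
}

mz_values <- c(100, 500, 1000)
tolerance <- ppm_to_da(mz_values, ppm = 15)

# One row per m/z with the corresponding absolute tolerance
data.frame(mz = mz_values, tolerance_da = tolerance)
```

At m/z 500 this gives 0.0075 Da, which can help when judging whether the chosen `ppm` is realistic for the instrument's mass accuracy.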
Goal: Detect peaks in profile (non-centroided) LC-MS data.
Approach: Use the MatchedFilter algorithm designed for continuum data, which convolves with a Gaussian model peak.
# MatchedFilter for profile/continuum data
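mfp <- MatchedFilterParam(
binSize = 0.1, # m/z bin width (called `step` in the legacy matchedFilter interface)
fwhm = 30, # Expected peak full width at half maximum in seconds
snthresh = 10, # Signal-to-noise threshold
mzdiff = 0.8 # Min m/z difference between peaks overlapping in RT
)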
xdata_profile <- findChromPeaks(raw_data, param = mfp)
Goal: Correct retention time drift across samples to enable peak correspondence.
Approach: Apply Obiwarp alignment, which uses dynamic time warping on binned m/z-by-RT intensity profiles (not only the TIC) to compute sample-wise RT corrections.
# Obiwarp alignment (recommended)
obp <- ObiwarpParam(
binSize = 0.5,
response = 1,
distFun = 'cor_opt',
gapInit = 0.3,
gapExtend = 2.4
)
xdata <- adjustRtime(xdata, param = obp)
# Check alignment
plotAdjustedRtime(xdata)
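Alongside the plot, a numeric summary of how far each sample was shifted can flag samples where the alignment moved retention times implausibly far. The sketch below works on plain vectors with toy values; `summarize_rt_shift` is a hypothetical helper. With xcms, the inputs might come from `rtime(xdata, adjusted = FALSE)`, `rtime(xdata, adjusted = TRUE)`, and `fromFile(xdata)`, but that mapping is an assumption to verify against the installed version.

```r
# Largest absolute RT shift (seconds) per sample
summarize_rt_shift <- function(raw_rt, adjusted_rt, sample_idx) {
  stopifnot(length(raw_rt) == length(adjusted_rt),
            length(raw_rt) == length(sample_idx))
  tapply(abs(adjusted_rt - raw_rt), sample_idx, max)
}

# Toy values: two samples with three spectra each
raw_rt      <- c(10, 20, 30, 10, 20, 30)
adjusted_rt <- c(10, 20, 30, 11, 22, 31.5)
sample_idx  <- c(1, 1, 1, 2, 2, 2)

shift <- summarize_rt_shift(raw_rt, adjusted_rt, sample_idx)
shift
```

In this toy case the first sample is unchanged and the second is shifted by at most 2 seconds.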
Goal: Group corresponding chromatographic peaks across samples into consensus features.
Approach: Use peak density-based grouping which models the RT distribution of peaks in m/z slices to identify features present across samples.
# Group peaks across samples
pdp <- PeakDensityParam(
sampleGroups = pData(xdata)$sample_group,
bw = 5, # RT bandwidth
minFraction = 0.5, # Min fraction of samples in at least one group with a peak
minSamples = 1, # Min number of samples in at least one group with a peak
binSize = 0.025 # m/z bin size
)
xdata <- groupChromPeaks(xdata, param = pdp)
# Check feature definitions
featureDefinitions(xdata)
cat('Features:', nrow(featureDefinitions(xdata)), '\n')
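The effect of `minFraction` and `minSamples` is easier to see on a toy example. The sketch below is a simplified illustration of the rule as described above (a candidate feature is kept if at least one sample group meets both thresholds); `passes_group_filter` is a hypothetical helper, not the xcms implementation.

```r
# Does a candidate feature pass the minFraction / minSamples rule?
passes_group_filter <- function(has_peak, sample_groups,
                                min_fraction = 0.5, min_samples = 1) {
  n_with_peak <- tapply(has_peak, sample_groups, sum)
  n_total <- tapply(has_peak, sample_groups, length)
  any(n_with_peak / n_total >= min_fraction & n_with_peak >= min_samples)
}

groups <- c(rep('Control', 4), rep('Treatment', 4))

# Peak in 3 of 4 treatment samples only: one group reaches 0.75
kept <- passes_group_filter(
  c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE), groups)

# Peak in 1 of 4 samples of each group: both groups stay at 0.25
dropped <- passes_group_filter(
  c(TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE), groups)

c(kept = kept, dropped = dropped)
```

Evaluating the fraction per group rather than across all samples is what lets a feature present in only one condition survive grouping.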
Goal: Recover signal for features that were missed during initial peak detection in some samples.
Approach: Integrate intensity in the expected m/z-RT region for features with missing values using ChromPeakAreaParam.
# Fill in missing peaks
fpp <- ChromPeakAreaParam()
xdata <- fillChromPeaks(xdata, param = fpp)
# Alternative: FillChromPeaksParam for more control
fpp2 <- FillChromPeaksParam(
expandMz = 0,
expandRt = 0,
ppm = 0
)
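To judge how much gap filling recovered, the fraction of missing values per sample can be compared before and after. The sketch below uses toy matrices; `missing_fraction` is a hypothetical helper. With xcms, the two matrices might come from `featureValues(xdata, filled = FALSE)` and `featureValues(xdata)`, which is an assumption to check against the installed version.

```r
# Fraction of features with a missing value, per sample (column)
missing_fraction <- function(values) {
  colMeans(is.na(values))
}

feature_ids <- paste0('FT', 1:3)
sample_ids <- c('s1', 's2')

# Toy intensity matrices: features in rows, samples in columns
before <- matrix(c(100, NA, 250, NA, 80, 90), nrow = 3,
                 dimnames = list(feature_ids, sample_ids))
after  <- matrix(c(100, 40, 250, NA, 80, 90), nrow = 3,
                 dimnames = list(feature_ids, sample_ids))

# Positive values mean gap filling recovered signal in that sample
recovered <- missing_fraction(before) - missing_fraction(after)
recovered
```

Values that remain missing after filling usually mean no signal was present in the expected m/z-RT region.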
Goal: Generate a features-by-samples intensity matrix with m/z and RT annotations for downstream analysis.
Approach: Extract feature values and definitions from the processed XCMSnExp object and combine into an exportable table.
# Get feature values (intensity matrix)
feature_values <- featureValues(xdata, method = 'maxint', value = 'into')
# Feature definitions (m/z, RT)
feature_defs <- featureDefinitions(xdata)
feature_defs <- as.data.frame(feature_defs)
feature_defs$feature_id <- rownames(feature_defs)
# Combine
feature_table <- cbind(feature_defs[, c('feature_id', 'mzmed', 'rtmed')], feature_values)
rownames(feature_table) <- feature_table$feature_id
# Save
write.csv(feature_table, 'feature_table.csv', row.names = FALSE)
Goal: Assess preprocessing quality through TIC plots, peak counts, RT correction, and PCA.
Approach: Visualize total ion chromatograms, per-sample peak counts, RT adjustment, and PCA of the feature matrix.
# TIC for each sample
tic <- chromatogram(raw_data, aggregationFun = 'sum')
plot(tic)
# Peak count per sample
peak_counts <- table(chromPeaks(xdata)[, 'sample'])
barplot(peak_counts, main = 'Peaks per sample')
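# Check RT correction (one color per sample group; group labels are not valid color names)
group_colors <- as.integer(as.factor(pData(xdata)$sample_group))
plotAdjustedRtime(xdata, col = group_colors)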
# PCA of features
library(pcaMethods)
log_values <- log2(feature_values + 1)
log_values[is.na(log_values)] <- 0
pca <- pca(t(log_values), nPcs = 3, method = 'ppca')
plotPcs(pca, col = as.factor(pData(xdata)$sample_group))
Goal: Identify isotope patterns and adduct groups among detected peaks to reduce feature redundancy.
Approach: Use CAMERA to group peaks by RT correlation, assign isotope clusters, and annotate adduct types.
library(CAMERA)
# Create CAMERA object
xsa <- xsAnnotate(as(xdata, 'xcmsSet'))
# Group by RT
xsa <- groupFWHM(xsa, perfwhm = 0.6)
# Find isotopes
xsa <- findIsotopes(xsa, mzabs = 0.01, ppm = 10)
# Find adducts
xsa <- findAdducts(xsa, polarity = 'positive')
# Get annotated peak list
camera_results <- getPeaklist(xsa)
Goal: Format the XCMS feature table for import into MetaboAnalyst web or R package.
Approach: Transpose the matrix, create M/Z-RT feature names, and prepend sample group information.
# Format for MetaboAnalyst web or R package
export_data <- t(feature_values)
colnames(export_data) <- paste0('M', round(feature_defs$mzmed, 4), 'T', round(feature_defs$rtmed, 1))
# Add sample info
export_df <- data.frame(Sample = rownames(export_data), Group = pData(xdata)$sample_group, export_data)
write.csv(export_df, 'metaboanalyst_input.csv', row.names = FALSE)
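Rounding m/z and RT when building the `M...T...` names can make two close features collapse to the same name. A minimal sketch of a check for this, using made-up values and base R's `make.unique`:

```r
# Made-up feature medians; the first two differ only beyond the rounding precision
mzmed <- c(180.06341, 180.06343, 255.2324)
rtmed <- c(65.12, 65.14, 301.7)

feature_names <- paste0('M', round(mzmed, 4), 'T', round(rtmed, 1))

# TRUE here signals that at least two features share a name
has_duplicates <- any(duplicated(feature_names))

# Append a numeric suffix to repeated names so every column is unique
unique_names <- make.unique(feature_names, sep = '_')
unique_names
```

Running such a check before export avoids duplicated or silently renamed columns in the written file.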