microbiome/differential-abundance/SKILL.md
Differential abundance testing for microbiome data using compositionally-aware methods like ALDEx2, ANCOM-BC2, and MaAsLin2. Use when identifying taxa that differ between experimental groups while accounting for the compositional nature of microbiome data.
npx skillsauth add GPTomics/bioSkills bio-microbiome-differential-abundanceInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Reference examples tested with: DESeq2 1.42+, ggplot2 3.5+, phyloseq 1.46+, scanpy 1.10+
Before using code patterns, verify installed versions match. If versions differ:
packageVersion('<pkg>') then ?function_name to verify parametersIf code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
"Find which taxa differ between my groups" -> Identify differentially abundant taxa between experimental conditions using compositionally-aware methods that account for the relative nature of microbiome data.
ALDEx2::aldex() for CLR-transformed Welch's t-testANCOMBC::ancombc2() for bias-corrected log-linear modelsMaaslin2::Maaslin2() for multivariable associationMicrobiome data is compositional - abundances are relative, not absolute. Standard tests (t-test, DESeq2) can give false positives.
Goal: Identify differentially abundant taxa between groups using a compositionally-aware statistical framework.
Approach: Apply CLR transformation with Monte Carlo sampling on the OTU table, run Welch's t-test per taxon, and filter by FDR-corrected p-value and effect size.
library(ALDEx2)
library(phyloseq)
ps <- readRDS('phyloseq_object.rds')
otu <- as.data.frame(otu_table(ps))
if (!taxa_are_rows(ps)) otu <- t(otu)
# Define groups
groups <- sample_data(ps)$Group
# Run ALDEx2 (CLR transformation + Welch's t-test)
aldex_results <- aldex(otu, groups, mc.samples = 128, test = 'welch',
effect = TRUE, include.sample.summary = FALSE)
# Filter significant
sig_aldex <- aldex_results[aldex_results$we.eBH < 0.05 & abs(aldex_results$effect) > 1, ]
# Volcano-like plot
aldex.plot(aldex_results, type = 'MW', test = 'welch')
library(ANCOMBC)
# Run ANCOM-BC2 with sensitivity analysis
ancom_result <- ancombc2(data = ps, fix_formula = 'Group',
p_adj_method = 'BH', pseudo_sens = TRUE,
prv_cut = 0.1, lib_cut = 1000,
group = 'Group', struc_zero = TRUE)
# Extract results (includes sensitivity analysis)
res_df <- ancom_result$res
# Primary results
sig_ancom <- res_df[res_df$diff_Group == TRUE, ]
# Check sensitivity (passed_ss = passed sensitivity analysis)
robust_hits <- res_df[res_df$diff_Group == TRUE & res_df$passed_ss_Group == TRUE, ]
library(Maaslin2)
# Prepare data
features <- as.data.frame(t(otu_table(ps)))
metadata <- as.data.frame(sample_data(ps))
# Run MaAsLin2
maaslin_results <- Maaslin2(
input_data = features,
input_metadata = metadata,
output = 'maaslin2_output',
fixed_effects = 'Group',
normalization = 'CLR',
transform = 'NONE',
analysis_method = 'LM'
)
# Results in maaslin2_output/all_results.tsv
sig_maaslin <- maaslin_results$results[maaslin_results$results$qval < 0.05, ]
library(DESeq2)
library(phyloseq)
# Convert to DESeq2 (use geometric mean of poscounts)
ps_deseq <- ps
ps_deseq <- prune_samples(sample_sums(ps_deseq) > 1000, ps_deseq)
dds <- phyloseq_to_deseq2(ps_deseq, ~ Group)
dds <- DESeq(dds, test = 'Wald', fitType = 'parametric', sfType = 'poscounts')
res <- results(dds, alpha = 0.05)
sig_deseq <- res[which(res$padj < 0.05 & abs(res$log2FoldChange) > 1), ]
library(ggplot2)
# Volcano plot from ALDEx2
ggplot(aldex_results, aes(x = effect, y = -log10(we.eBH))) +
geom_point(aes(color = we.eBH < 0.05 & abs(effect) > 1), alpha = 0.6) +
geom_hline(yintercept = -log10(0.05), linetype = 'dashed') +
geom_vline(xintercept = c(-1, 1), linetype = 'dashed') +
scale_color_manual(values = c('grey', 'red')) +
theme_minimal() +
labs(x = 'Effect Size', y = '-log10(Adjusted P-value)')
| Method | Handles | Covariates | Speed | Notes | |--------|---------|------------|-------|-------| | ALDEx2 | Compositionality | Limited | Slow | Best for simple designs | | ANCOM-BC2 | Compositionality, zeros, sensitivity | Yes | Medium | Recommended for complex designs | | MaAsLin2 | Compositionality | Yes | Fast | Good for longitudinal | | DESeq2 | Sparsity (less ideal) | Yes | Fast | Use with caution for microbiome |
development
Find restriction enzyme cut sites in DNA sequences using Biopython Bio.Restriction. Search with single enzymes, batches of enzymes, or commercially available enzyme sets. Returns cut positions for linear or circular DNA. Use when finding restriction enzyme cut sites in sequences.
development
Create restriction maps showing enzyme cut positions on DNA sequences using Biopython Bio.Restriction. Visualize cut sites, calculate distances between sites, and generate text or graphical maps. Use when creating or analyzing restriction maps.
development
Analyze restriction digest fragments using Biopython Bio.Restriction. Predict fragment sizes, get fragment sequences, simulate gel electrophoresis patterns, and perform double digests. Use when analyzing restriction digest fragment patterns.
development
Select restriction enzymes by criteria using Biopython Bio.Restriction. Find enzymes that cut once, don't cut, produce specific overhangs, are commercially available, or have compatible ends for cloning. Use when selecting restriction enzymes for cloning or analysis.