ecological-genomics/edna-metabarcoding/SKILL.md
Processes eDNA metabarcoding from raw paired-end reads to species tables, navigating ASV (DADA2, UNOISE3) vs OTU (swarm v2) decision (Callahan 2017 vs Schloss multi-copy-16S critique), marker/primer choice (Leray COI, MiFish 12S, 515F/806R 16S, ITS2) with primer-specific bias, OBITools3 v3 command-name break (obi stats plural; .tar.gz taxonomy), tag-jumping with dual-indexing (Schnell 2015; NovaSeq 10x MiSeq), decontam as screening-not-classifier (Davis 2018), read-counts-not-abundance critique (Lamb 2019), site-occupancy modeling (Ficetola 2015), Naive-Bayes calibration limits (Bokulich 2018), and eDNA decay (Strickler 2015). Use when going from raw eDNA FASTQ to species tables, picking marker + denoising pipeline, deciding whether read counts represent abundance, applying occupancy modeling, configuring OBITools3 v3, or interpreting decontam output. Not for clinical 16S microbiome (see microbiome/amplicon-processing).
Reference examples tested with: DADA2 1.30+, cutadapt 4.7+, OBITools3 (Python 3), decontam 1.20+, microDecon 1.0+, occumb 1.0+, vsearch 2.27+, swarm 3.1+
Before using code patterns, verify installed versions match. If versions differ:
- Python: pip show <package> then help(module.function) to check signatures
- R: packageVersion('<pkg>') then ?function_name to verify parameters
- CLI: <tool> --version then <tool> --help to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
"Process eDNA samples to identify species present" -> Trim primers, denoise to ASVs (or cluster to OTUs), detect chimeras, assign taxonomy, filter contamination with negative controls AND DNA concentration, decompose tag-jumping artifacts, and quantify detection uncertainty via site-occupancy modeling. For the foundational eDNA-for-wildlife review, see Bohmann et al. 2014 Trends Ecol Evol 29:358-367.
- cutadapt for primer removal (linked-adapter mode)
- dada2::filterAndTrim() -> dada() -> assignTaxonomy() for ASV pipeline
- obi stats / obi clean / obi ecotag for OBITools3 (NOTE: v3 plural commands)
- decontam::isContaminant() for contamination screening
- occumb::occumb() for detection-corrected occurrence

Elbrecht & Leese 2015 PLoS One 10:e0130324 and Lamb et al. 2019 Mol Ecol 28:420-430 (meta-analysis) established that metabarcoding read counts have weak-to-moderate, taxon-specific, NONLINEAR correlation with biomass or DNA input. Primer-binding bias dominates; PCR replicates introduce stochasticity. Reporting read counts as abundance without mock-community calibration is malpractice. Modern practice: report PRESENCE/ABSENCE or relative abundance with explicit calibration; use multiple PCR replicates; apply site-occupancy models for detection correction.
A second cornerstone: the ASV-vs-OTU debate is taxon-specific, not universal. Callahan, McMurdie, Holmes 2017 ISME J 11:2639-2643 argued ASVs replace OTUs because modern denoising resolves single-nucleotide differences. Schloss 2021 mSphere 6:e00191-21 showed that for bacterial 16S with 1-15 intra-genomic rRNA copies, a single E. coli strain produces ~7 distinct ASVs, splitting bacterial genomes across artificial clusters. For COI metazoan metabarcoding, ASVs (DADA2/UNOISE3) are recommended; for bacterial 16S, ASVs inflate alpha-diversity and OTUs may be appropriate.
A third: decontam (Davis 2018) is a SCREENING tool, not a deterministic classifier. It flags candidates; biological plausibility check is required before deletion. The default threshold=0.1 over-flags in low-biomass data.
| Method | Output | Strength | Fails when |
|--------|--------|----------|------------|
| DADA2 | Single-nucleotide ASVs | High resolution; learned error model; standard for COI/12S/18S/fungal-ITS | Small datasets (< 100 samples) for error learning; multi-copy bacterial rRNA |
| UNOISE3 (USEARCH/VSEARCH; Edgar 2016) | zOTUs (essentially ASVs) | Fast; algorithmic simplicity | Limited Linux/Mac binary distribution under license |
| Swarm v2 -d 1 --fastidious (Mahé 2015) | Abundance-weighted single-linkage OTUs | Modern OTU pipeline; better than legacy 97% UCLUST | OTUs by design (not single-nt resolution) |
| 97% UCLUST | Classical OTUs | Legacy familiarity | Biologically arbitrary threshold; superseded by DADA2/swarm |
| VSEARCH global pairwise | Taxonomic assignment via best-hit | Fast, transparent, no training | Conservative; mis-assigns sister species when ref incomplete |
| Naive Bayes (q2-feature-classifier, RDP) | Probabilistic taxonomic assignment | Probabilistic confidence; standard for 16S | Confidence values are scikit-learn calibrated, not true probabilities (Bokulich 2018) |
| SINTAX (Edgar) | Bootstrap-supported taxonomy | Fast; no training | Less accurate than Naive Bayes for divergent sequences |
| LCA (BASTA, MEGAN-LCA) | Lowest common ancestor of multiple hits | Conservative; never over-confident | Can over-merge to high taxonomic ranks |
| Phylogenetic placement (EPA-ng + gappa) | Position on reference tree | Most rigorous; phylogenetically explicit | 10-100x slower; emerging not yet standard |
| decontam | Flagged contaminant candidates | Statistical screening of negative controls and DNA concentration patterns | Output is screening, not classification; needs biological-plausibility check |
| UCHIME3 (VSEARCH --uchime3_denovo); DADA2 uses its own removeBimeraDenovo() | Chimera detection | Standard for de novo chimera removal | Some divergent chimeras escape |
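The LCA row in the table above can be illustrated with a minimal sketch. It assumes each reference hit has already been resolved to a root-to-tip lineage of equal rank order; the function name and the example lineages are illustrative and are not taken from BASTA or MEGAN.

```python
from typing import List, Sequence


def lowest_common_ancestor(lineages: Sequence[Sequence[str]]) -> List[str]:
    """Return the shared root-to-tip prefix of several lineages."""
    if not lineages:
        return []
    shared: List[str] = []
    # zip(*lineages) walks rank by rank and stops at the shortest lineage
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        shared.append(ranks[0])
    return shared


# Hypothetical best hits for one ASV (illustrative lineages only)
hits = [
    ["Animalia", "Chordata", "Actinopteri", "Salmoniformes", "Salmonidae", "Salmo", "Salmo salar"],
    ["Animalia", "Chordata", "Actinopteri", "Salmoniformes", "Salmonidae", "Salmo", "Salmo trutta"],
    ["Animalia", "Chordata", "Actinopteri", "Salmoniformes", "Salmonidae", "Oncorhynchus", "Oncorhynchus mykiss"],
]

print(lowest_common_ancestor(hits)[-1])      # Salmonidae
print(lowest_common_ancestor(hits[:2])[-1])  # Salmo
```

The second call shows the "over-merge" failure mode in reverse: one extra hit from a sister genus is enough to pull the assignment from genus up to family.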
| Scenario | Recommended approach | Why |
|----------|---------------------|-----|
| Metazoan COI metabarcoding (water, gut content) | mlCOIintF/jgHCO2198 (Leray 2013) primers; DADA2 ASVs | Standard primer set; ASVs preserve single-nt resolution |
| Fish eDNA from water | MiFish-U/E (Miya 2015) 12S primers; DADA2 ASVs | Dominant eDNA fish marker globally |
| Freshwater macroinvertebrate bioassessment | BF1/BR1 freshwater-optimized COI primers | Higher primer-binding inclusivity for aquatic insects |
| Bacterial community 16S | 515F/806R (V4) Parada modified; ASVs OR Swarm v2 | Schloss 2021 caveat applies; ASVs may oversplit multi-copy rRNA |
| Fungal community ITS | ITS2 primers; DADA2 or UNITE pipeline | UNITE is curated for fungal ITS |
| Plant community DNA | trnL P6 loop (Taberlet 2007) for degraded DNA | Robust to degradation |
| Deciding ASV vs OTU | ASVs for COI/12S/18S/fungi; OTU consideration for 16S with multi-copy concern | Taxon-specific |
| NovaSeq library (patterned flow cell) | Heavier tag-jumping correction; expect 10x higher rates than MiSeq | Patterned-cell index hopping |
| Low-biomass eDNA (deep ocean, ancient) | decontam frequency + prevalence methods; explicit reagent-contamination check | Reagent contamination dominates |
| Quantitative comparison across samples | Mock-community calibration BEFORE reporting read counts | Without mock, read counts are biased estimators of biomass |
| Detection probability with replication | Site-occupancy models (occumb, eDNAoccupancy; Ficetola 2015) | Read counts alone underestimate occurrence; replicates correct |
| Taxonomic assignment for marker > 80% covered | Naive Bayes (q2-feature-classifier) | Probabilistic; well-supported |
| Taxonomic assignment for sparse reference | Phylogenetic placement (EPA-ng) | Robust to incomplete references |
| OBITools3 pipeline | obi stats (NOTE: plural), DMS-based, .tar.gz taxonomy | v3 syntax differs from v1 |
Goal: Remove primer sequences while discarding reads that lack primers, before quality filtering.
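Approach: Run cutadapt in paired-end mode with marker-specific primers. The commands below pass the forward primer as a 5' adapter on R1 (-g) and the reverse primer as a 5' adapter on R2 (-G). --discard-untrimmed removes reads lacking the expected primer, --pair-filter=any drops the pair when either mate fails, and min_overlap prevents false primer detection in random sequence regions. For amplicons shorter than the read length (e.g., MiFish 12S), true linked-adapter syntax (-a FWD...REV_RC -A REV...FWD_RC) additionally trims read-through into the opposite primer.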
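# COI metazoan (Leray mlCOIintF / jgHCO2198 -> 313 bp)
# jgHCO2198 contains inosine (I); it is written as N here because cutadapt wildcards follow the IUPAC alphabet
cutadapt -g 'GGWACWGGWTGAACWGTWTAYCCYCC;min_overlap=20' \
-G 'TANACYTCNGGRTGNCCRAARAAYCA;min_overlap=20' \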
--discard-untrimmed --pair-filter=any \
-o trimmed_R1.fastq.gz -p trimmed_R2.fastq.gz \
raw_R1.fastq.gz raw_R2.fastq.gz
# Fish 12S (MiFish-U -> 163-185 bp)
cutadapt -g 'GTCGGTAAAACTCGTGCCAGC;min_overlap=18' \
-G 'CATAGTGGGGTATCTAATCCCAGTTTG;min_overlap=18' \
--discard-untrimmed --pair-filter=any \
-o trimmed_R1.fastq.gz -p trimmed_R2.fastq.gz \
raw_R1.fastq.gz raw_R2.fastq.gz
# Fungal ITS2
cutadapt -g 'GTGAATCATCGAATCTTTGAAC;min_overlap=18' \
-G 'TCCTCCGCTTATTGATATGC;min_overlap=18' \
--discard-untrimmed --pair-filter=any \
-o trimmed_R1.fastq.gz -p trimmed_R2.fastq.gz \
raw_R1.fastq.gz raw_R2.fastq.gz
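For short amplicons, the linked-adapter arguments need the reverse complement of each primer, including IUPAC degenerate bases. The sketch below builds those strings from the MiFish-U primers used above. The helper names are illustrative, and mapping inosine to N is an assumption based on cutadapt's wildcard alphabet being IUPAC.

```python
from typing import Tuple

# IUPAC complement table; inosine is normalised to N before lookup
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "R": "Y", "Y": "R", "S": "S", "W": "W", "K": "M", "M": "K",
    "B": "V", "V": "B", "D": "H", "H": "D", "N": "N",
}


def normalize(primer: str) -> str:
    """Upper-case the primer and replace inosine (I) with N."""
    return primer.upper().replace("I", "N")


def reverse_complement(primer: str) -> str:
    """Reverse complement that respects IUPAC degenerate codes."""
    return "".join(COMPLEMENT[base] for base in reversed(normalize(primer)))


def linked_adapters(fwd: str, rev: str) -> Tuple[str, str]:
    """Return the -a (R1) and -A (R2) linked-adapter strings."""
    r1 = f"{normalize(fwd)}...{reverse_complement(rev)}"
    r2 = f"{normalize(rev)}...{reverse_complement(fwd)}"
    return r1, r2


fwd = "GTCGGTAAAACTCGTGCCAGC"        # MiFish-U forward
rev = "CATAGTGGGGTATCTAATCCCAGTTTG"  # MiFish-U reverse
a_r1, a_r2 = linked_adapters(fwd, rev)
print(f"-a '{a_r1}' -A '{a_r2}'")
```

The printed arguments would replace the -g/-G pair in the cutadapt call; output and input file arguments stay unchanged.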
Goal: Denoise paired-end amplicon reads into exact amplicon sequence variants (ASVs) with chimera removal and reference-based taxonomy assignment, per Callahan et al. 2016 Nat Methods 13:581-583.
Approach: Filter to length/quality thresholds, learn error rates per dataset, run dada() to denoise, merge pairs, build the sequence table, remove chimeras with DADA2's own de novo method (removeBimeraDenovo), then assign taxonomy against the marker-appropriate reference DB. CRITICAL: primers must be removed (cutadapt) BEFORE filterAndTrim, or the error model is corrupted.
library(dada2)
# CRITICAL: primer removal MUST precede filterAndTrim
# DADA2's error model assumes primer-free reads
fwd_reads <- sort(list.files('primer_trimmed/', pattern = '_R1', full.names = TRUE))
rev_reads <- sort(list.files('primer_trimmed/', pattern = '_R2', full.names = TRUE))
filt_fwd <- file.path('filtered', basename(fwd_reads))
filt_rev <- file.path('filtered', basename(rev_reads))
# Filter and trim
# maxEE=c(2,2): expected errors per read; tradeoff sensitivity/specificity
# truncLen: set from quality profile inspection; do not guess
out <- filterAndTrim(fwd_reads, filt_fwd, rev_reads, filt_rev,
maxN = 0, maxEE = c(2, 2), truncQ = 2,
truncLen = c(220, 180), # data-dependent; inspect plotQualityProfile()
minLen = 100, rm.phix = TRUE, multithread = TRUE)
# Learn error rates
# For small datasets (< 100 samples), pool aggressively or use pre-learned model
err_fwd <- learnErrors(filt_fwd, multithread = TRUE)
err_rev <- learnErrors(filt_rev, multithread = TRUE)
# Denoise
dada_fwd <- dada(filt_fwd, err = err_fwd, multithread = TRUE)
dada_rev <- dada(filt_rev, err = err_rev, multithread = TRUE)
# Merge pairs with minimum overlap
merged <- mergePairs(dada_fwd, filt_fwd, dada_rev, filt_rev, minOverlap = 12)
# Build sequence table
seqtab <- makeSequenceTable(merged)
# Remove chimeras
# method='consensus': per-sample then consensus; conservative (default)
# method='pooled': pooled across samples; aggressive; can over-merge real diversity
# Chimera rate >30% typically indicates library prep problems
seqtab_nochim <- removeBimeraDenovo(seqtab, method = 'consensus',
multithread = TRUE)
cat('Chimera rate:', round(1 - sum(seqtab_nochim) / sum(seqtab), 3), '\n')
# Taxonomy assignment
# minBoot=80: standard genus-level confidence; 50 for family-level
# IMPORTANT: pair the marker with the appropriate reference DB
# COI -> MIDORI2 LONGEST_NUC_GB259_CO1 (or BOLD with curation)
# 12S -> MitoFish (Miya lab)
# 16S V4 -> SILVA 138.1+
# 18S V4/V9 -> SILVA 138.1+ or PR2
# Fungal ITS -> UNITE 9.0+
taxa <- assignTaxonomy(seqtab_nochim,
'MIDORI2_LONGEST_NUC_GB259_CO1_DADA2.fasta.gz',
minBoot = 80, multithread = TRUE)
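The maxEE filter used above is based on expected errors, the sum of per-base error probabilities implied by the Phred scores. A minimal sketch of that quantity, assuming Phred+33 quality strings; the function names are illustrative and this is not DADA2's code.

```python
def expected_errors(quality: str, phred_offset: int = 33) -> float:
    """EE = sum over bases of 10 ** (-Q / 10)."""
    return sum(10 ** (-(ord(ch) - phred_offset) / 10) for ch in quality)


def passes_max_ee(quality: str, max_ee: float = 2.0) -> bool:
    """A read is kept when its expected errors do not exceed max_ee."""
    return expected_errors(quality) <= max_ee


good = "I" * 200             # Q40 at every position
poor = "I" * 150 + "+" * 50  # last 50 bases drop to Q10

print(passes_max_ee(good))        # True
print(passes_max_ee(poor))        # False
print(passes_max_ee(poor[:150]))  # True
```

The last call shows why truncLen matters: truncating the low-quality tail brings the same read back under the maxEE limit.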
Goal: Process eDNA reads through the Unix-style OBITools v3 pipeline (Boyer et al. 2016 Mol Ecol Resour 16:176-182 introduced OBITools v1; v3 is the post-2018 Python 3 rewrite) with DMS-based sequence management.
Approach: v3 introduces a Database Management System (DMS) abstraction; sequences are imported into a DMS rather than read directly from FASTQ. Commands use spaces (e.g., obi stats plural, not obistat). Taxonomy import expects .tar.gz archive, not a directory.
# v1 -> v3 command-name changes (critical):
# v1: obistat -> v3: obi stats
# v1: obigrep -> v3: obi grep
# v1: obiuniq -> v3: obi uniq
# v1: obitab -> v3: obi annotate / obi export --tab-output (different semantics)
# v1: ngsfilter -> v3: obi ngsfilter
# v1: taxdump dir -> v3: .tar.gz archive
# Import paired FASTQ into DMS
obi import --fastq-input raw_R1.fastq.gz EDNA/reads1
obi import --fastq-input raw_R2.fastq.gz EDNA/reads2
# Paired-end alignment
obi alignpairedend -R EDNA/reads2 EDNA/reads1 EDNA/aligned
# Filter by alignment score and length
obi grep -p 'sequence["score"] >= 50' EDNA/aligned EDNA/filtered
obi grep -p 'len(sequence) >= 100 and len(sequence) <= 500' \
EDNA/filtered EDNA/length_filtered
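# Demultiplex (NGS filter file maps barcodes -> samples; import it into the DMS first)
# Verify the import flag with obi import --help on your install
obi import --ngsfilter-input ngsfilter.txt EDNA/ngsfile
obi ngsfilter -t EDNA/ngsfile -u EDNA/unassigned \
EDNA/length_filtered EDNA/demux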
# Dereplicate (-m sample merges the sample tag into a MERGED_sample attribute)
obi uniq -m sample EDNA/demux EDNA/derep
# Remove suspected error singletons (the count tag is upper-case in v3)
obi grep -p 'sequence["COUNT"] >= 2' EDNA/derep EDNA/no_singletons
# Denoise via obi clean
obi clean -s MERGED_sample -r 0.05 -H EDNA/no_singletons EDNA/denoised
# Taxonomy assignment against reference database
obi ecotag -R EDNA/refdb --taxonomy EDNA/taxonomy EDNA/denoised EDNA/assigned
# Export tab-separated species table
obi export --tab-output EDNA/assigned > species_table.tsv
Across most metabarcoding studies, 50-85% of ASVs cannot be assigned to species level due to incomplete references (Wangensteen et al. 2018 PeerJ 6:e4705 documented this for marine COI + 18S). Report this gap honestly; do not infer ecology from "unassigned" reads.
Goal: Detect and remove sequence-to-sample misassignments arising from chimeric library molecules with mismatched indices.
Approach: Use dual-indexing (different indices at both ends; cross-jumped pairs are discarded). Quantify residual tag-jumping rate from per-ASV cross-sample appearance and apply per-ASV abundance threshold filtering with metabaR::tagjumpslayer. For NovaSeq libraries, expect ~10x higher tag-jumping than MiSeq due to patterned flow cells.
library(metabaR)
# metabaR expects a metabarlist object (asv table + sample info + ngsfilter)
# tagjumpslayer applies a per-ASV relative-abundance threshold filter:
# cells holding less than `threshold` of an ASV's total reads are treated as tag-jumps
# threshold: 0.01 (1% of ASV total) filters aggressively; 0.001 is permissive
# Adjust threshold higher for NovaSeq (~0.005-0.01) than MiSeq (~0.001-0.005)
# Quantify residual tag-jumping rate before filtering:
# Count reads in sample x ASV combinations that should be 0 by experimental design
# (e.g., samples explicitly excluded from a particular condition)
# That rate / total reads = empirical tag-jumping rate
# Report this rate in methods section
# mbl is a placeholder name for your metabarlist object
mbl_clean <- tagjumpslayer(mbl, threshold = 0.005, method = 'cut')
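The per-ASV threshold filter and the empirical tag-jumping rate described above can be sketched on a plain nested dictionary. This illustrates the idea only and is not metabaR's implementation; the table layout and function names are assumptions.

```python
from typing import Dict, Set, Tuple

Table = Dict[str, Dict[str, int]]  # asv -> sample -> read count


def tagjump_filter(table: Table, threshold: float) -> Table:
    """Zero every cell holding less than `threshold` of that ASV's total reads."""
    cleaned: Table = {}
    for asv, counts in table.items():
        total = sum(counts.values())
        cleaned[asv] = {
            sample: (n if total and n / total >= threshold else 0)
            for sample, n in counts.items()
        }
    return cleaned


def empirical_tagjump_rate(table: Table, must_be_zero: Set[Tuple[str, str]]) -> float:
    """Reads found in (asv, sample) cells that should be empty, over all reads."""
    total = sum(n for counts in table.values() for n in counts.values())
    leaked = sum(table[asv].get(sample, 0) for asv, sample in must_be_zero)
    return leaked / total if total else 0.0


table = {
    "ASV1": {"S1": 9950, "S2": 40, "S3": 10},
    "ASV2": {"S1": 0, "S2": 500, "S3": 500},
}

strict = tagjump_filter(table, threshold=0.005)   # NovaSeq-style threshold
lenient = tagjump_filter(table, threshold=0.002)  # MiSeq-style threshold
rate = empirical_tagjump_rate(table, {("ASV1", "S3")})
print(strict["ASV1"], lenient["ASV1"], rate)
```

With the stricter threshold both low-count ASV1 cells are zeroed; with the lenient one the 40-read cell survives, which is the behaviour the NovaSeq pitfall warns about.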
Goal: Identify candidate contaminant ASVs from negative controls and DNA-concentration patterns.
Approach: Use decontam::isContaminant with method='combined' when both DNA concentration AND negative controls are available. Treat flagged ASVs as SCREENING CANDIDATES; verify biological plausibility before deletion. The default threshold=0.1 is over-aggressive in low-biomass data.
library(decontam)
# Frequency method: contaminants more frequent at LOW DNA concentration
# Prevalence method: contaminants more frequent in negative controls
# Combined: uses both signals (most robust)
contam <- isContaminant(seqtab_nochim,
conc = dna_concentration, # qPCR or Qubit per sample
neg = is_negative_control, # logical: which samples are controls
method = 'combined',
threshold = 0.1) # default; lower for high-confidence calls
# CRITICAL: decontam output is SCREENING, not classification
# Manually inspect each flagged ASV: is the taxonomic assignment plausibly a reagent contaminant?
# Common reagent contaminants: Delftia, Sphingomonas, Burkholderia, Propionibacterium
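# Row names of the result are the ASV sequences, matching colnames(seqtab_nochim)
flagged <- rownames(contam)[which(contam$contaminant)]
cat('Decontam flagged', length(flagged), 'ASVs as candidates\n')
# After manual review, remove confirmed contaminants
# reviewed_contaminants: character vector of ASV sequences confirmed by hand (placeholder name)
confirmed_contam <- intersect(flagged, reviewed_contaminants)
seqtab_clean <- seqtab_nochim[, !(colnames(seqtab_nochim) %in% confirmed_contam), drop = FALSE]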
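The intuition behind the frequency signal can be sketched as a comparison of two fixed-slope fits on log-log axes: a contaminant's relative frequency falls roughly as 1/concentration (slope -1), while a true sample sequence is roughly independent of concentration (slope 0). The ratio below is a simplified illustration, not decontam's actual score, and it assumes strictly positive frequencies and concentrations.

```python
import math
from typing import Sequence


def residual_ss(y: Sequence[float], x: Sequence[float], slope: float) -> float:
    """Residual sum of squares for y = a + slope * x with the best-fit intercept a."""
    resid = [yi - slope * xi for yi, xi in zip(y, x)]
    mean = sum(resid) / len(resid)
    return sum((r - mean) ** 2 for r in resid)


def contaminant_fit_ratio(freq: Sequence[float], conc: Sequence[float]) -> float:
    """Near 0: contaminant-like pattern; near 1: sample-like pattern."""
    log_f = [math.log(f) for f in freq]
    log_c = [math.log(c) for c in conc]
    ss_contam = residual_ss(log_f, log_c, slope=-1.0)
    ss_sample = residual_ss(log_f, log_c, slope=0.0)
    denom = ss_contam + ss_sample
    return ss_contam / denom if denom > 0 else 0.5


conc = [1.0, 2.0, 4.0, 8.0, 16.0]            # hypothetical DNA concentrations
contaminant_freq = [0.1 / c for c in conc]   # frequency falls as 1 / concentration
sample_freq = [0.20, 0.21, 0.19, 0.20, 0.20]  # frequency independent of concentration

print(contaminant_fit_ratio(contaminant_freq, conc))
print(contaminant_fit_ratio(sample_freq, conc))
```

In low-biomass data the concentration range is narrow, so both fits explain the points about equally well and the ratio drifts toward the middle. That is one way to see why flagged ASVs need manual review.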
Goal: Estimate true species occurrence probabilities from replicated eDNA samples, accounting for false negatives in any single PCR replicate.
Approach: Fit a multi-species occupancy model via MCMC (Ficetola 2015 Mol Ecol Resour 15:543-556) on a 3D array of replicated read counts. Output: per-site, per-species occupancy probabilities corrected for detection.
library(occumb)
# y: 3D array [species, sites, replicates] of read counts
# spec_cov: species covariates (traits)
# site_cov: site covariates (env)
data_obj <- occumbData(y = count_array, spec_cov = species_covariates,
site_cov = site_covariates)
# Fit hierarchical occupancy model
# Requires JAGS installation
# n.iter >= 10000, n.burnin >= 2500 for publication-quality posteriors
# (n.burnin is the full argument name; confirm with ?occumb on your install)
fit <- occumb(data = data_obj, n.chains = 4, n.iter = 10000,
n.thin = 5, n.burnin = 2500)
# Extract detection-corrected occupancy
summary(fit)
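Why replication matters can be shown with the cumulative detection probability. The sketch assumes independent replicates with a constant per-replicate detection probability p, a simplification that occupancy models relax by estimating occupancy and detection jointly. Function names are illustrative.

```python
import math


def cumulative_detection(p: float, k: int) -> float:
    """P(at least one positive among k replicates) = 1 - (1 - p) ** k."""
    return 1.0 - (1.0 - p) ** k


def replicates_needed(p: float, target: float = 0.95) -> int:
    """Smallest k whose cumulative detection reaches target (requires 0 < p < 1)."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


print(cumulative_detection(0.5, 3))  # 0.875
print(replicates_needed(0.5))        # 5
print(replicates_needed(0.3))        # 9
```

A species detected in only 30% of PCR replicates would need about nine replicates per site before a non-detection is reasonable evidence of absence, under these assumptions.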
Trigger: Comparing read counts of two ASVs and reporting the ratio as a biomass / abundance estimate.
Mechanism: Primer-template binding affinity varies systematically across taxa; PCR amplification is non-linear (saturates); read counts have weak-to-moderate, NONLINEAR correlation with biomass (Elbrecht 2015; Lamb 2019).
Symptom: Reviewer asks "how is it known that reads = biomass?"; cross-study quantitative comparisons fail to replicate.
Fix: Either (a) restrict reporting to presence/absence; (b) report read counts as relative abundances with explicit caveat; or (c) include mock-community of known composition for primer-specific calibration. Do not silently equate reads with biomass.
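Option (c) can be sketched as a per-taxon correction factor estimated from a mock community. This is a deliberately simple, linear correction: it assumes each taxon's amplification bias is a constant multiplier, which the nonlinearity noted above says is only an approximation. The taxon names and counts are made up for illustration.

```python
from typing import Dict


def proportions(counts: Dict[str, float]) -> Dict[str, float]:
    """Convert counts to proportions; all-zero input returns zeros."""
    total = sum(counts.values())
    if total == 0:
        return {taxon: 0.0 for taxon in counts}
    return {taxon: n / total for taxon, n in counts.items()}


def correction_factors(mock_reads: Dict[str, float],
                       mock_input: Dict[str, float]) -> Dict[str, float]:
    """Per-taxon bias = observed read share / known input share in the mock."""
    observed = proportions(mock_reads)
    expected = proportions(mock_input)
    return {t: observed.get(t, 0.0) / expected[t] for t in expected}


def calibrate(sample_reads: Dict[str, float],
              factors: Dict[str, float]) -> Dict[str, float]:
    """Divide reads by the bias factor, then renormalise to proportions.

    Taxa without a usable factor (absent from the mock, or dropped out) are skipped.
    """
    adjusted = {t: n / factors[t] for t, n in sample_reads.items()
                if factors.get(t, 0.0) > 0.0}
    return proportions(adjusted)


mock_input = {"taxonA": 1.0, "taxonB": 1.0}    # equal DNA input
mock_reads = {"taxonA": 7500, "taxonB": 2500}  # taxonA over-amplifies
factors = correction_factors(mock_reads, mock_input)

sample_reads = {"taxonA": 600, "taxonB": 200}
print(factors)
print(calibrate(sample_reads, factors))
```

In this toy case the raw 3:1 read ratio becomes 1:1 after calibration. Taxa missing from the mock are dropped, not guessed, so the calibrated table only covers taxa with a measured bias.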
Trigger: Applying tag-jumping filters calibrated on MiSeq libraries to NovaSeq data.
Mechanism: NovaSeq patterned flow cells have ~10x higher index hopping than MiSeq. MiSeq-calibrated thresholds (often ~0.001 fraction) are too permissive on NovaSeq data.
Symptom: Apparent rare-species detections in NovaSeq libraries do not replicate; per-ASV cross-sample appearance is unusually broad.
Fix: Use NovaSeq-appropriate tag-jumping thresholds (~0.005-0.01) and report the empirical tag-jumping rate from explicit-zero combinations.
Trigger: Applying default threshold = 0.1 to ASVs from open-ocean water, ancient sediments, or other dilute samples.
Mechanism: In low-biomass samples, the contaminant signal/background ratio approaches 1; decontam over-flags real but dilute biology as "contaminant" because the statistical pattern looks similar.
Symptom: Many ASVs flagged from low-biomass samples; taxonomic profile of "flagged contaminants" looks biologically realistic.
Fix: Lower threshold (0.05 or 0.01); always manually review flagged ASVs for biological plausibility; cite Salter 2014 BMC Biol 12:87 for the low-biomass reagent-contamination caveat.
Trigger: Running DADA2's filterAndTrim() on FASTQ files that still contain primer sequences.
Mechanism: DADA2 learns sequencing error from the empirical data; if primer sequences are present, they look like "perfect agreement" and corrupt the error model. ASVs are inferred with primer artifacts attached.
Symptom: DADA2 reports "phix-like contamination" (false; it's primers); ASVs start with the primer sequence; chimera rate elevated.
Fix: Always run cutadapt (or similar) BEFORE filterAndTrim. Verify with head of trimmed FASTQ that primer sequences are gone.
Trigger: Running obistat or obigrep on a v3 install.
Mechanism: v1 used concatenated command names (obistat); v3 uses subcommand syntax with a space (obi stats — note plural).
Symptom: Bash error obistat: command not found; tutorial documentation does not match installed version.
Fix: Use obi <subcommand> syntax; consult obi --help for current command list. Taxonomy import requires .tar.gz archive, not unpacked directory.
| Threshold | Value | Source / rationale |
|-----------|-------|-------------------|
| DADA2 maxEE per read | 2 | Standard sensitivity/specificity balance |
| DADA2 chimera rate alarm | > 30% suggests library issues | Empirical convention |
| DADA2 minBoot for taxonomy | 80 for genus; 50 for family | Standard confidence cutoffs |
| Tag-jumping filter MiSeq | 0.001-0.005 fraction of ASV total | Schnell 2015 |
| Tag-jumping filter NovaSeq | 0.005-0.01 fraction of ASV total | Patterned-cell index hopping ~10x higher |
| decontam threshold | 0.1 default; 0.05 for low-biomass | Davis 2018; reduce for dilute samples |
| Per-sample minimum reads | 1000 (after filtering) | Below this rare-species detection unreliable |
| Singleton removal | count >= 2 | Singletons often error-driven |
| Bootstrap nperm for tests | 999 | Standard permutation count |
| Occupancy model iterations | n.iter >= 10000, n.burn >= 2500 | occumb default for stable posteriors |
| eDNA decay (20 deg C surface water) | half-life ~4-15 hours | Strickler 2015 Biol Conserv 183:85-92 |
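The eDNA decay row can be turned into a quick persistence estimate. The sketch assumes simple first-order (exponential) decay, which is a modelling assumption; only the 4-15 hour half-life range comes from the table above.

```python
def fraction_remaining(hours: float, half_life_hours: float) -> float:
    """First-order decay: N(t) / N0 = 0.5 ** (t / half_life)."""
    return 0.5 ** (hours / half_life_hours)


# Fraction of the original eDNA signal left 24 hours after shedding stops
for half_life in (4.0, 15.0):
    print(half_life, fraction_remaining(24.0, half_life))
```

After 24 hours roughly 1.6% of the signal remains at a 4-hour half-life, against roughly 33% at 15 hours. A positive detection therefore points to recent presence, but how recent depends strongly on local decay conditions.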
| Error | Cause | Solution |
|-------|-------|----------|
| obistat: command not found | OBITools v3 uses obi stats (plural) | Use v3 syntax |
| DADA2 error rate plot looks pathological | Primer sequences still in reads | Re-run cutadapt before filterAndTrim |
| Chimera rate > 30% | Library-prep issue or primer dimers | Inspect raw FASTQ; check PCR conditions |
| decontam flags many real species | Default threshold too aggressive for low-biomass | Lower threshold; manual review |
| Naive Bayes confidence 0.95 but species is wrong | scikit-learn-calibrated "confidence" not true probability | Use phylogenetic placement for borderline assignments |
| occumb JAGS not found error | JAGS not installed system-wide | Install JAGS (CRAN page has platform instructions) |
| eDNA detections do not replicate | Read counts treated as abundance | Switch to presence/absence; use mock-community calibration |
| MIDORI2 download path expired | Database updated; old URL gone | Check current MIDORI2 / MitoFish download page |