Version Compatibility

Reference examples tested with: DiffBind 3.12+, DESeq2 1.42+, edgeR 4.0+, csaw 1.36+, limma 3.58+, GenomicRanges 1.54+, ChIPseeker 1.38+, Subread 2.0+ (featureCounts), sva 3.50+, RUVSeq 1.36+.

Before using code patterns, verify installed versions match:

R: packageVersion('<pkg>') then ?function_name to verify parameters

If code throws unexpected errors, introspect the installed package and adapt rather than retrying.

Differential Accessibility

"Find chromatin regions that change accessibility between my conditions" -> Build a sample-by-region count matrix, normalize for library size and chromatin compaction, fit a generalized linear model (negative-binomial), and extract regions with significant accessibility change.

R (consensus-peak workflow): DiffBind -> count -> normalize -> contrast -> analyze
R (window-based, no peak set): csaw::windowCounts + filterWindowsGlobal + edgeR QL F-test
R (existing peak-count matrix): DESeq2 or edgeR directly on featureCounts output

DiffBind is a wrapper around DESeq2 / edgeR with ATAC-aware defaults. csaw is the only peak-free option; it tests fixed-width sliding windows. The choice depends on whether peaks are stable across conditions (use DiffBind) or whether some condition has dramatically different peak structure (use csaw or rebuild consensus peaks).

Algorithmic Taxonomy

| Tool | Model | Input | Min reps | Strength | Fails when | |------|-------|-------|----------|----------|------------| | DiffBind 3.x (default DESeq2) | NB GLM via DESeq2 on consensus peaks | BAM + peak files | 2-3 per group | ATAC-aware defaults; built-in QC; blocking factors. Default in 3.x is normalize=DBA_NORM_LIB with library=DBA_LIBSIZE_FULL (full library size, background-included) | Peaks differ dramatically between conditions (closed -> open shifts width); fewer than 2 reps per group | | DiffBind with edgeR backend | NB GLM via edgeR-QL on consensus peaks | Same | 2-3 per group | Robust at low replicates (n=2 OK); QL test calibrates dispersion better than DESeq2 at small n | When global accessibility shifts dominate, switch to spike-in or full-library (library=DBA_LIBSIZE_FULL), never reads-in-peaks | | DESeq2 directly on peak counts | NB GLM with shrinkage | featureCounts SAF | 3+ | Maximum control; integrates with apeglm shrinkage; modern interface | Need to manually build consensus peakset; per-region pre-filter required (low counts inflate dispersion) | | edgeR QL F-test on peak counts | NB QL (quasi-likelihood) | featureCounts | 2 | Calibrated FDR at low n (n=2 viable); robust to outlier reps | Manual consensus peakset; small library bias unless normalization explicit | | csaw (windows) | edgeR-QL on sliding windows | BAM only | 2 | No peak set required; detects diffuse changes peaks miss; merges adjacent windows | Computationally heavy; window size choice biases results; harder to annotate downstream | | limma-voom | linear model with mean-variance trend | log2(CPM+offset) | 3 | Fast; good calibration at moderate count | Mis-calibrated at very low counts (atac peaks often have dropouts); needs explicit voom normalization |

Methodology evolves; verify the current consensus practice (Gontarz 2020 DA-strategy benchmark; Reske 2020 normalization comparison) before locking pipelines.

Decision Tree by Experimental Scenario

| Scenario | Recommended workflow | Why | |----------|---------------------|-----| | 3+ reps, similar peak structure across conditions | DiffBind (DESeq2 backend), summits=250, normalize=DBA_NORM_NATIVE | Standard pattern; peak-level inference is interpretable | | 2 reps per condition | DiffBind with edgeR backend OR raw edgeR QL | DESeq2 underpowered at n=2; QL is robust | | Peak structure differs dramatically (e.g., differentiation, KO of pioneer TF) | csaw windows OR rebuild consensus peakset post-hoc per condition then take union | Stable consensus peakset is invalid when chromatin landscape shifts | | Multi-factor design (batch, sex, time) | DiffBind with dba.contrast(..., design='~Batch + Condition') | Standard linear model adjustment | | Hidden batch / unknown variance | DESeq2 + SVA OR RUVseq before fitting | Empirical surrogate variables capture unknown nuisance | | Long timecourse (5+ time points) | DESeq2 LRT (likelihood ratio test) on ~time + condition + time:condition | Captures temporal interaction; use differential-expression/timeseries-de patterns | | Diffuse / broad accessibility change (super-enhancers) | csaw with merged windows OR call broad peaks first | Narrow peaks fragment broad domains -> inflated peak count, deflated effect | | Single-cell ATAC pseudobulk | DESeq2 on aggregated counts OR Signac::FindMarkers | See atac-seq/single-cell-atac | | Allele-specific accessibility | csaw on heterozygous SNPs OR HOMER tagDir | Peak-level invalid because alleles share peaks | | Plant / non-model organism | DiffBind works; just provide custom genome and disable annotation | Annotation step assumes UCSC TxDb; bypass if absent |

Consensus Peak Set Strategy

The consensus peakset choice drives FDR calibration. DiffBind defaults rarely match what a chromatin biologist wants.

| Strategy | Implementation | When to use | |----------|---------------|-------------| | Intersection (peak in all reps) | dba.count(minOverlap=N) with N = total reps | Strict; for high-confidence reproducible analysis (matches IDR philosophy) | | Union (peak in any rep) | minOverlap=1 | Maximum sensitivity; risks single-rep artefact peaks | | Majority rule (peak in >= half reps) | minOverlap=ceiling(N/2) | DiffBind default-ish; balance | | Per-condition union, then union of unions | Compute consensus per group, then merge | Best when conditions have very different peak counts | | Iterative overlap removal (Corces 2018) | Sort peaks by significance; greedily keep non-overlapping; fixed-width 501 bp | Standard for fixed-width consensus; required for peak-count matrices used in machine learning |

Refer to atac-seq/consensus-peakset for full coverage of fixed-width re-centering and the iterative overlap algorithm. For DiffBind, the key parameter is summits=250 (re-center peaks on summit +/- 250 bp = 501 bp fixed width).

Normalization: The ATAC-Specific Choice

DiffBind 3.x conflates two orthogonal choices: the normalization method (normalize=) and the library-size definition (library=). The defaults are normalize=DBA_NORM_LIB with library=DBA_LIBSIZE_FULL (full mapped-read total).

| Choice | DiffBind argument | What it does | When to use | |--------|-------------------|--------------|-------------| | Normalize by library size only | normalize=DBA_NORM_LIB (default) | Scale counts by the chosen library size | Standard; pairs with full or RiP library | | Reads-in-peaks library size | library=DBA_LIBSIZE_PEAKREADS | Library size = reads in consensus peaks (RiP) | When background varies independently of biology (protects against background drift) | | Full mapped library size | library=DBA_LIBSIZE_FULL (default) | Library size = total mapped reads | When global accessibility shifts must remain visible (e.g., chromatin compaction) | | Native per-tool default | normalize=DBA_NORM_NATIVE | DESeq2 RLE or edgeR TMM, depending on backend | Use DESeq2/edgeR conventions directly | | TMM (edgeR) | normalize=DBA_NORM_TMM | Trimmed mean of M-values | Robust to a few highly-DA peaks dominating | | RLE (DESeq2) | normalize=DBA_NORM_RLE | DESeq2 geometric-mean size factors | DESeq2-conventional analysis | | Spike-in / external | not built-in; pre-scale counts | Exogenous reference (e.g., spike-in chromatin) | Required when global scaling is biological |

Trigger: Treatment causes global chromatin compaction (e.g., HDAC inhibitor, DNMT inhibitor).

Mechanism: Full library-size normalization is robust to background but the default still scales background reads in; under uniform global compaction the magnitudes can collapse. RiP scaling (library=DBA_LIBSIZE_PEAKREADS) makes the opposite assumption (peak signal is stable, background absorbs the shift) and so erases the very biology of interest.

Symptom: Volcano plot is symmetric about zero; PCA shows treatment effect that vanishes after normalization.

Fix: Use spike-in normalization (add exogenous chromatin pre-Tn5; scale by spike-in reads), or keep the default library=DBA_LIBSIZE_FULL but interpret with the global shift in mind. Reske 2020 documented that the normalization choice materially changes which peaks are called differential under such a global shift.

Per-Tool Failure Modes

DiffBind -- Library-size choice confounds global change

Trigger: Treatment causes whole-genome accessibility shift; cell-cycle synchronized samples; differentiation timecourse.

Mechanism: Setting library=DBA_LIBSIZE_PEAKREADS (RiP-based) assumes total reads-in-peaks is comparable across samples. Global accessibility shifts break this assumption.

Symptom: Conditions clearly differ in PCA before normalization; after normalization PC1 is nearly noise.

Fix: Keep the default library=DBA_LIBSIZE_FULL (or use spike-in scaling) and re-run dba.contrast and dba.analyze. Re-inspect PCA; if treatment now drives PC1, the global-shift biology is preserved.

DiffBind summits parameter -- Width-driven differential

Trigger: Per-rep peaks have very different widths; consensus uses union.

Mechanism: Without summits=250, DiffBind counts reads in the original peak intervals. A peak called as 200 bp in one rep and 800 bp in another inflates the count for the wider rep.

Symptom: Top differential peaks track peak width, not signal intensity.

Fix: Always set summits=250 (or 100, depending on resolution). This re-centers all peaks on the summit and uses identical 501 (or 201) bp windows.

csaw -- Window size and filter choice dominates results

Trigger: Default width=spacing=50 bp windows; default filter=10 count cutoff.

Mechanism: Narrow windows have very low counts and inflated dispersion; the global background filter discards too many windows. Results are extremely sensitive to these.

Symptom: Number of significant windows ranges from 200 to 200,000 across reasonable parameter sweeps.

Fix: Use width=150 for ATAC (matches typical NFR fragment); threshold with filterWindowsGlobal(data, background)$filter > log2(3) to discard low-signal windows. Validate by running on technical replicates -- ~0 differential windows is the expected outcome.

DESeq2 -- Apeglm shrinkage with too few reps

Trigger: n=2 per condition; using lfcShrink(type='apeglm').

Mechanism: Apeglm shrinks log2FC toward zero based on dispersion estimate; at n=2 dispersion is unreliable, shrinkage is over-aggressive, and biology is masked.

Symptom: All log2FC values cluster near zero post-shrinkage; FDR list has high p-values across the board.

Fix: Skip shrinkage at n=2 OR switch to edgeR QL test. If n=2 is unavoidable, report unshrunken log2FC alongside FDR; do not use shrunken FCs as the effect size.

edgeR QL -- Filter must be aggressive enough

Trigger: Including peaks with mean count < 5 across all samples.

Mechanism: The QL F-test calibrates dispersion across all features. Including very-low-count peaks pulls dispersion estimates and inflates FDR.

Fix: filterByExpr(y, group=group) removes low-count peaks; restore peaks one at a time only if they are biologically critical and supported by at least one rep at depth.

Reconciliation: When Tools Disagree

| Pattern | Likely cause | Action | |---------|--------------|--------| | DiffBind + DESeq2 differ wildly | Different normalization (DiffBind default = full library DBA_NORM_LIB/DBA_LIBSIZE_FULL, DESeq2 default = RLE) | Force same normalization; differences should shrink | | DiffBind + csaw differ | csaw catches diffuse changes peaks miss; DiffBind catches narrow peaks csaw smooths | Both can be correct; report intersection as high-confidence | | Top hits in DiffBind have FDR > 0.5 in DESeq2 | DiffBind's blacklist filter or width re-centering changes the per-region count | Re-run DESeq2 on the exact DiffBind consensus matrix (dba.peakset extract) | | Effect-size ranking differs across reps | One rep is an outlier -- check PCA | Drop or block as covariate; never silently include | | No significant peaks despite obvious browser-track differences | Library-size normalization eaten the global shift | Switch to spike-in or full-library normalization |

Operational rule: For high-confidence reporting, require concordant detection in two methods from different families (DiffBind/DESeq2-style on consensus peaks AND csaw-style sliding windows agreeing within +/- 500 bp). Report the intersection as primary; the union as exploratory.

Effect Size and Threshold Selection

| Question | Threshold | Rationale | |----------|-----------|-----------| | Statistical significance | FDR < 0.05 | Standard BH FDR (DESeq2 / edgeR / DiffBind default) | | Stringent biological change | abs(log2FC) >= 1 (= 2-fold) | Within-noise effects below 2-fold are unreliable in chromatin | | Conservative reporting | FDR < 0.01 AND abs(log2FC) >= 1 | Per ENCODE differential reporting guidance | | Exploratory / discovery | FDR < 0.1 OR shrunken log2FC >= 0.585 (1.5x) | For follow-up validation, not final claim | | Proper effect-size reporting | Use shrunken log2FC (apeglm or DESeq2 lfcShrink) when n >= 3 | Raw log2FC at low counts is volatile |

abs(log2FC) >= 1 is not universal. ATAC effects in primary cells (immune subsets, neurons) often max at 1.5-fold; require log2FC >= 0.585 with FDR < 0.05 for those settings.

Hidden Batch with SVA / RUVseq

Goal: Recover differential accessibility signal when unknown batch effects swamp the contrast.

Approach: Estimate surrogate variables on normalized counts via svaseq, append them to the DESeq2 design, refit the model, and extract the contrast.

library(DESeq2); library(sva)

dds <- DESeqDataSetFromMatrix(countData=counts, colData=coldata, design=~condition)
dds <- estimateSizeFactors(dds)
dat <- counts(dds, normalized=TRUE)
dat <- dat[rowMeans(dat) > 1, ]

mod  <- model.matrix(~condition, colData(dds))
mod0 <- model.matrix(~1, colData(dds))
svobj <- svaseq(dat, mod, mod0, n.sv=2)

dds$SV1 <- svobj$sv[, 1]; dds$SV2 <- svobj$sv[, 2]
design(dds) <- ~SV1 + SV2 + condition
dds <- DESeq(dds)
res <- results(dds, contrast=c('condition', 'treated', 'control'))

RUVseq is the alternative when negative-control regions (ChrM peaks NOT changing) or technical replicates are available. SVA is preferred when no controls exist.

Spike-in Normalization

Trigger: Treatment causes whole-genome accessibility shift (HDAC inhibitor, DNMT inhibitor); RPM/CPM/RiP normalization erases the global biology.

Mechanism: Exogenous chromatin spike-in (Drosophila S2 nuclei) is added at constant cell number ratio pre-Tn5; reads aligning to dm6 quantify the constant exogenous baseline. Sample-level scaling factor = inverse of dm6 reads per sample, applied to human-aligned counts.

Goal: Preserve global accessibility shifts that RiP / library-size normalization would erase.

Approach: Compute per-sample size factors from inverse spike-in read counts, override DESeq2's default size factors, then run the standard DESeq2 fit and contrast.

library(DESeq2)

# spike_counts: per-sample dm6 read counts (one column per sample)
sf_spike <- 1 / spike_counts
sf_spike <- sf_spike / mean(sf_spike)                 # Geometric mean = 1 for stability

dds <- DESeqDataSetFromMatrix(countData=counts, colData=coldata, design=~condition)
sizeFactors(dds) <- sf_spike                          # Override library-size factors
dds <- DESeq(dds)
res <- results(dds, contrast=c('condition', 'treated', 'control'))

After spike-in normalization, log2FC reflects absolute accessibility change (not just relative redistribution). Spike-in is the most direct control for global-shift biology.

Permutation Testing for Low Replicate Designs

Trigger: n=2 per condition; parametric NB tests give over-confident p-values.

Mechanism: csaw provides a permutation framework: the null is generated by shuffling sample labels; test statistic is the count-difference per window; per-region p is the rank under permutation.

Goal: Generate empirical per-region p-values when parametric NB tests are over-confident at low replicate counts.

Approach: Fit the observed edgeR QL F statistic, repeatedly shuffle group labels and refit, then compute per-region p as the rank of the observed F under the shuffled null.

library(csaw); library(edgeR)

# Standard csaw counts (windows or peaks)
counts <- regionCounts(bam_files, regions, ext=200)

# Standard NB fit
y <- DGEList(counts=assay(counts), group=condition)
y <- calcNormFactors(y, method='TMM')
design <- model.matrix(~condition)
y <- estimateDisp(y, design)
fit <- glmQLFit(y, design)

# Permutation: shuffle group labels n_perms times; track per-region rank statistic
n_perms <- 1000
perm_p <- replicate(n_perms, {
    shuffled <- sample(condition)
    design_p <- model.matrix(~shuffled)
    fit_p <- glmQLFit(estimateDisp(y, design_p), design_p)
    glmQLFTest(fit_p, coef=2)$table$F
})
observed_F <- glmQLFTest(fit, coef=2)$table$F
permp <- rowMeans(perm_p >= observed_F)

Permutation requires ~1000 shuffles for stable per-region p; computationally expensive but essential when parametric tests cannot be trusted.

DESeq2 Likelihood Ratio Test for Time-Courses

Goal: Identify peaks whose accessibility trajectory differs between conditions across a timecourse.

Approach: Fit a DESeq2 LRT comparing a full model with a spline-by-condition interaction against a reduced model lacking the interaction; significant peaks have time-dependent condition response.

library(DESeq2); library(splines)

# Spline-modeled time course (5+ time points)
dds <- DESeqDataSetFromMatrix(countData=counts, colData=coldata,
                              design=~ns(timepoint, df=3) + condition + ns(timepoint, df=3):condition)
dds_full <- DESeq(dds, test='LRT', reduced=~ns(timepoint, df=3) + condition)
res <- results(dds_full)

The LRT compares the full model (with time:condition interaction) to a reduced model without; significant peaks have time-dependent condition response. Use df=3 natural splines for typical 5-7 timepoints; df=4-5 for >= 8.

Hi-C-Loop-Anchored Differential

Trigger: Combined ATAC-seq + Hi-C/HiChIP datasets; want to test enhancer-promoter pair-level differential.

Mechanism: Aggregate peak-level differential signal at HiCCUPS loop anchors (or ABC-predicted enhancer-gene pairs). Combined enhancer + promoter accessibility change has more statistical power than either alone.

Goal: Test enhancer-promoter pair-level differential accessibility by aggregating peak-level signal at loop anchors.

Approach: Import HiCCUPS loops, map consensus peaks to anchor positions, then aggregate per-peak log2FC across both anchors of each loop to get loop-level effect sizes.

# Pseudo-pattern: per loop, sum DESeq2 log2FC at both anchors
loops <- makeGenomicInteractionsFromFile('hiccups_loops.bedpe', type='bedpe',
                                         experiment_name='hiccups', description='HiCCUPS loops')
peak_to_loop <- findOverlaps(consensus_peaks, c(anchorOne(loops), anchorTwo(loops)))

loop_lfc <- aggregate(res$log2FoldChange[queryHits(peak_to_loop)],
                      by=list(loop=ceiling(subjectHits(peak_to_loop) / 2)),
                      FUN=function(x) sum(x, na.rm=TRUE))

For implementation, use the InteractionSet Bioconductor package which preserves loop-pair structure during testing. Reference: Mumbach 2017 Nat Genet (HiChIP enhancer connectome).

Annotate Differential Peaks

Goal: Assign each differentially accessible peak to its nearest gene and feature class for downstream interpretation.

Approach: Pull DiffBind / DESeq2 results as GRanges, annotate via ChIPseeker against a TxDb with a custom promoter window, then plot annotation distribution and extract gene IDs for enrichment.

library(ChIPseeker); library(TxDb.Hsapiens.UCSC.hg38.knownGene)

diff_peaks <- dba.report(dba)
peakAnno <- annotatePeak(diff_peaks, TxDb=TxDb.Hsapiens.UCSC.hg38.knownGene,
                         tssRegion=c(-2000, 500), level='gene')
plotAnnoPie(peakAnno); plotDistToTSS(peakAnno)
genes <- as.data.frame(peakAnno)$geneId          # for GO enrichment via pathway-analysis/go-enrichment

tssRegion=c(-2000, 500) defines promoter as TSS-2kb to TSS+500bp; ChIPseeker default (-3000, 3000) over-counts promoter assignments. Adjust per cell type / organism.

Common Errors

| Error / symptom | Cause | Solution | |-----------------|-------|----------| | DiffBind very slow | Counting all peaks across all BAMs sequentially | dba.count(..., bParallel=TRUE) and provide BPPARAM | | unable to use the provided design matrix (DESeq2) | Confounded design (e.g., batch perfectly aligns with condition) | Replicate in a way that breaks the confound, or drop the batch term | | FDR list empty despite obvious differences | RiP scaling (library=DBA_LIBSIZE_PEAKREADS) removed global biology | Use spike-in or keep default library=DBA_LIBSIZE_FULL; verify with browser tracks | | Top peaks all on chrM | chrM not removed from BAM before counting | Always strip chrM upstream | | dispersion estimate failure (DESeq2) | Too few peaks pass filter; too few reps | filterByExpr less aggressively; check rep count | | Error in if (any(out))(csaw) | Window count below threshold | Reduce bin.size; check BAM is paired-end | | ChIPseeker error on non-human TxDb | Wrong organism db loaded | Use make_org_db from biomartr or AnnotationDbi for non-model | | Volcano plot symmetric about zero with no significant peaks | Hidden batch swamping signal | Run SVA/RUVseq |

References

Stark R & Brown G 2011 DiffBind (Bioconductor; canonical reference)
Lun ATL & Smyth GK 2014 NAR 42:e95 (csaw windowed differential method)
Love MI et al 2014 Genome Biol 15:550 (DESeq2)
Robinson MD et al 2010 Bioinformatics 26:139 (edgeR)
Chen Y et al 2016 F1000Res 5:1438 (edgeR-QL framework)
Leek JT 2014 NAR 42:e161 (svaseq for hidden batch)
Risso D et al 2014 Nat Biotechnol 32:896 (RUVseq)
Reske JJ et al 2020 Epigenetics Chromatin 13:22 (ATAC normalization-method comparison; ARID1A/PIK3CA global-shift case study)
Gontarz P et al 2020 Sci Rep 10:10150 (comparison of differential-accessibility analysis strategies for ATAC-seq)
Corces MR et al 2018 Science 362:eaav1898 (iterative overlap, fixed-width 501 bp consensus)
Yu G et al 2015 Bioinformatics 31:2382 (ChIPseeker)

Related Skills

atac-seq/atac-peak-calling - Generate per-replicate peaks
atac-seq/consensus-peakset - Build the differential-ready consensus peakset
atac-seq/atac-qc - Pre-screen and drop failing replicates
atac-seq/single-cell-atac - Pseudobulk-level differential per cluster
atac-seq/co-accessibility - Identify cis-regulatory connections among DA peaks
differential-expression/deseq2-basics - Underlying DESeq2 patterns
differential-expression/de-results - Effect-size reporting and shrinkage
chip-seq/differential-binding - Same DiffBind workflow, ChIP context
pathway-analysis/go-enrichment - Downstream gene-level enrichment of DA-associated genes

Version Compatibility

Reference examples tested with: DiffBind 3.12+, DESeq2 1.42+, edgeR 4.0+, csaw 1.36+, limma 3.58+, GenomicRanges 1.54+, ChIPseeker 1.38+, Subread 2.0+ (featureCounts), sva 3.50+, RUVSeq 1.36+.

Before using code patterns, verify installed versions match:

R: packageVersion('<pkg>') then ?function_name to verify parameters

If code throws unexpected errors, introspect the installed package and adapt rather than retrying.

Differential Accessibility

R (consensus-peak workflow): DiffBind -> count -> normalize -> contrast -> analyze
R (window-based, no peak set): csaw::windowCounts + filterWindowsGlobal + edgeR QL F-test
R (existing peak-count matrix): DESeq2 or edgeR directly on featureCounts output

Algorithmic Taxonomy

Methodology evolves; verify the current consensus practice (Gontarz 2020 DA-strategy benchmark; Reske 2020 normalization comparison) before locking pipelines.

Decision Tree by Experimental Scenario

Consensus Peak Set Strategy

The consensus peakset choice drives FDR calibration. DiffBind defaults rarely match what a chromatin biologist wants.

Normalization: The ATAC-Specific Choice

Trigger: Treatment causes global chromatin compaction (e.g., HDAC inhibitor, DNMT inhibitor).

Symptom: Volcano plot is symmetric about zero; PCA shows treatment effect that vanishes after normalization.

Per-Tool Failure Modes

DiffBind -- Library-size choice confounds global change

Trigger: Treatment causes whole-genome accessibility shift; cell-cycle synchronized samples; differentiation timecourse.

Mechanism: Setting library=DBA_LIBSIZE_PEAKREADS (RiP-based) assumes total reads-in-peaks is comparable across samples. Global accessibility shifts break this assumption.

Symptom: Conditions clearly differ in PCA before normalization; after normalization PC1 is nearly noise.

DiffBind summits parameter -- Width-driven differential

Trigger: Per-rep peaks have very different widths; consensus uses union.

Mechanism: Without summits=250, DiffBind counts reads in the original peak intervals. A peak called as 200 bp in one rep and 800 bp in another inflates the count for the wider rep.

Symptom: Top differential peaks track peak width, not signal intensity.

Fix: Always set summits=250 (or 100, depending on resolution). This re-centers all peaks on the summit and uses identical 501 (or 201) bp windows.

csaw -- Window size and filter choice dominates results

Trigger: Default width=spacing=50 bp windows; default filter=10 count cutoff.

Mechanism: Narrow windows have very low counts and inflated dispersion; the global background filter discards too many windows. Results are extremely sensitive to these.

Symptom: Number of significant windows ranges from 200 to 200,000 across reasonable parameter sweeps.

DESeq2 -- Apeglm shrinkage with too few reps

Trigger: n=2 per condition; using lfcShrink(type='apeglm').

Mechanism: Apeglm shrinks log2FC toward zero based on dispersion estimate; at n=2 dispersion is unreliable, shrinkage is over-aggressive, and biology is masked.

Symptom: All log2FC values cluster near zero post-shrinkage; FDR list has high p-values across the board.

Fix: Skip shrinkage at n=2 OR switch to edgeR QL test. If n=2 is unavoidable, report unshrunken log2FC alongside FDR; do not use shrunken FCs as the effect size.

edgeR QL -- Filter must be aggressive enough

Trigger: Including peaks with mean count < 5 across all samples.

Mechanism: The QL F-test calibrates dispersion across all features. Including very-low-count peaks pulls dispersion estimates and inflates FDR.

Fix: filterByExpr(y, group=group) removes low-count peaks; restore peaks one at a time only if they are biologically critical and supported by at least one rep at depth.

Reconciliation: When Tools Disagree

Effect Size and Threshold Selection

abs(log2FC) >= 1 is not universal. ATAC effects in primary cells (immune subsets, neurons) often max at 1.5-fold; require log2FC >= 0.585 with FDR < 0.05 for those settings.

Hidden Batch with SVA / RUVseq

Goal: Recover differential accessibility signal when unknown batch effects swamp the contrast.

Approach: Estimate surrogate variables on normalized counts via svaseq, append them to the DESeq2 design, refit the model, and extract the contrast.

library(DESeq2); library(sva)

dds <- DESeqDataSetFromMatrix(countData=counts, colData=coldata, design=~condition)
dds <- estimateSizeFactors(dds)
dat <- counts(dds, normalized=TRUE)
dat <- dat[rowMeans(dat) > 1, ]

mod  <- model.matrix(~condition, colData(dds))
mod0 <- model.matrix(~1, colData(dds))
svobj <- svaseq(dat, mod, mod0, n.sv=2)

dds$SV1 <- svobj$sv[, 1]; dds$SV2 <- svobj$sv[, 2]
design(dds) <- ~SV1 + SV2 + condition
dds <- DESeq(dds)
res <- results(dds, contrast=c('condition', 'treated', 'control'))

RUVseq is the alternative when negative-control regions (ChrM peaks NOT changing) or technical replicates are available. SVA is preferred when no controls exist.

Spike-in Normalization

Trigger: Treatment causes whole-genome accessibility shift (HDAC inhibitor, DNMT inhibitor); RPM/CPM/RiP normalization erases the global biology.

Goal: Preserve global accessibility shifts that RiP / library-size normalization would erase.

Approach: Compute per-sample size factors from inverse spike-in read counts, override DESeq2's default size factors, then run the standard DESeq2 fit and contrast.

library(DESeq2)

# spike_counts: per-sample dm6 read counts (one column per sample)
sf_spike <- 1 / spike_counts
sf_spike <- sf_spike / mean(sf_spike)                 # Geometric mean = 1 for stability

dds <- DESeqDataSetFromMatrix(countData=counts, colData=coldata, design=~condition)
sizeFactors(dds) <- sf_spike                          # Override library-size factors
dds <- DESeq(dds)
res <- results(dds, contrast=c('condition', 'treated', 'control'))

After spike-in normalization, log2FC reflects absolute accessibility change (not just relative redistribution). Spike-in is the most direct control for global-shift biology.

Permutation Testing for Low Replicate Designs

Trigger: n=2 per condition; parametric NB tests give over-confident p-values.

Mechanism: csaw provides a permutation framework: the null is generated by shuffling sample labels; test statistic is the count-difference per window; per-region p is the rank under permutation.

Goal: Generate empirical per-region p-values when parametric NB tests are over-confident at low replicate counts.

Approach: Fit the observed edgeR QL F statistic, repeatedly shuffle group labels and refit, then compute per-region p as the rank of the observed F under the shuffled null.

library(csaw); library(edgeR)

# Standard csaw counts (windows or peaks)
counts <- regionCounts(bam_files, regions, ext=200)

# Standard NB fit
y <- DGEList(counts=assay(counts), group=condition)
y <- calcNormFactors(y, method='TMM')
design <- model.matrix(~condition)
y <- estimateDisp(y, design)
fit <- glmQLFit(y, design)

# Permutation: shuffle group labels n_perms times; track per-region rank statistic
n_perms <- 1000
perm_p <- replicate(n_perms, {
    shuffled <- sample(condition)
    design_p <- model.matrix(~shuffled)
    fit_p <- glmQLFit(estimateDisp(y, design_p), design_p)
    glmQLFTest(fit_p, coef=2)$table$F
})
observed_F <- glmQLFTest(fit, coef=2)$table$F
permp <- rowMeans(perm_p >= observed_F)

Permutation requires ~1000 shuffles for stable per-region p; computationally expensive but essential when parametric tests cannot be trusted.

DESeq2 Likelihood Ratio Test for Time-Courses

Goal: Identify peaks whose accessibility trajectory differs between conditions across a timecourse.

Approach: Fit a DESeq2 LRT comparing a full model with a spline-by-condition interaction against a reduced model lacking the interaction; significant peaks have time-dependent condition response.

library(DESeq2); library(splines)

# Spline-modeled time course (5+ time points)
dds <- DESeqDataSetFromMatrix(countData=counts, colData=coldata,
                              design=~ns(timepoint, df=3) + condition + ns(timepoint, df=3):condition)
dds_full <- DESeq(dds, test='LRT', reduced=~ns(timepoint, df=3) + condition)
res <- results(dds_full)

Hi-C-Loop-Anchored Differential

Trigger: Combined ATAC-seq + Hi-C/HiChIP datasets; want to test enhancer-promoter pair-level differential.

Goal: Test enhancer-promoter pair-level differential accessibility by aggregating peak-level signal at loop anchors.

Approach: Import HiCCUPS loops, map consensus peaks to anchor positions, then aggregate per-peak log2FC across both anchors of each loop to get loop-level effect sizes.

# Pseudo-pattern: per loop, sum DESeq2 log2FC at both anchors
loops <- makeGenomicInteractionsFromFile('hiccups_loops.bedpe', type='bedpe',
                                         experiment_name='hiccups', description='HiCCUPS loops')
peak_to_loop <- findOverlaps(consensus_peaks, c(anchorOne(loops), anchorTwo(loops)))

loop_lfc <- aggregate(res$log2FoldChange[queryHits(peak_to_loop)],
                      by=list(loop=ceiling(subjectHits(peak_to_loop) / 2)),
                      FUN=function(x) sum(x, na.rm=TRUE))

For implementation, use the InteractionSet Bioconductor package which preserves loop-pair structure during testing. Reference: Mumbach 2017 Nat Genet (HiChIP enhancer connectome).

Annotate Differential Peaks

Goal: Assign each differentially accessible peak to its nearest gene and feature class for downstream interpretation.

Approach: Pull DiffBind / DESeq2 results as GRanges, annotate via ChIPseeker against a TxDb with a custom promoter window, then plot annotation distribution and extract gene IDs for enrichment.

library(ChIPseeker); library(TxDb.Hsapiens.UCSC.hg38.knownGene)

diff_peaks <- dba.report(dba)
peakAnno <- annotatePeak(diff_peaks, TxDb=TxDb.Hsapiens.UCSC.hg38.knownGene,
                         tssRegion=c(-2000, 500), level='gene')
plotAnnoPie(peakAnno); plotDistToTSS(peakAnno)
genes <- as.data.frame(peakAnno)$geneId          # for GO enrichment via pathway-analysis/go-enrichment

tssRegion=c(-2000, 500) defines promoter as TSS-2kb to TSS+500bp; ChIPseeker default (-3000, 3000) over-counts promoter assignments. Adjust per cell type / organism.

Common Errors

References

Stark R & Brown G 2011 DiffBind (Bioconductor; canonical reference)
Lun ATL & Smyth GK 2014 NAR 42:e95 (csaw windowed differential method)
Love MI et al 2014 Genome Biol 15:550 (DESeq2)
Robinson MD et al 2010 Bioinformatics 26:139 (edgeR)
Chen Y et al 2016 F1000Res 5:1438 (edgeR-QL framework)
Leek JT 2014 NAR 42:e161 (svaseq for hidden batch)
Risso D et al 2014 Nat Biotechnol 32:896 (RUVseq)
Reske JJ et al 2020 Epigenetics Chromatin 13:22 (ATAC normalization-method comparison; ARID1A/PIK3CA global-shift case study)
Gontarz P et al 2020 Sci Rep 10:10150 (comparison of differential-accessibility analysis strategies for ATAC-seq)
Corces MR et al 2018 Science 362:eaav1898 (iterative overlap, fixed-width 501 bp consensus)
Yu G et al 2015 Bioinformatics 31:2382 (ChIPseeker)

Related Skills

atac-seq/atac-peak-calling - Generate per-replicate peaks
atac-seq/consensus-peakset - Build the differential-ready consensus peakset
atac-seq/atac-qc - Pre-screen and drop failing replicates
atac-seq/single-cell-atac - Pseudobulk-level differential per cluster
atac-seq/co-accessibility - Identify cis-regulatory connections among DA peaks
differential-expression/deseq2-basics - Underlying DESeq2 patterns
differential-expression/de-results - Effect-size reporting and shrinkage
chip-seq/differential-binding - Same DiffBind workflow, ChIP context
pathway-analysis/go-enrichment - Downstream gene-level enrichment of DA-associated genes

Adoption

GPTomics/bio-atac-seq-differential-accessibility

$ install --global

Security Scan Results

SKILL.md

Version Compatibility

Differential Accessibility

Algorithmic Taxonomy

Decision Tree by Experimental Scenario

Consensus Peak Set Strategy

Normalization: The ATAC-Specific Choice

Per-Tool Failure Modes

DiffBind -- Library-size choice confounds global change

DiffBind summits parameter -- Width-driven differential

csaw -- Window size and filter choice dominates results

DESeq2 -- Apeglm shrinkage with too few reps

edgeR QL -- Filter must be aggressive enough

Reconciliation: When Tools Disagree

Effect Size and Threshold Selection

Hidden Batch with SVA / RUVseq

Spike-in Normalization

Permutation Testing for Low Replicate Designs

DESeq2 Likelihood Ratio Test for Time-Courses

Hi-C-Loop-Anchored Differential

Annotate Differential Peaks

Common Errors

References

Related Skills

Related Skills

GPTomics/bio-workflows-clip-pipeline

GPTomics/bio-comparative-genomics-whole-genome-duplication

GPTomics/bio-comparative-genomics-whole-genome-alignment

GPTomics/bio-comparative-genomics-synteny-analysis

GPTomics/bio-atac-seq-differential-accessibility

$ install --global

Security Scan Results

SKILL.md

Version Compatibility

Differential Accessibility

Algorithmic Taxonomy

Decision Tree by Experimental Scenario

Consensus Peak Set Strategy

Normalization: The ATAC-Specific Choice

Per-Tool Failure Modes

DiffBind -- Library-size choice confounds global change

DiffBind summits parameter -- Width-driven differential

csaw -- Window size and filter choice dominates results

DESeq2 -- Apeglm shrinkage with too few reps

edgeR QL -- Filter must be aggressive enough

Reconciliation: When Tools Disagree

Effect Size and Threshold Selection

Hidden Batch with SVA / RUVseq

Spike-in Normalization

Permutation Testing for Low Replicate Designs

DESeq2 Likelihood Ratio Test for Time-Courses

Hi-C-Loop-Anchored Differential

Annotate Differential Peaks

Common Errors

References

Related Skills

Related Skills

GPTomics/bio-workflows-clip-pipeline

GPTomics/bio-comparative-genomics-whole-genome-duplication

GPTomics/bio-comparative-genomics-whole-genome-alignment

GPTomics/bio-comparative-genomics-synteny-analysis