epitranscriptomics/m6a-differential/SKILL.md
Identifies differential m6A methylation between conditions from MeRIP-seq paired IP/input data using exomePeak2 with `bam_ip` + `bam_input` (control arm) and `bam_treated_ip` + `bam_treated_input` (treatment arm) for integrated GC-bias-aware differential calling (Liu 2022 *NAR Genom Bioinform* 4:lqac046), QNB beta-binomial test (Liu 2017 *BMC Bioinformatics* 18:387), MeTDiff HMM-based differential bundled with MeTPeak, RADAR (Zhang 2019 *Genome Biol* 20:294) with its `filterBins -> diffIP -> reportResult` workflow, and the defensible paired-symmetric case of edgeR / DESeq2 on featureCounts-on-peaks matrices when batch / lot covariates need explicit fixed-effect handling (exomePeak2's top-level API does NOT accept arbitrary covariates). Covers paired vs unpaired vs interaction designs, batch confounding (antibody lot, RNA prep, sequencing run) and the per-lot meta-analysis strategy when exomePeak2 is the primary caller, the stoichiometry-vs-expression-vs-IP-efficiency confound that all MeRIP differential methods inherit, normalisation choice (size factor on IP, on input, on per-sample IP/input ratio), the McIntyre 2020 reproducibility caveat, effect-size filtering as a guardrail against under-powered N=2 designs, and orthogonal-validation routes for absolute stoichiometry (GLORI / SAC-seq / m6Anet `mod_ratio`). Use when comparing m6A levels across two or more conditions, choosing between exomePeak2 / QNB / RADAR / MeTDiff for a given design, handling batch confounding when exomePeak2's API is too rigid, normalising against input properly, distinguishing real hyper- / hypo-methylation from expression-level shifts, applying effect-size thresholds, interpreting volcano plots of differential peaks, or planning a follow-up orthogonal stoichiometry validation.
npx skillsauth add GPTomics/bioSkills bio-epitranscriptomics-m6a-differentialInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Reference examples tested with: exomePeak2 1.14+ (Bioconductor 3.18+), QNB 1.1.11 (GitHub lzcyzm/QNB), MeTDiff (GitHub, bundled with MeTPeak), RADAR 0.2.4+ (GitHub scottzijiezhang/RADAR), DESeq2 1.42+, edgeR 4.0+, GenomicFeatures 1.54+, ggplot2 3.5+, GenomicAlignments 1.38+, Rsubread 2.16+.
Before using code patterns, verify installed versions match. If versions differ:
packageVersion('exomePeak2') then ?exomePeak2 to verify parameterspackageVersion('QNB') then ?qnbtest to confirm argument signatureIf R throws unused argument or argument is missing, the API moved between Bioconductor minor releases; consult ?exomePeak2 directly. QNB function signature has been stable since 2017 but the GitHub source has occasional changes; pin the commit SHA.
exomePeak2's differential interface is NOT a mode= argument — populate the bam_treated_ip= and bam_treated_input= arguments alongside the standard bam_ip= / bam_input= (control arm) to trigger paired differential calling. peak_calling_mode is a separate argument controlling locus scope (exon | full_transcript | whole_genome). Verify against ?exomePeak2 and Bioconductor 3.20+ release notes. QNB is on GitHub only; install via devtools::install_github('lzcyzm/QNB'). RADAR is on GitHub only; install via devtools::install_github('scottzijiezhang/RADAR'); its differential workflow is countReads -> normalizeLibrary -> adjustExprLevel -> filterBins -> diffIP -> reportResult.
"Compare m6A methylation between my conditions" -> Quantify how much each m6A peak's IP/input enrichment shifts between conditions, after normalising for transcript-abundance changes (which all show up in input) and for IP-efficiency drift (which the design matrix and within-run replicates control for). Then apply effect-size filtering to distinguish real biology from the technical noise floor that all MeRIP differential methods inherit (McIntyre 2020 Sci Rep 10:6590: between-study m6A peak overlap is ~45% median; differential calls within that noise envelope routinely fail to replicate). Critically: a higher MeRIP signal in condition A vs B can mean (1) more transcripts of the peak-bearing gene, (2) more methylation per transcript, OR (3) higher IP efficiency in batch A — distinguishing requires careful normalisation or an orthogonal absolute-stoichiometry method.
exomePeak2::exomePeak2(bam_ip, bam_input, bam_treated_ip, bam_treated_input, ...) -- integrated peak + differential GLM (modern default)QNB::qnbtest(control_ip, treated_ip, control_input, treated_input) -- beta-binomial for small N (Liu 2017 BMC Bioinformatics 18:387)RADAR::diffIP() then reportResult() -- Poisson-NB on peak windows with TMM normalisation (Zhang 2019 Genome Biol 20:294)MeTDiff::metdiff() -- HMM-based differential paired with MeTPeakA higher m6A peak signal in condition A vs B can mean ANY of: (1) more transcripts of the peak-bearing gene (expression up; the input increases proportionally, so the ratio should not change — but residual normalisation noise leaks through), (2) more methylation per transcript (stoichiometry up; the real biology of interest), (3) higher IP efficiency in batch A (technical; antibody lot, IP day, RNA prep). Differential MeRIP WITHOUT per-window input-normalisation OR an orthogonal stoichiometry-aware method (GLORI Liu 2023 Nat Biotechnol 41:355; SAC-seq Hu 2022 Nat Biotechnol 40:1210; MAZTER-seq Garcia-Campos 2019 Cell 178:731; m6Anet per-read modification rate) cannot separate the three. exomePeak2 differential mode, QNB, RADAR, and MeTDiff all implement per-window IP/input ratio modelling, but each has different default normalisations and the choice matters. Equally critical: McIntyre 2020 Sci Rep 10:6590 showed that "differential" m6A peaks from MeRIP-seq routinely do not replicate between independent studies in nominally identical conditions. The empirical noise floor is high; effect-size filtering (|log2FC| >= 0.5 minimum, often >= 1) AND replicate-direction concordance (the change is consistent in direction across replicates) AND minimum N=3 per condition are needed for differential calls to survive replication. For any absolute stoichiometry claim ("this peak is 80% methylated in tumour vs 20% in normal"), require an orthogonal stoichiometry method, not MeRIP alone.
| Tool / mode | Mechanism | Inputs | Output | Strength | Fails when | |-------------|-----------|--------|--------|----------|------------| | exomePeak2 differential (Liu 2022) | Transcript-windowed Poisson GLM with GC-bias correction; integrated peak + differential | control IP/input BAM vectors + treated IP/input BAM vectors + TxDb + BSgenome | Differential peaks with log2FC + FDR per peak | Most use cases; integrates with peak calling; modern default | Small-N (<=2) overdispersion poorly estimated; top-level API does NOT accept arbitrary covariates (use DESeq2/edgeR for batch-aware designs) | | QNB (Liu 2017 BMC Bioinformatics 18:387) | Quad-negative-binomial joint model of IP, input, condition | per-peak count matrices (4 matrices: ip1, ip2, input1, input2) | Differential peaks with p-value + log2 RR | Designed for small N (2-3 per group); handles overdispersion explicitly | Requires pre-computed count matrices; not integrated with peak calling | | MeTDiff (bundled with MeTPeak) | HMM + Beta-binomial differential paired with MeTPeak | paired IP/input BAM + GTF + condition factor | Differential peaks per window | Pairs naturally with MeTPeak output; HMM smoothing helps low-coverage | GitHub-only; less benchmarked than exomePeak2 / QNB | | RADAR (Zhang 2019 Genome Biol 20:294) | Poisson-NB with TMM normalisation; reproducibility-aware | paired IP/input BAM + condition factor | Differential peaks with logFC + FDR | Explicit replicate-variance modeling; reproducibility-aware framework | GitHub-only; slower than exomePeak2 | | DRME (Liu 2016 Anal Biochem 499:15) | Count-based small-sample alternative | per-peak count matrices | Differential peaks | Sister to QNB from same group; small-N alternative | Less benchmarked than QNB / exomePeak2 | | edgeR / DESeq2 on peak counts | Generic RNA-seq differential framework applied to featureCounts-on-peaks | peak count matrix + sample sheet | Differential peaks with log2FC + FDR | Familiar; flexible designs; well-tested in RNA-seq | Treats peak counts as RNA counts; loses IP/input pairing structure; defensible only for paired symmetric designs with strong input normalisation | | Ratio-of-ratios (heuristic) | Compute per-peak log2 (IP_A / Input_A) - log2 (IP_B / Input_B) per sample, then t-test | per-peak count matrix | per-peak t-test | Transparent; no model assumptions | No multiple-testing correction; ignores overdispersion; not recommended for primary analysis |
| Scenario | Recommended | Why wrong choices fail |
|----------|-------------|------------------------|
| Standard 3-vs-3 paired-design MeRIP differential | exomePeak2 with bam_ip (ctrl) + bam_treated_ip (treat) and matching inputs | QNB usable but designed for smaller N; edgeR/DESeq2 loses IP/input pairing |
| Very small N (2 vs 2) | QNB (designed for small-sample overdispersion); supplement with exomePeak2 if possible | edgeR / DESeq2 dispersion estimation collapses; exomePeak2 GLM also struggles at N=2 |
| Paired design (patient as blocking factor) | QNB per-pair then aggregate; OR featureCounts-on-peaks -> DESeq2 with ~patient + condition design; exomePeak2 cannot encode patient blocking via its top-level API | Unpaired analysis inflates within-group variance |
| Interaction design (genotype × treatment) | featureCounts-on-peaks -> DESeq2 / edgeR with interaction term; exomePeak2 top-level API is two-group only | QNB pairwise only; build interaction model from pairwise contrasts manually |
| Batch confounding (antibody lot, sequencing run, IP day) | Include batch as fixed effect in DESeq2 / edgeR model on featureCounts-on-peaks matrix; OR run exomePeak2 per-batch and meta-analyse | Pooling cross-batch counts without batch term attributes lot-effect to condition |
| Time-course differential | featureCounts-on-peaks -> DESeq2 / limma with time as numeric covariate; OR pairwise time-point exomePeak2 contrasts | Naive group-vs-group ignores time structure |
| Stoichiometry claims (not just enrichment) | NOT MeRIP differential -- orthogonal GLORI / SAC-seq / m6Anet per-read | MeRIP IP fold-change is relative; cannot give per-molecule stoichiometry |
| Cross-batch differential (different antibody lots) | featureCounts-on-peaks -> DESeq2 / edgeR with lot in design; OR run exomePeak2 per-lot then meta-analyse; ideally avoid confounding lot with condition at the experimental-design stage | Lot effect inflates false positives; exomePeak2 top-level API cannot encode lot |
| Validation of differential calls | Run >=2 differential methods; require concordant direction across replicates AND |log2FC| > 0.5 minimum (>= 1 stringent); orthogonal validation at top hits | Single-method differential calls within technical noise floor (per McIntyre 2020) |
| Visualising differential peaks | Volcano plot with |log2FC| + FDR thresholds; MA plot to inspect normalisation; per-peak boxplot for top hits | Single number summaries hide stoichiometry vs expression confound |
| Wanting to test a single gene / locus | Targeted: per-peak boxplot across replicates with condition factor; manual t-test or Wilcoxon at high-coverage peak | Whole-transcriptome differential testing wastes multiple-testing budget for single-locus questions |
Methodology evolves; before any high-stakes differential analysis, web-search "exomePeak2 differential mode Bioconductor 3.20" and "MeRIP differential benchmark McIntyre" for current consensus parameters.
Goal: Identify m6A peaks that differ in methylation level between conditions, controlling for transcript-abundance differences (via input normalisation) and GC bias (via internal correction), with an integrated peak-calling + differential pipeline.
Approach: Build TxDb from the matched GTF; pass control IP/input BAM vectors via bam_ip and bam_input AND treatment IP/input BAM vectors via bam_treated_ip and bam_treated_input — populating the treated arms triggers differential mode (there is no separate mode= argument). Output is per-peak log2FC + FDR.
library(exomePeak2)
library(GenomicFeatures)
library(BSgenome.Hsapiens.UCSC.hg38)
txdb <- makeTxDbFromGFF('refs/annotation.gtf', format='gtf')
ctrl_ip <- c('aligned/ctrl_IP1.bam', 'aligned/ctrl_IP2.bam', 'aligned/ctrl_IP3.bam')
ctrl_input <- c('aligned/ctrl_Input1.bam', 'aligned/ctrl_Input2.bam', 'aligned/ctrl_Input3.bam')
treat_ip <- c('aligned/treat_IP1.bam', 'aligned/treat_IP2.bam', 'aligned/treat_IP3.bam')
treat_input <- c('aligned/treat_Input1.bam', 'aligned/treat_Input2.bam', 'aligned/treat_Input3.bam')
result <- exomePeak2(
bam_ip = ctrl_ip,
bam_input = ctrl_input,
bam_treated_ip = treat_ip,
bam_treated_input = treat_input,
txdb = txdb,
genome = BSgenome.Hsapiens.UCSC.hg38,
paired_end = TRUE,
library_type = 'unstranded',
peak_calling_mode = 'exon',
save_dir = 'exomepeak2_diff_output',
experiment_name = 'ctrl_vs_treat'
)
diff_table <- as.data.frame(result)
nrow(diff_table)
head(diff_table[, c('seqnames', 'start', 'end', 'log2FC', 'pvalue', 'padj')])
peak_calling_mode accepts 'exon' (transcript-aware, default), 'full_transcript', or 'whole_genome'; the meaning is locus scope, NOT differential-vs-non-differential. For arbitrary covariate adjustment (batch, antibody lot, patient blocking), the exomePeak2 top-level API is insufficient — move counts into DESeq2 / edgeR via the featureCounts-on-peaks route below.
Goal: Test differential m6A at pre-called peaks using a quad-negative-binomial model that handles small-N overdispersion better than generic GLM frameworks.
Approach: Count reads in each IP and Input BAM at each peak using featureCounts or summarizeOverlaps; pass the four count matrices (ip1, ip2, input1, input2 — ip/input per group) to qnbtest().
library(QNB)
library(Rsubread)
library(rtracklayer)
peaks <- import('exomepeak2_output/m6a_run1/peaks.bed')
peak_saf <- data.frame(
GeneID = paste0('peak_', seq_along(peaks)),
Chr = as.character(seqnames(peaks)),
Start = start(peaks),
End = end(peaks),
Strand = as.character(strand(peaks))
)
count_matrix <- function(bam_paths, peak_saf) {
fc <- featureCounts(
files = bam_paths,
annot.ext = peak_saf,
isPairedEnd = TRUE,
nthreads = 8,
allowMultiOverlap = TRUE
)
fc$counts
}
ip_ctrl <- count_matrix(c('aligned/ctrl_IP1.bam', 'aligned/ctrl_IP2.bam', 'aligned/ctrl_IP3.bam'), peak_saf)
ip_treat <- count_matrix(c('aligned/treat_IP1.bam', 'aligned/treat_IP2.bam', 'aligned/treat_IP3.bam'), peak_saf)
in_ctrl <- count_matrix(c('aligned/ctrl_Input1.bam', 'aligned/ctrl_Input2.bam', 'aligned/ctrl_Input3.bam'), peak_saf)
in_treat <- count_matrix(c('aligned/treat_Input1.bam', 'aligned/treat_Input2.bam', 'aligned/treat_Input3.bam'), peak_saf)
qnb_result <- qnbtest(
control_ip = ip_ctrl,
treated_ip = ip_treat,
control_input = in_ctrl,
treated_input = in_treat,
mode = 'per-condition'
)
head(qnb_result)
sig <- qnb_result[qnb_result$padj < 0.05 & abs(qnb_result$log2.RR) > 0.5, ]
nrow(sig)
Verify QNB argument names against ?qnbtest for the installed version; older tutorials may show different signatures.
Goal: Test differential m6A using a Poisson-NB framework with TMM normalisation and explicit replicate-variance modeling; useful when replicate variability is a known issue.
Approach: RADAR's documented workflow is countReads -> normalizeLibrary -> adjustExprLevel -> filterBins -> diffIP -> reportResult. CRITICAL: RADAR expects matched BAMs in bamFolder named <sample>.input.bam and <sample>.m6A.bam per replicate; the generic IP / Input naming used elsewhere must be re-conformed or symlinked. variable() is set with a data.frame, NOT a bare factor.
library(RADAR)
radar <- countReads(
samplenames = c('ctrl_rep1', 'ctrl_rep2', 'ctrl_rep3', 'treat_rep1', 'treat_rep2', 'treat_rep3'),
gtf = 'refs/annotation.gtf',
bamFolder = 'aligned_radar/',
modification = 'm6A',
strandToKeep = 'opposite',
threads = 8
)
radar <- normalizeLibrary(radar)
radar <- adjustExprLevel(radar)
variable(radar) <- data.frame(group = c('ctrl', 'ctrl', 'ctrl', 'treat', 'treat', 'treat'))
radar <- filterBins(radar, minCountsCutOff = 15)
radar <- diffIP(radar)
result <- reportResult(radar, cutoff = 0.1, Beta_cutoff = 0.5)
sig <- result[result$padj < 0.05 & abs(result$logFC) > 0.5, ]
nrow(sig)
reportResult thresholds (cutoff = p-value cutoff; Beta_cutoff = effect-size cutoff in beta units) are RADAR-specific; convert to the project's standard reporting thresholds downstream. The aligned_radar/ directory should contain BAMs with RADAR's expected naming (<sample>.input.bam, <sample>.m6A.bam).
Goal: Visualise the differential peak set with effect size on the x-axis and statistical significance on the y-axis; flag peaks passing |log2FC| and FDR thresholds.
Approach: Standard ggplot2 volcano with colour-coded significance and threshold lines.
library(ggplot2)
diff_table$significance <- with(diff_table,
ifelse(padj < 0.05 & abs(log2FC) > 0.5, 'differential', 'not_sig'))
ggplot(diff_table, aes(x=log2FC, y=-log10(padj), colour=significance)) +
geom_point(alpha=0.5, size=0.8) +
geom_vline(xintercept=c(-0.5, 0.5), linetype='dashed') +
geom_hline(yintercept=-log10(0.05), linetype='dashed') +
scale_colour_manual(values=c(differential='red', not_sig='grey60')) +
labs(x='log2 (treat / ctrl) MeRIP enrichment ratio',
y='-log10 (FDR)',
title='Differential m6A peaks: ctrl vs treat',
caption='Per-window IP/input ratio normalised; not absolute stoichiometry') +
theme_minimal()
The caption is intentional: MeRIP differential reports CHANGES IN ENRICHMENT RATIO, NOT changes in absolute stoichiometry. For stoichiometry claims, cross-validate with GLORI / SAC-seq / m6Anet.
Trigger: "Peak X shows hyper-methylation in treatment" inferred from MeRIP IP fold-change alone.
Mechanism: MeRIP IP fold-change conflates per-molecule methylation stoichiometry, transcript abundance, and IP efficiency variation between libraries. An IP fold-change increase can reflect any or all of these.
Symptom: Reported m6A "hyper-methylation" tracks RNA-seq expression changes between conditions; reverse-direction effects when properly normalised against input.
Fix: Use "increased / decreased enrichment" terminology for MeRIP-only studies. Reserve "hyper- / hypo-methylated" for studies with absolute quantification orthogonal validation (GLORI, SAC-seq, MAZTER-seq, m6Anet per-read). For high-stakes claims at named loci, run GLORI on a subset of conditions.
Trigger: Reporting "1,500 differential m6A peaks" with FDR < 0.05 (uncorrected p-value or naive multiple testing) and no effect-size filter.
Mechanism: With sufficient sequencing depth, MeRIP-seq has high statistical power to detect very small (~1.1-1.2x) IP-ratio changes that lie within antibody / technical noise. McIntyre 2020 Sci Rep 10:6590 showed these changes do not replicate.
Symptom: Differential peak set has many peaks with small effect sizes; replication in an independent study recovers <30% of original calls.
Fix: Apply effect-size filter (|log2FC| >= 0.5 minimum, often >= 1) AND adjusted p-value (FDR < 0.05) AND replicate-direction concordance. Differential peaks should be reported with effect size, not just p-value. Report effect-size distribution alongside peak count.
Trigger: Differential m6A study with N=2 IP and N=2 input per condition.
Mechanism: Per McIntyre 2020 and many subsequent benchmarks, MeRIP replicate variance is high; N=2 estimates of dispersion are unreliable; differential calls are unstable.
Symptom: Many "differential" peaks; volcano plot dense; small fraction replicates in held-out replicate.
Fix: Minimum N=3 per condition (per condition per IP/input arm = 12 BAMs for a 2-condition study); N=4-5 preferred for high-stakes claims. Underpowered studies should report effect-size-only filtered subsets (the most extreme peaks) and acknowledge the noise floor explicitly.
Trigger: Control samples processed in batch 1 with antibody lot A; treatment samples processed in batch 2 with antibody lot B.
Mechanism: Anti-m6A antibody lots have batch-to-batch variability in pulldown efficiency and m6A-vs-m6Am cross-reactivity. Pooling cross-batch counts in a differential model attributes batch-effect to condition.
Symptom: "Differential" peaks at high-abundance transcripts; effect sizes track batch rather than condition; reanalysis with batch in the model removes most differential peaks.
Fix: Include antibody_lot / batch / prep_day as a fixed effect in a DESeq2 / edgeR model on featureCounts-on-peaks counts; OR run exomePeak2 separately per lot and meta-analyse the per-lot differential peak sets; OR re-design the experiment to avoid lot-condition confounding. exomePeak2's top-level API does NOT accept arbitrary covariates — DESeq2 / edgeR is the route when covariate handling is required.
mode= argumentTrigger: exomePeak2(..., mode='differential') OR exomePeak2(..., mode='diff_peak') returns "unused argument" error.
Mechanism: exomePeak2 has NO mode= argument. Differential is triggered by populating bam_treated_ip and bam_treated_input alongside the standard bam_ip and bam_input (control arm). The peak_calling_mode argument is unrelated — it accepts 'exon' | 'full_transcript' | 'whole_genome' and controls locus scope, not differential-vs-non-differential.
Fix: Populate the four BAM-vector arguments (bam_ip, bam_input, bam_treated_ip, bam_treated_input); drop any mode= reference; consult ?exomePeak2 for the authoritative signature in the installed version.
Trigger: Compute featureCounts at peaks, build a count matrix, run edgeR / DESeq2 on the matrix as if peaks were genes.
Mechanism: Peak counts reflect both IP enrichment AND transcript abundance. Generic RNA-seq DE on peak counts mixes the two; size-factor normalisation on IP-only counts loses the input-pair information.
Fix: edgeR / DESeq2 on peak counts is defensible ONLY for paired symmetric designs where input is modelled as an offset (per-sample size factor on input counts AND per-sample size factor on IP counts, then differential on the ratio). For most uses, exomePeak2 / QNB / RADAR's purpose-built models are more appropriate.
Trigger: featureCounts(...) invoked without strandSpecific= flag; or with wrong strand setting.
Mechanism: Strand-aware counting matters when the protocol is stranded (most modern MeRIP is unstranded; some are reverse-stranded). Wrong strand counts include antisense reads as if they were sense.
Fix: Verify protocol strandedness from sequencing-core notes or by inspecting featureCounts summary at a few transcripts. Pass strandSpecific=0 (unstranded), 1 (forward), or 2 (reverse) explicitly.
| Pattern | Likely cause | Action | |---------|--------------|--------| | exomePeak2 calls a differential peak; QNB does not | Different dispersion estimates; QNB more conservative at low coverage | Trust intersection; report concordant set as high-confidence | | Most "differential" peaks fall at high-abundance housekeeping transcripts | Expression / IP-efficiency confound | Check log2FC vs input log2FC; if correlated, batch effect or expression-driven | | RADAR vs exomePeak2 disagree on direction | Different normalisation (TMM vs internal) | Inspect normalisation diagnostic; choose method aligned with experimental design | | Differential peaks anti-correlated with RNA-seq DE | Expression conflated with methylation in the differential model | Re-run with stronger input normalisation; consider per-peak ratio normalisation | | Single differential peak survives across all methods | High-confidence call | Orthogonally validate (GLORI / SAC-seq) at the named locus | | Differential calls scatter randomly across genome | Underpowered; technical noise dominates | Increase N; apply stricter effect-size filter; report null result honestly | | Cross-condition peak overlap < 50% before differential | Conditions are biologically very different; OR antibody lot effect | Inspect cross-replicate concordance; check antibody lot metadata | | Volcano shows extreme outliers at low-coverage peaks | Per-peak variance dominated by Poisson sampling | Filter peaks by minimum coverage (>=30 reads in IP AND input) before differential |
| Quantity | Threshold | Source / rationale | |----------|-----------|--------------------| | Minimum biological replicates per condition | 3 (4-5 preferred) | McIntyre 2020 Sci Rep 10:6590 — N=2 routinely under-powered | | FDR threshold | 0.05 | Standard convention | | Effect-size threshold (|log2FC|) | >= 0.5 minimum; >= 1 stringent | Below 0.5, calls within technical noise floor | | Minimum coverage per peak window (each IP AND input) | 30 reads | Standard convention; below this, statistical calls noisy | | Replicate-direction concordance | Same direction in >=2/N replicates | Guardrail against single-replicate artifacts | | Antibody lot tracking | Mandatory in design matrix when lots differ | Lot confounding is a known false-positive source | | MeRIP per-window IP/input ratio inflation | log2(IP/input) >= 1 for "enriched"; >=2 for "strongly enriched" | Convention | | Cross-study peak overlap baseline | ~45% median between labs | McIntyre 2020 Sci Rep 10:6590 — bounds inter-study reproducibility | | Effect size for biological validation | |log2FC| >= 1 AND padj < 0.05 typically used for downstream wet-lab follow-up | Field convention; tighten for low-N studies | | GLORI orthogonal validation threshold | Stoichiometry change >= 10% at named site | Liu C 2023 Nat Biotechnol 41:355 calibration |
| Error / symptom | Cause | Solution |
|-----------------|-------|----------|
| exomePeak2 mode='differential' rejected | exomePeak2 has no mode= arg; differential is via bam_treated_ip + bam_treated_input | Populate the four BAM-vector args; consult ?exomePeak2 |
| QNB error "unused argument" | Function signature changed between versions | Pin commit SHA; verify against installed ?qnbtest |
| RADAR install fails | GitHub-only; requires devtools and specific Bioconductor deps | devtools::install_github('scottzijiezhang/RADAR'); check Bioconductor requirements |
| featureCounts returns zero counts | Strand setting wrong; OR peak BED has different chromosome naming than BAM | Verify strandSpecific=; reconcile chromosome names |
| Volcano plot all peaks near origin | Low effect sizes; technical noise dominant | Check input normalisation; increase N; report null result if appropriate |
| FDR-significant but small effect size | Large dataset finds small differences with high power | Apply effect-size filter; report effect-size distribution |
| Many differential peaks track expression changes | Input normalisation insufficient | Re-run with stronger input adjustment; OR use ratio-of-ratios |
| edgeR estimateDisp() fails at N=2 | Small-N dispersion estimation collapses | Use QNB instead; designed for this case |
| MeTDiff install fails | GitHub-only, bundled with MeTPeak | devtools::install_github('compgenomics/MeTPeak') |
| exomePeak2 result has no log2FC / padj columns | bam_treated_ip / bam_treated_input were not populated; only ran peak calling on the control arm | Pass all four BAM-vector arguments (bam_ip, bam_input, bam_treated_ip, bam_treated_input) to trigger differential output |
| Per-peak boxplot shows huge within-condition variance | High biological noise; OR one replicate is an outlier | Inspect plotCorrelation in merip-preprocessing for outlier; consider exclusion |
| Pushback | Response |
|----------|----------|
| "How many replicates per condition?" | N=3 minimum; N=4-5 preferred; rationale McIntyre 2020 |
| "What's the effect-size threshold?" | |log2FC| >= 0.5 minimum; results table reports effect size alongside FDR |
| "Was batch / antibody lot controlled for?" | Yes — antibody_lot included as fixed effect in DESeq2 model on featureCounts-on-peaks counts (exomePeak2's top-level API does not accept covariates); per-lot exomePeak2 + meta-analysis when DESeq2 path infeasible; lot-condition confounding assessed |
| "Does the differential signal track expression changes?" | Cross-checked log2FC vs input log2FC per peak; only peaks with strong IP/input ratio shift reported as differential |
| "Was orthogonal validation done?" | Top hits orthogonally validated via GLORI / m6Anet / per-locus assay |
| "Why exomePeak2 over QNB?" | exomePeak2 default for standard 3-vs-3; QNB used for small-N sensitivity analyses; both reported as concordance check |
| "What about absolute stoichiometry?" | MeRIP differential reports relative enrichment changes; absolute stoichiometry requires GLORI / SAC-seq |
| "Was the right normalisation used?" | Per-window IP/input ratio normalisation; exomePeak2 internal; alternative TMM (RADAR) compared |
| "How does this replicate in independent studies?" | Cross-study peak overlap reported; differential subset checked against published m6A-Atlas |
| "Why not edgeR / DESeq2?" | Generic RNA-seq DE on peak counts loses IP/input pairing structure; used only for sensitivity analysis with paired symmetric design |
testing
Analyze multi-modal single-cell data (CITE-seq, Multiome, spatial). Use when working with data that measures multiple modalities per cell like RNA + protein or RNA + ATAC. Use when analyzing CITE-seq, Multiome, or other multi-modal single-cell data.
data-ai
Analyze metabolite-mediated cell-cell communication using MeboCost for metabolic signaling inference between cell types. Predict metabolite secretion and sensing patterns from scRNA-seq data. Use when studying metabolic crosstalk between cell populations or metabolite-receptor interactions.
development
Find marker genes and annotate cell types in single-cell RNA-seq using Seurat (R) and Scanpy (Python). Use for differential expression between clusters, identifying cluster-specific markers, scoring gene sets, and assigning cell type labels. Use when finding marker genes and annotating clusters.
development
Reconstruct cell lineage trees from CRISPR barcode tracing or mitochondrial mutations. Use when studying clonal dynamics, cell fate decisions, or developmental trajectories.