clip-seq/differential-clip/SKILL.md
Identify differentially bound regions across CLIP-seq conditions (knockdown vs control, treatment vs vehicle, disease vs healthy) using DEWSeq (sliding-window DESeq2), Flipper (Skipper-downstream), ASpeak, edgeR, or limma-voom. Use when computing condition-level changes in RBP binding intensity, choosing peak-level vs window-level vs crosslink-level testing, designing replicate experiments, or distinguishing biological binding shifts from technical confounders.
npx skillsauth add GPTomics/bioSkills bio-clip-seq-differential-clipInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Reference examples tested with: DEWSeq 1.18+, htseq-clip 2.0+, DESeq2 1.44+, edgeR 4.2+, limma 3.60+, Flipper (commit 2024.04+), Skipper (commit 2023.05+), pybedtools 0.10+, pyranges 0.0.129+.
Before using code patterns, verify installed versions match. If versions differ:
packageVersion('<pkg>') then ?function_name to verify parameterspip show <package> then help(module.function) to check signatures<tool> --version then <tool> --help to confirm flagsIf code throws unexpected errors, introspect the installed package and adapt the example to match the actual API rather than retrying.
"Identify regions with changed RBP binding across conditions" -> Test for condition-level differences in IP enrichment relative to SMInput, accounting for replicate variance and (where available) sequencing depth normalization. Three statistical scales are possible: peak-level (test each peak as a unit), window-level (test fixed transcriptome windows; DEWSeq, Flipper), or crosslink-site level (test single-nt positions). The choice depends on the biology (narrow regulatory shift vs broad binding-mode change) and on which upstream peak caller was used (CLIPper -> peak-level; Skipper -> window-level; PureCLIP -> CL-level).
library(DEWSeq); dds <- DESeqDataSetFromSlidingWindows(counts, colData, design=~condition); dds <- DESeq(dds); res <- results(dds)flipper differential -i skipper_out/ --design design.tsv --contrast treatment vs control -o flipper_out/dge <- DGEList(counts=peak_counts, group=condition); dge <- calcNormFactors(dge); design <- model.matrix(~condition); dge <- estimateDisp(dge, design); fit <- glmQLFit(dge, design); res <- glmQLFTest(fit)v <- voom(dge, design); fit <- lmFit(v, design); fit <- eBayes(fit); res <- topTable(fit, coef=2, number=Inf)htseq-clip extract -i annotation.gff -o annotation_windows.bed -w 50 -s 20 && htseq-clip count -i sample.bam -w annotation_windows.bed -o sample.counts.txtDEWSeq is the EMBL/Hentze-group windowed-NB approach designed specifically for differential CLIP. Flipper is the Skipper-companion tool (Tu 2024) for the modern Skipper workflow. Peak-level edgeR/limma-voom work when the upstream peak caller produced a comparable peak BED across conditions (e.g., consensus peaks from CLIPper).
| Tool | Scale | Statistical model | Replicate requirement | Strength | Fails when | |------|-------|-------------------|----------------------|----------|------------| | DEWSeq (Schwarzl 2020) | 50-100 nt sliding window | Negative binomial GLM (DESeq2 internals) | >= 2 reps per condition | Designed specifically for CLIP; integrates SMInput | Slow on dense libraries; output window-resolution | | Flipper (Tu 2024) | Skipper window (100 nt) | Negative binomial; designed for Skipper output | >= 2 reps per condition | Modern; pairs with Skipper peak caller | Only useful if upstream is Skipper | | ASpeak | Peak | Negative binomial | >= 2 reps | Peak-level for CLIPper output | Less popular; legacy | | edgeR (general) | Peak or window | Quasi-likelihood F-test on NB | >= 2 reps | Mature, widely cited | Generic; not CLIP-aware; needs careful normalization | | limma-voom | Peak or window | Linear model with mean-variance trend | >= 2 reps | Fast; well-validated; handles small samples | Generic; treats counts as continuous after voom | | DESeq2 (direct) | Peak or window | Negative binomial GLM | >= 2 reps | Mature; same engine as DEWSeq | Same as edgeR caveats | | MASTR-seq (Banks 2023) | Cell-resolved (scCLIP) | Cell-mixture model | Cells | Single-cell CLIP differential | Specialized; few published applications | | diffbind (CLIP adaptation) | Peak | DESeq2 or edgeR backend | >= 2 reps | Familiar from ChIP/ATAC | Designed for ChIP; needs CLIP-specific normalization | | MAnorm2 | Peak | Hierarchical empirical Bayes | >= 2 reps | Tested on ChIP; less on CLIP | Less CLIP-specific |
Methodology evolves; verify DEWSeq vignette and Flipper paper for current best practice. The DEWSeq + htseq-clip pipeline (Schwarzl 2020) is the most-published CLIP-specific differential framework; Flipper (Tu 2024) is the modern Skipper-coupled alternative.
Use ~ type + condition + type:condition and test the interaction-term coefficient. This is the single most consequential statistical choice in differential CLIP.
type = ip vs sminput (whether the library is IP or size-matched input)condition = treated vs untreated (or KD vs control)type:condition interaction = "Does the IP-vs-input ratio shift across conditions?"A naive ~ condition design tests whether read counts differ regardless of whether they come from IP or SMInput - this confounds binding changes with expression changes. The interaction term explicitly tests for differential binding (the biologically meaningful signal) rather than differential expression at peak loci.
When testing, extract the interaction-term coefficient: results(dds, name = 'typeip.conditiontreat'). The log2FoldChange returned is the change in IP/input ratio in treated vs untreated - this is what "differential binding" means.
Three scales differ in resolution and statistical power:
Peak-level (edgeR, limma-voom, DESeq2 on CLIPper peaks): Test each consensus peak's IP/SMInput log2 FC across conditions. Pro: peak boundaries are biologically meaningful; output interpretable. Con: peak set changes across conditions (a new peak in treatment but missing in control complicates testing); SMInput normalization must be applied consistently.
Window-level (DEWSeq, Flipper): Tile transcriptome into fixed 50-100 nt windows; test each window. Pro: comparable across conditions (windows are pre-defined); handles binding-mode shifts within a peak; high statistical power. Con: window-resolution; multiple-testing burden (millions of windows); biological meaning of a window needs translation.
Crosslink-level (custom): Test each single-nt CL position. Pro: nucleotide resolution; captures motif-level shifts. Con: very low coverage per position; massive multiple-testing burden; rarely used in published differential CLIP.
| Goal | Scale | Tool | |------|-------|------| | ENCODE-style peak-level differential (CLIPper upstream) | Peak | DESeq2 / edgeR / limma-voom on CLIPper consensus peaks | | Maximum sensitivity windowed differential | Window | DEWSeq (with htseq-clip) or Flipper (with Skipper) | | Modern Skipper-coupled workflow | Window | Flipper | | Single-cell scCLIP differential | Cell | MASTR-seq or scCLAM (sparse) | | Compare binding-mode shifts within a peak | Window | DEWSeq | | Allele-specific differential | CL site | BEAPR + custom logistic | | RBP-KD effect on binding profile | Peak/window | DEWSeq (handles KD-effect on RBP itself) |
DEWSeq is the EMBL/Hentze-group differential framework specifically designed for CLIP. The pipeline is:
Goal: Identify transcriptome windows where the IP-vs-SMInput ratio shifts across conditions, accounting for replicate variance with the negative-binomial GLM.
Approach: Use htseq-clip to extract sliding 50 nt windows across annotated features, count reads per window per sample, build a DESeqDataSetFromSlidingWindows object with the ~ type + condition + type:condition interaction design, extract the typeip.conditiontreat interaction coefficient as the differential-binding effect size, and aggregate adjacent significant windows with bedtools merge -d 100.
# Step 1: htseq-clip generates sliding-window count matrices
htseq-clip extract \
-i gencode.v38.annotation.gff \
-o annotation_windows.bed \
--window-size 50 \
--window-step 20 \
--feature-type CDS,UTR
# Step 2: count IP and SMInput reads per window per sample
for sample in ip_rep1 ip_rep2 sminput_rep1 sminput_rep2; do
htseq-clip count \
-i ${sample}.dedup.bam \
-a annotation_windows.bed \
-o ${sample}.counts.txt \
--mate 2
done
# (For eCLIP, --mate 2 because R2 5' is the truncation site; for iCLIP single-end use --mate 1)
# Step 3: DEWSeq differential testing
htseq-clip mergeCounts \
-i ip_rep1.counts.txt ip_rep2.counts.txt sminput_rep1.counts.txt sminput_rep2.counts.txt \
-o merged_counts.tsv
library(DEWSeq)
counts <- read.table('merged_counts.tsv', sep='\t', header=TRUE, row.names=1)
colData <- data.frame(
sample = c('ip_rep1','ip_rep2','sminput_rep1','sminput_rep2'),
type = c('ip','ip','sminput','sminput'),
condition = c('treated','treated','untreated','untreated')
)
dds <- DESeqDataSetFromSlidingWindows(
countData = counts,
colData = colData,
annotObj = 'annotation_windows.bed',
design = ~ type + condition
)
dds <- DESeq(dds)
# In a model with `~ type + condition`, the IP-vs-input contrast is the simple condition
# main effect; to test the differential CLIP signal between conditions use the interaction
# term name from the design matrix (matches the skill's interaction-term guidance below):
res <- results(dds, name='typeip.conditiontreated')
# Window-level FDR adjustment
res_filtered <- res[!is.na(res$padj) & res$padj < 0.05 & abs(res$log2FoldChange) > 1, ]
# Aggregate adjacent significant windows into differential regions
sig_windows <- as.data.frame(res_filtered)
sig_windows$chr <- gsub('_.*', '', rownames(sig_windows))
# Custom reduce: combine adjacent windows within 100 nt
Flipper (Tu 2024) is the differential companion to Skipper, using the same 100 nt feature-respecting windows and the same beta-binomial framework.
# Assume Skipper has been run on all samples; Skipper output is at skipper_out/
flipper differential \
-i skipper_out/ \
--design design.tsv \
--contrast treatment vs control \
-o flipper_out/
# design.tsv format:
# sample_id condition replicate ip_or_input
# ip_treat_r1 treatment 1 ip
# ip_treat_r2 treatment 2 ip
# in_treat_r1 treatment 1 input
# ...
Output: differential window BED with log2 FC, p, padj per window.
library(DESeq2)
library(GenomicRanges)
library(Rsubread)
# Step 1: union of CLIPper peaks across conditions
# (See bedtools merge upstream)
peaks <- read.table('consensus_peaks.bed', sep='\t', col.names=c('chr','start','end','name','score','strand'))
# Step 2: count reads per peak per sample with featureCounts
saf <- data.frame(
GeneID = peaks$name,
Chr = peaks$chr,
Start = peaks$start + 1, # 1-based for featureCounts
End = peaks$end,
Strand = peaks$strand
)
counts_ip <- featureCounts(
files = c('ip_treat_r1.bam','ip_treat_r2.bam','ip_ctrl_r1.bam','ip_ctrl_r2.bam'),
annot.ext = saf,
isGTFAnnotationFile = FALSE,
strandSpecific = 1,
isPairedEnd = TRUE
)$counts
counts_in <- featureCounts(
files = c('in_treat_r1.bam','in_treat_r2.bam','in_ctrl_r1.bam','in_ctrl_r2.bam'),
annot.ext = saf,
isGTFAnnotationFile = FALSE,
strandSpecific = 1,
isPairedEnd = TRUE
)$counts
# Step 3: DESeq2 with IP vs SMInput interaction
all_counts <- cbind(counts_ip, counts_in)
colData <- data.frame(
type = rep(c('ip','input'), each=4),
condition = rep(c('treat','treat','ctrl','ctrl'), 2),
replicate = rep(c('r1','r2','r1','r2'), 2)
)
dds <- DESeqDataSetFromMatrix(countData = all_counts, colData = colData,
design = ~ type + condition + type:condition)
dds <- DESeq(dds)
# The interaction term `typeip.conditiontreat` tests:
# does the IP/input ratio differ in treatment vs control?
res <- results(dds, name = 'typeip.conditiontreat')
# Filter
res_sig <- res[!is.na(res$padj) & res$padj < 0.05 & abs(res$log2FoldChange) > 1, ]
The interaction-term design (type:condition) is the correct statistical model for differential CLIP: it tests whether the IP-vs-input ratio differs across conditions, which is what "differential binding" means. Naive testing of just condition (ignoring SMInput) confounds binding changes with expression changes.
The canonical differential CLIP design is to knock down the RBP and observe what binding sites are lost. Caveats:
| Issue | Implication | Mitigation | |-------|-------------|------------| | RBP KD also depletes the RBP protein in cells | siRNA/shRNA reduces RBP -> reduces IP yield -> reduces unique fragments | Normalize against SMInput WITHIN each condition; the relative IP/SMInput captures binding, not protein level | | RBP KD changes transcript abundance | mRNA stability regulators (HuR, PUM2) when knocked down change target abundance | Both IP and SMInput see the change; ratio still works | | Off-target effects of siRNA | Multiple binding profiles change | Use multiple independent siRNAs; require concordance | | KD efficiency varies | Lower KD -> smaller binding-loss signal | Validate KD by WB on the same IP lysate; > 70% protein loss target | | Rescue requires re-introducing RBP | siRNA-resistant RBP cDNA for rescue | The standard differential validation experiment |
Trigger: Whole-genome window tiling at 20 nt step; dense library (50M unique fragments); 4+ samples.
Mechanism: DEWSeq runs DESeq2 internals on millions of windows; the dispersion fit on this many features is slow.
Symptom: Runtime > 6 h; out-of-memory; "size of object exceeds vector limit".
Fix: Increase window step size to 50 nt; pre-filter windows with low counts; or restrict to expressed transcripts only. DEWSeq vignette suggests keep <- rowSums(counts(dds)) >= 30; dds <- dds[keep,] before testing.
Trigger: Windows are 50 nt; biological binding sites are 50-500 nt; user expects DEWSeq to output continuous "differential regions" but gets individual windows.
Mechanism: DEWSeq outputs per-window results; aggregating adjacent significant windows into regions is a separate step.
Symptom: Output has 10,000 individual windows; user expects 1,000 biological regions.
Fix: Use the DEWSeq utility resultsDEWSeq() then bedtools merge -d 100 on the significant-window BED. Or use the top_hits_to_bed.R script from DEWSeq examples.
Trigger: CLIPper called peaks separately per condition; treatment has peaks at sites missing in control (and vice versa).
Mechanism: Peak unification requires a consensus peakset; testing on a "treatment-only" peak underestimates evidence in control (zero reads) and produces spurious DE.
Symptom: "Treatment-specific" peaks dominate DE results; biologically implausible.
Fix: Generate consensus peakset across all conditions (bedtools merge of all per-condition peak BEDs); count reads per consensus peak across all samples; THEN run differential. The Yeo lab convention is consensus peakset across all samples.
Trigger: DESeq2 design ~ condition instead of ~ type + condition + type:condition.
Mechanism: Simple ~ condition tests whether read counts differ between treatment and control regardless of whether reads are from IP or SMInput. A condition-driven expression change in SMInput is detected as DE binding.
Symptom: DE results dominated by transcripts with global expression changes (housekeeping shifts).
Fix: Always use the interaction-term design. The biologically meaningful test is the interaction type:condition p-value.
Trigger: edgeR calcNormFactors(method='TMM') on CLIP-seq data.
Mechanism: TMM assumes most features (genes) are not differentially expressed. CLIP-seq peak counts can be globally shifted if the RBP itself is knocked down; TMM normalization would force the shift to be invisible.
Symptom: Knockdown experiment shows ~0 DE peaks; expected hundreds.
Fix: Use SMInput as the control library; spike-in normalization if available; or skip TMM and use library-size normalization only. Some CLIP-specific tools (DEWSeq, Flipper) handle this internally.
Trigger: Flipper called on CLIPper output.
Mechanism: Flipper expects Skipper's window-level output format with beta-binomial estimates.
Symptom: Flipper crashes or produces nonsense.
Fix: Use DEWSeq with htseq-clip for CLIPper-upstream workflows; use Flipper only with Skipper.
| Scenario | Tool + design | Why | |----------|---------------|-----| | KD vs control eCLIP, CLIPper upstream | DEWSeq + htseq-clip | CLIP-specific NB GLM with interaction term | | KD vs control eCLIP, Skipper upstream | Flipper | Skipper companion; matches windowing | | Treatment vs vehicle (small effect) | DEWSeq (window-level higher power) | Sliding windows capture small shifts | | Multiple time points | DEWSeq with time as covariate | Continuous design with time vector | | Allele-specific differential | BEAPR per-allele + custom logistic | See clip-seq/clip-alignment for WASP | | Single-cell CLIP differential | MASTR-seq or scCLAM | Specialized; few options | | Differential motif occupancy | Window-level + DEWSeq + motif overlap | Combine differential windows with motif BED | | RBP overexpression vs control | Same as KD reversed | Same statistical framework | | Compare two RBPs | NOT differential CLIP; use SPIDR or separate CLIPs | Different RBPs need separate IPs | | Spike-in normalization needed | DEWSeq + spike-in size factors | For global occupancy shifts | | chimeric eCLIP differential miRNA targets | Custom; treat each miRNA-target chimera as feature | Specialized; see clip-seq/ago-clip-mirna-targets |
| Pattern | Likely cause | Action |
|---------|--------------|--------|
| DEWSeq finds many DE windows; edgeR peak-level finds few | Window-level higher power for narrow shifts | Aggregate DEWSeq windows; cross-check |
| edgeR many DE; DEWSeq few | edgeR not accounting for SMInput | Re-run edgeR with interaction term |
| DE peaks dominated by expression changes | No interaction term | Use ~ type + condition + type:condition |
| KD experiment shows ~0 DE | TMM over-corrects global shift | Switch to library-size norm only; use SMInput |
| siRNA replicates discordant | Off-target effects vary | Use multiple independent siRNAs; require concordance |
| Treatment-only peaks dominate DE | No consensus peakset | Generate consensus first; then test on unified set |
| Significant windows scattered | Window aggregation step skipped | bedtools merge -d 100 on significant-window BED |
| Same gene appears in many DE windows | Multiple binding sites per gene differential | Report at gene-level too; not just window-level |
| Flipper fails with non-Skipper input | Upstream mismatch | Use DEWSeq for non-Skipper workflows |
| DESeq2 dispersion fit fails | Too few replicates (n=2 per condition); too few features after filtering | Increase replicates; or relax filtering |
Operational rule for high-confidence differential reporting: (a) Use SMInput-aware design (~ type + condition + type:condition); (b) generate consensus peakset across conditions; (c) require padj < 0.05 AND |log2FC| > 1; (d) require concordance with at least one orthogonal method (e.g., DEWSeq + edgeR peak-level on same data); (e) for KD experiments, validate KD efficiency by WB and require multiple independent siRNAs.
| Error / symptom | Cause | Solution |
|-----------------|-------|----------|
| DESeq2 ~ condition finds many DE | No interaction with type | Use ~ type + condition + type:condition and test interaction |
| DEWSeq output is gene-level not window-level | keep <- ... filter too aggressive | Loosen pre-filter |
| Adjacent significant windows not aggregated | Forgot bedtools merge | bedtools merge -d 100 on sig windows |
| edgeR TMM fits all libraries to one value | Most-features-not-DE assumption violated | Use library-size norm; or DEWSeq for CLIP-specific |
| Flipper crashes on CLIPper input | Tool mismatch | Switch to DEWSeq |
| Few replicates -> unstable estimates | n=2 not enough for dispersion | Increase n; or use limma-voom (more tolerant) |
| Peaks differ across conditions | Per-condition peak calls | Unify with consensus peakset |
| KD experiment yields 0 DE | Normalization over-corrected; OR KD efficiency too low | Validate KD WB; check normalization |
| Global shift in IP relative to SMInput | RBP itself knocked down so IP yield lower | Normalize WITHIN each condition |
| Lots of "treatment-only" peaks | Caller stringency higher in one condition | Use consensus peakset for fairness |
testing
Analyze multi-modal single-cell data (CITE-seq, Multiome, spatial). Use when working with data that measures multiple modalities per cell like RNA + protein or RNA + ATAC. Use when analyzing CITE-seq, Multiome, or other multi-modal single-cell data.
data-ai
Analyze metabolite-mediated cell-cell communication using MeboCost for metabolic signaling inference between cell types. Predict metabolite secretion and sensing patterns from scRNA-seq data. Use when studying metabolic crosstalk between cell populations or metabolite-receptor interactions.
development
Find marker genes and annotate cell types in single-cell RNA-seq using Seurat (R) and Scanpy (Python). Use for differential expression between clusters, identifying cluster-specific markers, scoring gene sets, and assigning cell type labels. Use when finding marker genes and annotating clusters.
development
Reconstruct cell lineage trees from CRISPR barcode tracing or mitochondrial mutations. Use when studying clonal dynamics, cell fate decisions, or developmental trajectories.