chip-seq/super-enhancers/SKILL.md
Identifies super-enhancers from H3K27ac, MED1, or BRD4 ChIP-seq using ROSE, ROSE2, LILY, HOMER -style super, and ENCODE dELS cross-referencing. Handles peak stitching parameters, ranking choices, hockey-stick inflection, marker choice (H3K27ac vs MED1/BRD4), and cross-condition comparison with spike-in normalization. Constructs core regulatory circuitry (Saint-Andre 2016) from SE-encoded TFs. Use when identifying cell-identity / cancer-associated regulatory domains, comparing super-enhancers between conditions, identifying master transcription factor networks, or predicting BET-inhibitor responsiveness.
npx skillsauth add GPTomics/bioSkills bio-chipseq-super-enhancersInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Reference examples tested with: ROSE (stjude/ROSE, 2018+), ROSE2 (linlabbcm/rose2, 2021+), LILY (BoevaLab/LILY, 2020+), HOMER 4.11+, samtools 1.19+, bedtools 2.31+, GenomicRanges 1.54+.
ROSE is unmaintained Python 2; ROSE2 is the Python 3 port with the same algorithm. LILY (Boeva 2017) is a refactored implementation with input-control background subtraction for low-quality H3K27ac data.
"Identify super-enhancers driving cell identity / cancer biology" -> Stitch nearby active enhancer peaks (H3K27ac, MED1, or BRD4) within a stitching window, exclude proximal-promoter signal, rank by total signal, find the hockey-stick inflection point where signal sharply increases, and classify all stitched regions above the inflection as super-enhancers.
python ROSE_main.py -g HG38 -i peaks.gff -r h3k27ac.bam -c input.bam -s 12500 -t 2500findPeaks tag_dir/ -style super -i input_tag_dir/The SE concept (Whyte 2013) is a thresholding heuristic on a continuous signal distribution (Pott & Lieb 2015 Nat Genet), not a categorical biological category. Functional CRISPR-tiling at SE loci (Hnisz 2017; Dukler 2017) shows only 1-3 constituent elements per SE are essential; the "SE" label is a useful operational definition for BET-inhibitor responsiveness and cell-identity gene regulation, not an absolute biological property.
| Marker | Captures | When to prefer | |--------|----------|----------------| | H3K27ac | Active regulatory elements broadly | Most widely available; standard for SE definition since Whyte 2013 | | MED1 | Mediator complex accumulation (the defining biology) | Direct readout of SE; less common antibody; lower signal-to-noise | | BRD4 | BET cofactor accumulation | Most predictive of BET-inhibitor responsiveness; clinical relevance | | H3K27ac + MED1 intersection | High-confidence SE | Gold standard if both available | | dELS from ENCODE cCREs | Cell-type-agnostic distal enhancer registry | Cross-reference; not SE-specific by itself |
Operational rule: H3K27ac for discovery; MED1 or BRD4 ChIP for functional / therapeutic claims. SE called on H3K27ac alone may not respond to BET inhibitors; SE called on BRD4 will.
| Tool | Method | Strength | Fails when |
|------|--------|----------|------------|
| ROSE (Whyte 2013) | Stitch within 12.5 kb, exclude ±2.5 kb of TSS, rank by signal, hockey-stick inflection | Original; widely cited; canonical reference | Python 2 only; unmaintained; ROSE_main.py crashes on Python 3 |
| ROSE2 (Lin 2016; linlabbcm) | Same algorithm, Python 3 port | Maintained; identical output to ROSE | None vs ROSE |
| LILY (Boeva 2017) | ROSE-like with input-control background subtraction | Works on lower-quality H3K27ac data; subtracts input | Adds complexity; less validated; specific to neuroblastoma/glioma in original paper |
| HOMER -style super | Native ROSE-like in HOMER framework; stitching without TSS exclusion | Integrated with HOMER workflow | Different stitching defaults; not directly comparable to ROSE counts |
| Custom hockey-stick (R) | Generic rank-by-signal + tangent inflection | Flexible; works on any signal definition | Reinvents algorithm; verify against ROSE on known dataset |
Most papers use ROSE/ROSE2 with default stitching (12.5 kb) and TSS exclusion (2.5 kb). This is the de facto standard for cross-paper comparison. HOMER's -style super produces different counts and is not directly comparable.
| Scenario | Recommended pipeline |
|----------|----------------------|
| Standard SE discovery, H3K27ac available | ROSE2 with default -s 12500 -t 2500; input control for subtraction |
| Predict BET-inhibitor response | BRD4 ChIP -> ROSE2 (or H3K27ac SE intersected with BRD4 peaks) |
| Compare SE between conditions (drug treatment) | ROSE2 per condition + spike-in normalization (HDACi/BETi/EZH2i need ChIP-Rx) |
| Build core regulatory circuitry | ROSE2 + Saint-Andre 2016 algorithm: identify TFs encoded by SE that bind own SE + cross-bind other SE-encoded TFs |
| Low-quality H3K27ac (low FRiP) | LILY with input subtraction |
| Compare with ENCODE dELS atlas | ROSE2 + intersect with ENCODE cCRE dELS BED |
| Differential SE between conditions | ROSE2 per condition + signal-quantitative differential (DiffBind on SE regions) |
Goal: Identify super-enhancers by stitching nearby active enhancer peaks within a stitching distance and ranking by total signal.
Approach: Convert peaks to GFF, exclude promoter-proximal peaks via -t (TSS exclusion window), stitch enhancers within -s (default 12.5 kb), rank by total H3K27ac (or MED1/BRD4) signal, find the hockey-stick inflection point, classify regions above as super-enhancers.
# Install ROSE2 (Python 3 port; unmaintained ROSE Py2 not recommended)
git clone https://github.com/linlabbcm/rose2.git
pip install ./rose2
# Convert peaks BED to GFF (ROSE requires GFF input)
awk 'BEGIN{OFS="\t"} {print $1,"peaks","enhancer",$2,$3,".",$6,".","ID="NR}' \
peaks.narrowPeak > peaks.gff
# Filter promoter peaks before SE calling (within 2.5 kb of TSS)
# ROSE handles this via -t flag; preferable to pre-filter for clarity
bedtools intersect -a peaks.narrowPeak -b promoters_2kb.bed -v > enhancer_peaks.bed
# Run ROSE2 with input control
rose2 -g HG38 -i peaks.gff \
-r h3k27ac.bam -c input.bam \
-o rose_output/ \
-s 12500 \
-t 2500
ROSE2 outputs:
*_AllEnhancers.table.txt — all stitched enhancer regions ranked by signal*_SuperEnhancers.table.txt — SE only (above hockey-stick inflection)*_Enhancers_withSuper.bed — BED with SE / TE classification*_Plot_points.png — hockey-stick plotThis is the analysis most often done wrong. SE calling thresholds depend on absolute signal, so any global shift (HDACi, BETi, EZH2i) confounds direct SE-count comparison.
Wrong approach: Call ROSE2 on condition A and condition B separately, intersect SE BEDs, report "gained/lost SE."
Right approach:
library(DiffBind)
# Union of SE BED files from condition A and B
union_se <- rtracklayer::import('union_SE.bed')
# Run DiffBind quantification on this region set with spike-in normalization
For BET-inhibitor experiments: the biology IS that all SE decrease globally; spike-in is mandatory.
The CRC algorithm identifies master TF networks from SE annotations:
# Install CRC pipeline (linlabbcm CRC)
git clone https://github.com/linlabbcm/CRC2.git
# Requires: SE BED, TF motif annotations, gene-SE mapping
python CRC2/crc.py -e SE_table.txt -g genes.gtf -b h3k27ac.bam
CRC outputs the connected components of the regulatory network. Master TFs typically appear in the largest component with high out-degree.
ENCODE distal Enhancer-Like Signatures (dELS) are the cell-type-agnostic regulatory atlas (see chip-seq/peak-annotation). Cross-referencing SE against dELS:
wget https://api.wenglab.org/screen_v13/screen_human_ccres_simple.bed.gz
gunzip screen_human_ccres_simple.bed.gz
awk -F'\t' '$NF == "dELS"' screen_human_ccres_simple.bed > dels.bed
# Fraction of SE constituents overlapping dELS
bedtools intersect -a SuperEnhancers.bed -b dels.bed -u | wc -l
bedtools intersect -a SuperEnhancers.bed -b dels.bed -wa -wb > se_with_dels.tsv
Trigger: Running ROSE_main.py on a modern system (Python 3 only).
Mechanism: ROSE is Python 2 code; print statements without parens, dict.iteritems(), etc.
Symptom: SyntaxError on first import.
Fix: Use ROSE2 (linlabbcm Python 3 port); identical algorithm and output format.
Trigger: Using default -s 12500 (12.5 kb) on small genomes or compact gene structures.
Mechanism: Default was set on human/mouse vertebrate genomes; Drosophila / yeast / plants have different regulatory architecture.
Fix: For non-vertebrate genomes, reduce stitching distance proportionally (e.g., -s 2500 for Drosophila, -s 500 for yeast).
Trigger: Default -t 2500 (exclude peaks within 2.5 kb of TSS) on promoter-proximal enhancers (e.g., pELS class).
Mechanism: TSS-proximal enhancers are filtered out; SE definition becomes distal-only.
Symptom: Lower SE counts than expected for cell types with promoter-enhancer architecture (e.g., human ES cells).
Fix: Reduce TSS exclusion to -t 500 or -t 0 if including promoter-proximal regulatory regions; document the decision.
Trigger: Calling SE on H3K27ac and claiming BET-inhibitor responsiveness.
Mechanism: H3K27ac marks active enhancers broadly; not all H3K27ac-positive SE have BRD4 accumulation.
Symptom: Predicted BET-sensitive genes don't respond to BET inhibitors in cell-based assays.
Fix: For BET-inhibitor claims, use BRD4 ChIP for SE calling, or intersect H3K27ac SE with BRD4 peaks.
Trigger: Comparing SE counts in HDACi-treated vs DMSO without spike-in normalization.
Mechanism: SE calling thresholds depend on absolute signal; HDACi globally increases H3K27ac, raising every region's signal and shifting the hockey-stick inflection.
Symptom: Reports "1000 SE in HDACi vs 500 in DMSO" when biology is just global H3K27ac increase.
Fix: Spike-in normalize BAMs (ChIP-Rx with Drosophila chromatin), call SE on scaled signal; OR quantify signal at a union peak set rather than calling SE per condition.
Trigger: Running LILY without high-quality matched input control.
Mechanism: LILY subtracts background based on input signal; mismatched input introduces artifactual negative signal.
Fix: Use LILY only when input quality is good (same library prep, same depth, same fragmentation); otherwise use ROSE2 with standard input handling.
Trigger: Calling SE on a small peak set (< 5000 enhancers).
Mechanism: Hockey-stick inflection depends on having a long "tail" of typical enhancers; few peaks distort the inflection.
Symptom: SE count is unreasonably high (50%+ of all peaks called SE) or unreasonably low (< 50 SE).
Fix: Require ≥ 5000 enhancer peaks input to ROSE2; if fewer, use absolute signal cutoff (e.g., top 5% by signal density) rather than hockey-stick.
| Pattern | Likely cause | Action | |---------|--------------|--------| | ROSE2 vs HOMER -style super differ | Different stitching distance / TSS handling | Use ROSE2 standard for cross-paper comparison; HOMER for HOMER-integrated workflows | | H3K27ac SE ≠ MED1 SE at same locus | H3K27ac is broad; MED1 marks subset of active SE | MED1 SE is the more functional definition; H3K27ac includes inactive-but-acetylated regions | | SE called in DMSO but not in BETi (or vice versa) | Global signal shift confounds threshold | Spike-in normalize; compare quantitatively at union SE set | | SE shifts location between replicates | Marginal calls below inflection; hockey-stick inflection noisy | Use top N SE by rank for robustness; or require SE in ≥ 2/3 replicates | | LILY and ROSE2 disagree on SE count | LILY's input subtraction differs | Trust ROSE2 unless input quality is poor (low FRiP) |
| Error / symptom | Cause | Solution |
|-----------------|-------|----------|
| SyntaxError: invalid syntax in ROSE | Python 2 codebase | Use ROSE2 (linlabbcm) Python 3 port |
| GFF format error | Wrong column ordering | Use awk template: chr<TAB>peaks<TAB>enhancer<TAB>start<TAB>end<TAB>.<TAB>strand<TAB>.<TAB>ID=N |
| ROSE2 reports 0 SE | Hockey-stick inflection failed; too few enhancers | Inspect _Plot_points.png; ≥ 5000 peaks input recommended |
| Genome flag error in ROSE2 | Genome not pre-configured | Genome flag must be one of HG18, HG19, HG38, MM8, MM9, MM10 |
| All SE at promoters | TSS exclusion too narrow OR data dominated by promoter signal | Verify -t 2500; check input is H3K27ac at enhancers not full chromatin |
| Cross-condition SE gain/loss not reproducible | No spike-in normalization | Spike-in (ChIP-Rx) or quantitative differential on union SE |
testing
Analyze multi-modal single-cell data (CITE-seq, Multiome, spatial). Use when working with data that measures multiple modalities per cell like RNA + protein or RNA + ATAC. Use when analyzing CITE-seq, Multiome, or other multi-modal single-cell data.
data-ai
Analyze metabolite-mediated cell-cell communication using MeboCost for metabolic signaling inference between cell types. Predict metabolite secretion and sensing patterns from scRNA-seq data. Use when studying metabolic crosstalk between cell populations or metabolite-receptor interactions.
development
Find marker genes and annotate cell types in single-cell RNA-seq using Seurat (R) and Scanpy (Python). Use for differential expression between clusters, identifying cluster-specific markers, scoring gene sets, and assigning cell type labels. Use when finding marker genes and annotating clusters.
development
Reconstruct cell lineage trees from CRISPR barcode tracing or mitochondrial mutations. Use when studying clonal dynamics, cell fate decisions, or developmental trajectories.