chip-seq/cut-and-run-tag/SKILL.md
Analyzes CUT&RUN (Skene Henikoff 2017) and CUT&Tag (Kaya-Okur 2019) chromatin profiling data. Handles SEACR vs MACS2 peak calling (with the btaf375 2025 benchmark guidance), pA-MNase vs pA-Tn5 vs pAG-Tn5 chimera differences, E. coli spike-in carryover normalization, IgG-only control logic (no input), characteristic fragment-size signatures (25-75 bp for CUT&Tag), and lower depth requirements (5M reads typical vs 25M for ChIP). Use when calling peaks from CUT&RUN/CUT&Tag, scaling by E. coli spike-in carryover, choosing SEACR norm mode, or comparing CUT&RUN/Tag results to traditional ChIP.
npx skillsauth add GPTomics/bioSkills bio-chipseq-cut-and-run-tag
Reference examples tested with: SEACR 1.4+, MACS2 2.2.9+, MACS3 3.0.4+, samtools 1.19+, bowtie2 2.5+, bedtools 2.31+, deepTools 3.5+, GoPeaks 1.0+, LanceOtron (pip).
"Analyze CUT&RUN or CUT&Tag chromatin profiling data" -> Use the lower-background, lower-input alternatives to traditional ChIP. CUT&RUN tethers MNase to an antibody via Protein A; CUT&Tag tethers Tn5 via Protein A/G. Both bypass cross-linking, fragmentation, and IP washes — producing 10-100× lower background, allowing 100-1000× lower cell input, and shifting the peak-calling problem from "find signal in noise" to "find signal in near-zero background."
Peak calling: SEACR (norm stringent with IgG), MACS2 -f BAMPE --keep-dup all, or both for consensus.
CUT&RUN/CUT&Tag has different QC thresholds, different peak calling defaults, different spike-in protocols, and different antibody requirements than traditional ChIP. Treating it as ChIP fails silently.
| Variant | Chimera | Year | Use case | Failure mode |
|---------|---------|------|----------|--------------|
| CUT&RUN (Skene Henikoff) | pA-MNase | 2017 | Native chromatin profiling; broad antibody compatibility | Native (no fixation), so gentler; MNase digestion needs careful Ca²⁺ control |
| CUT&Tag (Kaya-Okur Henikoff) | pA-Tn5 (rabbit only) | 2019 | Lower cell input (~5000); faster; library-ready output | Rabbit-only antibody; PCR cycles can over-amplify |
| CUT&Tag-IT (Active Motif) | pA-Tn5 commercial | 2020 | Standardized lots; reproducible | Cost; vendor-locked |
| pAG-Tn5 CUT&Tag | pAG-Tn5 | 2020 | Binds both rabbit AND mouse IgG | More versatile; identical performance otherwise |
| AutoCut&Tag | pAG-Tn5 plate-based | 2021 | High-throughput (96-well) | Throughput at the cost of per-sample optimization |
| CUTAC (Cleavage Under Targeted Accessible Chromatin) | pAG-Tn5 + protocol modification | 2020 | Chromatin accessibility variant of CUT&Tag | Less common; not standard CUT&Tag |
| scCUT&Tag | pAG-Tn5 in droplets | 2021 | Single-cell histone mark profiling | Very sparse (~1000-5000 reads/cell) |
| Tool | Model | Strength | Fails when |
|------|-------|----------|------------|
| SEACR (Meers 2019) | Empirical threshold on signal block totals; IgG-aware "stringent" mode | Designed for sparse CUT&RUN data; "stringent + norm + IgG" is the recommended default | Wrong mode (top-X% without IgG; "non" mode if no upstream spike-in normalization); broad mark with very flat signal landscape |
| MACS2 -f BAMPE --keep-dup all | Local Poisson | Familiar; integrates well with downstream tools (DiffBind) | Default -q 0.05 may be too lenient for low-background CUT&Tag; consider -q 0.01 |
| GoPeaks (Yashar 2022) | Sliding-window thresholding | Broad-mark-oriented; faster than SEACR on broad data | Newer; smaller user base |
| LanceOtron (Hentges 2022) | CNN trained on ENCODE peaks | Parameter-free; handles both narrow and broad | Less validated for CUT&RUN/Tag specifically; web-only or pip |
| MACS2 + SEACR consensus | Intersection | Highest confidence; per 2025 btaf375 benchmark, best for cross-paper reproducibility | Most conservative; may miss true peaks at marginal regions |
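The consensus row above amounts to keeping the peaks from one caller that overlap a peak from the other. Below is a minimal sketch in plain awk. It assumes BED-like inputs with chrom, start, and end in the first three columns. `consensus_peaks` and the file names are hypothetical. In a real pipeline, `bedtools intersect -u -a macs2.narrowPeak -b seacr.stringent.bed` applies the same rule much faster.

```shell
# consensus_peaks A.bed B.bed (hypothetical helper)
# Print every interval in A (e.g. MACS2 narrowPeak) that overlaps at least 1 bp
# of some interval in B (e.g. SEACR .stringent.bed), the same rule as
# `bedtools intersect -u -a A.bed -b B.bed`. BED intervals are half-open, so
# [10,50) and [50,80) do not overlap. Naive per-chromosome scan: fine for a
# sketch, slow for very large peak sets.
consensus_peaks() {
  awk 'FILENAME == ARGV[1] {              # first file read is B: store its intervals
         n[$1]++; s[$1, n[$1]] = $2 + 0; e[$1, n[$1]] = $3 + 0; next
       }
       {                                   # second file is A: keep it if any B interval overlaps
         for (i = 1; i <= n[$1] + 0; i++)
           if ($2 + 0 < e[$1, i] && s[$1, i] < $3 + 0) { print; break }
       }' "$2" "$1"
}
```

The file passed as A decides which coordinates are reported. Swapping the arguments keeps SEACR's block boundaries instead of MACS2's peak widths.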
2025 benchmark (Bioinformatics btaf375):
Goal: Call CUT&RUN/CUT&Tag peaks from aligned BAMs using SEACR with IgG-aware threshold.
Approach: Align with Henikoff-lab parameters, convert the BAM to a fragment bedGraph via bedtools bamtobed -bedpe and genomecov, then run SEACR in norm stringent mode with the IgG control.
# 1. Align with bowtie2 (Henikoff lab standard parameters)
bowtie2 --local --very-sensitive --no-mixed --no-discordant \
--phred33 -I 10 -X 700 \
-x hg38 -1 reads_R1.fq -2 reads_R2.fq \
-S aln.sam
# 2. Convert SAM to BAM, sort, index
samtools view -bS aln.sam | samtools sort -o aln.bam
samtools index aln.bam
# 3. Generate bedGraph for SEACR (paired-end fragments)
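samtools sort -n -o aln.namesorted.bam aln.bam   # bamtobed -bedpe needs mates adjacent (name-sorted)
samtools view -b -F 0x04 aln.namesorted.bam | bedtools bamtobed -bedpe -i - > aln.bedpe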
awk '$1==$4 && $6-$2 < 1000 {print $0}' aln.bedpe > aln.clean.bedpe
cut -f 1,2,6 aln.clean.bedpe | sort -k1,1 -k2,2n -k3,3n > aln.fragments.bed
bedtools genomecov -bg -i aln.fragments.bed -g hg38.chrom.sizes > aln.bedgraph
# 4. Same for IgG control
# (... produce igg.bedgraph similarly ...)
# 5. SEACR with stringent + norm + IgG control (recommended default).
# Final argument is the OUTPUT PREFIX; SEACR appends ".stringent.bed" / ".relaxed.bed".
bash SEACR_1.4.sh aln.bedgraph igg.bedgraph norm stringent target_peaks
# Output file: target_peaks.stringent.bed
# Alternative: no IgG control, use top 1% of peaks
# bash SEACR_1.4.sh aln.bedgraph 0.01 non stringent target_peaks
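The SEACR mode choice can be encoded as a small dispatcher. This sketch assumes the decision rests on only two questions: does an IgG bedGraph exist, and were target and IgG already scaled to each other upstream? `seacr_args` is a hypothetical helper. It only prints arguments in the positional order used by the commands above.

```shell
# seacr_args TARGET_BG CONTROL SPIKE_SCALED (hypothetical helper)
#   CONTROL      IgG bedGraph path, or "" when no IgG control exists
#   SPIKE_SCALED "yes" if target and IgG were already scaled to each other upstream
# Prints the SEACR arguments that precede the output prefix.
seacr_args() (
  target=$1 control=$2 spike_scaled=$3
  if [ -z "$control" ]; then
    # No IgG: numeric threshold (top 1% of regions by total signal), no normalization
    echo "$target 0.01 non stringent"
  elif [ "$spike_scaled" = yes ]; then
    # Upstream spike-in scaling already applied: skip SEACR normalization
    echo "$target $control non stringent"
  else
    # Recommended default: normalize IgG to target, stringent threshold
    echo "$target $control norm stringent"
  fi
)
```

For example, `bash SEACR_1.4.sh $(seacr_args aln.bedgraph igg.bedgraph no) target_peaks` reproduces step 5. The substitution is left unquoted so that it splits into separate arguments, which assumes file paths without spaces.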
SEACR mode selection:
- norm (recommended): normalizes the IgG control to the target signal before thresholding
- non: skips that normalization; use ONLY if target and IgG were already scaled to each other upstream (e.g., spike-in normalization); otherwise use norm
- stringent: threshold at the peak of SEACR's total-signal curve (recommended default)
- relaxed: lower threshold, between the knee and the peak of that curve, keeping more, weaker peaks (use only for very sparse signal)
- bash SEACR_1.4.sh target.bg igg.bg norm stringent out_prefix -> writes out_prefix.stringent.bed
- bash SEACR_1.4.sh target.bg 0.01 non stringent out_prefix -> writes out_prefix.stringent.bed

CUT&RUN/CUT&Tag spike-in is "free" because the pA-MNase or pA-Tn5 fusion protein is produced in E. coli and carries over E. coli DNA. Carryover varies between production batches but is stable within a batch.
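# Align reads to a combined hg38 + E. coli index (or align to each genome separately)
bowtie2 -x hg38_ecoli_combined -1 R1.fq -2 R2.fq -S aln.sam
# Region queries need a coordinate-sorted, indexed BAM
samtools sort -o aln.bam aln.sam
samtools index aln.bam
# Count E. coli reads per sample (ecoli_chr1 = the E. coli contig name in the combined reference)
ECOLI_READS=$(samtools view -c -F 4 aln.bam ecoli_chr1)
# Scaling factor: smallest E. coli read count across samples / this sample's E. coli read count
# Apply BEFORE peak calling for cross-condition comparison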
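The scaling rule in the comments above can be made concrete with awk. This minimal sketch assumes a two-column table of per-sample E. coli counts, for example collected from the `ECOLI_READS` values above. Both function names are hypothetical. The factor is then applied to the signal column of each sample's bedGraph; `bedtools genomecov -bg -scale FACTOR` does the same while building the bedGraph.

```shell
# ecoli_scale_factors (hypothetical helper)
# stdin : "sample ecoli_count" per line (counts must be > 0)
# stdout: "sample factor" with factor = smallest count / this sample's count,
#         so the sample with the fewest E. coli reads keeps factor 1.
ecoli_scale_factors() {
  awk '{ name[NR] = $1; cnt[NR] = $2 + 0
         if (NR == 1 || cnt[NR] < min) min = cnt[NR] }
       END { for (i = 1; i <= NR; i++) printf "%s %.4f\n", name[i], min / cnt[i] }'
}

# scale_bedgraph FACTOR (hypothetical helper): multiply the signal column (4)
# of a bedGraph read on stdin by FACTOR; tab-separated output.
scale_bedgraph() {
  awk -v f="$1" 'BEGIN { OFS = "\t" } { $4 = $4 * f; print }'
}
```

Once target and IgG bedGraphs are each scaled by their own factor, they share a common scale. This is the situation in which SEACR's `non` mode is appropriate.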
Expected E. coli alignment fractions:
The carryover is variable between batches of bacterial production; single-experiment carryover spike-in is noisier than deliberate Drosophila spike-in (ChIP-Rx). For high-stakes cross-condition claims, add deliberate Drosophila spike-in despite the E. coli carryover. See chip-seq/spike-in-normalization.
| Metric | Traditional ChIP | CUT&RUN/CUT&Tag |
|--------|------------------|------------------|
| FRiP (TF) | > 0.05 | > 0.10 (often > 0.25) |
| FRiP (histone) | > 0.10 | > 0.25 |
| Library size requirement | 20-50M reads | 3-10M reads (often sufficient) |
| Input control | Required | IgG only (input is not meaningful) |
| Fragment size (CUT&Tag) | Sub-nucleosomal or mono-nucleosomal | Sharp peak at 25-75 bp (Tn5 staggered cuts) |
| Fragment size (CUT&RUN) | Variable | Mono- + di-nucleosomal pattern |
| Duplicates | Remove (MarkDuplicates) | Keep for CUT&Tag (identical Tn5 insertion sites are common; many "duplicates" are real fragments) |
| Spike-in alignment | Deliberate (Drosophila); 0.5-5% | Automatic E. coli carryover; 0.5-5% |
| Cell input | 1-10M | 5,000-100,000 |
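The FRiP rows of this table can be applied with a simple threshold check. This sketch assumes the fragment counts are computed elsewhere; the comment shows one way using `bedtools intersect -u`. `frip_check` is hypothetical and encodes only the CUT&RUN/CUT&Tag column.

```shell
# frip_check IN_PEAKS TOTAL MARK (hypothetical helper)
#   IN_PEAKS  fragments overlapping peaks, e.g.
#             bedtools intersect -u -a aln.fragments.bed -b target_peaks.stringent.bed | wc -l
#   TOTAL     all fragments (e.g. wc -l < aln.fragments.bed); must be > 0
#   MARK      "histone" or "tf" (anything else is treated as tf)
# Compares FRiP with the CUT&RUN/CUT&Tag thresholds from the table:
# > 0.25 for histone marks, > 0.10 for TFs.
frip_check() {
  awk -v hit="$1" -v tot="$2" -v mark="$3" 'BEGIN {
    min = (mark == "histone") ? 0.25 : 0.10
    frip = hit / tot
    printf "FRiP=%.3f %s\n", frip, (frip > min ? "PASS" : "LOW")
  }'
}
```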
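Critical: use --keep-dup all in MACS2 for CUT&Tag. Tethered Tn5 inserts at preferred accessible positions, so independent fragments at high-coverage TF binding sites often share identical ends; these apparent duplicates are frequently genuine fragments rather than PCR copies. MACS2's default (--keep-dup 1) and the ChIP-style --keep-dup auto will over-deduplicate CUT&Tag data.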
samtools view -f 0x2 sample.bam | awk '{print $9}' | awk '$1>0' \
| sort -n | uniq -c | awk '{print $2, $1}' > frag_sizes.tsv
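To compare samples without eyeballing the histogram, the counts can be collapsed into coarse bins. This is only a sketch. The bin edges (120 bp and 300 bp) are illustrative assumptions rather than values from this skill. `frag_size_summary` is a hypothetical helper that reads the `size count` format written to frag_sizes.tsv above.

```shell
# frag_size_summary (hypothetical helper)
# stdin : "size count" lines, as written to frag_sizes.tsv
# stdout: fraction of fragments in three bins. The cutoffs (< 120 bp
#         sub-nucleosomal, 120-300 bp mono-nucleosomal, > 300 bp di-nucleosomal
#         and longer) are illustrative assumptions, not values from this skill.
frag_size_summary() {
  awk '{ tot += $2
         if ($1 < 120) sub_n += $2; else if ($1 <= 300) mono += $2; else di += $2 }
       END { if (tot == 0) exit 1
             printf "sub=%.3f mono=%.3f di_plus=%.3f\n", sub_n / tot, mono / tot, di / tot }'
}
```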
Expected for CUT&Tag:
Expected for CUT&RUN:
Trigger: Using non mode without prior spike-in normalization; using relaxed mode on standard CUT&Tag.
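Mechanism: non skips SEACR's normalization of the IgG control to the target, which is only safe when the two were already scaled to each other (typical for ChIP-Rx-style spike-in). On raw counts, depth differences between target and IgG distort the threshold and can inflate false positives. relaxed uses a lower threshold (between the knee and the peak of the total-signal curve) and so keeps weaker peaks; it is appropriate only for sparse signal.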
Fix: Default to norm stringent with IgG control. Use non only when upstream spike-in scaling has been applied.
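--keep-dup auto removes biology
Trigger: Using MACS2's default dedup setting (--keep-dup 1) or the ChIP-style --keep-dup auto on CUT&Tag.
Mechanism: Tethered Tn5 inserts at preferred accessible positions near the antibody, so independent fragments at high-coverage sites often share identical ends; deduplication discards these real fragments as if they were PCR copies.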
Symptom: Peak counts much lower than published for same antibody / cell line.
Fix: macs2 callpeak --keep-dup all -f BAMPE for CUT&Tag. For CUT&RUN, dedup behavior depends on PCR cycles — verify with library complexity (NRF).
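NRF can be computed directly from the fragment BED built in the SEACR steps. This minimal sketch assumes the common definition of NRF: distinct fragment positions divided by total fragments. `nrf` is a hypothetical helper. For CUT&Tag, a modest NRF is not by itself proof of PCR duplication, because identical Tn5 insertion sites can produce genuinely identical fragments.

```shell
# nrf (hypothetical helper)
# stdin : fragment BED (chrom, start, end), e.g. aln.fragments.bed
# stdout: NRF = distinct fragments / total fragments, where two fragments
#         count as the same if chrom, start and end all match
nrf() {
  awk '{ tot++; if (!seen[$1, $2, $3]++) uniq++ }
       END { if (tot == 0) exit 1; printf "%.3f\n", uniq / tot }'
}
```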
Trigger: Using pA-Tn5 (Henikoff original) with mouse primary antibody.
Mechanism: Protein A binds rabbit IgG much better than mouse IgG; mouse antibodies give weak signal with pA-Tn5.
Fix: Use pAG-Tn5 (binds both); or switch to a rabbit primary antibody for the same target.
Trigger: Default 0.05% digitonin on all cell lines.
Mechanism: Optimal digitonin varies by cell type (some need 0.02%, some 0.1%); over-permeabilization releases chromatin into supernatant; under-permeabilization prevents antibody access.
Symptom: Inconsistent signal across cell lines; high IgG signal (under-permeabilized) or low target signal (over-permeabilized).
Fix: Titrate digitonin per cell line using a known-positive H3K4me3 antibody as control.
Trigger: Switching bead type between protocols without adjusting volume.
Mechanism: ConA magnetic beads (e.g., Bangs) and sepharose ConA have different binding capacities; protocols designed for one give wrong cell loading for the other.
Fix: Follow Henikoff lab protocol exactly for the chosen bead; or titrate cell number per bead volume.
Trigger: 100-150 bp reads on 25-75 bp CUT&Tag fragments.
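Mechanism: When a read is longer than its fragment, it runs past the end of the insert into the adapter on the other side. The untrimmed adapter bases make the read pair align poorly or not at all, so the fragment is lost.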
Symptom: Many reads with adapter sequence at 3' end; alignment rate drops.
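Fix: Trim adapters aggressively with cutadapt: -e 0.1 -O 5 --minimum-length 25, with -a/-A set to the library's adapter sequence (CTGTCTCTTATACACATCT for Tn5-based CUT&Tag libraries). Use 50 bp paired-end sequencing for CUT&Tag instead of 150 bp.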
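The fragment-size histogram gives an estimate of how much read-through a given read length will cause. This sketch assumes the `size count` format of frag_sizes.tsv and treats every fragment shorter than the read length as running into adapter. `readthrough_fraction` is hypothetical. frag_sizes.tsv only contains fragments that aligned, so use this to choose the read length for the next run rather than to quantify losses in the current one.

```shell
# readthrough_fraction READ_LEN (hypothetical helper)
# stdin : "size count" lines, as in frag_sizes.tsv
# stdout: fraction of fragments shorter than READ_LEN; for these, each read runs
#         past the end of the insert into adapter sequence
readthrough_fraction() {
  awk -v rl="$1" '{ tot += $2; if ($1 < rl + 0) short += $2 }
                  END { if (tot == 0) exit 1; printf "%.3f\n", short / tot }'
}
```

For example, `readthrough_fraction 150 < frag_sizes.tsv` and `readthrough_fraction 50 < frag_sizes.tsv` show how much a switch from 150 bp to 50 bp reads would reduce read-through.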
Trigger: Running MACS2 without -f BAMPE on CUT&Tag PE data.
Mechanism: In -f BAM mode, MACS2 builds a fragment-size model from paired plus- and minus-strand peaks. CUT&Tag fragments are 25-75 bp, not the ~200 bp ChIP fragments the model expects, so model building fails or produces a wrong estimate.
Fix: Always use -f BAMPE for CUT&Tag; MACS uses actual fragment spans from mate pairs.
| Pattern | Likely cause | Action |
|---------|--------------|--------|
| Peak count much lower in CUT&Tag vs ChIP | Lower background reveals signal vs noise; CUT&Tag often has FEWER but cleaner peaks | Both correct; CUT&Tag specificity > sensitivity |
| Peak count much higher in CUT&Tag vs ChIP | --keep-dup all retained PCR duplicates as peaks | Verify NRF; if low, consider deduplicating with caution |
| Same antibody, different signal | Native chromatin vs cross-linked accessibility differs | Native CUT&RUN may miss DSG-dependent cofactors (BRD4); add brief fixation |
| FRiP very high (>50%) | Likely real for CUT&Tag (low background); confirm with motif enrichment | Verify motif enrichment at peaks; if missing, suspect technical artifact |
| H3K4me3 CUT&Tag peak count differs from ChIP | Expected; CUT&Tag has higher specificity | Trust CUT&Tag for sharp marks |
| H3K27me3 CUT&RUN/Tag misses regions | Broad domains require deeper sequencing; CUT&Tag was designed for sharp marks | Use CUT&RUN or traditional ChIP for very broad marks |
| Error / symptom | Cause | Solution |
|-----------------|-------|----------|
| SEACR "input file not bedgraph" | Wrong file format | Use bedtools genomecov -bg; not bedGraphToBigWig output |
| MACS2 modeling fails on CUT&Tag | Default -f BAM on PE | -f BAMPE |
| Low peak count with a mouse primary antibody | Used pA-Tn5, not pAG-Tn5 | Switch to pAG-Tn5 OR use a rabbit primary |
| Very high adapter content in FASTQ | Read length > fragment length | Trim aggressively; consider 50 bp PE for CUT&Tag |
| Sample-to-sample carryover variability >5x | E. coli carryover variable between bacterial production batches | Add deliberate Drosophila spike-in for cross-condition |
| IgG signal as strong as target | Failed antibody / over-permeabilization | Validate antibody on positive control; titrate digitonin |
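Check the >5x carryover row per batch before relying on E. coli scaling. This sketch assumes a three-column table: sample, E. coli reads, total reads. It compares E. coli fractions rather than raw counts, so that sequencing depth alone does not trigger the flag. `carryover_check` is hypothetical. A large ratio can also reflect real differences in target yield, which is what a spike-in is meant to capture. Treat HIGH as a prompt to check batch records, not as proof of carryover variation.

```shell
# carryover_check (hypothetical helper)
# stdin : "sample ecoli_reads total_reads" per line (both counts > 0)
# stdout: each sample's E. coli fraction, then the max/min ratio of those
#         fractions, flagged HIGH when it exceeds 5
carryover_check() {
  awk '{ f = $2 / $3
         printf "%s %.4f\n", $1, f
         if (NR == 1 || f < lo) lo = f
         if (NR == 1 || f > hi) hi = f }
       END { if (NR == 0) exit 1
             r = hi / lo
             printf "max/min=%.2f %s\n", r, (r > 5 ? "HIGH" : "OK") }'
}
```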