skills/genomics-bioinformatics/macs3-peak-calling/SKILL.md
Poisson-model peak caller for ChIP-seq/ATAC-seq BAMs. MACS3 callpeak finds enriched regions (TF sites or histone marks) vs input/IgG; outputs BED narrowPeak/broadPeak for motif analysis, annotation, and differential binding. Use narrow peaks for TF ChIP-seq and ATAC-seq; broad for H3K27me3, H3K9me3, and other broad marks.
npx skillsauth add jaechang-hits/sciagent-skills macs3-peak-callingInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
MACS3 (Model-based Analysis of ChIP-seq) identifies regions of significant read enrichment (peaks) from ChIP-seq, ATAC-seq, CUT&RUN, and CUT&TAG experiments. It models the fragment length distribution from paired-end data or estimates it from mono-nucleosomal read shifting in single-end data, then applies a Poisson model to identify fold-enrichment over an input/IgG control. MACS3 produces BED-format narrowPeak (for transcription factors) or broadPeak (for histone marks) files with signal and q-value tracks for visualization in IGV or UCSC Genome Browser.
--nomodel --extsize 200 or let MACS3 estimate fragment length)--nomodel --shift -100 --extsize 200 -f BAMPE)--broad-B --SPMRmacs3 (Python ≥ 3.8)Check before installing: The tool may already be available in the current environment (e.g., inside a
pixi/condaenv). Runcommand -v macs3first and skip the install commands below if it returns a path. When running inside a pixi project, invoke the tool viapixi run macs3rather than baremacs3.
# Install with pip or conda
pip install macs3
# or
conda install -c bioconda macs3
# Verify
macs3 --version
# macs3 3.0.2
# Call peaks for TF ChIP-seq (narrow peaks, with input control)
macs3 callpeak \
-t chip.bam \
-c input.bam \
-f BAM \
-g hs \
-n sample_tf \
--outdir peaks/ \
-q 0.05
# Output: peaks/sample_tf_peaks.narrowPeak
wc -l peaks/sample_tf_peaks.narrowPeak
MACS3 requires sorted, indexed BAM files from genome alignment.
# Sort and index ChIP and control BAMs (if not already done)
samtools sort -@ 8 chip_raw.bam -o chip.bam
samtools sort -@ 8 input_raw.bam -o input.bam
samtools index chip.bam
samtools index input.bam
# Check read counts
echo "ChIP reads: $(samtools view -c -F 4 chip.bam)"
echo "Input reads: $(samtools view -c -F 4 input.bam)"
Use the default mode for transcription factor binding site identification.
# TF ChIP-seq with input control
macs3 callpeak \
-t chip.bam \
-c input.bam \
-f BAM \
-g hs \
-n tf_chip \
--outdir peaks/ \
-q 0.05 \
--keep-dup auto
echo "Peaks called: $(wc -l < peaks/tf_chip_peaks.narrowPeak)"
echo "Summit file: peaks/tf_chip_summits.bed"
# Without input control (less recommended)
macs3 callpeak \
-t chip.bam \
-f BAM \
-g hs \
-n tf_noinput \
--outdir peaks/ \
--nolambda
Use --broad for spread histone modifications like H3K27me3 or H3K36me3.
# H3K27me3 broad histone mark
macs3 callpeak \
-t h3k27me3.bam \
-c input.bam \
-f BAM \
-g hs \
-n h3k27me3 \
--outdir peaks/ \
--broad \
--broad-cutoff 0.1 \
-q 0.05
echo "Broad peaks: $(wc -l < peaks/h3k27me3_peaks.broadPeak)"
# H3K4me3 (sharp mark — use narrow peaks)
macs3 callpeak \
-t h3k4me3.bam \
-c input.bam \
-f BAM \
-g hs \
-n h3k4me3 \
--outdir peaks/ \
-q 0.05
ATAC-seq requires special handling for the Tn5 insertion site.
# ATAC-seq with paired-end BAM (recommended)
macs3 callpeak \
-t atac.bam \
-f BAMPE \
-g hs \
-n atac_sample \
--outdir peaks/ \
--nomodel \
--nolambda \
-q 0.05 \
--keep-dup all
echo "ATAC peaks: $(wc -l < peaks/atac_sample_peaks.narrowPeak)"
# Single-end ATAC-seq: shift reads to center on Tn5 cut site
macs3 callpeak \
-t atac_se.bam \
-f BAM \
-g hs \
-n atac_se \
--outdir peaks/ \
--nomodel \
--shift -100 \
--extsize 200 \
--keep-dup all
Produce bedGraph and bigWig files for genome browser visualization.
# Generate bedGraph normalized to million reads (SPMR)
macs3 callpeak \
-t chip.bam \
-c input.bam \
-f BAM \
-g hs \
-n chip_track \
--outdir tracks/ \
-B \
--SPMR \
--keep-dup auto
# Convert bedGraph to bigWig for IGV/UCSC
# Requires bedGraphToBigWig and chrom.sizes
sort -k1,1 -k2,2n tracks/chip_track_treat_pileup.bdg > tracks/chip_sorted.bdg
bedGraphToBigWig tracks/chip_sorted.bdg genome/hg38.chrom.sizes tracks/chip.bw
echo "BigWig track: tracks/chip.bw"
Parse narrowPeak output and annotate peaks to genomic features.
import pandas as pd
# Load narrowPeak file
# Columns: chrom, start, end, name, score, strand, signalValue, pValue, qValue, peak
cols = ["chrom", "start", "end", "name", "score", "strand",
"signalValue", "pValue", "qValue", "peak"]
peaks = pd.read_csv("peaks/tf_chip_peaks.narrowPeak", sep="\t",
header=None, names=cols)
print(f"Total peaks: {len(peaks)}")
print(f"Peaks on chr1: {(peaks['chrom'] == 'chr1').sum()}")
print(f"Median peak width: {(peaks['end'] - peaks['start']).median():.0f} bp")
print(f"Peaks with q-value < 0.01: {(peaks['qValue'] > 2).sum()}") # -log10(q) > 2
# Filter high-confidence peaks
high_conf = peaks[peaks["qValue"] > 2].copy() # q < 0.01
high_conf["width"] = high_conf["end"] - high_conf["start"]
print(f"\nHigh-confidence peaks: {len(high_conf)}")
high_conf.to_csv("high_confidence_peaks.bed", sep="\t", index=False, header=False,
columns=["chrom", "start", "end", "name", "score", "strand"])
| Parameter | Default | Range/Options | Effect |
|-----------|---------|---------------|--------|
| -t / --treatment | required | BAM/BED/SAM | ChIP or ATAC treatment file |
| -c / --control | — | BAM/BED/SAM | Input/IgG control; omit --nolambda if absent |
| -g / --gsize | required | hs, mm, ce, dm, or integer | Effective genome size; hs=2.7e9 (human), mm=1.87e9 (mouse) |
| -q / --qvalue | 0.05 | 0–1 | FDR threshold for peak calling |
| -p / --pvalue | — | 0–1 | P-value cutoff (use instead of q-value for strict control) |
| --broad | off | flag | Call broad peaks for diffuse histone marks |
| --broad-cutoff | 0.1 | 0–1 | Q-value cutoff for broad region merging |
| --nomodel | off | flag | Skip fragment length modeling; required for ATAC-seq |
| --extsize | 200 | 50–1000 | Fragment extension size when --nomodel is set |
| --shift | 0 | -500–500 | Read shift in bp; use -100 with --extsize 200 for ATAC-seq |
| --keep-dup | 1 | auto, all, integer | Duplicate handling; auto uses Poisson model, all keeps all (ATAC-seq) |
| -B / --bdg | off | flag | Write bedGraph signal tracks |
| --SPMR | off | flag | Normalize bedGraph to signal per million reads |
#!/bin/bash
# Call peaks for multiple ChIP-seq samples with the same input
INPUT="input.bam"
GENOME="hs"
OUTDIR="peaks"
mkdir -p "$OUTDIR"
SAMPLES=(H3K4me3 H3K27ac H3K27me3 CTCF)
MODES=(narrow narrow broad narrow)
for i in "${!SAMPLES[@]}"; do
sample="${SAMPLES[$i]}"
mode="${MODES[$i]}"
echo "Calling peaks: $sample ($mode)"
if [ "$mode" == "broad" ]; then
BROAD_FLAG="--broad --broad-cutoff 0.1"
else
BROAD_FLAG=""
fi
macs3 callpeak \
-t "${sample}.bam" \
-c "$INPUT" \
-f BAM \
-g "$GENOME" \
-n "$sample" \
--outdir "$OUTDIR" \
$BROAD_FLAG \
-q 0.05 \
--keep-dup auto
echo "$sample: $(wc -l < $OUTDIR/${sample}_peaks.*Peak) peaks"
done
# Call peaks on individual replicates (lenient thresholds for IDR)
for rep in rep1 rep2; do
macs3 callpeak \
-t "chip_${rep}.bam" \
-c input.bam \
-f BAM \
-g hs \
-n "tf_${rep}" \
--outdir peaks/ \
-p 0.1 \
--keep-dup auto
done
# Run IDR to find reproducible peaks
# pip install idr
idr --samples peaks/tf_rep1_peaks.narrowPeak peaks/tf_rep2_peaks.narrowPeak \
--input-file-type narrowPeak \
--output-file peaks/tf_idr_peaks.txt \
--idr-threshold 0.05 \
--plot
echo "IDR peaks: $(wc -l < peaks/tf_idr_peaks.txt)"
| Output | Format | Description |
|--------|--------|-------------|
| *_peaks.narrowPeak | BED6+4 | Narrow peaks with signal, p-value, q-value, summit offset |
| *_peaks.broadPeak | BED6+3 | Broad peaks (when --broad): chrom, start, end, signal, p-val, q-val |
| *_summits.bed | BED3+2 | Peak summit positions (1 bp) with score; use for motif analysis |
| *_treat_pileup.bdg | bedGraph | Treatment signal track (when -B) |
| *_control_lambda.bdg | bedGraph | Control/local lambda track (when -B) |
| *_model.r | R script | Fragment size model; run Rscript *_model.r to plot |
| Problem | Cause | Solution |
|---------|-------|----------|
| Very few peaks called | Stringent q-value or low read depth | Relax to -p 1e-3; check sequencing depth (≥10M aligned reads recommended) |
| Too many peaks (>100k) | Threshold too loose or no input control | Add --control input.bam; use -q 0.01; filter on signalValue |
| Peak calling fails with "no reads" | BAM file is not sorted or indexed | Run samtools sort and samtools index before MACS3 |
| ATAC-seq peaks in mitochondria | High mtDNA content | Filter: samtools view -h chip.bam | grep -v chrM | samtools view -bS > filtered.bam |
| Fragment model fails | Too few reads or unusual read length | Add --nomodel --extsize 200 to skip modeling |
| bedGraph output very large | High coverage data without normalization | Add --SPMR to normalize to signal per million reads |
| --broad misses narrow peaks | Signal is actually sharp | Check ChIP target: TFs and H3K4me3 need narrow mode |
| gsize mismatch | Using wrong genome size for assembly | Use hs for hg19/hg38, mm for mm9/mm10; or provide exact integer |
tools
Fast short-read DNA aligner for WGS/WES/ChIP-seq. 2× faster BWA-MEM successor; outputs SAM/BAM with read group headers for GATK. Primary plus supplementary records for chimeric reads. Use STAR for RNA-seq splice-aware alignment; Bowtie2 is a comparable alternative.
tools
smina molecular docking CLI. AutoDock Vina fork with customizable scoring functions, native SDF/MOL2/PDB ligand input, autoboxing, local energy minimization, and per-atom score breakdowns. Pipeline: receptor PDBQT prep -> ligand prep (RDKit/OpenBabel) -> dock via autobox or explicit grid -> rescore/minimize with custom scoring -> rank poses by affinity. Choose smina over Vina when you need custom scoring terms (--custom_scoring), local optimization of an existing pose (--local_only), per-atom contributions (--atom_term_data), or SDF/MOL2 ligands without manual PDBQT conversion. For unknown binding sites use diffdock-blind-docking; for the Python-bindings/Vinardo workflow use autodock-vina-docking.
development
mdtraj molecular dynamics trajectory analysis (Python). Reads DCD/XTC/TRR/NetCDF/H5/PDB topologies and trajectories; computes RMSD vs time, radius of gyration, per-residue RMSF, residue-residue contact frequency maps, phi/psi torsions for Ramachandran plots (general + Gly/Pro), and 8-state DSSP secondary structure. Modules: trajectory I/O, geometry (distances/angles/dihedrals), structural analysis (RMSD/Rg/RMSF/SASA), contacts, hydrogen bonds, secondary structure (DSSP), NMR observables. For broader atom-selection grammar use mdanalysis-trajectory; for running MD simulations use OpenMM/GROMACS.
development
Programmatic PubMed access via NCBI E-utilities REST API. Covers Boolean/MeSH queries, field-tagged search, endpoints (ESearch, EFetch, ESummary, EPost, ELink), history server for batches, citation matching, systematic review strategies. Use for biomedical literature search or automated pipelines.