9.TF-differential-binding/SKILL.md
The TF-differential-binding pipeline performs differential transcription factor (TF) binding analysis on ChIP-seq datasets (TF peaks) using the DiffBind package in R. It identifies genomic regions where TF binding intensity differs significantly between experimental conditions (e.g., treatment vs. control, mutant vs. wild-type). Use the TF-differential-binding pipeline when you need to compare how the same TF binds across two or more biological conditions, cell types, or treatments using ChIP-seq data or TF binding peaks. This pipeline is ideal for studying regulatory mechanisms that underlie transcriptional differences or epigenetic responses to perturbations.
This skill enables comprehensive differential TF binding analysis using DiffBind in R. DiffBind integrates read counting, normalization, and statistical modeling to identify differentially bound peaks between conditions.
To perform DiffBind differential binding analysis:
- Set up `${proj_dir}` in Step 0.
- Build the DBA object from the sample sheet.

Use the TF-differential-binding pipeline when you need to compare how the same TF binds across two or more biological conditions, cell types, or treatments using ChIP-seq data or TF binding peaks. This pipeline is ideal for studying regulatory mechanisms that underlie transcriptional differences or epigenetic responses to perturbations.
Recommended applications include:
${sample}_TF_DB_analysis/
    DBs/
        DB_results.csv    # DESeq2 results (log2FC, p-values)
        DB_up.bed
        DB_down.bed
    plots/                # visualization outputs
        PCA.pdf
        volcano.pdf
        heatmap.pdf
    logs/                 # analysis logs
    temp/                 # other temp files
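If this layout ever has to be created by hand, a minimal base-R sketch could look like the following. The helper name `make_output_dirs` and the use of `tempdir()` as the root are assumptions made for illustration; only the directory names come from the tree above.

```r
# Hypothetical helper: create the output layout for one comparison.
# Directory names follow the tree above; everything else is illustrative.
make_output_dirs <- function(sample, root = ".") {
  base <- file.path(root, paste0(sample, "_TF_DB_analysis"))
  subdirs <- file.path(base, c("DBs", "plots", "logs", "temp"))
  for (d in subdirs) {
    # recursive = TRUE also creates the parent directory if needed
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(base)
}

# Example: build the layout under a temporary directory
out_dir <- make_output_dirs("c1_vs_c2", root = tempdir())
print(list.files(out_dir))
```

Returning the base path makes it easy to build output file names later with `file.path(out_dir, "DBs", "DB_results.csv")`.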
Call:
`mcp__project-init-tools__project_init`
with:
- sample: sample name (e.g. c1_vs_c2)
- task: TF_DB

The tool will:
- Create the `${sample}_TF_DB` directory.
- Return the path of the `${sample}_TF_DB` directory, which will be used as `${proj_dir}`.

Create a CSV sample sheet (samplesheet.csv) with the following columns:
| SampleID | Tissue | Factor | Condition | bamReads | Peaks | PeakCaller |
|----------|--------|--------|-----------|--------------|---------------------------|------------|
| TF_A_1 | A | TF | Control | Control1.bam | Control1_peaks.narrowPeak | narrow |
| TF_A_2 | A | TF | Control | Control2.bam | Control2_peaks.narrowPeak | narrow |
| TF_B_1 | A | TF | Treated | Treated1.bam | Treated1_peaks.narrowPeak | narrow |
| TF_B_2 | A | TF | Treated | Treated2.bam | Treated2_peaks.narrowPeak | narrow |
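Many sample sheet problems can be caught before DiffBind is loaded. The following base-R sketch checks for the columns listed in the table, unique sample IDs, at least two samples per condition, and (optionally) that the referenced files exist. The helper `check_samplesheet` and its arguments are hypothetical and not part of DiffBind.

```r
# Columns taken from the sample sheet table above.
required_cols <- c("SampleID", "Tissue", "Factor", "Condition",
                   "bamReads", "Peaks", "PeakCaller")

# Hypothetical helper: returns a character vector of problems (empty if none).
check_samplesheet <- function(samples, min_members = 2, check_files = TRUE) {
  problems <- character(0)
  missing_cols <- setdiff(required_cols, colnames(samples))
  if (length(missing_cols) > 0) {
    # Later checks need these columns, so stop here
    return(paste("Missing columns:", paste(missing_cols, collapse = ", ")))
  }
  if (anyDuplicated(samples$SampleID) > 0) {
    problems <- c(problems, "SampleID values are not unique")
  }
  n_per_cond <- table(samples$Condition)
  small <- names(n_per_cond)[as.vector(n_per_cond) < min_members]
  if (length(small) > 0) {
    problems <- c(problems,
                  paste("Fewer than", min_members, "samples in condition(s):",
                        paste(small, collapse = ", ")))
  }
  if (check_files) {
    paths <- c(as.character(samples$bamReads), as.character(samples$Peaks))
    absent <- paths[!file.exists(paths)]
    if (length(absent) > 0) {
      problems <- c(problems,
                    paste("Files not found:", paste(absent, collapse = ", ")))
    }
  }
  problems
}

# Example with the four samples from the table (file checks disabled,
# because the BAM and peak files do not exist in this sketch)
samples <- data.frame(
  SampleID   = c("TF_A_1", "TF_A_2", "TF_B_1", "TF_B_2"),
  Tissue     = "A",
  Factor     = "TF",
  Condition  = c("Control", "Control", "Treated", "Treated"),
  bamReads   = c("Control1.bam", "Control2.bam", "Treated1.bam", "Treated2.bam"),
  Peaks      = c("Control1_peaks.narrowPeak", "Control2_peaks.narrowPeak",
                 "Treated1_peaks.narrowPeak", "Treated2_peaks.narrowPeak"),
  PeakCaller = "narrow",
  stringsAsFactors = FALSE
)
print(check_samplesheet(samples, check_files = FALSE))
```

In a real run, `check_files = TRUE` would be the more useful setting, since a mistyped BAM path otherwise only surfaces during read counting.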
library(DiffBind)
samples <- read.csv("samplesheet.csv")
dbObj <- dba(sampleSheet=samples)
Key parameters:
- sampleSheet: CSV file with BAM and peak information

Count reads overlapping consensus peaks across samples:
# Generate a consensus peakset
dbObj <- dba.count(dbObj, summits=250)
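To make the effect of `summits=250` in the call above concrete: each consensus peak is re-centered on its summit and extended 250 bp in both directions, so every interval ends up 501 bp wide. The sketch below shows only that interval arithmetic. The helper `recenter_on_summit` and the summit positions are made up for illustration; `dba.count()` itself locates the summits from the reads.

```r
# Hypothetical helper illustrating the interval arithmetic only;
# dba.count() does the real work, including finding each summit.
recenter_on_summit <- function(summit, half_width = 250) {
  data.frame(start = summit - half_width,
             end   = summit + half_width,
             width = 2 * half_width + 1)  # 1-based, inclusive coordinates
}

# Three made-up summit positions
recentered <- recenter_on_summit(c(10500, 48210, 99875))
print(recentered)
```

Because every peak gets the same width, read counts become comparable across peaks that the peak caller originally reported with very different lengths.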
Notes:
- summits: re-centers each peak on its summit and extends it ±250 bp, so that all peaks have a consistent width.

Define conditions for comparison:
# Define experimental contrasts (e.g., Treated vs Control)
dbObj <- dba.contrast(dbObj, categories=DBA_CONDITION, minMembers=2)
Alternatives:
- DBA_TISSUE, DBA_TREATMENT, or custom metadata.

dba.show(dbObj, bContrasts=TRUE)
# Perform analysis
dbObj <- dba.analyze(dbObj, method=DBA_DESEQ2)
Parameters:
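- method: choose DBA_DESEQ2 (default) or DBA_EDGER
- th: FDR threshold (default 0.05)
- fold: minimum log2 fold change
- bUsePval=TRUE: use p-values instead of FDR cutoff

The th, fold, and bUsePval settings are applied when results are retrieved with dba.report().

# Correlation heatmap
dba.plotHeatmap(dbObj, correlations=TRUE, scale="row")
# PCA plot
dba.plotPCA(dbObj, attributes=DBA_CONDITION, label=DBA_ID)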
# Volcano plot
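allResults <- dba.report(dbObj, method=DBA_DESEQ2, th=1)  # th=1 returns all sites, not only significant ones
resDF <- as.data.frame(allResults)  # convert GRanges to a data frame so columns can be used directly
with(resDF, plot(Fold, -log10(FDR),
     col=ifelse(FDR < 0.05 & abs(Fold) > 1, "red", "grey"),
     pch=16, main="Volcano Plot"))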
Output: heatmap.pdf, volcano.pdf, PCA.pdf
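The plotting calls draw to the active graphics device, so a PDF device has to be open for these files to be written. One way to arrange this is a small wrapper like the sketch below. The helper `save_pdf` is hypothetical, and a stand-in base plot is used so that the example runs without DiffBind.

```r
# Hypothetical helper: run a plotting function with a PDF device open,
# closing the device even if the plotting code fails.
save_pdf <- function(path, plot_fn, width = 7, height = 7) {
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
  pdf(path, width = width, height = height)
  on.exit(dev.off(), add = TRUE)
  plot_fn()
  invisible(path)
}

# Stand-in plot with made-up values so the sketch runs without DiffBind
plots_dir <- file.path(tempdir(), "plots")
save_pdf(file.path(plots_dir, "volcano.pdf"), function() {
  plot(c(-2, 0, 2), c(3, 0.1, 4), pch = 16,
       xlab = "Fold", ylab = "-log10(FDR)", main = "Volcano Plot")
})

# With a DBA object, the same helper could wrap the calls shown above, e.g.
# save_pdf(file.path(plots_dir, "PCA.pdf"),
#          function() dba.plotPCA(dbObj, attributes = DBA_CONDITION, label = DBA_ID))
# If a plotting function returns a plot object instead of drawing it,
# wrap the call in print().
```

Using `on.exit()` avoids leaving a half-written PDF device open when a plot call throws an error partway through.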
Export significant differential peaks:
write.csv(as.data.frame(allResults), "DB_results.csv", row.names = FALSE)
library(rtracklayer)
# Extract results with FDR < 0.05 and |log2FC| > 1
sigSites <- dba.report(dbObj, method=DBA_DESEQ2, th=0.05, fold=1)
print("Differential binding results summary:")
print(summary(sigSites))
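# Split the significant peaks by direction of change.
# Fold > 0 means stronger binding in the first group of the contrast, so confirm
# with dba.show(dbObj, bContrasts=TRUE) which group is the treated condition.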
diff_up <- sigSites[sigSites$Fold > 0]
diff_down <- sigSites[sigSites$Fold < 0]
export(diff_up, "DB_up_${treated_condition}.bed")
export(diff_down, "DB_down_${treated_condition}.bed")
Output: DB_results.csv, DB_up_${treated_condition}.bed, DB_down_${treated_condition}.bed
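When exporting through `rtracklayer` is not an option, BED files can also be written by hand, as long as the coordinate convention is handled: a data frame made from a GRanges uses 1-based inclusive coordinates, whereas BED uses 0-based starts with exclusive ends. The sketch below assumes a data frame shaped like `as.data.frame(sigSites)`. The helper `write_bed`, the peak names, and the example values are made up for illustration.

```r
# Hypothetical helper: write a 4-column BED file from a data frame with
# 1-based, inclusive coordinates (as produced by as.data.frame() on a GRanges).
write_bed <- function(df, path) {
  bed <- data.frame(chrom = df$seqnames,
                    start = df$start - 1L,  # BED starts are 0-based
                    end   = df$end,         # BED ends are exclusive, so unchanged
                    name  = sprintf("peak_%d", seq_len(nrow(df))))
  write.table(bed, path, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
  invisible(bed)
}

# Made-up results standing in for as.data.frame(sigSites)
sig <- data.frame(seqnames = c("chr1", "chr1", "chr2"),
                  start = c(1001L, 5001L, 2001L),
                  end   = c(1501L, 5501L, 2501L),
                  Fold  = c(2.3, -1.8, 1.2),
                  FDR   = c(0.001, 0.01, 0.03))

up   <- sig[sig$Fold > 0, ]
down <- sig[sig$Fold < 0, ]

bed_dir <- tempdir()
write_bed(up,   file.path(bed_dir, "DB_up.bed"))
write_bed(down, file.path(bed_dir, "DB_down.bed"))
```

Whether `up` corresponds to the treated condition still depends on the direction of the contrast, exactly as for the GRanges-based export.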
| Problem | Possible Cause | Solution |
|----------|----------------|-----------|
| No differential peaks found | Insufficient replicates or low coverage | Add replicates, increase sequencing depth, or relax the FDR threshold (raise th) |
| Errors in sample sheet | Column names incorrect or missing | Use standard DiffBind column format |
| Inconsistent genome build | Mixed genome assemblies | Ensure all BAM and peak files use the same genome reference |
| Over-normalization | Strong batch effects | Include batch term in design or run dba.contrast(..., block=...) |
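For the genome build row, one quick (and only partial) sanity check is to compare chromosome naming across peak files, since a mix of `chr1`-style and `1`-style names often indicates different references. The sketch below is an assumption-laden proxy. The helper `chrom_style` is hypothetical, it expects tab-separated peak files without header lines, and it does not inspect BAM headers or distinguish assemblies that share a naming style.

```r
# Hypothetical helper: read the first column (chromosome) of each peak file
# and classify the naming style used.
chrom_style <- function(peak_files) {
  vapply(peak_files, function(f) {
    peaks <- read.table(f, sep = "\t", colClasses = "character",
                        comment.char = "")
    chroms <- unique(peaks[[1]])
    has_prefix <- grepl("^chr", chroms)
    if (all(has_prefix)) "chr-prefixed"
    else if (!any(has_prefix)) "no prefix"
    else "mixed"
  }, character(1))
}

# Two small made-up peak files with different naming styles
f1 <- tempfile(fileext = ".narrowPeak")
f2 <- tempfile(fileext = ".narrowPeak")
writeLines(c("chr1\t100\t600", "chr2\t200\t700"), f1)
writeLines(c("1\t100\t600", "2\t200\t700"), f2)

styles <- chrom_style(c(f1, f2))
if (length(unique(styles)) > 1) {
  message("Peak files use different chromosome naming styles")
}
```

A clean result here does not prove that the builds match; it only rules out the most visible kind of mismatch.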