24.global-methylation-profile/SKILL.md
This skill performs genome-wide DNA methylation profiling. It supports single-sample and multi-sample workflows to compute methylation density distributions, genomic feature distribution of the methylation profile, and sample-level clustering/PCA. Use it when you want to systematically characterize global methylation patterns from WGBS or similar per-CpG methylation call files.
npx skillsauth add bisnake2001/chromskills global-methylation-profileInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Main steps include:
Use the global-methylation-profiling skill when you want to:
<sample.bed
global_methylation_profile/
stats/
summary_statistics.tsv
...
plots/
sample1_genomic_feature_pie.pdf
sample2_genomic_feature_pie.pdf
... # Other samples
allSamples_methylation_density_overlay.pdf
PCA_scatterplot.pdf
sample_correlation_heatmap.pdf
...
logs/
temp/ # all the temp files
library(methylKit)
# Example input: Bismark coverage files (chr, start, end, numCs, numTs, strand)
file.list <- list(
"sample1.cov",
"sample2.cov",
"sample3.cov"
)
sample.id <- list("S1", "S2", "S3")
treatment <- c(0, 1, 1) # e.g. 0 = control, 1 = treated
# Read methylation data
myobj <- methRead(
location = file.list,
sample.id = sample.id,
assembly = "hg38", # provided by user
treatment = treatment,
context = "CpG",
pipeline = list(
fraction = FALSE, # percMeth is 0–100, fraction is 0-1, depend on inputs
chr.col = 1,
start.col = 2,
end.col = 3,
strand.col = 6, # provided by user
coverage.col = 10, # provided by user
freqC.col = 11 # provided by user
)
)
# Optional filtering: remove low / extremely high coverage CpGs
filtered.myobj <- filterByCoverage(
myobj,
lo.count = 10, lo.perc = NULL,
hi.count = 99.9, hi.perc = TRUE
)
# Unite CpGs across samples (common CpG sites)
meth <- unite(filtered.myobj, destrand = TRUE)
Annotate CpGs with genomic features (promoter, exon, intron, intergenic, etc.) with genomation and summarize where CpGs (or methylated CpGs) are located for each sample
library(genomation)
library(TxDb.Hsapiens.UCSC.hg38.knownGene) # depend on user inputs
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
# exons
exons <- unlist(exonsBy(txdb))
names(exons) <- NULL
mcols(exons)$type <- "exon"
# introns
introns <- unlist(intronsByTranscript(txdb))
names(introns) <- NULL
mcols(introns)$type <- "intron"
# promoters
promoters.gr <- promoters(txdb, upstream = 2000, downstream = 200)
names(promoters.gr) <- NULL
mcols(promoters.gr)$type <- "promoter"
# TSS(1bp)
TSSes <- promoters(txdb, upstream = 1, downstream = 1)
names(TSSes) <- NULL
mcols(TSSes)$type <- "TSS"
# 3'UTR
utr3 <- unlist(threeUTRsByTranscript(txdb))
names(utr3) <- NULL
mcols(utr3)$type <- "UTR3"
# 5'UTR
utr5 <- unlist(fiveUTRsByTranscript(txdb))
names(utr5) <- NULL
mcols(utr5)$type <- "UTR5"
gene.obj <- GRangesList(
promoters = promoters.gr,
exons = exons,
introns = introns,
TSSes = TSSes
UTR5 = utr5,
UTR3 = utr3,
... # other features
)
for (i in seq_along(filtered.myobj)) {
sample_id <- filtered.myobj[[i]]@sample.id
cpg.gr <- as(filtered.myobj[[i]], "GRanges")
ann.gene <- annotateWithGeneParts(cpg.gr, gene.obj)
feature.summary <- getTargetAnnotationStats(ann.gene, percentage = TRUE)
out_tab <- as.data.frame(feature.summary)
write.table(
out_tab,
file = file.path(plot_dir, paste0(sample_id, "_feature_annotation_stats.tsv")),
sep = "\t", quote = FALSE, row.names = FALSE
)
pdf(file.path(plot_dir, paste0(sample_id, "_genomic_feature_distribution.pdf")))
plotTargetAnnotation(
ann.gene,
main = paste("Genomic feature distribution of CpGs -", sample_id)
)
dev.off()
}
Step 3: Compute & visualize genome-wide methylation density distributions
# Convert to percent methylation matrix: rows = CpGs, cols = samples
meth.mat <- percMethylation(meth) # values 0–100
df.long <- reshape2::melt(
as.data.frame(meth.mat),
variable.name = "Sample",
value.name = "Methylation"
)
ggplot(df.long, aes(x = Methylation, color = Sample)) +
geom_density() +
theme_bw() +
xlab("Percent methylation") +
ggtitle("Genome-wide methylation density across samples")
# Meth matrix: rows = CpGs, cols = samples (0–100)
meth.mat <- percMethylation(meth)
# (Optional) Filter CpGs by variability
cpg.sd <- apply(meth.mat, 1, sd, na.rm = TRUE)
keep.var <- cpg.sd > 0
meth.var <- meth.mat[keep.var, ]
if (sum(keep.var) > 10000) {
keep.idx <- order(cpg.sd[keep.var], decreasing = TRUE)[1:10000]
meth.var <- meth.var[keep.idx, ]
}
# Z-score transformation (per CpG) – helps clustering
meth.scaled <- t(scale(t(meth.var))) # rows scaled
pca <- prcomp(t(meth.scaled), center = FALSE, scale. = FALSE)
pca.df <- data.frame(
Sample = colnames(meth.scaled),
PC1 = pca$x[, 1],
PC2 = pca$x[, 2],
Treatment = factor(treatment, labels = c("Control", "Treatment"))
)
ggplot(pca.df, aes(x = PC1, y = PC2, color = Treatment, label = Sample)) +
geom_point(size = 3) +
geom_text(vjust = -1) +
theme_bw() +
ggtitle("PCA of global CpG methylation") +
xlab(paste0("PC1 (", round(summary(pca)$importance[2, 1] * 100, 1), "%)")) +
ylab(paste0("PC2 (", round(summary(pca)$importance[2, 2] * 100, 1), "%)"))
dist.samples <- dist(t(meth.scaled), method = "euclidean")
hc <- hclust(dist.samples, method = "complete")
plot(hc, main = "Hierarchical clustering of samples (methylation)",
xlab = "", sub = "")
cor.samples <- cor(meth.var, use = "pairwise.complete.obs")
pheatmap(cor.samples,
clustering_method = "complete",
main = "Sample correlation based on CpG methylation")
development
Align ChIP-seq or ATAC-seq FASTQ files to a reference genome using Bowtie2, with strict input validation, library layout detection, output organization and logging. Use it when raw sequencing reads must be converted into sorted/indexed BAM files before downstream QC, peak calling, or footprinting.
development
Align bisulfite sequencing DNA methylation reads using Bismark only, with explicit validation of reference preparation, library layout detection, output organization, logging, and alignment QC. Use it for WGBS, RRBS, or other bisulfite-converted DNA methylation sequencing data when raw FASTQ files must be aligned before methylation extraction and downstream analysis.
data-ai
Perform peak calling for ChIP-seq or ATAC-seq data using MACS3, with intelligent parameter detection from user feedback. Use it when you want to call peaks for ChIP-seq data or ATAC-seq data.
devops
The TF-differential-binding pipeline performs differential transcription factor (TF) binding analysis from ChIP-seq datasets (TF peaks) using the DiffBind package in R. It identifies genomic regions where TF binding intensity significantly differs between experimental conditions (e.g., treatment vs. control, mutant vs. wild-type). Use the TF-differential-binding pipeline when you need to analyze the different function of the same TF across two or more biological conditions, cell types, or treatments using ChIP-seq data or TF binding peaks. This pipeline is ideal for studying regulatory mechanisms that underlie transcriptional differences or epigenetic responses to perturbations.