0.dna_methyl_alignment/SKILL.md
Align bisulfite sequencing DNA methylation reads using Bismark only, with explicit validation of reference preparation, library layout detection, output organization, logging, and alignment QC. Use it for WGBS, RRBS, or other bisulfite-converted DNA methylation sequencing data when raw FASTQ files must be aligned before methylation extraction and downstream analysis.
This skill performs bisulfite-aware sequence alignment for DNA methylation sequencing using Bismark only. It is designed for autonomous execution from FASTQ input through aligned BAM generation and basic QC, while preventing unsafe assumptions about genome build, library layout, or assay design.
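Main steps include:
Create the project output directory (Step 0)
Detect FASTQ samples and library layout
Prepare or validate the Bismark genome folder
Align each sample with Bismark and produce a sorted, indexed BAM
Generate flagstat and idxstats QC reports
Write per-sample alignment logs and used-parameter records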
Accepted FASTQ naming patterns include:
${sample}.fastq.gz
${sample}.fq.gz
${sample}_R1.fastq.gz
${sample}_R2.fastq.gz
${sample}_1.fastq.gz
${sample}_2.fastq.gz
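The sketch below shows how the accepted naming patterns could be turned into per-sample groups with an inferred library layout. It is a hypothetical helper (`group_fastqs`), not the implementation behind `mcp__bismark-tools__detect_fastq_samples`. It assumes case-sensitive `_R1`/`_R2` or `_1`/`_2` mate suffixes and takes bare file names rather than full paths.

```python
import re
from collections import defaultdict

# Hypothetical pattern covering the accepted names listed above:
# ${sample}.fastq.gz / .fq.gz (single-end) and ${sample}_R1/_R2 or _1/_2 (mates).
FASTQ_RE = re.compile(r"^(?P<sample>.+?)(?:_(?P<mate>R?[12]))?\.(?:fastq|fq)\.gz$")


def group_fastqs(filenames):
    """Group FASTQ file names into samples and infer the library layout."""
    files_by_sample = defaultdict(dict)
    for name in sorted(filenames):
        match = FASTQ_RE.match(name)
        if match is None:
            raise ValueError(f"unrecognized FASTQ name: {name}")
        mate = match.group("mate")
        role = "single" if mate is None else "r" + mate[-1]  # "single", "r1" or "r2"
        sample = match.group("sample")
        if role in files_by_sample[sample]:
            raise ValueError(f"duplicate {role} file for {sample}: {name}")
        files_by_sample[sample][role] = name

    samples = {}
    for sample, roles in files_by_sample.items():
        if set(roles) == {"single"}:
            samples[sample] = {"layout": "single-end", "r1": roles["single"], "r2": None}
        elif set(roles) == {"r1", "r2"}:
            samples[sample] = {"layout": "paired-end", "r1": roles["r1"], "r2": roles["r2"]}
        else:
            # Unpaired mates or a mix of single and mate files: stop and ask the user.
            raise ValueError(f"ambiguous layout for {sample}: {sorted(roles.values())}")
    return samples
```

The helper raises on ambiguity instead of guessing. This matches the skill's rule that the agent must ask when the layout cannot be determined.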
Reference resources must be supplied by the user as one of the following:
/path/to/reference_genome_folder
/path/to/reference.fa
If the user supplies a FASTA, prepare a Bismark genome folder before alignment.
all_methylation_alignment/
  aligned_bam/
    ${sample}.sorted.bam
    ${sample}.sorted.bam.bai
  qc/
    ${sample}.flagstat.txt
    ${sample}.idxstats.txt
  logs/
    ${sample}_alignment.log
    ${sample}_used_parameters.txt
  temp/
All outputs must be placed under ${proj_dir} returned in Step 0.
The agent must ask for the following when missing:
The agent must not guess:
Use file and sample names only for tentative classification:
WGBS → likely whole-genome bisulfite sequencing
RRBS → likely reduced representation bisulfite sequencing
methylation, BSseq, bisulfite → ambiguous
If naming is ambiguous, ask the user.
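Here is a minimal sketch of the tentative, name-based classification described above, using a hypothetical `classify_assay` helper. Case-insensitive substring matching is an assumption. Any name that contains only generic terms, both keywords, or neither returns `"ambiguous"`, meaning the agent should ask the user.

```python
def classify_assay(name):
    """Tentatively classify an assay from a file or sample name (hypothetical helper).

    Returns "WGBS", "RRBS", or "ambiguous"; "ambiguous" means ask the user.
    """
    upper = name.upper()  # assumption: keyword matching is case-insensitive
    has_wgbs = "WGBS" in upper
    has_rrbs = "RRBS" in upper
    if has_wgbs and not has_rrbs:
        return "WGBS"
    if has_rrbs and not has_wgbs:
        return "RRBS"
    # Generic terms (methylation, BSseq, bisulfite), both keywords, or neither:
    # the name alone is not enough to decide.
    return "ambiguous"
```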
Use FASTQ grouping rules:
${sample}_R1 and ${sample}_R2, or ${sample}_1 and ${sample}_2 → paired-end
Default threads: 8 unless the user specifies otherwise
Create a task directory for methylation alignment outputs.
Suggested call:
mcp__project-init-tools__project_init
with:
sample: all
task: methylation_alignment
genome: provided by user
The tool will return ${proj_dir}. Use it for all output placement.
If a project-init MCP tool is not available in the runtime, create this directory structure manually:
all_methylation_alignment/
  aligned_bam/
  qc/
  logs/
  temp/
Set ${proj_dir} to all_methylation_alignment.
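When no project-init MCP tool is available, the fallback can be sketched with `pathlib`. The hypothetical `init_project_dir` helper creates the four subdirectories of the output layout (`aligned_bam/`, `qc/`, `logs/`, `temp/`). It is safe to call repeatedly because existing directories are left untouched.

```python
from pathlib import Path

# Subdirectories taken from the output layout of this skill.
SUBDIRS = ("aligned_bam", "qc", "logs", "temp")


def init_project_dir(base="all_methylation_alignment"):
    """Create the output tree and return it as ${proj_dir} (hypothetical helper)."""
    proj_dir = Path(base)
    for sub in SUBDIRS:
        # parents=True creates the base directory too; exist_ok makes reruns harmless.
        (proj_dir / sub).mkdir(parents=True, exist_ok=True)
    return proj_dir
```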
Call:
mcp__bismark-tools__detect_fastq_samples
with:
input_dir: directory containing FASTQ files
The tool will:
Rules:
Only gzipped FASTQ files are accepted (*.fastq.gz, *.fq.gz).
If the user supplied a FASTA and wants the agent to prepare a Bismark genome folder, call:
mcp__bismark-tools__prepare_bismark_genome
with:
reference_fasta: user-provided FASTA path
genome_folder: destination directory for Bismark genome preparation
If the user supplied an existing Bismark genome folder, validate it before alignment by calling:
mcp__bismark-tools__validate_bismark_genome
with:
genome_folder: user-provided genome folder
Stop and ask the user to correct the path if validation fails.
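The two genome steps can be approximated outside the MCP tools as follows. `bismark_genome_preparation` takes a folder that contains the FASTA file(s) rather than a FASTA path. The hypothetical `prepare_genome_command` therefore symlinks the FASTA into the destination folder and returns the command line without running it. The hypothetical `validate_genome_folder` only checks the Bowtie2 layout that Bismark writes by default (`Bisulfite_Genome/CT_conversion` and `GA_conversion` with `.bt2`/`.bt2l` index files). It is a sanity check under that assumption, not the logic of `mcp__bismark-tools__validate_bismark_genome`.

```python
from pathlib import Path


def prepare_genome_command(reference_fasta, genome_folder):
    """Stage the FASTA and return an argv for bismark_genome_preparation (hypothetical)."""
    folder = Path(genome_folder)
    folder.mkdir(parents=True, exist_ok=True)
    link = folder / Path(reference_fasta).name
    if not (link.exists() or link.is_symlink()):
        # Symlink instead of copying a multi-gigabyte FASTA; every FASTA in the
        # folder is used, so keep only the intended reference here.
        link.symlink_to(Path(reference_fasta).resolve())
    return ["bismark_genome_preparation", str(folder)]


def validate_genome_folder(genome_folder):
    """Return a list of problems; an empty list means the folder looks prepared."""
    base = Path(genome_folder) / "Bisulfite_Genome"
    problems = []
    for conversion in ("CT_conversion", "GA_conversion"):
        subdir = base / conversion
        if not subdir.is_dir():
            problems.append(f"missing {subdir}")
        elif not (list(subdir.glob("*.bt2")) or list(subdir.glob("*.bt2l"))):
            # Assumes the default Bowtie2 aligner; HISAT2 indexes would differ.
            problems.append(f"no Bowtie2 index files in {subdir}")
    return problems
```

The returned argv can be passed to `subprocess.run(argv, check=True)`. Bowtie2 must be on `PATH`. A non-empty problem list corresponds to the "stop and ask the user" branch above.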
For each detected sample, call:
mcp__bismark-tools__run_bismark_alignment
with:
sample_name: sample identifier
fastq_r1: path to R1 FASTQ or single-end FASTQ
fastq_r2: path to R2 FASTQ for paired-end data, otherwise omit
genome_folder: validated Bismark genome folder
out_dir: ${proj_dir}/aligned_bam
log_dir: ${proj_dir}/logs
temp_dir: ${proj_dir}/temp
threads: user-specified or default 8
keep_intermediate_bam: false by default
Tool behavior:
keep_intermediate_bam=true
Expected output:
${proj_dir}/aligned_bam/${sample}.sorted.bam
${proj_dir}/aligned_bam/${sample}.sorted.bam.bai
${proj_dir}/logs/${sample}_alignment.log
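For orientation, the sketch below builds an equivalent Bismark + samtools command sequence. It is not necessarily what `mcp__bismark-tools__run_bismark_alignment` runs internally. The hypothetical helpers use documented Bismark options (`--genome`, `--output_dir`, `--temp_dir`, `--basename`, `-p`, `-1`/`-2`) and samtools `sort`/`index`. Mapping the skill's `threads` value to Bismark's `-p` (Bowtie2 threads per instance) is an assumption. Bismark runs several Bowtie2 instances, so total CPU use exceeds that number. The raw BAM name Bismark writes is passed in explicitly rather than predicted.

```python
from pathlib import Path


def bismark_command(sample, fastq_r1, genome_folder, proj_dir, threads=8, fastq_r2=None):
    """Build a Bismark alignment argv (hypothetical helper; nothing is executed)."""
    proj = Path(proj_dir)
    cmd = [
        "bismark",
        "--genome", str(genome_folder),
        "--output_dir", str(proj / "aligned_bam"),
        "--temp_dir", str(proj / "temp"),
        "--basename", sample,
        "-p", str(threads),  # assumption: skill "threads" -> Bowtie2 threads
    ]
    if fastq_r2 is None:
        cmd.append(str(fastq_r1))  # single-end reads are a positional argument
    else:
        cmd += ["-1", str(fastq_r1), "-2", str(fastq_r2)]
    return cmd


def sort_index_commands(sample, raw_bam, proj_dir, threads=8):
    """Build samtools argvs producing ${sample}.sorted.bam and its .bai index."""
    proj = Path(proj_dir)
    sorted_bam = proj / "aligned_bam" / f"{sample}.sorted.bam"
    sort_cmd = [
        "samtools", "sort", "-@", str(threads),
        "-T", str(proj / "temp" / f"{sample}.sort"),  # temp files under ${proj_dir}/temp
        "-o", str(sorted_bam), str(raw_bam),
    ]
    index_cmd = ["samtools", "index", str(sorted_bam)]  # writes ${sample}.sorted.bam.bai
    return sort_cmd, index_cmd
```

With `keep_intermediate_bam` false, the unsorted Bismark BAM would be deleted once sorting and indexing succeed.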
This skill is alignment-only. Methylation extraction, deduplication policy, and cytosine report generation belong to downstream skills unless explicitly requested elsewhere.
For each sample, the agent must write:
${proj_dir}/logs/${sample}_used_parameters.txt
Example content:
Sample: WGBS_rep1
Assay type: WGBS
Library layout: paired-end
Aligner: Bismark
Reference genome build: hg38
Bismark genome folder: /refs/hg38/bismark_genome
Threads: 8
Intermediate BAM kept: no
Reasoning:
- Sample name contains WGBS, so assay classified as whole-genome bisulfite sequencing
- Paired FASTQ mates were detected automatically
- User provided hg38 Bismark genome folder
- Alignment-only workflow selected; methylation extraction deferred to downstream analysis
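Writing the parameter record can be sketched as below. The hypothetical `write_used_parameters` helper reproduces the "Key: value" lines and the "Reasoning:" bullet list of the example. The caller supplies both in order, and the `Sample` entry names the file.

```python
from pathlib import Path


def write_used_parameters(proj_dir, params, reasoning):
    """Write ${proj_dir}/logs/${sample}_used_parameters.txt (hypothetical helper).

    params: ordered mapping such as {"Sample": ..., "Assay type": ...};
    reasoning: list of short justification strings.
    """
    path = Path(proj_dir) / "logs" / f"{params['Sample']}_used_parameters.txt"
    path.parent.mkdir(parents=True, exist_ok=True)
    lines = [f"{key}: {value}" for key, value in params.items()]
    lines.append("Reasoning:")
    lines += [f"- {item}" for item in reasoning]
    path.write_text("\n".join(lines) + "\n")
    return path
```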
Stop execution and ask the user for correction if any of the following occurs:
Do not continue to downstream QC if alignment fails.
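One simple way to enforce this gate, sketched under the assumption that a missing or empty expected output means the alignment failed. The hypothetical `missing_alignment_outputs` helper lists any absent or zero-byte file among the expected per-sample outputs. QC (flagstat, idxstats) would run only when the list is empty.

```python
from pathlib import Path


def missing_alignment_outputs(proj_dir, sample):
    """List expected alignment outputs that are absent or empty (hypothetical helper)."""
    base = Path(proj_dir)
    expected = [
        base / "aligned_bam" / f"{sample}.sorted.bam",
        base / "aligned_bam" / f"{sample}.sorted.bam.bai",
        base / "logs" / f"{sample}_alignment.log",
    ]
    # A zero-byte file is treated the same as a missing one.
    return [str(p) for p in expected if not p.is_file() or p.stat().st_size == 0]
```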
mcp__bismark-tools__detect_fastq_samples
mcp__bismark-tools__prepare_bismark_genome (only when user provides FASTA)
mcp__bismark-tools__validate_bismark_genome
mcp__bismark-tools__run_bismark_alignment
The agent must ask before execution when any of the following are missing or ambiguous:
The agent must not invent these values.