0.reads-mapping/SKILL.md
Align ChIP-seq or ATAC-seq FASTQ files to a reference genome using Bowtie2, with strict input validation, library layout detection, output organization and logging. Use it when raw sequencing reads must be converted into sorted/indexed BAM files before downstream QC, peak calling, or footprinting.
This skill performs core sequence alignment for ChIP-seq and ATAC-seq data starting from FASTQ files using Bowtie2. It is designed for autonomous execution with explicit user confirmation for biologically important parameters that must not be guessed.
Main steps include:
Accepted FASTQ naming patterns include:
${sample}.fastq.gz
${sample}.fq.gz
${sample}_R1.fastq.gz
${sample}_R2.fastq.gz
${sample}_1.fastq.gz
${sample}_2.fastq.gz
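The naming patterns above can be turned into a grouping rule. The following is a minimal sketch, not the behavior of `mcp__bowtie2-tools__detect_fastq_samples` itself: the function name `group_fastqs`, the returned dictionary layout, and the choice to also accept `_R1`/`_1` suffixes on `.fq.gz` files are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical pattern derived from the accepted names listed above:
# an optional _R1/_R2/_1/_2 mate suffix, then .fastq.gz or .fq.gz.
_FASTQ_RE = re.compile(r"^(?P<sample>.+?)(?:_(?P<mate>R?[12]))?\.(?:fastq|fq)\.gz$")


def group_fastqs(filenames):
    """Group FASTQ file names by sample and infer the library layout."""
    mates = defaultdict(dict)
    for name in filenames:
        match = _FASTQ_RE.match(name)
        if match is None:
            continue  # not an accepted FASTQ name, so it is ignored here
        # Normalize "R1"/"1" to "1" and "R2"/"2" to "2"; "SE" marks no suffix.
        mate = (match.group("mate") or "SE").lstrip("R")
        mates[match.group("sample")][mate] = name

    samples = {}
    for sample, files in mates.items():
        if "1" in files and "2" in files:
            samples[sample] = {"layout": "paired-end", "r1": files["1"], "r2": files["2"]}
        elif "SE" in files and len(files) == 1:
            samples[sample] = {"layout": "single-end", "r1": files["SE"]}
        else:
            # For example an R1 without its R2: do not guess, ask the user.
            samples[sample] = {"layout": "ambiguous", "files": sorted(files.values())}
    return samples
```

A sample that comes back as `ambiguous` would be a reason to stop and ask the user rather than to pick a layout.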
Reference resources must be supplied by the user as one of the following:
/path/to/reference.fa
/path/to/bowtie2_index_prefix
all_alignment/
    aligned_bam/
        ${sample}.sorted.bam
        ${sample}.sorted.bam.bai
    logs/
        ${sample}_alignment.log
        ${sample}_used_parameters.txt
    temp/
All outputs must be placed under ${proj_dir} returned in Step 0.
The agent must ask for the following when missing:
The agent must not guess:
Use file and sample names only for tentative classification:
- ATAC, OmniATAC, scATAC → likely ATAC-seq
- CTCF, MYC, H3K27ac, H3K4me3, H3K27me3 → likely ChIP-seq

If naming is ambiguous, ask the user.
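The naming hints above could be sketched as a small helper. This is only an illustration: `guess_assay` is a hypothetical name, and case-insensitive substring matching is an assumption that can misfire (a marker such as MYC may occur inside unrelated words), which is why the result stays tentative.

```python
# Markers copied from the naming hints above; matching is case-insensitive
# substring matching, which is an assumption of this sketch.
ATAC_MARKERS = ("ATAC", "OmniATAC", "scATAC")
CHIP_MARKERS = ("CTCF", "MYC", "H3K27ac", "H3K4me3", "H3K27me3")


def guess_assay(sample_name):
    """Return 'atacseq', 'chipseq', or None when the name is ambiguous."""
    lowered = sample_name.lower()
    looks_atac = any(marker.lower() in lowered for marker in ATAC_MARKERS)
    looks_chip = any(marker.lower() in lowered for marker in CHIP_MARKERS)
    if looks_atac and not looks_chip:
        return "atacseq"
    if looks_chip and not looks_atac:
        return "chipseq"
    return None  # no marker, or conflicting markers: ask the user
```

Returning `None` instead of a default keeps the "ask the user" rule explicit in the calling code.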
Use FASTQ grouping rules:
- ${sample}_R1 and ${sample}_R2, or ${sample}_1 and ${sample}_2 → paired-end

Default: 8 threads unless the user specifies otherwise.

Create a task directory for alignment outputs.
Suggested call:
mcp__project-init-tools__project_init
with:
- sample: all
- task: alignment
- genome: provided by user

The tool will return ${proj_dir}. Use it for all output placement.
If a project-init MCP tool is not available in the runtime, create this directory structure manually:
all_alignment/
    aligned_bam/
    qc/
    logs/
    temp/
Set ${proj_dir} to all_alignment.
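When the project-init tool is unavailable, the manual fallback above amounts to a few `mkdir` calls. A minimal sketch using only the Python standard library follows; `init_alignment_dirs` and its `base_dir` argument are hypothetical names introduced here.

```python
import tempfile
from pathlib import Path

# Subdirectories taken from the manual directory structure above.
SUBDIRS = ("aligned_bam", "qc", "logs", "temp")


def init_alignment_dirs(base_dir="."):
    """Create all_alignment/ with its subdirectories and return the project dir."""
    proj_dir = Path(base_dir) / "all_alignment"
    for sub in SUBDIRS:
        # exist_ok makes the call safe to repeat on an existing project.
        (proj_dir / sub).mkdir(parents=True, exist_ok=True)
    return proj_dir


if __name__ == "__main__":
    # Demo in a throwaway directory so nothing is written to the working tree.
    with tempfile.TemporaryDirectory() as tmp:
        proj_dir = init_alignment_dirs(tmp)
        print(sorted(p.name for p in proj_dir.iterdir()))
```

The returned path plays the role of `${proj_dir}` for all later output placement.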
Call:
mcp__bowtie2-tools__detect_fastq_samples
with:
- input_dir: directory containing FASTQ files

The tool will:
Rules:
*.fastq.gz, *.fq.gz)
If the user supplied a FASTA, call:
mcp__bowtie2-tools__build_bowtie2_index
with:
- reference_fasta: user-provided FASTA path
- index_prefix: desired Bowtie2 index prefix

If the user supplied an existing Bowtie2 prefix, validate it before alignment by calling:
mcp__bowtie2-tools__validate_bowtie2_index
with:
- index_prefix: user-provided index prefix

Stop and ask the user to correct the path if validation fails.
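One plausible way to validate a prefix, independent of how `mcp__bowtie2-tools__validate_bowtie2_index` is actually implemented, is to check that the six files Bowtie2 writes for an index are present. Bowtie2 uses the `.bt2` extension for small indexes and `.bt2l` for large ones. The helper name `missing_index_files` is hypothetical.

```python
import tempfile
from pathlib import Path

# Bowtie2 writes six files per index prefix.
INDEX_PARTS = ("1", "2", "3", "4", "rev.1", "rev.2")


def missing_index_files(index_prefix):
    """Return missing index file names; an empty list means the index looks complete."""
    report = {}
    for ext in ("bt2", "bt2l"):  # small-index and large-index extensions
        expected = [Path(f"{index_prefix}.{part}.{ext}") for part in INDEX_PARTS]
        report[ext] = [p.name for p in expected if not p.is_file()]
        if not report[ext]:
            return []
    # Neither flavour is complete: report the one that is closer to complete.
    return min(report.values(), key=len)


if __name__ == "__main__":
    # Demo against an empty throwaway directory: all six files are reported.
    with tempfile.TemporaryDirectory() as tmp:
        print(missing_index_files(Path(tmp) / "hg38"))
```

A non-empty result corresponds to the "stop and ask the user to correct the path" case. This check only confirms that the files exist, not that they were built from the intended genome.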
For each detected sample, call:
mcp__bowtie2-tools__run_bowtie2_alignment
with:
- sample_name: sample identifier
- fastq_r1: path to R1 FASTQ or single-end FASTQ
- fastq_r2: path to R2 FASTQ for paired-end data, otherwise omit
- assay_type: chipseq or atacseq
- index_prefix: validated Bowtie2 index prefix
- out_dir: ${proj_dir}/aligned_bam
- log_dir: ${proj_dir}/logs
- threads: user-specified or default 8
- keep_sam: false by default

Tool behavior:
keep_sam=true
Expected output:
${proj_dir}/aligned_bam/${sample}.sorted.bam
${proj_dir}/aligned_bam/${sample}.sorted.bam.bai
${proj_dir}/logs/${sample}_alignment.log
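The internals of `run_bowtie2_alignment` are not given here, so the following is only a sketch of commands that would produce the expected sorted BAM and its `.bai` index. It uses standard options (`bowtie2 -x/-1/-2/-U/-p`, `samtools sort -@/-o`, `samtools index`); the function names and the decision to stream SAM straight into `samtools sort` are assumptions, and any assay-specific Bowtie2 options are deliberately left out.

```python
import shlex


def build_alignment_commands(sample, index_prefix, fastq_r1, fastq_r2=None,
                             out_dir="aligned_bam", threads=8):
    """Return argv lists for bowtie2, samtools sort, and samtools index."""
    bam = f"{out_dir}/{sample}.sorted.bam"
    bowtie2 = ["bowtie2", "-p", str(threads), "-x", index_prefix]
    if fastq_r2 is None:
        bowtie2 += ["-U", fastq_r1]                  # single-end reads
    else:
        bowtie2 += ["-1", fastq_r1, "-2", fastq_r2]  # paired-end mates
    # bowtie2 writes SAM to stdout when -S is not given; "-" reads it from stdin.
    sort = ["samtools", "sort", "-@", str(threads), "-o", bam, "-"]
    index = ["samtools", "index", bam]               # writes <bam>.bai
    return bowtie2, sort, index


def as_shell_pipeline(bowtie2, sort, index):
    """Render the three commands as one shell line, e.g. for the alignment log."""
    return f"{shlex.join(bowtie2)} | {shlex.join(sort)} && {shlex.join(index)}"
```

Building argv lists rather than a shell string keeps paths with spaces safe when the commands are later passed to `subprocess`. The rendered pipeline is intended for logging.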
Notes:
For each sample, the agent must write:
${proj_dir}/logs/${sample}_used_parameters.txt
Example content:
Sample: ATAC_rep1
Assay type: ATAC-seq
Library layout: paired-end
Aligner: bowtie2
Reference genome build: hg38
Reference index: /refs/hg38/bowtie2/hg38
Threads: 8
Intermediate SAM kept: no
Reasoning:
- Sample name contains ATAC, so assay classified as ATAC-seq
- Paired FASTQ mates were detected automatically
- User provided hg38 Bowtie2 index
- Alignment-only workflow selected; duplicate handling and Tn5 shifting deferred to downstream preprocessing/peak-calling
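The example content above follows a simple `key: value` layout followed by a bulleted reasoning list. A minimal sketch of a formatter for it is shown below; `format_used_parameters` is a hypothetical helper, and the field names are whatever the caller supplies.

```python
def format_used_parameters(params, reasoning):
    """Render parameters and reasoning in the layout of the example above."""
    lines = [f"{key}: {value}" for key, value in params.items()]
    lines.append("Reasoning:")
    lines.extend(f"- {item}" for item in reasoning)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = format_used_parameters(
        {"Sample": "ATAC_rep1", "Assay type": "ATAC-seq", "Threads": 8},
        ["Sample name contains ATAC, so assay classified as ATAC-seq"],
    )
    print(text, end="")
```

The resulting string can be written to `${proj_dir}/logs/${sample}_used_parameters.txt`, for instance with `pathlib.Path.write_text`.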
Stop execution and ask the user for correction if any of the following occurs:
- bowtie2 or bowtie2-build executable not found in PATH
- samtools not found in PATH

Do not continue to downstream QC if alignment fails.
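The executable checks in the stop conditions above can be performed before any alignment starts. A minimal sketch with the standard library's `shutil.which` follows; `missing_executables` is a hypothetical helper name.

```python
import shutil

# Executables named in the stop conditions above.
REQUIRED_TOOLS = ("bowtie2", "bowtie2-build", "samtools")


def missing_executables(tools=REQUIRED_TOOLS):
    """Return the required executables that are not found in PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_executables()
    if missing:
        print("Stop and ask the user: not found in PATH:", ", ".join(missing))
```

Running this check first means a missing tool is reported before any sample is processed.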
- mcp__bowtie2-tools__detect_fastq_samples
- mcp__bowtie2-tools__build_bowtie2_index (only when user provides FASTA)
- mcp__bowtie2-tools__validate_bowtie2_index
- mcp__bowtie2-tools__run_bowtie2_alignment

The agent must ask before execution when any of the following are missing or ambiguous:
The agent must not invent these values.