14_toolBased.known-motif-scan/SKILL.md
This skill identifies the locations of known transcription factor (TF) binding motifs within genomic regions such as ChIP-seq or ATAC-seq peaks. It uses HOMER to search for sequence motifs defined by position-specific scoring matrices (PSSMs) from known motif databases. Use this skill when you need the presence and precise genomic coordinates of known TF binding motifs within experimentally defined regions.
Install this skill globally with one command: `npx skillsauth add bisnake2001/chromskills motif-scanning`. Works with Claude Code, Cursor, and Windsurf.
This skill enables comprehensive motif scanning using HOMER tools for genomic peak files. It scans genomic regions for specific transcription factor binding motifs using position-specific scoring matrices and identifies exact motif locations. To perform motif scanning:
(1) Peak formats supported
    ${sample}_known_motif_scan/
        results/
            combined_motifs.txt # combined motif hits from all TFs
            ### Option 1: Scan motif in the specific genomic regions
            ${sample}_motif_find.txt
            ${sample}_motif_find.bed
            ### Option 2: Scan motif in the genome
            ${sample}.genomewide.txt
            ${sample}.genomewide.bed
            ### Option 3: Annotate peaks with motif hits
            ${sample}.anno_motif.txt
            ${sample}.motif_pos.bed (if `mbed` is True)
        logs/ # analysis logs
            motif_scan.log
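As a rough illustration of the layout above, the following sketch creates the same directory skeleton with the standard library. The function name `init_project` and the `base_dir` parameter are hypothetical; this is not the implementation of `mcp__project-init-tools__project_init`, only a minimal stand-in under the assumption that `results/` and `logs/` sit directly under the project directory.

```python
from pathlib import Path
import tempfile


def init_project(sample: str, base_dir: str = ".") -> Path:
    """Create <sample>_known_motif_scan/ with results/ and logs/ inside it.

    Hypothetical helper: mirrors the directory layout described above,
    not the behavior of the real project-init tool.
    """
    proj_dir = Path(base_dir).resolve() / f"{sample}_known_motif_scan"
    for sub in ("results", "logs"):
        (proj_dir / sub).mkdir(parents=True, exist_ok=True)
    return proj_dir


if __name__ == "__main__":
    # Demo in a throwaway directory; "demo" is an illustrative sample name.
    with tempfile.TemporaryDirectory() as tmp:
        proj = init_project("demo", tmp)
        print(sorted(p.name for p in proj.iterdir()))
```

Returning the absolute path keeps later steps independent of the working directory, which matches how `${proj_dir}` is passed between tools in this skill.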
Before calling any tool, ask the user:
- Sample name (`sample`): used as prefix and for the output directory `${sample}_known_motif_scan`.
- Genome assembly (`genome`): e.g. hg38, mm10, danRer11.
Call:
- `mcp__project-init-tools__project_init`

With:
- `sample`: the user-provided sample name
- `task`: known_motif_scan

The tool will:
- Create the `${sample}_known_motif_scan` directory.
- Return the full path of the `${sample}_known_motif_scan` directory, which will be used as `${proj_dir}`.

Call:
- `mcp__homer-tools__check_genome_installation`

With:
- `genome`: the user-provided genome assembly, e.g. hg38, mm10, danRer11
This step is optional. Perform it only if the input file is a BED file; if the input file is a gene list, skip it.

The step standardizes chromosome names:
- From `1` format to `chr1` format
- From `MT` format to `chrM` format

Call:
- `mcp__file-format-tools__standardize_bed_chrom_names`

With:
- `input_bed`: the user-provided BED file
- `output_bed`: the path to save the standardized BED file
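The two renaming rules for this step (`1` to `chr1`, `MT` to `chrM`) can be sketched in a few lines. The helper names `standardize_chrom` and `standardize_bed_lines` are hypothetical, and the handling of header lines and unplaced scaffolds is an assumption; the actual `mcp__file-format-tools__standardize_bed_chrom_names` tool may behave differently.

```python
def standardize_chrom(name: str) -> str:
    """Map Ensembl-style chromosome names to UCSC-style names.

    Assumption: only numeric chromosomes, X, Y, M and MT are renamed;
    anything else (e.g. scaffolds) is left untouched.
    """
    if name.startswith("chr"):
        return name
    if name in ("MT", "M"):
        return "chrM"
    if name.isdigit() or name in ("X", "Y"):
        return "chr" + name
    return name


def standardize_bed_lines(lines):
    """Return BED lines with the first column renamed; headers pass through."""
    out = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            out.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        fields[0] = standardize_chrom(fields[0])
        out.append("\t".join(fields) + "\n")
    return out


if __name__ == "__main__":
    demo = ["1\t100\t200\tpeak1\n", "MT\t5\t50\tpeak2\n"]
    print("".join(standardize_bed_lines(demo)), end="")
```

Leaving unrecognized names unchanged is a deliberately conservative choice: blindly prefixing `chr` would produce names that do not exist in UCSC-style assemblies.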
Two options are available; pick one based on the user's request.

If the user provides a TF name or a set of TF names instead of a motif file, locate the motif file for the TF.

Call:
- `mcp__homer-tools__locate_motif_file`

With:
- `proj_dir`: directory to save the known motif scan results. In this skill, it is the full path of the `${sample}_known_motif_scan` directory returned by `mcp__project-init-tools__project_init`
- `TF_name`: the user-provided TF name, or a set of TF names separated by commas, e.g. TF1,TF2,TF3
- `motif_type`: usually not needed for model organisms. If the user's data fall under "insects", "plants", "rna", "worms", or "yeast", choose that one as the motif type.
If the user provides a custom motif file, use the custom motif file. If the custom motif file is in MEME format, convert it to HOMER format:

Call:
- `mcp__file-format-tools__meme_to_homer`

With:
- `proj_dir`: directory to save the known motif scan results. In this skill, it is the full path of the `${sample}_known_motif_scan` directory returned by `mcp__project-init-tools__project_init`
- `meme_file`: the user-provided MEME motif file
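To make the MEME-to-HOMER conversion less opaque, here is a minimal sketch of what such a conversion involves. It assumes a plain DNA MEME file with `letter-probability matrix` blocks in A, C, G, T column order, and it writes a HOMER header of the form `>consensus<TAB>name<TAB>threshold`. The function name `meme_to_homer` and the placeholder `threshold` value are assumptions for illustration; the real tool may choose the log-odds detection threshold and the consensus string differently.

```python
import re


def meme_to_homer(meme_text: str, threshold: float = 0.0) -> str:
    """Convert MEME letter-probability matrices to HOMER motif text.

    `threshold` is a placeholder log-odds detection threshold (assumption);
    a real value should be chosen per motif before scanning.
    """
    blocks = []
    name, width, rows = None, 0, []
    for raw in meme_text.splitlines():
        line = raw.strip()
        if line.startswith("MOTIF"):
            # Start of a new motif; the identifier follows the keyword.
            name, width, rows = line.split()[1], 0, []
        elif name and line.startswith("letter-probability matrix"):
            m = re.search(r"w=\s*(\d+)", line)
            width = int(m.group(1)) if m else 0
        elif name and width and line:
            rows.append([float(x) for x in line.split()[:4]])
            if len(rows) == width:
                # Simplified consensus: most probable base per position.
                consensus = "".join("ACGT"[r.index(max(r))] for r in rows)
                blocks.append(f">{consensus}\t{name}\t{threshold}")
                blocks.extend("\t".join(f"{p:.3f}" for p in r) for r in rows)
                name = None  # ignore trailing lines until the next MOTIF
    return "\n".join(blocks) + "\n"


if __name__ == "__main__":
    demo = (
        "MEME version 4\n\nALPHABET= ACGT\n\nMOTIF DEMO\n"
        "letter-probability matrix: alength= 4 w= 2 nsites= 10 E= 0\n"
        " 0.7 0.1 0.1 0.1\n 0.1 0.1 0.7 0.1\n"
    )
    print(meme_to_homer(demo), end="")
```

The consensus here uses only A, C, G and T rather than IUPAC degenerate codes, which is enough for a readable header but loses information about ambiguous positions.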
Three options are available; pick one based on the user's request.

### Option 1: Scan motif in the specific genomic regions

Call:
- `mcp__homer-tools__find_motifs`

With:
- `sample`: the user-provided sample name
- `proj_dir`: directory to save the known motif scan results. In this skill, it is the full path of the `${sample}_known_motif_scan` directory returned by `mcp__project-init-tools__project_init`
- `input_file`: the user-provided file containing genome regions. May end with .bed, .narrowPeak, .broadPeak, etc.
- `genome`: the user-provided genome assembly, e.g. hg38, mm10, danRer11
- `size`: region size for motif finding in genome regions, typically 200-500 bp for transcription factors (default: 200). If the input file is a gene list, set to None.
- `mask`: mask repeat regions for cleaner motif analysis (default: True)
- `threads`: number of processors to use (default: 4)
- `num_motifs`: number of motifs to find (default: 25)
- `lengths`: motif lengths to search (default: 8,10,12)
- `find`: the path to the motif file. May be the motif file returned by `mcp__homer-tools__locate_motif_file`. This parameter must be set for this step.
- `nomotif`: set to True to skip de novo motif finding

The tool will:
- Save the following file under the `${proj_dir}/results/` directory:
  - `${sample}_motif_find.txt` (produced only when the `find` parameter is set)

Call:
- `mcp__homer-tools__homer_pos2bed`

With:
- `pos_file`: the path to the known motif scan results. It is under the `${proj_dir}/results/` directory and ends with `_motif_find.txt`.

The tool will:
- Save the following file under the `${proj_dir}/results/` directory:
  - `${sample}_motif_find.bed`

### Option 2: Scan motif in the genome

Call:
- `mcp__homer-tools__scan_motif_genome_wide`

With:
- `sample`: the user-provided sample name
- `proj_dir`: directory to save the known motif scan results. In this skill, it is the full path of the `${sample}_known_motif_scan` directory returned by `mcp__project-init-tools__project_init`
- `motif_file`: the path to the motif file. May be the motif file returned by `mcp__homer-tools__locate_motif_file`.
- `genome`: the user-provided genome assembly, e.g. hg38, mm10, danRer11
- `mask`: mask repeat regions for cleaner motif analysis (default: True)
- `threads`: number of processors to use (default: 4)

The tool will:
- Save the following file under the `${proj_dir}/results/` directory:
  - `${sample}.genomewide.txt`

Call:
- `mcp__homer-tools__homer_pos2bed`

With:
- `pos_file`: the path to the known motif scan results. It is under the `${proj_dir}/results/` directory and ends with `.genomewide.txt`.

The tool will:
- Save the following file under the `${proj_dir}/results/` directory:
  - `${sample}.genomewide.bed`

### Option 3: Annotate peaks with motif hits

Call:
- `mcp__homer-tools__annotate_peaks_motif_scan`

With:
- `sample`: the user-provided sample name
- `proj_dir`: directory to save the known motif scan results. In this skill, it is the full path of the `${sample}_known_motif_scan` directory returned by `mcp__project-init-tools__project_init`
- `peakfile`: the user-provided peak file in BED format. May end with .bed, .narrowPeak, .broadPeak, etc.
- `genome`: the user-provided genome assembly, e.g. hg38, mm10, danRer11
- `motif_file`: the path to the motif file. May be the motif file returned by `mcp__homer-tools__locate_motif_file`.
- `size`: region size around peak centers (default: 200)
- `nmotifs`: number of motifs to report per peak (default: None)
- `mbed`: output motif hits in BED format (default: True). If True, a `.motif_pos.bed` file will be created under the `${proj_dir}/results/` directory.
- `mscore`: include motif scores in the output (default: False)
- `cpu`: number of processors for parallel processing (default: 1)
- `bedgraph`: output in bedGraph format (default: False)
- `hist`: include histogram output with the given number of bins (default: None)

The tool will:
- Save the following files under the `${proj_dir}/results/` directory:
  - `${sample}.anno_motif.txt`
  - `${sample}.motif_pos.bed` (if `mbed` is True)
- `-cpu` option for parallel processing
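For orientation, the three options correspond roughly to three HOMER command-line programs: `findMotifsGenome.pl` with `-find`, `scanMotifGenomeWide.pl`, and `annotatePeaks.pl` with `-m`. The sketch below only assembles argument lists and runs nothing. That the MCP tools wrap these exact commands is an assumption, the builder function names are hypothetical, and only a subset of the parameters listed above is mapped to flags.

```python
import shlex


def find_motifs_cmd(input_file, genome, out_dir, motif_file,
                    size=200, mask=True, threads=4):
    """Option 1: report instances of known motifs inside the given regions.

    With -find, HOMER prints the hits to stdout, so a caller would
    redirect stdout to the *_motif_find.txt file.
    """
    cmd = ["findMotifsGenome.pl", input_file, genome, out_dir,
           "-size", str(size), "-p", str(threads),
           "-find", motif_file, "-nomotif"]
    if mask:
        cmd.append("-mask")
    return cmd


def scan_genome_cmd(motif_file, genome, mask=True, threads=4):
    """Option 2: scan the whole genome; hits are printed to stdout."""
    cmd = ["scanMotifGenomeWide.pl", motif_file, genome, "-p", str(threads)]
    if mask:
        cmd.append("-mask")
    return cmd


def annotate_peaks_cmd(peakfile, genome, motif_file, size=200,
                       mbed_path=None, mscore=False, cpu=1):
    """Option 3: annotate peaks with motif hits; the table goes to stdout."""
    cmd = ["annotatePeaks.pl", peakfile, genome, "-m", motif_file,
           "-size", str(size), "-cpu", str(cpu)]
    if mbed_path:
        cmd += ["-mbed", mbed_path]
    if mscore:
        cmd.append("-mscore")
    return cmd


if __name__ == "__main__":
    # File names below are illustrative placeholders.
    print(shlex.join(find_motifs_cmd("peaks.bed", "hg38", "results", "tf.motif")))
    print(shlex.join(scan_genome_cmd("tf.motif", "hg38")))
    print(shlex.join(annotate_peaks_cmd("peaks.bed", "hg38", "tf.motif",
                                        mbed_path="results/demo.motif_pos.bed")))
```

Building the command as a list rather than a shell string avoids quoting problems when file paths contain spaces, and makes it easy to pass the result to `subprocess.run` with stdout redirected to the expected output file.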