12_toolBased.De-novo-motif-discovery/SKILL.md
This skill identifies novel transcription factor binding motifs in the promoter regions of genes, or directly from genomic regions of interest such as ChIP-seq peaks, ATAC-seq accessible sites, or differentially accessible regions. It employs HOMER (Hypergeometric Optimization of Motif Enrichment) to detect both known and previously uncharacterized sequence motifs enriched within the supplied genomic intervals. Use this skill when you need to uncover enriched sequence motifs or want to know which TFs might regulate the target regions.
This skill enables comprehensive de novo motif discovery using HOMER tools for genomic peak files. It discovers novel transcription factor binding motifs from genomic regions without requiring prior knowledge of motif patterns. To perform de novo motif discovery:
Use this skill when you need to uncover sequence motifs enriched in the promoter regions of a set of genes, or directly from a set of genomic regions, such as peaks from ChIP-seq or ATAC-seq, without prior assumptions about which transcription factors are involved. Typical use cases include:
Input files should be in one of the following formats:
- BED files: Standard genomic interval format
- narrowPeak: ENCODE narrow peak format (BED6+4)
- broadPeak: ENCODE broad peak format (BED6+3)
- gene list: A list of genes provided by the user or generated in a previous analysis. May end with .txt, .tsv, .csv, etc.
${sample}_de_novo_motif_discovery/
    results/
        homerResults.html             # De novo motif discovery results
        seq.autonorm.tsv              # Sequence composition statistics
        motifFindingParameters.txt    # Parameters used for analysis
        homerMotifs.all.motifs
        homerMotifs.motifs12
        homerMotifs.motifs10
        homerMotifs.motifs8
        nonRedundant.motifs
        homerResults/
            motif1.similar1.motif
            motif1.info.html
            motif1.logo.svg
            motif1.motif
            motif1.similar.html
            motif1.similar2.motif
            motif1.similar3.motif
            motif1.similar4.motif
            motif1RV.logo.svg
            motif1RV.motif
            # ...
    logs/                             # Analysis logs
        motif.log
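The .motif files in the results directory are plain text. As a hedged sketch, assuming HOMER's usual layout (a tab-separated header line starting with `>` that carries the consensus, the motif name, and the detection threshold, followed by one row of A/C/G/T probabilities per position), they could be read as shown here. The `Motif` class and `parse_homer_motifs` function are hypothetical helpers for illustration, not part of this skill or of HOMER.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Motif:
    consensus: str
    name: str
    threshold: float
    # One row per motif position; columns are A, C, G, T probabilities.
    matrix: List[List[float]] = field(default_factory=list)


def parse_homer_motifs(text: str) -> List[Motif]:
    """Parse the text of a HOMER .motif file into Motif records."""
    motifs: List[Motif] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header: consensus, name, threshold, then further statistics.
            fields = line[1:].split("\t")
            motifs.append(Motif(fields[0], fields[1], float(fields[2])))
        elif motifs:
            # Matrix row belonging to the most recent header.
            motifs[-1].matrix.append([float(x) for x in line.split()])
    return motifs


# Made-up two-position motif, used only to exercise the parser.
example = ">AC\tmotif1\t6.5\n0.7\t0.1\t0.1\t0.1\n0.1\t0.7\t0.1\t0.1\n"
parsed = parse_homer_motifs(example)
```

A file such as homerMotifs.all.motifs holds several motifs back to back, so the function returns a list rather than a single record.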
Before calling any tool, ask the user:
- Sample name (sample): used as prefix and for the output directory ${sample}_de_novo_motif_discovery.
- Genome assembly (genome): e.g. hg38, mm10, danRer11.
Call:
- mcp__project-init-tools__project_init

with:
- sample: the user-provided sample name
- task: de_novo_motif_discovery

The tool will:
- Create the ${sample}_de_novo_motif_discovery directory.
- Return the full path of the ${sample}_de_novo_motif_discovery directory, which will be used as ${proj_dir}.

Call:
- mcp__homer-tools__check_genome_installation

with:
- genome: the user-provided genome assembly, e.g. hg38, mm10, danRer11

The tool will check whether the requested genome is installed for HOMER.
This step is optional. Perform it only if the input file is a BED file; skip it if the input is a gene list. It converts chromosome names:
- From 1 format to chr1 format
- From MT format to chrM format

Call:
- mcp__file-format-tools__standardize_bed_chrom_names

with:
- input_bed: the user-provided BED file
- output_bed: the path to save the standardized BED file

The tool will save the standardized BED file to output_bed.
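The renaming described in this step (1 to chr1, MT to chrM) can be sketched as follows. This is a minimal illustration, not the implementation of mcp__file-format-tools__standardize_bed_chrom_names. The function names, the handling of a bare `M`, and the pass-through of header lines are assumptions.

```python
from typing import Iterable, List


def standardize_chrom(name: str) -> str:
    """Map Ensembl-style chromosome names to UCSC-style names."""
    if name.startswith("chr"):
        return name  # already in chr1 format
    if name in ("MT", "M"):  # treating a bare "M" as mitochondrial is an assumption
        return "chrM"
    return "chr" + name


def standardize_bed_lines(lines: Iterable[str]) -> List[str]:
    """Rewrite the first column of each BED record; keep other lines untouched."""
    out: List[str] = []
    for line in lines:
        # Blank lines and track/browser/comment headers are passed through (assumption).
        if not line.strip() or line.startswith(("#", "track", "browser")):
            out.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        fields[0] = standardize_chrom(fields[0])
        out.append("\t".join(fields) + "\n")
    return out


# Two made-up records, used only to exercise the functions.
converted = standardize_bed_lines(["1\t100\t200\n", "MT\t5\t50\n"])
```

Names that already start with chr are left alone, so running the conversion twice does not change the result.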
Here are three options for different situations. Pick one of them based on the user's request.

Option 1: no background file.

Call:
- mcp__homer-tools__find_motifs

with:
- sample: the user-provided sample name
- proj_dir: directory to save the de novo motif discovery results. In this skill, it is the full path of the ${sample}_de_novo_motif_discovery directory returned by mcp__project-init-tools__project_init
- input_file: the user-provided file containing genome regions or gene list. May end with .bed, .narrowPeak, .broadPeak, .txt, .tsv, .csv, etc.
- genome: the user-provided genome assembly, e.g. hg38, mm10, danRer11
- size: region size for motif finding for genome regions (default: 200). If the input file is a gene list, set to None.
- mask: mask repeat regions (default: True)
- threads: number of processors to use (default: 4)
- num_motifs: number of motifs to find (default: 25)
- lengths: motif lengths to search (default: 8,10,12)

The tool will save the results in the ${proj_dir}/results/ directory.

Option 2: with a user-provided background file.

Call:
- mcp__homer-tools__find_motifs

with:
- sample: the user-provided sample name
- proj_dir: directory to save the de novo motif discovery results. In this skill, it is the full path of the ${sample}_de_novo_motif_discovery directory returned by mcp__project-init-tools__project_init
- input_file: the user-provided file containing genome regions or gene list. May end with .bed, .narrowPeak, .broadPeak, .txt, .tsv, .csv, etc.
- genome: the user-provided genome assembly, e.g. hg38, mm10, danRer11
- background_file: the user-provided file containing background genome regions or gene list. May end with .bed, .narrowPeak, .broadPeak, .txt, .tsv, .csv, etc.
- size: region size for motif finding for genome regions (default: 200). If the input file is a gene list, set to None.
- mask: mask repeat regions (default: True)
- threads: number of processors to use (default: 4)
- num_motifs: number of motifs to find (default: 25)
- lengths: motif lengths to search (default: 8,10,12)

The tool will save the results in the ${proj_dir}/results/ directory.

Option 3: without known motifs.

Call:
- mcp__homer-tools__find_motifs

with:
- sample: the user-provided sample name
- proj_dir: directory to save the de novo motif discovery results. In this skill, it is the full path of the ${sample}_de_novo_motif_discovery directory returned by mcp__project-init-tools__project_init
- input_file: the user-provided file containing genome regions or gene list. May end with .bed, .narrowPeak, .broadPeak, .txt, .tsv, .csv, etc.
- genome: the user-provided genome assembly, e.g. hg38, mm10, danRer11
- size: region size for motif finding for genome regions (default: 200). If the input file is a gene list, set to None.
- mask: mask repeat regions (default: True)
- threads: number of processors to use (default: 4)
- num_motifs: number of motifs to find (default: 25)
- lengths: motif lengths to search (default: 8,10,12)
- noknown: True to not use known motifs

The tool will save the results in the ${proj_dir}/results/ directory.

The mcp__homer-tools__find_motifs tool also accepts additional parameters that are not commonly used. Add them only when necessary:
- cpg: Enrich for CpG islands (default: False)
- chopify: Chop sequences into smaller fragments (default: False)
- norevopp: Don't search reverse complement (default: False)
- rna: For RNA motif finding (default: False)
- bits: Set information content threshold (default: None)

- Use -mask for cleaner motif discovery.
- Use the -p option for parallel processing.
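For orientation, the parameters above resemble the options of HOMER's findMotifsGenome.pl script. The sketch below shows how such a command line might be assembled for region input. It is an assumption about how mcp__homer-tools__find_motifs maps its arguments, not a description of the tool itself, and `build_find_motifs_genome_cmd` is a hypothetical helper. Gene list input, which HOMER handles through findMotifs.pl, is not covered.

```python
from typing import List, Optional


def build_find_motifs_genome_cmd(
    input_file: str,
    genome: str,
    out_dir: str,
    size: int = 200,
    mask: bool = True,
    threads: int = 4,
    num_motifs: int = 25,
    lengths: str = "8,10,12",
    background_file: Optional[str] = None,
    noknown: bool = False,
) -> List[str]:
    """Assemble a findMotifsGenome.pl argument list for region input."""
    cmd = [
        "findMotifsGenome.pl", input_file, genome, out_dir,
        "-size", str(size),      # region size around each peak center
        "-p", str(threads),      # number of processors
        "-S", str(num_motifs),   # number of motifs to find
        "-len", lengths,         # motif lengths to search
    ]
    if mask:
        cmd.append("-mask")      # mask repeat regions
    if background_file:
        cmd += ["-bg", background_file]  # Option 2: custom background regions
    if noknown:
        cmd.append("-noknown")   # Option 3: skip known motif enrichment
    return cmd


# Made-up file names, used only to show the three variants.
default_cmd = build_find_motifs_genome_cmd("peaks.bed", "hg38", "results")
background_cmd = build_find_motifs_genome_cmd(
    "peaks.bed", "hg38", "results", background_file="background.bed"
)
noknown_cmd = build_find_motifs_genome_cmd("peaks.bed", "hg38", "results", noknown=True)
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where HOMER and the requested genome are installed.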