bisnake2001/De-novo-motif-discovery

Name: De-novo-motif-discovery
Author: bisnake2001

12_toolBased.De-novo-motif-discovery/SKILL.md

npx skillsauth add bisnake2001/chromskills De-novo-motif-discovery

HOMER De Novo Motif Discovery

Overview

This skill enables comprehensive de novo motif discovery using HOMER tools for genomic peak files. It discovers novel transcription factor binding motifs from genomic regions without requiring prior knowledge of motif patterns. To perform de novo motif discovery:

Always refer to the Inputs & Outputs section to check the inputs and build the output directory structure.
Genome assembly: always obtain it from the user (hg38, mm10, hg19, mm9, etc.); never determine it yourself.
Check chromosome names: standardize them to the "chr"-prefixed format (1 -> chr1, MT -> chrM).
Set analysis parameters: region size, number of motifs, motif lengths.
Run the HOMER de novo motif discovery command.

When to use this skill

Use this skill when you need to uncover sequence motifs enriched in the promoter regions of a set of genes, or directly from a set of genomic regions, such as peaks from ChIP-seq or ATAC-seq, without prior assumptions about which transcription factors are involved. Typical use cases include:

Performing motif enrichment analysis in the promoters of a gene list, provided by the user or generated in a previous analysis, to infer transcription factors that might regulate the target genes.
Performing motif enrichment analysis in TF-binding sites or differential TF-binding regions, provided by the user or generated in a previous analysis, to infer transcription factors that might be co-factors of the target TFs.
Performing motif enrichment analysis on ATAC-seq peaks or differentially accessible regions, provided by the user or generated in a previous analysis, to infer potential transcriptional regulators of accessible chromatin regions.
Exploring novel sequence patterns for the binding motif of a specific TF.

Inputs & Outputs

Inputs

Input files should be in one of the following formats:

BED files: Standard genomic interval format
narrowPeak: narrow peak format
broadPeak: broad peak format
gene list: A list of genes provided by user or generated in previous analysis. May end with .txt, .tsv, .csv, etc.
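The accepted formats fall into two groups, genomic intervals and gene lists, and later steps branch on which one was supplied. A minimal Python sketch of that classification by file extension is shown here. The function name `classify_input` and the grouping of extensions are illustrative assumptions, not part of the skill's tools.

```python
from pathlib import Path

# Extensions named in the Inputs section; the grouping is an assumption.
REGION_EXTS = {".bed", ".narrowpeak", ".broadpeak"}
GENE_LIST_EXTS = {".txt", ".tsv", ".csv"}

def classify_input(path: str) -> str:
    """Return 'regions' or 'gene_list' based on the file extension."""
    ext = Path(path).suffix.lower()
    if ext in REGION_EXTS:
        return "regions"
    if ext in GENE_LIST_EXTS:
        return "gene_list"
    raise ValueError(f"Unrecognized input extension: {ext!r}")
```

An extension is only a hint: a .txt file can still hold BED-style intervals, so it is worth confirming the content with the user when in doubt.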

Outputs

${sample}_de_novo_motif_discovery/
    results/
        homerResults.html # De novo motif discovery results
        seq.autonorm.tsv # Sequence composition statistics
        motifFindingParameters.txt # Parameters used for analysis
        homerMotifs.all.motifs
        homerMotifs.motifs12
        homerMotifs.motifs10
        homerMotifs.motifs8
        nonRedundant.motifs

        homerResults/
            motif1.similar1.motif
            motif1.info.html
            motif1.logo.svg
            motif1.motif
            motif1.similar.html
            motif1.similar2.motif
            motif1.similar3.motif
            motif1.similar4.motif
            motif1RV.logo.svg
            motif1RV.motif
            # ...

    logs/ # analysis logs 
        motif.log
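After a run, the tree above can be used as a checklist. The sketch below, in Python, reports which of a few top-level result files are missing. The helper name `missing_outputs` and the choice of which files to check are assumptions made for illustration.

```python
from pathlib import Path

# A subset of the top-level files listed in the Outputs tree.
EXPECTED_RESULTS = [
    "homerResults.html",
    "motifFindingParameters.txt",
    "homerMotifs.all.motifs",
]

def missing_outputs(proj_dir: str) -> list[str]:
    """Return the expected result files that are absent under proj_dir/results."""
    results = Path(proj_dir) / "results"
    return [name for name in EXPECTED_RESULTS if not (results / name).is_file()]
```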

Decision Tree

Step 0 — Gather Required Information from the User

Before calling any tool, ask the user:

Sample name (sample): used as prefix and for the output directory ${sample}_de_novo_motif_discovery.
Genome assembly (genome): e.g. hg38, mm10, danRer11.
- Never guess or auto-detect.
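One way to enforce this step in code is to refuse to proceed until both values are present. The following Python sketch does that with a small dataclass. `RunInfo` and its fields are hypothetical names, not something the skill defines.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RunInfo:
    sample: str
    genome: str

    def __post_init__(self):
        # Both values must come from the user; an empty string means nobody asked.
        if not self.sample.strip():
            raise ValueError("sample name is required; ask the user")
        if not self.genome.strip():
            raise ValueError("genome assembly is required; ask the user, never guess")

    @property
    def output_dir_name(self) -> str:
        # Matches the ${sample}_de_novo_motif_discovery naming used by the skill.
        return f"{self.sample}_de_novo_motif_discovery"
```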

Step 1: Initialize Project

Create the directory for this project:

Call:

mcp__project-init-tools__project_init

with:

sample: the user-provided sample name
task: de_novo_motif_discovery

The tool will:

Create the ${sample}_de_novo_motif_discovery directory.
Return the full path of the ${sample}_de_novo_motif_discovery directory, which is used as ${proj_dir} in later steps.
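For readers without access to the MCP tool, the behavior described above can be approximated in a few lines of Python. This is a stand-in sketch, not the tool's implementation: the name `init_project_dir`, the `base_dir` parameter, and the creation of the results/ and logs/ subdirectories at this point are all assumptions.

```python
from pathlib import Path

def init_project_dir(sample: str, task: str, base_dir: str = ".") -> str:
    """Create <base_dir>/<sample>_<task> with results/ and logs/; return its absolute path."""
    proj_dir = (Path(base_dir) / f"{sample}_{task}").resolve()
    for sub in ("results", "logs"):
        # parents=True also creates proj_dir itself on the first pass.
        (proj_dir / sub).mkdir(parents=True, exist_ok=True)
    return str(proj_dir)
```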

Step 2: Prepare the genome for HOMER

Call:

mcp__homer-tools__check_genome_installation

With:

genome: the user-provided genome assembly, e.g. hg38, mm10, danRer11

The tool will:

Check if the genome is installed in HOMER.
If not, install the genome.
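A rough equivalent of this check, assuming a standard HOMER installation where genome packages live under data/genomes/ and are installed with configureHomer.pl, might look like the Python sketch below. The function names and the `homer_dir` argument are hypothetical, and how the MCP tool performs the check internally is not documented here.

```python
from pathlib import Path

def genome_is_installed(homer_dir: str, genome: str) -> bool:
    """True if <homer_dir>/data/genomes/<genome> exists as a directory."""
    return (Path(homer_dir) / "data" / "genomes" / genome).is_dir()

def install_command(homer_dir: str, genome: str) -> list[str]:
    """Argument list asking HOMER's configureHomer.pl to install a genome package."""
    return ["perl", str(Path(homer_dir) / "configureHomer.pl"), "-install", genome]
```

The command list can be passed to `subprocess.run` once the user has confirmed the assembly name.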

Step 3 (Optional): Standardize chromosome names for BED files

This step is optional. Only perform this step if the input file is a BED file. If the input file is a gene list, skip this step.

From 1 format to chr1 format
From MT format to chrM format

Call:

mcp__file-format-tools__standardize_bed_chrom_names

with:

input_bed: the user-provided BED file
output_bed: the path to save the standardized BED file

The tool will:

Standardize the chromosome names in the BED file.
Return the path of the standardized BED file.
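The renaming rule itself is simple enough to sketch in Python. The functions below are an illustration of the 1 -> chr1 and MT -> chrM mapping, not the MCP tool's code; their names are hypothetical.

```python
def standardize_chrom(name: str) -> str:
    """Map '1' -> 'chr1' and 'MT' -> 'chrM'; leave names already prefixed with 'chr' alone."""
    if name.startswith("chr"):
        return name
    if name in ("MT", "M"):
        return "chrM"
    return "chr" + name

def standardize_bed_lines(lines):
    """Yield BED lines with the first column rewritten; header and blank lines pass through."""
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            yield line
            continue
        chrom, sep, rest = line.partition("\t")
        yield standardize_chrom(chrom) + sep + rest
```

This sketch prefixes every unprefixed name, so unplaced contigs would also gain a "chr" prefix; whether that matches the installed HOMER genome should be checked before relying on it.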

Step 4: De Novo Motif Discovery

Here are three options for different situations. Pick one of them based on the user's request.

De novo + known motifs
De novo + known motifs + background
De novo only

Option 1: De novo + known motifs

Call:

mcp__homer-tools__find_motifs

With:

sample: the user-provided sample name
proj_dir: directory to save the de novo motif discovery results. In this skill, it is the full path of the ${sample}_de_novo_motif_discovery directory returned by mcp__project-init-tools__project_init
input_file: the user-provided file containing genome regions or gene list. May end with .bed, .narrowPeak, .broadPeak, .txt, .tsv, .csv, etc.
genome: the user-provided genome assembly, e.g. hg38, mm10, danRer11
size: region size for motif finding for genome regions (default: 200). If the input file is a gene list, set to None.
mask: mask repeat regions (default: True)
threads: number of processors to use (default: 4)
num_motifs: number of motifs to find (default: 25)
lengths: motif lengths to search (default: 8,10,12)

The tool will:

Discover motifs in the genomic regions in the BED file or in the promoters of the genes in the gene list. The motifs may be known motifs or de novo motifs.
Return the path of the de novo motif discovery results under ${proj_dir}/results/ directory.

Option 2: De novo + known motifs + background

Call:

mcp__homer-tools__find_motifs

With:

sample: the user-provided sample name
proj_dir: directory to save the de novo motif discovery results. In this skill, it is the full path of the ${sample}_de_novo_motif_discovery directory returned by mcp__project-init-tools__project_init
input_file: the user-provided file containing genome regions or gene list. May end with .bed, .narrowPeak, .broadPeak, .txt, .tsv, .csv, etc.
genome: the user-provided genome assembly, e.g. hg38, mm10, danRer11
background_file: the user-provided file containing background genome regions or gene list. May end with .bed, .narrowPeak, .broadPeak, .txt, .tsv, .csv, etc.
size: region size for motif finding for genome regions (default: 200). If the input file is a gene list, set to None.
mask: mask repeat regions (default: True)
threads: number of processors to use (default: 4)
num_motifs: number of motifs to find (default: 25)
lengths: motif lengths to search (default: 8,10,12)

The tool will:

Discover motifs in the genomic regions in the BED file or in the promoters of the genes in the gene list, using the provided background regions or background gene list. The motifs may be known motifs or de novo motifs.
Return the path of the de novo motif discovery results under ${proj_dir}/results/ directory.

Option 3: De novo only

Call:

mcp__homer-tools__find_motifs

With:

sample: the user-provided sample name
proj_dir: directory to save the de novo motif discovery results. In this skill, it is the full path of the ${sample}_de_novo_motif_discovery directory returned by mcp__project-init-tools__project_init
input_file: the user-provided file containing genome regions or gene list. May end with .bed, .narrowPeak, .broadPeak, .txt, .tsv, .csv, etc.
genome: the user-provided genome assembly, e.g. hg38, mm10, danRer11
size: region size for motif finding for genome regions (default: 200). If the input file is a gene list, set to None.
mask: mask repeat regions (default: True)
threads: number of processors to use (default: 4)
num_motifs: number of motifs to find (default: 25)
lengths: motif lengths to search (default: 8,10,12)
noknown: set to True to skip the known-motif enrichment search

The tool will:

Discover motifs in the genomic regions in the BED file or in the promoters of the genes in the gene list, without searching for known motif enrichment.
Return the path of the de novo motif discovery results under ${proj_dir}/results/ directory.
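The three options differ only in whether a background file is supplied and whether the known-motif search is skipped. As a sketch of how these parameters could translate to a HOMER command line for genomic-region input, the Python function below assembles a findMotifsGenome.pl argument list using that script's -size, -p, -S, -len, -mask, -bg and -noknown options. How the MCP tool actually maps its parameters onto HOMER is an assumption, the function name is hypothetical, and gene-list input (which HOMER handles with a different script) is not covered.

```python
def build_find_motifs_cmd(input_file, genome, out_dir, size=200, mask=True,
                          threads=4, num_motifs=25, lengths=(8, 10, 12),
                          background_file=None, noknown=False):
    """Assemble a findMotifsGenome.pl argument list for genomic-region input."""
    cmd = ["findMotifsGenome.pl", input_file, genome, out_dir,
           "-size", str(size),
           "-p", str(threads),
           "-S", str(num_motifs),
           "-len", ",".join(str(n) for n in lengths)]
    if mask:
        cmd.append("-mask")               # mask repeat regions
    if background_file:
        cmd += ["-bg", background_file]   # Option 2: custom background
    if noknown:
        cmd.append("-noknown")            # Option 3: de novo only
    return cmd
```

Calling it with defaults corresponds to Option 1; adding `background_file=...` gives Option 2 and `noknown=True` gives Option 3.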

The mcp__homer-tools__find_motifs tool also accepts the following parameters, which are rarely needed. Add them only when necessary:

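cpg: Use CpG content instead of GC content for sequence normalization, as HOMER's -cpg option does (default: False)
chopify: Chop large background regions to the average size of the target regions (default: False)
norevopp: Don't search the reverse strand for motifs (default: False)
rna: Output RNA motif logos and compare against the RNA motif database (default: False)
bits: Scale sequence logos by information content (default: None)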

Quality Control and Best Practices

Pre-processing Steps

Filter peaks: Remove low-quality or artifact peaks
Size selection: Use appropriate region size (-size parameter)
Background selection: Choose appropriate background for enrichment analysis
Repeat masking: Use -mask for cleaner motif discovery
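As one example of peak filtering, narrowPeak files store a -log10 q-value in their ninth column, so weak peaks can be dropped before motif finding. The Python sketch below assumes that column layout; the function name and the cutoff of 2.0 (q <= 0.01) are illustrative choices, not recommendations from the skill.

```python
def filter_narrowpeak(lines, min_neg_log10_q=2.0):
    """Keep narrowPeak records whose qValue column (9th, -log10 scale) meets the cutoff."""
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a full narrowPeak record
        if float(fields[8]) >= min_neg_log10_q:
            kept.append(line)
    return kept
```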

Parameter Optimization

Region size: Typically 200-500bp for transcription factors
Motif length: 8-12bp for most transcription factors
Number of motifs: 10-25 for initial discovery
Threads: Use available CPU cores for faster processing
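To make the region-size setting concrete: a fixed size means each peak is replaced by a window of that width around its center. HOMER does this internally when given a numeric -size, so the Python sketch below is only an approximation for intuition, with a hypothetical function name, and may not reproduce HOMER's exact coordinates.

```python
def resize_region(start: int, end: int, size: int = 200) -> tuple[int, int]:
    """Return a window of `size` bp centered on the midpoint of [start, end)."""
    center = (start + end) // 2
    # Clamp at 0 so windows near the chromosome start stay valid.
    new_start = max(0, center - size // 2)
    return new_start, new_start + size
```

For instance, a 500 bp peak at 1000-1500 becomes the 200 bp window 1150-1350 with the default size.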

Troubleshooting

Common Issues

Memory errors: Reduce region size or number of motifs
Slow performance: Use -p option for parallel processing
No motifs found: Check input file format and region size
Genome not found: Verify genome assembly name and installation

Error Handling

Ensure HOMER is properly installed and configured
Check that genome data is downloaded and accessible
Verify input file formats and chromosome naming
Ensure sufficient disk space for output files
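Some of these checks can be automated before a run. The Python sketch below looks for findMotifsGenome.pl on the PATH and checks free disk space using only the standard library. The function name and the 1 GB default threshold are assumptions chosen for illustration.

```python
import shutil

def preflight(out_dir: str = ".", min_free_gb: float = 1.0) -> list[str]:
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    if shutil.which("findMotifsGenome.pl") is None:
        problems.append("findMotifsGenome.pl not found on PATH")
    free_gb = shutil.disk_usage(out_dir).free / 1e9
    if free_gb < min_free_gb:
        problems.append(f"only {free_gb:.1f} GB free in {out_dir}")
    return problems
```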

This skill identifies novel transcription factor binding motifs in the promoter regions of genes, or directly from genomic regions of interest such as ChIP-seq peaks, ATAC-seq accessible sites, or differentially accessible regions. It employs HOMER (Hypergeometric Optimization of Motif Enrichment) to detect both known and previously uncharacterized sequence motifs enriched within the supplied genomic intervals. Use the skill when you need to uncover enriched sequence motifs or want to know which TFs might regulate the target regions.

4 stars

tools

Updated Apr 3, 2026

npx skillsauth add bisnake2001/chromskills De-novo-motif-discovery

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy (container and dependency vulnerability scanner): Clean, 95%
Semgrep (static code analysis for vulnerabilities): Clean, 95%
mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
Snyk (dep) (open source security scanning): Skipped, 50%
Socket.dev (supply chain security analysis): Skipped, 50%
VirusTotal (multi-engine malware detection): Skipped, 50%
CrowdStrike (advanced threat intelligence): Skipped, 50%
OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: Apr 20, 2026, 3:21 PM (1.9s, 1 file scanned)

Install manually

# Clone the repo
git clone https://github.com/bisnake2001/chromskills.git

# Copy into Claude Code skills folder (global)
cp -r chromskills/12_toolBased.De-novo-motif-discovery ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

bisnake2001/chromskills

4 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT