Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

sahuno/snakemake

Name: snakemake
Author: sahuno

claude/skills/snakemake/SKILL.md

npx skillsauth add sahuno/llm_configs snakemake

Snakemake Workflow Skill

Build production-grade Snakemake 9 workflows on SLURM HPC with reproducible run organization, container integration, and battle-tested pitfall avoidance.

When to Use This Skill

Use when the user needs:

A new Snakemake workflow or additional rules for an existing one
Debugging a Snakemake/SLURM error (check references/debug_patterns.md)
A SLURM profile for Snakemake (check references/slurm_profiles.md)
To convert ad-hoc scripts into a reproducible pipeline

Don't use when the task has <3 steps, no per-sample parallelism, and no dependency DAG. A bash script is simpler and Snakemake overhead isn't free.

Core Architecture

1. One Rule = One Tool

Each rule wraps exactly one tool or operation. Rules compose vertically via input: dependencies. Optional rule blocks are gated by config booleans (if USE_FEATURE:). Adding a rule should never break existing rules.

2. Workflow vs Results Separation

The workflow is a tool; each run is an experiment. Never write outputs into the workflow directory.

Workflow directory (versioned, reusable):

workflows/{workflow_name}/
├── Snakefile
├── config_template.yaml
├── scripts/                    # Pluggable scripts with argparse CLI
├── profiles/slurm/config.yaml  # Workflow-specific SLURM profile
├── test/                       # Test fixtures (<5 min on cpushort)
│   ├── test_config.yaml
│   ├── test_manifest.tsv
│   └── test_regions.bed
└── CHANGELOG.md

Run root (one directory = one experiment):

{output_dir}/
├── config.yaml              # COPY of config (frozen at run start)
├── run_snakemake.sh          # Exact reproduction command
├── manifest.tsv              # COPY of sample sheet
├── run_metadata.yaml         # Auto-generated (date, versions, samples)
├── results/{rule_name}/{sample}/   # ALL rule outputs
├── benchmarks/               # benchmark: directive outputs
├── qc/                       # QC gate sentinel files
└── logs/                     # ALL rule logs
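
The run-root layout above could be set up by a small helper along these lines. This is a minimal sketch, not the skill's actual setup code: `init_run_root` and its arguments are hypothetical, generation of `run_snakemake.sh` is omitted, and `run_metadata.yaml` is written as plain `key: value` lines to avoid a YAML dependency.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def init_run_root(output_dir, config_path, manifest_path):
    """Create the run-root skeleton and freeze copies of the run inputs.

    Hypothetical helper: the function name and metadata fields are assumptions.
    """
    root = Path(output_dir)
    # Subdirectories named in the run-root layout.
    for sub in ("results", "benchmarks", "qc", "logs"):
        (root / sub).mkdir(parents=True, exist_ok=True)

    # Frozen copies, so later edits to the originals do not alter this run.
    shutil.copy2(config_path, root / "config.yaml")
    shutil.copy2(manifest_path, root / "manifest.tsv")

    # Count data rows in the manifest (assumes exactly one header line).
    with open(manifest_path) as fh:
        n_samples = max(sum(1 for line in fh if line.strip()) - 1, 0)

    metadata = {
        "date": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "n_samples": n_samples,
    }
    with open(root / "run_metadata.yaml", "w") as fh:
        for key, value in metadata.items():
            fh.write(f"{key}: {value}\n")
    return root
```

Copying the config and manifest at run start is what makes the run directory self-describing: the experiment can be reproduced from its own contents even if the originals change later.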

3. Single `output_dir` Config Key

All subdirectories are derived internally from output_dir. Never add separate config keys for logs, figures, or matrices:

import os

OUTDIR     = config["output_dir"]
RESULTSDIR = os.path.join(OUTDIR, "results")
LOGDIR     = os.path.join(OUTDIR, "logs")
BENCHDIR   = os.path.join(OUTDIR, "benchmarks")
QCDIR      = os.path.join(OUTDIR, "qc")

4. Externalize Complex Logic

Shell one-liners are fine inline (awk, bgzip && tabix). Externalize when the shell block needs if/for/while, variable manipulation, or multi-step Python. Rule of thumb: if you can't understand it in 10 seconds, externalize it.

Scripts go under workflow_dir/scripts/ with argparse CLI
Reference via os.path.join(workflow.basedir, "scripts", "script.py")
Never inline complex Python in run: blocks for SLURM-submitted rules — run: executes in the coordinator process, not on the compute node
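
As an illustration of what "pluggable script with argparse CLI" can look like, here is a minimal sketch. The task (counting data rows in a TSV) and the `--input`/`--output` flag names are hypothetical and chosen only to show the shape of the interface.

```python
import argparse


def build_parser():
    """Define the command-line interface of the script."""
    parser = argparse.ArgumentParser(
        description="Hypothetical pluggable script: count data rows in a TSV."
    )
    parser.add_argument("--input", required=True, help="Input TSV with one header line")
    parser.add_argument("--output", required=True, help="Output text file")
    return parser


def main(argv=None):
    """Parse arguments, do the work, and return an exit code."""
    args = build_parser().parse_args(argv)
    with open(args.input) as fh:
        # Non-empty lines minus the header line.
        n_rows = max(sum(1 for line in fh if line.strip()) - 1, 0)
    with open(args.output, "w") as fh:
        fh.write(f"rows\t{n_rows}\n")
    return 0
```

In a real script file, an `if __name__ == "__main__":` guard calling `main()` would follow. Taking every path as an explicit argument keeps the script usable outside Snakemake, which makes it easier to test on its own.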

5. Validate Before Every Submission

After any Snakefile edit:

snakemake --lint — catches style issues
snakemake -n — dry-run validates the full DAG
snakemake --dag | dot -Tpdf > dag.pdf — visualize dependencies

Dry-run is the minimum test. A small-data end-to-end test is preferred.
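
The lint and dry-run checks above can be chained so that a failed step stops the sequence. The sketch below is one possible wrapper, not part of the skill: the function names are hypothetical, and it assumes `snakemake` is on `PATH` and that a non-zero exit code means the check failed.

```python
import subprocess


def validation_commands(snakefile="Snakefile"):
    """Return the lint and dry-run commands as argument lists."""
    base = ["snakemake", "--snakefile", snakefile]
    return [base + ["--lint"], base + ["-n"]]


def run_validation(snakefile="Snakefile", runner=subprocess.run):
    """Run each check in order and stop at the first failure.

    `runner` is injectable so the sequence can be exercised without Snakemake.
    """
    for cmd in validation_commands(snakefile):
        result = runner(cmd)
        if result.returncode != 0:
            return False
    return True
```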

6. Built-in Resource Management

| Feature | When to Use |
|---------|-------------|
| benchmark: | Every compute-heavy rule (informs production resource allocation) |
| temp() | Intermediate files (auto-deleted after downstream rules complete) |
| protected() | Expensive final outputs (prevents accidental deletion) |
| retries: 2 | External tool rules (transient SLURM failures) |
| retries: 0 | Python scripts and QC gates (fail fast on bugs/data issues) |

7. QC Gates as Workflow Rules

QC checks are Snakemake rules, not informal post-hoc steps. Pattern: QC rule produces a .pass sentinel; downstream rules depend on it.

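rule qc_alignment:
    """Fail if mapping rate < 80%."""
    input:
        flagstat = os.path.join(RESULTSDIR, "alignment", "{sample}", "{sample}.flagstat"),
    output:
        qc_pass = os.path.join(QCDIR, "alignment_{sample}.pass"),
    run:
        import re
        with open(input.flagstat) as f:
            text = f.read()
        # Assumes samtools flagstat layout, e.g. "9800 + 0 mapped (98.00% : N/A)"
        match = re.search(r"^\d+ \+ \d+ mapped \((\d+(?:\.\d+)?)%", text, re.MULTILINE)
        if match is None:
            raise ValueError(f"QC FAIL: {wildcards.sample} no mapping rate found in {input.flagstat}")
        mapped_pct = float(match.group(1))
        if mapped_pct < 80.0:
            raise ValueError(f"QC FAIL: {wildcards.sample} mapping rate {mapped_pct}% < 80%")
        # The sentinel is written only when the gate passes
        with open(output.qc_pass, "w") as f:
            f.write(f"PASS: mapping_rate={mapped_pct}%\n")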

Gate after: alignment, pileup, DMR calling. Don't gate on soft thresholds — log and report those instead.
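
The parsing and threshold logic of such a gate can also live in a plain function, which makes it testable without running the workflow. The sketch below assumes the `.flagstat` file is in the samtools flagstat layout, where the percentage follows the word `mapped` in parentheses. The function names and the `threshold` parameter are hypothetical.

```python
import re

# Assumes samtools flagstat layout, e.g. "9800 + 0 mapped (98.00% : N/A)".
# Anchoring at line start keeps the "primary mapped" line from matching.
MAPPED_RE = re.compile(r"^\d+ \+ \d+ mapped \((\d+(?:\.\d+)?)%", re.MULTILINE)


def mapping_rate(flagstat_text):
    """Extract the overall mapping percentage from flagstat text."""
    match = MAPPED_RE.search(flagstat_text)
    if match is None:
        raise ValueError("no 'mapped (...%)' line found in flagstat text")
    return float(match.group(1))


def check_mapping_rate(flagstat_text, sample, threshold=80.0):
    """Return the sentinel text on pass; raise ValueError on fail."""
    pct = mapping_rate(flagstat_text)
    if pct < threshold:
        raise ValueError(f"QC FAIL: {sample} mapping rate {pct}% < {threshold}%")
    return f"PASS: mapping_rate={pct}%\n"
```

Raising on failure, instead of writing a "fail" file, means the sentinel never exists for a failed sample, so downstream rules that depend on it cannot start.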

8. Config is the Run Manifest

All cohort-specific details (paths, regex patterns, sample ID formats, exclusion keywords) are config args, never hardcoded. This enables reuse across cohorts without code changes. Config documents itself with comments.
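
To make the idea concrete, the sketch below selects samples using only values taken from config. The keys `sample_id_regex` and `exclude_keywords` and the example filenames are hypothetical. The point is that switching cohorts means editing config, not code.

```python
import re

# Hypothetical config, as it might be loaded from config.yaml.
config = {
    "sample_id_regex": r"^(?P<sample>[A-Za-z0-9]+)_.*\.bam$",
    "exclude_keywords": ["unclassified", "test"],
}


def select_samples(filenames, config):
    """Map sample ID to filename using only config-provided patterns."""
    pattern = re.compile(config["sample_id_regex"])
    selected = {}
    for name in filenames:
        # Skip files containing any cohort-specific exclusion keyword.
        if any(word in name for word in config["exclude_keywords"]):
            continue
        match = pattern.match(name)
        if match:
            selected[match.group("sample")] = name
    return selected
```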

9. Container Discipline

Never use :latest tags — pin exact versions (onttools_v3.9.sif)
Every rule needing container packages must have singularity: IMG
Load container paths from softwares_containers_config.yaml — never guess
If a rule has singularity: directive, do NOT add singularity exec in shell:
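
A small lookup helper can enforce the first and third points at load time. This is a sketch under assumptions: `resolve_container` is a hypothetical name, and the mapping is assumed to have been parsed already from softwares_containers_config.yaml into a flat name-to-path dictionary.

```python
def resolve_container(containers, name):
    """Return the image path for `name`, refusing unknown or unpinned images."""
    if name not in containers:
        # Never guess a path: fail loudly if the image is not configured.
        raise KeyError(f"container '{name}' is not listed in the containers config")
    image = containers[name]
    if image.endswith(":latest"):
        raise ValueError(f"container '{name}' uses a :latest tag; pin an exact version")
    return image
```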

10. Test Suite

Every workflow ships with test/ containing test_config.yaml, test_manifest.tsv, and test_regions.bed. Tests must complete in <5 minutes on cpushort using slurmMinimal profile.

Snakemake 9 + SLURM Critical Pitfalls

These are battle-tested fixes. Memorize them: they account for the most debugging time.

| Pitfall | Symptom | Fix |
|---------|---------|-----|
| Built-in mem_mb: 1000 | SLURM_MEM_PER_NODE vs SLURM_MEM_PER_CPU fatal | Add mem_mb: 0 to default-resources |
| mem: in profile | Same fatal conflict | Never use mem:, use mem_mb_per_cpu instead |
| Missing slurm_account | Silent job rejection | Always set slurm_account in default-resources |
| Coordinator uses --mem=XG | Propagates to child jobs via --export=ALL | Use --mem-per-cpu; add unset SLURM_MEM_PER_NODE |
| --singularity-args (with --) | Key not recognized | Use singularity-args: (no -- prefix) |
| run: block on SLURM | Executes in coordinator, not compute node | Use shell: + script for heavy work |
| Rule missing singularity: | ModuleNotFoundError on compute | Add singularity: IMG to every rule needing packages |
| Stale lock | Directory cannot be locked | snakemake --unlock then --rerun-incomplete |
| --profile with --directory | Profile not found | Always use absolute path for --profile |
| sacctmgr: not found | Login node missing SLURM CLI | Submit coordinator as SLURM batch job |

For the full pitfall table and debug patterns, read references/debug_patterns.md.

Reference Files

Read these when you need detailed templates or troubleshooting:

| File | When to Read |
|------|-------------|
| references/snakefile_template.md | Creating a new Snakefile (full template with all conventions) |
| references/slurm_profiles.md | Writing or debugging SLURM profiles (3 profile tiers) |
| references/config_template.md | Creating config files and run scripts |
| references/script_interface.md | Writing pluggable Python scripts for scripts/ |
| references/debug_patterns.md | Diagnosing Snakemake/SLURM errors |
| references/completion_checklist.md | Before declaring a workflow complete |

Retry Guidance by Rule Type

| Rule Type | Retries | Reason |
|-----------|---------|--------|
| External tool (modkit, samtools, STAR) | 2 | SLURM nodes can timeout or OOM transiently |
| Python script (aggregate, tensor, plot) | 0 | Code bugs should fail immediately |
| File conversion (awk, bgzip, tabix) | 1 | Rare NFS issues |
| QC gate rules | 0 | QC failures are data issues, not transient |

Incremental Sample Addition

Adding samples to sample_manifest triggers per-sample rules only for the new samples, because Snakemake skips jobs whose output files already exist. Cohort-level rules re-run because the expand list changed.

Use the same output_dir when adding samples to an existing cohort
Force cohort re-run only: snakemake --forcerun build_cohort_matrices
If output_dir changes, everything re-runs (no prior outputs)
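
The effect can be pictured as a check for missing per-sample outputs under the results/{rule_name}/{sample}/ layout. The sketch below is a simplified model only: `pending_samples` and the file suffix are hypothetical, and Snakemake's real rerun decision also weighs other triggers such as file modification times.

```python
from pathlib import Path


def pending_samples(samples, results_dir, rule_name, suffix):
    """Return the samples whose per-sample output file does not exist yet."""
    pending = []
    for sample in samples:
        expected = Path(results_dir) / rule_name / sample / f"{sample}{suffix}"
        if not expected.exists():
            pending.append(sample)
    return pending
```

With a changed output_dir every expected path is missing, so every sample is pending, which matches the "everything re-runs" case above.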

Workflow Completion Checklist

Before declaring any workflow complete, run through the checklist in references/completion_checklist.md. The critical items:

Dry-run succeeds (snakemake -n)
Profile has mem_mb: 0 and slurm_account in default-resources
All container-dependent rules have singularity: IMG
Compute-heavy rules have benchmark: directive
unset SLURM_MEM_PER_NODE in run script
No hardcoded absolute paths in Snakefile
Test suite exists and passes
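
The profile-related items lend themselves to an automated check. The sketch below is illustrative, not the skill's own checker: it assumes the profile YAML has already been parsed into a dictionary with `default-resources` as a mapping, and `check_profile` is a hypothetical name.

```python
def check_profile(profile):
    """Return a list of problems found in a parsed SLURM profile dictionary."""
    problems = []
    defaults = profile.get("default-resources", {})

    # mem_mb: 0 overrides the built-in default described in the pitfalls.
    if defaults.get("mem_mb") != 0:
        problems.append("default-resources should set mem_mb: 0")
    if not defaults.get("slurm_account"):
        problems.append("default-resources should set slurm_account")
    if "mem" in defaults:
        problems.append("default-resources uses mem; use mem_mb_per_cpu instead")
    # Profile keys are written without the leading dashes.
    if "--singularity-args" in profile:
        problems.append("use singularity-args (no -- prefix) as the profile key")
    return problems
```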

Validated Reference Workflow

The ont_modkit_pileup workflow serves as the reference implementation:

Conditional rules: convert_bed, pileup, aggregate, cohort_matrices, tensor, correlation
Key patterns: --include-bed for single-pass pileup, chr prefix auto-detection, BAM index auto-detection (.bai, .bam.csi, .csi)
run: for lightweight cohort steps; shell: + singularity: for compute
Located at: workflows/ont_modkit_pileup/

sahuno/snakemake

claude/skills/snakemake/SKILL.md

Expert Snakemake workflow engineer for bioinformatics pipelines on SLURM HPC. Specializes in creating, debugging, and running Snakemake 9 workflows with battle-tested SLURM profiles, proper container integration, and reproducible run organization. Use this skill proactively whenever the user asks to: create/write/build a Snakemake workflow or pipeline, debug a Snakemake error or failed SLURM job, add rules to an existing Snakefile, write or fix a SLURM profile for Snakemake, organize pipeline outputs or run directories, convert a shell script or ad-hoc analysis into a reproducible Snakemake workflow, or troubleshoot Snakemake 9 + SLURM executor issues (memory conflicts, container propagation, stale locks). Also trigger when the user mentions snakemake dry-run, snakemake DAG, snakemake profile, workflow-profile, SLURM executor plugin, modkit pileup pipeline, or any multi-sample bioinformatics pipeline that needs per-sample parallelism with a dependency DAG. Do NOT trigger for: tasks with <3 steps and no parallelism (bash script is better), pure Nextflow workflows, or one-off data exploration.

tools

Updated Apr 15, 2026

$ install --global

skillsauth

npx skillsauth add sahuno/llm_configs snakemake

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Value shown |
|---------|-------------|--------|-------------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Apr 15, 2026, 7:12 AM (4.4s, 21 files scanned)

SKILL.md

name: snakemake
description: |
  Do NOT trigger for: tasks with <3 steps and no parallelism (bash script is better),
version: 1.0.0
author: Samuel Ahuno ([email protected])

Related Skills

sahuno/scatter-gather

development

Verified, Trusted, Community

Decide whether and how to scatter genomics workloads across chromosomes or region tiles, then gather the per-shard outputs back together correctly. Use proactively whenever the user mentions parallelizing per-chromosome, sharding by chrom, tiling the genome, splitting a BAM/VCF/BED by region, merging per-chrom outputs, or has a workflow with obvious per-chromosome parallelism (variant calling, methylation pileup/DMR, coverage, liftover, peak calling, SV calling). Also triggers on /scatter-gather, "scatter X across chromosomes", "shard this", "chunked variant calling", "merge per-chrom VCFs", "gather these bedmethyl files", "concat these bigwigs", or any per-region parallelism question. **Trigger even when the user is also using Snakemake or Nextflow** — those skills handle DAG plumbing while this one defines *what* to scatter, *whether* it's even safe to scatter (some computations like DSS DMLtest pool globally and break under naive sharding), and *how* to gather each output format without silent corruption. Especially trigger on questions about merging per-chromosome BAM / VCF / BED / bedMethyl / bigwig outputs, or whether a scatter-gather is equivalent to running on the whole genome.

SKILL.md, updated May 7, 2026

sahuno/igv-reports

tools

Verified, Trusted, Community

Build self-contained, offline HTML genomic-region reports with igv-reports (create_report). Each HTML bundles igv.js viewers per region with embedded BAM/VCF data slices and default tracks (CpG islands, gencode, RepeatMasker); a reviewer clicks the variant table to inspect read-level evidence with no internet, no server, no IGV install. USE this skill whenever the user wants an HTML, clickable, or browseable viewer of genomic data — phrases like "HTML IGV report", "offline IGV", "self-contained HTML", "clickable viewer", "create_report", "igv-reports", "email this viewer", or any browseable HTML of reads at variants, fusion breakpoints, SV junctions, viral integrations, ChIP peaks, or ROIs. Trigger even when the user doesn't say "igv-reports" — giveaway is HTML/clickable/offline plus genomic regions. Also fire on /igv-reports. DO NOT use for static PNG/PDF/SVG IGV screenshots — use the igv-screenshots skill. Supports hg38, mm10, mm39, T2T. Defaults: --flanking 300, --standalone, genome-tagged output.

SKILL.md, updated May 7, 2026

sahuno/chimeric-read-validation

development

Verified, Trusted, Community

Verify that structural-variant / breakpoint calls are actually real by checking the chimeric reads that support them. Use whenever the user has caller output (Severus, Manta, Sniffles2, Delly, GRIDSS, MELT, Arriba, SvABA) and wants to validate / audit / QC / double-check their calls — viral integrations (HTLV-1, HBV, HPV, EBV), gene fusions (BCR-ABL, IGH translocations), mobile element insertions (L1, Alu, SVA), translocations. Trigger on phrasings like "is this integration real?", "should I trust this fusion call?", "are these false positives?", "are these PASS calls actually supported by reads?", "QC my SV calls", or any per-call chimeric-read / contamination / bimodality / T-vs-N read overlap question. Also fires on BAM @PG -Y / SA-tag questions on chimeric BAMs, and on /chimeric-read-validation. Output is a per-call TSV with pass / needs_review / fail verdicts. Do not use for calling SVs (use the caller), IGV screenshots (use igv-reports), or RNA-level fusion FDR (use Arriba).

SKILL.md, updated May 7, 2026

sahuno/runtime-resource-study

tools

Verified, Trusted, Community

Run a stage-gated runtime/resource optimization study for any bioinformatics tool or command-line program on a SLURM HPC cluster. Walks through preflight, OFAT factor scan, 2^k confirmation factorial, build-mode + alternative-implementation comparison, input-size scan, out-of-sample validation, and produces a fitted predictive resource model (wall_s and peak_rss as functions of input size), a machine-readable model.yaml with caveats, a full REPORT.md, and a one-page exec summary PDF. Trigger PROACTIVELY whenever the user asks to "benchmark", "optimize", "tune", "characterize runtime/memory", "find best config", "build a resource model", "how does X scale", or "what should I put in my Snakemake resources directive for tool Y" — for any compute-bound bioinformatics step (sort, dedup, alignment, variant calling, methylation calling, basecalling, indexing, pileup, liftover). Also triggers on /runtime-resource-study or /benchmark-tool. Skip only for one-off quick timing where a single number suffices and no model is needed.

SKILL.md, updated Apr 30, 2026

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Install manually

# Clone the repo
git clone https://github.com/sahuno/llm_configs.git

# Copy into Claude Code skills folder (global)
cp -r llm_configs/claude/skills/snakemake ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

sahuno/llm_configs

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT
