claude/skills/snakemake/SKILL.md
Expert Snakemake workflow engineer for bioinformatics pipelines on SLURM HPC. Specializes in creating, debugging, and running Snakemake 9 workflows with battle-tested SLURM profiles, proper container integration, and reproducible run organization. Use this skill proactively whenever the user asks to: create/write/build a Snakemake workflow or pipeline, debug a Snakemake error or failed SLURM job, add rules to an existing Snakefile, write or fix a SLURM profile for Snakemake, organize pipeline outputs or run directories, convert a shell script or ad-hoc analysis into a reproducible Snakemake workflow, or troubleshoot Snakemake 9 + SLURM executor issues (memory conflicts, container propagation, stale locks). Also trigger when the user mentions snakemake dry-run, snakemake DAG, snakemake profile, workflow-profile, SLURM executor plugin, modkit pileup pipeline, or any multi-sample bioinformatics pipeline that needs per-sample parallelism with a dependency DAG. Do NOT trigger for: tasks with <3 steps and no parallelism (bash script is better), pure Nextflow workflows, or one-off data exploration.
npx skillsauth add sahuno/llm_configs snakemake

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Build production-grade Snakemake 9 workflows on SLURM HPC with reproducible run organization, container integration, and battle-tested pitfall avoidance.
Use when the user needs:
- … references/debug_patterns.md)
- … references/slurm_profiles.md)

Don't use when the task has <3 steps, no per-sample parallelism, and no dependency DAG. A bash script is simpler and Snakemake overhead isn't free.
Each rule wraps exactly one tool or operation. Rules compose vertically via input:
dependencies. Optional rule blocks are gated by config booleans (if USE_FEATURE:).
Adding a rule should never break existing rules.
The workflow is a tool; each run is an experiment. Never write outputs into the workflow directory.
Workflow directory (versioned, reusable):
workflows/{workflow_name}/
├── Snakefile
├── config_template.yaml
├── scripts/ # Pluggable scripts with argparse CLI
├── profiles/slurm/config.yaml # Workflow-specific SLURM profile
├── test/ # Test fixtures (<5 min on cpushort)
│ ├── test_config.yaml
│ ├── test_manifest.tsv
│ └── test_regions.bed
└── CHANGELOG.md
Run root (one directory = one experiment):
{output_dir}/
├── config.yaml # COPY of config (frozen at run start)
├── run_snakemake.sh # Exact reproduction command
├── manifest.tsv # COPY of sample sheet
├── run_metadata.yaml # Auto-generated (date, versions, samples)
├── results/{rule_name}/{sample}/ # ALL rule outputs
├── benchmarks/ # benchmark: directive outputs
├── qc/ # QC gate sentinel files
└── logs/ # ALL rule logs
output_dir Config Key

All subdirectories are derived internally from output_dir. Never add separate config keys for logs, figures, or matrices:

```python
import os

# `config` is the dict Snakemake populates from the config file
OUTDIR = config["output_dir"]
RESULTSDIR = os.path.join(OUTDIR, "results")
LOGDIR = os.path.join(OUTDIR, "logs")
BENCHDIR = os.path.join(OUTDIR, "benchmarks")
QCDIR = os.path.join(OUTDIR, "qc")
```
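The run-root layout above can be prepared before Snakemake starts. The following is a minimal sketch under that assumption; the helper name `init_run_root` and its arguments are hypothetical and not part of the workflow itself.

```python
import os
import shutil


def init_run_root(output_dir, config_path, manifest_path):
    """Create the run-root layout and freeze copies of config and manifest.

    Hypothetical helper: name and signature are illustrative only.
    """
    # Every subdirectory is derived from output_dir; no extra config keys.
    dirs = {
        "results": os.path.join(output_dir, "results"),
        "logs": os.path.join(output_dir, "logs"),
        "benchmarks": os.path.join(output_dir, "benchmarks"),
        "qc": os.path.join(output_dir, "qc"),
    }
    for path in dirs.values():
        os.makedirs(path, exist_ok=True)

    # Frozen copies keep the run reproducible even if the originals change later.
    shutil.copy2(config_path, os.path.join(output_dir, "config.yaml"))
    shutil.copy2(manifest_path, os.path.join(output_dir, "manifest.tsv"))
    return dirs
```

Calling this once from the run script, before the Snakemake invocation, keeps the copy step out of the DAG.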
Shell one-liners are fine inline (awk, bgzip && tabix). Externalize when the
shell block needs if/for/while, variable manipulation, or multi-step Python.
Rule of thumb: if you can't understand it in 10 seconds, externalize it.
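As an illustration of what an externalized script might look like, here is a minimal sketch of a pluggable script with an argparse CLI. The task (summing BED interval lengths), the file name, and the `--bed` / `--out` flags are hypothetical choices for the example, not something the workflow prescribes.

```python
import argparse


def total_bases(bed_lines):
    """Sum interval lengths (end - start) over BED lines, skipping headers."""
    total = 0
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        total += int(fields[2]) - int(fields[1])
    return total


def parse_args(argv=None):
    """Define the CLI so the script can run on its own, outside Snakemake."""
    parser = argparse.ArgumentParser(description="Sum the bases covered by a BED file.")
    parser.add_argument("--bed", required=True, help="Input BED file")
    parser.add_argument("--out", required=True, help="Output text file with the total")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    with open(args.bed) as handle:
        total = total_bases(handle)
    with open(args.out, "w") as handle:
        handle.write(f"{total}\n")
    return 0

# In the real script file, finish with:
#   if __name__ == "__main__":
#       raise SystemExit(main())
```

Keeping the logic in plain functions and the CLI in `parse_args` means the script can be tested by hand and then called from a rule's shell block with its inputs and outputs passed as flags.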
- Put scripts in workflow_dir/scripts/ with an argparse CLI
- Reference them via os.path.join(workflow.basedir, "scripts", "script.py")
- Avoid run: blocks for SLURM-submitted rules, because run: executes in the coordinator process, not on the compute node

After any Snakefile edit:

- snakemake --lint (catches style issues)
- snakemake -n (dry-run validates the full DAG)
- snakemake --dag | dot -Tpdf > dag.pdf (visualize dependencies)

Dry-run is the minimum test. A small-data end-to-end test is preferred.
| Feature | When to Use |
|---------|-------------|
| benchmark: | Every compute-heavy rule — informs production resource allocation |
| temp() | Intermediate files (auto-deleted after downstream rules complete) |
| protected() | Expensive final outputs (prevents accidental deletion) |
| retries: 2 | External tool rules (transient SLURM failures) |
| retries: 0 | Python scripts and QC gates (fail fast on bugs/data issues) |
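To show how benchmark: outputs can inform resource allocation, the sketch below reads Snakemake benchmark TSV files and suggests a memory request from the largest max_rss value (which Snakemake reports in MB). The function name and the 1.5x headroom factor are assumptions for illustration, not recommendations from this skill.

```python
import csv
import math


def suggest_mem_mb(benchmark_paths, headroom=1.5):
    """Suggest a memory request (MB) from the peak max_rss across benchmark files.

    Hypothetical helper; the default headroom factor is an arbitrary assumption.
    """
    peak_rss_mb = None
    for path in benchmark_paths:
        with open(path, newline="") as handle:
            for row in csv.DictReader(handle, delimiter="\t"):
                try:
                    rss = float(row["max_rss"])
                except (KeyError, TypeError, ValueError):
                    continue  # skip rows without a numeric max_rss
                if math.isnan(rss):
                    continue
                if peak_rss_mb is None or rss > peak_rss_mb:
                    peak_rss_mb = rss
    if peak_rss_mb is None:
        raise ValueError("no numeric max_rss values found")
    return math.ceil(peak_rss_mb * headroom)
```

Running this over the benchmarks/ directory of a test run gives a starting point that a production run can then refine.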
QC checks are Snakemake rules, not informal post-hoc steps. Pattern: QC rule
produces a .pass sentinel; downstream rules depend on it.
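```python
rule qc_alignment:
    """Fail if mapping rate < 80%."""
    input:
        flagstat = os.path.join(RESULTSDIR, "alignment", "{sample}", "{sample}.flagstat"),
    output:
        qc_pass = os.path.join(QCDIR, "alignment_{sample}.pass"),
    run:
        import re
        with open(input.flagstat) as f:
            text = f.read()
        # Assumes samtools flagstat text output, e.g. "95 + 0 mapped (95.00% : N/A)"
        match = re.search(r"mapped \((\d+(?:\.\d+)?)%", text)
        if match is None:
            raise ValueError(f"QC FAIL: {wildcards.sample} has no mapping rate in {input.flagstat}")
        mapped_pct = float(match.group(1))
        if mapped_pct < 80.0:
            raise ValueError(f"QC FAIL: {wildcards.sample} mapping rate {mapped_pct}% < 80%")
        # The sentinel is written only when the gate passes
        with open(output.qc_pass, "w") as f:
            f.write(f"PASS: mapping_rate={mapped_pct}%\n")
```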
Gate after: alignment, pileup, DMR calling. Don't gate on soft thresholds — log and report those instead.
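The distinction between hard gates and soft thresholds can be sketched as a single function that raises on hard failures and only returns warnings for soft ones. The function name, metric names, and threshold values here are hypothetical.

```python
def evaluate_qc(metrics, hard_min, soft_min):
    """Raise on hard-gate failures; return warning strings for soft misses.

    Hypothetical helper: metrics, hard_min, and soft_min map metric name -> value.
    """
    failures = [
        f"{name}={metrics[name]} < {limit}"
        for name, limit in hard_min.items()
        if metrics[name] < limit
    ]
    if failures:
        # Hard gate: stop the pipeline so no .pass sentinel is written.
        raise ValueError("QC FAIL: " + "; ".join(failures))

    # Soft thresholds: collect messages for the log or report, do not fail.
    return [
        f"{name}={metrics[name]} < {limit}"
        for name, limit in soft_min.items()
        if metrics[name] < limit
    ]
```

A QC script could write the returned warnings to the rule log and create the sentinel file afterwards.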
All cohort-specific details (paths, regex patterns, sample ID formats, exclusion keywords) are config args, never hardcoded. This enables reuse across cohorts without code changes. Config documents itself with comments.
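As a sketch of what config-driven cohort handling could look like, the function below filters sample IDs using a regex and exclusion keywords taken from the config. The config keys `sample_id_regex` and `exclusion_keywords` and the ID format in the usage note are made up for illustration.

```python
import re


def select_samples(sample_ids, config):
    """Keep IDs that match the configured pattern and contain no exclusion keyword.

    Hypothetical helper; the config keys used here are illustrative assumptions.
    """
    pattern = re.compile(config["sample_id_regex"])
    excluded = [kw.lower() for kw in config.get("exclusion_keywords", [])]
    kept = []
    for sample_id in sample_ids:
        if not pattern.fullmatch(sample_id):
            continue  # ID does not follow this cohort's naming scheme
        if any(kw in sample_id.lower() for kw in excluded):
            continue  # ID carries a keyword the cohort config excludes
        kept.append(sample_id)
    return kept
```

With a pattern such as `PT\d{3}_(T|N)(_\w+)?` and the keyword `failed`, an ID like `PT002_N_failed` is dropped without touching the Snakefile; a different cohort only needs a different config.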
- Avoid :latest tags; pin exact versions (onttools_v3.9.sif)
- Add singularity: IMG to every rule needing packages
- Take container paths from softwares_containers_config.yaml; never guess
- If a rule has a singularity: directive, do NOT add singularity exec in shell:

Every workflow ships with test/ containing test_config.yaml, test_manifest.tsv, and test_regions.bed. Tests must complete in <5 minutes on cpushort using the slurmMinimal profile.
These are battle-tested fixes for the pitfalls that cost the most debugging time. Memorize them:
| Pitfall | Symptom | Fix |
|---------|---------|-----|
| Built-in mem_mb: 1000 | SLURM_MEM_PER_NODE vs SLURM_MEM_PER_CPU fatal | Add mem_mb: 0 to default-resources |
| mem: in profile | Same fatal conflict | Never use mem: — use mem_mb_per_cpu |
| Missing slurm_account | Silent job rejection | Always set slurm_account in default-resources |
| Coordinator uses --mem=XG | Propagates to child jobs via --export=ALL | Use --mem-per-cpu; add unset SLURM_MEM_PER_NODE |
| --singularity-args (with --) | Key not recognized | Use singularity-args: (no -- prefix) |
| run: block on SLURM | Executes in coordinator, not compute node | Use shell: + script for heavy work |
| Rule missing singularity: | ModuleNotFoundError on compute | Add singularity: IMG to every rule needing packages |
| Stale lock | Directory cannot be locked | snakemake --unlock then --rerun-incomplete |
| --profile with --directory | Profile not found | Always use absolute path for --profile |
| sacctmgr: not found | Login node missing SLURM CLI | Submit coordinator as SLURM batch job |
For the full pitfall table and debug patterns, read references/debug_patterns.md.
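Several of the profile pitfalls above can be checked mechanically. The following is a minimal sketch that inspects an already-parsed profile (a plain dict, for example loaded from the profile YAML) and assumes default-resources is written as a mapping; the function name and messages are hypothetical.

```python
def lint_slurm_profile(profile):
    """Return pitfall messages for a parsed SLURM profile dict.

    Hypothetical helper; assumes default-resources is a mapping, not a list.
    """
    problems = []
    defaults = profile.get("default-resources") or {}

    if defaults.get("mem_mb") != 0:
        problems.append("default-resources: set mem_mb: 0")
    if "mem" in defaults:
        problems.append("default-resources: replace mem with mem_mb_per_cpu")
    if not defaults.get("slurm_account"):
        problems.append("default-resources: slurm_account is missing")
    if "--singularity-args" in profile:
        problems.append("rename --singularity-args to singularity-args")
    return problems
```

An empty list means none of these four pitfalls were found; it says nothing about the remaining rows of the table, which concern the run script and the rules rather than the profile.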
Read these when you need detailed templates or troubleshooting:
| File | When to Read |
|------|-------------|
| references/snakefile_template.md | Creating a new Snakefile — full template with all conventions |
| references/slurm_profiles.md | Writing or debugging SLURM profiles (3 profile tiers) |
| references/config_template.md | Creating config files and run scripts |
| references/script_interface.md | Writing pluggable Python scripts for scripts/ |
| references/debug_patterns.md | Diagnosing Snakemake/SLURM errors |
| references/completion_checklist.md | Before declaring a workflow complete |
| Rule Type | Retries | Reason |
|-----------|---------|--------|
| External tool (modkit, samtools, STAR) | 2 | SLURM nodes can timeout or OOM transiently |
| Python script (aggregate, tensor, plot) | 0 | Code bugs should fail immediately |
| File conversion (awk, bgzip, tabix) | 1 | Rare NFS issues |
| QC gate rules | 0 | QC failures are data issues, not transient |
Adding samples to sample_manifest triggers only new per-sample rules (Snakemake
checks output files). Cohort-level rules re-run because the expand list changed.
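The output-file check behind incremental runs can be approximated with a few lines of Python. This sketch covers only the "does the output exist" part and ignores the other things Snakemake considers (timestamps, changed code or parameters); the function name and the filename template are hypothetical, and the path layout follows the results/{rule_name}/{sample}/ convention.

```python
import os


def samples_missing_output(samples, results_dir, rule_name, filename_template):
    """List samples whose per-sample output for a rule does not exist yet.

    Hypothetical helper; filename_template uses a {sample} placeholder.
    """
    missing = []
    for sample in samples:
        path = os.path.join(
            results_dir, rule_name, sample, filename_template.format(sample=sample)
        )
        if not os.path.exists(path):
            missing.append(sample)
    return missing
```

After adding samples to the manifest, the returned list should contain only the new samples, which is a quick sanity check to compare against the dry-run output.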
- Keep the same output_dir when adding samples to an existing cohort
- To force a cohort rebuild: snakemake --forcerun build_cohort_matrices
- If output_dir changes, everything re-runs (no prior outputs)

Before declaring any workflow complete, run through the checklist in references/completion_checklist.md. The critical items:

- Dry-run passes (snakemake -n)
- mem_mb: 0 and slurm_account in default-resources
- Every rule needing packages has singularity: IMG
- Compute-heavy rules have a benchmark: directive
- unset SLURM_MEM_PER_NODE in run script

The ont_modkit_pileup workflow serves as the reference implementation:

- --include-bed for single-pass pileup, chr prefix auto-detection, BAM index auto-detection (.bai, .bam.csi, .csi)
- run: for lightweight cohort steps; shell: + singularity: for compute
- workflows/ont_modkit_pileup/