claude/skills/nfcore-module/SKILL.md
End-to-end builder for new nf-core modules. Scaffolds all required files, runs lint and nf-test in a loop until both pass, and produces PR-ready artifacts (description, Slack draft, checklist). Use this skill proactively whenever the user wants to: create a new nf-core module, add a tool to nf-core/modules, write a DORADO_BASECALLER or MODKIT_LOCALIZE style process, wrap a bioinformatics tool in Nextflow for nf-core, or asks "how do I submit a module to nf-core". Also trigger for: adding GPU support to a module, wrapping an R or Python script as an nf-core process, handling licensed or non-bioconda tools in nf-core, fixing nf-core lint failures on a new module. Do NOT trigger for: editing existing pipelines, writing Snakemake rules, or debugging non-module Nextflow code.
Build new nf-core modules end-to-end: scaffold → lint loop → test loop → PR artifacts.
Encodes hard-won lessons from building dorado/basecaller, modkit/localize,
and modkit/localize/plot.
Reference files: references/rules_engine.md, references/module_templates.md, references/pr_artifacts.md.
Start by reading references/rules_engine.md to load the rules into context.
From the user's request, extract what you can: tool name (tool/subtool), container image, one-line description,
conda availability, GPU requirement, and test data path.
Search modules/nf-core/ for the closest existing module to use as a pattern:
ls modules/nf-core/<tool>/ 2>/dev/null
find modules/nf-core/ -name "main.nf" | xargs grep -l "<similar_tool>" 2>/dev/null | head -5
Tell the user what you found and which files you're using as reference.
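The two search commands above can be folded into a single recursive grep, which also copes with paths that contain spaces (plain `xargs` does not). Below is a minimal sketch, assuming a grep that supports `--include` (GNU and BSD grep both do). `find_similar_modules` is a hypothetical helper name:

```shell
# Hypothetical helper: list up to N main.nf files under a modules tree that
# mention a tool name, using the modules/nf-core/<tool>/<subtool>/main.nf layout.
find_similar_modules() {
    local modules_root="$1"   # e.g. modules/nf-core
    local pattern="$2"        # tool name to look for, e.g. modkit
    local limit="${3:-5}"     # how many matches to show (default 5)
    # -r: recurse, -l: print matching file names only, --include: only main.nf files
    grep -rl --include=main.nf -- "$pattern" "$modules_root" 2>/dev/null | sort | head -n "$limit"
}
```

Usage: `find_similar_modules modules/nf-core modkit`. The output is sorted so repeated runs list the same candidates in the same order.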
Ask every question you need in a single message. Minimum questions for any module:
Always ask if not provided:
tool/subtool name, e.g. dorado/basecaller, samtools/sort
Ask if potentially GPU:
5. Does this tool require a GPU at runtime? (yes = CUDA, no = CPU only)
— If yes: what GPU flag does the tool use? (e.g. --device cuda:all, --gpu)
Ask if not on bioconda:
6. Why is it not on bioconda? (licence restriction, not yet submitted, custom script)
7. Is there an official upstream Docker Hub image, or does one need to be built?
Always ask:
8. Do you have real test data to provide? If yes, what path(s)?
If no: STOP and say "I need real test data to write working tests. Please provide at least one input file." Do not proceed with synthetic data.
Do not proceed until test data paths are confirmed.
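When the user supplies local file paths, the test-data gate can be checked mechanically before continuing. Below is a minimal sketch. `require_test_data` is a hypothetical helper. It checks the local filesystem only, so it does not validate files that live under a remote `modules_testdata_base_path`:

```shell
# Hypothetical gate: succeed only if at least one path was given and every
# path exists and is non-empty on the local filesystem.
require_test_data() {
    if [ "$#" -eq 0 ]; then
        echo "I need real test data to write working tests. Please provide at least one input file." >&2
        return 1
    fi
    local missing=0 p
    for p in "$@"; do
        if [ ! -s "$p" ]; then   # -s: file exists and has size > 0
            echo "missing or empty: $p" >&2
            missing=1
        fi
    done
    return "$missing"
}
```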
| Type | Condition | Key markers |
|------|-----------|-------------|
| A Standard | On bioconda, no GPU | conda: bioconda::pkg=version, quay.io/biocontainers/... |
| B Licensed | Not on bioconda, no GPU | conda null, Docker Hub image, licence comment |
| C GPU | Requires CUDA GPU | conda null, label 'process_gpu', --nv flag |
| D Script | R/Python template, no binary tool | environment.yml, Wave URI, template directive |
Tell the user which type you've classified and why before scaffolding.
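The classification table can be read as a small decision function over three yes/no answers. Below is a sketch. The precedence it uses (script first, then GPU, then bioconda availability) is an assumption for illustration and is not a rule stated by nf-core. `classify_module_type` is a hypothetical name:

```shell
# Hypothetical mapping of three answers ("yes"/"no") onto module types A-D.
# Assumed precedence: script template > GPU requirement > bioconda availability.
classify_module_type() {
    local is_script="$1" needs_gpu="$2" on_bioconda="$3"
    if [ "$is_script" = yes ]; then
        echo "D"   # R/Python template, no binary tool
    elif [ "$needs_gpu" = yes ]; then
        echo "C"   # CUDA GPU: conda null, label 'process_gpu'
    elif [ "$on_bioconda" = yes ]; then
        echo "A"   # standard bioconda + biocontainers module
    else
        echo "B"   # licensed / not on bioconda, Docker Hub image
    fi
}
```

For example, `classify_module_type no yes no` prints `C`. Whatever the function prints, still explain the classification to the user.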
Read references/module_templates.md for the exact file templates per module type.
Create the full directory tree:
modules/nf-core/<tool>/<subtool>/
├── main.nf
├── meta.yml
├── environment.yml
├── templates/ # Type D only
│ └── script.R / .py
└── tests/
├── main.nf.test
├── nextflow.config
└── data/ # symlink or note real data location
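The tree above can be created with `mkdir`/`touch`. Below is a minimal sketch that creates empty placeholder files only. `scaffold_module` is a hypothetical helper, and the `data/` note is left to you. In practice, `nf-core modules create` scaffolds a module from the nf-core template, and the templates in references/module_templates.md supply the actual file contents:

```shell
# Hypothetical helper: create the empty skeleton for one module and print its path.
scaffold_module() {
    local root="$1" tool="$2" subtool="$3" module_type="$4"   # type: A, B, C or D
    local dir="$root/modules/nf-core/$tool/$subtool"
    mkdir -p "$dir/tests" || return 1
    touch "$dir/main.nf" "$dir/meta.yml" "$dir/environment.yml" \
          "$dir/tests/main.nf.test" "$dir/tests/nextflow.config"
    if [ "$module_type" = D ]; then
        mkdir -p "$dir/templates"   # R/Python script template (Type D only)
    fi
    echo "$dir"
}
```

Usage: `scaffold_module . modkit localize A` run from the root of a modules clone.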
ext keys — ONLY these are allowed in main.nf:
ext.args, ext.args2, ext.args3, ext.prefix, ext.when
Never use ext.device, ext.model, ext.threads, or any custom key.
Hardcode defaults in the script body instead (e.g. --device cuda:all).
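The ext-key allow-list can be enforced with a quick scan before lint runs. Below is a sketch using a plain regex. `check_ext_keys` is a hypothetical helper, and because the scan is regex-only, it also reports keys that appear in comments:

```shell
# Hypothetical check: print every ext.* key in a main.nf that is not
# ext.args, ext.args2, ext.args3, ext.prefix or ext.when; fail if any exist.
check_ext_keys() {
    local main_nf="$1" bad
    # -o: print each match on its own line; -vx: drop lines that are exactly an allowed key
    bad=$(grep -oE 'ext\.[A-Za-z0-9_]+' "$main_nf" | sort -u \
          | grep -vxE 'ext\.(args|args2|args3|prefix|when)' || true)
    if [ -n "$bad" ]; then
        echo "$bad"
        return 1
    fi
}
```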
tests/nextflow.config — zero local paths, ever:
process {
withName: 'TOOL_SUBTOOL' {
ext.args = params.module_args ?: ''
}
}
singularity.registry = ''
docker.registry = ''
// Add only for GPU modules:
// singularity.runOptions = '--nv'
Test data references — use concatenation, not interpolation:
// ✅ correct — resolves at Nextflow runtime
file(params.modules_testdata_base_path + 'genomics/homo_sapiens/genome/genome.fasta', checkIfExists: true)
// ❌ wrong — params is null in nf-test Groovy context
file("${params.modules_testdata_base_path}genomics/...", checkIfExists: true)
nf-test assertions — check output channels, not directories:
// ✅ correct
{ assert process.out.bam }
// ❌ wrong — Java Path has no .isDirectory()
{ assert path("output/").isDirectory() }
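Both anti-patterns above are literal strings, so a fixed-string grep over main.nf.test can catch them before a test run. Below is a minimal sketch. `check_nf_test_patterns` is a hypothetical helper. The single quotes stop the shell from expanding `${...}`:

```shell
# Hypothetical check: flag GString interpolation of params.modules_testdata_base_path
# and directory assertions via .isDirectory() in a main.nf.test file.
check_nf_test_patterns() {
    local test_file="$1"
    [ -f "$test_file" ] || { echo "no such file: $test_file" >&2; return 2; }
    # -F: fixed strings, -n: show line numbers of offending lines
    if grep -nF -e '${params.modules_testdata_base_path}' -e '.isDirectory()' "$test_file"; then
        return 1   # at least one anti-pattern found
    fi
}
```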
nf-core modules lint --fix <tool/subtool> # auto-fix first
nf-core modules lint <tool/subtool> # then check
Repeat until [✗] 0 Tests Failed. Fix each failure before re-running.
⚠ Run the R2 audit immediately after every lint --fix call:
grep -n 'projectDir\|test_bam\|\.sif\|ext\.device\|ext\.model' \
modules/nf-core/<tool>/<subtool>/tests/nextflow.config
Expected: no output. If anything is found, remove it before re-running lint.
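Pairing `lint --fix` with the audit in one wrapper ensures the audit is never skipped. Below is a sketch, assuming it is run from the root of an nf-core/modules clone. `lint_fix_and_audit` and the `NFCORE_CMD` override (used for dry runs) are hypothetical. The grep pattern is the same as above, written as an extended regex:

```shell
# Hypothetical wrapper: run lint --fix, then always audit tests/nextflow.config.
lint_fix_and_audit() {
    local module="$1"                                  # e.g. modkit/localize
    local cfg="modules/nf-core/$module/tests/nextflow.config"
    local lint_status=0
    "${NFCORE_CMD:-nf-core}" modules lint --fix "$module" || lint_status=$?
    if [ ! -f "$cfg" ]; then
        echo "missing $cfg" >&2
        return 2
    fi
    # Same patterns as the audit above, using ERE alternation instead of BRE \|
    if grep -nE 'projectDir|test_bam|\.sif|ext\.device|ext\.model' "$cfg"; then
        echo "R2 audit failed: remove the lines above before re-running lint" >&2
        return 1
    fi
    return "$lint_status"   # audit clean: report lint's own result
}
```

The audit runs even when lint itself fails, because `--fix` may already have rewritten the config.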
Why:
nf-core modules lint --fix has been observed to inject banned patterns into tests/nextflow.config, specifically ext.device (a banned ext key, R1) and local .sif container overrides (an R2 violation). These silently survive subsequent lint checks because lint doesn't validate ext keys in test configs; they only fail on CI.
Expected warnings (not failures) for Type B/C are acceptable; explain them in the PR:
container_links: quay.io 404 for Docker Hub images
main_nf_container: container version mismatch warning
process_standard_label: process_gpu not in standard list
Common failures and fixes:
main_nf_ext_key: invalid ext key → remove it, hardcode in script body
correct_meta_inputs: meta.yml doesn't match main.nf → run --fix
has_meta_topics: missing topics section → run --fix
environment_yml: missing/wrong schema → add # yaml-language-server header
Test loop (Phase 4):
nf-test test modules/nf-core/<tool>/<subtool>/tests/main.nf.test \
    --profile conda --update-snapshot
Type B/C modules set conda to null, so run them with --profile docker or --profile singularity instead of conda.
Fix failures and re-run until all tests pass.
Stub tests (always required, run in CI):
options "-stub"
assertAll({ assert process.success }, { assert snapshot(process.out).match() })
GPU real tests (Type C only, excluded from CI):
tag "gpu"
run with --profile singularity,gpu --tag gpu
Real test data paths must use params.modules_testdata_base_path concatenation (R4),
never ${projectDir}/tests/... or any absolute path.
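The test-file requirements above (a stub test always, plus a `gpu`-tagged test for Type C) can be spot-checked with grep. Below is a minimal sketch. `check_test_coverage` is a hypothetical helper that matches only the literal strings `options "-stub"` and `tag "gpu"`:

```shell
# Hypothetical check: main.nf.test must contain a stub test; Type C must also
# contain a test tagged "gpu".
check_test_coverage() {
    local test_file="$1" module_type="$2"
    local status=0
    if ! grep -qF 'options "-stub"' "$test_file"; then
        echo "missing stub test (options \"-stub\")" >&2
        status=1
    fi
    if [ "$module_type" = C ] && ! grep -qF 'tag "gpu"' "$test_file"; then
        echo "Type C module without a test tagged \"gpu\"" >&2
        status=1
    fi
    return "$status"
}
```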
This step catches violations that lint won't catch and that only fail on CI or a reviewer's machine. Run both greps. Expected result: no output. Any match is a blocker.
# Audit nextflow.config for banned patterns
grep -n 'projectDir\|test_bam\|test_reference\|models_dir\|\.sif' \
modules/nf-core/<tool>/<subtool>/tests/nextflow.config
# Audit test file for hardcoded local paths
grep -n 'projectDir\|/data/\|/home/' \
modules/nf-core/<tool>/<subtool>/tests/main.nf.test
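The two greps can be wrapped so that "any match is a blocker" becomes a non-zero exit status. Below is a minimal sketch. `audit_module_tests` is a hypothetical helper that takes the module directory. The patterns are copied from the two commands above, written as extended regexes:

```shell
# Hypothetical wrapper: run both audits; print matches and fail if either finds anything.
audit_module_tests() {
    local dir="$1/tests"        # $1 e.g. modules/nf-core/modkit/localize
    local found=0 f
    for f in "$dir/nextflow.config" "$dir/main.nf.test"; do
        [ -f "$f" ] || { echo "missing $f" >&2; return 2; }
    done
    grep -nE 'projectDir|test_bam|test_reference|models_dir|\.sif' "$dir/nextflow.config" && found=1
    grep -nE 'projectDir|/data/|/home/' "$dir/main.nf.test" && found=1
    return "$found"
}
```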
What to fix if found:
| Pattern | Fix |
|---------|-----|
| params { test_bam = ... } | Delete the entire params block |
| // container = /path/to/...sif | Delete the line (even comments are flagged by reviewers) |
| file("${projectDir}/tests/...") | Replace with file(params.modules_testdata_base_path + 'path/to/file', checkIfExists: true) |
| Any /data/ or /home/ path | Replace with testdata base path or remove entirely |
After fixing any violations, re-run the test loop (Phase 4) to confirm tests still pass.
Read references/pr_artifacts.md for templates.
Always produce all three artifacts (PR description, checklist, Slack draft) regardless of module type:
PR description following .github/PULL_REQUEST_TEMPLATE.md
Slack #new-modules draft: required for Type B/C/D (non-standard decisions)
End with:
"Module is lint-clean and all stub tests pass. PR artifacts above are ready. When you want to open the PR, say 'open the PR' and I'll run
gh pr create."
Never run gh pr create until the user explicitly says so.