claude/skills/docker-hpc/SKILL.md
Use this skill when the user wants to build, create, or set up a Docker image for any scientific or bioinformatics tool — especially for HPC environments where Apptainer/Singularity will pull the result. Trigger for requests like: "dockerize X", "build a Docker image for Y", "containerize this tool", "I need a container for Z", "add this to my containers repo", or "push a Docker image to Docker Hub via GitHub Actions". This skill handles writing Dockerfiles, generating GitHub Actions CI workflows to build and push images without a local Docker daemon, and managing monorepo-style containers repos. Covers bioinformatics tools (samtools, STAR, CellRanger, dorado, modkit), R/Bioconductor packages (DESeq2, edgeR), conda/pip packages, and GPU/CUDA tools. Do NOT trigger for: writing Apptainer/Singularity .def files (use singularity-build skill), pulling existing images, or running containers interactively.
Build principle-compliant Docker images for scientific computing on HPC and push them to Docker Hub automatically via GitHub Actions. No Docker daemon needed on the HPC — GitHub Actions does the build; Apptainer pulls the result.
All scripts are in scripts/ relative to this skill file.
| Script | Purpose |
|--------|---------|
| scripts/preflight.sh | Verify git, gh CLI, Docker Hub token, and username |
| scripts/detect_context.sh | Decide: standalone repo or add to existing monorepo |
| scripts/generate_dockerfile.py | Template engine → principle-compliant Dockerfile |
| scripts/setup_repo.sh | Create standalone GitHub repo + secret + push workflow |
| scripts/add_to_monorepo.sh | Add tool to existing containers/self_made/ monorepo |
Templates live in templates/ (Dockerfiles and GitHub Actions workflows).
Dockerfile principles are in docs/dockerfile_principles.md.
bash scripts/preflight.sh
Read the output carefully. If any check fails, stop and fix it with the user before
continuing. The script emits DOCKER_USER=<username> and DOCKER_TOKEN=<varname>
on success — capture these for use in later steps.
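One way to capture those two values is to filter the script's output for the KEY=VALUE lines. The sketch below assumes each value is printed on its own line; the `get_kv` helper and the sample output are made up for illustration and are not part of the skill's scripts.

```shell
#!/bin/sh
# Sketch only: assumes preflight prints DOCKER_USER=... and DOCKER_TOKEN=...
# each on its own line, possibly mixed with other status lines.

# get_kv KEY: read text on stdin, print the value of the last "KEY=value" line.
# (hypothetical helper, not shipped with the skill)
get_kv() {
    sed -n "s/^$1=//p" | tail -n 1
}

# Stand-in for: preflight_out=$(bash scripts/preflight.sh)
# The status lines here are invented sample output.
preflight_out='[ok] git found
[ok] gh CLI authenticated
DOCKER_USER=exampleuser
DOCKER_TOKEN=DOCKERHUB_TOKEN'

DOCKER_USER=$(printf '%s\n' "$preflight_out" | get_kv DOCKER_USER)
DOCKER_TOKEN=$(printf '%s\n' "$preflight_out" | get_kv DOCKER_TOKEN)

# Stop early if either value is missing, mirroring "stop and fix it".
if [ -z "$DOCKER_USER" ] || [ -z "$DOCKER_TOKEN" ]; then
    echo "preflight did not report DOCKER_USER/DOCKER_TOKEN; stop here" >&2
    exit 1
fi

echo "user=$DOCKER_USER token_var=$DOCKER_TOKEN"
```

Note that DOCKER_TOKEN holds the name of the environment variable, not the secret itself, so it is safe to echo.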
bash scripts/detect_context.sh
Output is one of:
- MODE=standalone: no monorepo detected; create a new dedicated repo
- MODE=monorepo REPO=<owner/repo>: existing monorepo found; add to it

If monorepo mode, confirm the detected repo with the user before proceeding.
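The branch on the detection result can be sketched as follows. This assumes the result arrives as a single line of space-separated KEY=VALUE fields, as shown above; `parse_context` and `next_step` are hypothetical helper names, and `example-org/containers` is a placeholder repo.

```shell
#!/bin/sh
# Sketch only: assumes detect_context.sh prints one line such as
#   MODE=standalone
#   MODE=monorepo REPO=owner/repo

# parse_context LINE: set MODE and REPO from the space-separated fields.
parse_context() {
    MODE=""
    REPO=""
    for field in $1; do
        case "$field" in
            MODE=*) MODE=${field#MODE=} ;;
            REPO=*) REPO=${field#REPO=} ;;
        esac
    done
}

# next_step LINE: print what to do next for the detected mode.
next_step() {
    parse_context "$1"
    case "$MODE" in
        standalone)
            echo "create a new dedicated repo" ;;
        monorepo)
            if [ -n "$REPO" ]; then
                echo "confirm $REPO with the user, then add to it"
            else
                echo "monorepo mode without REPO; ask the user"
            fi ;;
        *)
            echo "unrecognized mode; stop" >&2
            return 1 ;;
    esac
}

next_step "MODE=standalone"
next_step "MODE=monorepo REPO=example-org/containers"
```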
Ask for:
- Tool name (e.g. samtools, dorado)
- Version (e.g. 1.21, 1.4.0)
- Tool type: ont | cuda | r | python-ml | biocli | generic
- Validation command (e.g. samtools --version)
- Maintainer (gh auth status username)

python3 scripts/generate_dockerfile.py \
--tool-name <name> \
--tool-type <type> \
--version <version> \
--packages-conda "<space-separated>" \
--packages-pip "<space-separated>" \
--packages-r-bioc "<space-separated Bioconductor packages — r type only>" \
--packages-r-cran "<space-separated CRAN packages — r type only>" \
--validation-cmd "<cmd>" \
--maintainer <maintainer>
For r tool type, always ask the user to separate packages into Bioconductor vs CRAN.
Pass Bioconductor packages (DESeq2, edgeR, limma, etc.) via --packages-r-bioc
and CRAN packages (ggplot2, dplyr, etc.) via --packages-r-cran.
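As an illustration of the Bioconductor/CRAN split, the sketch below assembles the arguments for an r-type image and prints them one per line as a dry run. The tool name `deseq2-env`, the version, the validation command, and the maintainer are placeholders chosen for this example; only the flag names come from the command above.

```shell
#!/bin/sh
# Sketch only: builds the generate_dockerfile.py argument list for an r-type
# image and prints it one argument per line instead of running it.

# r_dockerfile_cmd BIOC_PKGS CRAN_PKGS
# Each package list is a single space-separated string and must stay quoted
# so it reaches the script as one argument.
r_dockerfile_cmd() {
    bioc_pkgs=$1
    cran_pkgs=$2
    set -- python3 scripts/generate_dockerfile.py \
        --tool-name deseq2-env \
        --tool-type r \
        --version 1.0.0 \
        --packages-r-bioc "$bioc_pkgs" \
        --packages-r-cran "$cran_pkgs" \
        --validation-cmd "Rscript -e 'library(DESeq2)'" \
        --maintainer exampleuser
    printf '%s\n' "$@"
}

# Dry run: inspect the arguments before executing anything.
r_dockerfile_cmd "DESeq2 edgeR limma" "ggplot2 dplyr"
```

Printing the arguments first makes quoting mistakes visible: each package list should appear on a single line.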
The script prints the Dockerfile to stdout. Show it to the user for review. Do not proceed to Step 5 until the user approves the Dockerfile.
If the tool type doesn't fit any template cleanly (truly exotic tool),
generate the Dockerfile using your knowledge of the 10 principles in
docs/dockerfile_principles.md rather than forcing a bad template match.
Proprietary binary tools (CellRanger, Guppy, GATK bundle, STARsolo, etc.) are the most common exotic case. These are distributed as pre-built tarballs — conda cannot install them. For these:
- Use ubuntu:22.04 or condaforge/miniforge3 as base (not a tool-specific image)
- Download the tarball with RUN curl, or instruct the user to COPY a local tarball
- Use ARG VERSION so the Dockerfile is reusable across releases
- Set ENV PATH to include the binary directory (principle 7)
- Add a RUN <tool> --version validation (principle 5)
- Use the --build-arg DOWNLOAD_URL pattern and note that the user must supply a pre-signed/authenticated URL at build time

See principle 9 in docs/dockerfile_principles.md for a complete template.

Standalone mode:
bash scripts/setup_repo.sh \
--tool-name <name> \
--version <version> \
--dockerfile <path-to-approved-dockerfile> \
--docker-user <DOCKER_USER> \
--token-var <DOCKER_TOKEN>
Monorepo mode:
bash scripts/add_to_monorepo.sh \
--repo <owner/repo> \
--tool-name <name> \
--version <version> \
--dockerfile <path-to-approved-dockerfile> \
--token-var <DOCKER_TOKEN>
Standalone: git tag v<version> && git push origin v<version>
Monorepo: git tag <tool>-v<version> && git push origin <tool>-v<version>
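The two tag conventions can be captured in a small helper so the tag is derived once and reused. `release_tag` is a hypothetical function name; the sketch only prints the git commands rather than running them.

```shell
#!/bin/sh
# Sketch only: derive the release tag from the mode, tool name, and version.

# release_tag MODE TOOL VERSION
release_tag() {
    mode=$1
    tool=$2
    version=$3
    case "$mode" in
        standalone) echo "v$version" ;;
        monorepo)   echo "$tool-v$version" ;;
        *)
            echo "unknown mode: $mode" >&2
            return 1 ;;
    esac
}

tag=$(release_tag monorepo samtools 1.21)
# Print the commands for review; run them by hand once the tag looks right.
echo "git tag $tag && git push origin $tag"
```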
gh run list --repo <owner/repo> --limit 3
gh run watch --repo <owner/repo>
Report the final image location:
docker pull <docker-user>/<tool>:<version>
apptainer pull docker://<docker-user>/<tool>:<version>
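A short sketch of filling in those placeholders, assuming the image name is exactly `<docker-user>/<tool>:<version>`. The helper names and the explicit `.sif` output filename are choices made for this example.

```shell
#!/bin/sh
# Sketch only: build the image reference and print both pull commands.

# image_ref DOCKER_USER TOOL VERSION
image_ref() {
    echo "$1/$2:$3"
}

# pull_commands DOCKER_USER TOOL VERSION
pull_commands() {
    ref=$(image_ref "$1" "$2" "$3")
    echo "docker pull $ref"
    # Naming the output file keeps the SIF path predictable on the HPC side.
    echo "apptainer pull $2_$3.sif docker://$ref"
}

pull_commands exampleuser samtools 1.21
```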
| Tool type | Base image | Package manager | Template file |
|-----------|-----------|-----------------|---------------|
| ont | nanoporetech/dorado or ubuntu:22.04 | mamba + pip | ont_tools.Dockerfile |
| cuda | nvidia/cuda:12.4.1-base-ubuntu22.04 | mamba | cuda_base.Dockerfile |
| r | bioconductor/bioconductor_docker:devel | mamba + BiocManager | r_base.Dockerfile |
| python-ml | condaforge/miniforge3 + CUDA | mamba + pip | cuda_base.Dockerfile |
| biocli | condaforge/miniforge3 | mamba | conda_base.Dockerfile |
| generic | condaforge/miniforge3 | mamba | conda_base.Dockerfile |
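Read as a lookup, the table maps each tool type to one template file. The sketch below restates that mapping as a shell function; `template_for` is a hypothetical name and is not necessarily how scripts/generate_dockerfile.py selects templates internally.

```shell
#!/bin/sh
# Sketch only: tool type -> template file, copied from the table above.

# template_for TOOL_TYPE
template_for() {
    case "$1" in
        ont)            echo "ont_tools.Dockerfile" ;;
        cuda|python-ml) echo "cuda_base.Dockerfile" ;;
        r)              echo "r_base.Dockerfile" ;;
        biocli|generic) echo "conda_base.Dockerfile" ;;
        *)
            echo "no template for tool type: $1" >&2
            return 1 ;;
    esac
}

for t in ont cuda r python-ml biocli generic; do
    echo "$t -> templates/$(template_for "$t")"
done
```

An unknown type returns a non-zero status, which is the cue to hand-write the Dockerfile from the principles instead of forcing a template.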
The preflight script resolves the Docker Hub token in this order:
1. DOCKERHUB_TOKEN env var → use directly
2. APPTAINER_DOCKER_PASSWORD env var → reuse (warn user to verify write scope)
3. Neither set → create a token at hub.docker.com → Account Settings → Security

- Never use git push --force
- Build for linux/amd64 unless the user explicitly asks otherwise
- If the user asks for .def files, redirect the user to the singularity-build skill
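The token resolution order can be sketched as a fallback chain. This is an illustration of the order described for the preflight script, not a copy of scripts/preflight.sh; `resolve_token_var` is a hypothetical name, and it prints the variable name rather than the secret.

```shell
#!/bin/sh
# Sketch only: pick the environment variable that holds the Docker Hub token.

resolve_token_var() {
    if [ -n "${DOCKERHUB_TOKEN:-}" ]; then
        # First choice: a dedicated Docker Hub token.
        echo "DOCKERHUB_TOKEN"
    elif [ -n "${APPTAINER_DOCKER_PASSWORD:-}" ]; then
        # Second choice: reuse the Apptainer credential, with a warning.
        echo "warning: reusing APPTAINER_DOCKER_PASSWORD; verify it has write scope" >&2
        echo "APPTAINER_DOCKER_PASSWORD"
    else
        echo "no token found; create one at hub.docker.com -> Account Settings -> Security" >&2
        return 1
    fi
}

if token_var=$(resolve_token_var); then
    echo "DOCKER_TOKEN=$token_var"
fi
```

Returning only the variable name keeps the secret out of logs and terminal scrollback.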