claude/skills/runtime-resource-study/SKILL.md
Run a stage-gated runtime/resource optimization study for any bioinformatics tool or command-line program on a SLURM HPC cluster. Walks through preflight, OFAT factor scan, 2^k confirmation factorial, build-mode + alternative-implementation comparison, input-size scan, out-of-sample validation, and produces a fitted predictive resource model (wall_s and peak_rss as functions of input size), a machine-readable model.yaml with caveats, a full REPORT.md, and a one-page exec summary PDF. Trigger PROACTIVELY whenever the user asks to "benchmark", "optimize", "tune", "characterize runtime/memory", "find best config", "build a resource model", "how does X scale", or "what should I put in my Snakemake resources directive for tool Y" — for any compute-bound bioinformatics step (sort, dedup, alignment, variant calling, methylation calling, basecalling, indexing, pileup, liftover). Also triggers on /runtime-resource-study or /benchmark-tool. Skip only for one-off quick timing where a single number suffices and no model is needed.
Stage-gated benchmark methodology for compute-bound HPC tools. Produces a predictive
wall(N, file_size) + rss(N) model that drives Snakemake / Nextflow resources
directives, plus a publication-quality report.
| Artefact | Purpose |
|---|---|
| model.yaml | Machine-readable model — coefficients, R², validation residuals, caveats, helper formulas |
| REPORT.md | Full narrative — math callout, TL;DR, per-stage results, recommendations |
| exec_summary.{pdf,png,svg} | One-page US Letter summary for sharing |
| benchmark.csv per stage | One row per replicate (raw measurements) |
| summary_table.csv per stage | Per-condition median + min/max + CV |
All under <project_root>/<tool>/<command>/ with reproducible drivers.
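As a rough illustration of how the fitted model could drive a Snakemake resources directive, the sketch below turns linear coefficients into `runtime` (minutes) and `mem_mb` values. The coefficient values, the dictionary layout, the byte units for RSS, and the 1.5× safety margin are all assumptions made for this example; the real model.yaml schema is not shown here.

```python
import math

# Hypothetical coefficients; a real study would read them from model.yaml.
# Units assumed here: wall in seconds, RSS in bytes, N in records.
MODEL = {
    "wall_s": {"a": 5.0, "b": 2e-6},
    "rss_bytes": {"c": 5e8, "d": 40.0},
}


def predict_resources(n_records, model=MODEL, margin=1.5):
    """Evaluate wall_s = a + b*N and rss = c + d*N, then pad by `margin`."""
    wall_s = model["wall_s"]["a"] + model["wall_s"]["b"] * n_records
    rss = model["rss_bytes"]["c"] + model["rss_bytes"]["d"] * n_records
    return {
        "runtime": math.ceil(wall_s * margin / 60),  # minutes, rounded up
        "mem_mb": math.ceil(rss * margin / 1e6),     # megabytes, rounded up
    }


resources = predict_resources(100_000_000)
print(resources)
```

In a Snakemake rule the same function could be called from a `resources:` lambda once the record count of the input is known.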
/data1/greenbab/users/ahunos/projects/biotoolsBenchmarks/samtools/sort/ is a complete
working instance. Skim its REPORT.md for deliverable shape and src/ for script patterns.
Read references/stage_design.md to pick factor levels for the user's tool. Different
tool classes (sort, align, variant-call) have different relevant factors.
Decide and write down (output: Stage0_setup.md in project root):
- --gres=gpu). If yes, read references/gpu_tools.md before continuing: the factor menu, runner, and partition all change. Use scripts/03_run_one_gpu.sh.template instead of the CPU runner.
- gpu_tools.md.)
- $0.05/core-hour if local rate unknown. For GPU tools, also $/gpu-hour (default A100=$1.50, L40S=$0.80, H100=$3.00) and $/kWh for energy (default $0.12)

Inspect the input: record count, file size, schema/version, technology, sort state, key metadata. Without this, factor levels are guessed.
Use scripts/01_inspect_input.sh.template — adapt the metadata-extraction commands to
your input format (the template covers BAM, VCF, FASTQ stubs).
Output: results/preflight/<input>.preflight.yaml.
Vary one factor at a time around the baseline. Identify which factors move the needle and at what magnitude.
Driver: copy scripts/submit_grid.py.template, fill in FACTORS and BASELINE for
your tool. See references/stage_design.md for choosing levels.
Output: results/<date>_stage2_ofat/.
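A minimal sketch of what the one-factor-at-a-time grid amounts to, assuming `FACTORS` is a dict of factor name to levels and `BASELINE` is a dict of factor name to the baseline level. The factor names and levels below are hypothetical, and the real submit_grid.py.template may organise this differently.

```python
# Hypothetical factor menu for a sort-like tool; names and levels are illustrative only.
BASELINE = {"threads": 8, "mem_per_thread": "2G", "compression": 6}
FACTORS = {
    "threads": [4, 8, 16, 32],
    "mem_per_thread": ["1G", "2G", "4G"],
    "compression": [1, 6, 9],
}


def ofat_conditions(baseline, factors):
    """Return the baseline plus every condition that differs from it in exactly one factor."""
    conditions = [dict(baseline)]
    for name, levels in factors.items():
        for level in levels:
            if level == baseline[name]:
                continue  # already covered by the baseline row
            cond = dict(baseline)
            cond[name] = level
            conditions.append(cond)
    return conditions


conditions = ofat_conditions(BASELINE, FACTORS)
for cond in conditions:
    print(cond)
```

With these placeholder levels the grid has 8 conditions (1 baseline + 3 + 2 + 2), compared with 36 for the full cross of all levels, which is why the scan comes before the factorial.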
2^k full factorial near the candidate optimum from Stage 2. Detect interactions and tighten variance bounds at the recommended config.
Output: results/<date>_stage3_factorial/.
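The 2^k design and the interaction check can be sketched as follows. The factor names, the two levels per factor, and the wall-time numbers in the 2×2 table are placeholders chosen for illustration, not measurements.

```python
from itertools import product

# Hypothetical low/high levels near the Stage 2 candidate optimum.
LEVELS = {
    "threads": (8, 16),
    "mem_per_thread": ("2G", "4G"),
    "compression": (1, 6),
}


def full_factorial(levels):
    """Enumerate every combination of the two levels of each factor (2^k rows)."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*(levels[n] for n in names))]


def interaction(cells):
    """Two-factor interaction contrast from a 2x2 table of median wall times.

    cells[(i, j)] is the median with factor A at level i and factor B at level j
    (0 = low, 1 = high). A value near zero means the factors act additively.
    """
    return ((cells[(1, 1)] - cells[(0, 1)]) - (cells[(1, 0)] - cells[(0, 0)])) / 2


design = full_factorial(LEVELS)
print(len(design), "conditions")

# Placeholder medians in seconds, not real data.
additive = {(0, 0): 100.0, (1, 0): 60.0, (0, 1): 90.0, (1, 1): 50.0}
coupled = {(0, 0): 100.0, (1, 0): 60.0, (0, 1): 90.0, (1, 1): 70.0}
print(interaction(additive), interaction(coupled))
```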
Closes the "is this the right tool?" gap. Test:
5+ replicates per build mode (build comparisons need higher n — differences are often
small and noisy). Capture /proc/cpuinfo model name per replicate.
Per rules/apptainer_vs_conda.md: on this cluster, prefer SIF for short HTSlib-class
jobs (NFS cold-start tax on conda binaries).
Output: results/<date>_stage4_buildmode/.
Subsample reproducibly (seed=42) at 10/25/50/75/100% of the calibration input. Fit
wall_s = a + b·N_records and rss = c + d·N_records. Target R² ≥ 0.95.
| Input format | Subsample command |
|---|---|
| BAM | samtools view -s 42.<frac> -b -o out.bam in.bam |
| VCF | bcftools view in.vcf.gz \| awk 'BEGIN{srand(42)} /^#/{print; next} rand() < <frac> {print}' \| bgzip > out.vcf.gz |
| FASTQ | seqkit sample -p <frac> -s 42 in.fq.gz -o out.fq.gz |
| POD5 | pod5 subset --threads 8 in.pod5 -o out.pod5 --include-fraction <frac> |
Output: initial model.yaml.
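The linear fit and the R² target can be illustrated with a plain least-squares sketch. The record counts and wall times below are synthetic placeholders that lie exactly on a line, not measurements, and the bundled fit_model.py may use a different method.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x; returns (intercept, slope, r2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1.0 - ss_res / ss_tot


# Synthetic placeholders: 10/25/50/75/100% of a 100M-record input.
n_records = [10e6, 25e6, 50e6, 75e6, 100e6]
wall_s = [25.0, 55.0, 105.0, 155.0, 205.0]

a, b, r2 = fit_line(n_records, wall_s)
print(f"wall_s = {a:.3f} + {b:.3e} * N_records, R^2 = {r2:.4f}")
if r2 < 0.95:
    print("R^2 below the 0.95 target; inspect residuals before trusting the model")
```

The same function applies unchanged to the rss = c + d·N_records fit.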
Hold out 1–2 inputs ~10× larger than calibration max. Predict, run, compare. Report error in % terms.
If the 1-term linear model's error exceeds ±20%, refit with file_size as a second
predictor: wall_s = a + b·N + c·file_size_bytes. This was discovered in the samtools sort
study: the boundary between the page-cache and IO-bound regimes makes 1-term linear fits fail at
extrapolation (~30× scale beyond calibration).
Update model.yaml with validation residuals and the 2-term fit if needed.
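The predict, run, compare step reduces to a signed percentage error and a threshold check. In the sketch below the coefficients, the held-out record count, and the observed wall time are hypothetical values chosen to show a failing case.

```python
def percent_error(predicted, observed):
    """Signed error of the prediction, as a percentage of the observed value."""
    return 100.0 * (predicted - observed) / observed


def needs_second_term(predicted, observed, tolerance_pct=20.0):
    """True when the 1-term model misses by more than the tolerance in either direction."""
    return abs(percent_error(predicted, observed)) > tolerance_pct


# Hypothetical 1-term model and held-out input (not real measurements).
a, b = 5.0, 2e-6
holdout_records = 1_000_000_000
observed_wall_s = 2800.0

predicted_wall_s = a + b * holdout_records
err = percent_error(predicted_wall_s, observed_wall_s)
print(f"predicted {predicted_wall_s:.0f} s, observed {observed_wall_s:.0f} s, error {err:+.1f}%")
print("refit with file_size" if needs_second_term(predicted_wall_s, observed_wall_s) else "1-term model holds")
```

A negative error means the model under-predicts, which is the direction that causes SLURM time-limit kills.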
Three things in this final stage:
1. Fit lme4::lmer(wall ~ factor + (1|host)) and report % variance explained by host vs condition vs replicate. If host explains >25% of variance, you have hardware heterogeneity that needs --exclude=<bad_nodes> or tighter partition selection. Use scripts/fit_model.py --variance-partition.
2. Compute wall × threads × cost_per_core_hour for each condition. Identify the cost-Pareto frontier: sometimes -@ 16 is cheaper than -@ 32 even though it's slower. Use scripts/cost_accounting.py.
3. Fill assets/REPORT_template.md with your data; render scripts/exec_summary.R.template for the one-page summary.

These prevent silent confounders that invalidate every conclusion.
- samtools view -c, bcftools view, wc -l, etc.) goes on a SLURM job. Even one-off inventory tasks. See project memory login_node_discipline.md.
- --exclusive, single-node. Fresh node → cold page cache → honest fs_in measurements. Never loop conditions inside an allocation. See rules/mskcc_partitions.md.
- /usr/bin/time: it doesn't exist on this cluster. Install via conda alongside the tool and resolve as a sibling of the tool's binary. See rules/gnu_time.md.
- --exclude=isca071 on cpushort. After Stage 1, inventory CPU model per replicate; if any node clusters separately, add to the exclude list and rerun affected conditions.
- --gres=gpu:<type>:N, never just --gres=gpu:N. GPU vintage perf differences are 3–10× (much larger than CPU heterogeneity). Capture gpu_model, gpu_driver, peak GPU memory, and mean GPU utilisation per replicate via the nvidia-smi sidecar in 03_run_one_gpu.sh.template. See references/gpu_tools.md.
- manifest.tsv is the record of intent (what conditions WERE planned). benchmark.csv is the record of actuals. They should match row-for-row at the end.
- submit_batch injects --mem 64G on cmdline which conflicts with #SBATCH --mem-per-cpu; doesn't support --array. See rules/slurm_mcp.md. For batches > 20 jobs, prefer direct sbatch via Python.

In scripts/, fill placeholders marked {{NAME}}:
| Template | Purpose | Edit |
|---|---|---|
| 01_inspect_input.sh.template | Preflight | Metadata-extraction commands per input format |
| 03_run_one.sh.template | Single-condition runner (CPU) with GNU time | The ${TOOL_BIN} ${ARGS} line + CSV header |
| 03_run_one_gpu.sh.template | Single-condition runner (GPU) — adds nvidia-smi sidecar, --nv for containers, GPU identity capture | The # === Build command === block + tool-specific factors |
| submit_grid.py.template | Stage driver: manifest + sbatch + submit | FACTORS dict, BASELINE |
| summarise_stage.R.template | Per-stage figures + table | Factor labels, plot titles |
| exec_summary.R.template | One-page composite | Plug in your CSV paths |
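The rules about one exclusive single-node job per condition and direct sbatch from Python might look roughly like the sketch below. It only builds and prints the command lines. The runner path `src/03_run_one.sh`, the `key=value` argument convention, the job-name scheme, and the condition values are assumptions for illustration; the sbatch flags themselves are standard SLURM options.

```python
import shlex


def sbatch_command(condition, replicate, partition="cpushort", exclude="isca071",
                   runner="src/03_run_one.sh"):
    """Build one sbatch argv for a single condition and replicate (never a loop inside a job)."""
    tag = "_".join(f"{k}{v}" for k, v in sorted(condition.items()))
    argv = [
        "sbatch",
        "--exclusive",                 # whole node, so neighbours cannot perturb timings
        "--nodes=1",
        f"--partition={partition}",
        f"--exclude={exclude}",
        f"--job-name=bench_{tag}_r{replicate}",
        runner,                        # hypothetical rendered runner script
    ]
    argv += [f"{k}={v}" for k, v in sorted(condition.items())]
    argv.append(f"replicate={replicate}")
    return argv


# Hypothetical conditions and replicate count.
CONDITIONS = [{"threads": 8, "compression": 6}, {"threads": 16, "compression": 6}]
REPLICATES = 3

commands = [sbatch_command(c, r) for c in CONDITIONS for r in range(1, REPLICATES + 1)]
for argv in commands:
    print(shlex.join(argv))  # dry run; pass argv to subprocess.run to submit for real
```

Printing first makes it easy to diff the planned submissions against manifest.tsv before anything reaches the scheduler.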
Bundled (run as-is):
| Script | Purpose |
|---|---|
| fit_model.py | Linear regression, variance partitioning, cross-validation |
| cost_accounting.py | wall × threads × $/core-hour → CPU-hours, $/sample |
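The cost formula and the Pareto-frontier idea can be sketched as below, using the $0.05/core-hour default. The thread counts and wall times are placeholder numbers, and the bundled cost_accounting.py may compute more than this.

```python
def cost_usd(wall_s, threads, rate_per_core_hour=0.05):
    """wall x threads x $/core-hour, with wall converted from seconds to hours."""
    return wall_s / 3600.0 * threads * rate_per_core_hour


def pareto_frontier(rows):
    """Keep rows for which no other row is at least as fast and as cheap, and strictly better in one."""
    frontier = []
    for r in rows:
        dominated = any(
            o["wall_s"] <= r["wall_s"] and o["cost"] <= r["cost"]
            and (o["wall_s"] < r["wall_s"] or o["cost"] < r["cost"])
            for o in rows
        )
        if not dominated:
            frontier.append(r)
    return frontier


# Placeholder medians per thread count (seconds), not real measurements.
medians = {8: 1800.0, 16: 1000.0, 32: 900.0, 64: 950.0}
rows = [{"threads": t, "wall_s": w, "cost": cost_usd(w, t)} for t, w in medians.items()]

for r in pareto_frontier(rows):
    print(f"-@ {r['threads']}: {r['wall_s']:.0f} s, ${r['cost']:.3f}")
```

In this made-up example 64 threads is both slower and more expensive than 32, so it drops off the frontier, while 16 threads stays on it as the cheaper but slower alternative to 32.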
For each stage (1–6):
- manifest.tsv (intent)
- benchmark.csv (actuals)
- summary_table.csv (medians + CV)
- figures/{png,pdf,svg}/ (≥1 headline figure)

For the project as a whole:
- model.yaml (validated, with caveats)
- REPORT.md
- exec_summary.pdf

- references/stage_design.md: picking factors, levels, replicate counts per tool class
- references/analysis_recipes.md: variance partitioning, model fit + validation, cost accounting
- references/templates_guide.md: how to fill each template
- references/exemplar_walkthrough.md: tour of the samtools sort study end-to-end
- references/gpu_tools.md: read this first for any GPU tool (basecaller, GPU aligner, GPU variant caller). Different factor menu, partitions, container --nv, nvidia-smi sidecar, energy accounting.

- --time for SLURM and doesn't care about the math

For these, a short /usr/bin/time wrap and a dry estimate is enough.