
This skill creates publication-quality horizontal barplots with long categorical y-axis labels. Use when the user asks to "create a GO enrichment barplot", "make a barplot with pathway names", "plot KEGG results as horizontal bars", "barplot with long y-axis labels", "enrichment figure with proper dimensions", "horizontal barplot that fits all labels", or needs help sizing figures with long text on the y-axis. Automatically calculates optimal width and height based on label length and the number of categories. Supports GO, KEGG, and Reactome pathway visualization.
This skill generates IGV screenshots using igver for genomic visualization. Use when the user asks to "take IGV screenshot", "IGV snapshot", "visualize BAM in IGV", "igver screenshot", "show me the reads at", "screenshot genomic region", "visualize methylation in IGV", "IGV plot of BAM", "generate IGV images", "look at this region in IGV", "screenshot structural variant", "haplotype visualization in IGV", or any mention of IGV/igver with BAM files. Handles Singularity/Docker execution, chromosome-naming mismatches, methylation coloring (ONT/PacBio), and batch generation for large region sets.
Expert Snakemake workflow engineer for bioinformatics pipelines on SLURM HPC. Specializes in creating, debugging, and running Snakemake 9 workflows with battle-tested SLURM profiles, proper container integration, and reproducible run organization. Use this skill proactively whenever the user asks to: create/write/build a Snakemake workflow or pipeline, debug a Snakemake error or failed SLURM job, add rules to an existing Snakefile, write or fix a SLURM profile for Snakemake, organize pipeline outputs or run directories, convert a shell script or ad-hoc analysis into a reproducible Snakemake workflow, or troubleshoot Snakemake 9 + SLURM executor issues (memory conflicts, container propagation, stale locks). Also trigger when the user mentions snakemake dry-run, snakemake DAG, snakemake profile, workflow-profile, SLURM executor plugin, modkit pileup pipeline, or any multi-sample bioinformatics pipeline that needs per-sample parallelism with a dependency DAG. Do NOT trigger for: tasks with <3 steps and no parallelism (bash script is better), pure Nextflow workflows, or one-off data exploration.
Expert Singularity/Apptainer container builder for bioinformatics tools on MSKCC HPC (RHEL 8, no sudo). Builds containers using --fakeroot with a root-mapped namespace, conda-based package management (never apt-get), and SLURM-safe build scripts. Use this skill proactively whenever the user asks to: create/build a Singularity or Apptainer container, write a .def definition file, containerize a bioinformatics tool or software package, build a SIF image, troubleshoot a failed container build (exit status 1/15/141/255), fix a "command gcc failed" or "cannot find libc" or "CUDA_INCLUDE_DIRS" error in a container build, or package any software into a reproducible container image. Also trigger when the user mentions: .def file, .sif file, apptainer build, singularity build, fakeroot build, container for dorado/samtools/modkit or any bioinformatics tool, or asks to install software that requires root. Once triggered, the skill triages pull-vs-build (Step 0): it tries to acquire and verify a pre-built image before building from scratch. Do NOT trigger for: running an existing container (apptainer exec/run) or Docker/Podman build workflows (use the docker-hpc skill for those).
Run a stage-gated runtime/resource optimization study for any bioinformatics tool or command-line program on a SLURM HPC cluster. Walks through preflight, OFAT factor scan, 2^k confirmation factorial, build-mode + alternative-implementation comparison, input-size scan, out-of-sample validation, and produces a fitted predictive resource model (wall_s and peak_rss as functions of input size), a machine-readable model.yaml with caveats, a full REPORT.md, and a one-page exec summary PDF. Trigger PROACTIVELY whenever the user asks to "benchmark", "optimize", "tune", "characterize runtime/memory", "find best config", "build a resource model", "how does X scale", or "what should I put in my Snakemake resources directive for tool Y" — for any compute-bound bioinformatics step (sort, dedup, alignment, variant calling, methylation calling, basecalling, indexing, pileup, liftover). Also triggers on /runtime-resource-study or /benchmark-tool. Skip only for one-off quick timing where a single number suffices and no model is needed.
Build self-contained, offline HTML genomic-region reports with igv-reports (create_report). Each HTML file bundles igv.js viewers per region with embedded BAM/VCF data slices and default tracks (CpG islands, gencode, RepeatMasker); a reviewer clicks the variant table to inspect read-level evidence with no internet, no server, and no IGV install. USE this skill whenever the user wants an HTML, clickable, or browseable viewer of genomic data, with phrases like "HTML IGV report", "offline IGV", "self-contained HTML", "clickable viewer", "create_report", "igv-reports", "email this viewer", or any browseable HTML of reads at variants, fusion breakpoints, SV junctions, viral integrations, ChIP peaks, or ROIs. Trigger even when the user doesn't say "igv-reports"; the giveaway is HTML/clickable/offline plus genomic regions. Also fire on /igv-reports. DO NOT use for static PNG/PDF/SVG IGV screenshots; use the igv-screenshots skill. Supports hg38, mm10, mm39, T2T. Defaults: --flanking 300, --standalone, genome-tagged output.
Decide whether and how to scatter genomics workloads across chromosomes or region tiles, then gather the per-shard outputs back together correctly. Use proactively whenever the user mentions parallelizing per-chromosome, sharding by chrom, tiling the genome, splitting a BAM/VCF/BED by region, merging per-chrom outputs, or has a workflow with obvious per-chromosome parallelism (variant calling, methylation pileup/DMR, coverage, liftover, peak calling, SV calling). Also triggers on /scatter-gather, "scatter X across chromosomes", "shard this", "chunked variant calling", "merge per-chrom VCFs", "gather these bedmethyl files", "concat these bigwigs", or any per-region parallelism question. **Trigger even when the user is also using Snakemake or Nextflow** — those skills handle DAG plumbing while this one defines *what* to scatter, *whether* it's even safe to scatter (some computations like DSS DMLtest pool globally and break under naive sharding), and *how* to gather each output format without silent corruption. Especially trigger on questions about merging per-chromosome BAM / VCF / BED / bedMethyl / bigwig outputs, or whether a scatter-gather is equivalent to running on the whole genome.
Build a per-patient × per-assay sample-availability heatmap (SPECTRUM-style cohort overview) from a wide TSV. Patient-level annotations (signature, BRCA1/2, CCNE1) on top, site strip on left, three-state cells (Yes/No/NA) with customizable colours and optional NA→No collapse. Use proactively when the user asks for a cohort overview, sample availability heatmap, assay coverage plot, data availability figure, or SPECTRUM-style heatmap; has a TSV with one row per sample and wants to visualize which assays (DLP, ONT, WGS, mpIF, scRNA/scATAC-seq, etc.) were run per patient; or types `/cohort-overview`. Keywords: "cohort overview", "sample availability", "assay matrix", "per-patient heatmap", "SPECTRUM heatmap", "coverage overview", "what samples do we have". Uses ComplexHeatmap: row-split by assay, per-patient sample slots, patient annotations on top, site strip on left.
Verify that structural-variant / breakpoint calls are actually real by checking the chimeric reads that support them. Use whenever the user has caller output (Severus, Manta, Sniffles2, Delly, GRIDSS, MELT, Arriba, SvABA) and wants to validate / audit / QC / double-check their calls: viral integrations (HTLV-1, HBV, HPV, EBV), gene fusions (BCR-ABL, IGH translocations), mobile element insertions (L1, Alu, SVA), translocations. Trigger on phrasings like "is this integration real?", "should I trust this fusion call?", "are these false positives?", "are these PASS calls actually supported by reads?", "QC my SV calls", or any per-call chimeric-read / contamination / bimodality / T-vs-N read overlap question. Also fires on BAM @PG -Y / SA-tag questions on chimeric BAMs, and on /chimeric-read-validation. Output is a per-call TSV with pass / needs_review / fail verdicts. Do not use for calling SVs (use the caller), IGV screenshots or HTML IGV reports (use the igv-screenshots or igv-reports skill), or RNA-level fusion FDR (use Arriba).
Use this skill when the user wants to build, create, or set up a Docker image for any scientific or bioinformatics tool, especially for HPC environments where Apptainer/Singularity will pull the result. Trigger for requests like: "dockerize X", "build a Docker image for Y", "containerize this tool", "I need a container for Z", "add this to my containers repo", or "push a Docker image to Docker Hub via GitHub Actions". This skill handles writing Dockerfiles, generating GitHub Actions CI workflows to build and push images without a local Docker daemon, and managing monorepo-style containers repos. Covers bioinformatics tools (samtools, STAR, CellRanger, dorado, modkit), R/Bioconductor packages (DESeq2, edgeR), conda/pip packages, and GPU/CUDA tools. Do NOT trigger for: writing Apptainer/Singularity .def files (use the singularity-build skill), pulling existing images, or running containers interactively.
End-to-end builder for new nf-core modules. Scaffolds all required files, runs lint and nf-test in a loop until both pass, and produces PR-ready artifacts (description, Slack draft, checklist). Use this skill proactively whenever the user wants to: create a new nf-core module, add a tool to nf-core/modules, write a DORADO_BASECALLER or MODKIT_LOCALIZE style process, wrap a bioinformatics tool in Nextflow for nf-core, or asks "how do I submit a module to nf-core". Also trigger for: adding GPU support to a module, wrapping an R or Python script as an nf-core process, handling licensed/non-bioconda tools in nf-core, or fixing nf-core lint failures on a new module. Do NOT trigger for: editing existing pipelines, writing Snakemake rules, or debugging non-module Nextflow code.
Orchestrates a phased workflow for rigorously analyzing and presenting research papers — paper ingestion, comprehension quiz, causal-claims breakdown with quoted evidence (necessity vs. sufficiency vs. what was not proven), statistics & reproducibility audit (sample sizes, multiple-testing, deposition, code/reagent identity), critical evaluation, slide outline, slide draft, Q&A rehearsal, and post-talk writeup. Use this skill whenever the user mentions journal club, lab meeting paper presentation, paper deep-dive, preparing slides for a research paper, walking through a study, or asks for help understanding/critiquing/presenting a specific paper — even if they don't explicitly say "journal club." Also trigger on requests like "give me a structured breakdown of this paper", "what causal claims does paper X make and what couldn't they prove", "extract necessity and sufficiency experiments", "what mechanism does this paper establish", "audit the stats / reproducibility of this paper", "is the data deposited", "did they correct for multiple testing", "check the GitHub repo / antibody RRIDs", or any combination of paper-id (PMID/PMCID/DOI) plus a request for structured analysis, mechanism, causation, statistical rigor, reproducibility, or critical reading.
This skill creates publication-quality heatmaps with automatically calculated dimensions for Nature journal specifications. Use when the user asks to "create a heatmap", "make a DE gene heatmap", "plot expression heatmap", "heatmap with proper dimensions", "clustered heatmap", "gene expression heatmap", "correlation heatmap", "heatmap for DESeq2 results", "ComplexHeatmap", "pheatmap with correct size", or any mention of heatmap visualization. Automatically calculates optimal height based on the number of genes (rows) and determines whether to show row labels. Uses a fixed width of 180 mm for publication quality. Supports DESeq2 results, correlation matrices, and clustering analysis.