claude/skills/cohort-overview/SKILL.md
Build a per-patient × per-assay sample-availability heatmap (SPECTRUM-style cohort overview) from a wide TSV. Patient-level annotations (signature, BRCA1/2, CCNE1) on top, site strip on left, three-state cells (Yes/No/NA) with customizable colours and optional NA→No collapse. Use proactively when the user asks for a cohort overview, sample availability heatmap, assay coverage plot, data availability figure, or SPECTRUM-style heatmap; has a TSV with one row per sample and wants to visualize which assays (DLP, ONT, WGS, mpIF, scRNA/scATAC-seq, etc.) were run per patient; or types `/cohort-overview`. Keywords: "cohort overview", "sample availability", "assay matrix", "per-patient heatmap", "SPECTRUM heatmap", "coverage overview", "what samples do we have". Uses ComplexHeatmap: row-split by assay, per-patient sample slots, patient annotations on top, site strip on left.
npx skillsauth add sahuno/llm_configs cohort-overview
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Recreates a SPECTRUM-style cohort data-availability heatmap showing, for each patient, which assays were run on which samples, alongside patient-level genomic annotations and per-sample site labels.
What you'll produce: a results/{png,pdf,svg}/ directory with a single
heatmap. Rows are grouped by assay; within each assay block, samples are
stacked by a consistent per-patient slot index (so sample S02 of patient 017
always sits in the same row across every assay block). Columns are patients,
ordered by the user's first patient-level annotation (typically a mutational
signature).
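The slot layout described above can be sketched in a few lines of shell. This is only an illustration of the idea, not the plotter's actual code: the function names `assign_slots` and `block_height` are hypothetical, and the sketch assumes the first three columns are patient_id, sample_id, and site, with samples ordered by site and then sample_id within each patient.

```shell
# Sketch only: slot assignment as described above, not the plotter's real logic.
# Assumes a TSV with a header whose columns 1-3 are patient_id, sample_id, site.
assign_slots() {
  tab=$(printf '\t')
  tail -n +2 "$1" |
    LC_ALL=C sort -t "$tab" -k1,1 -k3,3 -k2,2 |
    awk -F '\t' -v OFS='\t' '{ slot[$1]++; print $1, $2, slot[$1] }'
}

# Block height = the largest number of samples any patient has.
block_height() {
  assign_slots "$1" |
    awk -F '\t' '$3 + 0 > max { max = $3 + 0 } END { print max + 0 }'
}

# Made-up demo rows (not real data).
demo=$(mktemp)
printf 'patient_id\tsample_id\tsite\n'  > "$demo"
printf 'P002\tP002_S01\tOmentum\n'     >> "$demo"
printf 'P001\tP001_S02\tOmentum\n'     >> "$demo"
printf 'P001\tP001_S01\tAdnexa\n'      >> "$demo"

assign_slots "$demo"    # one line per sample: patient, sample, slot index
block_height "$demo"    # prints: 2
```

Because the slot index depends only on the patient's own samples, a given sample lands in the same row of every assay block, which is what keeps the blocks visually aligned.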
Trigger whenever a user with a per-sample table wants a "what have we profiled" overview. Common phrasings: "make a cohort overview", "plot sample availability across assays", "which patients have DLP + WGS + scRNA-seq", "recreate the SPECTRUM data heatmap".
Skip this skill for single-assay plots, per-gene heatmaps, or anything that isn't a patient × assay availability matrix.
This skill ships two scripts. Do not rewrite them from scratch — copy them
into the user's project (usually scripts/) and adapt only if the user asks:
- scripts/01_generate_mock_cohort.R: generates data/cohort_wide.tsv with 85 patients / ~290 samples / 7 assays. Useful when the user wants a demo run, or to sanity-check their own TSV against the expected schema.
- scripts/02_plot_cohort_heatmap.R: the main plotter. Uses optparse and ComplexHeatmap, and writes PNG/PDF/SVG. Fully configurable via CLI flags.
Both live at <skill-dir>/scripts/ (the skill directory Claude Code loaded
this SKILL.md from). Copy them with cp; don't regenerate them.
| Column group | Columns | Notes |
|--------------------|----------------------------------------------------------------------|--------------------------------------------------------------|
| sample keys | patient_id, sample_id | sample_id unique per row; patient_id repeats |
| sample attribute | site | Adnexa, Omentum, Blood, Ascites, … |
| patient-level | signature, BRCA1, BRCA2, CCNE1 | must be constant within a patient_id — script validates |
| assay availability | DLP, ONT, WGS, mpIF, scATAC-seq, scRNA-seq, scRNAseqVDJ | cell values: Yes / No / NA |
NA means "not attempted / unknown" — semantically distinct from No
("attempted, failed or negative"). The plot encodes them as different
colours by default.
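Before plotting, it can help to confirm that an assay column really contains only the three expected states. The helper below is a hypothetical pre-flight check (the name `bad_cells` is not part of the skill); it assumes a tab-separated file with a header row and reports any cell that is not exactly Yes, No, or NA, including empty cells and case variants.

```shell
# Hypothetical helper: list cells in one column that are not Yes, No, or NA.
# Usage: bad_cells <file.tsv> <column name>
bad_cells() {
  awk -F '\t' -v col="$2" '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == col) c = i
      if (!c) { print "column not found: " col; exit 1 }
      next
    }
    $c != "Yes" && $c != "No" && $c != "NA" { print NR ": " col "=" $c }
  ' "$1"
}

# Made-up demo rows (not real data).
demo=$(mktemp)
printf 'patient_id\tsample_id\tDLP\tWGS\n'  > "$demo"
printf 'P001\tP001_S01\tYes\tNA\n'         >> "$demo"
printf 'P001\tP001_S02\tNo\tyes\n'         >> "$demo"

bad_cells "$demo" DLP   # prints nothing
bad_cells "$demo" WGS   # prints: 3: WGS=yes
```

The reported number is the line number in the file, counting the header as line 1.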
A 3-line example lives at examples/cohort_wide.tsv. Read it only if the
user is debugging their own file format.
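The schema requires patient-level columns to be constant within a patient_id. A quick way to find offenders before running the plotter is sketched below. The helper name `inconsistent_patients` is hypothetical, and the column names are passed as arguments rather than taken from the skill's scripts.

```shell
# Hypothetical helper: print patient ids that carry more than one distinct
# value in a patient-level column.
# Usage: inconsistent_patients <file.tsv> <id column> <patient-level column>
inconsistent_patients() {
  awk -F '\t' -v idcol="$2" -v col="$3" '
    NR == 1 {
      for (i = 1; i <= NF; i++) { if ($i == idcol) p = i; if ($i == col) c = i }
      if (!p || !c) { print "column not found" > "/dev/stderr"; exit 2 }
      next
    }
    !(($p, $c) in seen) { seen[$p, $c] = 1; n[$p]++ }
    END { for (id in n) if (n[id] > 1) print id }
  ' "$1" | sort
}

# Made-up demo rows (not real data).
demo=$(mktemp)
printf 'patient_id\tsample_id\tsignature\n'  > "$demo"
printf 'P001\tP001_S01\tHRD\n'              >> "$demo"
printf 'P001\tP001_S02\tFBI\n'              >> "$demo"
printf 'P002\tP002_S01\tTD\n'               >> "$demo"
printf 'P002\tP002_S02\tTD\n'               >> "$demo"

inconsistent_patients "$demo" patient_id signature   # prints: P001
```

An empty result means the column is safe to use as a patient-level annotation.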
- Columns are patients, ordered by the first --patient-cols column (default: signature).
- Row blocks are assays, in --assays order.
- Within a block, rows are per-patient sample slots (samples sorted by site, then sample_id). Block height = the largest number of samples any patient has.
- Cell colours: Yes → black, No → grey, NA → white (all three overridable via CLI). The default palette is chosen so that ink = attempt: dark cells = success, grey cells = attempted-but-failed, blank cells = never attempted. This lets you see coverage gaps at a glance.

1. Check the input. If cell values are not Yes/No/NA, ask what they mean before proceeding.
2. If the user has no data, run scripts/01_generate_mock_cohort.R from their project directory to produce data/cohort_wide.tsv.
3. If a patient-level column varies within a patient_id, the plotter will error. Warn the user and ask whether to take the first value, the mode, or fix upstream.
4. Copy the scripts into the project and run the plotter:

mkdir -p <project>/scripts <project>/data <project>/results
cp <skill-dir>/scripts/01_generate_mock_cohort.R <project>/scripts/
cp <skill-dir>/scripts/02_plot_cohort_heatmap.R <project>/scripts/
Rscript scripts/02_plot_cohort_heatmap.R --input data/cohort_wide.tsv
Rscript scripts/02_plot_cohort_heatmap.R --help # full option list
Figures land under results/{png,pdf,svg}/. Default stem is
cohort_overview_heatmap. Open the PNG and verify: patients ordered by
signature, all listed assays present as row blocks, cell colours match
--yes-color / --no-color / --na-color.
A reference PNG from the mock data is at examples/cohort_overview_heatmap_reference.png
if you want to show the user what "correct" looks like.
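Part of the verification step can be automated by checking that each expected figure exists and is non-empty. The helper below is a hypothetical sketch (`missing_figures` is not shipped with the skill); it assumes the default layout described above, with the output root and file stem overridable as arguments.

```shell
# Hypothetical helper: print every expected figure that is missing or empty.
# Usage: missing_figures [outdir] [stem]
missing_figures() {
  outdir="${1:-results}"
  stem="${2:-cohort_overview_heatmap}"
  for ext in png pdf svg; do
    f="$outdir/$ext/$stem.$ext"
    [ -s "$f" ] || printf '%s\n' "$f"
  done
}

# Demo with a throwaway directory that only contains the PNG.
root=$(mktemp -d)
mkdir -p "$root/png"
printf 'x' > "$root/png/cohort_overview_heatmap.png"

missing_figures "$root"   # lists the pdf and svg paths
```

If a format was deliberately skipped with --no-png, --no-pdf, or --no-svg, its path will appear in the list and can be ignored.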
Plotter options (02_plot_cohort_heatmap.R):

| Flag | Default | Purpose |
|--------------------------------------|----------------------------------------------|-----------------------------------------------------------|
| -i, --input | required | path to wide TSV |
| -o, --outdir | results | output root (creates pdf/, png/, svg/) |
| -n, --name | cohort_overview_heatmap | file stem |
| -W, --width | 14 | figure width (inches) |
| -H, --height | 8 | figure height (inches) |
| --assays | 7-assay default | comma-separated assay columns, in plot order |
| --patient-cols | signature,BRCA1,BRCA2,CCNE1 | patient-level annotation columns; 1st orders columns |
| --sig-order | FBI,HRD,HRD-Del,HRD-Dup,TD,Undetermined,NA | factor order for the 1st patient-level col |
| --patient-id-col | patient_id | patient-id column name |
| --sample-id-col | sample_id | sample-id column name |
| --site-col | site | site column name |
| --yes-color | #000000 | cell colour for Yes (attempted, succeeded) |
| --no-color | #BDBDBD | cell colour for No (attempted, failed/negative) |
| --na-color | #FFFFFF | cell colour for NA (not attempted / unknown) |
| --na-as-no | off | render NA as No (collapses legend to Yes/No) |
| --no-png / --no-pdf / --no-svg | off | skip a format |
- Subset or reorder assays: --assays "DLP,WGS,ONT" (order matters: top to bottom).
- Different patient annotations: --patient-cols "signature,TP53,RB1". The first column still determines patient ordering. Unknown columns get an auto palette (uses RColorBrewer if available).
- Two-state legend: --na-as-no collapses NA into No and shortens the legend to Yes/No.
- Large cohorts: --width 20 --height 10 for 150+ patients.

Required R packages: optparse, dplyr, tidyr, readr, ComplexHeatmap, circlize.
Optional: svglite (true SVG), showtext + sysfonts (Arial), RColorBrewer (nicer palettes for custom --patient-cols).

If packages are missing, install with:
install.packages(c("optparse","dplyr","tidyr","readr","circlize","svglite","showtext","sysfonts","RColorBrewer"))
# ComplexHeatmap via Bioconductor:
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("ComplexHeatmap")
<project>/
├── data/cohort_wide.tsv
├── scripts/
│   ├── 01_generate_mock_cohort.R
│   └── 02_plot_cohort_heatmap.R
└── results/
    ├── png/cohort_overview_heatmap.png
    ├── pdf/cohort_overview_heatmap.pdf
    └── svg/cohort_overview_heatmap.svg
- Fonts fall back to the default sans. Install showtext + sysfonts for true Arial.
- For SVG output, install svglite. The base svg() device needs X11, which is often unavailable on macOS; svglite is self-contained.
- A patient-level column varies within a patient_id: the plotter errors with the offending patient_id. Fix upstream or collapse to the mode.
- An assay's cells are NA and --na-as-no is off. Either drop the assay from --assays or pass --na-as-no.