claude/skills/igv-reports/SKILL.md
Build self-contained, offline HTML genomic-region reports with igv-reports (create_report). Each HTML bundles igv.js viewers per region with embedded BAM/VCF data slices and default tracks (CpG islands, gencode, RepeatMasker); a reviewer clicks the variant table to inspect read-level evidence with no internet, no server, no IGV install. USE this skill whenever the user wants an HTML, clickable, or browseable viewer of genomic data — phrases like "HTML IGV report", "offline IGV", "self-contained HTML", "clickable viewer", "create_report", "igv-reports", "email this viewer", or any browseable HTML of reads at variants, fusion breakpoints, SV junctions, viral integrations, ChIP peaks, or ROIs. Trigger even when the user doesn't say "igv-reports" — giveaway is HTML/clickable/offline plus genomic regions. Also fire on /igv-reports. DO NOT use for static PNG/PDF/SVG IGV screenshots — use the igv-screenshots skill. Supports hg38, mm10, mm39, T2T. Defaults: --flanking 300, --standalone, genome-tagged output.
This skill builds self-contained HTML genomic-region reports with
igv-reports (create_report).
Each report is a single browseable HTML containing the igv.js viewer plus
embedded data slices for every region. No server, no internet, no IGV
install needed at view time.
The skill has three entry points:
| User request | Entry point |
|---|---|
| "Make an HTML for these 5 SV breakpoints in tumor.bam" | build |
| "Give me one HTML per patient for the cohort integration calls" | cohort |
| "create_report fails with 'not BGZF' on this gencode" | prep-track |
Defaults:

- --flanking 300 bp on either side of each site (good for SV breakpoints and point variants alike). Override per call if needed.
- --standalone so the HTML is offline-viewable.
- Genome-tagged output filename (e.g., cohort.hg38.html) to pass enforce-genome-tag.sh.
- Tracks resolved from databases_config.yaml: /data1/greenbab/users/ahunos/apps/llm_configs/claude/profiles/databases/databases_config.yaml. Supported genome IDs: hg38, mm10, mm39, t2t_CHM13v2_plusY, GRCh37.
- Per-genome track availability is listed in references/databases_config_paths.md. Read it before assembling tracks so the skill doesn't try to load a track that doesn't exist for the selected genome (e.g., mm39 has no rmsk in our database).

igv-reports' BED parser reads fields by position and trips on a header row (ValueError: invalid literal for int() with base 10: 'start'). Always emit a plain headerless 4-column BED:
Columns, in order: chr, start, end, name. Do not write these names into the file. An example row:
chr2	25227855	25342590	DNMT3A_full_gene
Tab-separated. The name becomes the row label in the report's variant table, so make it specific enough to identify the site after deduping.
The project's enforce-genome-tag.sh hook requires a genome tag in the BED
filename: use sites.hg38.bed, not sites.bed.
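The BED rules above can be sketched as a small normalizer. This is an illustrative sketch, not the driver's actual code: the function name `normalize_sites_bed` is hypothetical, and it assumes site names contain no whitespace and that any header is the first non-comment row.

```python
def normalize_sites_bed(lines):
    """Return headerless, tab-separated 4-column BED rows.

    Hypothetical helper: drops blank lines, '#' comments, and a leading
    header row, then checks that start < end on every remaining row.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            raise ValueError(f"expected 4 columns, got {len(fields)}: {line!r}")
        chrom, start, end, name = fields[:4]
        if not (start.isdigit() and end.isdigit()):
            if not rows:
                # A leading non-numeric row is treated as a header and dropped.
                continue
            raise ValueError(f"non-integer coordinates: {line!r}")
        if int(start) >= int(end):
            raise ValueError(f"start must be < end: {line!r}")
        rows.append("\t".join((chrom, start, end, name)))
    return rows


example = [
    "chr start end name",
    "chr2 25227855 25342590 DNMT3A_full_gene",
]
print("\n".join(normalize_sites_bed(example)))
```

Writing the returned rows to a genome-tagged file such as sites.hg38.bed gives input that avoids the header-row ValueError described above.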
| Symptom | Root cause | Fix |
|---|---|---|
| ValueError: invalid literal for int() on first row | Header row in sites BED | Strip header — plain BED |
| UnicodeDecodeError: byte 0x8b reading a track | igv-reports reading bgzip as text | Filename must end .gff3.gz / .bed.gz AND be true bgzip (check with file <name> for "extra field") |
| tabix: not BGZF | Track was plain-gzipped, not bgzipped | Run prep-track entry point |
| tabix: out of order while indexing | GFF/GTF/BED records not pos-sorted within chr | prep-track does sort -k1,1 -k4,4n before bgzip |
| Annotation track empty in viewer | Tabix returns no rows in displayed window, often correct biology (e.g., CGI-distal site) | Confirm with tabix <file> <region> |
| Genome ID lookup fails with --genome hg38 | igv.js bundled IDs require internet at view + render time | Use --fasta /path/to/local.fa instead (always works offline) |
Full pitfalls + create_report flag reference in references/best_practices.md.
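Several rows in the table come down to telling plain gzip apart from bgzip. A minimal sketch of that check, assuming only the BGZF header layout from the SAM specification (gzip magic, FEXTRA flag set, and a "BC" extra subfield with a 2-byte payload). The function name `is_bgzf` is hypothetical and is not part of igv-reports or htslib.

```python
import gzip
import struct


def is_bgzf(header: bytes) -> bool:
    """True if the leading bytes look like a BGZF (bgzip) block header."""
    if len(header) < 18:
        return False
    # gzip magic (1f 8b), deflate method (08), FLG with the FEXTRA bit (0x04)
    if header[:3] != b"\x1f\x8b\x08" or not header[3] & 0x04:
        return False
    xlen = struct.unpack("<H", header[10:12])[0]
    extra = header[12:12 + xlen]
    # Walk the extra subfields looking for 'B','C' with a 2-byte payload.
    pos = 0
    while pos + 4 <= len(extra):
        si1, si2, slen = struct.unpack("<BBH", extra[pos:pos + 4])
        if (si1, si2, slen) == (ord("B"), ord("C"), 2):
            return True
        pos += 4 + slen
    return False


# Plain gzip output carries no extra field, so it is not BGZF.
plain = gzip.compress(b"chr1\t1\t2\n")

# The 28-byte empty BGZF block that bgzip appends as an end-of-file marker.
bgzf_eof = bytes.fromhex(
    "1f8b0804 00000000 00ff 0600 4243 0200 1b00 0300 00000000 00000000"
)

print(is_bgzf(plain), is_bgzf(bgzf_eof))
```

To check a real track, pass the first 64 bytes of the file (opened in binary mode) to the function. A false result on a .gz file points at the prep-track entry point.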
Activate the snakemake conda env first; create_report lives there:
source /home/ahunos/miniforge3/etc/profile.d/conda.sh
conda activate snakemake
Then call the bundled driver script:
python /data1/greenbab/users/ahunos/apps/llm_configs/claude/skills/igv-reports/scripts/build_igvreports.py \
--sites results/run/inputs/sites.hg38.bed \
--bam tumor.bam normal.bam \
--vcf calls.vcf \
--genome hg38 \
--output results/run/reports/cohort.hg38.html
The driver:

1. Resolves annotation tracks from databases_config.yaml (skipping any that don't exist for the chosen genome).
2. Validates the sites BED, including start < end.
3. Runs create_report with --flanking 300 --standalone.

For multi-sample cohorts, use --samplesheet samplesheet.tsv instead of --bam/--vcf. Samplesheet format: sample, bam_tumor, bam_normal, vcf, sites_bed.
The driver emits one HTML per sample plus a top-level index.html that lists
all samples with links. Layout matches the ATLL viral-integration reference
implementation:
results/<run>/
├── inputs/<sample>/sites.<genome>.bed
├── reports/<sample>.<genome>.html
├── reports/index.html
└── logs/run_<timestamp>.log
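The mapping from samplesheet rows to the report layout above can be sketched as follows. This is an assumption-laden illustration, not the driver's code: it assumes the samplesheet is a TSV with a header row naming the five columns listed earlier, and `plan_cohort` is a hypothetical function name.

```python
import csv
import io
from pathlib import Path

COLUMNS = ["sample", "bam_tumor", "bam_normal", "vcf", "sites_bed"]


def plan_cohort(samplesheet_text, run_dir, genome):
    """Map each sample name to its per-sample report path."""
    reader = csv.DictReader(io.StringIO(samplesheet_text), delimiter="\t")
    missing = [c for c in COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"samplesheet is missing columns: {missing}")
    reports = Path(run_dir) / "reports"
    # One genome-tagged HTML per sample, as in reports/<sample>.<genome>.html
    return {
        row["sample"]: reports / f"{row['sample']}.{genome}.html"
        for row in reader
    }


sheet = (
    "sample\tbam_tumor\tbam_normal\tvcf\tsites_bed\n"
    "P1\tP1_T.bam\tP1_N.bam\tP1.vcf\tP1.sites.hg38.bed\n"
    "P2\tP2_T.bam\tP2_N.bam\tP2.vcf\tP2.sites.hg38.bed\n"
)
for sample, path in plan_cohort(sheet, "results/run", "hg38").items():
    print(sample, path)
```

The same dictionary is enough to generate the links for reports/index.html.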
If a GFF3/GTF/BED.gz is plain-gzip rather than bgzip, igv-reports fails silently or with an obscure error. Convert in place with backup:
bash /data1/greenbab/users/ahunos/apps/llm_configs/claude/skills/igv-reports/scripts/prep_track.sh \
/path/to/track.gff3.gz
The script:

1. Backs up the original to <name>.bak.original_gzip.
2. Decompresses the file with gunzip -c.
3. Sorts by chr then numeric pos (sort -k1,1 -k4,4n). (Gencode delivers records interleaved by feature type at the same locus; tabix requires pos-sorted.)
4. bgzips in place.
5. Indexes with tabix -p <gff|bed>; tabix has no gtf preset, so GTF files use -p gff.

Run from the snakemake conda env (bgzip/tabix from htslib).
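The sort step can be sketched in Python to show what "pos-sorted" means for GFF/GTF records. This is only an illustration of the ordering, not what prep_track.sh runs: `sort_gff_lines` is a hypothetical name, header lines are kept at the top rather than sorted, and Python orders sequence names by code point, which may differ from `sort` under some locales.

```python
def sort_gff_lines(lines):
    """Order GFF/GTF records by sequence name, then numeric start (column 4)."""
    header = [ln for ln in lines if ln.startswith("#")]
    records = [ln for ln in lines if ln.strip() and not ln.startswith("#")]

    def key(line):
        fields = line.split("\t")
        return fields[0], int(fields[3])

    # sorted() is stable, so records sharing a start keep their input order.
    return header + sorted(records, key=key)


unsorted = [
    "##gff-version 3",
    "chr2\tHAVANA\tgene\t500\t900\t.\t+\t.\tID=g2",
    "chr1\tHAVANA\texon\t300\t400\t.\t+\t.\tID=e1",
    "chr1\tHAVANA\tgene\t100\t400\t.\t+\t.\tID=g1",
]
for line in sort_gff_lines(unsorted):
    print(line)
```

For BED tracks the start coordinate is column 2 rather than column 4, so the key would need adjusting.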
The driver script (build_igvreports.py) deliberately abstracts the
underlying create_report flags — it sets --standalone, --fasta, the
--flanking 300 default, and the YAML-resolved annotation tracks
internally so the user doesn't have to remember them. That abstraction is
good for ergonomics but bad for auditability: a reviewer reading the
answer.md later can't see what flags are actually being invoked without
opening the driver source.
To keep both: when you produce a runnable command for the user, also include a code block titled "Equivalent direct create_report invocation" that shows the fully-expanded command with all flags and resolved track paths inline. The user should see the wrapper command they're going to run AND the underlying command it expands to. Example:
## Run
```bash
python build_igvreports.py --genome mm10 --sites peaks.mm10.bed \
  --bam ./data/ip.bam ./data/input.bam \
  --output reports/peaks_qc.mm10.html
```
### Equivalent direct create_report invocation
```bash
create_report peaks.mm10.bed \
  --fasta /data1/greenbab/database/mm10/mm10.fa \
  --flanking 300 --standalone \
  --tracks ./data/ip.bam ./data/input.bam \
    /data1/greenbab/database/mm10/mm10_CpGIslands.bed \
    /data1/greenbab/database/mm10/annotations/gencode.vM25.annotation.gtf.gz \
    /data1/greenbab/database/RepeatMaskerDB/.../rmsk_all_repeats_mm10.bed.gz \
  --title "ChIP-seq peak QC (mm10) — IP vs Input" \
  --output reports/peaks_qc.mm10.html
```
This costs you ~10 lines and gives the reviewer a full audit trail. For cohort runs, show the expanded form for ONE representative sample only — the others differ only in BAM/VCF paths.
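One way to produce the expanded form reliably is to build the argument list in code and print it with shell quoting. The sketch below is hypothetical (`expand_create_report` is not a function in build_igvreports.py) and uses only the create_report flags already shown above.

```python
import shlex


def expand_create_report(sites, fasta, tracks, output, title=None, flanking=300):
    """Build the create_report argv that a wrapper call would expand to."""
    argv = [
        "create_report", sites,
        "--fasta", fasta,
        "--flanking", str(flanking), "--standalone",
        "--tracks", *tracks,
    ]
    if title:
        argv += ["--title", title]
    argv += ["--output", output]
    return argv


cmd = expand_create_report(
    sites="peaks.mm10.bed",
    fasta="mm10.fa",
    tracks=["ip.bam", "input.bam"],
    output="out.html",
    title="QC mm10",
)
# shlex.join quotes arguments that contain spaces, so the line can be pasted
# into a shell as-is.
print(shlex.join(cmd))
```

Printing from the same list that is passed to the subprocess keeps the displayed command and the executed command from drifting apart.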
Every run logs to logs/run_<YYYYMMDD_HHMMSS>.log next to the reports dir. The log captures, among other items, the create_report command. This satisfies CLAUDE.md §"Logging and Audit Trail": every run is reproducible from the log alone.
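The log naming convention can be sketched as below. `run_log_path` is a hypothetical helper, and it assumes "next to the reports dir" means a sibling logs/ directory, as in the cohort layout shown earlier.

```python
from datetime import datetime
from pathlib import Path


def run_log_path(reports_dir, now=None):
    """Return logs/run_<YYYYMMDD_HHMMSS>.log as a sibling of the reports dir."""
    now = now or datetime.now()
    logs_dir = Path(reports_dir).parent / "logs"
    return logs_dir / f"run_{now:%Y%m%d_%H%M%S}.log"


print(run_log_path("results/run/reports", datetime(2026, 5, 3, 14, 7, 9)))
```

Passing the timestamp in explicitly keeps the function deterministic for testing; a real run would rely on the default of the current time.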
For gencode on hg38, the default points at
gencode.v47.annotation.gff3.gz (full annotation, bgzip + tabix). This
gives transcript models with exons / CDS / UTRs. The gene-level-only
companion (gencode.v47.genes.annotation.sorted.gff3.gz) renders only
solid gene boxes and is fine for high-zoom views, but the full annotation
is the right default for read-level inspection at integration / fusion /
SV junctions.
For mouse genomes, databases_config.yaml ships .gtf.gz paths instead.
GTFs work in igv-reports if bgzip + tabix-indexed; prep-track converts
plain-gzip GTFs the same way it does GFF3s.
For T2T-CHM13, only the FASTA + GTF + CGI are indexed in our DB; rmsk is absent and is auto-skipped by the driver. The variant table will load without rmsk; flag this in the run log.
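The auto-skip behaviour described above can be sketched as a lookup that separates available tracks from absent ones. Everything here is hypothetical: the key names (`cpg_islands`, `gencode`, `rmsk`), the placeholder paths, and the `resolve_tracks` function are illustrative, and the real keys live in databases_config.yaml.

```python
def resolve_tracks(config, genome, wanted=("cpg_islands", "gencode", "rmsk")):
    """Return (tracks, skipped) for one genome; absent entries are skipped."""
    entry = config.get(genome)
    if entry is None:
        raise KeyError(f"genome {genome!r} not in config")
    tracks, skipped = [], []
    for key in wanted:
        path = entry.get(key)
        if path:
            tracks.append(path)
        else:
            # Record the gap so it can be flagged in the run log.
            skipped.append(key)
    return tracks, skipped


# Placeholder config: the T2T entry deliberately has no rmsk track.
config = {
    "t2t_CHM13v2_plusY": {
        "cpg_islands": "/path/to/t2t_cgi.bed",
        "gencode": "/path/to/t2t_genes.gtf.gz",
    },
}
tracks, skipped = resolve_tracks(config, "t2t_CHM13v2_plusY")
print(tracks, skipped)
```

Returning the skipped keys alongside the tracks makes it straightforward to write the "rmsk absent" note into the run log instead of failing the run.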
The examples/ directory has runnable templates:

- single_sample.sh: one BAM + one VCF + a sites BED → one HTML.
- cohort_samplesheet.sh: TSV-driven multi-sample run.
- prep_track_demo.sh: convert a plain-gzip gencode to bgzip+tabix.

These are reference implementations; copy and edit them for new runs rather than starting from scratch.
- references/best_practices.md: full create_report flag reference, format gotchas, performance notes. Read this if a run fails in a way not listed in the Pitfalls table above.
- references/databases_config_paths.md: per-genome track availability matrix and exact YAML keys. Read this when adding a new genome or diagnosing a missing-track warning.
- scripts/build_igvreports.py: the driver. Reads --samplesheet or --bam/--vcf direct-args, resolves tracks, validates the sites BED, writes the HTMLs and the run log.
- scripts/prep_track.sh: gunzip → sort → bgzip → tabix utility.
- igv-screenshots skill: the static PNG/PDF/SVG counterpart based on igver. Use it instead of this one when the deliverable is a publication-quality figure rather than a clickable viewer.
- /data1/greenbab/projects/ont/Project_17424/results/20260503_hg38plusHTLV1EBV_cohort_integration_igvreports/: 6-patient ATLL cohort viral-integration HTMLs + DNMT3A sanity check; this skill was extracted from that work.