skills/expression-report/SKILL.md
Generate single-cell gene expression report scripts (.qmd) with barplots, heatmaps, and cross-analysis. Use when creating expression reports for gene sets across cell types, visualizing gene expression patterns in single-cell data, or when the user says "expression report", "gene expression barplots", "expression heatmap", or wants to visualize how a gene list is expressed across cell types. Covers both categorical gene groupings (pathway components, functional categories) and data-driven groupings (taxonomy, coexpression modules). Currently Python/scanpy/matplotlib only. Do NOT load for differential expression testing, marker gene discovery, or clustering — those are upstream analyses that produce gene lists this skill consumes.
Generate reproducible .qmd scripts for single-cell gene expression reports. The skill
discusses setup with the user, then writes a complete Python/scanpy/matplotlib .qmd
template encoding all decisions as configuration variables.
Reference implementations:
- exploration/scripts/scmicrobiome/spongilla/02_nonmetazoan_expression.qmd
- TiHKAL/scripts/signaling_pathway_reports/generate_pathway_reports.qmd

Bundled resources:
- templates/report_template.py: script body chunks (config, normalization, gene matching, expression computation, combined heatmap, report loop, cross-analysis, archive)
- templates/helpers.py: reusable plotting functions (two-panel heatmap, barplot Style A)
- references/species_notes.md: species-specific cell type configurations and quirks

Output: a .qmd script following the quarto-docs and script-organization conventions, run with quarto render, producing all outputs in outs/<subdirectory>/XX_script_name/.

Resolve these questions before generating the script. Present recommendations where possible; don't just list options.
| Input | What to ask | Notes |
|-------|-------------|-------|
| Single-cell object | File path (.h5ad) | Check it exists; report shape |
| Gene list | File path (TSV with gene_id + grouping columns) | Read and show column names, row count |
| Primary group column | Which column defines the main grouping | e.g., "kingdom", "pathway", "module" |
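The gene list checks in the table (read the TSV, show column names and row count, confirm the grouping column) can be sketched as below. This is a minimal sketch assuming pandas and a tab-separated file; load_gene_list is a hypothetical helper, not a function from the bundled templates.

```python
import io

import pandas as pd


def load_gene_list(path_or_buffer, group_col: str) -> pd.DataFrame:
    """Read a gene list TSV and print the summary shown to the user."""
    genes = pd.read_csv(path_or_buffer, sep="\t")
    print(f"Columns: {list(genes.columns)}")
    print(f"Rows: {len(genes)}")
    # The table asks for gene_id plus the user's primary grouping column.
    missing = [c for c in ("gene_id", group_col) if c not in genes.columns]
    if missing:
        raise ValueError(f"Gene list is missing column(s): {missing}")
    # Genes per group: a quick sanity check before choosing plot styles.
    print(genes[group_col].value_counts().to_string())
    return genes


# Usage with an in-memory example file (placeholder values).
example = io.StringIO("gene_id\tkingdom\ng1\tBacteria\ng2\tFungi\ng3\tBacteria\n")
genes = load_gene_list(example, group_col="kingdom")
```

Failing early on a missing column keeps a typo in the group column name from surfacing later as an empty report.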
Validation steps:
- Load the object, report n_obs x n_vars, and check whether counts are raw (max > 20) or log-normalized (max < 20)

Gene labels: by default, use the full gene name from the single-cell object's var_names. Do not parse or abbreviate; show what's in the object. This ensures labels match the authoritative gene naming in the dataset.
The h5ad_name column (populated during gene ID matching) contains the full var_name from the object. Use it directly as the plot label, adding #2, #3 to duplicate display names if needed.

Check references/species_notes.md for known species-specific quirks before proposing
defaults. If the species has an entry, pre-populate family assignments, ordering, and
colors from it.
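The raw-versus-log check from the validation steps and the duplicate-label suffixes described above can be sketched as follows. This assumes the threshold of 20 from the validation steps; the helper names and the exact " #2" suffix format (with a space) are hypothetical. Per the pitfall noted later, expm1() should only be applied to data this check reports as log-normalized.

```python
from collections import Counter

import numpy as np

# Threshold from the validation steps: max > 20 suggests raw counts.
RAW_COUNT_THRESHOLD = 20


def looks_like_raw_counts(X) -> bool:
    """True if the matrix maximum suggests raw counts rather than log-normalized values.

    Works for dense NumPy arrays; scipy.sparse matrices also provide .max().
    """
    return float(X.max()) > RAW_COUNT_THRESHOLD


def dedupe_labels(labels):
    """Suffix repeated display names with #2, #3, ...; first occurrence unchanged.

    The " #n" format (with a space) is an assumption.
    """
    seen = Counter()
    out = []
    for label in labels:
        seen[label] += 1
        out.append(label if seen[label] == 1 else f"{label} #{seen[label]}")
    return out


X_log = np.log1p(np.array([[0.0, 3.0], [57.0, 1.0]]))
print(looks_like_raw_counts(X_log))  # False: values are log-scale
print(dedupe_labels(["geneA", "geneA", "geneB"]))  # ['geneA', 'geneA #2', 'geneB']
```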
Default: named cell types only. Numbered/transitional clusters are excluded unless the user specifically requests them.
Read cell type info directly from the h5ad .obs:
- Scan .obs for likely cell type columns (look for "cell_type", "cluster", "annotation", or "celltype" in column names). Show the user what you find.

Ordering is family-grouped and user-confirmed: the skill proposes an order, and the user adjusts it if needed. The final order must be explicitly confirmed by the user. Store it as a list in the configuration chunk so it can be edited later.
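A minimal sketch of the cell type defaults: drop numbered clusters, build a family-grouped order, and refuse to proceed if that order and the data disagree. The rule "a label made only of digits is a numbered cluster" is a hypothetical stand-in for whatever the dataset actually uses, and the family and cell type names are placeholders.

```python
import re

# Hypothetical rule: labels made only of digits ("7", "12") are numbered clusters.
# Adjust to the dataset's naming scheme (see references/species_notes.md).
NUMBERED_CLUSTER = re.compile(r"^\d+$")

# Family-grouped order as it might appear in the configuration chunk.
# Family and cell type names are placeholders, not from any real dataset.
CELLTYPE_FAMILIES = {
    "family_1": ["type_a", "type_b"],
    "family_2": ["type_c"],
}
CELLTYPE_ORDER = [ct for members in CELLTYPE_FAMILIES.values() for ct in members]


def named_cell_types(labels):
    """Unique cell type labels from .obs, excluding numbered clusters."""
    return sorted({str(label) for label in labels if not NUMBERED_CLUSTER.match(str(label))})


def check_order(named, order):
    """Fail loudly if the confirmed order and the named cell types disagree."""
    missing = sorted(set(named) - set(order))
    extra = sorted(set(order) - set(named))
    if missing or extra:
        raise ValueError(f"Missing from order: {missing}; not in data: {extra}")
    return list(order)


labels = ["type_b", "3", "type_a", "12", "type_c", "type_a"]
print(check_order(named_cell_types(labels), CELLTYPE_ORDER))  # ['type_a', 'type_b', 'type_c']
```

Storing families as a dict keeps the flat order editable in one place, since Python dicts preserve insertion order.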
All barplots use a linear scale (mean CPT). Present the two styles:
- Style A: cell type family coloring (recommended for exploratory work or large gene sets)
- Style B: gene category coloring (recommended for curated functional categories), facet_wrap style with 4 columns

Recommendation logic: if the gene list has a functional category column with <=8 categories, recommend Style B. Otherwise recommend Style A.
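The recommendation logic can be sketched as a small function, assuming the gene list is a pandas DataFrame and the category column name is known (or None when there is none). The function name is hypothetical; treating an empty category column as "no categories" is an added assumption.

```python
from typing import Optional

import pandas as pd

MAX_STYLE_B_CATEGORIES = 8  # from the recommendation logic above


def recommend_barplot_style(genes: pd.DataFrame, category_col: Optional[str]) -> str:
    """Return "B" for a category column with 1-8 categories, otherwise "A"."""
    if category_col is None or category_col not in genes.columns:
        return "A"
    n_categories = genes[category_col].nunique(dropna=True)
    return "B" if 0 < n_categories <= MAX_STYLE_B_CATEGORIES else "A"


genes = pd.DataFrame(
    {"gene_id": ["g1", "g2", "g3"], "category": ["ligand", "receptor", "ligand"]}
)
print(recommend_barplot_style(genes, "category"))  # B
```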
Ask if other datasets should be intersected with the gene list to add markers (e.g.,
* for phosphoproteomics hits). This is optional and project-specific.
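A minimal sketch of the cross-analysis marker, assuming the other dataset reduces to a set of gene IDs that share the gene list's gene_id namespace. The function add_marker and the new label and is_hit columns are hypothetical names chosen for illustration.

```python
import pandas as pd


def add_marker(genes, hit_ids, marker="*", id_col="gene_id", label_col="h5ad_name"):
    """Return a copy with a 'label' column; genes in hit_ids get the marker appended."""
    out = genes.copy()
    is_hit = out[id_col].isin(set(hit_ids))
    out["is_hit"] = is_hit
    # Keep the label where the gene is not a hit; otherwise append the marker.
    out["label"] = out[label_col].where(~is_hit, out[label_col] + marker)
    return out


genes = pd.DataFrame({"gene_id": ["g1", "g2"], "h5ad_name": ["geneA", "geneB"]})
print(add_marker(genes, hit_ids={"g2"})["label"].tolist())  # ['geneA', 'geneB*']
```

Keeping the boolean is_hit column alongside the decorated label lets gene_summary.tsv record the overlap without parsing markers back out of strings.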
Once all decisions are resolved, generate a .qmd script.
Follow the script-organization skill:
- Script: scripts/<subdirectory>/XX_name.qmd (next available number; always ls first)
- Outputs: outs/<subdirectory>/XX_name/

Generate a Python .qmd with these sections in order. Read templates/report_template.py
and templates/helpers.py for the code patterns to insert into each chunk:
- … templates/helpers.py)

Color palette:

```python
NATURE_PALETTE = [
    "#BC3C29",  # Brick red
    "#0072B5",  # Steel blue
    "#20854E",  # Forest green
    "#E18727",  # Amber
    "#7876B1",  # Muted purple
    "#6F99AD",  # Slate blue
    "#EE4C97",  # Rose
    "#868686",  # Gray
]
```
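One way to map cell type families or gene categories onto this palette is sketched below. It assumes colors are assigned in first-seen order and cycle once the eight colors run out. Both choices, and the assign_colors name, are illustrative rather than taken from templates/helpers.py; species entries in references/species_notes.md may already fix colors.

```python
NATURE_PALETTE = [
    "#BC3C29", "#0072B5", "#20854E", "#E18727",
    "#7876B1", "#6F99AD", "#EE4C97", "#868686",
]


def assign_colors(groups, palette=NATURE_PALETTE):
    """Map each unique group (in first-seen order) to a palette color.

    Colors cycle after len(palette) groups, so >8 groups will reuse colors.
    """
    unique = list(dict.fromkeys(groups))
    return {g: palette[i % len(palette)] for i, g in enumerate(unique)}


print(assign_colors(["family_1", "family_2", "family_1"]))
# {'family_1': '#BC3C29', 'family_2': '#0072B5'}
```

Cycling is the simplest fallback, but repeated colors become ambiguous, which is one reason the Style B recommendation caps categories at 8.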
Linear heatmap colormap: white -> steel blue -> navy (#FFFFFF -> #3182BD -> #08306B).
Z-score heatmap colormap: diverging RdBu_r with TwoSlopeNorm centered at 0.
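The numbers behind the two heatmap panels can be sketched with NumPy alone, assuming X holds linear-scale values (cells x genes) and that the z-score panel standardizes each gene across cell types. That per-row scaling is an assumption, not confirmed by the templates. The function names are hypothetical.

```python
import numpy as np


def mean_by_cell_type(X, cell_types, order):
    """Genes x cell types matrix of mean expression; X is cells x genes."""
    cell_types = np.asarray(cell_types)
    # np.asarray(...).ravel() also flattens the 1 x n matrix scipy.sparse returns.
    cols = [np.asarray(X[cell_types == ct].mean(axis=0)).ravel() for ct in order]
    return np.column_stack(cols)


def row_zscore(M):
    """Z-score each gene (row) across cell types; constant rows become 0."""
    mu = M.mean(axis=1, keepdims=True)
    sd = M.std(axis=1, keepdims=True)
    safe_sd = np.where(sd > 0, sd, 1.0)  # avoid division by zero
    return np.where(sd > 0, (M - mu) / safe_sd, 0.0)


def symmetric_limits(Z):
    """(vmin, vmax) symmetric around 0 for a diverging map centered at 0."""
    m = float(np.abs(Z).max()) or 1.0
    return -m, m


X = np.array([[1.0, 0.0], [3.0, 0.0], [5.0, 2.0], [7.0, 2.0]])  # 4 cells x 2 genes
M = mean_by_cell_type(X, ["a", "a", "b", "b"], order=["a", "b"])
Z = row_zscore(M)
print(symmetric_limits(Z))  # (-1.0, 1.0)
```

For drawing, matplotlib.colors.LinearSegmentedColormap.from_list can build the white, steel blue, navy map from the three hex codes. matplotlib.colors.TwoSlopeNorm(vcenter=0, vmin=vmin, vmax=vmax) then pins the middle of RdBu_r to zero.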
Pitfalls (learned through iterative development; do not repeat them):
- Calling expm1() without checking the data first: if X.max() > 20, it's raw counts

Output layout:

```
outs/<subdirectory>/XX_name/
    all_genes_heatmap.pdf/.png       # Combined heatmap if <=100 genes
    expression_summary.pdf/.png      # Genes per group, fraction expressed
    group_celltype_heatmap.pdf/.png  # Group x cell type (linear + z-score)
    gene_summary.tsv                 # All genes with expression info
    BUILD_INFO.txt                   # Provenance
    _archive/                        # Previous runs
    group_slug_1/                    # Per-group subdirectory
        gene_list.tsv
        heatmap.pdf/.png
        barplots.pdf/.png
    group_slug_2/
        ...
```
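The per-group group_slug directories and the _archive/ folder imply two small utilities. The sketch below assumes slugs are lowercase with underscores and that archiving means moving the previous run into a timestamped subfolder. Both rules, and the names slugify and archive_previous_run, are hypothetical and not taken from templates/report_template.py.

```python
import re
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def slugify(name: str) -> str:
    """Hypothetical slug rule: lowercase, non-alphanumeric runs -> '_'."""
    slug = re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")
    return slug or "group"


def archive_previous_run(out_dir) -> Optional[Path]:
    """Move everything in out_dir except _archive/ into _archive/<timestamp>/."""
    out_dir = Path(out_dir)
    if not out_dir.exists():
        return None
    # List items before creating _archive so the new folder is never moved into itself.
    items = [p for p in out_dir.iterdir() if p.name != "_archive"]
    if not items:
        return None
    dest = out_dir / "_archive" / datetime.now().strftime("%Y%m%d_%H%M%S")
    dest.mkdir(parents=True, exist_ok=True)
    for p in items:
        shutil.move(str(p), str(dest / p.name))
    return dest


print(slugify("TGF-beta Pathway"))  # tgf_beta_pathway
```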
When there are <=100 expressed genes, the combined heatmap uses:
R support: not yet implemented. When added, the R template will use:
- DotPlot for dotplots
- pheatmap with category row annotations for heatmaps
- ggplot2 + facet_wrap for Style B barplots