skills/tree-formatting/SKILL.md
Phylogenetic tree visualization and formatting with ggtree (R) or iTOL (web). Use when rendering a phylogenetic tree as a figure, choosing tree layout, coloring branches or labels by taxonomy, collapsing clades, displaying support values, or adding overlays to a tree. Do NOT load for tree inference (use protein-phylogeny skill) or domain annotation (future separate skill).
Conventions for rendering phylogenetic trees using ggtree (R/Bioconductor) or iTOL (Interactive Tree of Life, web-based).
Ask the user which backend to use based on their needs:
| Backend | Best for | Output | Language |
|---------|----------|--------|----------|
| ggtree | Publication figures, full programmatic control, offline use | PDF/PNG/SVG | R (.qmd script) |
| iTOL | Interactive exploration, quick iteration, web sharing, UI tweaking | Web + PDF/SVG/PNG exports | R (.qmd annotations) + Python (.qmd upload) |

| Feature | ggtree | iTOL |
|---------|--------|------|
| Interactive exploration | No | Yes (web UI) |
| Label alignment control | Full (programmatic) | Limited (UI toggle only, not via API) |
| Collapse triangle labels | Manual geom_text() | Built-in LABELS for internal nodes |
| Circular label positioning | Complex (manual angle computation) | Automatic |
| Branch length display | Yes (phylogram/cladogram toggle) | Yes (via UI) |
| Offline/reproducible | Fully offline | Requires iTOL API + internet |
| Two-script workflow | No (single .qmd) | Yes (R .qmd annotations + Python .qmd upload) |
Before rendering, validate the tree file. Tree-formatting scripts should include a validation step early on.
- Parses cleanly with ape::read.tree() (R) or ete3.Tree() (Python)
- Rooted status (ape::is.rooted())
- Binary status (ape::is.binary())
- Zero-length branches
- Tip labels: | characters (breaks iTOL), spaces, unusual characters

```{r}
#| label: validate-tree
library(ape)
library(here)  # provides here() for project-relative paths

tree_path <- here("data/phylogenetics/tree.treefile")
stopifnot("Tree file not found" = file.exists(tree_path))

tree <- read.tree(tree_path)
cat("Tips:", Ntip(tree), "\n")
cat("Rooted:", is.rooted(tree), "\n")
cat("Binary:", is.binary(tree), "\n")

# Zero-length branches
if (!is.null(tree$edge.length)) {
  n_zero <- sum(tree$edge.length == 0)
  if (n_zero > 0) cat("WARNING:", n_zero, "zero-length branches\n")
}

# Tip label issues (pipe breaks iTOL)
has_pipe <- grepl("\\|", tree$tip.label)
if (any(has_pipe)) {
  cat("WARNING:", sum(has_pipe), "tips contain '|'; this will break iTOL annotations\n")
}
```
```{python}
#| label: validate-tree
from ete3 import Tree

# PROJECT_ROOT is assumed to be a pathlib.Path defined in the script's setup chunk
tree_path = PROJECT_ROOT / "data/phylogenetics/tree.treefile"
assert tree_path.exists(), f"Tree file not found: {tree_path}"

tree = Tree(str(tree_path))
tips = tree.get_leaf_names()
print(f"Tips: {len(tips)}")

# Check for pipe characters
pipe_tips = [t for t in tips if "|" in t]
if pipe_tips:
    print(f"WARNING: {len(pipe_tips)} tips contain '|'; must relabel before iTOL")
```
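The validation list also mentions spaces and unusual characters, which the snippets above do not check. A minimal sketch of a broader tip-label scan follows, using only the standard library. The function name `check_tip_labels` and the set of characters treated as safe (letters, digits, `_`, `.`, `-`) are assumptions made for illustration, not part of the templates.

```python
import re

# Assumption: these characters are considered safe in tip labels.
SAFE_LABEL = re.compile(r"^[A-Za-z0-9_.\-]+$")


def check_tip_labels(labels):
    """Group tip labels by the kind of problem they contain."""
    issues = {"pipe": [], "space": [], "unusual": []}
    for label in labels:
        if "|" in label:
            issues["pipe"].append(label)
        if " " in label:
            issues["space"].append(label)
        # Ignore pipes and spaces here so each label is reported
        # under the most specific category.
        stripped = label.replace("|", "").replace(" ", "")
        if not SAFE_LABEL.match(stripped):
            issues["unusual"].append(label)
    return issues


if __name__ == "__main__":
    example = ["sp|O95631|NET1_HUMAN", "Homo sapiens", "H._sapiens_SPTB1", "bad;label"]
    for kind, hits in check_tip_labels(example).items():
        if hits:
            print(f"WARNING: {len(hits)} tips with {kind} issues: {hits}")
```

The list returned by `tree.get_leaf_names()` can be passed in directly.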
Help the user select the right visualization. Ask about purpose and tree size, then recommend from the options below.
| Type | Best for | Tips | Key features |
|------|----------|------|-------------|
| Collapsed rectangular phylogram | Large family trees; showing branch-length variation and gene family structure | 250-2000+ | Collapsed pure clades, branch lengths, selective labels |
| Collapsed rectangular cladogram | Large family trees; topology focus, cleaner labels | 250-2000+ | Same as phylogram but no branch lengths, narrower page |
| Collapsed circular | Large trees; compact overview showing overall structure | 250-2000+ | Circular layout, collapsed clades, optional selective labels |
| Simple rectangular phylogram | Small-medium trees where all tips are readable | < 250 | All tips labeled, no collapsing needed |
| Unrooted | Networks, showing relationships without root assumption | Any | No directionality implied |
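The table can be read as a small decision rule. The sketch below encodes it as a starting recommendation only. The function name and its boolean parameters are hypothetical, and the 250-tip threshold is taken from the "Tips" column above.

```python
def recommend_tree_type(n_tips, rooted_view=True, compact=False, show_branch_lengths=True):
    """Suggest a visualization type from the selection table (a starting point, not a rule)."""
    if not rooted_view:
        return "Unrooted"
    if n_tips < 250:
        return "Simple rectangular phylogram"
    if compact:
        return "Collapsed circular"
    if show_branch_lengths:
        return "Collapsed rectangular phylogram"
    return "Collapsed rectangular cladogram"


if __name__ == "__main__":
    print(recommend_tree_type(120))
    print(recommend_tree_type(900, show_branch_lengths=False))
```

The user's stated purpose should still override the suggestion.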
Gather these decisions before writing any code:
- Collapse groups: the collapse_groups parameter controls which taxonomic groups are eligible for collapsing. Common choices:
  - c("Bilateria"): only collapse bilaterians (keeps sponges/cnidarians expanded)
  - c("Bilateria", "Protostomia", "Deuterostomia"): collapse specific groups
  - NULL: all groups eligible for collapsing
- Collapsed clade labels: list model species gene names, e.g. "Bilateria (36 tips: LAMA1, LAMA2, LAMB1)". This ensures key gene family members remain visible even when the clade is collapsed.
- iTOL project: set the ITOL_PROJECT env var or hardcode it in the upload script.

All ggtree templates are Quarto .qmd documents following the project's data
science conventions (YAML frontmatter with status field, git hash, BUILD_INFO.txt).
Reference template: ~/.claude/skills/tree-formatting/templates/ggtree/collapsed_rectangular.qmd
This template is a complete, runnable .qmd with all tuned style parameters. Copy it
into the project's scripts/ directory and adapt the sections marked PROJECT-SPECIFIC:
- collapse_groups parameter (which taxonomic groups to collapse)

The template handles: tree loading, midpoint rooting, pure-clade collapsing by taxonomic group, branch coloring by taxonomy, all visible tips labeled, model species gene names on collapsed triangle labels, formula-based page sizing, and PDF output.
Key features:
- Formula-based page sizing: INCHES_PER_TIP = 0.12, height = max(8, n_visible * INCHES_PER_TIP)
- collapse_groups parameter: controls which taxonomic groups are eligible for collapsing (e.g., c("Bilateria") to only collapse bilaterians, or NULL for all)
- Collapsed triangle labels formatted as "Group (N tips: GENE1, GENE2, ...)" so key gene family members remain visible
- Collapse labels positioned at max(pre_data$x[tip_ids]) (triangle tip), not at internal node x (triangle base)

Reference template: ~/.claude/skills/tree-formatting/templates/ggtree/collapsed_circular.qmd
Same structure as rectangular — adapt PROJECT-SPECIFIC sections. Produces:
Critical circular gotcha: Labels must be positioned BEFORE collapse() is called.
The template handles this by computing angles from y-position (y / max_y * 360),
flipping text on the left half of the circle, and using geom_text() with explicit
angle/hjust values instead of geom_tiplab2().
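The angle rule can be checked outside R. The sketch below, in Python for illustration, computes the text angle from the y-position and flips labels on the left half of the circle. The function name is hypothetical, and the specific flip (adding 180 degrees and switching hjust from 0 to 1) is an assumption about how the template applies it. The source only states that angles come from y / max_y * 360 and that left-half text is flipped.

```python
def circular_label_angle(y, max_y):
    """Return (text_angle, hjust) for a tip label at position y on a circular tree."""
    angle = y / max_y * 360.0
    if 90.0 < angle < 270.0:
        # Left half of the circle: rotate the text by 180 degrees and
        # right-justify so the label still reads outward from the tip.
        return (angle + 180.0) % 360.0, 1.0
    return angle, 0.0


if __name__ == "__main__":
    max_y = 40
    for y in (5, 20, 30):
        print(y, circular_label_angle(y, max_y))
```

In the template these values must be computed from p$data before collapse() is called, because collapsing changes the coordinates.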
For simple rectangular or unrooted trees, no template exists yet. Build from ggtree basics:
```r
library(ggtree)

# Simple rectangular (all tips labeled)
p <- ggtree(tree) + geom_tiplab(size = 2)

# Unrooted
p <- ggtree(tree, layout = "unrooted")
```
All style parameters are defined as named constants at the top of each template
(e.g., BRANCH_LINE_WIDTH, LABEL_SIZE, INCHES_PER_TIP). Do not scatter
magic numbers through the code.
iTOL requires separate R and Python steps (do not mix in one .qmd):
Reference template: ~/.claude/skills/tree-formatting/templates/itol/annotations.R
Copy into project and adapt PROJECT-SPECIFIC sections. Generates these files:
- GENE.tree: relabeled Newick (short display labels, no | characters)
- GENE_branch_colors.txt: TREE_COLORS with clade + branch entries
- GENE_label_colors.txt: TREE_COLORS label color entries
- GENE_collapse.txt: COLLAPSE entries for pure clades
- GENE_collapse_labels.txt: LABELS for collapsed clade internal nodes

Reference template: ~/.claude/skills/tree-formatting/templates/itol/upload_export.py
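Uploads two versions: an uncollapsed tree (uploaded without the collapse files) and a collapsed tree (uploaded with all annotation files).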
Exports multiple layout/format combinations (circular PDF/SVG/PNG, rectangular PDF/SVG, unrooted PDF).
- API key: ITOL_API_KEY env var
- Project: ITOL_PROJECT env var (default: "misc"). The project must already exist. The iTOL API cannot create projects; only the web UI can (My Trees > New Project). Prompt the user to create it if needed.

After rendering the upload script, always read the BUILD_INFO.txt and report the iTOL URLs back to the user in chat. These clickable links are essential for quick iteration. Format:
**Uncollapsed:** http://itol.embl.de/external.cgi?tree=TREE_ID&restore_saved=1
**Collapsed:** http://itol.embl.de/external.cgi?tree=TREE_ID&restore_saved=1
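A minimal sketch of building that report from tree IDs follows. It assumes the IDs have already been read from BUILD_INFO.txt by some other step, since the layout of that file is not specified here. The function name and the placeholder IDs are hypothetical.

```python
ITOL_URL = "http://itol.embl.de/external.cgi?tree={tree_id}&restore_saved=1"


def format_itol_links(tree_ids):
    """Format one bold-labelled link per uploaded tree version.

    tree_ids maps a version name (e.g. "Uncollapsed") to its iTOL tree ID.
    """
    return "\n".join(
        f"**{name}:** {ITOL_URL.format(tree_id=tree_id)}"
        for name, tree_id in tree_ids.items()
    )


if __name__ == "__main__":
    # Placeholder IDs; real ones come from the upload step.
    print(format_itol_links({"Uncollapsed": "TREE_ID_A", "Collapsed": "TREE_ID_B"}))
```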
Tip label formats vary substantially depending on data source. Do not assume a fixed format. Inspect the actual tip labels first, then write parsing functions tailored to what's present.
| Source | Example | Species part | ID part |
|--------|---------|-------------|---------|
| UniProt (sp) | sp\|O95631\|NET1_HUMAN | Suffix: HUMAN | Gene: NET1 |
| UniProt (tr) | tr\|Q23158\|Q23158_CAEEL | Suffix: CAEEL | Accession: Q23158 |
| Species\|taxid.acc | Mus_musculus\|10090.Q9R1A3 | Before \| | After taxid. |
| Species\|acc | Nematostella\|XP_032238380.2 | Before \| | After \| |
| BLAST-annotated | Hydra\|8692.t25743aep_EHBP1_HUMAN_... | Before \| | Transcript ID only |
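As an illustration of a parser tailored to the formats in the table, the sketch below handles the two UniProt layouts and the two Species|... layouts. The function name is hypothetical. BLAST-annotated labels pass through the Species|taxid.acc branch and would still need an extra trimming step to isolate the transcript ID, which depends on the annotation layout and is not attempted here.

```python
import re


def parse_tip_label(label):
    """Return (species_part, id_part) for the tip label formats handled here."""
    parts = label.split("|")

    # UniProt: sp|ACCESSION|GENE_SPECIES or tr|ACCESSION|ACCESSION_SPECIES
    if len(parts) == 3 and parts[0] in ("sp", "tr"):
        entry, _, species = parts[2].rpartition("_")
        # Swiss-Prot entry names start with a gene-like mnemonic;
        # TrEMBL entry names repeat the accession, so use the accession field.
        id_part = entry if parts[0] == "sp" else parts[1]
        return species, id_part

    # Species|taxid.acc or Species|acc
    if len(parts) == 2:
        species, rest = parts
        # Drop a leading numeric taxid and its dot, if present.
        return species, re.sub(r"^\d+\.", "", rest)

    raise ValueError(f"Unrecognized tip label format: {label}")


if __name__ == "__main__":
    for tip in ("sp|O95631|NET1_HUMAN", "tr|Q23158|Q23158_CAEEL",
                "Mus_musculus|10090.Q9R1A3", "Nematostella|XP_032238380.2"):
        print(tip, "->", parse_tip_label(tip))
```

Raising on unrecognized labels makes a new data source fail visibly so the parser can be extended.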
Display label format: G._species_GENE_OR_ID (e.g., H._sapiens_SPTB1, E._muelleri_Em0014g869a)

| Taxonomic Group | Hex |
|-----------------|-----|
| Demosponges | #2ca02c |
| Calcarea + Homoscleromorpha | #98df8a |
| Ctenophora | #9467bd |
| Cnidaria + Placozoa | #ff7f0e |
| Deuterostomia | #d62728 |
| Protostomia | #1f77b4 |
| Non-metazoan eukaryotes | #555555 |
| Mixed (internal nodes) | #999999 |
Species that are commonly misclassified:
| Species | Correct group | Notes |
|---------|---------------|-------|
| Thelohanellus_kitauei | Cnidaria + Placozoa | Myxozoan = cnidarian |
| Meara_stichopi, Waminoa | Deuterostomia | Xenacoelomorpha |
| Spadella_cephaloptera | Protostomia | Chaetognath |
| Monosiga, Salpingoeca | Non-metazoan | Choanoflagellates |
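One way to keep the palette and the misclassification fixes in a single place is a lookup with overrides, sketched below. The hex values and group assignments are copied from the two tables. The function name, the `default_group` argument (the group a project's own taxonomy mapping would supply), and the prefix matching for genus-level entries are assumptions.

```python
GROUP_COLORS = {
    "Demosponges": "#2ca02c",
    "Calcarea + Homoscleromorpha": "#98df8a",
    "Ctenophora": "#9467bd",
    "Cnidaria + Placozoa": "#ff7f0e",
    "Deuterostomia": "#d62728",
    "Protostomia": "#1f77b4",
    "Non-metazoan eukaryotes": "#555555",
    "Mixed (internal nodes)": "#999999",
}

# Commonly misclassified species, keyed by species or genus name.
GROUP_OVERRIDES = {
    "Thelohanellus_kitauei": "Cnidaria + Placozoa",
    "Meara_stichopi": "Deuterostomia",
    "Waminoa": "Deuterostomia",
    "Spadella_cephaloptera": "Protostomia",
    "Monosiga": "Non-metazoan eukaryotes",
    "Salpingoeca": "Non-metazoan eukaryotes",
}


def color_for(species, default_group):
    """Return the hex color for a species, applying overrides first."""
    for key, group in GROUP_OVERRIDES.items():
        # Genus-level keys (e.g. "Waminoa") also match "Waminoa_<species>".
        if species == key or species.startswith(key + "_"):
            return GROUP_COLORS[group]
    return GROUP_COLORS[default_group]


if __name__ == "__main__":
    print(color_for("Waminoa_sp", "Protostomia"))
    print(color_for("Homo_sapiens", "Deuterostomia"))
```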
These are hard-won lessons — do not skip:
Pre-compute label positions BEFORE collapse() — collapse modifies p$data
coordinates. Extract x/y from p$data first. This applies to BOTH rectangular
and circular layouts.
Match on node column, not row index — p$data rows may not be ordered by
node ID. Always use match(tip_node_ids, pre_data$node).
Collapse label x-position: use max(pre_data$x[tip_ids]), NOT node x —
The internal node sits at the base of the collapsed triangle, but the label
should appear at the triangle tip (where descendant tips extend to). Using the
node x places labels at the triangle base, which looks wrong.
Never cap branch lengths — Branch lengths represent real evolutionary distances. Capping or truncating them is data manipulation. If long branches compress internal structure, offer a cladogram as the honest alternative.
Circular labels: use geom_text() with manual angles, NOT geom_tiplab2() —
compute angles as y / max_y * 360, flip text on left half (angles > 90 & < 270),
and pass angle/hjust outside aes().
show.legend = FALSE on geom_text — prevents "a" character artifacts
appearing in the color legend.
branch.length = "none" for cladogram — cannot pass NULL. Must use
if/else to conditionally include this argument.
coord_cartesian(clip = "off") — required for rectangular labels that extend
beyond the plot area. Pair with wide right margin. Not needed for circular.
Daylight layout — produces unusable output for large trees (branches crossing, triangles overlapping). Avoid it.
Page sizing formula — Use INCHES_PER_TIP = 0.12 with
PAGE_HEIGHT = max(8, n_visible * INCHES_PER_TIP) where n_visible counts
non-collapsed tips plus collapsed triangles. This formula keeps labels readable
without excess whitespace. Hardcoded page sizes invariably need adjustment.
Never collapse by gene family — Unless eggNOG orthogroup data is available to intelligently define ortholog groups, only collapse by taxonomic group. Gene families within a tree are the object of study, not noise to be hidden.
Accession filtering for collapse labels — When building collapse triangle
labels from tip names, use an is_gene_symbol() helper that excludes UniProt
accession patterns (A0A..., P12345, Q-prefixed, etc.). Only sp| Swiss-Prot
entries produce real gene symbols; tr| TrEMBL entries produce accessions that
are not informative as labels. Filter these out so collapsed triangles show
gene names, not accession numbers.
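Two of the rules above, page sizing and accession filtering, are easy to prototype outside R. The Python sketch below is an illustration, not the template's implementation. The accession regex follows UniProt's published accession pattern, `max_genes = 3` is an assumed cutoff, and `collapse_label` is a hypothetical helper that produces the "Group (N tips: GENE1, GENE2, ...)" format.

```python
import re

INCHES_PER_TIP = 0.12

# UniProt accession pattern (e.g. P12345, Q23158, A0A024R161).
UNIPROT_ACCESSION = re.compile(
    r"^([OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)


def page_height(n_visible):
    """Page height in inches; n_visible = non-collapsed tips + collapsed triangles."""
    return max(8, n_visible * INCHES_PER_TIP)


def is_gene_symbol(name):
    """True when the name does not look like a UniProt accession."""
    return bool(name) and not UNIPROT_ACCESSION.match(name)


def collapse_label(group, n_tips, tip_ids, max_genes=3):
    """Build a collapsed-triangle label that lists gene symbols only."""
    genes = []
    for name in tip_ids:
        if is_gene_symbol(name) and name not in genes:
            genes.append(name)
    shown = ", ".join(genes[:max_genes])
    if shown:
        return f"{group} ({n_tips} tips: {shown})"
    return f"{group} ({n_tips} tips)"


if __name__ == "__main__":
    ids = ["LAMA1", "Q23158", "LAMA2", "A0A024R161", "LAMB1", "LAMC1"]
    print(collapse_label("Bilateria", 36, ids))
    print(page_height(150))
```

The regex is a heuristic. A gene symbol that happens to match the accession pattern would be filtered out, so the resulting labels are worth a quick review.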
Hard-won lessons from iTOL annotation file development:
Tip labels must NOT contain | characters — iTOL uses | as the MRCA
separator in TREE_COLORS clade entries (tipA|tipB clade ...), COLLAPSE entries,
and LABELS internal node entries. If tip labels contain |, all clade/collapse
specifications silently break (wrong MRCA selected, or entries ignored entirely).
Solution: relabel tips to short display names before writing the Newick tree.
ape::write.tree() converts spaces to underscores — display labels must use
underscores from the start (H._sapiens_SPTN2 not H. sapiens SPTN2), or
annotation file IDs will not match the tree.
itol.toolkit R package is incompatible with | in tip labels — the toolkit
also uses | internally and cannot escape it. Write annotation files manually
(plain text with TAB separator) instead of using itol.toolkit.
MRCA pair selection: to specify an internal node, provide one tip from each
child subtree (tipA|tipB). Using tips[1] and tips[N] (first/last by array
index) can give two tips from the same child, which specifies a different MRCA.
Label alignment is NOT controllable via batch export API — the "Align tip
labels" toggle is UI-only. The label_display export parameter controls
visibility (0=hide, 1=show) but not alignment. Users must toggle alignment
manually in the iTOL web interface.
Collapsed triangle labels — use LABELS annotation type with MRCA specification
(tipA|tipB\tLabel text). These render as the displayed name on collapsed
triangles.
Two uploads for collapsed vs uncollapsed: annotation files are baked into the tree at upload time. To have both an uncollapsed and a collapsed version, upload twice: once without the collapse files, once with all files.
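The MRCA pair rule and the manual annotation files can be sketched together. The example below represents a clade as nested tuples with tip labels as strings, a toy structure chosen to keep the example self-contained rather than the format used by the templates. The header lines (COLLAPSE, LABELS, SEPARATOR TAB, DATA) follow iTOL's annotation templates as I understand them and should be checked against the current iTOL help pages. All function names are hypothetical.

```python
def leaves(node):
    """All tip labels under a node (a tip is a string, a clade is a tuple)."""
    if isinstance(node, str):
        return [node]
    return [tip for child in node for tip in leaves(child)]


def mrca_pair(clade):
    """Pick one tip from the first child and one from the last child.

    Tips from two different child subtrees always have this clade as MRCA.
    """
    tip_a = leaves(clade[0])[0]
    tip_b = leaves(clade[-1])[0]
    for tip in (tip_a, tip_b):
        if "|" in tip:
            raise ValueError(f"Tip label contains '|': {tip}")
    return f"{tip_a}|{tip_b}"


def collapse_annotations(clades):
    """Return (collapse_text, labels_text) for a list of (clade, label) pairs."""
    collapse = ["COLLAPSE", "DATA"]
    labels = ["LABELS", "SEPARATOR TAB", "DATA"]
    for clade, label in clades:
        pair = mrca_pair(clade)
        collapse.append(pair)
        labels.append(f"{pair}\t{label}")
    return "\n".join(collapse) + "\n", "\n".join(labels) + "\n"


if __name__ == "__main__":
    clade = (("H._sapiens_LAMA1", "M._musculus_Lama1"), "D._rerio_lama1")
    collapse_text, labels_text = collapse_annotations(
        [(clade, "Bilateria (3 tips: LAMA1)")]
    )
    print(collapse_text)
    print(labels_text)
```

Taking the two tips from different children is what keeps the pair pointing at the intended node. Taking the first and last tips of a flat list gives no such guarantee.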