skills/quarto-docs/SKILL.md
Quarto document conventions for data science analysis scripts (.qmd). Use when creating or rendering .qmd analysis scripts in data science projects with numbered scripts, status fields, git hash capture, and BUILD_INFO.txt. Do NOT load for Quarto books, websites, or documentation projects — those use standard Quarto conventions without numbered script prefixes or BUILD_INFO.txt.
.qmd is the default analysis script format on local (macOS) data science projects, for both R and Python. On the cluster (Bouchet), .py is the default instead; see the script-organization skill's "Script Format by Environment" section for the full rule and the local override marker. Regardless of environment, use .py (or .R) files for standalone utilities, CLI tools, and library code (in python/ or R/), not for numbered analysis scripts.
Always use quarto render for .qmd files; never use rmarkdown::render().
```bash
# CORRECT: Use quarto CLI
quarto render path/to/script.qmd

# WRONG: Do NOT use rmarkdown
Rscript -e "rmarkdown::render('script.qmd')"  # Will fail with pandoc error
```
If quarto is not in PATH, try /usr/local/bin/quarto or check your conda environment.
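The lookup order described above can be sketched in Python. This is a minimal illustration, not part of the skill: `find_executable` is a hypothetical helper name, and the only fallback it tries is the `/usr/local/bin/quarto` path mentioned in the text.

```python
import os
import shutil


def find_executable(name, fallbacks=()):
    """Return the path to `name` from PATH, else the first fallback that is an executable file."""
    found = shutil.which(name)
    if found:
        return found
    for candidate in fallbacks:
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


# Hypothetical usage: PATH first, then the fallback location named in the text.
quarto = find_executable("quarto", fallbacks=["/usr/local/bin/quarto"])
print(quarto or "quarto not found; check PATH or the conda environment")
```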
CRITICAL: For Python .qmd files, the project's conda environment must be active before rendering. Otherwise Quarto will use the wrong Python or fail to find packages.
```bash
# R QMD: no activation needed (renv handles it)
quarto render scripts/01_analysis.qmd --output-dir outs/01_analysis/

# Python QMD: MUST activate conda first
source ~/miniconda3/etc/profile.d/conda.sh && conda activate PROJECT_ENV && \
  quarto render scripts/02_plots.qmd --output-dir outs/02_plots/
```
```bash
# Render to default format, output to outs/
quarto render scripts/XX_name.qmd --output-dir outs/XX_name/

# Render to specific format
quarto render scripts/XX_name.qmd --to html --output-dir outs/XX_name/
quarto render scripts/XX_name.qmd --to pdf --output-dir outs/XX_name/

# Render with execution
quarto render scripts/XX_name.qmd --execute --output-dir outs/XX_name/
```
CRITICAL: Always use --output-dir to render HTML directly into the script's outs/ folder. Never leave rendered HTML next to the .qmd source file.
```bash
# CORRECT: Render directly to outs/ folder
quarto render scripts/01_analysis.qmd --output-dir outs/01_analysis/

# CORRECT: With R 4.5 override
QUARTO_R=/Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/bin/R \
  quarto render scripts/01_analysis.qmd --output-dir outs/01_analysis/

# WRONG: Do NOT render in place (pollutes scripts/ with HTML)
quarto render scripts/01_analysis.qmd
```
The --output-dir path is relative to the project root (where you run the command from).
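Because the output folder always mirrors the script name, the command can be derived from the script path, so the flag is harder to forget. The sketch below assumes the `scripts/` and `outs/` layout used in the examples above; `render_command` is a hypothetical helper, not something the skill provides.

```python
from pathlib import Path


def render_command(script, outs_root="outs", to=None):
    """Build the quarto render argv for a numbered script, with --output-dir set."""
    script = Path(script)
    cmd = [
        "quarto", "render", script.as_posix(),
        "--output-dir", f"{outs_root}/{script.stem}/",
    ]
    if to is not None:
        cmd += ["--to", to]  # e.g. "html" or "pdf"
    return cmd


print(" ".join(render_command("scripts/01_analysis.qmd")))
```

The returned list could be passed to `subprocess.run(cmd, check=True)` from the project root.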
| Format | Best for | Notes |
|--------|----------|-------|
| HTML | GitHub, web sharing | Reliable text wrapping, self-contained |
| PDF | Print, formal docs | Requires LaTeX workarounds for line wrapping |
Recommendation: Use HTML for GitHub/web. Use PDF only when print is required.
When you just need outputs, not the rendered document:
R:
- Rscript directly
- quarto render script.qmd --execute

Python:
- python directly (with conda env active)
- quarto render script.qmd --execute (with conda env active)

Templates include a status lifecycle field, git hash capture, and BUILD_INFO.txt provenance (see the script-organization skill for the conventions behind these).
The YAML header is the same for R and Python, except that the Python version adds jupyter: python3 and drops message: false:
R:
```yaml
---
title: "Script Title"
subtitle: "Brief description"
author: "Your Name"
date: today
status: development  # development | finalized | deprecated
format:
  html:
    toc: true
    toc-depth: 2
    number-sections: true
    code-overflow: wrap
    code-fold: false
    code-tools: true
    highlight-style: github
    theme: cosmo
    fontsize: 1rem
    linestretch: 1.5
    self-contained: true
execute:
  echo: true
  message: false
  warning: false
  cache: false
---
```
Python — same, but add jupyter: python3 and drop message: false (not applicable):
```yaml
---
title: "Script Title"
subtitle: "Brief description"
author: "Your Name"
date: today
status: development  # development | finalized | deprecated
jupyter: python3
format:
  html:
    toc: true
    toc-depth: 2
    number-sections: true
    code-overflow: wrap
    code-fold: false
    code-tools: true
    highlight-style: github
    theme: cosmo
    fontsize: 1rem
    linestretch: 1.5
    self-contained: true
execute:
  echo: true
  warning: false
  cache: false
---
```
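The status field is an ordinary top-level YAML key, so tooling can read it back from a script. The following is a rough sketch that avoids a YAML dependency by scanning the header line by line; `read_status` is a hypothetical helper, and it assumes `status:` sits unindented at the top level, as in the templates above.

```python
def read_status(qmd_text):
    """Return the `status` value from a .qmd YAML header, or None if absent."""
    lines = qmd_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None  # no YAML header at the top of the file
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the header
        if line.startswith("status:"):
            value = line.split(":", 1)[1]
            return value.split("#", 1)[0].strip()  # drop the trailing comment
    return None


example = """---
title: "Script Title"
status: development  # development | finalized | deprecated
---
"""
print(read_status(example))
```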
When Claude generates a QMD script, include an attribution callout immediately after the YAML header (before any content). This documents that the code was AI-generated and records the model version for reproducibility.
::: {.callout-note title="Code generation"}
This script was generated by **Claude (Opus 4.6)** and reviewed by [author name].
:::
Rules:
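- Use the name and version of the model that actually generated the script (e.g. Claude (Sonnet 4.6), Claude (Haiku 4.5))
- Replace [author name] with the author from the YAML header

Setup chunk: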
```{r setup}
suppressPackageStartupMessages({
  library(tidyverse)
  library(here)
  # ... other packages
})

source(here("R/helpers.R"))  # if needed

# ---- Options ----
options(stringsAsFactors = FALSE)
set.seed(42)

git_hash <- system("git rev-parse --short HEAD", intern = TRUE)
cat("Rendered from commit:", git_hash, "\n")

# ---- Archive previous outputs ----
out_dir <- here("outs/XX_script_name")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
existing_files <- list.files(out_dir, full.names = TRUE)
existing_files <- existing_files[!file.info(existing_files)$isdir]
if (length(existing_files) > 0) {
  build_info <- file.path(out_dir, "BUILD_INFO.txt")
  if (file.exists(build_info)) {
    orig_time <- file.info(build_info)$mtime
  } else {
    orig_time <- max(file.info(existing_files)$mtime)
  }
  archive_dir <- file.path(out_dir, "_archive", format(orig_time, "%Y-%m-%d_%H%M%S"))
  dir.create(archive_dir, recursive = TRUE, showWarnings = FALSE)
  file.rename(existing_files, file.path(archive_dir, basename(existing_files)))
  message("Archived ", length(existing_files), " previous outputs -> ", basename(archive_dir))
}
```
Input section (immediately after setup):
```{r inputs}
# --- Inputs (from other scripts) ---
mdata <- readRDS(here("outs/01_analysis/mdata.rds"))
# --- Inputs (external data) ---
gene_names <- read_tsv(here("data/gene_naming/names.tsv"))
```
Final chunk (after all outputs are written):
```{r build-info}
out_dir <- here("outs/XX_script_name")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
writeLines(
  c(
    paste("script:", "XX_script_name.qmd"),
    paste("commit:", git_hash),
    paste("date:", format(Sys.time(), "%Y-%m-%d %H:%M:%S"))
  ),
  file.path(out_dir, "BUILD_INFO.txt")
)
sessionInfo()
```
Setup chunk:
```{python}
#| label: setup
import subprocess
import sys
import random
from pathlib import Path
from datetime import datetime
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
PROJECT_ROOT = Path(subprocess.check_output(["git", "rev-parse", "--show-toplevel"]).decode().strip())
sys.path.insert(0, str(PROJECT_ROOT / "python"))
# from helpers import ... # if needed
# ---- Options ----
random.seed(42)
np.random.seed(42)
pd.set_option("display.max_columns", None)
sns.set_theme(style="whitegrid")
# ---- Paths ----
out_dir = PROJECT_ROOT / "outs/XX_script_name"
out_dir.mkdir(parents=True, exist_ok=True)
git_hash = subprocess.check_output(["git", "rev-parse", "--short", "HEAD"]).decode().strip()
print(f"Rendered from commit: {git_hash}")
# ---- Archive previous outputs ----
import shutil
existing_files = [f for f in out_dir.iterdir() if f.is_file()]
if existing_files:
    build_info = out_dir / "BUILD_INFO.txt"
    if build_info.exists():
        orig_time = datetime.fromtimestamp(build_info.stat().st_mtime)
    else:
        orig_time = datetime.fromtimestamp(max(f.stat().st_mtime for f in existing_files))
    archive_dir = out_dir / "_archive" / orig_time.strftime("%Y-%m-%d_%H%M%S")
    archive_dir.mkdir(parents=True, exist_ok=True)
    for f in existing_files:
        shutil.move(str(f), str(archive_dir / f.name))
    print(f"Archived {len(existing_files)} previous outputs → {archive_dir.name}")
```
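If several scripts share the archive step, the same logic could live in a helper module instead of being copied into each setup chunk. The sketch below restates the archive block as a function; the name `archive_previous_outputs` and its placement in a shared helper are assumptions, not part of the template.

```python
import shutil
from datetime import datetime
from pathlib import Path


def archive_previous_outputs(out_dir):
    """Move top-level files in out_dir into _archive/<timestamp>/; return that folder or None."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    existing = [f for f in out_dir.iterdir() if f.is_file()]
    if not existing:
        return None  # nothing to archive on a first render
    build_info = out_dir / "BUILD_INFO.txt"
    if build_info.exists():
        # Prefer the previous build's timestamp, as in the template
        stamp = build_info.stat().st_mtime
    else:
        stamp = max(f.stat().st_mtime for f in existing)
    folder = datetime.fromtimestamp(stamp).strftime("%Y-%m-%d_%H%M%S")
    archive_dir = out_dir / "_archive" / folder
    archive_dir.mkdir(parents=True, exist_ok=True)
    for f in existing:
        shutil.move(str(f), str(archive_dir / f.name))
    return archive_dir
```

Subdirectories, including `_archive/` itself, are left in place because only regular files are moved.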
Input section (immediately after setup):
```{python}
#| label: inputs
# --- Inputs (from other scripts) ---
modules = pd.read_csv(PROJECT_ROOT / "outs/02_module_lists/modules.tsv", sep="\t")
# --- Inputs (external data) ---
gene_names = pd.read_csv(PROJECT_ROOT / "data/gene_naming/names.tsv", sep="\t")
```
Saving figures:
```{python}
#| label: fig-example
#| fig-cap: "Description of figure"
fig, ax = plt.subplots(figsize=(6, 4))
# ... plotting code ...
plt.tight_layout()
# Save to outs/ AND display inline
fig.savefig(out_dir / "figure_name.pdf", dpi=300, bbox_inches="tight")
fig.savefig(out_dir / "figure_name.png", dpi=300, bbox_inches="tight")
plt.show()
```
For seaborn:
```{python}
g = sns.catplot(data=df, x="condition", y="value", kind="box")
g.savefig(out_dir / "boxplot.pdf", dpi=300, bbox_inches="tight")
plt.show()
```
Final chunk:
```{python}
#| label: build-info
(out_dir / "BUILD_INFO.txt").write_text(
    f"script: XX_script_name.qmd\n"
    f"commit: {git_hash}\n"
    f"date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n"
)

import session_info
session_info.show()
```
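Since BUILD_INFO.txt is written as plain `key: value` lines, a downstream script can read it back to check which commit produced its inputs. This is a minimal sketch; `parse_build_info` is a hypothetical helper, and the commit hash and date in the sample are made-up placeholder values.

```python
def parse_build_info(text):
    """Parse 'key: value' lines, as written by the build-info chunk, into a dict."""
    info = {}
    for line in text.splitlines():
        if ": " not in line:
            continue  # skip blank or malformed lines
        key, value = line.split(": ", 1)  # split once; the date value contains colons
        info[key.strip()] = value.strip()
    return info


# Placeholder content in the same shape the template writes
sample = "script: XX_script_name.qmd\ncommit: abc1234\ndate: 2025-01-01 12:00:00\n"
print(parse_build_info(sample)["commit"])
```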
| Convention | R | Python |
|------------|---|--------|
| Project root | here::here() | PROJECT_ROOT (from git) |
| Read CSV | read_csv(here("data/file.csv")) | pd.read_csv(PROJECT_ROOT / "data/file.csv") |
| Read TSV | read_tsv(here("data/file.tsv")) | pd.read_csv(PROJECT_ROOT / "data/file.tsv", sep="\t") |
| Read Parquet | arrow::read_parquet(here(...)) | pd.read_parquet(PROJECT_ROOT / ...) |
| Read RDS | readRDS(here(...)) | N/A (use Parquet for cross-language) |
| Save figure | ggsave(file.path(out_dir, "fig.pdf")) | fig.savefig(out_dir / "fig.pdf") |
| Random seed | set.seed(42) | random.seed(42) + np.random.seed(42) |
| Session info | sessionInfo() | session_info.show() |
| Suppress startup | suppressPackageStartupMessages() | N/A (Python imports are quiet) |
| Chunk label | {r label-name} or #\| label: | #\| label: only |
| Helper loading | source(here("R/helpers.R")) | sys.path.insert(0, str(PROJECT_ROOT / "python")) |
Do not mix R and Python chunks in a single .qmd. Each script uses one language. Scripts communicate through files in outs/, not shared memory. Use interchange formats (TSV, CSV, Parquet) for data that crosses the language boundary.
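As an illustration of the file-based handoff, the sketch below writes and re-reads a TSV using only the standard library, so it runs without pandas; in a project script, `DataFrame.to_csv(..., sep="\t")` would be the more natural call. The helper names and the gene/module rows are made up for the example.

```python
import csv
import tempfile
from pathlib import Path


def write_tsv(rows, path):
    """Write a list of dicts as a tab-separated file with a header row."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0]), delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)


def read_tsv(path):
    """Read a tab-separated file back into a list of dicts (all values are strings)."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


rows = [{"gene": "geneA", "module": "1"}, {"gene": "geneB", "module": "2"}]  # made-up data
path = Path(tempfile.mkdtemp()) / "modules.tsv"  # stands in for a file under outs/
write_tsv(rows, path)
print(read_tsv(path) == rows)
```

On the R side, the same file would be read with `read_tsv(here(...))` as shown in the table above.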
Shell magic (!command) in cells
Cause: Quarto renders Python cells with a standard Python kernel, not IPython. Shell magic like !git rev-parse HEAD raises SyntaxError.
Fix: Always use subprocess for shell commands in Python .qmd cells:
```python
# CORRECT: works in Quarto
import subprocess
git_hash = subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()

# WRONG: IPython magic, fails in Quarto
git_hash = !git rev-parse HEAD
```
__file__ is not defined
Cause: Quarto runs Python QMDs via Jupyter, where __file__ doesn't exist.
Fix: Use git rev-parse --show-toplevel for PROJECT_ROOT (already in the template above). Never use Path(__file__) in QMD files.
```python
import subprocess
from pathlib import Path

# CORRECT: works in Jupyter and standalone
PROJECT_ROOT = Path(subprocess.check_output(
    ["git", "rev-parse", "--show-toplevel"], text=True
).strip())

# WRONG: fails in Jupyter/Quarto
PROJECT_ROOT = Path(__file__).resolve().parents[1]
```
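Where the git executable itself may be unavailable, one possible fallback is to walk upward from the working directory until a `.git` entry is found. This is a different technique from the template's `git rev-parse` call, offered only as a sketch; `find_root_by_marker` is a hypothetical helper, and it assumes the working directory is somewhere inside the repository.

```python
from pathlib import Path


def find_root_by_marker(start, marker=".git"):
    """Walk upward from `start` until a directory containing `marker` is found."""
    start = Path(start).resolve()
    for candidate in [start, *start.parents]:
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"no {marker} found at or above {start}")
```

Usage would look like `PROJECT_ROOT = find_root_by_marker(Path.cwd())`.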
Missing nbformat, nbclient, or ipykernel
Cause: Quarto needs these packages in the active conda env to execute Python QMDs.
Fix: Install all three in the project's conda env:
```bash
pip install nbformat nbclient ipykernel
```
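Before rendering, it can help to confirm from inside the active environment that the three packages are importable. The check below is a small sketch using `importlib.util.find_spec`; `missing_packages` is a hypothetical helper name.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of `names` that cannot be imported by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(["nbformat", "nbclient", "ipykernel"])
if missing:
    print("Missing in this environment:", ", ".join(missing))
else:
    print("All Jupyter execution packages are importable")
```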
HTML rendered in scripts/ instead of outs/
Cause: Forgot the --output-dir flag.
Fix: Always specify the output directory:
```bash
quarto render scripts/XX_name.qmd --output-dir outs/XX_name/
```
Wrong Python or packages not found
Cause: Conda env not activated before quarto render, or wrong kernel registered.
Fix: Always activate conda first. If needed, register the kernel:
```bash
source ~/miniconda3/etc/profile.d/conda.sh && conda activate PROJECT_ENV
python -m ipykernel install --user --name PROJECT_ENV
quarto render scripts/XX_name.qmd --output-dir outs/XX_name/
```
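To diagnose which interpreter a render actually used, a cell can compare `sys.executable` with the `CONDA_PREFIX` variable that `conda activate` sets. This is a rough sketch; `interpreter_in_env` is a hypothetical helper, and it only checks that the interpreter path lies inside the environment directory.

```python
import os
import sys
from pathlib import Path


def interpreter_in_env(executable, conda_prefix):
    """True if the interpreter path lives inside the conda environment directory."""
    if not conda_prefix:
        return False  # no conda environment is active
    exe = Path(executable).resolve()
    prefix = Path(conda_prefix).resolve()
    return prefix in exe.parents


ok = interpreter_in_env(sys.executable, os.environ.get("CONDA_PREFIX"))
print(f"python: {sys.executable}")
print(f"inside active conda env: {ok}")
```

A `False` result inside a rendered document suggests the kernel Quarto picked belongs to a different environment.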
| Topic | File |
|-------|------|
| Publication-quality YAML templates (HTML and PDF) | references/pdf-formatting.md |