claude/skills/quarto/SKILL.md
Render computational documents to markdown (DEFAULT), PDF, HTML, Word, and presentations using Quarto. PREFER markdown output for composability. Use for static reports, multi-format publishing, scientific documents with citations/cross-references, or exporting Jupyter notebooks. Triggers on "render markdown", "render PDF", "publish document", "create presentation", "quarto render", or multi-format publishing needs.
Quarto is an open-source scientific and technical publishing system built on Pandoc. It renders computational documents (with Python, R, Julia code) to publication-quality output in multiple formats.
## epq scaffold (PREFERRED for new projects)

New QMD analysis projects must be created via the epq CLI. Do NOT manually create
pyproject.toml, _quarto.yml, justfile, figures/_style.py, or latex-header.tex:
epq scaffold ~/workspace/projects/my-analysis # generates all boilerplate
cd ~/workspace/projects/my-analysis
just bootstrap # uv sync + ipykernel install
epq audit . # verify clean (0 warnings)
epq scaffold generates: _quarto.yml (with jupyter: set), latex-header.tex (local
copy — no external path dep at render time), justfile (thin import wrapper for canonical
recipes), pyproject.toml (package = false, epq editable install), .gitignore,
figures/fig_example.py (canonical dev loop), {name}_files/figure-pdf/ pre-created.
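
As a quick sanity check of the layout listed above, the following sketch reports which of the scaffolded paths are absent from a project directory. It is only an illustration: the function names `expected_paths` and `missing_paths` are hypothetical, the path list is copied from the prose rather than read from epq, and it is not a substitute for `epq audit`.

```python
from pathlib import Path


def expected_paths(project_root: Path) -> list[Path]:
    """Paths that `epq scaffold` is described as generating (copied from the prose)."""
    name = project_root.name
    return [
        project_root / "_quarto.yml",
        project_root / "latex-header.tex",
        project_root / "justfile",
        project_root / "pyproject.toml",
        project_root / ".gitignore",
        project_root / "figures" / "fig_example.py",
        project_root / f"{name}_files" / "figure-pdf",
    ]


def missing_paths(project_root: Path) -> list[Path]:
    """Return the expected paths that do not exist under project_root."""
    return [p for p in expected_paths(project_root) if not p.exists()]
```

Calling `missing_paths(Path("~/workspace/projects/my-analysis").expanduser())` on a freshly scaffolded project would be expected to return an empty list if the description above is complete.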
To audit or retrofit an existing project:
epq audit <path> # JSON violations with file/line/suggestion
epq fix <path> # unified diffs for auto-fixable issues (LLM reviews and applies)
epq list-rules # enumerate all rule IDs
Shared library — import in every analysis QMD setup cell:
from epq import style, cache, bq, fmt
style.apply_style() # canonical rcParams (150/200 DPI, sans-serif fallbacks)
style.NAVY, style.TEAL, ... # workspace palette — never redefine inline
cache.read_cache("name") # 24h TTL file-based cache → None if stale/missing
bq.run_bq_query(SQL) # subprocess bigquery CLI wrapper, max_results=10000
fmt.millions_formatter() # FuncFormatter for ax.yaxis.set_major_formatter()
Full authoring reference: ~/src/analysis-doc/docs/AGENTS.md
Retrofit guide: ~/src/analysis-doc/docs/RETROFIT.md
For documents with multiple visualizations, extract all matplotlib code into standalone
Python modules in figures/. The QMD becomes a thin shell with stub cells only.
{name}.qmd ← thin shell: prose + data-load cells + stub figure cells only
_quarto.yml ← jupyter: {name} (set by epq scaffold)
latex-header.tex ← local copy (set by epq scaffold)
justfile ← imports ~/src/analysis-doc/tools/justfile
figures/
__init__.py
fig_NAME.py ← one module per figure; render(data) contract
scripts/data/
extract_NAME.py ← standalone BigQuery extractor; writes data/cache/*.json
data/cache/ ← JSON cache files (gitignored)
{name}_files/
figure-pdf/ ← dev loop writes here (pre-created by epq scaffold)
One file per figure (not a dispatcher). render(data: dict) is the only public function.
# figures/fig_revenue.py
from pathlib import Path
from epq import style, fmt # palette and formatters from epq — never local _style.py
LABEL = "fig-revenue" # matches QMD #| label: fig-revenue
FIG_WIDTH = 8.5
FIG_HEIGHT = 4.0
FIG_CAP = "Insight-focused caption — not a data description."
def render(data: dict) -> None:
    """Render figure. Called from QMD stub cell with shared data dict.

    Do NOT call plt.show() or plt.savefig() here; Quarto handles capture.
    Do NOT call plt.close() here; it is handled in __main__ and between QMD cells.
    """
    import matplotlib.pyplot as plt
    style.apply_style()
    fig, ax = plt.subplots(figsize=(FIG_WIDTH, FIG_HEIGHT))
    # ... all matplotlib code, reading from data dict ...
    df = data.get("revenue", _load_sample_data())
    ax.bar(df["month"], df["revenue"], color=style.NAVY)
    ax.yaxis.set_major_formatter(fmt.millions_formatter())
    ax.set_title(FIG_CAP.rstrip("."))
    plt.tight_layout()


def _load_sample_data():
    """Synthetic fallback for dev loop (no BQ needed)."""
    import pandas as pd
    return pd.DataFrame({"month": ["Q1", "Q2", "Q3", "Q4"],
                         "revenue": [1.2e6, 1.4e6, 1.3e6, 1.6e6]})


if __name__ == "__main__":
    # Dev loop: saves to {project}_files/figure-pdf/{LABEL}-output-1.png.
    # Writes to the same path Quarto uses, so visual inspection is against the
    # real render artifact. Run via: just dev-fig revenue
    import matplotlib
    matplotlib.use("Agg")  # select the backend before pyplot is imported
    import matplotlib.pyplot as plt
    _project_root = Path(__file__).parent.parent
    _out_dir = _project_root / f"{_project_root.name}_files" / "figure-pdf"
    _out_dir.mkdir(parents=True, exist_ok=True)
    out = str(_out_dir / f"{LABEL}-output-1.png")
    render({})  # synthetic data fallback
    plt.savefig(out, dpi=150, bbox_inches="tight")  # savefig BEFORE close
    print(f"Saved {out}")
    plt.close("all")
#| label: fig-revenue
#| fig-cap: "Revenue by stream."
#| fig-height: 4.0
#| fig-width: 8.5
#| fig-pos: "H"
#| out-width: 100%
import sys
if "." not in sys.path:  # make the local figures/ package importable
    sys.path.insert(0, ".")
from figures import fig_revenue
fig_revenue.render(data)
Projects get a thin justfile (generated by epq scaffold) that imports the canonical recipe library:
# Project-local overrides go here. Canonical recipes imported below.
import? '~/src/analysis-doc/tools/justfile'
The canonical justfile provides:
# Render figure to {project}_files/figure-pdf/fig-NAME-output-1.png
dev-fig NAME:
    PYTHONPATH=. uv run python figures/fig_{{NAME}}.py
    @echo "→ $(basename $PWD)_files/figure-pdf/fig-{{NAME}}-output-1.png"

# Full render (clears Jupyter cache)
render:
    rm -rf .jupyter_cache/
    quarto render *.qmd
CRITICAL: Any task involving figures — iteration, audit, or review — MUST render and visually inspect the PNG before reporting. Code-only review is incomplete.
Required sequence for every figure change or audit:
just dev-fig NAME → {project}_files/figure-pdf/fig-NAME-output-1.png

Do NOT use just preview-fig / Playwright: it screenshots HTML chrome, not the raw
figure. Read the PNG directly from {project}_files/figure-pdf/.
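
To make the "render, then inspect the PNG" step harder to skip, a small helper can resolve the path that the dev loop writes to and refuse to continue when the file is absent. This is a sketch under the naming convention stated above; `dev_fig_png` and `require_png` are hypothetical names, not part of epq.

```python
from pathlib import Path


def dev_fig_png(project_root: Path, name: str) -> Path:
    """Path that `just dev-fig NAME` is described as writing to."""
    return (
        project_root
        / f"{project_root.name}_files"
        / "figure-pdf"
        / f"fig-{name}-output-1.png"
    )


def require_png(project_root: Path, name: str) -> Path:
    """Return the PNG path, or raise if the figure has not been rendered yet."""
    png = dev_fig_png(project_root, name)
    if not png.is_file():
        raise FileNotFoundError(f"Render first with: just dev-fig {name} ({png})")
    return png
```

A review script could call `require_png(Path.cwd(), "revenue")` before reporting on a figure, so that a missing render fails loudly instead of falling back to a code-only review.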
Visual Readability Checklist (inspect the rendered PNG, not the code):
- suptitle(y=1.02) ALWAYS clips in PDF. Use y=1.0 + fig.subplots_adjust(top=0.82–0.88). Never use y > 1.0.
- set_clip_on(True) makes labels invisible, not small

Text contrast rule (most common source of invisible text):
# Use epq.style helper — covers all palette fills correctly:
from epq import style
tc = style.text_color_for(fill_color) # WHITE for dark fills, NAVY for light
# Manual guard if needed:
DARK_FILLS = (style.NAVY, style.TEAL, style.CORAL, style.PURPLE, style.GOLD)
tc = style.WHITE if fill in DARK_FILLS else style.NAVY
# ^^^^ NAVY, never SLATE
# SLATE on LIGHT_SLATE = 2.6:1 contrast — fails AA, looks muddy at print scale
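
The contrast figures quoted in this section can be reproduced with the WCAG 2.x contrast-ratio formula. The sketch below is a standalone implementation of that formula; it does not use `epq.style`, the function names are hypothetical, and any hex values passed in are the caller's own, since the palette's hex codes are not listed here.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to its linear-light value (WCAG 2.x)."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a '#RRGGBB' colour."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """WCAG contrast ratio between two colours, always >= 1.0."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(fg: str, bg: str, large_text: bool = False) -> bool:
    """AA requires 4.5:1 for normal text and 3:1 for large text."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)
```

Running `passes_aa(text_hex, fill_hex)` for each text/fill pair used in a figure gives a numeric check to accompany the visual inspection of the PNG.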
Verified contrast ratios:
Common visual defects invisible in code:
- subplots_adjust
- plt.savefig() called after plt.close(). Fix: call savefig first:
render({})
plt.savefig(out, dpi=150, bbox_inches="tight")
plt.close("all")
- figures/fig_NAME.py, not the QMD. QMD stubs never change unless label/caption/dimensions change
- render(data: dict) -> None, no dispatcher pattern
- from epq import style, fmt; never a local _style.py copy
- PYTHONPATH=. required when running modules directly: PYTHONPATH=. uv run python figures/fig_NAME.py
- render() must NOT call plt.show(), plt.close(), or plt.savefig(). Quarto handles capture; __main__ handles save
- matplotlib.use("Agg") before any pyplot import in __main__
- data dict is the only input. Never read cache files inside figure modules; provide synthetic fallback in _load_sample_data()
- {project}_files/figure-pdf/{LABEL}-output-1.png (pre-created by epq scaffold)
- ~/workspace/projects/luma-revenue-forecast/ and ~/workspace/projects/revenue-forecast-2026/

Use epq scaffold; it handles all of the following automatically:
epq scaffold ~/workspace/projects/my-project
cd ~/workspace/projects/my-project
just bootstrap # runs: uv sync + ipykernel install + mkdir data/cache
Manual checklist (only if NOT using epq scaffold):
- pyproject.toml with [tool.uv] package = false and run uv sync
- uv run python -m ipykernel install --user --name=<project>
- _quarto.yml: set jupyter: <project> (must match registered kernel name exactly)
- latex-header.tex locally from ~/src/analysis-doc/templates/ (do not reference it via external path)
- title: or subtitle: (they produce \maketitle, which double-renders with any custom header)

{\large\textbf{Document Title}}
\hfill
{\small\color{gray} Author \quad\textbullet\quad \today}
\vspace{4pt}
\hrule
\vspace{8pt}

- pagetitle: " " to YAML to suppress the HTML <title> without breaking Pandoc

If a combined figure has two sub-panels telling different insights, split into separate fig-* cells with their own captions. Ask upfront whether panels should be split; this is cheaper than rework after the fact.
- loc='upper right') when data occupies the bottom portion of the chart
- fig.legend(loc='lower center', ncol=N, bbox_to_anchor=(0.5, -0.02)) + fig.subplots_adjust(bottom=0.20)

Never rely on AutoDateLocator; it overcrowds on multi-year spans:
ax.xaxis.set_major_locator(mdates.MonthLocator(bymonth=[1, 7]))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b '%y"))
ax.tick_params(axis='x', labelsize=8)
Justfile render recipes must not include && open <file>. The user opens files manually.
"WARN: maximum number of runs (9) reached" is cosmetically harmless when there are no \ref{} cross-references in prose. It is caused by fancyhdr + many figures creating layout oscillation. Mitigations:
- \needspace not \clearpage (see needspace sizing table in the \needspace section)
- labelformat=empty from \captionsetup; caption numbering churn drives oscillation
- htbp float placement rather than forced H

\needspace sizing reference (X = figure height + roughly 1.2in overhead):

| Figure height | \needspace |
|---|---|
| 3.2in | \needspace{4.5in} |
| 4.0in | \needspace{5.2in} |
| 5.5in | \needspace{6.8in} |
| Prose only | \needspace{2.5in} |

- Int64 for integer columns: always .astype('float64') before fillna()
- df.set_index('quarter').reindex(monthly_idx, method='ffill') + ax.step(..., where='post')
- \n inside BigQuery SQL string literals in Python f-strings: use spaces instead
- Markdown(f"""...""") call: f-strings evaluate at call time, not definition time
- format: gfm with wrap: none for composability, portability, and archival
- wrap: none for GFM output (avoids artificial line breaks)
- auto-dark filter with dual themes for HTML output (accessibility and modern UX)
- df.head(), print(dict))
- --to pdf for Google Drive sharing (read-only, professional)
- Markdown() for text output in code blocks. NEVER use print() or printf() (output must render as formatted markdown)
- title/author/date fields (they produce an academic title block via \maketitle). Use a raw LaTeX minipage inline header instead. Use ## markdown headings for section headings (NOT raw LaTeX \noindent{\large\textbf{...}} blocks, which fight with Quarto's float placement). Exception: only use raw LaTeX headings for the document title line itself.
- #| fig-pos: "H", #| fig-width: N, #| fig-height: N, #| out-width: 100%. Always end chunks with plt.close('all'). Set ax.text(...).set_clip_on(True) on all annotation labels. Never use mixed coordinate transforms. See "PDF Figure Sizing — Critical Patterns" section for full details.

CRITICAL: Matplotlib figure sizing in Quarto PDF output is failure-prone. Follow these rules exactly.

Every chart chunk that renders to PDF must have ALL of these chunk options:
#| label: fig-my-chart
#| fig-pos: "H" # force-here via float.sty — prevents deferral/stacking
#| fig-width: 6.5 # must match figsize width in Python code
#| fig-height: 3.5 # must match figsize height in Python code
#| out-width: 100% # tells LaTeX to scale to full text column width
#| fig-cap: "Caption text with no bare % characters — write 'percent' instead."
And in the Python code:
fig, ax = plt.subplots(figsize=(6.5, 3.5)) # must match chunk fig-width/fig-height
# ... chart code ...
plt.tight_layout()
plt.show()
plt.close('all') # REQUIRED — prevents state leaking between chunks
plt.rcParams.update({
'savefig.bbox': None, # fills declared figsize exactly — do NOT set to 'tight'
'savefig.pad_inches': 0,
'figure.dpi': 200,
'savefig.dpi': 300,
})
savefig.bbox: None is counterintuitive but correct for PDF output. Setting it to 'tight' causes matplotlib to auto-expand the canvas, which fights against the declared figsize and produces malformed figure PDFs.
These are the diagnosed failure modes, in order of frequency:
1. Text labels extending beyond xlim/ylim — MOST COMMON
# ❌ BROKEN: annotation text positioned beyond axis limits
for bar, row in zip(bars, df.itertuples()):
    ax.text(bar.get_width() + 4, ..., f"{row.value}")  # +4 may push past xlim
ax.set_xlim(0, 380)  # text at bar.width+4 can exceed 380 → bbox explosion
# ✅ FIX: clip text labels to axes, OR increase xlim to accommodate labels
for bar, row in zip(bars, df.itertuples()):
    t = ax.text(bar.get_width() + 4, ..., f"{row.value}")
    t.set_clip_on(True)  # ← prevents bbox from expanding to include clipped text
# OR: ax.set_xlim(0, 420)  # ensure xlim accommodates largest label
When ax.text() labels are placed at bar.get_width() + offset in data coordinates and those labels extend beyond xlim, the PDF backend measures the full artist bounding box (including out-of-bounds text) when computing the figure's page size. This causes the figure PDF to be output at half or less of the declared figsize — which LaTeX then renders at postage-stamp size even though out-width: 100% is set.
Rule: After setting xlim, add t.set_clip_on(True) to ALL ax.text() calls, or add enough xlim headroom to fit the longest annotation.
2. Mixed coordinate transforms on annotations
# ❌ BROKEN: mixes data coordinates with axis-fraction transform
ax.annotate("", xy=(x0, y0 + 3), xytext=(x1, y1 + 3), ...) # data coords
ax.text(0.5, max_val + 9, "label", transform=ax.get_xaxis_transform()) # mixed!
# ✅ FIX: use pure axes fraction for floating annotations
ax.annotate("label text",
xy=(0.5, 0.85), xycoords='axes fraction',
ha='center', va='center', fontsize=9, ...)
ax.get_xaxis_transform() mixes x=axis-fraction with y=data coordinates. When the y-value in data coords exceeds ylim, the PDF backend's bounding box measurement goes pathological, producing figures that are 10-30× taller than declared. Always use pure coordinate systems — either all data coords or all xycoords='axes fraction'.
3. Missing plt.close('all') between chunks
Without plt.close('all') after each plt.show(), matplotlib figure state (transforms, layout engines, bounding boxes) leaks between Jupyter/Quarto execution chunks. This can cause later charts to inherit corrupt layout state from earlier ones. Always end every chart chunk with:
plt.tight_layout()
plt.show()
plt.close('all')
4. fig-pos: "!ht" instead of "H"
"!ht" (try-here, then top-of-page) causes LaTeX to defer figures when there's insufficient space, stacking them at awkward positions. "H" (force-here via float.sty) places the figure exactly where declared. Requires \usepackage{float} in the LaTeX header.
5. Missing #| fig-width / #| fig-height on chunk
Without explicit chunk-level sizing, Quarto uses YAML defaults and may not pre-allocate the correct float box size before Python renders into it. Always specify both per-chunk.
To identify which figures are malformed without waiting for a full visual review:
# Add keep-tex to render temporarily
cd /path/to/doc && uv run quarto render doc.qmd --to pdf -M keep-tex:true
# Check actual page dimensions of each generated figure PDF
for f in doc_files/figure-pdf/*.pdf; do
echo -n "$f: "
pdfinfo "$f" 2>/dev/null | grep "Page size"
done
# A correct 6.5×3.5in figure should be ~468×252 pts (at 72 pts/in)
# A figure with width << 440 pts or height >> 400 pts is malformed
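
The arithmetic behind the expected page size is simply inches times 72. The sketch below turns that into a check that can be applied to the numbers `pdfinfo` reports. The function names and the 5 percent tolerance are assumptions made for illustration; the rule of thumb in the comments above (width far below 440 pts, height far above 400 pts) is looser.

```python
PTS_PER_INCH = 72  # PDF points per inch


def expected_points(width_in: float, height_in: float) -> tuple[float, float]:
    """Page size in points for a figure declared in inches."""
    return (width_in * PTS_PER_INCH, height_in * PTS_PER_INCH)


def is_malformed(
    measured_pts: tuple[float, float],
    declared_in: tuple[float, float],
    tolerance: float = 0.05,  # assumed threshold, not taken from the source
) -> bool:
    """True when either dimension deviates from the declared size by more than tolerance."""
    exp_w, exp_h = expected_points(*declared_in)
    w, h = measured_pts
    return abs(w - exp_w) > tolerance * exp_w or abs(h - exp_h) > tolerance * exp_h
```

For a chunk declared with `fig-width: 6.5` and `fig-height: 3.5`, `expected_points(6.5, 3.5)` gives the 468 by 252 point size mentioned above, and `is_malformed((w, h), (6.5, 3.5))` flags any figure PDF whose reported page size has drifted from it.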
If a figure keeps rendering incorrectly despite all fixes, force PNG raster output:
#| label: fig-problematic
#| fig-pos: "H"
#| dev: png # ← bypass the PDF vector pipeline entirely
#| dpi: 150
#| fig-width: 6.5
#| fig-height: 3.5
#| out-width: 100%
PNG output is immune to all the bbox/transform issues because matplotlib renders to a fixed-size raster and Quarto embeds it directly. Use as a last resort since vector PDF is crisper.
% Restore default numbering (Figure 1., Figure 2., etc.) with bold prefix:
\captionsetup{font={small,it},justification=centering,skip=6pt,labelfont=bf}
% Suppress numbering (caption text only, no "Figure N." prefix):
\captionsetup{font={small,it},justification=centering,skip=6pt,labelformat=empty,labelsep=none}
Never use bare % in fig-cap strings — write "percent" or "percentage points" instead. LaTeX may fail to compile depending on pandoc version.
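
A caption can be checked for this before rendering. The following is a minimal sketch using only the standard library; `has_bare_percent` and `spell_out_percent` are hypothetical helper names, and the replacement always writes "percent" (choosing "percentage points" where appropriate is left to the author).

```python
import re

# Matches a percent sign together with any whitespace directly before it.
_BARE_PERCENT = re.compile(r"\s*%")


def has_bare_percent(caption: str) -> bool:
    """True if the caption contains a literal % character."""
    return "%" in caption


def spell_out_percent(caption: str) -> str:
    """Replace each % (and whitespace before it) with ' percent'."""
    return _BARE_PERCENT.sub(" percent", caption)
```

For instance, `spell_out_percent("Margin rose 15% year over year")` yields a caption that is safe to paste into `fig-cap`.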
Reference lines (baselines, averages) should be visible context, not dominant elements. Always set alpha=0.4:
ax.axvline(48.9, color=NAVY, linestyle=":", linewidth=1.6, alpha=0.4, label="Baseline", zorder=3)
ax.axhline(37.8, color=SLATE, linestyle="--", linewidth=1.4, alpha=0.4, label="Avg", zorder=3)
At alpha=1.0 (default), reference lines dominate the chart and compete with the data bars. alpha=0.4 keeps them readable without visual dominance.
CRITICAL: Quarto documents are for COMMUNICATION, not raw data dumps.
Quarto outputs are static documents meant to convey insights to humans. Raw dataframes, print statements, and JSON blobs fail to communicate effectively.
# ❌ BAD: Raw dataframe dump
df.head()
# ❌ BAD: Print statements - output renders as plain text, not formatted markdown
print(f"Total: {total}")
print(data_dict)
# ❌ BAD: printf/print for text - use Markdown() instead
print("## Summary\n- Item 1\n- Item 2") # Renders as plain text!
# ❌ BAD: Bare variable returning data structure
result # Returns raw dict/JSON
# ❌ BAD: DataFrame info without formatting
df.describe()
df.info()
# ✅ GOOD: Use Markdown() for ALL text output
from IPython.display import Markdown
Markdown(f"""
## Summary
- **Total**: {total:,}
- **Average**: {avg:.2f}
""")
❌ BAD: Plain text for mathematical notation
- The growth rate is alpha = 0.15 or 15%
- We calculated the mean mu = sum(xi)/n
- The correlation coefficient r = 0.85
❌ BAD: No table formatting
```python
print(df.head())
```
❌ BAD: Using asterisks for equations
### Good Patterns: Visual Communication
```python
# ✅ GOOD: Chart for trends
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(10, 6))
df.groupby('date')['sales'].sum().plot(ax=ax, kind='line')
ax.set_title('Sales Trend Over Time')
ax.set_ylabel('Sales ($)')
plt.tight_layout()
plt.show()
# ✅ GOOD: Formatted table using Great Tables
from great_tables import GT
(GT(df.head(10))
.tab_header(title="Top 10 Sales Records")
.fmt_currency(columns="sales", currency="USD")
.fmt_date(columns="date", date_style="medium"))
# ✅ GOOD: Formatted table using pandas markdown
from IPython.display import Markdown
Markdown(df.head(10).to_markdown(index=False, tablefmt='pipe'))
# ✅ GOOD: Formatted metrics in markdown with LaTeX
from IPython.display import Markdown
Markdown(f"""
## Key Metrics
- **Total Sales**: ${total_sales:,.2f}
- **Average Order**: ${avg_order:,.2f}
- **Growth Rate**: $\\alpha = {growth_rate:.1%}$ (15% YoY)
- **Top Product**: {top_product}
### Statistical Summary
The linear regression model $y = \\beta_0 + \\beta_1 x + \\epsilon$ yielded:
- Slope: $\\hat{{\\beta_1}} = 3.2$ (SE = 0.4)
- $R^2 = 0.78$, indicating strong fit
""")
# NOTE: Mermaid diagrams must use native Quarto syntax outside Python blocks
# Use ```{mermaid} directly in markdown, NOT inside Markdown() calls
Build understanding progressively through a series of sections:
Each base fact is an independent observation with its own data and evidence. Use descriptive headers (not "Fact 1"):
## Response Time Distribution
\needspace{3in}
Analysis of the past 7 days shows significant tail latency:
- p50: 45ms
- p95: 230ms
- p99: 890ms (concerning)
```{python}
#| echo: false
# Chart showing latency distribution
### Synthesis Sections
Synthesis sections **explicitly reference** which earlier sections they build upon:
```qmd
## Performance Degradation Under Load
\needspace{4in}
Building on the response time distribution and traffic patterns above, we
observe a clear correlation: p99 latency spikes to 2.3s during the 2-4pm
peak traffic window. The system handles baseline load well but degrades
significantly under peak conditions.
Use \needspace{Xin} before sections with charts/diagrams to prevent awkward page breaks and large whitespace gaps:
## Revenue by Carrier
\needspace{4in}
```{python}
# Chart code here
**Guidelines:**
- **Mermaid diagrams**: `\needspace{2in}`
- **Single chart**: `\needspace{3in}`
- **Chart + explanation**: `\needspace{4in}`
**Requires** in YAML frontmatter:
```yaml
format:
  pdf:
    include-in-header:
      text: |
        \usepackage{needspace}
```
IMPORTANT: Markdown requires blank lines between paragraphs for proper rendering.
The Problem:
❌ BAD: This will render as one long line
This text appears on a new line in the source
But it renders on the same line as above
Because there's no blank line between them
The Solution:
✅ GOOD: This renders as separate paragraphs

This text appears on its own line because there's a blank line above it.

Each paragraph needs a blank line before and after it.

Paragraphs (need blank lines):

This is paragraph one.

This is paragraph two.

This is paragraph three.

Lists REQUIRE blank line before the list:
❌ BAD: List doesn't render correctly
Here is some text:
- Item 1
- Item 2

✅ GOOD: Blank line before list
Here is some text:

- Item 1
- Item 2

Numbered lists also require blank line before:

❌ BAD: Numbered list broken
The steps are:
1. First step
2. Second step

✅ GOOD: Blank line before numbered list
The steps are:

1. First step
2. Second step

Lists (no blank lines between items):
- Item 1
- Item 2
- Item 3
Multi-paragraph list items (blank lines within item):

- Item 1 with first paragraph

  Item 1 continued with second paragraph (indented 2 spaces)

- Item 2 starts here

Headers (blank line before and after):

Previous paragraph ends here.

## Section Header

New paragraph starts here.

Code blocks (blank line before and after):

Previous paragraph ends here.

```python
print("code block")
```

New paragraph starts here.

Python code output:
#| echo: false
from IPython.display import Markdown
# ❌ BAD: Single newline won't create paragraph break
Markdown("Line 1\nLine 2") # Renders as: Line 1 Line 2
# ✅ GOOD: Double newline creates paragraph break
Markdown("Line 1\n\nLine 2") # Renders as separate paragraphs
# ✅ GOOD: Use triple-quoted string with blank lines
Markdown("""
Paragraph one.

Paragraph two.

Paragraph three.
""")
F-strings in Markdown:
#| echo: false
from IPython.display import Markdown
# ✅ GOOD: Blank lines between paragraphs
Markdown(f"""
## Analysis Results

The growth rate is {growth_rate:.1%}.

This represents a significant increase over last quarter.

We recommend increasing inventory by {inventory_increase:,} units.
""")
✅ DO:
❌ DON'T:
- \n in strings expecting paragraph breaks (use \n\n)

Quick test:
# Create test document
cat > test.qmd << 'EOF'
---
title: "Markdown Test"
format: gfm
---
Paragraph 1.
Paragraph 2.
## Header
Paragraph 3.
EOF
# Render and check output
quarto render test.qmd --to gfm
cat test.md
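
When text is assembled in Python before being passed to `Markdown()`, the paragraph-break rule from this section can be applied mechanically instead of by hand. The helper below is a small sketch with a hypothetical name, `paragraphs`; it only joins strings and does not import IPython.

```python
def paragraphs(*blocks: str) -> str:
    """Join text blocks with a blank line between them, skipping empty blocks."""
    cleaned = [b.strip() for b in blocks if b and b.strip()]
    return "\n\n".join(cleaned)
```

In a Quarto cell the result would be wrapped as `Markdown(paragraphs(intro, finding, recommendation))`, so each argument renders as its own paragraph regardless of how the individual strings end.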
Use /think command for structured analysis with graduated detail.
Documents use graduated detail so readers can stop at their desired depth:
- Abstract (paragraph)
- Key Findings (3-5 bullets)
  - **[Finding]**: [Result] — [Evidence] (Confidence)
- Investigation (detailed)
  - [source])
- Appendix (optional)

Include diagram, chart, or table for most findings:
/think Why is the login endpoint returning 500 errors intermittently?
Creates analysis document with abstract-first structure, key findings with confidence levels, and visual reasoning chains.
See /think command for full template.
Perfect for:

- /think command)

NOT for:
CRITICAL: Quarto documents must be reproducible with documented dependencies.
When someone runs quarto render analysis.qmd, they should be able to get identical results given:
- .qmd file)

A .qmd file defines a complete data pipeline. The document contains:
Practical limits: Some dependencies are too expensive to rebuild on every render:
The key is documentation: readers must understand what's needed and how to set it up.
❌ BAD: Referencing local files without provenance
df = pd.read_json('/tmp/orders.jsonl', lines=True) # Where did this come from?
df = pd.read_csv('sales.csv') # Who created this? When? How?
# "This JSONL file" without explaining its origin
data = load_data('extracted_metrics.jsonl') # Non-portable!
✅ PREFERRED: Document defines where data comes from
#| cache: true
import subprocess
import io
import pandas as pd
# Extract from BigQuery - cached to avoid re-running on every render
result = subprocess.run([
'bigquery', 'query',
'''SELECT * FROM production.orders
WHERE date >= '2024-01-01'
AND status = 'completed' ''',
'--format', 'jsonl'
], capture_output=True, text=True, check=True)
df = pd.read_json(io.StringIO(result.stdout), lines=True)
✅ ALSO GOOD: Canonical external sources
#| cache: true
import pandas as pd
# Public dataset with stable URL
df = pd.read_csv('https://data.company.com/public/sales-2024.csv')
# Or versioned data in the same repository
df = pd.read_csv('data/sales-2024-v2.csv') # Committed to git with the .qmd
Rule of thumb: Un-cached renders should complete in < 60 seconds.
- cache: true)

Some dependencies are too expensive to recreate on every render. Document them clearly so readers can set up the environment.
✅ GOOD: Reference with setup documentation
#| echo: false
import subprocess
# DEPENDENCY: LanceDB index at ~/.lancedb/documents
# Setup: lancer ingest -t documents ~/corpus/*.md
# This index contains ~50k documents and takes ~10 min to build
result = subprocess.run(
['lancer', 'search', '-t', 'documents', 'shipping rate errors', '--limit', '20'],
capture_output=True, text=True, check=True
)
relevant_docs = result.stdout
✅ GOOD: Prerequisites section in document
---
title: "Knowledge Base Analysis"
---
## Prerequisites
This analysis requires the following setup:
1. **LanceDB index**: `lancer ingest -t documents ~/corpus/*.md`
2. **BigQuery access**: Authenticated via `gcloud auth application-default login`
3. **Data snapshot**: Run `./scripts/extract-data.sh` (takes ~5 min)
## Analysis
...
❌ BAD: Silent dependency on local state
# No documentation about what this index is or how to create it
results = lancer.search("documents", "query") # Will fail for anyone else
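
One way to turn a silent dependency into a loud one is a preflight check at the top of the document. The sketch below uses `shutil.which` from the standard library to report which command-line tools are not on `PATH`; the function name `missing_tools` is hypothetical, and the tool names in the usage note are the ones this document already mentions.

```python
import shutil


def missing_tools(tools: list[str]) -> list[str]:
    """Return the names of the given executables that are not found on PATH."""
    return [t for t in tools if shutil.which(t) is None]
```

A setup cell could run `missing = missing_tools(["lancer", "bigquery", "gcloud"])` and raise a `RuntimeError` naming the missing tools when the list is non-empty, pointing the reader at the Prerequisites section. This only confirms the executables exist; it does not verify that an index has been built or that authentication is valid.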
Use cache: true to avoid re-running expensive operations during iteration.
Requires jupyter-cache (one-time install):
uv pip install jupyter-cache
Per-cell caching:
#| cache: true
#| label: data-extraction
# This cell only re-executes if the code changes
result = subprocess.run(['bigquery', 'query', ...], capture_output=True, text=True)
df = pd.read_json(io.StringIO(result.stdout), lines=True)
Document-wide caching in YAML frontmatter:
---
title: "Analysis Report"
execute:
  cache: true
---
For projects with many documents, use freeze to cache execution results in version control:
# _quarto.yml (project config)
execute:
  freeze: auto # Re-render only when source changes
Key difference:
- cache: true - Caches cell outputs locally (Jupyter Cache)
- freeze: auto - Stores results in _freeze/ directory (can commit to git)

When to use freeze:

- _freeze/)

Use this pattern when cache: true is insufficient, specifically when:
Set execute: cache: false in frontmatter when using this pattern (disable Jupyter cache to avoid double-caching).
Helper functions (add once per document, in a setup cell):
import json
from pathlib import Path
from datetime import datetime, timezone

_CACHE_DIR = Path('data/cache')
_CACHE_DIR.mkdir(parents=True, exist_ok=True)

def _cache_path(name):
    return _CACHE_DIR / f"{name}.json"

def _read_cache(name):
    p = _cache_path(name)
    if p.exists():
        return json.loads(p.read_text())
    return None

def _write_cache(name, records, scalars=None):
    p = _cache_path(name)
    p.write_text(json.dumps({
        '_queried_at': datetime.now(timezone.utc).isoformat(),
        'records': records,
        'scalars': scalars or {}
    }, default=str))
Per-dataset usage template (repeat for each dataset):
_c = _read_cache('my_dataset')
if _c:
    df = pd.DataFrame(_c['records'])
    my_scalar = float(_c['scalars']['my_scalar'])
else:
    df = run_bq_query(my_query)
    my_scalar = float(df['col'].values[0])
    _write_cache('my_dataset', df.to_dict(orient='records'), {
        'my_scalar': my_scalar,
    })
Cache hit → reads DataFrame and scalars from JSON; no BigQuery call. Cache miss → queries BigQuery live, writes cache, continues render. Query failure on miss → render fails loudly (intentional — no silent fallback).
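
Note that the `_read_cache` helper shown above returns any file that exists, however old, whereas the shared `cache.read_cache` is described earlier as having a 24h TTL. If the inline helpers should expire entries too, the `_queried_at` stamp can be used for an age check. The following is a sketch of that idea with a hypothetical function name; it is not the epq implementation.

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path


def read_cache_with_ttl(path: Path, max_age: timedelta = timedelta(hours=24)):
    """Return the cached payload, or None if the file is missing, unstamped, or stale."""
    if not path.exists():
        return None
    payload = json.loads(path.read_text())
    stamp = payload.get("_queried_at")
    if stamp is None:
        return None
    # Accept both "+00:00" offsets and a trailing "Z".
    queried_at = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
    if queried_at.tzinfo is None:
        queried_at = queried_at.replace(tzinfo=timezone.utc)
    if datetime.now(timezone.utc) - queried_at > max_age:
        return None
    return payload
```

Because a stale entry is reported as `None`, it falls through to the same `else` branch as a cache miss in the usage template, so the query runs again and the file is rewritten.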
Never do:
- except Exception: my_scalar = 42 (silent fallback masks broken queries)
- try: without a preceding query call (hidden constant, not a live value)
- Omitting _write_cache() in the else branch (next render re-queries unnecessarily)

Cache file format:
{
"_queried_at": "2026-02-20T16:00:00Z",
"records": [...],
"scalars": {...}
}
Serialization notes:
- df['col'].tolist()), reconstruct with np.array(...)
- default=str in json.dumps to handle non-serializable types

Cache invalidation:
# .gitignore
data/cache/
# Justfile recipe
delete-cache:
rm -rf data/cache/
---
title: "Q4 2024 Sales Analysis"
author: "Josh Lane"
date: "2024-12-31"
format:
  gfm:
    wrap: none
  html:
    theme:
      dark: darkly
      light: flatly
execute:
  cache: true
filters:
  - auto-dark
---
## Data Extraction
```{python}
#| cache: true
#| label: extract-sales
import subprocess
import io
import pandas as pd
# Reproducible: Query is embedded in the document
result = subprocess.run([
'bigquery', 'query',
'''
SELECT date, product, region, sales, units
FROM production.sales
WHERE EXTRACT(QUARTER FROM date) = 4
AND EXTRACT(YEAR FROM date) = 2024
''',
'--format', 'jsonl'
], capture_output=True, text=True, check=True)
df = pd.read_json(io.StringIO(result.stdout), lines=True)
df['date'] = pd.to_datetime(df['date'])
#| echo: false
from IPython.display import Markdown
total_sales = df['sales'].sum()
top_product = df.groupby('product')['sales'].sum().idxmax()
Markdown(f"""
### Key Metrics
- **Total Q4 Sales**: ${total_sales:,.2f}
- **Top Product**: {top_product}
- **Records**: {len(df):,}
""")
### Best Practices Summary
✅ **DO:**
- Embed data extraction commands in the document
- Use `cache: true` for expensive operations
- Reference canonical external URLs when possible
- Commit small data files alongside the `.qmd`
- Use `freeze` for project-level caching
❌ **DON'T:**
- Reference opaque local files (`/tmp/data.jsonl`)
- Say "this file" without showing how it was created
- Assume the reader has access to your local machine
- Leave data provenance undocumented
## Decision Tree: Quarto vs Alternatives
Need user interactivity? (sliders, dropdowns, real-time updates)
├─ YES → Use Shiny or dedicated dashboard tools
└─ NO → Static output needed
    │
    ├─ Complex multi-page documentation site?
    │   └─ YES → Use Quarto website/book projects
    │
    ├─ Single analysis with code + results?
    │   └─ Native Quarto .qmd files (recommended)
    │
    └─ Just formatting existing markdown?
        └─ Use Quarto with plain .md files
## Installation
Quarto is already installed (version 1.8.27).
**Optional dependencies:**
```bash
# TinyTeX for better PDF generation (LaTeX)
quarto install tinytex
# Chromium for PDF generation (alternative to LaTeX)
quarto install chromium
Current setup:
quarto-nvim plugin is installed and configured.
Features:
- .qmd files (Python, bash, lua, html code chunks)
- :QuartoPreview

Keybindings:

- <leader>qp - Preview current document (live reload)
- <leader>qc - Close preview
- <leader>qm - Render to markdown (GFM) - RECOMMENDED default
- <leader>qh - Render to HTML
- <leader>qd - Render to PDF

Treesitter support: Quarto syntax highlighting requires treesitter parsers:
# In Neovim
:TSInstall markdown
:TSInstall markdown_inline
:TSInstall python
Otter.nvim integration:
The plugin uses otter.nvim for embedded language support in code chunks. This means you get full LSP features (completion, diagnostics, hover) for Python code inside .qmd files.
Create a Quarto document:
# Create analysis.qmd
cat > analysis.qmd << 'EOF'
---
title: "Sales Analysis Q4 2024"
author: "Josh Lane"
date: "2024-01-30"
format:
gfm:
wrap: none # No line wrapping
html:
theme:
dark: darkly # Dark mode (recommended)
light: flatly # Light mode fallback
code-fold: true
execute:
cache: true # Cache expensive queries
filters:
- auto-dark # Respect system preference
---
## Data Extraction
```{python}
#| cache: true
#| echo: false
import subprocess
import io
import pandas as pd
# Reproducible: Query embedded in document
result = subprocess.run([
'bigquery', 'query',
'''SELECT date, product, sales
FROM production.sales
WHERE EXTRACT(QUARTER FROM date) = 4
AND EXTRACT(YEAR FROM date) = 2024''',
'--format', 'jsonl'
], capture_output=True, text=True, check=True)
df = pd.read_json(io.StringIO(result.stdout), lines=True)
df['date'] = pd.to_datetime(df['date'])
#| echo: false
import matplotlib.pyplot as plt
from IPython.display import Markdown
total_sales = df['sales'].sum()
avg_daily = df.groupby('date')['sales'].sum().mean()
top_product = df.groupby('product')['sales'].sum().idxmax()
# Display key metrics
Markdown(f"""
### Key Metrics
- **Total Sales**: ${total_sales:,.2f}
- **Average Daily Sales**: ${avg_daily:,.2f}
- **Top Product**: {top_product}
- **Records Analyzed**: {len(df):,}
""")
#| label: fig-sales-trend
#| fig-cap: "Daily sales trending upward in Q4 2024"
#| echo: false
fig, ax = plt.subplots(figsize=(10, 6))
daily_sales = df.groupby('date')['sales'].sum()
daily_sales.plot(ax=ax, kind='line', linewidth=2, color='#2E86AB')
ax.set_title('Sales Trend Over Time', fontsize=14, fontweight='bold')
ax.set_xlabel('Date')
ax.set_ylabel('Sales ($)')
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
#| echo: false
# Format top 10 products as markdown table
top_products = (df.groupby('product')['sales']
.sum()
.sort_values(ascending=False)
.head(10)
.reset_index())
top_products.columns = ['Product', 'Total Sales']
top_products['Total Sales'] = top_products['Total Sales'].apply(lambda x: f"${x:,.2f}")
Markdown(top_products.to_markdown(index=False, tablefmt='pipe'))
#| echo: false
import numpy as np
# Calculate growth metrics
daily_sales = df.groupby('date')['sales'].sum()
growth_rate = (daily_sales.iloc[-1] - daily_sales.iloc[0]) / daily_sales.iloc[0]
avg_growth = daily_sales.pct_change().mean()
Markdown(f"""
The sales data exhibits a compound growth pattern modeled by:
$$
S(t) = S_0 \\times (1 + r)^t
$$
where $S_0$ represents initial sales, $r = {avg_growth:.3f}$ is the average daily growth rate,
and $t$ is time in days.
**Key Statistical Findings:**
- Overall Q4 growth: $\\Delta S = {growth_rate:.1%}$
- Average daily growth: $\\bar{{r}} = {avg_growth:.3%}$
- Standard deviation: $\\sigma = {daily_sales.std():,.2f}$
- Correlation with marketing spend: $\\rho = 0.82$ (strong positive)
These metrics indicate statistically significant growth ($p < 0.01$) with
consistent upward momentum throughout the quarter.
""")
Q4 2024 showed strong performance with total sales of ${total_sales:,.2f}. The upward trend in daily sales indicates positive momentum heading into the next quarter.
EOF
quarto add gadenbuie/quarto-auto-dark --no-prompt
quarto render analysis.qmd --to gfm
quarto render analysis.qmd --to html # Uses auto-dark theme
quarto render analysis.qmd --to pdf # For Google Drive sharing/printing
quarto render analysis.qmd --to html # For web viewing
### File Formats Quarto Can Render
**Input formats:**
- `.qmd` - Quarto markdown (native, recommended)
- `.ipynb` - Jupyter notebooks
- `.md` - Plain markdown (no code execution)
- `.Rmd` - R Markdown files
**Output formats:**
- **Markdown**: `md` (plain), `gfm` (GitHub-flavored) - **PREFERRED default**
- **Documents**: PDF, HTML, Word, ODT, ePub, Typst
- **Presentations**: RevealJS (HTML), PowerPoint, Beamer (PDF)
- **Websites**: Multi-page sites, blogs, books
- **Dashboards**: Interactive dashboards (with Shiny or Observable JS)
## Core Commands
```bash
# Render document to markdown (DEFAULT - composable, text-based)
quarto render document.qmd --to md # Executable markdown with results
quarto render document.qmd --to gfm # GitHub-flavored markdown
# Render to other formats
quarto render document.qmd --to pdf # PDF (for Google Drive sharing)
quarto render document.qmd --to html # HTML (avoid --toc, use clear headings)
# Render Jupyter notebook to markdown
quarto render notebook.ipynb --to md # Markdown with executed results
# Render plain markdown (no code execution)
quarto render README.md --to pdf
# Multiple formats at once
quarto render document.qmd --to md,pdf,html
# Preview with live reload
quarto preview document.qmd
# Create new project
quarto create project website mysite
quarto create project book mybook
# Publish
quarto publish gh-pages # GitHub Pages
quarto publish quarto-pub # Quarto Pub
quarto publish netlify # Netlify
## Analysis Section
```{python}
import pandas as pd
df = pd.read_csv("data.csv")
df.head()
```
```{python}
#| label: fig-sales
#| fig-cap: "Sales over time"
#| echo: false
#| warning: false
plt.figure(figsize=(10, 6))
df.plot(x='date', y='sales')
plt.show()
```
Common options:
- `echo: false` - Hide code, show output only
- `code-fold: true` - Collapsible code blocks
- `warning: false` - Hide warnings
- `message: false` - Hide messages
- `label: fig-name` - Reference label for cross-references
- `fig-cap: "Caption"` - Figure caption

The total is `{python} f"${total:,.2f}"`.
There are `{python} len(df)` rows in the dataset.
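As a quick check of what those inline expressions produce, the format specs can be tried in plain Python first. This is only a minimal sketch: the values of `total` and `rows` are made up for illustration and stand in for whatever the document computed earlier.

```python
# Hypothetical values standing in for results computed earlier in the document
total = 1234567.891
rows = 15230

# ",.2f" adds thousands separators and rounds to two decimals
total_text = f"${total:,.2f}"
# "," alone adds thousands separators to an integer
rows_text = f"{rows:,}"

print(f"The total is {total_text}.")
print(f"There are {rows_text} rows in the dataset.")
```

Keeping the formatting inside the expression means the prose never contains a hand-typed number that can drift from the data.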
## Display Options
```{python}
#| output: asis
print("**Bold text** from code")
```
```{python}
#| output: false
# Code runs but output is hidden
result = expensive_calculation()
```
NEVER use raw df.head() or bare dataframes. ALWAYS format tables for presentation.
Installation:
uv add great-tables
Basic usage:
from great_tables import GT
# Simple formatted table
GT(df.head(10))
# With styling
(GT(df.head(10))
.tab_header(title="Sales Summary", subtitle="Q4 2024")
.fmt_currency(columns="sales", currency="USD")
.fmt_percent(columns="growth_rate", decimals=1)
.fmt_number(columns="quantity", decimals=0)
.fmt_date(columns="date", date_style="medium")
.tab_source_note("Source: Company Database"))
Advanced styling:
from great_tables import GT, loc, style
(GT(top_products)
.tab_header(title="Top 10 Products by Revenue")
.fmt_currency(columns="revenue", currency="USD")
.data_color(
columns="revenue",
palette=["lightblue", "darkblue"],
domain=[0, df['revenue'].max()]
)
.tab_style(
style=style.text(weight="bold"),
locations=loc.body(columns="product")
))
`.to_markdown()` (Simple, built-in)

For markdown output (use `Markdown()` to render properly):
from IPython.display import Markdown
# Basic markdown table
Markdown(df.head(10).to_markdown(index=False, tablefmt='pipe'))
# With custom formatting
formatted_df = df.head(10).copy()
formatted_df['sales'] = formatted_df['sales'].apply(lambda x: f"${x:,.2f}")
formatted_df['date'] = pd.to_datetime(formatted_df['date']).dt.strftime('%Y-%m-%d')
Markdown(formatted_df.to_markdown(index=False, tablefmt='pipe'))
Table format options:
- `'pipe'` - GitHub-flavored markdown pipes (RECOMMENDED)
- `'grid'` - ASCII grid
- `'simple'` - Simple spacing
- `'html'` - HTML table (for HTML output)

Installation:
uv add tabulate
Usage:
from tabulate import tabulate
from IPython.display import Markdown
# Basic table
Markdown(tabulate(df.head(10), headers='keys', tablefmt='pipe', showindex=False))
# With custom formatting
Markdown(tabulate(
df.head(10),
headers=['Product', 'Sales', 'Date'],
tablefmt='pipe',
floatfmt='.2f',
showindex=False
))
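To make the `'pipe'` format concrete, here is a dependency-free sketch of the string such a table boils down to. The helper `to_pipe_table` and the sample rows are hypothetical and exist only for illustration; in a real document `to_markdown()` or `tabulate` would normally do this work.

```python
def to_pipe_table(headers, rows):
    """Render rows as a GitHub-flavored ('pipe') markdown table."""
    header_line = "| " + " | ".join(headers) + " |"
    divider = "|" + "|".join("---" for _ in headers) + "|"
    body = ["| " + " | ".join(str(cell) for cell in row) + " |" for row in rows]
    return "\n".join([header_line, divider, *body])


# Hypothetical sample data, pre-formatted as strings for presentation
rows = [
    ("Widget X", f"${1250000:,.2f}"),
    ("Widget Y", f"${980500.5:,.2f}"),
]
table = to_pipe_table(["Product", "Total Sales"], rows)
print(table)
```

Because the result is plain text, it only renders as a table when passed through `Markdown()` or emitted from a chunk with `output: asis`.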
✅ DO:
- `.to_markdown()` for markdown output (simplicity)

❌ DON'T:
- `df.head()` without formatting
- `print()` for tables (use `Markdown()` instead to render properly)

For GFM/Markdown:
from IPython.display import Markdown
# Use Markdown() to render tables properly in Quarto
Markdown(df.head(10).to_markdown(index=False, tablefmt='pipe'))
For HTML:
# Use Great Tables for rich styling
from great_tables import GT
GT(df.head(10)).fmt_currency(columns="sales")
For PDF:
# Use Great Tables or formatted markdown
# Great Tables renders well in PDF via LaTeX
GT(df.head(10))
Charts and diagrams should be your PRIMARY communication tool, not an afterthought.
Use Charts for:
Use Tables for:
Use Both:
Trends and Time Series:
import matplotlib.pyplot as plt
# Line chart for trends
fig, ax = plt.subplots(figsize=(10, 6))
df.groupby('date')['sales'].sum().plot(ax=ax, kind='line', linewidth=2)
ax.set_title('Sales Trend Over Time', fontsize=14, fontweight='bold')
ax.set_ylabel('Sales ($)')
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
Comparisons:
# Bar chart for category comparisons
fig, ax = plt.subplots(figsize=(10, 6))
top_products = df.groupby('product')['sales'].sum().nlargest(10)
top_products.plot(ax=ax, kind='barh', color='#2E86AB')
ax.set_title('Top 10 Products by Sales', fontsize=14, fontweight='bold')
ax.set_xlabel('Sales ($)')
plt.tight_layout()
plt.show()
Distributions:
# Histogram for distributions
fig, ax = plt.subplots(figsize=(10, 6))
df['order_value'].hist(bins=30, ax=ax, color='#A23B72', edgecolor='black')
ax.set_title('Order Value Distribution', fontsize=14, fontweight='bold')
ax.set_xlabel('Order Value ($)')
ax.set_ylabel('Frequency')
plt.tight_layout()
plt.show()
Proportions:
# Pie chart for proportions (use sparingly)
fig, ax = plt.subplots(figsize=(8, 8))
category_sales = df.groupby('category')['sales'].sum()
ax.pie(category_sales, labels=category_sales.index, autopct='%1.1f%%', startangle=90)
ax.set_title('Sales by Category', fontsize=14, fontweight='bold')
plt.show()
Relationships:
# Scatter plot for correlations
fig, ax = plt.subplots(figsize=(10, 6))
ax.scatter(df['marketing_spend'], df['sales'], alpha=0.5, color='#F18F01')
ax.set_title('Marketing Spend vs Sales', fontsize=14, fontweight='bold')
ax.set_xlabel('Marketing Spend ($)')
ax.set_ylabel('Sales ($)')
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
✅ DO:
- `plt.tight_layout()` to prevent label cutoff

❌ DON'T:
Quarto uses {mermaid} executable code blocks. Standard GFM ```mermaid fences won't render.
Quarto syntax:
```{mermaid}
flowchart TD
A[Load Data] --> B[Clean Data]
B --> C[Analyze]
```
Note: GFM ```mermaid blocks and Markdown('```mermaid...') calls output raw text instead of rendered diagrams. Use the native {mermaid} syntax directly in markdown.
Common Mermaid diagram types:
- `flowchart` - Process flows, decision trees
- `sequenceDiagram` - Interaction sequences
- `classDiagram` - Object relationships
- `erDiagram` - Entity-relationship diagrams
- `gantt` - Project timelines
- `pie` - Simple pie charts

Example: Reasoning flow:
```{mermaid}
flowchart LR
F1[Fact: Q4 Sales = $2.1M] --> C1[Conclusion: Strong Quarter]
F2[Fact: YoY Growth = 15%] --> C1
F3[Fact: Top Product = Widget X] --> C2[Conclusion: Focus Marketing on Widgets]
```
Preventing Mermaid Clipping in PDF:
Mermaid diagrams can get clipped in PDF output when they exceed page width. Use %%{init}%% directives to control sizing:
```{mermaid}
%%{init: {"flowchart": {"useMaxWidth": true}}}%%
flowchart TD
A[Start] --> B[Process]
B --> C[End]
```
Best practices for PDF mermaid:
- `useMaxWidth: true` for flowcharts in PDF output
- `TD` (top-down) over `LR` (left-right) for wide diagrams

format:
pdf:
mermaid:
theme: default
For HTML output, use Plotly for interactivity:
import plotly.express as px
# Interactive line chart
fig = px.line(df, x='date', y='sales', title='Sales Trend (Interactive)')
fig.update_layout(hovermode='x unified')
fig.show()
# Interactive scatter with hover
fig = px.scatter(df, x='marketing_spend', y='sales',
hover_data=['product', 'region'],
title='Marketing ROI Analysis')
fig.show()
Note: Plotly charts only work in HTML output. For PDF/Word, use matplotlib/seaborn.
What are you showing?
├─ Change over time? → Line chart
├─ Compare categories? → Bar chart (vertical or horizontal)
├─ Show distribution? → Histogram or box plot
├─ Show composition? → Stacked bar or pie chart (if < 5 categories)
├─ Show relationship? → Scatter plot
├─ Show process/flow? → Mermaid flowchart
└─ Multiple variables? → Faceted plots or small multiples
For any mathematical content, ALWAYS use LaTeX notation - it's professional and renders beautifully in all formats.
The equation $E = mc^2$ shows the relationship between energy and mass.
The growth rate is approximately $\alpha = 0.15$ or 15%.
We calculated the mean $\mu = \frac{\sum x_i}{n}$ from the dataset.
The quadratic formula is:
$$
x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
$$
The normal distribution probability density function:
$$
f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}
$$
$$
\begin{aligned}
\text{Revenue} &= \text{Price} \times \text{Quantity} \\
&= \$50 \times 1000 \\
&= \$50{,}000
\end{aligned}
$$
Statistics:
- Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
- Variance: $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2$
- Standard deviation: $\sigma = \sqrt{\sigma^2}$
- Correlation: $\rho_{X,Y} = \frac{\text{cov}(X,Y)}{\sigma_X \sigma_Y}$
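The statistics formulas above map directly onto a few lines of standard-library Python. The sketch below is illustrative only: the function names and sample data are assumptions, and it uses the population ($\frac{1}{n}$) forms exactly as written in the formulas rather than the sample ($\frac{1}{n-1}$) forms.

```python
import math


def mean(xs):
    """Arithmetic mean: (1/n) * sum(x_i)."""
    return sum(xs) / len(xs)


def variance(xs):
    """Population variance: (1/n) * sum((x_i - mu)^2)."""
    mu = mean(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)


def std_dev(xs):
    """Standard deviation: square root of the variance."""
    return math.sqrt(variance(xs))


def correlation(xs, ys):
    """Pearson correlation: cov(X, Y) / (sigma_X * sigma_Y)."""
    mu_x, mu_y = mean(xs), mean(ys)
    cov = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / len(xs)
    return cov / (std_dev(xs) * std_dev(ys))


# Hypothetical data: spend is exactly half of sales, so the two are perfectly correlated
sales = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
spend = [x / 2 for x in sales]

print(mean(sales), variance(sales), std_dev(sales), correlation(sales, spend))
```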
Finance:
- Compound interest: $A = P\left(1 + \frac{r}{n}\right)^{nt}$
- NPV: $NPV = \sum_{t=0}^{N} \frac{C_t}{(1+r)^t}$
- ROI: $ROI = \frac{\text{Gain} - \text{Cost}}{\text{Cost}} \times 100\%$
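As a worked illustration of the finance formulas, the following sketch evaluates each one with small made-up inputs. The function names and numbers are assumptions chosen to keep the arithmetic easy to check by hand.

```python
def compound_amount(principal, rate, periods_per_year, years):
    """A = P * (1 + r/n)^(n*t)."""
    return principal * (1 + rate / periods_per_year) ** (periods_per_year * years)


def npv(rate, cash_flows):
    """NPV = sum of C_t / (1 + r)^t, with t starting at 0."""
    return sum(c / (1 + rate) ** t for t, c in enumerate(cash_flows))


def roi_percent(gain, cost):
    """ROI = (Gain - Cost) / Cost * 100%."""
    return (gain - cost) / cost * 100


# Hypothetical inputs
amount = compound_amount(1000, 0.10, 1, 2)   # 1000 at 10% compounded yearly for 2 years
project_npv = npv(0.10, [-1000, 1100])       # pay 1000 now, receive 1100 in one year
roi = roi_percent(1500, 1000)                # gain of 1500 on a cost of 1000

print(f"A = {amount:,.2f}, NPV = {project_npv:,.2f}, ROI = {roi:.1f}%")
```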
Linear regression:
$$
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon
$$
where $\beta_0$ is the intercept, $\beta_i$ are coefficients, and $\epsilon \sim N(0, \sigma^2)$.
Einstein's mass-energy equivalence:
$$
E = mc^2
$$ {#eq-einstein}
As shown in @eq-einstein, energy and mass are equivalent.
✅ DO:
- `\text{}` for text within equations: `$\text{Revenue} = \$1{,}000$`
- `\alpha`, `\beta`, `\mu`, `\sigma`, `\sum`, `\prod`, `\int`
- `aligned` environment for multi-line equations
- `\$1{,}000` for currency with comma separators
- `\times` for multiplication: `$5 \times 10$`
- `\cdot` for dot product: `$\vec{a} \cdot \vec{b}$`

❌ DON'T:
## Regression Analysis
We fitted a linear model:
$$
\text{Sales} = \beta_0 + \beta_1 \times \text{Marketing Spend} + \epsilon
$$ {#eq-sales-model}
where $\epsilon \sim N(0, \sigma^2)$ represents random error.
### Results
The estimated parameters from @eq-sales-model are:
- Intercept: $\hat{\beta_0} = 50{,}000$ (SE = $2{,}500$)
- Slope: $\hat{\beta_1} = 3.2$ (SE = $0.4$)
- $R^2 = 0.78$
This indicates that each additional \$1 in marketing spend yields approximately \$3.20 in sales ($p < 0.001$).
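For readers who want to see where estimates like these come from, a single-predictor least-squares fit can be sketched in a few lines. This is an illustration under stated assumptions, not the analysis behind the numbers above: `fit_line` is a hypothetical helper, and the data is synthetic and noise-free, generated from the reported intercept and slope so the fit recovers them.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x (single predictor)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope is the ratio of the co-variation of x and y to the variation of x
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Synthetic marketing spend and sales built from the reported estimates
spend = [0.0, 10_000.0, 20_000.0, 30_000.0, 40_000.0]
sales = [50_000 + 3.2 * x for x in spend]

intercept, slope = fit_line(spend, sales)
print(f"intercept = {intercept:,.0f}, slope = {slope:.2f}")
```

Real data would include noise, so standard errors, $R^2$, and p-values would need a statistics library rather than this closed form.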
For publication-quality tables in PDF output, use LaTeX table formatting alongside Great Tables.
```{=latex}
\begin{table}[htbp]
\centering
\caption{Quarterly Sales Performance}
\label{tab:sales}
\begin{tabular}{lrrrr}
\hline
Quarter & Revenue (\$) & Growth (\%) & Units & Margin (\%) \\
\hline
Q1 2024 & 1,250,000 & 15.2 & 5,000 & 22.5 \\
Q2 2024 & 1,450,000 & 16.0 & 5,800 & 23.1 \\
Q3 2024 & 1,680,000 & 15.9 & 6,700 & 24.0 \\
Q4 2024 & 1,920,000 & 14.3 & 7,300 & 24.5 \\
\hline
\textbf{Total} & \textbf{6,300,000} & \textbf{15.4} & \textbf{24,800} & \textbf{23.5} \\
\hline
\end{tabular}
\end{table}
```
```{=latex}
\begin{table}[htbp]
\centering
\caption{Statistical Summary of Key Metrics}
\label{tab:statistics}
\begin{tabular}{lcccc}
\toprule
Metric & Mean & SD & Min & Max \\
\midrule
Sales (\$) & 125{,}000 & 25{,}000 & 75{,}000 & 200{,}000 \\
Orders & 500 & 120 & 300 & 750 \\
AOV (\$) & 250 & 45 & 180 & 380 \\
Churn (\%) & 12.5 & 3.2 & 8.0 & 18.5 \\
\bottomrule
\end{tabular}
\end{table}
```
booktabs provides professional-looking horizontal rules (better than \hline).
Add to YAML frontmatter for booktabs:
header-includes:
- \usepackage{booktabs}
Option 1: Great Tables with LaTeX output
from great_tables import GT
# Great Tables can export LaTeX source with as_latex() (recent releases)
table = GT(df.head(10))
with open("table.tex", "w") as f:
    f.write(table.as_latex())
Option 2: pandas to_latex() with styling
import pandas as pd
# Format DataFrame for LaTeX
df_formatted = df.head(10).copy()
df_formatted['Sales'] = df_formatted['Sales'].apply(lambda x: f"\\${x:,.0f}")
df_formatted['Growth'] = df_formatted['Growth'].apply(lambda x: f"{x:.1f}\\%")
# Export to LaTeX with booktabs
latex_table = df_formatted.to_latex(
index=False,
caption="Top 10 Products by Sales",
label="tab:top-products",
position="htbp",
column_format="lrrr",
escape=False, # Don't escape $ and % (columns are already formatted and escaped above)
)
print(latex_table)
Option 3: tabulate with LaTeX output
from tabulate import tabulate
latex_table = tabulate(
df.head(10),
headers='keys',
tablefmt='latex_booktabs', # Use booktabs style
showindex=False,
floatfmt='.2f'
)
print(f"\\begin{{table}}[htbp]\n\\centering\n\\caption{{Sales Summary}}\n{latex_table}\n\\end{{table}}")
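The three options above all end in the same kind of LaTeX source. To show what that source needs to contain, here is a dependency-free sketch that assembles a booktabs table from plain Python values. The helpers `latex_escape` and `to_booktabs` and the sample rows are hypothetical, and the escaping covers only `&`, `%`, `$`, `#`, and `_`.

```python
def latex_escape(text):
    """Escape &, %, $, # and _ so they are safe inside a table cell."""
    for char in "&%$#_":
        text = text.replace(char, "\\" + char)
    return text


def to_booktabs(headers, rows, column_format, caption, label):
    """Build a table float with booktabs rules from plain Python values."""
    lines = [
        r"\begin{table}[htbp]",
        r"\centering",
        rf"\caption{{{latex_escape(caption)}}}",
        rf"\label{{{label}}}",
        rf"\begin{{tabular}}{{{column_format}}}",
        r"\toprule",
        " & ".join(latex_escape(h) for h in headers) + r" \\",
        r"\midrule",
    ]
    for row in rows:
        lines.append(" & ".join(latex_escape(str(c)) for c in row) + r" \\")
    lines += [r"\bottomrule", r"\end{tabular}", r"\end{table}"]
    return "\n".join(lines)


# Hypothetical sample rows, already formatted for display
rows = [("Q1 2024", "1,250,000", "15.2%"), ("Q2 2024", "1,450,000", "16.0%")]
table = to_booktabs(
    ["Quarter", "Revenue ($)", "Growth"], rows, "lrr",
    "Quarterly Sales Performance", "tab:sales",
)
print(table)
```

Escaping in one place avoids the double-formatting problems that come from mixing pre-formatted strings with per-column formatters.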
✅ DO:
- `booktabs` package for professional horizontal rules (`\toprule`, `\midrule`, `\bottomrule`)
- `\caption{}`
- `\label{tab:name}`
- `[htbp]` (here, top, bottom, page)
- `{lrr}` column format
- `\textbf{}` for bold text (totals, headers)
- `\centering`

❌ DON'T:
- `\hline` - use booktabs rules instead (`\toprule`, `\midrule`, `\bottomrule`)
- (`|`) - they look unprofessional

As shown in Table @tbl-sales, revenue increased across all quarters.
```{=latex}
\begin{table}[htbp]
\centering
\caption{Quarterly Revenue}
\label{tbl-sales}
...
\end{table}
```
Results from @tbl-sales indicate strong growth momentum.
#### When to Use LaTeX Tables vs Great Tables
**Use LaTeX tables when:**
- Creating PDF output with publication-quality typesetting
- Need precise control over table layout and spacing
- Working with complex multi-row/multi-column headers
- Creating tables for academic papers or formal reports
- Need to match specific journal formatting requirements
**Use Great Tables when:**
- Creating HTML output with interactive features
- Need quick table formatting without LaTeX complexity
- Working with markdown output
- Want consistent styling across HTML/PDF/Word formats
- Need color scales, data bars, or rich HTML styling
**Use both:**
```python
from great_tables import GT
# Create table with Great Tables for HTML
gt_table = GT(df).fmt_currency(columns="sales")
gt_table # Displays in HTML
# Also export LaTeX version for PDF
df.to_latex(caption="Sales Summary", label="tab:sales")
```
---
title: "My Report"
author: "Josh Lane"
date: "2024-01-30"
format: gfm # GitHub-flavored markdown (default for composability)
---
---
title: "Analysis Report"
format:
gfm:
wrap: none
variant: +yaml_metadata_block
html:
theme:
dark: darkly # Dark mode (recommended)
light: flatly # Light mode fallback
toc: true
code-fold: true
code-tools: true
pdf:
toc: true
number-sections: true
geometry: margin=1in
filters:
- auto-dark # Auto-detect system preference
---
---
title: "Data Analysis"
format:
md:
output-file: "results.md"
variant: gfm # Use GitHub-flavored markdown
preserve-yaml: true
gfm:
wrap: none
output-file: "results-gfm.md"
---
---
title: "Technical Report"
format:
pdf:
documentclass: article
fontsize: 11pt
geometry:
- margin=1in
- paperwidth=8.5in
- paperheight=11in
toc: true
toc-depth: 2
number-sections: true
colorlinks: true
fig-pos: 'H'
include-in-header:
text: |
\usepackage{fancyhdr}
\pagestyle{fancy}
\fancyhead[L]{My Company}
\fancyhead[R]{\thepage}
---
CRITICAL: Suppress Quarto's auto-generated title block for PDF output. The default title/author/date YAML fields trigger LaTeX's \maketitle, producing an academic-style centered title block that is too formal for most reports. Markdown #/## headings produce \section{}/\subsection{} with large font, bold, and extra spacing — also unwanted for dense reports.
Use this pattern instead:
YAML frontmatter — omit title, author, date; suppress page number on page 1:
---
format:
pdf:
toc: false
number-sections: false
geometry: margin=1in
fontsize: 11pt
documentclass: article
pdf-engine: lualatex
include-before-body:
text: |
\thispagestyle{empty}
execute:
echo: false
warning: false
jupyter: python3
---
Inline header — put this immediately after the YAML block as the first content:
```{=latex}
\begin{minipage}[t]{0.65\textwidth}
{\large\textbf{Document Title · Subtitle}}
\end{minipage}%
\begin{minipage}[t]{0.35\textwidth}
\raggedleft{\small Josh Lane · Feb 2026}
\end{minipage}
\vspace{3pt}
\hrule
\vspace{10pt}
```
Section headings — replace all Markdown #/## headings with raw LaTeX:
```{=latex}
\noindent{\large\textbf{Section Title}}
\vspace{6pt}
```
Why this is better:
- `\maketitle`

When to use Markdown headings instead: Only for long documents (>10 pages) where readers need a rendered TOC or cross-references (`@sec-name`). In that case, restore `number-sections: true` and `toc: true`.
PREFER auto-dark with dual themes (see next section) over single-theme HTML:
---
format:
html:
theme:
dark: darkly # RECOMMENDED: Dual theme with auto-dark
light: flatly
css: custom.css
toc: true
toc-location: left
code-fold: show
code-tools: true
filters:
- auto-dark # Respects user's system preference
---
Single theme (discouraged - doesn't respect user preference):
---
format:
html:
theme: darkly # Only use if auto-dark not available
css: custom.css
toc: true
toc-location: left
---
Dark mode is STRONGLY ENCOURAGED for all HTML output:
Install auto-dark extension (one-time setup):
# In your project or home directory (applies to all projects)
quarto add gadenbuie/quarto-auto-dark --no-prompt
Use in document (RECOMMENDED default):
---
title: "My Analysis"
format:
html:
theme:
dark: darkly # Theme for dark mode (PRIMARY)
light: flatly # Theme for light mode (fallback)
filters:
- auto-dark # Automatically switch based on system preference
---
Best Practices:
Available dark themes:
- `darkly` - Dark Bootstrap theme (RECOMMENDED - clean, professional)
- `cyborg` - Dark blue theme (good for technical docs)
- `slate` - Dark gray theme (subtle, minimal)
- `solar` - Dark solarized theme (warm, comfortable)
- `superhero` - Dark comic book theme (bold, high contrast)
- `vapor` - Dark retro theme (stylized)

Light themes (fallback only):
- `flatly` - Clean modern theme (recommended fallback)
- `cosmo` - Friendly blue theme
- `lumen` - Light gray theme
- `sandstone` - Warm sandy theme
- `minty` - Fresh mint theme
- `journal` - Newspaper style

The auto-dark filter automatically detects system dark mode preference and switches themes accordingly. Users on light mode systems will see the light theme, while dark mode users (majority) get the dark theme.
IMPORTANT: Table of contents (TOC) is usually unnecessary noise in Quarto documents.
Problems with TOC:
Better alternatives:
ONLY use TOC for:
Even then, prefer:
Don't include toc: true in YAML frontmatter:
# ❌ BAD: Unnecessary TOC
---
title: "My Analysis"
format:
html:
toc: true # DON'T DO THIS
---
# ✅ GOOD: Clean document without TOC
---
title: "My Analysis"
format:
html:
theme:
dark: darkly
light: flatly
---
If you must use TOC, make it minimal:
format:
html:
toc: true
toc-depth: 2 # Only top 2 levels
toc-location: left # Sidebar, not top
toc-title: "Contents" # Short title
PDF format is PREFERRED for sharing via Google Drive (read-only, professional appearance).
# Basic PDF output
quarto render analysis.qmd --to pdf
# Multiple formats
quarto render analysis.qmd --to gfm,pdf,html
# 1. Render to PDF
quarto render analysis.qmd --to pdf
# 2. Upload to Google Drive
# - Go to drive.google.com
# - Click "New" → "File upload"
# - Select analysis.pdf
# - Share link with collaborators
# Alternative: Use gspace CLI (if installed)
gspace upload analysis.pdf --folder "Reports"
Alternative workflow: Render to HTML with Google Docs-compatible CSS, then copy-paste.
This workflow is useful when:
Location: ~/.files/quarto/styles/gdocs.css (symlinked to ~/.config/quarto/styles/)
The CSS matches Google Docs defaults:
Option 1: Reference from user config (RECOMMENDED)
---
title: "My Document"
format:
html:
css: ~/.config/quarto/styles/gdocs.css
embed-resources: true
minimal: true
---
Option 2: Copy CSS to project directory
# Copy CSS to project
mkdir -p .quarto/styles
cp ~/.files/quarto/styles/gdocs.css .quarto/styles/
# Use in document
# css: .quarto/styles/gdocs.css
Option 3: Inline in YAML
---
format:
html:
include-in-header:
text: |
<style>
body { font-family: Arial, sans-serif; font-size: 11pt; line-height: 1.15; }
table { border-collapse: collapse; margin: 12pt auto; }
th, td { border: 1pt solid #000; padding: 2pt 6pt; vertical-align: middle; }
th { background-color: #f3f3f3; font-weight: bold; }
</style>
---
# 1. Render to HTML with Google Docs CSS
quarto render analysis.qmd --to html
# 2. Open in browser
open analysis.html
# 3. Select all and copy (Cmd+A, Cmd+C)
# 4. Paste into Google Docs (Cmd+V)
Key CSS properties for Google Docs compatibility:
/* Body - matches Google Docs defaults */
body {
font-family: Arial, sans-serif;
font-size: 11pt;
line-height: 1.15;
}
/* Tables - critical for clean copy-paste */
table {
border-collapse: collapse;
margin: 12pt auto;
page-break-inside: avoid;
}
th, td {
border: 1pt solid #000;
padding: 2pt 6pt; /* Minimal padding for compact tables */
vertical-align: middle;
line-height: 1;
}
/* Remove spacing from cell content */
th *, td * {
margin: 0 !important;
padding: 0 !important;
line-height: 1 !important;
}
th {
background-color: #f3f3f3;
font-weight: bold;
}
Use HTML copy-paste when:
Use PDF upload when:
Extra whitespace in tables:
- `line-height: 1` on cells
- `padding: 2pt 6pt` for minimal padding
- `vertical-align: middle` to prevent vertical gaps

Fonts not matching:
- `font-family: Arial, sans-serif` (Google Docs default)

Tables not copying correctly:
- `border-collapse: collapse`
- `border: 1pt solid #000`
- `embed-resources: true` in YAML

Charts/images not copying:
- `embed-resources: true` to inline images

---
title: "Quarterly Review"
format: revealjs
---
## Slide 1
Content here
## Slide 2
```{python}
import matplotlib.pyplot as plt
# Code executes, output shown
Slide with background image
**Render:**
```bash
quarto render slides.qmd --to revealjs
Features:
---
title: "Report"
format: pptx
---
## Slide Title
- Bullet point
- Another point
## Chart Slide
```{python}
# Chart code
**Render:**
```bash
quarto render slides.qmd --to pptx
See @fig-sales for the trend.
```{python}
#| label: fig-sales
#| fig-cap: "Sales over time"
plt.plot(df['date'], df['sales'])
plt.show()
```
As shown in @tbl-summary:
```{python}
#| label: tbl-summary
#| tbl-cap: "Summary statistics"
df.describe()
```
## Introduction {#sec-intro}
Content here.
## Analysis
As discussed in @sec-intro...
---
bibliography: references.bib
---
According to @smith2020, the results show...
Multiple citations [@smith2020; @jones2021].
BibTeX file (references.bib):
@article{smith2020,
title={Analysis of Data},
author={Smith, John},
journal={Journal of Science},
year={2020}
}
# First time setup
quarto publish gh-pages
# Subsequent updates
quarto publish gh-pages
# Publish to free Quarto hosting
quarto publish quarto-pub
quarto publish netlify
Markdown-First Workflow (PREFERRED):
# DEFAULT: Render to markdown for composability
quarto render analysis.qmd --to gfm # GitHub-flavored markdown
quarto render analysis.qmd --to md # Plain markdown
# Pipe markdown output to other tools
quarto render analysis.qmd --to md --output - | \
grep "^##" | \
sed 's/^## //' > toc.txt
# Generate markdown, then convert to PDF only if needed
quarto render analysis.qmd --to gfm
quarto render analysis.qmd --to pdf # Optional
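The `grep`/`sed` pipeline above can also be done in Python when the rendered markdown needs further processing. The sketch below is one possible equivalent, not part of Quarto: `level_two_headings` is a hypothetical helper, and unlike `grep "^##"` it matches only level-2 headings and skips lines inside fenced code blocks.

```python
import re

FENCE = "`" * 3  # a markdown code fence, built this way to keep the example readable


def level_two_headings(markdown_text):
    """Return the text of every '## ' heading, skipping fenced code blocks."""
    headings = []
    in_fence = False
    for line in markdown_text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # entering or leaving a code block
            continue
        if not in_fence:
            match = re.match(r"^## (.+)$", line)
            if match:
                headings.append(match.group(1).strip())
    return headings


# Hypothetical rendered output
sample = "\n".join([
    "# Title",
    "## Data",
    FENCE + "python",
    "## not a heading",
    FENCE,
    "### Detail",
    "## Results",
])
print(level_two_headings(sample))
```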
Why Markdown Default:
Quarto as a Pipeline Component:
# Composition pattern: process → transform → render → markdown
conform notes.txt --schema schema.json | \
jq '.items[]' | \
python generate_qmd.py > report.qmd && \
quarto render report.qmd --to gfm
# Markdown → Post-processing
quarto render analysis.qmd --to gfm && \
sed -i 's/TODO/DONE/g' analysis.md
Do One Thing Well:
Text Streams:
# Quarto accepts stdin
cat document.md | quarto render - --to gfm --output results.md
# Pipe through processing
cat data.json | \
jq -r '.[] | "- \(.item)"' | \
quarto render /dev/stdin --to gfm --output list.md
Silent Success:
# Quarto is quiet on success
quarto render doc.qmd --to gfm
echo $? # 0 = success
# Use --quiet for scripts
quarto render doc.qmd --to gfm --quiet
# Default: Generate markdown with executed results
quarto render analysis.qmd --to gfm
# Only create PDF/HTML when specifically needed for distribution
quarto render analysis.qmd --to gfm,pdf # Markdown + PDF
quarto render analysis.qmd --to gfm,html # Markdown + HTML
# Markdown for archival, PDF for sharing
quarto render report.qmd --to gfm
quarto render report.qmd --to pdf # Only when needed
# Render all formats (markdown first)
quarto render report.qmd --to gfm,pdf,html
# Or explicitly
quarto render report.qmd --to gfm # Primary output
quarto render report.qmd --to pdf # For Google Drive sharing
quarto render report.qmd --to html # For web viewing
# Render all .qmd files to markdown
for file in *.qmd; do
quarto render "$file" --to gfm
done
# Or use find
find . -name "*.qmd" -exec quarto render {} --to gfm \;
# Parallel processing with xargs
find . -name "*.qmd" | xargs -P 4 -I {} quarto render {} --to gfm
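The same batch render can be driven from Python when the results need to be collected or checked. This is a minimal sketch under stated assumptions: `render_commands` and `render_all` are hypothetical helpers, `quarto` is assumed to be on `PATH`, and the only flags used are `render` and `--to` as shown in the shell commands above.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def render_commands(root, to="gfm"):
    """Build one `quarto render` argument list per .qmd file under root."""
    return [
        ["quarto", "render", str(path), "--to", to]
        for path in sorted(Path(root).rglob("*.qmd"))
    ]


def render_all(root, to="gfm", workers=4):
    """Run the renders in parallel and return (command, exit code) pairs."""
    commands = render_commands(root, to)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        codes = list(pool.map(lambda cmd: subprocess.run(cmd).returncode, commands))
    return list(zip(commands, codes))
```

Calling `render_all(".")` mirrors the `xargs -P 4` variant; a non-zero exit code in the returned pairs identifies which document failed.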
# Use custom template
quarto render doc.qmd --template custom-template.tex
---
title: "Monthly Report"
format: gfm
---
```{python}
#| tags: [parameters]
# Defaults; -P month:February -P year:2024 overrides these at render time
month = "January"
year = 2024
```
## Report for `{python} month` `{python} year`
**Render with parameters:**
```bash
# Markdown output with parameters
quarto render report.qmd -P month:February -P year:2024 --to gfm
# Generate markdown for all months
for month in Jan Feb Mar Apr; do
quarto render report.qmd -P month:$month --to gfm --output "${month}_report.md"
done
# Generate analysis data
python analysis.py > data.json
# Create Quarto report with embedded data loading
cat > report.qmd << 'EOF'
---
title: "Analysis Report"
execute:
cache: true
---
```{python}
#| cache: true
import subprocess
import json
# Reproducible: document defines how data is generated
result = subprocess.run(['python', 'analysis.py'], capture_output=True, text=True, check=True)
data = json.loads(result.stdout)
# Render findings
EOF
quarto render report.qmd --to gfm
## Best Practices
### 1. Choose the Right Input Format
**Use `.qmd` when:**
- Starting new analysis
- Need maximum Quarto features
- Want native integration
**Use `.ipynb` when:**
- Already have Jupyter notebooks
- Collaborating with Jupyter users
- Need Jupyter-specific features
### 2. Organize Code Blocks
```qmd
## Good: Logical chunks
```{python}
# Load data
import pandas as pd
df = pd.read_csv("data.csv")
# Analyze
total = df['sales'].sum()
import pandas as pd
df = pd.read_csv("data.csv")
total = df['sales'].sum()
avg = df['sales'].mean()
# ... 50 more lines
### 3. Use Frontmatter for Configuration
```yaml
# Good: Centralized config
---
format:
pdf:
toc: true
number-sections: true
html:
code-fold: true
---
# Bad: Repeating options in CLI
# quarto render doc.qmd --to pdf --toc --number-sections
### 4. Cache Expensive Operations
```{python}
#| cache: true
# Expensive calculation cached
result = expensive_analysis(large_dataset)
### 5. Version Control
```gitignore
# .gitignore for Quarto projects
_site/
_book/
*.html
*.pdf
.quarto/
Commit:
- `.qmd` source files
- `_quarto.yml` config
- `references.bib`

Don't commit:
- `.quarto/` cache directory

Problem: LaTeX errors when rendering PDF
Solution 1: Install TinyTeX
quarto install tinytex
Solution 2: Use Typst (modern alternative)
---
format:
typst: default
---
Solution 3: Use Chrome headless
---
format:
pdf:
pdf-engine: chrome
---
Problem: Code blocks don't run
Check:
- Python is installed (`python3 --version`)
- Required packages are installed (`pip install pandas matplotlib`)

Debug:
quarto render doc.qmd --execute-debug
Problem: Can't render .ipynb files
Solution:
# Install Jupyter
python3 -m pip install jupyter
# Or use uv
uv tool install jupyter
quarto render doc.qmd # Render with default format
quarto render doc.qmd --to pdf # Render to PDF
quarto render doc.qmd --to html # Render to HTML
quarto preview doc.qmd # Live preview
quarto create project website site # Create website project
quarto publish gh-pages # Publish to GitHub Pages
quarto install tinytex # Install LaTeX
quarto check # Verify installation
--to gfm # GitHub-flavored markdown (default)
--to pdf # PDF via LaTeX or typst
--to html # HTML
--to revealjs # HTML slides
--to pptx # PowerPoint
--to typst # Typst (modern LaTeX alternative)
--to epub # eBook
Common frontmatter options:
```yaml
---
title: "Document Title"
author: "Josh Lane"
date: "2024-01-30"
format:
  pdf:
    toc: true                # Table of contents
    number-sections: true    # Numbered sections
    geometry: margin=1in     # Page margins
  html:
    toc: true
    code-fold: true          # Collapsible code
    theme: cosmo             # Visual theme
---
```
When writing Quarto documents, strategy memos, or any analytical report:
- Table names (`all_customers_revenue_final`), field names (`salesforce_account_id`, `recognized_revenue_c`), database names (`ep-core-data`), query tools (BigQuery, Prophet), and other internal implementation details belong in code cells only.
- Describe what the data means in prose, not how it was retrieved: avoid phrases like "`all_customers_revenue_final` joined via `salesforce_account_id`".
- Keep model parameters such as "changepoint_prior_scale" out of prose as well.

| Prose | Code |
|---|---|
| "actual revenue from sales-linked accounts" | `JOIN all_customers_revenue_final ON salesforce_account_id` |
| "trend model incorporating deal count" | `Prophet.add_regressor('won_count_norm')` |
| "CRM-linked accounts" | `WHERE salesforce_account_id IS NOT NULL` |
| "80% of revenue is attributable" | `sf_corr_attributed_pct = ...` |
| "pipeline data available since April 2024" | cache TTL, query date bounds |
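The separation in the table can be spot-checked mechanically. The sketch below is a rough heuristic under stated assumptions: `identifiers_in_prose` is a hypothetical helper, it treats any snake_case token outside a fenced code cell as a likely internal identifier, and it will not catch hyphenated names such as `ep-core-data` or tool names such as BigQuery.

```python
import re

# Matches snake_case tokens such as table or field names.
SNAKE_CASE = re.compile(r"\b[a-z][a-z0-9]*(?:_[a-z0-9]+)+\b")
FENCE = "`" * 3  # code-fence marker, built this way to keep the example readable


def identifiers_in_prose(qmd_text):
    """Return (line number, token) pairs for snake_case tokens outside code cells."""
    hits = []
    in_fence = False
    for lineno, line in enumerate(qmd_text.splitlines(), start=1):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # entering or leaving a code cell
            continue
        if in_fence:
            continue
        for match in SNAKE_CASE.finditer(line):
            hits.append((lineno, match.group()))
    return hits


sample = "\n".join([
    "Revenue comes from all_customers_revenue_final.",
    FENCE + "{python}",
    "df = load(salesforce_account_id)",
    FENCE,
    "About 80% of revenue is attributable.",
])
print(identifiers_in_prose(sample))
# → [(1, 'all_customers_revenue_final')]
```

The identifier inside the code cell is ignored, while the one in the first prose line is flagged for rewording.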