Quarto Skill

Quarto is an open-source scientific and technical publishing system built on Pandoc. It renders computational documents (with Python, R, Julia code) to publication-quality output in multiple formats.

Bootstrap with `epq scaffold` (PREFERRED for new projects)

New QMD analysis projects must be created via the epq CLI — do NOT manually create pyproject.toml, _quarto.yml, justfile, figures/_style.py, or latex-header.tex:

epq scaffold ~/workspace/projects/my-analysis   # generates all boilerplate
cd ~/workspace/projects/my-analysis
just bootstrap                                   # uv sync + ipykernel install
epq audit .                                      # verify clean (0 warnings)

epq scaffold generates: _quarto.yml (with jupyter: set), latex-header.tex (local copy — no external path dep at render time), justfile (thin import wrapper for canonical recipes), pyproject.toml (package = false, epq editable install), .gitignore, figures/fig_example.py (canonical dev loop), {name}_files/figure-pdf/ pre-created.

To audit or retrofit an existing project:

epq audit <path>    # JSON violations with file/line/suggestion
epq fix <path>      # unified diffs for auto-fixable issues (LLM reviews and applies)
epq list-rules      # enumerate all rule IDs

Shared library — import in every analysis QMD setup cell:

from epq import style, cache, bq, fmt

style.apply_style()          # canonical rcParams (150/200 DPI, sans-serif fallbacks)
style.NAVY, style.TEAL, ...  # workspace palette — never redefine inline
cache.read_cache("name")     # 24h TTL file-based cache → None if stale/missing
bq.run_bq_query(SQL)         # subprocess bigquery CLI wrapper, max_results=10000
fmt.millions_formatter()     # FuncFormatter for ax.yaxis.set_major_formatter()

Full authoring reference: ~/src/analysis-doc/docs/AGENTS.md Retrofit guide: ~/src/analysis-doc/docs/RETROFIT.md

External Python Figures Pattern (PREFERRED for complex documents)

For documents with multiple visualizations, extract all matplotlib code into standalone Python modules in figures/. The QMD becomes a thin shell with stub cells only.

Architecture

{name}.qmd          ← thin shell: prose + data-load cells + stub figure cells only
_quarto.yml         ← jupyter: {name} (set by epq scaffold)
latex-header.tex    ← local copy (set by epq scaffold)
justfile            ← imports ~/src/analysis-doc/tools/justfile
figures/
  __init__.py
  fig_NAME.py       ← one module per figure; render(data) contract
scripts/data/
  extract_NAME.py   ← standalone BigQuery extractor; writes data/cache/*.json
data/cache/         ← JSON cache files (gitignored)
{name}_files/
  figure-pdf/       ← dev loop writes here (pre-created by epq scaffold)

Figure Module Contract

One file per figure (not a dispatcher). render(data: dict) is the only public function.

# figures/fig_revenue.py
from pathlib import Path
from epq import style, fmt  # palette and formatters from epq — never local _style.py

LABEL = "fig-revenue"       # matches QMD #| label: fig-revenue
FIG_WIDTH = 8.5
FIG_HEIGHT = 4.0
FIG_CAP = "Insight-focused caption — not a data description."


def render(data: dict) -> None:
    """Render figure. Called from QMD stub cell with shared data dict.

    Do NOT call plt.show() or plt.savefig() here — Quarto handles capture.
    Do NOT call plt.close() here — handled in __main__ and between QMD cells.
    """
    import matplotlib.pyplot as plt
    style.apply_style()
    fig, ax = plt.subplots(figsize=(FIG_WIDTH, FIG_HEIGHT))

    # ... all matplotlib code, reading from data dict ...
    df = data.get("revenue", _load_sample_data())
    ax.bar(df["month"], df["revenue"], color=style.NAVY)
    ax.yaxis.set_major_formatter(fmt.millions_formatter())
    ax.set_title(FIG_CAP.rstrip("."))
    plt.tight_layout()


def _load_sample_data():
    """Synthetic fallback for dev loop (no BQ needed)."""
    import pandas as pd
    return pd.DataFrame({"month": ["Q1", "Q2", "Q3", "Q4"],
                         "revenue": [1.2e6, 1.4e6, 1.3e6, 1.6e6]})


if __name__ == "__main__":
    """Dev loop — saves to {project}_files/figure-pdf/{LABEL}-output-1.png.

    Writes to the same path Quarto uses, so visual inspection is against the
    real render artifact. Run via: just dev-fig revenue
    """
    import matplotlib
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt

    _project_root = Path(__file__).parent.parent
    _out_dir = _project_root / f"{_project_root.name}_files" / "figure-pdf"
    _out_dir.mkdir(parents=True, exist_ok=True)
    out = str(_out_dir / f"{LABEL}-output-1.png")

    render({})                                          # synthetic data fallback
    plt.savefig(out, dpi=150, bbox_inches="tight")     # savefig BEFORE close
    print(f"Saved {out}")
    plt.close("all")

QMD Stub Cell

#| label: fig-revenue
#| fig-cap: "Revenue by stream."
#| fig-height: 4.0
#| fig-width: 8.5
#| fig-pos: "H"
#| out-width: 100%
import sys; sys.path.insert(0, str(Path("."))) if "." not in sys.path else None
from figures import fig_revenue
fig_revenue.render(data)

Justfile (generated by `epq scaffold`)

Projects get a thin justfile that imports the canonical recipe library:

# Project-local overrides go here. Canonical recipes imported below.
import? '~/src/analysis-doc/tools/justfile'

The canonical justfile provides:

# Render figure to {project}_files/figure-pdf/fig-NAME-output-1.png
dev-fig NAME:
    PYTHONPATH=. uv run python figures/fig_{{NAME}}.py
    @echo "→ $(basename $PWD)_files/figure-pdf/fig-{{NAME}}-output-1.png"

# Full render (clears Jupyter cache)
render:
    rm -rf .jupyter_cache/
    quarto render *.qmd

Figure Audit and Visual Iteration Protocol

CRITICAL: Any task involving figures — iteration, audit, or review — MUST render and visually inspect the PNG before reporting. Code-only review is incomplete.

Required sequence for every figure change or audit:

just dev-fig NAME → {project}_files/figure-pdf/fig-NAME-output-1.png
Read the PNG using the Read tool — do not skip this step
Apply the visual readability checklist to what you see in the PNG
Fix issues, re-render, re-inspect until checklist passes
Only then check factual accuracy against prose/data

Do NOT use just preview-fig / Playwright — it screenshots HTML chrome, not the raw figure. Read the PNG directly from {project}_files/figure-pdf/.

Visual Readability Checklist (inspect the rendered PNG, not the code):

[ ] Suptitle not clipped — suptitle(y=1.02) ALWAYS clips in PDF. Use y=1.0 + fig.subplots_adjust(top=0.82–0.88). Never use y > 1.0.
[ ] No text collisions — suptitle and panel titles have clear separation
[ ] Text readable at print scale — PDF page is 6.5in wide; >1.5× zoom = illegible
[ ] Panels not squished — 3-panel: FIG_HEIGHT ≥ 4.0; single-panel ≥ 3.0; swim-lane ≥ 4.5
[ ] Bar labels within bounds — set_clip_on(True) makes labels invisible, not small
[ ] No partial annotations — text near ylim/xlim edges fully in frame
[ ] Text contrast correct — see color guard below

Text contrast rule (most common source of invisible text):

# Use epq.style helper — covers all palette fills correctly:
from epq import style
tc = style.text_color_for(fill_color)   # WHITE for dark fills, NAVY for light

# Manual guard if needed:
DARK_FILLS = (style.NAVY, style.TEAL, style.CORAL, style.PURPLE, style.GOLD)
tc = style.WHITE if fill in DARK_FILLS else style.NAVY
#                                               ^^^^ NAVY, never SLATE
# SLATE on LIGHT_SLATE = 2.6:1 contrast — fails AA, looks muddy at print scale

Verified contrast ratios:

NAVY/TEAL/CORAL/PURPLE/GOLD fills → WHITE text (8–16:1 ✅)
LIGHT_SLATE fill → NAVY text (5.5:1 ✅); SLATE fails (2.6:1 ❌)
Pale backgrounds (LIGHT_BG, NAVY_BG) → NAVY text; WHITE is invisible (~1.07:1 ❌)

Common visual defects invisible in code:

Suptitle bleeding into panel titles → increase FIG_HEIGHT, use subplots_adjust
Bar labels extending past xlim → invisible (not small); expand xlim or reduce offset
Blank/white PNG → plt.savefig() called after plt.close(). Fix: call savefig first:
```
render({})
plt.savefig(out, dpi=150, bbox_inches="tight")
plt.close("all")
```

Key Rules

Edit figures/fig_NAME.py, not the QMD — QMD stubs never change unless label/caption/dimensions change
One module per figure — render(data: dict) -> None, no dispatcher pattern
from epq import style, fmt — never a local _style.py copy
PYTHONPATH=. required when running modules directly: PYTHONPATH=. uv run python figures/fig_NAME.py
render() must NOT call plt.show(), plt.close(), or plt.savefig() — Quarto handles capture; __main__ handles save
matplotlib.use("Agg") before any pyplot import in __main__
data dict is the only input — never read cache files inside figure modules; provide synthetic fallback in _load_sample_data()
Dev loop output: {project}_files/figure-pdf/{LABEL}-output-1.png (pre-created by epq scaffold)
Reference implementation: ~/workspace/projects/luma-revenue-forecast/ and ~/workspace/projects/revenue-forecast-2026/

PDF Project Bootstrap

Use epq scaffold — it handles all of the following automatically:

epq scaffold ~/workspace/projects/my-project
cd ~/workspace/projects/my-project
just bootstrap    # runs: uv sync + ipykernel install + mkdir data/cache

Manual checklist (only if NOT using epq scaffold):

Create pyproject.toml with [tool.uv] package = false and run uv sync
Register kernel: uv run python -m ipykernel install --user --name=<project>
_quarto.yml: set jupyter: <project> (must match registered kernel name exactly)
Copy latex-header.tex locally from ~/src/analysis-doc/templates/ (do not reference it via external path)
Never borrow another project's venv or kernel — each project must own its own

PDF Title Suppression

Never use YAML title: or subtitle: — they produce \maketitle which double-renders with any custom header
Use a raw LaTeX inline header in the document body instead:

{\large\textbf{Document Title}}
\hfill
{\small\color{gray} Author \quad\textbullet\quad \today}
\vspace{4pt}
\hrule
\vspace{8pt}

Add pagetitle: " " to YAML to suppress the HTML <title> without breaking Pandoc

PDF Figure Rules

Split multi-story panels

If a combined figure has two sub-panels telling different insights, split into separate fig-* cells with their own captions. Ask upfront whether panels should be split — this is cheaper than rework after the fact.

Legend placement

Top-right (loc='upper right') when data occupies the bottom portion of the chart
Below chart when dense: fig.legend(loc='lower center', ncol=N, bbox_to_anchor=(0.5, -0.02)) + fig.subplots_adjust(bottom=0.20)
Never place legend over data area

Date axes — always explicit

Never rely on AutoDateLocator — it overcrowds on multi-year spans:

ax.xaxis.set_major_locator(mdates.MonthLocator(bymonth=[1, 7]))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b '%y"))
ax.tick_params(axis='x', labelsize=8)

Never auto-open artifacts

Justfile render recipes must not include && open <file>. The user opens files manually.

LaTeX Max-Runs Warning

"WARN: maximum number of runs (9) reached" is cosmetically harmless when there are no \ref{} cross-references in prose. It is caused by fancyhdr + many figures creating layout oscillation. Mitigations:

Use \needspace not \clearpage (see needspace sizing table in the \needspace section)
Remove labelformat=empty from \captionsetup — caption numbering churn drives oscillation
Use htbp float placement rather than forced H

\needspace sizing reference — X = figure height + 1.2in overhead:

| Figure height | \needspace | |---|---| | 3.2in | \needspace{4.5in} | | 4.0in | \needspace{5.2in} | | 5.5in | \needspace{6.8in} | | Prose only | \needspace{2.5in} |

BigQuery + Pandas Gotchas (in Quarto Documents)

BigQuery returns nullable Int64 for integer columns — always .astype('float64') before fillna()
Quarterly data on a monthly x-axis: df.set_index('quarter').reindex(monthly_idx, method='ffill') + ax.step(..., where='post')
Never put \n inside BigQuery SQL string literals in Python f-strings — use spaces instead
Always define intermediate variables BEFORE the Markdown(f"""...""") call — f-strings evaluate at call time, not definition time

Best Practices (TL;DR)

Markdown-First: Default to format: gfm with wrap: none for composability, portability, and archival
No Line Wrapping: Always use wrap: none for GFM output (avoids artificial line breaks)
Dark Mode: Always use auto-dark filter with dual themes for HTML output (accessibility and modern UX)
Visual Expression: Use charts and formatted tables, NEVER raw data dumps (df.head(), print(dict))
LaTeX for Math: Use LaTeX notation for ALL mathematical expressions ($\alpha = 0.15$, not "alpha = 0.15")
Professional Tables: Use LaTeX tables (booktabs) for PDF, Great Tables for HTML
No TOC: Table of contents is usually noise - use clear section headings instead
PDF for Sharing: Use --to pdf for Google Drive sharing (read-only, professional)
Blank Lines Before Lists: ALWAYS include a blank line before every list (bullet or numbered) - no exceptions
No Appendix for Sources: Data sources belong in code blocks, not appendix - only add external sources not directly referenced in code to appendix
Use Markdown() Class: ALWAYS use Markdown() for text output in code blocks - NEVER use print() or printf() (output must render as formatted markdown)
Minimal PDF Titling: For PDF output, suppress YAML title/author/date fields (they produce an academic title block via \maketitle). Use a raw LaTeX minipage inline header instead. Use ## markdown headings for section headings (NOT raw LaTeX \noindent{\large\textbf{...}} blocks — those fight with Quarto's float placement). Exception: only use raw LaTeX headings for the document title line itself.
PDF Figure Sizing: Every chart chunk MUST have #| fig-pos: "H", #| fig-width: N, #| fig-height: N, #| out-width: 100%. Always end chunks with plt.close('all'). Set ax.text(...).set_clip_on(True) on all annotation labels. Never use mixed coordinate transforms. See "PDF Figure Sizing — Critical Patterns" section for full details.

PDF Figure Sizing — Critical Patterns

CRITICAL: Matplotlib figure sizing in Quarto PDF output is failure-prone. Follow these rules exactly.

The Working Pattern (copy verbatim)

Every chart chunk that renders to PDF must have ALL of these chunk options:

#| label: fig-my-chart
#| fig-pos: "H"          # force-here via float.sty — prevents deferral/stacking
#| fig-width: 6.5        # must match figsize width in Python code
#| fig-height: 3.5       # must match figsize height in Python code
#| out-width: 100%       # tells LaTeX to scale to full text column width
#| fig-cap: "Caption text with no bare % characters — write 'percent' instead."

And in the Python code:

fig, ax = plt.subplots(figsize=(6.5, 3.5))  # must match chunk fig-width/fig-height
# ... chart code ...
plt.tight_layout()
plt.show()
plt.close('all')   # REQUIRED — prevents state leaking between chunks

rcParams That Must Not Be Changed

plt.rcParams.update({
    'savefig.bbox': None,        # fills declared figsize exactly — do NOT set to 'tight'
    'savefig.pad_inches': 0,
    'figure.dpi': 200,
    'savefig.dpi': 300,
})

savefig.bbox: None is counterintuitive but correct for PDF output. Setting it to 'tight' causes matplotlib to auto-expand the canvas, which fights against the declared figsize and produces malformed figure PDFs.

Root Causes of "Comically Small" Figures

These are the diagnosed failure modes, in order of frequency:

1. Text labels extending beyond xlim/ylim — MOST COMMON

# ❌ BROKEN: annotation text positioned beyond axis limits
for bar, row in zip(bars, df.itertuples()):
    ax.text(bar.get_width() + 4, ..., f"{row.value}")  # +4 may push past xlim
ax.set_xlim(0, 380)  # text at bar.width+4 can exceed 380 → bbox explosion

# ✅ FIX: clip text labels to axes, OR increase xlim to accommodate labels
for bar, row in zip(bars, df.itertuples()):
    t = ax.text(bar.get_width() + 4, ..., f"{row.value}")
    t.set_clip_on(True)   # ← prevents bbox from expanding to include clipped text
# OR: ax.set_xlim(0, 420)  # ensure xlim accommodates largest label

When ax.text() labels are placed at bar.get_width() + offset in data coordinates and those labels extend beyond xlim, the PDF backend measures the full artist bounding box (including out-of-bounds text) when computing the figure's page size. This causes the figure PDF to be output at half or less of the declared figsize — which LaTeX then renders at postage-stamp size even though out-width: 100% is set.

Rule: After setting xlim, add t.set_clip_on(True) to ALL ax.text() calls, or add enough xlim headroom to fit the longest annotation.

2. Mixed coordinate transforms on annotations

# ❌ BROKEN: mixes data coordinates with axis-fraction transform
ax.annotate("", xy=(x0, y0 + 3), xytext=(x1, y1 + 3), ...)  # data coords
ax.text(0.5, max_val + 9, "label", transform=ax.get_xaxis_transform())  # mixed!

# ✅ FIX: use pure axes fraction for floating annotations
ax.annotate("label text",
            xy=(0.5, 0.85), xycoords='axes fraction',
            ha='center', va='center', fontsize=9, ...)

ax.get_xaxis_transform() mixes x=axis-fraction with y=data coordinates. When the y-value in data coords exceeds ylim, the PDF backend's bounding box measurement goes pathological, producing figures that are 10-30× taller than declared. Always use pure coordinate systems — either all data coords or all xycoords='axes fraction'.

3. Missing plt.close('all') between chunks

Without plt.close('all') after each plt.show(), matplotlib figure state (transforms, layout engines, bounding boxes) leaks between Jupyter/Quarto execution chunks. This can cause later charts to inherit corrupt layout state from earlier ones. Always end every chart chunk with:

plt.tight_layout()
plt.show()
plt.close('all')

4. fig-pos: "!ht" instead of "H"

"!ht" (try-here, then top-of-page) causes LaTeX to defer figures when there's insufficient space, stacking them at awkward positions. "H" (force-here via float.sty) places the figure exactly where declared. Requires \usepackage{float} in the LaTeX header.

5. Missing #| fig-width / #| fig-height on chunk

Without explicit chunk-level sizing, Quarto uses YAML defaults and may not pre-allocate the correct float box size before Python renders into it. Always specify both per-chunk.

Diagnosing Figure Size Issues

To identify which figures are malformed without waiting for a full visual review:

# Add keep-tex to render temporarily
cd /path/to/doc && uv run quarto render doc.qmd --to pdf -M keep-tex:true

# Check actual page dimensions of each generated figure PDF
for f in doc_files/figure-pdf/*.pdf; do
  echo -n "$f: "
  pdfinfo "$f" 2>/dev/null | grep "Page size"
done

# A correct 6.5×3.5in figure should be ~468×252 pts (at 72 pts/in)
# A figure with width << 440 pts or height >> 400 pts is malformed

The Nuclear Option

If a figure keeps rendering incorrectly despite all fixes, force PNG raster output:

#| label: fig-problematic
#| fig-pos: "H"
#| dev: png            # ← bypass the PDF vector pipeline entirely
#| dpi: 150
#| fig-width: 6.5
#| fig-height: 3.5
#| out-width: 100%

PNG output is immune to all the bbox/transform issues because matplotlib renders to a fixed-size raster and Quarto embeds it directly. Use as a last resort since vector PDF is crisper.

Caption Numbering

% Restore default numbering (Figure 1., Figure 2., etc.) with bold prefix:
\captionsetup{font={small,it},justification=centering,skip=6pt,labelfont=bf}

% Suppress numbering (caption text only, no "Figure N." prefix):
\captionsetup{font={small,it},justification=centering,skip=6pt,labelformat=empty,labelsep=none}

Never use bare % in fig-cap strings — write "percent" or "percentage points" instead. LaTeX may fail to compile depending on pandoc version.

Reference Lines (axvline/axhline) Opacity

Reference lines (baselines, averages) should be visible context, not dominant elements. Always set alpha=0.4:

ax.axvline(48.9, color=NAVY,  linestyle=":",  linewidth=1.6, alpha=0.4, label="Baseline", zorder=3)
ax.axhline(37.8, color=SLATE, linestyle="--", linewidth=1.4, alpha=0.4, label="Avg", zorder=3)

At alpha=1.0 (default), reference lines dominate the chart and compete with the data bars. alpha=0.4 keeps them readable without visual dominance.

Visual Expression Philosophy

CRITICAL: Quarto documents are for COMMUNICATION, not raw data dumps.

Quarto outputs are static documents meant to convey insights to humans. Raw dataframes, print statements, and JSON blobs fail to communicate effectively.

Visual Hierarchy (Use in Order)

Charts/Plots - For trends, distributions, comparisons, relationships
Formatted Tables - For structured data with styling and context
Formatted Metrics - For key numbers with context and formatting
Raw Output - NEVER (not even for debugging - use separate analysis files)

Anti-Patterns: What NOT to Do

# ❌ BAD: Raw dataframe dump
df.head()

# ❌ BAD: Print statements - output renders as plain text, not formatted markdown
print(f"Total: {total}")
print(data_dict)

# ❌ BAD: printf/print for text - use Markdown() instead
print("## Summary\n- Item 1\n- Item 2")  # Renders as plain text!

# ❌ BAD: Bare variable returning data structure
result  # Returns raw dict/JSON

# ❌ BAD: DataFrame info without formatting
df.describe()
df.info()

# ✅ GOOD: Use Markdown() for ALL text output
from IPython.display import Markdown
Markdown(f"""
## Summary

- **Total**: {total:,}
- **Average**: {avg:.2f}
""")

❌ BAD: Plain text for mathematical notation
- The growth rate is alpha = 0.15 or 15%
- We calculated the mean mu = sum(xi)/n
- The correlation coefficient r = 0.85

❌ BAD: No table formatting
```python
print(df.head())

❌ BAD: Using asterisks for equations

E = m * c^2
y = beta0 + beta1 * x


### Good Patterns: Visual Communication

```python
# ✅ GOOD: Chart for trends
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 6))
df.groupby('date')['sales'].sum().plot(ax=ax, kind='line')
ax.set_title('Sales Trend Over Time')
ax.set_ylabel('Sales ($)')
plt.tight_layout()
plt.show()

# ✅ GOOD: Formatted table using Great Tables
from great_tables import GT

(GT(df.head(10))
    .tab_header(title="Top 10 Sales Records")
    .fmt_currency(columns="sales", currency="USD")
    .fmt_date(columns="date", date_style="medium"))

# ✅ GOOD: Formatted table using pandas markdown
from IPython.display import Markdown
Markdown(df.head(10).to_markdown(index=False, tablefmt='pipe'))

# ✅ GOOD: Formatted metrics in markdown with LaTeX
from IPython.display import Markdown

Markdown(f"""
## Key Metrics

- **Total Sales**: ${total_sales:,.2f}
- **Average Order**: ${avg_order:,.2f}
- **Growth Rate**: $\\alpha = {growth_rate:.1%}$ (15% YoY)
- **Top Product**: {top_product}

### Statistical Summary

The linear regression model $y = \\beta_0 + \\beta_1 x + \\epsilon$ yielded:

- Slope: $\\hat{{\\beta_1}} = 3.2$ (SE = 0.4)
- $R^2 = 0.78$, indicating strong fit
""")

# NOTE: Mermaid diagrams must use native Quarto syntax outside Python blocks
# Use ```{mermaid} directly in markdown, NOT inside Markdown() calls

Why Visual Expression Matters

Documents are for humans: Show insights, not data structures
Static format: No interactive exploration - must communicate clearly on first view
Traceable reasoning: Visualize fact→conclusion chains, not raw JSON
Professional output: Charts and tables look polished in PDF/HTML/Word
Accessibility: Visual hierarchy helps readers navigate content
Shareability: Well-formatted outputs communicate without explanation

Narrative Structure

Build understanding progressively through a series of sections:

Document Flow

Abstract - The punchline first (executive summary for busy readers)
Key Findings - Scannable bullet points with confidence levels
Base Facts - Individual observations/data, each in its own section
Synthesis Sections - Combine earlier facts into higher-level insights
Conclusion - Final synthesis referencing the insights above

Base Facts (Individual Sections)

Each base fact is an independent observation with its own data and evidence. Use descriptive headers (not "Fact 1"):

## Response Time Distribution

\needspace{3in}

Analysis of the past 7 days shows significant tail latency:
- p50: 45ms
- p95: 230ms  
- p99: 890ms (concerning)

```{python}
#| echo: false
# Chart showing latency distribution


### Synthesis Sections

Synthesis sections **explicitly reference** which earlier sections they build upon:

```qmd
## Performance Degradation Under Load

\needspace{4in}

Building on the response time distribution and traffic patterns above, we 
observe a clear correlation: p99 latency spikes to 2.3s during the 2-4pm 
peak traffic window. The system handles baseline load well but degrades 
significantly under peak conditions.

Keeping Content Together (PDF)

Use \needspace{Xin} before sections with charts/diagrams to prevent awkward page breaks and large whitespace gaps:

## Revenue by Carrier

\needspace{4in}

```{python}
# Chart code here


**Guidelines:**
- **Mermaid diagrams**: `\needspace{2in}`
- **Single chart**: `\needspace{3in}`
- **Chart + explanation**: `\needspace{4in}`

**Requires** in YAML frontmatter:
```yaml
format:
  pdf:
    include-in-header:
      text: |
        \usepackage{needspace}

Markdown Formatting Rules (CRITICAL)

IMPORTANT: Markdown requires blank lines between paragraphs for proper rendering.

Line Breaks and Paragraphs

The Problem:

❌ BAD: This will render as one long line
This text appears on a new line in the source
But it renders on the same line as above
Because there's no blank line between them

The Solution:

✅ GOOD: This renders as separate paragraphs

This text appears on its own line because there's a blank line above it.

Each paragraph needs a blank line before and after it.

Common Markdown Patterns

Paragraphs (need blank lines):

This is paragraph one.

This is paragraph two.

This is paragraph three.

Lists REQUIRE blank line before the list:

❌ BAD: List doesn't render correctly
Here is some text:
- Item 1
- Item 2

✅ GOOD: Blank line before list
Here is some text:

- Item 1
- Item 2

Numbered lists also require blank line before:

❌ BAD: Numbered list broken
The steps are:
1. First step
2. Second step

✅ GOOD: Blank line before numbered list
The steps are:

1. First step
2. Second step

Lists (no blank lines between items):

- Item 1
- Item 2
- Item 3

Multi-paragraph list items (blank lines within item):

- Item 1 with first paragraph

  Item 1 continued with second paragraph (indented 2 spaces)

- Item 2 starts here

Headers (blank line before and after):

Previous paragraph ends here.

## Section Header

New paragraph starts here.

Code blocks (blank line before and after):

Previous paragraph ends here.

```python
print("code block")
```

New paragraph starts here.

Quarto-Specific Markdown

Python code output:

#| echo: false
from IPython.display import Markdown

# ❌ BAD: Single newline won't create paragraph break
Markdown("Line 1\nLine 2")  # Renders as: Line 1 Line 2

# ✅ GOOD: Double newline creates paragraph break
Markdown("Line 1\n\nLine 2")  # Renders as separate paragraphs

# ✅ GOOD: Use triple-quoted string with blank lines
Markdown("""
Paragraph one.

Paragraph two.

Paragraph three.
""")

F-strings in Markdown:

#| echo: false
from IPython.display import Markdown

# ✅ GOOD: Blank lines between paragraphs
Markdown(f"""
## Analysis Results

The growth rate is {growth_rate:.1%}.

This represents a significant increase over last quarter.

We recommend increasing inventory by {inventory_increase:,} units.
""")

Best Practices

✅ DO:

Use blank lines between all paragraphs
Use blank lines before and after headers
Use blank lines before and after code blocks
Use blank lines before and after tables
Use triple-quoted strings for multi-line markdown in Python

❌ DON'T:

Use single newlines and expect paragraph breaks
Forget blank lines around headers or code blocks
Mix single and double newlines inconsistently
Use \n in strings expecting paragraph breaks (use \n\n)

Testing Markdown Formatting

Quick test:

# Create test document
cat > test.qmd << 'EOF'
---
title: "Markdown Test"
format: gfm
---

Paragraph 1.

Paragraph 2.

## Header

Paragraph 3.
EOF

# Render and check output
quarto render test.qmd --to gfm
cat test.md

LLM Self-Reasoning with Quarto

Use /think command for structured analysis with graduated detail.

Document Structure

Documents use graduated detail so readers can stop at their desired depth:

Abstract (paragraph)

Self-contained executive summary
Complete story: what, why, result, meaning
Include key metrics and confidence assessment

Key Findings (3-5 bullets)

Result + confidence + brief evidence
Scannable - each finding valuable standalone
Format: **[Finding]**: [Result] — [Evidence] (Confidence)

Investigation (detailed)

Observations (sourced facts with academic citations [source])
Analysis (visual reasoning, statistical evidence)
Interpretation (what it means, confidence, dependencies)

Appendix (optional)

Investigation notes, dead ends, debugging traces
External data sources NOT directly referenced in code blocks (e.g., verbal conversations, meeting notes, prior analyses)
Do NOT duplicate data sources already expressed in code blocks - the code IS the source documentation

Visual Evidence

Include diagram, chart, or table for most findings:

Mermaid diagrams for reasoning flows
Charts for quantitative analysis
Formatted tables for comparative data

Example Invocation

/think Why is the login endpoint returning 500 errors intermittently?

Creates analysis document with abstract-first structure, key findings with confidence levels, and visual reasoning chains.

See /think command for full template.

When to Use Quarto

Perfect for:

LLM epistemological reasoning (use /think command)
Static reports and documentation (no interactivity needed)
Multi-format publishing (PDF + HTML + Word from single source)
Scientific documents (equations, citations, cross-references)
Presentations (RevealJS HTML slides, PowerPoint, Beamer PDF)
Websites and blogs (multi-page projects)
Exporting Jupyter notebooks for publication

NOT for:

Interactive dashboards (use Shiny or dedicated dashboard tools)
Real-time data updates (use web dashboards)
When you need widgets/sliders for end users

Data Provenance and Portability

CRITICAL: Quarto documents must be reproducible with documented dependencies.

When someone runs quarto render analysis.qmd, they should be able to get identical results given:

The document itself (.qmd file)
Documented external dependencies (with setup instructions)
Access to the same data sources (APIs, databases)

The Portability Contract

A .qmd file defines a complete data pipeline. The document contains:

Data extraction logic - How to obtain the data (queries, API calls, etc.)
Transformation code - How to process and analyze
Presentation - Charts, tables, narrative
Dependency documentation - Setup instructions for heavy dependencies

Practical limits: Some dependencies are too expensive to rebuild on every render:

Vector indexes (LanceDB, FAISS) - Document how to build, reference existing
Large datasets - Commit to git or document extraction, don't re-download
ML models - Reference by path with setup instructions

The key is documentation: readers must understand what's needed and how to set it up.

Anti-Pattern: Opaque File References

❌ BAD: Referencing local files without provenance

df = pd.read_json('/tmp/orders.jsonl', lines=True)  # Where did this come from?
df = pd.read_csv('sales.csv')  # Who created this? When? How?

# "This JSONL file" without explaining its origin
data = load_data('extracted_metrics.jsonl')  # Non-portable!

Good Pattern: Embed Data Extraction

✅ PREFERRED: Document defines where data comes from

#| cache: true
import subprocess
import io
import pandas as pd

# Extract from BigQuery - cached to avoid re-running on every render
result = subprocess.run([
    'bigquery', 'query',
    '''SELECT * FROM production.orders 
       WHERE date >= '2024-01-01' 
       AND status = 'completed' ''',
    '--format', 'jsonl'
], capture_output=True, text=True, check=True)

df = pd.read_json(io.StringIO(result.stdout), lines=True)

✅ ALSO GOOD: Canonical external sources

#| cache: true
import pandas as pd

# Public dataset with stable URL
df = pd.read_csv('https://data.company.com/public/sales-2024.csv')

# Or versioned data in the same repository
df = pd.read_csv('data/sales-2024-v2.csv')  # Committed to git with the .qmd

Heavy Dependencies: Document, Don't Rebuild

Rule of thumb: Un-cached renders should complete in < 60 seconds.

< 60 seconds → Embed in document (with cache: true)
> 60 seconds → Document as external dependency with setup instructions

Some dependencies are too expensive to recreate on every render. Document them clearly so readers can set up the environment.

✅ GOOD: Reference with setup documentation

#| echo: false
import subprocess

# DEPENDENCY: LanceDB index at ~/.lancedb/documents
# Setup: lancer ingest -t documents ~/corpus/*.md
# This index contains ~50k documents and takes ~10 min to build

result = subprocess.run(
    ['lancer', 'search', '-t', 'documents', 'shipping rate errors', '--limit', '20'],
    capture_output=True, text=True, check=True
)
relevant_docs = result.stdout

✅ GOOD: Prerequisites section in document

---
title: "Knowledge Base Analysis"
---

## Prerequisites

This analysis requires the following setup:

1. **LanceDB index**: `lancer ingest -t documents ~/corpus/*.md`
2. **BigQuery access**: Authenticated via `gcloud auth application-default login`
3. **Data snapshot**: Run `./scripts/extract-data.sh` (takes ~5 min)

## Analysis
...

❌ BAD: Silent dependency on local state

# No documentation about what this index is or how to create it
results = lancer.search("documents", "query")  # Will fail for anyone else

Caching for Iteration Speed

Use cache: true to avoid re-running expensive operations during iteration.

Requires jupyter-cache (one-time install):

uv pip install jupyter-cache

Per-cell caching:

#| cache: true
#| label: data-extraction

# This cell only re-executes if the code changes
result = subprocess.run(['bigquery', 'query', ...], capture_output=True, text=True)
df = pd.read_json(io.StringIO(result.stdout), lines=True)

Document-wide caching in YAML frontmatter:

---
title: "Analysis Report"
execute:
  cache: true
---

Freeze for Project-Level Caching

For projects with many documents, use freeze to cache execution results in version control:

# _quarto.yml (project config)
execute:
  freeze: auto  # Re-render only when source changes

Key difference:

cache: true - Caches cell outputs locally (Jupyter Cache)
freeze: auto - Stores results in _freeze/ directory (can commit to git)

When to use freeze:

Large projects with many collaborators
Documents with environment-specific dependencies
When you want cached results portable across machines (commit _freeze/)

Cache-Read-or-Query Pattern (BigQuery / Expensive APIs)

Use this pattern when cache: true is insufficient — specifically when:

Querying BigQuery or other expensive external APIs
Cache must survive Quarto kernel restarts (Jupyter cache does not)
You want explicit control over cache invalidation (not tied to source changes)
Cache files are machine-specific and should be gitignored

Set execute: cache: false in frontmatter when using this pattern (disable Jupyter cache to avoid double-caching).

Helper functions (add once per document, in a setup cell):

import json
from pathlib import Path
from datetime import datetime, timezone

_CACHE_DIR = Path('data/cache')
_CACHE_DIR.mkdir(parents=True, exist_ok=True)

def _cache_path(name):
    return _CACHE_DIR / f"{name}.json"

def _read_cache(name):
    p = _cache_path(name)
    if p.exists():
        return json.loads(p.read_text())
    return None

def _write_cache(name, records, scalars=None):
    p = _cache_path(name)
    p.write_text(json.dumps({
        '_queried_at': datetime.now(timezone.utc).isoformat(),
        'records': records,
        'scalars': scalars or {}
    }, default=str))

Per-dataset usage template (repeat for each dataset):

_c = _read_cache('my_dataset')
if _c:
    df = pd.DataFrame(_c['records'])
    my_scalar = float(_c['scalars']['my_scalar'])
else:
    df = run_bq_query(my_query)
    my_scalar = float(df['col'].values[0])
    _write_cache('my_dataset', df.to_dict(orient='records'), {
        'my_scalar': my_scalar,
    })

Cache hit → reads DataFrame and scalars from JSON; no BigQuery call. Cache miss → queries BigQuery live, writes cache, continues render. Query failure on miss → render fails loudly (intentional — no silent fallback).

Never do:

except Exception: my_scalar = 42 — silent fallback masks broken queries
Assign a constant inside try: without a preceding query call — hidden constant, not a live value
Omit _write_cache() in the else branch — next render re-queries unnecessarily

Cache file format:

{
    "_queried_at": "2026-02-20T16:00:00Z",
    "records": [...],
    "scalars": {...}
}

Serialization notes:

Numpy arrays → store as lists (df['col'].tolist()), reconstruct with np.array(...)
Dates → use default=str in json.dumps to handle non-serializable types

Cache invalidation:

# .gitignore
data/cache/

# Justfile recipe
delete-cache:
    rm -rf data/cache/

Complete Example: Reproducible Analysis

---
title: "Q4 2024 Sales Analysis"
author: "Josh Lane"
date: "2024-12-31"
format:
  gfm:
    wrap: none
  html:
    theme:
      dark: darkly
      light: flatly
execute:
  cache: true
filters:
  - auto-dark
---

## Data Extraction

```{python}
#| cache: true
#| label: extract-sales
import subprocess
import io
import pandas as pd

# Reproducible: Query is embedded in the document
result = subprocess.run([
    'bigquery', 'query',
    '''
    SELECT date, product, region, sales, units
    FROM production.sales
    WHERE EXTRACT(QUARTER FROM date) = 4
      AND EXTRACT(YEAR FROM date) = 2024
    ''',
    '--format', 'jsonl'
], capture_output=True, text=True, check=True)

df = pd.read_json(io.StringIO(result.stdout), lines=True)
df['date'] = pd.to_datetime(df['date'])

Analysis

#| echo: false
from IPython.display import Markdown

total_sales = df['sales'].sum()
top_product = df.groupby('product')['sales'].sum().idxmax()

Markdown(f"""
### Key Metrics

- **Total Q4 Sales**: ${total_sales:,.2f}
- **Top Product**: {top_product}
- **Records**: {len(df):,}
""")


### Best Practices Summary

✅ **DO:**

- Embed data extraction commands in the document
- Use `cache: true` for expensive operations
- Reference canonical external URLs when possible
- Commit small data files alongside the `.qmd`
- Use `freeze` for project-level caching

❌ **DON'T:**

- Reference opaque local files (`/tmp/data.jsonl`)
- Say "this file" without showing how it was created
- Assume the reader has access to your local machine
- Leave data provenance undocumented

## Decision Tree: Quarto vs Alternatives

Need user interactivity? (sliders, dropdowns, real-time updates) ├─ YES → Use Shiny or dedicated dashboard tools └─ NO → Static output needed │ ├─ Complex multi-page documentation site? │ └─ YES → Use Quarto website/book projects │ ├─ Single analysis with code + results? │ └─ Native Quarto .qmd files (recommended) │ └─ Just formatting existing markdown? └─ Use Quarto with plain .md files


## Installation

Quarto is already installed (version 1.8.27).

**Optional dependencies:**

```bash
# TinyTeX for better PDF generation (LaTeX)
quarto install tinytex

# Chromium for PDF generation (alternative to LaTeX)
quarto install chromium

Current setup:

✅ Pandoc 3.6.3 (embedded)
✅ Chrome headless (system installation)
✅ Python 3.14.2 detected
✅ Jupyter installed (for Python code execution)
❌ TinyTeX not installed (optional)

Neovim Integration

quarto-nvim plugin is installed and configured.

Features:

LSP support for .qmd files (Python, bash, lua, html code chunks)
Syntax highlighting for code chunks
Diagnostics and completion in code cells
Live preview with :QuartoPreview

Keybindings:

<leader>qp - Preview current document (live reload)
<leader>qc - Close preview
<leader>qm - Render to markdown (GFM) - RECOMMENDED default
<leader>qh - Render to HTML
<leader>qd - Render to PDF

Treesitter support: Quarto syntax highlighting requires treesitter parsers:

# In Neovim
:TSInstall markdown
:TSInstall markdown_inline
:TSInstall python

Otter.nvim integration: The plugin uses otter.nvim for embedded language support in code chunks. This means you get full LSP features (completion, diagnostics, hover) for Python code inside .qmd files.

Basic Usage

Quick Start: Native .qmd Files

Create a Quarto document:

# Create analysis.qmd
cat > analysis.qmd << 'EOF'
---
title: "Sales Analysis Q4 2024"
author: "Josh Lane"
date: "2024-01-30"
format:
  gfm:
    wrap: none              # No line wrapping
  html:
    theme:
      dark: darkly          # Dark mode (recommended)
      light: flatly         # Light mode fallback
    code-fold: true
execute:
  cache: true               # Cache expensive queries
filters:
  - auto-dark               # Respect system preference
---

## Data Extraction

```{python}
#| cache: true
#| echo: false
import subprocess
import io
import pandas as pd

# Reproducible: Query embedded in document
result = subprocess.run([
    'bigquery', 'query',
    '''SELECT date, product, sales 
       FROM production.sales 
       WHERE EXTRACT(QUARTER FROM date) = 4 
       AND EXTRACT(YEAR FROM date) = 2024''',
    '--format', 'jsonl'
], capture_output=True, text=True, check=True)

df = pd.read_json(io.StringIO(result.stdout), lines=True)
df['date'] = pd.to_datetime(df['date'])

Overview

#| echo: false
import matplotlib.pyplot as plt
from IPython.display import Markdown

total_sales = df['sales'].sum()
avg_daily = df.groupby('date')['sales'].sum().mean()
top_product = df.groupby('product')['sales'].sum().idxmax()

# Display key metrics
Markdown(f"""
### Key Metrics

- **Total Sales**: ${total_sales:,.2f}
- **Average Daily Sales**: ${avg_daily:,.2f}
- **Top Product**: {top_product}
- **Records Analyzed**: {len(df):,}
""")

Sales Trend Analysis

#| label: fig-sales-trend
#| fig-cap: "Daily sales trending upward in Q4 2024"
#| echo: false

fig, ax = plt.subplots(figsize=(10, 6))
daily_sales = df.groupby('date')['sales'].sum()
daily_sales.plot(ax=ax, kind='line', linewidth=2, color='#2E86AB')
ax.set_title('Sales Trend Over Time', fontsize=14, fontweight='bold')
ax.set_xlabel('Date')
ax.set_ylabel('Sales ($)')
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

Top Products

#| echo: false
# Format top 10 products as markdown table
top_products = (df.groupby('product')['sales']
                .sum()
                .sort_values(ascending=False)
                .head(10)
                .reset_index())

top_products.columns = ['Product', 'Total Sales']
top_products['Total Sales'] = top_products['Total Sales'].apply(lambda x: f"${x:,.2f}")

Markdown(top_products.to_markdown(index=False, tablefmt='pipe'))

Statistical Analysis

#| echo: false
import numpy as np

# Calculate growth metrics
daily_sales = df.groupby('date')['sales'].sum()
growth_rate = (daily_sales.iloc[-1] - daily_sales.iloc[0]) / daily_sales.iloc[0]
avg_growth = daily_sales.pct_change().mean()

Markdown(f"""
The sales data exhibits a compound growth pattern modeled by:

$$
S(t) = S_0 \\times (1 + r)^t
$$

where $S_0$ represents initial sales, $r = {avg_growth:.3f}$ is the average daily growth rate, 
and $t$ is time in days.

**Key Statistical Findings:**

- Overall Q4 growth: $\\Delta S = {growth_rate:.1%}$
- Average daily growth: $\\bar{{r}} = {avg_growth:.3%}$
- Standard deviation: $\\sigma = {daily_sales.std():,.2f}$
- Correlation with marketing spend: $\\rho = 0.82$ (strong positive)

These metrics indicate statistically significant growth ($p < 0.01$) with 
consistent upward momentum throughout the quarter.
""")

Conclusion

Q4 2024 showed strong performance with total sales of ${total_sales:,.2f}. The upward trend in daily sales indicates positive momentum heading into the next quarter. EOF

Install auto-dark extension (one-time, if not already installed)

quarto add gadenbuie/quarto-auto-dark --no-prompt

Render to markdown (DEFAULT - composable, archival)

quarto render analysis.qmd --to gfm

Optional: Render to HTML with dark mode for web viewing

quarto render analysis.qmd --to html # Uses auto-dark theme

Optional: Render to other formats only when needed

quarto render analysis.qmd --to pdf # For Google Drive sharing/printing quarto render analysis.qmd --to html # For web viewing


### File Formats Quarto Can Render

**Input formats:**
- `.qmd` - Quarto markdown (native, recommended)
- `.ipynb` - Jupyter notebooks
- `.md` - Plain markdown (no code execution)
- `.Rmd` - R Markdown files

**Output formats:**
- **Markdown**: `md` (plain), `gfm` (GitHub-flavored) - **PREFERRED default**
- **Documents**: PDF, HTML, Word, ODT, ePub, Typst
- **Presentations**: RevealJS (HTML), PowerPoint, Beamer (PDF)
- **Websites**: Multi-page sites, blogs, books
- **Dashboards**: Interactive dashboards (with Shiny or Observable JS)

## Core Commands

```bash
# Render document to markdown (DEFAULT - composable, text-based)
quarto render document.qmd --to md           # Executable markdown with results
quarto render document.qmd --to gfm          # GitHub-flavored markdown

# Render to other formats
quarto render document.qmd --to pdf          # PDF (for Google Drive sharing)
quarto render document.qmd --to html         # HTML (avoid --toc, use clear headings)

# Render Jupyter notebook to markdown
quarto render notebook.ipynb --to md         # Markdown with executed results

# Render plain markdown (no code execution)
quarto render README.md --to pdf

# Multiple formats at once
quarto render document.qmd --to md,pdf,html

# Preview with live reload
quarto preview document.qmd

# Create new project
quarto create project website mysite
quarto create project book mybook

# Publish
quarto publish gh-pages                       # GitHub Pages
quarto publish quarto-pub                     # Quarto Pub
quarto publish netlify                        # Netlify

Python Code Execution

Code Blocks

## Analysis Section

```{python}
import pandas as pd

df = pd.read_csv("data.csv")
df.head()
```

Code Block Options

```{python}
#| label: fig-sales
#| fig-cap: "Sales over time"
#| echo: false
#| warning: false

plt.figure(figsize=(10, 6))
df.plot(x='date', y='sales')
plt.show()
```

Common options:

echo: false - Hide code, show output only
code-fold: true - Collapsible code blocks
warning: false - Hide warnings
message: false - Hide messages
label: fig-name - Reference label for cross-references
fig-cap: "Caption" - Figure caption

Inline Python Expressions

The total is `{python} f"${total:,.2f}"`.

There are `{python} len(df)` rows in the dataset.

Output Formatting

## Display Options

```{python}
#| output: asis
print("**Bold text** from code")
```

```{python}
#| output: false
# Code runs but output is hidden
result = expensive_calculation()
```

Table Formatting (CRITICAL)

NEVER use raw df.head() or bare dataframes. ALWAYS format tables for presentation.

Option 1: Great Tables (RECOMMENDED for rich formatting)

Installation:

uv add great-tables

Basic usage:

from great_tables import GT

# Simple formatted table
GT(df.head(10))

# With styling
(GT(df.head(10))
    .tab_header(title="Sales Summary", subtitle="Q4 2024")
    .fmt_currency(columns="sales", currency="USD")
    .fmt_percent(columns="growth_rate", decimals=1)
    .fmt_number(columns="quantity", decimals=0)
    .fmt_date(columns="date", date_style="medium")
    .tab_source_note("Source: Company Database"))

Advanced styling:

from great_tables import GT, loc, style

(GT(top_products)
    .tab_header(title="Top 10 Products by Revenue")
    .fmt_currency(columns="revenue", currency="USD")
    .data_color(
        columns="revenue",
        palette=["lightblue", "darkblue"],
        domain=[0, df['revenue'].max()]
    )
    .tab_style(
        style=style.text(weight="bold"),
        locations=loc.body(columns="product")
    ))

Option 2: pandas `.to_markdown()` (Simple, built-in)

For markdown output (use Markdown() to render properly):

from IPython.display import Markdown

# Basic markdown table
Markdown(df.head(10).to_markdown(index=False, tablefmt='pipe'))

# With custom formatting
formatted_df = df.head(10).copy()
formatted_df['sales'] = formatted_df['sales'].apply(lambda x: f"${x:,.2f}")
formatted_df['date'] = pd.to_datetime(formatted_df['date']).dt.strftime('%Y-%m-%d')
Markdown(formatted_df.to_markdown(index=False, tablefmt='pipe'))

Table format options:

'pipe' - GitHub-flavored markdown pipes (RECOMMENDED)
'grid' - ASCII grid
'simple' - Simple spacing
'html' - HTML table (for HTML output)

Option 3: tabulate (Flexible formatting)

Installation:

uv add tabulate

Usage:

from tabulate import tabulate
from IPython.display import Markdown

# Basic table
Markdown(tabulate(df.head(10), headers='keys', tablefmt='pipe', showindex=False))

# With custom formatting
Markdown(tabulate(
    df.head(10),
    headers=['Product', 'Sales', 'Date'],
    tablefmt='pipe',
    floatfmt='.2f',
    showindex=False
))

Best Practices for Tables

✅ DO:

Use Great Tables for HTML/PDF output (rich formatting)
Use .to_markdown() for markdown output (simplicity)
Format currency, percentages, dates before display
Add titles, subtitles, and source notes
Limit to top N rows (10-20 max) - don't dump entire dataset
Apply color scales for numerical columns
Bold headers and important columns
Pair non-trivial tables with visuals - tables showing trends, comparisons, or distributions need an accompanying chart

❌ DON'T:

Use raw df.head() without formatting
Display more than 20 rows in a table
Show raw timestamps or unformatted numbers
Include index column unless meaningful
Use print() for tables (use Markdown() instead to render properly)
Present analytical tables (trends, comparisons) without an accompanying chart

Table Formatting by Output Format

For GFM/Markdown:

from IPython.display import Markdown
# Use Markdown() to render tables properly in Quarto
Markdown(df.head(10).to_markdown(index=False, tablefmt='pipe'))

For HTML:

# Use Great Tables for rich styling
from great_tables import GT
GT(df.head(10)).fmt_currency(columns="sales")

For PDF:

# Use Great Tables or formatted markdown
# Great Tables renders well in PDF via LaTeX
GT(df.head(10))

Data Visualization Best Practices

Charts and diagrams should be your PRIMARY communication tool, not an afterthought.

When to Use Charts vs Tables

Use Charts for:

Trends over time (line charts)
Distributions (histograms, density plots)
Comparisons across categories (bar charts)
Proportions and composition (pie charts, stacked bars)
Relationships between variables (scatter plots)
Geographic data (maps, choropleth)

Use Tables for:

Precise values needed (financial reports)
Lookup reference (top N items)
Multiple dimensions that don't visualize well
Small datasets (< 20 rows)

Use Both:

Chart for the trend, table for the details
Chart for overview, table for drill-down

Chart Types by Use Case

Trends and Time Series:

import matplotlib.pyplot as plt

# Line chart for trends
fig, ax = plt.subplots(figsize=(10, 6))
df.groupby('date')['sales'].sum().plot(ax=ax, kind='line', linewidth=2)
ax.set_title('Sales Trend Over Time', fontsize=14, fontweight='bold')
ax.set_ylabel('Sales ($)')
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

Comparisons:

# Bar chart for category comparisons
fig, ax = plt.subplots(figsize=(10, 6))
top_products = df.groupby('product')['sales'].sum().nlargest(10)
top_products.plot(ax=ax, kind='barh', color='#2E86AB')
ax.set_title('Top 10 Products by Sales', fontsize=14, fontweight='bold')
ax.set_xlabel('Sales ($)')
plt.tight_layout()
plt.show()

Distributions:

# Histogram for distributions
fig, ax = plt.subplots(figsize=(10, 6))
df['order_value'].hist(bins=30, ax=ax, color='#A23B72', edgecolor='black')
ax.set_title('Order Value Distribution', fontsize=14, fontweight='bold')
ax.set_xlabel('Order Value ($)')
ax.set_ylabel('Frequency')
plt.tight_layout()
plt.show()

Proportions:

# Pie chart for proportions (use sparingly)
fig, ax = plt.subplots(figsize=(8, 8))
category_sales = df.groupby('category')['sales'].sum()
ax.pie(category_sales, labels=category_sales.index, autopct='%1.1f%%', startangle=90)
ax.set_title('Sales by Category', fontsize=14, fontweight='bold')
plt.show()

Relationships:

# Scatter plot for correlations
fig, ax = plt.subplots(figsize=(10, 6))
ax.scatter(df['marketing_spend'], df['sales'], alpha=0.5, color='#F18F01')
ax.set_title('Marketing Spend vs Sales', fontsize=14, fontweight='bold')
ax.set_xlabel('Marketing Spend ($)')
ax.set_ylabel('Sales ($)')
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

Styling Best Practices

✅ DO:

Use descriptive titles with context
Label axes with units
Use color purposefully (not just default colors)
Add gridlines for readability (alpha=0.3)
Set appropriate figure size (10x6 for landscape, 8x8 for square)
Use plt.tight_layout() to prevent label cutoff
Add legends when showing multiple series
Use consistent color schemes across document

❌ DON'T:

Use default ugly matplotlib colors
Skip axis labels or titles
Create tiny, unreadable charts
Use 3D charts (hard to read accurately)
Overload charts with too many series (> 5-7)
Use pie charts for more than 5 categories

Mermaid Diagrams (Quarto Native Syntax)

Quarto uses {mermaid} executable code blocks. Standard GFM ```mermaid fences won't render.

Quarto syntax:

```{mermaid}
flowchart TD
    A[Load Data] --> B[Clean Data]
    B --> C[Analyze]
```

Note: GFM ```mermaid blocks and Markdown('```mermaid...') calls output raw text instead of rendered diagrams. Use the native {mermaid} syntax directly in markdown.

Common Mermaid diagram types:

flowchart - Process flows, decision trees
sequenceDiagram - Interaction sequences
classDiagram - Object relationships
erDiagram - Entity-relationship diagrams
gantt - Project timelines
pie - Simple pie charts

Example: Reasoning flow:

```{mermaid}
flowchart LR
    F1[Fact: Q4 Sales = $2.1M] --> C1[Conclusion: Strong Quarter]
    F2[Fact: YoY Growth = 15%] --> C1
    F3[Fact: Top Product = Widget X] --> C2[Conclusion: Focus Marketing on Widgets]
```

Preventing Mermaid Clipping in PDF:

Mermaid diagrams can get clipped in PDF output when they exceed page width. Use %%{init}%% directives to control sizing:

```{mermaid}
%%{init: {"flowchart": {"useMaxWidth": true}}}%%
flowchart TD
    A[Start] --> B[Process]
    B --> C[End]
```

Best practices for PDF mermaid:

Always use useMaxWidth: true for flowcharts in PDF output
Prefer TD (top-down) over LR (left-right) for wide diagrams
Break complex diagrams into multiple smaller diagrams
Use shorter node labels to reduce width
Add to YAML frontmatter for document-wide defaults:

format:
  pdf:
    mermaid:
      theme: default

Interactive Charts (HTML Output Only)

For HTML output, use Plotly for interactivity:

import plotly.express as px

# Interactive line chart
fig = px.line(df, x='date', y='sales', title='Sales Trend (Interactive)')
fig.update_layout(hovermode='x unified')
fig.show()

# Interactive scatter with hover
fig = px.scatter(df, x='marketing_spend', y='sales', 
                 hover_data=['product', 'region'],
                 title='Marketing ROI Analysis')
fig.show()

Note: Plotly charts only work in HTML output. For PDF/Word, use matplotlib/seaborn.

Chart Selection Decision Tree

What are you showing?

├─ Change over time? → Line chart
├─ Compare categories? → Bar chart (vertical or horizontal)
├─ Show distribution? → Histogram or box plot
├─ Show composition? → Stacked bar or pie chart (if < 5 categories)
├─ Show relationship? → Scatter plot
├─ Show process/flow? → Mermaid flowchart
└─ Multiple variables? → Faceted plots or small multiples

Mathematical Notation with LaTeX (STRONGLY ENCOURAGED)

For any mathematical content, ALWAYS use LaTeX notation - it's professional and renders beautifully in all formats.

Inline Math

The equation $E = mc^2$ shows the relationship between energy and mass.

The growth rate is approximately $\alpha = 0.15$ or 15%.

We calculated the mean $\mu = \frac{\sum x_i}{n}$ from the dataset.

Display Math (Equations)

The quadratic formula is:

$$
x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
$$

The normal distribution probability density function:

$$
f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}
$$

Aligned Equations

$$
\begin{aligned}
\text{Revenue} &= \text{Price} \times \text{Quantity} \\
               &= \$50 \times 1000 \\
               &= \$50{,}000
\end{aligned}
$$

Common Mathematical Expressions

Statistics:

- Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
- Variance: $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2$
- Standard deviation: $\sigma = \sqrt{\sigma^2}$
- Correlation: $\rho_{X,Y} = \frac{\text{cov}(X,Y)}{\sigma_X \sigma_Y}$

Finance:

- Compound interest: $A = P\left(1 + \frac{r}{n}\right)^{nt}$
- NPV: $NPV = \sum_{t=0}^{N} \frac{C_t}{(1+r)^t}$
- ROI: $ROI = \frac{\text{Gain} - \text{Cost}}{\text{Cost}} \times 100\%$

Linear regression:

$$
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon
$$

where $\beta_0$ is the intercept, $\beta_i$ are coefficients, and $\epsilon \sim N(0, \sigma^2)$.

Numbered Equations (for cross-references)

Einstein's mass-energy equivalence:

$$
E = mc^2
$$ {#eq-einstein}

As shown in @eq-einstein, energy and mass are equivalent.

Mathematical Notation Best Practices

✅ DO:

Use LaTeX for ALL mathematical expressions, even simple ones like percentages
Use \text{} for text within equations: $\text{Revenue} = \$1{,}000$
Use proper notation: \alpha, \beta, \mu, \sigma, \sum, \prod, \int
Number important equations for cross-referencing
Use aligned environment for multi-line equations
Format numbers properly: \$1{,}000 for currency with comma separators
Use \times for multiplication: $5 \times 10$
Use \cdot for dot product: $\vec{a} \cdot \vec{b}$

❌ DON'T:

Write "alpha = 0.15" in plain text - use $\alpha = 0.15$
Write "x^2" in plain text - use $x^2$
Use asterisk for multiplication - use $\times$ or $\cdot$
Mix LaTeX and plain text notation inconsistently
Skip equation numbering for important formulas

Example: Statistical Report with LaTeX

## Regression Analysis

We fitted a linear model:

$$
\text{Sales} = \beta_0 + \beta_1 \times \text{Marketing Spend} + \epsilon
$$ {#eq-sales-model}

where $\epsilon \sim N(0, \sigma^2)$ represents random error.

### Results

The estimated parameters from @eq-sales-model are:

- Intercept: $\hat{\beta_0} = 50{,}000$ (SE = $2{,}500$)
- Slope: $\hat{\beta_1} = 3.2$ (SE = $0.4$)
- $R^2 = 0.78$

This indicates that each additional \$1 in marketing spend yields approximately \$3.20 in sales ($p < 0.001$).

LaTeX Tables (Professional Formatting)

For publication-quality tables in PDF output, use LaTeX table formatting alongside Great Tables.

Basic LaTeX Table

```{=latex}
\begin{table}[htbp]
\centering
\caption{Quarterly Sales Performance}
\label{tab:sales}
\begin{tabular}{lrrrr}
\hline
Quarter & Revenue (\$) & Growth (\%) & Units & Margin (\%) \\
\hline
Q1 2024 & 1,250,000 & 15.2 & 5,000 & 22.5 \\
Q2 2024 & 1,450,000 & 16.0 & 5,800 & 23.1 \\
Q3 2024 & 1,680,000 & 15.9 & 6,700 & 24.0 \\
Q4 2024 & 1,920,000 & 14.3 & 7,300 & 24.5 \\
\hline
\textbf{Total} & \textbf{6,300,000} & \textbf{15.4} & \textbf{24,800} & \textbf{23.5} \\
\hline
\end{tabular}
\end{table}
```

Enhanced LaTeX Table with booktabs (RECOMMENDED)

```{=latex}
\begin{table}[htbp]
\centering
\caption{Statistical Summary of Key Metrics}
\label{tab:statistics}
\begin{tabular}{lcccc}
\toprule
Metric & Mean & SD & Min & Max \\
\midrule
Sales (\$) & 125{,}000 & 25{,}000 & 75{,}000 & 200{,}000 \\
Orders & 500 & 120 & 300 & 750 \\
AOV (\$) & 250 & 45 & 180 & 380 \\
Churn (\%) & 12.5 & 3.2 & 8.0 & 18.5 \\
\bottomrule
\end{tabular}
\end{table}
```

booktabs provides professional-looking horizontal rules (better than \hline).

Add to YAML frontmatter for booktabs:

header-includes:
  - \usepackage{booktabs}

Python-Generated LaTeX Tables

Option 1: Great Tables with LaTeX output

from great_tables import GT

# Great Tables can export to LaTeX
table = GT(df.head(10))
table.save("table.tex", format="latex")

Option 2: pandas to_latex() with styling

import pandas as pd

# Format DataFrame for LaTeX
df_formatted = df.head(10).copy()
df_formatted['Sales'] = df_formatted['Sales'].apply(lambda x: f"\\${x:,.0f}")
df_formatted['Growth'] = df_formatted['Growth'].apply(lambda x: f"{x:.1f}\\%")

# Export to LaTeX with booktabs
latex_table = df_formatted.to_latex(
    index=False,
    caption="Top 10 Products by Sales",
    label="tab:top-products",
    position="htbp",
    column_format="lrrr",
    escape=False,  # Don't escape $ and %
    formatters={
        'Sales': lambda x: f"\\${x:,.0f}",
        'Growth': lambda x: f"{x:.1f}\\%"
    }
)

print(latex_table)

Option 3: tabulate with LaTeX output

from tabulate import tabulate

latex_table = tabulate(
    df.head(10),
    headers='keys',
    tablefmt='latex_booktabs',  # Use booktabs style
    showindex=False,
    floatfmt='.2f'
)
print(f"\\begin{{table}}[htbp]\n\\centering\n\\caption{{Sales Summary}}\n{latex_table}\n\\end{{table}}")

LaTeX Table Best Practices

✅ DO:

Use booktabs package for professional horizontal rules (\toprule, \midrule, \bottomrule)
Add captions with \caption{}
Add labels for cross-referencing with \label{tab:name}
Use position specifiers: [htbp] (here, top, bottom, page)
Right-align numbers, left-align text: {lrr} column format
Format numbers: Use thousand separators (1{,}000), proper decimal places
Use \textbf{} for bold text (totals, headers)
Center the table with \centering

❌ DON'T:

Use \hline - use booktabs rules instead (\toprule, \midrule, \bottomrule)
Skip captions - tables should always be labeled
Mix LaTeX and markdown tables in the same document
Use vertical lines (|) - they look unprofessional
Forget to escape special characters: $, %, &

Cross-Referencing LaTeX Tables

As shown in Table @tbl-sales, revenue increased across all quarters.

```{=latex}
\begin{table}[htbp]
\centering
\caption{Quarterly Revenue}
\label{tbl-sales}
...
\end{table}

Results from @tbl-sales indicate strong growth momentum.


#### When to Use LaTeX Tables vs Great Tables

**Use LaTeX tables when:**
- Creating PDF output with publication-quality typesetting
- Need precise control over table layout and spacing
- Working with complex multi-row/multi-column headers
- Creating tables for academic papers or formal reports
- Need to match specific journal formatting requirements

**Use Great Tables when:**
- Creating HTML output with interactive features
- Need quick table formatting without LaTeX complexity
- Working with markdown output
- Want consistent styling across HTML/PDF/Word formats
- Need color scales, data bars, or rich HTML styling

**Use both:**
```python
# Create table with Great Tables for HTML
gt_table = GT(df).fmt_currency(columns="sales")
gt_table  # Displays in HTML

# Also export LaTeX version for PDF
df.to_latex(caption="Sales Summary", label="tab:sales")

YAML Frontmatter

Simple Document (Markdown Default)

---
title: "My Report"
author: "Josh Lane"
date: "2024-01-30"
format: gfm  # GitHub-flavored markdown (default for composability)
---

Multiple Formats (Markdown + HTML with Dark Mode)

---
title: "Analysis Report"
format:
  gfm:
    wrap: none
    variant: +yaml_metadata_block
  html:
    theme:
      dark: darkly          # Dark mode (recommended)
      light: flatly         # Light mode fallback
    toc: true
    code-fold: true
    code-tools: true
  pdf:
    toc: true
    number-sections: true
    geometry: margin=1in
filters:
  - auto-dark               # Auto-detect system preference
---

Markdown-Only Output

---
title: "Data Analysis"
format:
  md:
    output-file: "results.md"
    variant: gfm  # Use GitHub-flavored markdown
    preserve-yaml: true
  gfm:
    wrap: none
    output-file: "results-gfm.md"
---

Advanced PDF Options

---
title: "Technical Report"
format:
  pdf:
    documentclass: article
    fontsize: 11pt
    geometry:
      - margin=1in
      - paperwidth=8.5in
      - paperheight=11in
    toc: true
    toc-depth: 2
    number-sections: true
    colorlinks: true
    fig-pos: 'H'
    include-in-header:
      text: |
        \usepackage{fancyhdr}
        \pagestyle{fancy}
        \fancyhead[L]{My Company}
        \fancyhead[R]{\thepage}
---

Minimal PDF Header Style (DEFAULT for Reports)

CRITICAL: Suppress Quarto's auto-generated title block for PDF output. The default title/author/date YAML fields trigger LaTeX's \maketitle, producing an academic-style centered title block that is too formal for most reports. Markdown #/## headings produce \section{}/\subsection{} with large font, bold, and extra spacing — also unwanted for dense reports.

Use this pattern instead:

YAML frontmatter — omit title, author, date; suppress page number on page 1:

---
format:
  pdf:
    toc: false
    number-sections: false
    geometry: margin=1in
    fontsize: 11pt
    documentclass: article
    pdf-engine: lualatex
    include-before-body:
      text: |
        \thispagestyle{empty}
execute:
  echo: false
  warning: false
jupyter: python3
---

Inline header — put this immediately after the YAML block as the first content:

```{=latex}
\begin{minipage}[t]{0.65\textwidth}
  {\large\textbf{Document Title · Subtitle}}
\end{minipage}%
\begin{minipage}[t]{0.35\textwidth}
  \raggedleft{\small Josh Lane · Feb 2026}
\end{minipage}
\vspace{3pt}
\hrule
\vspace{10pt}
```

Section headings — replace all Markdown #/## headings with raw LaTeX:

```{=latex}
\noindent{\large\textbf{Section Title}}
\vspace{6pt}
```

Why this is better:

Single-line compact header — title and author/date on one row
No wasted vertical space from \maketitle
Section labels match body font size — no jarring size jumps
Looks like a professional memo/report, not an academic paper

When to use Markdown headings instead: Only for long documents (>10 pages) where readers need a rendered TOC or cross-references (@sec-name). In that case, restore number-sections: true and toc: true.

HTML Themes

PREFER auto-dark with dual themes (see next section) over single-theme HTML:

---
format:
  html:
    theme:
      dark: darkly     # RECOMMENDED: Dual theme with auto-dark
      light: flatly
    css: custom.css
    toc: true
    toc-location: left
    code-fold: show
    code-tools: true
filters:
  - auto-dark          # Respects user's system preference
---

Single theme (discouraged - doesn't respect user preference):

---
format:
  html:
    theme: darkly      # Only use if auto-dark not available
    css: custom.css
    toc: true
    toc-location: left
---

Dark Mode with Auto-Dark Extension (RECOMMENDED)

Dark mode is STRONGLY ENCOURAGED for all HTML output:

Better accessibility and reduced eye strain
Modern user expectation (most systems default to dark mode)
Auto-dark extension respects user's system preference

Install auto-dark extension (one-time setup):

# In your project or home directory (applies to all projects)
quarto add gadenbuie/quarto-auto-dark --no-prompt

Use in document (RECOMMENDED default):

---
title: "My Analysis"
format:
  html:
    theme:
      dark: darkly      # Theme for dark mode (PRIMARY)
      light: flatly     # Theme for light mode (fallback)
filters:
  - auto-dark          # Automatically switch based on system preference
---

Best Practices:

✅ Always include auto-dark filter for HTML output
✅ Choose accessible dark theme (darkly, cyborg, slate)
✅ Test both dark and light modes if providing light fallback
⚠️ Single-theme HTML discouraged (use auto-dark instead)

Available dark themes:

darkly - Dark Bootstrap theme (RECOMMENDED - clean, professional)
cyborg - Dark blue theme (good for technical docs)
slate - Dark gray theme (subtle, minimal)
solar - Dark solarized theme (warm, comfortable)
superhero - Dark comic book theme (bold, high contrast)
vapor - Dark retro theme (stylized)

Light themes (fallback only):

flatly - Clean modern theme (recommended fallback)
cosmo - Friendly blue theme
lumen - Light gray theme
sandstone - Warm sandy theme
minty - Fresh mint theme
journal - Newspaper style

The auto-dark filter automatically detects system dark mode preference and switches themes accordingly. Users on light mode systems will see the light theme, while dark mode users (majority) get the dark theme.

Table of Contents (Usually Noise - Avoid)

IMPORTANT: Table of contents (TOC) is usually unnecessary noise in Quarto documents.

Why Avoid TOC

Problems with TOC:

❌ Adds visual clutter without adding value
❌ Redundant when you have clear section headings
❌ Takes up space at the top of the document
❌ Users can navigate with Ctrl+F or scroll
❌ In HTML, browsers have Find function
❌ In PDF, readers have built-in navigation
❌ Makes documents feel like academic papers (overly formal)

Better alternatives:

✅ Use clear, descriptive section headings
✅ Keep documents focused and concise
✅ Use visual hierarchy (# ## ### headings)
✅ Add anchor links manually if needed
✅ Trust readers to navigate using browser/PDF tools

When TOC Might Be Acceptable

ONLY use TOC for:

Very long documents (>20 pages)
Books or comprehensive guides
Multi-chapter documents
When explicitly required by style guide

Even then, prefer:

Sidebar TOC (not top-of-page)
Collapsible TOC
Floating TOC that doesn't obscure content

How to Disable TOC

Don't include toc: true in YAML frontmatter:

# ❌ BAD: Unnecessary TOC
---
title: "My Analysis"
format:
  html:
    toc: true        # DON'T DO THIS
---

# ✅ GOOD: Clean document without TOC
---
title: "My Analysis"
format:
  html:
    theme:
      dark: darkly
      light: flatly
---

If you must use TOC, make it minimal:

format:
  html:
    toc: true
    toc-depth: 2           # Only top 2 levels
    toc-location: left     # Sidebar, not top
    toc-title: "Contents"  # Short title

PDF for Google Drive Sharing

PDF format is PREFERRED for sharing via Google Drive (read-only, professional appearance).

Rendering to PDF

# Basic PDF output
quarto render analysis.qmd --to pdf

# Multiple formats
quarto render analysis.qmd --to gfm,pdf,html

Upload to Google Drive

# 1. Render to PDF
quarto render analysis.qmd --to pdf

# 2. Upload to Google Drive
# - Go to drive.google.com
# - Click "New" → "File upload"
# - Select analysis.pdf
# - Share link with collaborators

# Alternative: Use gspace CLI (if installed)
gspace upload analysis.pdf --folder "Reports"

HTML Copy-Paste to Google Docs

Alternative workflow: Render to HTML with Google Docs-compatible CSS, then copy-paste.

This workflow is useful when:

You need inline tables rendered as actual tables (not images)
You want editable content in Google Docs (not read-only PDF)
You're doing iterative editing between Quarto and Google Docs

Google Docs CSS Setup

Location: ~/.files/quarto/styles/gdocs.css (symlinked to ~/.config/quarto/styles/)

The CSS matches Google Docs defaults:

Arial 11pt, line-height 1.15
Headers: 20pt (H1), 16pt (H2), 14pt (H3)
Tables: 1pt black borders, minimal padding (2pt 6pt), vertical-align middle
No extra whitespace in cells

Using the Google Docs CSS

Option 1: Reference from user config (RECOMMENDED)

---
title: "My Document"
format:
  html:
    css: ~/.config/quarto/styles/gdocs.css
    embed-resources: true
    minimal: true
---

Option 2: Copy CSS to project directory

# Copy CSS to project
mkdir -p .quarto/styles
cp ~/.files/quarto/styles/gdocs.css .quarto/styles/

# Use in document
# css: .quarto/styles/gdocs.css

Option 3: Inline in YAML

---
format:
  html:
    include-in-header:
      text: |
        <style>
        body { font-family: Arial, sans-serif; font-size: 11pt; line-height: 1.15; }
        table { border-collapse: collapse; margin: 12pt auto; }
        th, td { border: 1pt solid #000; padding: 2pt 6pt; vertical-align: middle; }
        th { background-color: #f3f3f3; font-weight: bold; }
        </style>
---

Rendering and Copy-Paste Workflow

# 1. Render to HTML with Google Docs CSS
quarto render analysis.qmd --to html

# 2. Open in browser
open analysis.html

# 3. Select all and copy (Cmd+A, Cmd+C)

# 4. Paste into Google Docs (Cmd+V)

CSS Reference

Key CSS properties for Google Docs compatibility:

/* Body - matches Google Docs defaults */
body {
  font-family: Arial, sans-serif;
  font-size: 11pt;
  line-height: 1.15;
}

/* Tables - critical for clean copy-paste */
table {
  border-collapse: collapse;
  margin: 12pt auto;
  page-break-inside: avoid;
}

th, td {
  border: 1pt solid #000;
  padding: 2pt 6pt;    /* Minimal padding for compact tables */
  vertical-align: middle;
  line-height: 1;
}

/* Remove spacing from cell content */
th *, td * {
  margin: 0 !important;
  padding: 0 !important;
  line-height: 1 !important;
}

th {
  background-color: #f3f3f3;
  font-weight: bold;
}

When to Use HTML Copy-Paste vs PDF

Use HTML copy-paste when:

Tables must be editable in Google Docs
You need precise formatting control
Iterating between Quarto and Google Docs
Complex layouts with multiple tables

Use PDF upload when:

Read-only sharing is acceptable
Professional appearance is priority
Sharing with external stakeholders
Archival or distribution

Troubleshooting HTML Copy-Paste

Extra whitespace in tables:

Ensure CSS has line-height: 1 on cells
Use padding: 2pt 6pt for minimal padding
Add vertical-align: middle to prevent vertical gaps

Fonts not matching:

Use font-family: Arial, sans-serif (Google Docs default)
Avoid web fonts that won't copy

Tables not copying correctly:

Check border-collapse: collapse
Ensure tables have explicit borders (border: 1pt solid #000)
Use embed-resources: true in YAML

Charts/images not copying:

Charts copy as images (expected)
Use embed-resources: true to inline images
May need to re-insert images manually

Presentations

RevealJS Slides (HTML)

---
title: "Quarterly Review"
format: revealjs
---

## Slide 1

Content here

## Slide 2

```{python}
import matplotlib.pyplot as plt
# Code executes, output shown

{background-image="image.jpg"}

Slide with background image


**Render:**
```bash
quarto render slides.qmd --to revealjs

Features:

Live code execution
Animations and transitions
Speaker notes
Vertical slides
Works in any browser

PowerPoint

---
title: "Report"
format: pptx
---

## Slide Title

- Bullet point
- Another point

## Chart Slide

```{python}
# Chart code


**Render:**
```bash
quarto render slides.qmd --to pptx

Cross-References and Citations

Figures

See @fig-sales for the trend.

```{python}
#| label: fig-sales
#| fig-cap: "Sales over time"

plt.plot(df['date'], df['sales'])
plt.show()
```

Tables

As shown in @tbl-summary:

```{python}
#| label: tbl-summary
#| tbl-cap: "Summary statistics"

df.describe()
```

Sections

## Introduction {#sec-intro}

Content here.

## Analysis

As discussed in @sec-intro...

Citations

---
bibliography: references.bib
---

According to @smith2020, the results show...

Multiple citations [@smith2020; @jones2021].

BibTeX file (references.bib):

@article{smith2020,
  title={Analysis of Data},
  author={Smith, John},
  journal={Journal of Science},
  year={2020}
}

Publishing Workflows

GitHub Pages

# First time setup
quarto publish gh-pages

# Subsequent updates
quarto publish gh-pages

Quarto Pub

# Publish to free Quarto hosting
quarto publish quarto-pub

Netlify

quarto publish netlify

Unix Philosophy Alignment

Markdown-First Workflow (PREFERRED):

# DEFAULT: Render to markdown for composability
quarto render analysis.qmd --to gfm         # GitHub-flavored markdown
quarto render analysis.qmd --to md          # Plain markdown

# Pipe markdown output to other tools
quarto render analysis.qmd --to md --output - | \
  grep "^##" | \
  sed 's/^## //' > toc.txt

# Generate markdown, then convert to PDF only if needed
quarto render analysis.qmd --to gfm
quarto render analysis.qmd --to pdf         # Optional

Why Markdown Default:

Text-based: Human-readable, diffable, greppable
Composable: Pipe to other tools (grep, sed, pandoc, etc.)
Portable: Works everywhere (GitHub, wikis, editors)
Archival: Plain text survives format changes
Fast: No heavy rendering (PDF/HTML) unless needed

Quarto as a Pipeline Component:

# Composition pattern: process → transform → render → markdown
conform notes.txt --schema schema.json | \
  jq '.items[]' | \
  python generate_qmd.py > report.qmd && \
  quarto render report.qmd --to gfm

# Markdown → Post-processing
quarto render analysis.qmd --to gfm && \
  sed -i 's/TODO/DONE/g' analysis.md

Do One Thing Well:

Quarto: Document rendering + format conversion
NOT: Data analysis, interactive exploration, storage
Delegate: Python for analysis, Quarto for rendering
Prefer markdown output for downstream composition

Text Streams:

# Quarto accepts stdin
cat document.md | quarto render - --to gfm --output results.md

# Pipe through processing
cat data.json | \
  jq -r '.[] | "- \(.item)"' | \
  quarto render /dev/stdin --to gfm --output list.md

Silent Success:

# Quarto is quiet on success
quarto render doc.qmd --to gfm
echo $?  # 0 = success

# Use --quiet for scripts
quarto render doc.qmd --to gfm --quiet

Common Patterns

Pattern 1: Markdown-First (RECOMMENDED)

# Default: Generate markdown with executed results
quarto render analysis.qmd --to gfm

# Only create PDF/HTML when specifically needed for distribution
quarto render analysis.qmd --to gfm,pdf      # Markdown + PDF
quarto render analysis.qmd --to gfm,html     # Markdown + HTML

# Markdown for archival, PDF for sharing
quarto render report.qmd --to gfm
quarto render report.qmd --to pdf            # Only when needed

Pattern 2: Single Source, Multiple Formats

# Render all formats (markdown first)
quarto render report.qmd --to gfm,pdf,html

# Or explicitly
quarto render report.qmd --to gfm            # Primary output
quarto render report.qmd --to pdf            # For Google Drive sharing
quarto render report.qmd --to html           # For web viewing

Pattern 3: Batch Processing

# Render all .qmd files to markdown
for file in *.qmd; do
  quarto render "$file" --to gfm
done

# Or use find
find . -name "*.qmd" -exec quarto render {} --to gfm \;

# Parallel processing with xargs
find . -name "*.qmd" | xargs -P 4 -I {} quarto render {} --to gfm

Pattern 3: Custom Templates

# Use custom template
quarto render doc.qmd --template custom-template.tex

Pattern 4: Parameterized Reports

---
title: "Monthly Report"
format: gfm
params:
  month: "January"
  year: 2024
---

## Report for `{python} params['month']` `{python} params['year']`

```{python}
month = params['month']
year = params['year']
# Analysis using params


**Render with parameters:**
```bash
# Markdown output with parameters
quarto render report.qmd -P month:February -P year:2024 --to gfm

# Generate markdown for all months
for month in Jan Feb Mar Apr; do
  quarto render report.qmd -P month:$month --to gfm --output "${month}_report.md"
done

Pattern 5: Data Pipeline Reports

# Generate analysis data
python analysis.py > data.json

# Create Quarto report with embedded data loading
cat > report.qmd << 'EOF'
---
title: "Analysis Report"
execute:
  cache: true
---

```{python}
#| cache: true
import subprocess
import json

# Reproducible: document defines how data is generated
result = subprocess.run(['python', 'analysis.py'], capture_output=True, text=True, check=True)
data = json.loads(result.stdout)
# Render findings

EOF

quarto render report.qmd --to gfm


## Best Practices

### 1. Choose the Right Input Format

**Use `.qmd` when:**
- Starting new analysis
- Need maximum Quarto features
- Want native integration

**Use `.ipynb` when:**
- Already have Jupyter notebooks
- Collaborating with Jupyter users
- Need Jupyter-specific features

### 2. Organize Code Blocks

```qmd
## Good: Logical chunks

```{python}
# Load data
import pandas as pd
df = pd.read_csv("data.csv")

# Analyze
total = df['sales'].sum()

Bad: Everything in one block

import pandas as pd
df = pd.read_csv("data.csv")
total = df['sales'].sum()
avg = df['sales'].mean()
# ... 50 more lines


### 3. Use Frontmatter for Configuration

```yaml
# Good: Centralized config
---
format:
  pdf:
    toc: true
    number-sections: true
  html:
    code-fold: true
---

# Bad: Repeating options in CLI
# quarto render doc.qmd --to pdf --toc --number-sections

4. Cache Expensive Computations

```{python}
#| cache: true

# Expensive calculation cached
result = expensive_analysis(large_dataset)


### 5. Version Control

```gitignore
# .gitignore for Quarto projects
_site/
_book/
*.html
*.pdf
.quarto/

Commit:

✅ .qmd source files
✅ _quarto.yml config
✅ Data files (if small)
✅ references.bib
❌ Rendered outputs (regenerate)
❌ .quarto/ cache directory

Troubleshooting

PDF Generation Fails

Problem: LaTeX errors when rendering PDF

Solution 1: Install TinyTeX

quarto install tinytex

Solution 2: Use Typst (modern alternative)

---
format:
  typst: default
---

Solution 3: Use Chrome headless

---
format:
  pdf:
    pdf-engine: chrome
---

Python Code Not Executing

Problem: Code blocks don't run

Check:

Python is installed (python3 --version)
Required packages installed (pip install pandas matplotlib)
No syntax errors in code blocks

Debug:

quarto render doc.qmd --execute-debug

Jupyter Kernel Not Found

Problem: Can't render .ipynb files

Solution:

# Install Jupyter
python3 -m pip install jupyter

# Or use uv
uv tool install jupyter

Quick Reference

Common Commands

quarto render doc.qmd                  # Render with default format
quarto render doc.qmd --to pdf        # Render to PDF
quarto render doc.qmd --to html       # Render to HTML
quarto preview doc.qmd                # Live preview
quarto create project website site   # Create website project
quarto publish gh-pages               # Publish to GitHub Pages
quarto install tinytex                # Install LaTeX
quarto check                          # Verify installation

Output Format Options

--to gfm            # GitHub-flavored markdown (default)
--to pdf            # PDF via LaTeX or typst
--to html           # HTML
--to revealjs       # HTML slides
--to pptx           # PowerPoint
--to typst          # Typst (modern LaTeX alternative)
--to epub           # eBook

Common YAML Frontmatter

---
title: "Document Title"
author: "Josh Lane"
date: "2024-01-30"
format:
  pdf:
    toc: true               # Table of contents
    number-sections: true   # Numbered sections
    geometry: margin=1in    # Page margins
  html:
    toc: true
    code-fold: true         # Collapsible code
    theme: cosmo            # Visual theme
---

Resources

Official docs: https://quarto.org
Gallery: https://quarto.org/docs/gallery/
Guide: https://quarto.org/docs/guide/
Extensions: https://quarto.org/docs/extensions/
Publishing: https://quarto.org/docs/publishing/

Document & Report Writing Principles

Data Sources Stay in Code, Not Prose

When writing Quarto documents, strategy memos, or any analytical report:

Never name data sources in prose — table names (all_customers_revenue_final), field names (salesforce_account_id, recognized_revenue_c), database names (ep-core-data), query tools (BigQuery, Prophet), or internal implementation details belong in code cells only
Describe the data, not where it came from — write "actual revenue from those accounts" not "revenue from all_customers_revenue_final joined via salesforce_account_id"
Describe the method, not the tool — write "trend model with yearly seasonality" not "Prophet with changepoint_prior_scale"
The code is the documentation — readers who need provenance can read the code cells; prose is for interpretation and insight

What Belongs in Prose vs. Code

| Prose | Code | |---|---| | "actual revenue from sales-linked accounts" | JOIN all_customers_revenue_final ON salesforce_account_id | | "trend model incorporating deal count" | Prophet.add_regressor('won_count_norm') | | "CRM-linked accounts" | WHERE salesforce_account_id IS NOT NULL | | "80% of revenue is attributable" | sf_corr_attributed_pct = ... | | "pipeline data available since April 2024" | cache TTL, query date bounds |

Quarto Skill

Quarto is an open-source scientific and technical publishing system built on Pandoc. It renders computational documents (with Python, R, Julia code) to publication-quality output in multiple formats.

Bootstrap with `epq scaffold` (PREFERRED for new projects)

New QMD analysis projects must be created via the epq CLI — do NOT manually create pyproject.toml, _quarto.yml, justfile, figures/_style.py, or latex-header.tex:

epq scaffold ~/workspace/projects/my-analysis   # generates all boilerplate
cd ~/workspace/projects/my-analysis
just bootstrap                                   # uv sync + ipykernel install
epq audit .                                      # verify clean (0 warnings)

To audit or retrofit an existing project:

epq audit <path>    # JSON violations with file/line/suggestion
epq fix <path>      # unified diffs for auto-fixable issues (LLM reviews and applies)
epq list-rules      # enumerate all rule IDs

Shared library — import in every analysis QMD setup cell:

from epq import style, cache, bq, fmt

style.apply_style()          # canonical rcParams (150/200 DPI, sans-serif fallbacks)
style.NAVY, style.TEAL, ...  # workspace palette — never redefine inline
cache.read_cache("name")     # 24h TTL file-based cache → None if stale/missing
bq.run_bq_query(SQL)         # subprocess bigquery CLI wrapper, max_results=10000
fmt.millions_formatter()     # FuncFormatter for ax.yaxis.set_major_formatter()

Full authoring reference: ~/src/analysis-doc/docs/AGENTS.md Retrofit guide: ~/src/analysis-doc/docs/RETROFIT.md

External Python Figures Pattern (PREFERRED for complex documents)

For documents with multiple visualizations, extract all matplotlib code into standalone Python modules in figures/. The QMD becomes a thin shell with stub cells only.

Architecture

{name}.qmd          ← thin shell: prose + data-load cells + stub figure cells only
_quarto.yml         ← jupyter: {name} (set by epq scaffold)
latex-header.tex    ← local copy (set by epq scaffold)
justfile            ← imports ~/src/analysis-doc/tools/justfile
figures/
  __init__.py
  fig_NAME.py       ← one module per figure; render(data) contract
scripts/data/
  extract_NAME.py   ← standalone BigQuery extractor; writes data/cache/*.json
data/cache/         ← JSON cache files (gitignored)
{name}_files/
  figure-pdf/       ← dev loop writes here (pre-created by epq scaffold)

Figure Module Contract

One file per figure (not a dispatcher). render(data: dict) is the only public function.

# figures/fig_revenue.py
from pathlib import Path
from epq import style, fmt  # palette and formatters from epq — never local _style.py

LABEL = "fig-revenue"       # matches QMD #| label: fig-revenue
FIG_WIDTH = 8.5
FIG_HEIGHT = 4.0
FIG_CAP = "Insight-focused caption — not a data description."


def render(data: dict) -> None:
    """Render figure. Called from QMD stub cell with shared data dict.

    Do NOT call plt.show() or plt.savefig() here — Quarto handles capture.
    Do NOT call plt.close() here — handled in __main__ and between QMD cells.
    """
    import matplotlib.pyplot as plt
    style.apply_style()
    fig, ax = plt.subplots(figsize=(FIG_WIDTH, FIG_HEIGHT))

    # ... all matplotlib code, reading from data dict ...
    df = data.get("revenue", _load_sample_data())
    ax.bar(df["month"], df["revenue"], color=style.NAVY)
    ax.yaxis.set_major_formatter(fmt.millions_formatter())
    ax.set_title(FIG_CAP.rstrip("."))
    plt.tight_layout()


def _load_sample_data():
    """Synthetic fallback for dev loop (no BQ needed)."""
    import pandas as pd
    return pd.DataFrame({"month": ["Q1", "Q2", "Q3", "Q4"],
                         "revenue": [1.2e6, 1.4e6, 1.3e6, 1.6e6]})


if __name__ == "__main__":
    """Dev loop — saves to {project}_files/figure-pdf/{LABEL}-output-1.png.

    Writes to the same path Quarto uses, so visual inspection is against the
    real render artifact. Run via: just dev-fig revenue
    """
    import matplotlib
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt

    _project_root = Path(__file__).parent.parent
    _out_dir = _project_root / f"{_project_root.name}_files" / "figure-pdf"
    _out_dir.mkdir(parents=True, exist_ok=True)
    out = str(_out_dir / f"{LABEL}-output-1.png")

    render({})                                          # synthetic data fallback
    plt.savefig(out, dpi=150, bbox_inches="tight")     # savefig BEFORE close
    print(f"Saved {out}")
    plt.close("all")

QMD Stub Cell

#| label: fig-revenue
#| fig-cap: "Revenue by stream."
#| fig-height: 4.0
#| fig-width: 8.5
#| fig-pos: "H"
#| out-width: 100%
import sys; sys.path.insert(0, str(Path("."))) if "." not in sys.path else None
from figures import fig_revenue
fig_revenue.render(data)

Justfile (generated by `epq scaffold`)

Projects get a thin justfile that imports the canonical recipe library:

# Project-local overrides go here. Canonical recipes imported below.
import? '~/src/analysis-doc/tools/justfile'

The canonical justfile provides:

# Render figure to {project}_files/figure-pdf/fig-NAME-output-1.png
dev-fig NAME:
    PYTHONPATH=. uv run python figures/fig_{{NAME}}.py
    @echo "→ $(basename $PWD)_files/figure-pdf/fig-{{NAME}}-output-1.png"

# Full render (clears Jupyter cache)
render:
    rm -rf .jupyter_cache/
    quarto render *.qmd

Figure Audit and Visual Iteration Protocol

CRITICAL: Any task involving figures — iteration, audit, or review — MUST render and visually inspect the PNG before reporting. Code-only review is incomplete.

Required sequence for every figure change or audit:

just dev-fig NAME → {project}_files/figure-pdf/fig-NAME-output-1.png
Read the PNG using the Read tool — do not skip this step
Apply the visual readability checklist to what you see in the PNG
Fix issues, re-render, re-inspect until checklist passes
Only then check factual accuracy against prose/data

Do NOT use just preview-fig / Playwright — it screenshots HTML chrome, not the raw figure. Read the PNG directly from {project}_files/figure-pdf/.

Visual Readability Checklist (inspect the rendered PNG, not the code):

[ ] Suptitle not clipped — suptitle(y=1.02) ALWAYS clips in PDF. Use y=1.0 + fig.subplots_adjust(top=0.82–0.88). Never use y > 1.0.
[ ] No text collisions — suptitle and panel titles have clear separation
[ ] Text readable at print scale — PDF page is 6.5in wide; >1.5× zoom = illegible
[ ] Panels not squished — 3-panel: FIG_HEIGHT ≥ 4.0; single-panel ≥ 3.0; swim-lane ≥ 4.5
[ ] Bar labels within bounds — set_clip_on(True) makes labels invisible, not small
[ ] No partial annotations — text near ylim/xlim edges fully in frame
[ ] Text contrast correct — see color guard below

Text contrast rule (most common source of invisible text):

# Use epq.style helper — covers all palette fills correctly:
from epq import style
tc = style.text_color_for(fill_color)   # WHITE for dark fills, NAVY for light

# Manual guard if needed:
DARK_FILLS = (style.NAVY, style.TEAL, style.CORAL, style.PURPLE, style.GOLD)
tc = style.WHITE if fill in DARK_FILLS else style.NAVY
#                                               ^^^^ NAVY, never SLATE
# SLATE on LIGHT_SLATE = 2.6:1 contrast — fails AA, looks muddy at print scale

Verified contrast ratios:

NAVY/TEAL/CORAL/PURPLE/GOLD fills → WHITE text (8–16:1 ✅)
LIGHT_SLATE fill → NAVY text (5.5:1 ✅); SLATE fails (2.6:1 ❌)
Pale backgrounds (LIGHT_BG, NAVY_BG) → NAVY text; WHITE is invisible (~1.07:1 ❌)

Common visual defects invisible in code:

Suptitle bleeding into panel titles → increase FIG_HEIGHT, use subplots_adjust
Bar labels extending past xlim → invisible (not small); expand xlim or reduce offset
Blank/white PNG → plt.savefig() called after plt.close(). Fix: call savefig first:
```
render({})
plt.savefig(out, dpi=150, bbox_inches="tight")
plt.close("all")
```

Key Rules

Edit figures/fig_NAME.py, not the QMD — QMD stubs never change unless label/caption/dimensions change
One module per figure — render(data: dict) -> None, no dispatcher pattern
from epq import style, fmt — never a local _style.py copy
PYTHONPATH=. required when running modules directly: PYTHONPATH=. uv run python figures/fig_NAME.py
render() must NOT call plt.show(), plt.close(), or plt.savefig() — Quarto handles capture; __main__ handles save
matplotlib.use("Agg") before any pyplot import in __main__
data dict is the only input — never read cache files inside figure modules; provide synthetic fallback in _load_sample_data()
Dev loop output: {project}_files/figure-pdf/{LABEL}-output-1.png (pre-created by epq scaffold)
Reference implementation: ~/workspace/projects/luma-revenue-forecast/ and ~/workspace/projects/revenue-forecast-2026/

PDF Project Bootstrap

Use epq scaffold — it handles all of the following automatically:

epq scaffold ~/workspace/projects/my-project
cd ~/workspace/projects/my-project
just bootstrap    # runs: uv sync + ipykernel install + mkdir data/cache

Manual checklist (only if NOT using epq scaffold):

Create pyproject.toml with [tool.uv] package = false and run uv sync
Register kernel: uv run python -m ipykernel install --user --name=<project>
_quarto.yml: set jupyter: <project> (must match registered kernel name exactly)
Copy latex-header.tex locally from ~/src/analysis-doc/templates/ (do not reference it via external path)
Never borrow another project's venv or kernel — each project must own its own

PDF Title Suppression

Never use YAML title: or subtitle: — they produce \maketitle which double-renders with any custom header
Use a raw LaTeX inline header in the document body instead:

{\large\textbf{Document Title}}
\hfill
{\small\color{gray} Author \quad\textbullet\quad \today}
\vspace{4pt}
\hrule
\vspace{8pt}

Add pagetitle: " " to YAML to suppress the HTML <title> without breaking Pandoc

PDF Figure Rules

Split multi-story panels

Legend placement

Top-right (loc='upper right') when data occupies the bottom portion of the chart
Below chart when dense: fig.legend(loc='lower center', ncol=N, bbox_to_anchor=(0.5, -0.02)) + fig.subplots_adjust(bottom=0.20)
Never place legend over data area

Date axes — always explicit

Never rely on AutoDateLocator — it overcrowds on multi-year spans:

ax.xaxis.set_major_locator(mdates.MonthLocator(bymonth=[1, 7]))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b '%y"))
ax.tick_params(axis='x', labelsize=8)

Never auto-open artifacts

Justfile render recipes must not include && open <file>. The user opens files manually.

LaTeX Max-Runs Warning

Use \needspace not \clearpage (see needspace sizing table in the \needspace section)
Remove labelformat=empty from \captionsetup — caption numbering churn drives oscillation
Use htbp float placement rather than forced H

\needspace sizing reference — X = figure height + 1.2in overhead:

| Figure height | \needspace | |---|---| | 3.2in | \needspace{4.5in} | | 4.0in | \needspace{5.2in} | | 5.5in | \needspace{6.8in} | | Prose only | \needspace{2.5in} |

BigQuery + Pandas Gotchas (in Quarto Documents)

BigQuery returns nullable Int64 for integer columns — always .astype('float64') before fillna()
Quarterly data on a monthly x-axis: df.set_index('quarter').reindex(monthly_idx, method='ffill') + ax.step(..., where='post')
Never put \n inside BigQuery SQL string literals in Python f-strings — use spaces instead
Always define intermediate variables BEFORE the Markdown(f"""...""") call — f-strings evaluate at call time, not definition time

Best Practices (TL;DR)

Markdown-First: Default to format: gfm with wrap: none for composability, portability, and archival
No Line Wrapping: Always use wrap: none for GFM output (avoids artificial line breaks)
Dark Mode: Always use auto-dark filter with dual themes for HTML output (accessibility and modern UX)
Visual Expression: Use charts and formatted tables, NEVER raw data dumps (df.head(), print(dict))
LaTeX for Math: Use LaTeX notation for ALL mathematical expressions ($\alpha = 0.15$, not "alpha = 0.15")
Professional Tables: Use LaTeX tables (booktabs) for PDF, Great Tables for HTML
No TOC: Table of contents is usually noise - use clear section headings instead
PDF for Sharing: Use --to pdf for Google Drive sharing (read-only, professional)
Blank Lines Before Lists: ALWAYS include a blank line before every list (bullet or numbered) - no exceptions
No Appendix for Sources: Data sources belong in code blocks, not appendix - only add external sources not directly referenced in code to appendix
Use Markdown() Class: ALWAYS use Markdown() for text output in code blocks - NEVER use print() or printf() (output must render as formatted markdown)
Minimal PDF Titling: For PDF output, suppress YAML title/author/date fields (they produce an academic title block via \maketitle). Use a raw LaTeX minipage inline header instead. Use ## markdown headings for section headings (NOT raw LaTeX \noindent{\large\textbf{...}} blocks — those fight with Quarto's float placement). Exception: only use raw LaTeX headings for the document title line itself.
PDF Figure Sizing: Every chart chunk MUST have #| fig-pos: "H", #| fig-width: N, #| fig-height: N, #| out-width: 100%. Always end chunks with plt.close('all'). Set ax.text(...).set_clip_on(True) on all annotation labels. Never use mixed coordinate transforms. See "PDF Figure Sizing — Critical Patterns" section for full details.

PDF Figure Sizing — Critical Patterns

CRITICAL: Matplotlib figure sizing in Quarto PDF output is failure-prone. Follow these rules exactly.

The Working Pattern (copy verbatim)

Every chart chunk that renders to PDF must have ALL of these chunk options:

#| label: fig-my-chart
#| fig-pos: "H"          # force-here via float.sty — prevents deferral/stacking
#| fig-width: 6.5        # must match figsize width in Python code
#| fig-height: 3.5       # must match figsize height in Python code
#| out-width: 100%       # tells LaTeX to scale to full text column width
#| fig-cap: "Caption text with no bare % characters — write 'percent' instead."

And in the Python code:

fig, ax = plt.subplots(figsize=(6.5, 3.5))  # must match chunk fig-width/fig-height
# ... chart code ...
plt.tight_layout()
plt.show()
plt.close('all')   # REQUIRED — prevents state leaking between chunks

rcParams That Must Not Be Changed

plt.rcParams.update({
    'savefig.bbox': None,        # fills declared figsize exactly — do NOT set to 'tight'
    'savefig.pad_inches': 0,
    'figure.dpi': 200,
    'savefig.dpi': 300,
})

Root Causes of "Comically Small" Figures

These are the diagnosed failure modes, in order of frequency:

1. Text labels extending beyond xlim/ylim — MOST COMMON

# ❌ BROKEN: annotation text positioned beyond axis limits
for bar, row in zip(bars, df.itertuples()):
    ax.text(bar.get_width() + 4, ..., f"{row.value}")  # +4 may push past xlim
ax.set_xlim(0, 380)  # text at bar.width+4 can exceed 380 → bbox explosion

# ✅ FIX: clip text labels to axes, OR increase xlim to accommodate labels
for bar, row in zip(bars, df.itertuples()):
    t = ax.text(bar.get_width() + 4, ..., f"{row.value}")
    t.set_clip_on(True)   # ← prevents bbox from expanding to include clipped text
# OR: ax.set_xlim(0, 420)  # ensure xlim accommodates largest label

Rule: After setting xlim, add t.set_clip_on(True) to ALL ax.text() calls, or add enough xlim headroom to fit the longest annotation.

2. Mixed coordinate transforms on annotations

# ❌ BROKEN: mixes data coordinates with axis-fraction transform
ax.annotate("", xy=(x0, y0 + 3), xytext=(x1, y1 + 3), ...)  # data coords
ax.text(0.5, max_val + 9, "label", transform=ax.get_xaxis_transform())  # mixed!

# ✅ FIX: use pure axes fraction for floating annotations
ax.annotate("label text",
            xy=(0.5, 0.85), xycoords='axes fraction',
            ha='center', va='center', fontsize=9, ...)

3. Missing plt.close('all') between chunks

plt.tight_layout()
plt.show()
plt.close('all')

4. fig-pos: "!ht" instead of "H"

5. Missing #| fig-width / #| fig-height on chunk

Without explicit chunk-level sizing, Quarto uses YAML defaults and may not pre-allocate the correct float box size before Python renders into it. Always specify both per-chunk.

Diagnosing Figure Size Issues

To identify which figures are malformed without waiting for a full visual review:

# Add keep-tex to render temporarily
cd /path/to/doc && uv run quarto render doc.qmd --to pdf -M keep-tex:true

# Check actual page dimensions of each generated figure PDF
for f in doc_files/figure-pdf/*.pdf; do
  echo -n "$f: "
  pdfinfo "$f" 2>/dev/null | grep "Page size"
done

# A correct 6.5×3.5in figure should be ~468×252 pts (at 72 pts/in)
# A figure with width << 440 pts or height >> 400 pts is malformed

The Nuclear Option

If a figure keeps rendering incorrectly despite all fixes, force PNG raster output:

#| label: fig-problematic
#| fig-pos: "H"
#| dev: png            # ← bypass the PDF vector pipeline entirely
#| dpi: 150
#| fig-width: 6.5
#| fig-height: 3.5
#| out-width: 100%

PNG output is immune to all the bbox/transform issues because matplotlib renders to a fixed-size raster and Quarto embeds it directly. Use as a last resort since vector PDF is crisper.

Caption Numbering

% Restore default numbering (Figure 1., Figure 2., etc.) with bold prefix:
\captionsetup{font={small,it},justification=centering,skip=6pt,labelfont=bf}

% Suppress numbering (caption text only, no "Figure N." prefix):
\captionsetup{font={small,it},justification=centering,skip=6pt,labelformat=empty,labelsep=none}

Never use bare % in fig-cap strings — write "percent" or "percentage points" instead. LaTeX may fail to compile depending on pandoc version.

Reference Lines (axvline/axhline) Opacity

Reference lines (baselines, averages) should be visible context, not dominant elements. Always set alpha=0.4:

ax.axvline(48.9, color=NAVY,  linestyle=":",  linewidth=1.6, alpha=0.4, label="Baseline", zorder=3)
ax.axhline(37.8, color=SLATE, linestyle="--", linewidth=1.4, alpha=0.4, label="Avg", zorder=3)

At alpha=1.0 (default), reference lines dominate the chart and compete with the data bars. alpha=0.4 keeps them readable without visual dominance.

Visual Expression Philosophy

CRITICAL: Quarto documents are for COMMUNICATION, not raw data dumps.

Quarto outputs are static documents meant to convey insights to humans. Raw dataframes, print statements, and JSON blobs fail to communicate effectively.

Visual Hierarchy (Use in Order)

Charts/Plots - For trends, distributions, comparisons, relationships
Formatted Tables - For structured data with styling and context
Formatted Metrics - For key numbers with context and formatting
Raw Output - NEVER (not even for debugging - use separate analysis files)

Anti-Patterns: What NOT to Do

# ❌ BAD: Raw dataframe dump
df.head()

# ❌ BAD: Print statements - output renders as plain text, not formatted markdown
print(f"Total: {total}")
print(data_dict)

# ❌ BAD: printf/print for text - use Markdown() instead
print("## Summary\n- Item 1\n- Item 2")  # Renders as plain text!

# ❌ BAD: Bare variable returning data structure
result  # Returns raw dict/JSON

# ❌ BAD: DataFrame info without formatting
df.describe()
df.info()

# ✅ GOOD: Use Markdown() for ALL text output
from IPython.display import Markdown
Markdown(f"""
## Summary

- **Total**: {total:,}
- **Average**: {avg:.2f}
""")

❌ BAD: Plain text for mathematical notation
- The growth rate is alpha = 0.15 or 15%
- We calculated the mean mu = sum(xi)/n
- The correlation coefficient r = 0.85

❌ BAD: No table formatting
```python
print(df.head())

❌ BAD: Using asterisks for equations

E = m * c^2
y = beta0 + beta1 * x


### Good Patterns: Visual Communication

```python
# ✅ GOOD: Chart for trends
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 6))
df.groupby('date')['sales'].sum().plot(ax=ax, kind='line')
ax.set_title('Sales Trend Over Time')
ax.set_ylabel('Sales ($)')
plt.tight_layout()
plt.show()

# ✅ GOOD: Formatted table using Great Tables
from great_tables import GT

(GT(df.head(10))
    .tab_header(title="Top 10 Sales Records")
    .fmt_currency(columns="sales", currency="USD")
    .fmt_date(columns="date", date_style="medium"))

# ✅ GOOD: Formatted table using pandas markdown
from IPython.display import Markdown
Markdown(df.head(10).to_markdown(index=False, tablefmt='pipe'))

# ✅ GOOD: Formatted metrics in markdown with LaTeX
from IPython.display import Markdown

Markdown(f"""
## Key Metrics

- **Total Sales**: ${total_sales:,.2f}
- **Average Order**: ${avg_order:,.2f}
- **Growth Rate**: $\\alpha = {growth_rate:.1%}$ (15% YoY)
- **Top Product**: {top_product}

### Statistical Summary

The linear regression model $y = \\beta_0 + \\beta_1 x + \\epsilon$ yielded:

- Slope: $\\hat{{\\beta_1}} = 3.2$ (SE = 0.4)
- $R^2 = 0.78$, indicating strong fit
""")

# NOTE: Mermaid diagrams must use native Quarto syntax outside Python blocks
# Use ```{mermaid} directly in markdown, NOT inside Markdown() calls

Why Visual Expression Matters

Documents are for humans: Show insights, not data structures
Static format: No interactive exploration - must communicate clearly on first view
Traceable reasoning: Visualize fact→conclusion chains, not raw JSON
Professional output: Charts and tables look polished in PDF/HTML/Word
Accessibility: Visual hierarchy helps readers navigate content
Shareability: Well-formatted outputs communicate without explanation

Narrative Structure

Build understanding progressively through a series of sections:

Document Flow

Abstract - The punchline first (executive summary for busy readers)
Key Findings - Scannable bullet points with confidence levels
Base Facts - Individual observations/data, each in its own section
Synthesis Sections - Combine earlier facts into higher-level insights
Conclusion - Final synthesis referencing the insights above

Base Facts (Individual Sections)

Each base fact is an independent observation with its own data and evidence. Use descriptive headers (not "Fact 1"):

## Response Time Distribution

\needspace{3in}

Analysis of the past 7 days shows significant tail latency:
- p50: 45ms
- p95: 230ms  
- p99: 890ms (concerning)

```{python}
#| echo: false
# Chart showing latency distribution


### Synthesis Sections

Synthesis sections **explicitly reference** which earlier sections they build upon:

```qmd
## Performance Degradation Under Load

\needspace{4in}

Building on the response time distribution and traffic patterns above, we 
observe a clear correlation: p99 latency spikes to 2.3s during the 2-4pm 
peak traffic window. The system handles baseline load well but degrades 
significantly under peak conditions.

Keeping Content Together (PDF)

Use \needspace{Xin} before sections with charts/diagrams to prevent awkward page breaks and large whitespace gaps:

## Revenue by Carrier

\needspace{4in}

```{python}
# Chart code here


**Guidelines:**
- **Mermaid diagrams**: `\needspace{2in}`
- **Single chart**: `\needspace{3in}`
- **Chart + explanation**: `\needspace{4in}`

**Requires** in YAML frontmatter:
```yaml
format:
  pdf:
    include-in-header:
      text: |
        \usepackage{needspace}

Markdown Formatting Rules (CRITICAL)

IMPORTANT: Markdown requires blank lines between paragraphs for proper rendering.

Line Breaks and Paragraphs

The Problem:

❌ BAD: This will render as one long line
This text appears on a new line in the source
But it renders on the same line as above
Because there's no blank line between them

The Solution:

✅ GOOD: This renders as separate paragraphs

This text appears on its own line because there's a blank line above it.

Each paragraph needs a blank line before and after it.

Common Markdown Patterns

Paragraphs (need blank lines):

This is paragraph one.

This is paragraph two.

This is paragraph three.

Lists REQUIRE blank line before the list:

❌ BAD: List doesn't render correctly
Here is some text:
- Item 1
- Item 2

✅ GOOD: Blank line before list
Here is some text:

- Item 1
- Item 2

Numbered lists also require blank line before:

❌ BAD: Numbered list broken
The steps are:
1. First step
2. Second step

✅ GOOD: Blank line before numbered list
The steps are:

1. First step
2. Second step

Lists (no blank lines between items):

- Item 1
- Item 2
- Item 3

Multi-paragraph list items (blank lines within item):

- Item 1 with first paragraph

  Item 1 continued with second paragraph (indented 2 spaces)

- Item 2 starts here

Headers (blank line before and after):

Previous paragraph ends here.

## Section Header

New paragraph starts here.

Code blocks (blank line before and after):

Previous paragraph ends here.

```python
print("code block")
```

New paragraph starts here.

Quarto-Specific Markdown

Python code output:

#| echo: false
from IPython.display import Markdown

# ❌ BAD: Single newline won't create paragraph break
Markdown("Line 1\nLine 2")  # Renders as: Line 1 Line 2

# ✅ GOOD: Double newline creates paragraph break
Markdown("Line 1\n\nLine 2")  # Renders as separate paragraphs

# ✅ GOOD: Use triple-quoted string with blank lines
Markdown("""
Paragraph one.

Paragraph two.

Paragraph three.
""")

F-strings in Markdown:

#| echo: false
from IPython.display import Markdown

# ✅ GOOD: Blank lines between paragraphs
Markdown(f"""
## Analysis Results

The growth rate is {growth_rate:.1%}.

This represents a significant increase over last quarter.

We recommend increasing inventory by {inventory_increase:,} units.
""")

Best Practices

✅ DO:

Use blank lines between all paragraphs
Use blank lines before and after headers
Use blank lines before and after code blocks
Use blank lines before and after tables
Use triple-quoted strings for multi-line markdown in Python

❌ DON'T:

Use single newlines and expect paragraph breaks
Forget blank lines around headers or code blocks
Mix single and double newlines inconsistently
Use \n in strings expecting paragraph breaks (use \n\n)

Testing Markdown Formatting

Quick test:

# Create test document
cat > test.qmd << 'EOF'
---
title: "Markdown Test"
format: gfm
---

Paragraph 1.

Paragraph 2.

## Header

Paragraph 3.
EOF

# Render and check output
quarto render test.qmd --to gfm
cat test.md

LLM Self-Reasoning with Quarto

Use /think command for structured analysis with graduated detail.

Document Structure

Documents use graduated detail so readers can stop at their desired depth:

Abstract (paragraph)

Self-contained executive summary
Complete story: what, why, result, meaning
Include key metrics and confidence assessment

Key Findings (3-5 bullets)

Result + confidence + brief evidence
Scannable - each finding valuable standalone
Format: **[Finding]**: [Result] — [Evidence] (Confidence)

Investigation (detailed)

Observations (sourced facts with academic citations [source])
Analysis (visual reasoning, statistical evidence)
Interpretation (what it means, confidence, dependencies)

Appendix (optional)

Investigation notes, dead ends, debugging traces
External data sources NOT directly referenced in code blocks (e.g., verbal conversations, meeting notes, prior analyses)
Do NOT duplicate data sources already expressed in code blocks - the code IS the source documentation

Visual Evidence

Include diagram, chart, or table for most findings:

Mermaid diagrams for reasoning flows
Charts for quantitative analysis
Formatted tables for comparative data

Example Invocation

/think Why is the login endpoint returning 500 errors intermittently?

Creates analysis document with abstract-first structure, key findings with confidence levels, and visual reasoning chains.

See /think command for full template.

When to Use Quarto

Perfect for:

LLM epistemological reasoning (use /think command)
Static reports and documentation (no interactivity needed)
Multi-format publishing (PDF + HTML + Word from single source)
Scientific documents (equations, citations, cross-references)
Presentations (RevealJS HTML slides, PowerPoint, Beamer PDF)
Websites and blogs (multi-page projects)
Exporting Jupyter notebooks for publication

NOT for:

Interactive dashboards (use Shiny or dedicated dashboard tools)
Real-time data updates (use web dashboards)
When you need widgets/sliders for end users

Data Provenance and Portability

CRITICAL: Quarto documents must be reproducible with documented dependencies.

When someone runs quarto render analysis.qmd, they should be able to get identical results given:

The document itself (.qmd file)
Documented external dependencies (with setup instructions)
Access to the same data sources (APIs, databases)

The Portability Contract

A .qmd file defines a complete data pipeline. The document contains:

Data extraction logic - How to obtain the data (queries, API calls, etc.)
Transformation code - How to process and analyze
Presentation - Charts, tables, narrative
Dependency documentation - Setup instructions for heavy dependencies

Practical limits: Some dependencies are too expensive to rebuild on every render:

Vector indexes (LanceDB, FAISS) - Document how to build, reference existing
Large datasets - Commit to git or document extraction, don't re-download
ML models - Reference by path with setup instructions

The key is documentation: readers must understand what's needed and how to set it up.

Anti-Pattern: Opaque File References

❌ BAD: Referencing local files without provenance

df = pd.read_json('/tmp/orders.jsonl', lines=True)  # Where did this come from?
df = pd.read_csv('sales.csv')  # Who created this? When? How?

# "This JSONL file" without explaining its origin
data = load_data('extracted_metrics.jsonl')  # Non-portable!

Good Pattern: Embed Data Extraction

✅ PREFERRED: Document defines where data comes from

#| cache: true
import subprocess
import io
import pandas as pd

# Extract from BigQuery - cached to avoid re-running on every render
result = subprocess.run([
    'bigquery', 'query',
    '''SELECT * FROM production.orders 
       WHERE date >= '2024-01-01' 
       AND status = 'completed' ''',
    '--format', 'jsonl'
], capture_output=True, text=True, check=True)

df = pd.read_json(io.StringIO(result.stdout), lines=True)

✅ ALSO GOOD: Canonical external sources

#| cache: true
import pandas as pd

# Public dataset with stable URL
df = pd.read_csv('https://data.company.com/public/sales-2024.csv')

# Or versioned data in the same repository
df = pd.read_csv('data/sales-2024-v2.csv')  # Committed to git with the .qmd

Heavy Dependencies: Document, Don't Rebuild

Rule of thumb: Un-cached renders should complete in < 60 seconds.

< 60 seconds → Embed in document (with cache: true)
> 60 seconds → Document as external dependency with setup instructions

Some dependencies are too expensive to recreate on every render. Document them clearly so readers can set up the environment.

✅ GOOD: Reference with setup documentation

#| echo: false
import subprocess

# DEPENDENCY: LanceDB index at ~/.lancedb/documents
# Setup: lancer ingest -t documents ~/corpus/*.md
# This index contains ~50k documents and takes ~10 min to build

result = subprocess.run(
    ['lancer', 'search', '-t', 'documents', 'shipping rate errors', '--limit', '20'],
    capture_output=True, text=True, check=True
)
relevant_docs = result.stdout

✅ GOOD: Prerequisites section in document

---
title: "Knowledge Base Analysis"
---

## Prerequisites

This analysis requires the following setup:

1. **LanceDB index**: `lancer ingest -t documents ~/corpus/*.md`
2. **BigQuery access**: Authenticated via `gcloud auth application-default login`
3. **Data snapshot**: Run `./scripts/extract-data.sh` (takes ~5 min)

## Analysis
...

❌ BAD: Silent dependency on local state

# No documentation about what this index is or how to create it
results = lancer.search("documents", "query")  # Will fail for anyone else

Caching for Iteration Speed

Use cache: true to avoid re-running expensive operations during iteration.

Requires jupyter-cache (one-time install):

uv pip install jupyter-cache

Per-cell caching:

#| cache: true
#| label: data-extraction

# This cell only re-executes if the code changes
result = subprocess.run(['bigquery', 'query', ...], capture_output=True, text=True)
df = pd.read_json(io.StringIO(result.stdout), lines=True)

Document-wide caching in YAML frontmatter:

---
title: "Analysis Report"
execute:
  cache: true
---

Freeze for Project-Level Caching

For projects with many documents, use freeze to cache execution results in version control:

# _quarto.yml (project config)
execute:
  freeze: auto  # Re-render only when source changes

Key difference:

cache: true - Caches cell outputs locally (Jupyter Cache)
freeze: auto - Stores results in _freeze/ directory (can commit to git)

When to use freeze:

Large projects with many collaborators
Documents with environment-specific dependencies
When you want cached results portable across machines (commit _freeze/)

Cache-Read-or-Query Pattern (BigQuery / Expensive APIs)

Use this pattern when cache: true is insufficient — specifically when:

Querying BigQuery or other expensive external APIs
Cache must survive Quarto kernel restarts (Jupyter cache does not)
You want explicit control over cache invalidation (not tied to source changes)
Cache files are machine-specific and should be gitignored

Set execute: cache: false in frontmatter when using this pattern (disable Jupyter cache to avoid double-caching).

Helper functions (add once per document, in a setup cell):

import json
from pathlib import Path
from datetime import datetime, timezone

_CACHE_DIR = Path('data/cache')
_CACHE_DIR.mkdir(parents=True, exist_ok=True)

def _cache_path(name):
    return _CACHE_DIR / f"{name}.json"

def _read_cache(name):
    p = _cache_path(name)
    if p.exists():
        return json.loads(p.read_text())
    return None

def _write_cache(name, records, scalars=None):
    p = _cache_path(name)
    p.write_text(json.dumps({
        '_queried_at': datetime.now(timezone.utc).isoformat(),
        'records': records,
        'scalars': scalars or {}
    }, default=str))

Per-dataset usage template (repeat for each dataset):

_c = _read_cache('my_dataset')
if _c:
    df = pd.DataFrame(_c['records'])
    my_scalar = float(_c['scalars']['my_scalar'])
else:
    df = run_bq_query(my_query)
    my_scalar = float(df['col'].values[0])
    _write_cache('my_dataset', df.to_dict(orient='records'), {
        'my_scalar': my_scalar,
    })

Never do:

except Exception: my_scalar = 42 — silent fallback masks broken queries
Assign a constant inside try: without a preceding query call — hidden constant, not a live value
Omit _write_cache() in the else branch — next render re-queries unnecessarily

Cache file format:

{
    "_queried_at": "2026-02-20T16:00:00Z",
    "records": [...],
    "scalars": {...}
}

Serialization notes:

Numpy arrays → store as lists (df['col'].tolist()), reconstruct with np.array(...)
Dates → use default=str in json.dumps to handle non-serializable types

Cache invalidation:

# .gitignore
data/cache/

# Justfile recipe
delete-cache:
    rm -rf data/cache/

Complete Example: Reproducible Analysis

---
title: "Q4 2024 Sales Analysis"
author: "Josh Lane"
date: "2024-12-31"
format:
  gfm:
    wrap: none
  html:
    theme:
      dark: darkly
      light: flatly
execute:
  cache: true
filters:
  - auto-dark
---

## Data Extraction

```{python}
#| cache: true
#| label: extract-sales
import subprocess
import io
import pandas as pd

# Reproducible: Query is embedded in the document
result = subprocess.run([
    'bigquery', 'query',
    '''
    SELECT date, product, region, sales, units
    FROM production.sales
    WHERE EXTRACT(QUARTER FROM date) = 4
      AND EXTRACT(YEAR FROM date) = 2024
    ''',
    '--format', 'jsonl'
], capture_output=True, text=True, check=True)

df = pd.read_json(io.StringIO(result.stdout), lines=True)
df['date'] = pd.to_datetime(df['date'])

Analysis

#| echo: false
from IPython.display import Markdown

total_sales = df['sales'].sum()
top_product = df.groupby('product')['sales'].sum().idxmax()

Markdown(f"""
### Key Metrics

- **Total Q4 Sales**: ${total_sales:,.2f}
- **Top Product**: {top_product}
- **Records**: {len(df):,}
""")


### Best Practices Summary

✅ **DO:**

- Embed data extraction commands in the document
- Use `cache: true` for expensive operations
- Reference canonical external URLs when possible
- Commit small data files alongside the `.qmd`
- Use `freeze` for project-level caching

❌ **DON'T:**

- Reference opaque local files (`/tmp/data.jsonl`)
- Say "this file" without showing how it was created
- Assume the reader has access to your local machine
- Leave data provenance undocumented

## Decision Tree: Quarto vs Alternatives


## Installation

Quarto is already installed (version 1.8.27).

**Optional dependencies:**

```bash
# TinyTeX for better PDF generation (LaTeX)
quarto install tinytex

# Chromium for PDF generation (alternative to LaTeX)
quarto install chromium

Current setup:

✅ Pandoc 3.6.3 (embedded)
✅ Chrome headless (system installation)
✅ Python 3.14.2 detected
✅ Jupyter installed (for Python code execution)
❌ TinyTeX not installed (optional)

Neovim Integration

quarto-nvim plugin is installed and configured.

Features:

LSP support for .qmd files (Python, bash, lua, html code chunks)
Syntax highlighting for code chunks
Diagnostics and completion in code cells
Live preview with :QuartoPreview

Keybindings:

<leader>qp - Preview current document (live reload)
<leader>qc - Close preview
<leader>qm - Render to markdown (GFM) - RECOMMENDED default
<leader>qh - Render to HTML
<leader>qd - Render to PDF

Treesitter support: Quarto syntax highlighting requires treesitter parsers:

# In Neovim
:TSInstall markdown
:TSInstall markdown_inline
:TSInstall python

Basic Usage

Quick Start: Native .qmd Files

Create a Quarto document:

# Create analysis.qmd
cat > analysis.qmd << 'EOF'
---
title: "Sales Analysis Q4 2024"
author: "Josh Lane"
date: "2024-01-30"
format:
  gfm:
    wrap: none              # No line wrapping
  html:
    theme:
      dark: darkly          # Dark mode (recommended)
      light: flatly         # Light mode fallback
    code-fold: true
execute:
  cache: true               # Cache expensive queries
filters:
  - auto-dark               # Respect system preference
---

## Data Extraction

```{python}
#| cache: true
#| echo: false
import subprocess
import io
import pandas as pd

# Reproducible: Query embedded in document
result = subprocess.run([
    'bigquery', 'query',
    '''SELECT date, product, sales 
       FROM production.sales 
       WHERE EXTRACT(QUARTER FROM date) = 4 
       AND EXTRACT(YEAR FROM date) = 2024''',
    '--format', 'jsonl'
], capture_output=True, text=True, check=True)

df = pd.read_json(io.StringIO(result.stdout), lines=True)
df['date'] = pd.to_datetime(df['date'])

Overview

#| echo: false
import matplotlib.pyplot as plt
from IPython.display import Markdown

total_sales = df['sales'].sum()
avg_daily = df.groupby('date')['sales'].sum().mean()
top_product = df.groupby('product')['sales'].sum().idxmax()

# Display key metrics
Markdown(f"""
### Key Metrics

- **Total Sales**: ${total_sales:,.2f}
- **Average Daily Sales**: ${avg_daily:,.2f}
- **Top Product**: {top_product}
- **Records Analyzed**: {len(df):,}
""")

Sales Trend Analysis

#| label: fig-sales-trend
#| fig-cap: "Daily sales trending upward in Q4 2024"
#| echo: false

fig, ax = plt.subplots(figsize=(10, 6))
daily_sales = df.groupby('date')['sales'].sum()
daily_sales.plot(ax=ax, kind='line', linewidth=2, color='#2E86AB')
ax.set_title('Sales Trend Over Time', fontsize=14, fontweight='bold')
ax.set_xlabel('Date')
ax.set_ylabel('Sales ($)')
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

Top Products

#| echo: false
# Format top 10 products as markdown table
top_products = (df.groupby('product')['sales']
                .sum()
                .sort_values(ascending=False)
                .head(10)
                .reset_index())

top_products.columns = ['Product', 'Total Sales']
top_products['Total Sales'] = top_products['Total Sales'].apply(lambda x: f"${x:,.2f}")

Markdown(top_products.to_markdown(index=False, tablefmt='pipe'))

Statistical Analysis

#| echo: false
import numpy as np

# Calculate growth metrics
daily_sales = df.groupby('date')['sales'].sum()
growth_rate = (daily_sales.iloc[-1] - daily_sales.iloc[0]) / daily_sales.iloc[0]
avg_growth = daily_sales.pct_change().mean()

Markdown(f"""
The sales data exhibits a compound growth pattern modeled by:

$$
S(t) = S_0 \\times (1 + r)^t
$$

where $S_0$ represents initial sales, $r = {avg_growth:.3f}$ is the average daily growth rate, 
and $t$ is time in days.

**Key Statistical Findings:**

- Overall Q4 growth: $\\Delta S = {growth_rate:.1%}$
- Average daily growth: $\\bar{{r}} = {avg_growth:.3%}$
- Standard deviation: $\\sigma = {daily_sales.std():,.2f}$
- Correlation with marketing spend: $\\rho = 0.82$ (strong positive)

These metrics indicate statistically significant growth ($p < 0.01$) with 
consistent upward momentum throughout the quarter.
""")

Conclusion

Q4 2024 showed strong performance with total sales of ${total_sales:,.2f}. The upward trend in daily sales indicates positive momentum heading into the next quarter. EOF

Install auto-dark extension (one-time, if not already installed)

quarto add gadenbuie/quarto-auto-dark --no-prompt

Render to markdown (DEFAULT - composable, archival)

quarto render analysis.qmd --to gfm

Optional: Render to HTML with dark mode for web viewing

quarto render analysis.qmd --to html # Uses auto-dark theme

Optional: Render to other formats only when needed

quarto render analysis.qmd --to pdf # For Google Drive sharing/printing quarto render analysis.qmd --to html # For web viewing


### File Formats Quarto Can Render

**Input formats:**
- `.qmd` - Quarto markdown (native, recommended)
- `.ipynb` - Jupyter notebooks
- `.md` - Plain markdown (no code execution)
- `.Rmd` - R Markdown files

**Output formats:**
- **Markdown**: `md` (plain), `gfm` (GitHub-flavored) - **PREFERRED default**
- **Documents**: PDF, HTML, Word, ODT, ePub, Typst
- **Presentations**: RevealJS (HTML), PowerPoint, Beamer (PDF)
- **Websites**: Multi-page sites, blogs, books
- **Dashboards**: Interactive dashboards (with Shiny or Observable JS)

## Core Commands

```bash
# Render document to markdown (DEFAULT - composable, text-based)
quarto render document.qmd --to md           # Executable markdown with results
quarto render document.qmd --to gfm          # GitHub-flavored markdown

# Render to other formats
quarto render document.qmd --to pdf          # PDF (for Google Drive sharing)
quarto render document.qmd --to html         # HTML (avoid --toc, use clear headings)

# Render Jupyter notebook to markdown
quarto render notebook.ipynb --to md         # Markdown with executed results

# Render plain markdown (no code execution)
quarto render README.md --to pdf

# Multiple formats at once
quarto render document.qmd --to md,pdf,html

# Preview with live reload
quarto preview document.qmd

# Create new project
quarto create project website mysite
quarto create project book mybook

# Publish
quarto publish gh-pages                       # GitHub Pages
quarto publish quarto-pub                     # Quarto Pub
quarto publish netlify                        # Netlify

Python Code Execution

Code Blocks

## Analysis Section

```{python}
import pandas as pd

df = pd.read_csv("data.csv")
df.head()
```

Code Block Options

```{python}
#| label: fig-sales
#| fig-cap: "Sales over time"
#| echo: false
#| warning: false

plt.figure(figsize=(10, 6))
df.plot(x='date', y='sales')
plt.show()
```

Common options:

echo: false - Hide code, show output only
code-fold: true - Collapsible code blocks
warning: false - Hide warnings
message: false - Hide messages
label: fig-name - Reference label for cross-references
fig-cap: "Caption" - Figure caption

Inline Python Expressions

The total is `{python} f"${total:,.2f}"`.

There are `{python} len(df)` rows in the dataset.

Output Formatting

## Display Options

```{python}
#| output: asis
print("**Bold text** from code")
```

```{python}
#| output: false
# Code runs but output is hidden
result = expensive_calculation()
```

Table Formatting (CRITICAL)

NEVER use raw df.head() or bare dataframes. ALWAYS format tables for presentation.

Option 1: Great Tables (RECOMMENDED for rich formatting)

Installation:

uv add great-tables

Basic usage:

from great_tables import GT

# Simple formatted table
GT(df.head(10))

# With styling
(GT(df.head(10))
    .tab_header(title="Sales Summary", subtitle="Q4 2024")
    .fmt_currency(columns="sales", currency="USD")
    .fmt_percent(columns="growth_rate", decimals=1)
    .fmt_number(columns="quantity", decimals=0)
    .fmt_date(columns="date", date_style="medium")
    .tab_source_note("Source: Company Database"))

Advanced styling:

from great_tables import GT, loc, style

(GT(top_products)
    .tab_header(title="Top 10 Products by Revenue")
    .fmt_currency(columns="revenue", currency="USD")
    .data_color(
        columns="revenue",
        palette=["lightblue", "darkblue"],
        domain=[0, df['revenue'].max()]
    )
    .tab_style(
        style=style.text(weight="bold"),
        locations=loc.body(columns="product")
    ))

Option 2: pandas `.to_markdown()` (Simple, built-in)

For markdown output (use Markdown() to render properly):

from IPython.display import Markdown

# Basic markdown table
Markdown(df.head(10).to_markdown(index=False, tablefmt='pipe'))

# With custom formatting
formatted_df = df.head(10).copy()
formatted_df['sales'] = formatted_df['sales'].apply(lambda x: f"${x:,.2f}")
formatted_df['date'] = pd.to_datetime(formatted_df['date']).dt.strftime('%Y-%m-%d')
Markdown(formatted_df.to_markdown(index=False, tablefmt='pipe'))

Table format options:

'pipe' - GitHub-flavored markdown pipes (RECOMMENDED)
'grid' - ASCII grid
'simple' - Simple spacing
'html' - HTML table (for HTML output)

Option 3: tabulate (Flexible formatting)

Installation:

uv add tabulate

Usage:

from tabulate import tabulate
from IPython.display import Markdown

# Basic table
Markdown(tabulate(df.head(10), headers='keys', tablefmt='pipe', showindex=False))

# With custom formatting
Markdown(tabulate(
    df.head(10),
    headers=['Product', 'Sales', 'Date'],
    tablefmt='pipe',
    floatfmt='.2f',
    showindex=False
))

Best Practices for Tables

✅ DO:

Use Great Tables for HTML/PDF output (rich formatting)
Use .to_markdown() for markdown output (simplicity)
Format currency, percentages, dates before display
Add titles, subtitles, and source notes
Limit to top N rows (10-20 max) - don't dump entire dataset
Apply color scales for numerical columns
Bold headers and important columns
Pair non-trivial tables with visuals - tables showing trends, comparisons, or distributions need an accompanying chart

❌ DON'T:

Use raw df.head() without formatting
Display more than 20 rows in a table
Show raw timestamps or unformatted numbers
Include index column unless meaningful
Use print() for tables (use Markdown() instead to render properly)
Present analytical tables (trends, comparisons) without an accompanying chart

Table Formatting by Output Format

For GFM/Markdown:

from IPython.display import Markdown
# Use Markdown() to render tables properly in Quarto
Markdown(df.head(10).to_markdown(index=False, tablefmt='pipe'))

For HTML:

# Use Great Tables for rich styling
from great_tables import GT
GT(df.head(10)).fmt_currency(columns="sales")

For PDF:

# Use Great Tables or formatted markdown
# Great Tables renders well in PDF via LaTeX
GT(df.head(10))

Data Visualization Best Practices

Charts and diagrams should be your PRIMARY communication tool, not an afterthought.

When to Use Charts vs Tables

Use Charts for:

Trends over time (line charts)
Distributions (histograms, density plots)
Comparisons across categories (bar charts)
Proportions and composition (pie charts, stacked bars)
Relationships between variables (scatter plots)
Geographic data (maps, choropleth)

Use Tables for:

Precise values needed (financial reports)
Lookup reference (top N items)
Multiple dimensions that don't visualize well
Small datasets (< 20 rows)

Use Both:

Chart for the trend, table for the details
Chart for overview, table for drill-down

Chart Types by Use Case

Trends and Time Series:

import matplotlib.pyplot as plt

# Line chart for trends
fig, ax = plt.subplots(figsize=(10, 6))
df.groupby('date')['sales'].sum().plot(ax=ax, kind='line', linewidth=2)
ax.set_title('Sales Trend Over Time', fontsize=14, fontweight='bold')
ax.set_ylabel('Sales ($)')
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

Comparisons:

# Bar chart for category comparisons
fig, ax = plt.subplots(figsize=(10, 6))
top_products = df.groupby('product')['sales'].sum().nlargest(10)
top_products.plot(ax=ax, kind='barh', color='#2E86AB')
ax.set_title('Top 10 Products by Sales', fontsize=14, fontweight='bold')
ax.set_xlabel('Sales ($)')
plt.tight_layout()
plt.show()

Distributions:

# Histogram for distributions
fig, ax = plt.subplots(figsize=(10, 6))
df['order_value'].hist(bins=30, ax=ax, color='#A23B72', edgecolor='black')
ax.set_title('Order Value Distribution', fontsize=14, fontweight='bold')
ax.set_xlabel('Order Value ($)')
ax.set_ylabel('Frequency')
plt.tight_layout()
plt.show()

Proportions:

# Pie chart for proportions (use sparingly)
fig, ax = plt.subplots(figsize=(8, 8))
category_sales = df.groupby('category')['sales'].sum()
ax.pie(category_sales, labels=category_sales.index, autopct='%1.1f%%', startangle=90)
ax.set_title('Sales by Category', fontsize=14, fontweight='bold')
plt.show()

Relationships:

# Scatter plot for correlations
fig, ax = plt.subplots(figsize=(10, 6))
ax.scatter(df['marketing_spend'], df['sales'], alpha=0.5, color='#F18F01')
ax.set_title('Marketing Spend vs Sales', fontsize=14, fontweight='bold')
ax.set_xlabel('Marketing Spend ($)')
ax.set_ylabel('Sales ($)')
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

Styling Best Practices

✅ DO:

Use descriptive titles with context
Label axes with units
Use color purposefully (not just default colors)
Add gridlines for readability (alpha=0.3)
Set appropriate figure size (10x6 for landscape, 8x8 for square)
Use plt.tight_layout() to prevent label cutoff
Add legends when showing multiple series
Use consistent color schemes across document

❌ DON'T:

Use default ugly matplotlib colors
Skip axis labels or titles
Create tiny, unreadable charts
Use 3D charts (hard to read accurately)
Overload charts with too many series (> 5-7)
Use pie charts for more than 5 categories

Mermaid Diagrams (Quarto Native Syntax)

Quarto uses {mermaid} executable code blocks. Standard GFM ```mermaid fences won't render.

Quarto syntax:

```{mermaid}
flowchart TD
    A[Load Data] --> B[Clean Data]
    B --> C[Analyze]
```

Note: GFM ```mermaid blocks and Markdown('```mermaid...') calls output raw text instead of rendered diagrams. Use the native {mermaid} syntax directly in markdown.

Common Mermaid diagram types:

flowchart - Process flows, decision trees
sequenceDiagram - Interaction sequences
classDiagram - Object relationships
erDiagram - Entity-relationship diagrams
gantt - Project timelines
pie - Simple pie charts

Example: Reasoning flow:

```{mermaid}
flowchart LR
    F1[Fact: Q4 Sales = $2.1M] --> C1[Conclusion: Strong Quarter]
    F2[Fact: YoY Growth = 15%] --> C1
    F3[Fact: Top Product = Widget X] --> C2[Conclusion: Focus Marketing on Widgets]
```

Preventing Mermaid Clipping in PDF:

Mermaid diagrams can get clipped in PDF output when they exceed page width. Use %%{init}%% directives to control sizing:

```{mermaid}
%%{init: {"flowchart": {"useMaxWidth": true}}}%%
flowchart TD
    A[Start] --> B[Process]
    B --> C[End]
```

Best practices for PDF mermaid:

Always use useMaxWidth: true for flowcharts in PDF output
Prefer TD (top-down) over LR (left-right) for wide diagrams
Break complex diagrams into multiple smaller diagrams
Use shorter node labels to reduce width
Add to YAML frontmatter for document-wide defaults:

format:
  pdf:
    mermaid:
      theme: default

Interactive Charts (HTML Output Only)

For HTML output, use Plotly for interactivity:

import plotly.express as px

# Interactive line chart
fig = px.line(df, x='date', y='sales', title='Sales Trend (Interactive)')
fig.update_layout(hovermode='x unified')
fig.show()

# Interactive scatter with hover
fig = px.scatter(df, x='marketing_spend', y='sales', 
                 hover_data=['product', 'region'],
                 title='Marketing ROI Analysis')
fig.show()

Note: Plotly charts only work in HTML output. For PDF/Word, use matplotlib/seaborn.

Chart Selection Decision Tree

What are you showing?

├─ Change over time? → Line chart
├─ Compare categories? → Bar chart (vertical or horizontal)
├─ Show distribution? → Histogram or box plot
├─ Show composition? → Stacked bar or pie chart (if < 5 categories)
├─ Show relationship? → Scatter plot
├─ Show process/flow? → Mermaid flowchart
└─ Multiple variables? → Faceted plots or small multiples

Mathematical Notation with LaTeX (STRONGLY ENCOURAGED)

For any mathematical content, ALWAYS use LaTeX notation - it's professional and renders beautifully in all formats.

Inline Math

The equation $E = mc^2$ shows the relationship between energy and mass.

The growth rate is approximately $\alpha = 0.15$ or 15%.

We calculated the mean $\mu = \frac{\sum x_i}{n}$ from the dataset.

Display Math (Equations)

The quadratic formula is:

$$
x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
$$

The normal distribution probability density function:

$$
f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}
$$

Aligned Equations

$$
\begin{aligned}
\text{Revenue} &= \text{Price} \times \text{Quantity} \\
               &= \$50 \times 1000 \\
               &= \$50{,}000
\end{aligned}
$$

Common Mathematical Expressions

Statistics:

- Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
- Variance: $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2$
- Standard deviation: $\sigma = \sqrt{\sigma^2}$
- Correlation: $\rho_{X,Y} = \frac{\text{cov}(X,Y)}{\sigma_X \sigma_Y}$

Finance:

- Compound interest: $A = P\left(1 + \frac{r}{n}\right)^{nt}$
- NPV: $NPV = \sum_{t=0}^{N} \frac{C_t}{(1+r)^t}$
- ROI: $ROI = \frac{\text{Gain} - \text{Cost}}{\text{Cost}} \times 100\%$

Linear regression:

$$
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon
$$

where $\beta_0$ is the intercept, $\beta_i$ are coefficients, and $\epsilon \sim N(0, \sigma^2)$.

Numbered Equations (for cross-references)

Einstein's mass-energy equivalence:

$$
E = mc^2
$$ {#eq-einstein}

As shown in @eq-einstein, energy and mass are equivalent.

Mathematical Notation Best Practices

✅ DO:

Use LaTeX for ALL mathematical expressions, even simple ones like percentages
Use \text{} for text within equations: $\text{Revenue} = \$1{,}000$
Use proper notation: \alpha, \beta, \mu, \sigma, \sum, \prod, \int
Number important equations for cross-referencing
Use aligned environment for multi-line equations
Format numbers properly: \$1{,}000 for currency with comma separators
Use \times for multiplication: $5 \times 10$
Use \cdot for dot product: $\vec{a} \cdot \vec{b}$

❌ DON'T:

Write "alpha = 0.15" in plain text - use $\alpha = 0.15$
Write "x^2" in plain text - use $x^2$
Use asterisk for multiplication - use $\times$ or $\cdot$
Mix LaTeX and plain text notation inconsistently
Skip equation numbering for important formulas

Example: Statistical Report with LaTeX

## Regression Analysis

We fitted a linear model:

$$
\text{Sales} = \beta_0 + \beta_1 \times \text{Marketing Spend} + \epsilon
$$ {#eq-sales-model}

where $\epsilon \sim N(0, \sigma^2)$ represents random error.

### Results

The estimated parameters from @eq-sales-model are:

- Intercept: $\hat{\beta_0} = 50{,}000$ (SE = $2{,}500$)
- Slope: $\hat{\beta_1} = 3.2$ (SE = $0.4$)
- $R^2 = 0.78$

This indicates that each additional \$1 in marketing spend yields approximately \$3.20 in sales ($p < 0.001$).

LaTeX Tables (Professional Formatting)

For publication-quality tables in PDF output, use LaTeX table formatting alongside Great Tables.

Basic LaTeX Table

```{=latex}
\begin{table}[htbp]
\centering
\caption{Quarterly Sales Performance}
\label{tab:sales}
\begin{tabular}{lrrrr}
\hline
Quarter & Revenue (\$) & Growth (\%) & Units & Margin (\%) \\
\hline
Q1 2024 & 1,250,000 & 15.2 & 5,000 & 22.5 \\
Q2 2024 & 1,450,000 & 16.0 & 5,800 & 23.1 \\
Q3 2024 & 1,680,000 & 15.9 & 6,700 & 24.0 \\
Q4 2024 & 1,920,000 & 14.3 & 7,300 & 24.5 \\
\hline
\textbf{Total} & \textbf{6,300,000} & \textbf{15.4} & \textbf{24,800} & \textbf{23.5} \\
\hline
\end{tabular}
\end{table}
```

Enhanced LaTeX Table with booktabs (RECOMMENDED)

```{=latex}
\begin{table}[htbp]
\centering
\caption{Statistical Summary of Key Metrics}
\label{tab:statistics}
\begin{tabular}{lcccc}
\toprule
Metric & Mean & SD & Min & Max \\
\midrule
Sales (\$) & 125{,}000 & 25{,}000 & 75{,}000 & 200{,}000 \\
Orders & 500 & 120 & 300 & 750 \\
AOV (\$) & 250 & 45 & 180 & 380 \\
Churn (\%) & 12.5 & 3.2 & 8.0 & 18.5 \\
\bottomrule
\end{tabular}
\end{table}
```

booktabs provides professional-looking horizontal rules (better than \hline).

Add to YAML frontmatter for booktabs:

header-includes:
  - \usepackage{booktabs}

Python-Generated LaTeX Tables

Option 1: Great Tables with LaTeX output

from great_tables import GT

# Great Tables can export to LaTeX
table = GT(df.head(10))
table.save("table.tex", format="latex")

Option 2: pandas to_latex() with styling

import pandas as pd

# Format DataFrame for LaTeX
df_formatted = df.head(10).copy()
df_formatted['Sales'] = df_formatted['Sales'].apply(lambda x: f"\\${x:,.0f}")
df_formatted['Growth'] = df_formatted['Growth'].apply(lambda x: f"{x:.1f}\\%")

# Export to LaTeX with booktabs
latex_table = df_formatted.to_latex(
    index=False,
    caption="Top 10 Products by Sales",
    label="tab:top-products",
    position="htbp",
    column_format="lrrr",
    escape=False,  # Don't escape $ and %
    formatters={
        'Sales': lambda x: f"\\${x:,.0f}",
        'Growth': lambda x: f"{x:.1f}\\%"
    }
)

print(latex_table)

Option 3: tabulate with LaTeX output

from tabulate import tabulate

latex_table = tabulate(
    df.head(10),
    headers='keys',
    tablefmt='latex_booktabs',  # Use booktabs style
    showindex=False,
    floatfmt='.2f'
)
print(f"\\begin{{table}}[htbp]\n\\centering\n\\caption{{Sales Summary}}\n{latex_table}\n\\end{{table}}")

LaTeX Table Best Practices

✅ DO:

Use booktabs package for professional horizontal rules (\toprule, \midrule, \bottomrule)
Add captions with \caption{}
Add labels for cross-referencing with \label{tab:name}
Use position specifiers: [htbp] (here, top, bottom, page)
Right-align numbers, left-align text: {lrr} column format
Format numbers: Use thousand separators (1{,}000), proper decimal places
Use \textbf{} for bold text (totals, headers)
Center the table with \centering

❌ DON'T:

Use \hline - use booktabs rules instead (\toprule, \midrule, \bottomrule)
Skip captions - tables should always be labeled
Mix LaTeX and markdown tables in the same document
Use vertical lines (|) - they look unprofessional
Forget to escape special characters: $, %, &

Cross-Referencing LaTeX Tables

As shown in Table @tbl-sales, revenue increased across all quarters.

```{=latex}
\begin{table}[htbp]
\centering
\caption{Quarterly Revenue}
\label{tbl-sales}
...
\end{table}

Results from @tbl-sales indicate strong growth momentum.


#### When to Use LaTeX Tables vs Great Tables

**Use LaTeX tables when:**
- Creating PDF output with publication-quality typesetting
- Need precise control over table layout and spacing
- Working with complex multi-row/multi-column headers
- Creating tables for academic papers or formal reports
- Need to match specific journal formatting requirements

**Use Great Tables when:**
- Creating HTML output with interactive features
- Need quick table formatting without LaTeX complexity
- Working with markdown output
- Want consistent styling across HTML/PDF/Word formats
- Need color scales, data bars, or rich HTML styling

**Use both:**
```python
# Create table with Great Tables for HTML
gt_table = GT(df).fmt_currency(columns="sales")
gt_table  # Displays in HTML

# Also export LaTeX version for PDF
df.to_latex(caption="Sales Summary", label="tab:sales")

YAML Frontmatter

Simple Document (Markdown Default)

---
title: "My Report"
author: "Josh Lane"
date: "2024-01-30"
format: gfm  # GitHub-flavored markdown (default for composability)
---

Multiple Formats (Markdown + HTML with Dark Mode)

---
title: "Analysis Report"
format:
  gfm:
    wrap: none
    variant: +yaml_metadata_block
  html:
    theme:
      dark: darkly          # Dark mode (recommended)
      light: flatly         # Light mode fallback
    toc: true
    code-fold: true
    code-tools: true
  pdf:
    toc: true
    number-sections: true
    geometry: margin=1in
filters:
  - auto-dark               # Auto-detect system preference
---

Markdown-Only Output

---
title: "Data Analysis"
format:
  md:
    output-file: "results.md"
    variant: gfm  # Use GitHub-flavored markdown
    preserve-yaml: true
  gfm:
    wrap: none
    output-file: "results-gfm.md"
---

Advanced PDF Options

---
title: "Technical Report"
format:
  pdf:
    documentclass: article
    fontsize: 11pt
    geometry:
      - margin=1in
      - paperwidth=8.5in
      - paperheight=11in
    toc: true
    toc-depth: 2
    number-sections: true
    colorlinks: true
    fig-pos: 'H'
    include-in-header:
      text: |
        \usepackage{fancyhdr}
        \pagestyle{fancy}
        \fancyhead[L]{My Company}
        \fancyhead[R]{\thepage}
---

Minimal PDF Header Style (DEFAULT for Reports)

Use this pattern instead:

YAML frontmatter — omit title, author, date; suppress page number on page 1:

---
format:
  pdf:
    toc: false
    number-sections: false
    geometry: margin=1in
    fontsize: 11pt
    documentclass: article
    pdf-engine: lualatex
    include-before-body:
      text: |
        \thispagestyle{empty}
execute:
  echo: false
  warning: false
jupyter: python3
---

Inline header — put this immediately after the YAML block as the first content:

```{=latex}
\begin{minipage}[t]{0.65\textwidth}
  {\large\textbf{Document Title · Subtitle}}
\end{minipage}%
\begin{minipage}[t]{0.35\textwidth}
  \raggedleft{\small Josh Lane · Feb 2026}
\end{minipage}
\vspace{3pt}
\hrule
\vspace{10pt}
```

Section headings — replace all Markdown #/## headings with raw LaTeX:

```{=latex}
\noindent{\large\textbf{Section Title}}
\vspace{6pt}
```

Why this is better:

Single-line compact header — title and author/date on one row
No wasted vertical space from \maketitle
Section labels match body font size — no jarring size jumps
Looks like a professional memo/report, not an academic paper

HTML Themes

PREFER auto-dark with dual themes (see next section) over single-theme HTML:

---
format:
  html:
    theme:
      dark: darkly     # RECOMMENDED: Dual theme with auto-dark
      light: flatly
    css: custom.css
    toc: true
    toc-location: left
    code-fold: show
    code-tools: true
filters:
  - auto-dark          # Respects user's system preference
---

Single theme (discouraged - doesn't respect user preference):

---
format:
  html:
    theme: darkly      # Only use if auto-dark not available
    css: custom.css
    toc: true
    toc-location: left
---

Dark Mode with Auto-Dark Extension (RECOMMENDED)

Dark mode is STRONGLY ENCOURAGED for all HTML output:

Better accessibility and reduced eye strain
Modern user expectation (most systems default to dark mode)
Auto-dark extension respects user's system preference

Install auto-dark extension (one-time setup):

# In your project or home directory (applies to all projects)
quarto add gadenbuie/quarto-auto-dark --no-prompt

Use in document (RECOMMENDED default):

---
title: "My Analysis"
format:
  html:
    theme:
      dark: darkly      # Theme for dark mode (PRIMARY)
      light: flatly     # Theme for light mode (fallback)
filters:
  - auto-dark          # Automatically switch based on system preference
---

Best Practices:

✅ Always include auto-dark filter for HTML output
✅ Choose accessible dark theme (darkly, cyborg, slate)
✅ Test both dark and light modes if providing light fallback
⚠️ Single-theme HTML discouraged (use auto-dark instead)

Available dark themes:

darkly - Dark Bootstrap theme (RECOMMENDED - clean, professional)
cyborg - Dark blue theme (good for technical docs)
slate - Dark gray theme (subtle, minimal)
solar - Dark solarized theme (warm, comfortable)
superhero - Dark comic book theme (bold, high contrast)
vapor - Dark retro theme (stylized)

Light themes (fallback only):

flatly - Clean modern theme (recommended fallback)
cosmo - Friendly blue theme
lumen - Light gray theme
sandstone - Warm sandy theme
minty - Fresh mint theme
journal - Newspaper style

Table of Contents (Usually Noise - Avoid)

IMPORTANT: Table of contents (TOC) is usually unnecessary noise in Quarto documents.

Why Avoid TOC

Problems with TOC:

❌ Adds visual clutter without adding value
❌ Redundant when you have clear section headings
❌ Takes up space at the top of the document
❌ Users can navigate with Ctrl+F or scroll
❌ In HTML, browsers have Find function
❌ In PDF, readers have built-in navigation
❌ Makes documents feel like academic papers (overly formal)

Better alternatives:

✅ Use clear, descriptive section headings
✅ Keep documents focused and concise
✅ Use visual hierarchy (# ## ### headings)
✅ Add anchor links manually if needed
✅ Trust readers to navigate using browser/PDF tools

When TOC Might Be Acceptable

ONLY use TOC for:

Very long documents (>20 pages)
Books or comprehensive guides
Multi-chapter documents
When explicitly required by style guide

Even then, prefer:

Sidebar TOC (not top-of-page)
Collapsible TOC
Floating TOC that doesn't obscure content

How to Disable TOC

Don't include toc: true in YAML frontmatter:

# ❌ BAD: Unnecessary TOC
---
title: "My Analysis"
format:
  html:
    toc: true        # DON'T DO THIS
---

# ✅ GOOD: Clean document without TOC
---
title: "My Analysis"
format:
  html:
    theme:
      dark: darkly
      light: flatly
---

If you must use TOC, make it minimal:

format:
  html:
    toc: true
    toc-depth: 2           # Only top 2 levels
    toc-location: left     # Sidebar, not top
    toc-title: "Contents"  # Short title

PDF for Google Drive Sharing

PDF format is PREFERRED for sharing via Google Drive (read-only, professional appearance).

Rendering to PDF

# Basic PDF output
quarto render analysis.qmd --to pdf

# Multiple formats
quarto render analysis.qmd --to gfm,pdf,html

Upload to Google Drive

# 1. Render to PDF
quarto render analysis.qmd --to pdf

# 2. Upload to Google Drive
# - Go to drive.google.com
# - Click "New" → "File upload"
# - Select analysis.pdf
# - Share link with collaborators

# Alternative: Use gspace CLI (if installed)
gspace upload analysis.pdf --folder "Reports"

HTML Copy-Paste to Google Docs

Alternative workflow: Render to HTML with Google Docs-compatible CSS, then copy-paste.

This workflow is useful when:

You need inline tables rendered as actual tables (not images)
You want editable content in Google Docs (not read-only PDF)
You're doing iterative editing between Quarto and Google Docs

Google Docs CSS Setup

Location: ~/.files/quarto/styles/gdocs.css (symlinked to ~/.config/quarto/styles/)

The CSS matches Google Docs defaults:

Arial 11pt, line-height 1.15
Headers: 20pt (H1), 16pt (H2), 14pt (H3)
Tables: 1pt black borders, minimal padding (2pt 6pt), vertical-align middle
No extra whitespace in cells

Using the Google Docs CSS

Option 1: Reference from user config (RECOMMENDED)

---
title: "My Document"
format:
  html:
    css: ~/.config/quarto/styles/gdocs.css
    embed-resources: true
    minimal: true
---

Option 2: Copy CSS to project directory

# Copy CSS to project
mkdir -p .quarto/styles
cp ~/.files/quarto/styles/gdocs.css .quarto/styles/

# Use in document
# css: .quarto/styles/gdocs.css

Option 3: Inline in YAML

---
format:
  html:
    include-in-header:
      text: |
        <style>
        body { font-family: Arial, sans-serif; font-size: 11pt; line-height: 1.15; }
        table { border-collapse: collapse; margin: 12pt auto; }
        th, td { border: 1pt solid #000; padding: 2pt 6pt; vertical-align: middle; }
        th { background-color: #f3f3f3; font-weight: bold; }
        </style>
---

Rendering and Copy-Paste Workflow

# 1. Render to HTML with Google Docs CSS
quarto render analysis.qmd --to html

# 2. Open in browser
open analysis.html

# 3. Select all and copy (Cmd+A, Cmd+C)

# 4. Paste into Google Docs (Cmd+V)

CSS Reference

Key CSS properties for Google Docs compatibility:

/* Body - matches Google Docs defaults */
body {
  font-family: Arial, sans-serif;
  font-size: 11pt;
  line-height: 1.15;
}

/* Tables - critical for clean copy-paste */
table {
  border-collapse: collapse;
  margin: 12pt auto;
  page-break-inside: avoid;
}

th, td {
  border: 1pt solid #000;
  padding: 2pt 6pt;    /* Minimal padding for compact tables */
  vertical-align: middle;
  line-height: 1;
}

/* Remove spacing from cell content */
th *, td * {
  margin: 0 !important;
  padding: 0 !important;
  line-height: 1 !important;
}

th {
  background-color: #f3f3f3;
  font-weight: bold;
}

When to Use HTML Copy-Paste vs PDF

Use HTML copy-paste when:

Tables must be editable in Google Docs
You need precise formatting control
Iterating between Quarto and Google Docs
Complex layouts with multiple tables

Use PDF upload when:

Read-only sharing is acceptable
Professional appearance is priority
Sharing with external stakeholders
Archival or distribution

Troubleshooting HTML Copy-Paste

Extra whitespace in tables:

Ensure CSS has line-height: 1 on cells
Use padding: 2pt 6pt for minimal padding
Add vertical-align: middle to prevent vertical gaps

Fonts not matching:

Use font-family: Arial, sans-serif (Google Docs default)
Avoid web fonts that won't copy

Tables not copying correctly:

Check border-collapse: collapse
Ensure tables have explicit borders (border: 1pt solid #000)
Use embed-resources: true in YAML

Charts/images not copying:

Charts copy as images (expected)
Use embed-resources: true to inline images
May need to re-insert images manually

Presentations

RevealJS Slides (HTML)

---
title: "Quarterly Review"
format: revealjs
---

## Slide 1

Content here

## Slide 2

```{python}
import matplotlib.pyplot as plt
# Code executes, output shown

{background-image="image.jpg"}

Slide with background image


**Render:**
```bash
quarto render slides.qmd --to revealjs

Features:

Live code execution
Animations and transitions
Speaker notes
Vertical slides
Works in any browser

PowerPoint

---
title: "Report"
format: pptx
---

## Slide Title

- Bullet point
- Another point

## Chart Slide

```{python}
# Chart code


**Render:**
```bash
quarto render slides.qmd --to pptx

Cross-References and Citations

Figures

See @fig-sales for the trend.

```{python}
#| label: fig-sales
#| fig-cap: "Sales over time"

plt.plot(df['date'], df['sales'])
plt.show()
```

Tables

As shown in @tbl-summary:

```{python}
#| label: tbl-summary
#| tbl-cap: "Summary statistics"

df.describe()
```

Sections

## Introduction {#sec-intro}

Content here.

## Analysis

As discussed in @sec-intro...

Citations

---
bibliography: references.bib
---

According to @smith2020, the results show...

Multiple citations [@smith2020; @jones2021].

BibTeX file (references.bib):

@article{smith2020,
  title={Analysis of Data},
  author={Smith, John},
  journal={Journal of Science},
  year={2020}
}

Publishing Workflows

GitHub Pages

# First time setup
quarto publish gh-pages

# Subsequent updates
quarto publish gh-pages

Quarto Pub

# Publish to free Quarto hosting
quarto publish quarto-pub

Netlify

quarto publish netlify

Unix Philosophy Alignment

Markdown-First Workflow (PREFERRED):

# DEFAULT: Render to markdown for composability
quarto render analysis.qmd --to gfm         # GitHub-flavored markdown
quarto render analysis.qmd --to md          # Plain markdown

# Pipe markdown output to other tools
quarto render analysis.qmd --to md --output - | \
  grep "^##" | \
  sed 's/^## //' > toc.txt

# Generate markdown, then convert to PDF only if needed
quarto render analysis.qmd --to gfm
quarto render analysis.qmd --to pdf         # Optional

Why Markdown Default:

Text-based: Human-readable, diffable, greppable
Composable: Pipe to other tools (grep, sed, pandoc, etc.)
Portable: Works everywhere (GitHub, wikis, editors)
Archival: Plain text survives format changes
Fast: No heavy rendering (PDF/HTML) unless needed

Quarto as a Pipeline Component:

# Composition pattern: process → transform → render → markdown
conform notes.txt --schema schema.json | \
  jq '.items[]' | \
  python generate_qmd.py > report.qmd && \
  quarto render report.qmd --to gfm

# Markdown → Post-processing
quarto render analysis.qmd --to gfm && \
  sed -i 's/TODO/DONE/g' analysis.md

Do One Thing Well:

Quarto: Document rendering + format conversion
NOT: Data analysis, interactive exploration, storage
Delegate: Python for analysis, Quarto for rendering
Prefer markdown output for downstream composition

Text Streams:

# Quarto accepts stdin
cat document.md | quarto render - --to gfm --output results.md

# Pipe through processing
cat data.json | \
  jq -r '.[] | "- \(.item)"' | \
  quarto render /dev/stdin --to gfm --output list.md

Silent Success:

# Quarto is quiet on success
quarto render doc.qmd --to gfm
echo $?  # 0 = success

# Use --quiet for scripts
quarto render doc.qmd --to gfm --quiet

Common Patterns

Pattern 1: Markdown-First (RECOMMENDED)

# Default: Generate markdown with executed results
quarto render analysis.qmd --to gfm

# Only create PDF/HTML when specifically needed for distribution
quarto render analysis.qmd --to gfm,pdf      # Markdown + PDF
quarto render analysis.qmd --to gfm,html     # Markdown + HTML

# Markdown for archival, PDF for sharing
quarto render report.qmd --to gfm
quarto render report.qmd --to pdf            # Only when needed

Pattern 2: Single Source, Multiple Formats

# Render all formats (markdown first)
quarto render report.qmd --to gfm,pdf,html

# Or explicitly
quarto render report.qmd --to gfm            # Primary output
quarto render report.qmd --to pdf            # For Google Drive sharing
quarto render report.qmd --to html           # For web viewing

Pattern 3: Batch Processing

# Render all .qmd files to markdown
for file in *.qmd; do
  quarto render "$file" --to gfm
done

# Or use find
find . -name "*.qmd" -exec quarto render {} --to gfm \;

# Parallel processing with xargs
find . -name "*.qmd" | xargs -P 4 -I {} quarto render {} --to gfm

Pattern 3: Custom Templates

# Use custom template
quarto render doc.qmd --template custom-template.tex

Pattern 4: Parameterized Reports

---
title: "Monthly Report"
format: gfm
params:
  month: "January"
  year: 2024
---

## Report for `{python} params['month']` `{python} params['year']`

```{python}
month = params['month']
year = params['year']
# Analysis using params


**Render with parameters:**
```bash
# Markdown output with parameters
quarto render report.qmd -P month:February -P year:2024 --to gfm

# Generate markdown for all months
for month in Jan Feb Mar Apr; do
  quarto render report.qmd -P month:$month --to gfm --output "${month}_report.md"
done

Pattern 5: Data Pipeline Reports

# Generate analysis data
python analysis.py > data.json

# Create Quarto report with embedded data loading
cat > report.qmd << 'EOF'
---
title: "Analysis Report"
execute:
  cache: true
---

```{python}
#| cache: true
import subprocess
import json

# Reproducible: document defines how data is generated
result = subprocess.run(['python', 'analysis.py'], capture_output=True, text=True, check=True)
data = json.loads(result.stdout)
# Render findings

EOF

quarto render report.qmd --to gfm


## Best Practices

### 1. Choose the Right Input Format

**Use `.qmd` when:**
- Starting new analysis
- Need maximum Quarto features
- Want native integration

**Use `.ipynb` when:**
- Already have Jupyter notebooks
- Collaborating with Jupyter users
- Need Jupyter-specific features

### 2. Organize Code Blocks

```qmd
## Good: Logical chunks

```{python}
# Load data
import pandas as pd
df = pd.read_csv("data.csv")

# Analyze
total = df['sales'].sum()

Bad: Everything in one block

import pandas as pd
df = pd.read_csv("data.csv")
total = df['sales'].sum()
avg = df['sales'].mean()
# ... 50 more lines


### 3. Use Frontmatter for Configuration

```yaml
# Good: Centralized config
---
format:
  pdf:
    toc: true
    number-sections: true
  html:
    code-fold: true
---

# Bad: Repeating options in CLI
# quarto render doc.qmd --to pdf --toc --number-sections

4. Cache Expensive Computations

```{python}
#| cache: true

# Expensive calculation cached
result = expensive_analysis(large_dataset)


### 5. Version Control

```gitignore
# .gitignore for Quarto projects
_site/
_book/
*.html
*.pdf
.quarto/

Commit:

✅ .qmd source files
✅ _quarto.yml config
✅ Data files (if small)
✅ references.bib
❌ Rendered outputs (regenerate)
❌ .quarto/ cache directory

Troubleshooting

PDF Generation Fails

Problem: LaTeX errors when rendering PDF

Solution 1: Install TinyTeX

quarto install tinytex

Solution 2: Use Typst (modern alternative)

---
format:
  typst: default
---

Solution 3: Use Chrome headless

---
format:
  pdf:
    pdf-engine: chrome
---

Python Code Not Executing

Problem: Code blocks don't run

Check:

Python is installed (python3 --version)
Required packages installed (pip install pandas matplotlib)
No syntax errors in code blocks

Debug:

quarto render doc.qmd --execute-debug

Jupyter Kernel Not Found

Problem: Can't render .ipynb files

Solution:

# Install Jupyter
python3 -m pip install jupyter

# Or use uv
uv tool install jupyter

Quick Reference

Common Commands

quarto render doc.qmd                  # Render with default format
quarto render doc.qmd --to pdf        # Render to PDF
quarto render doc.qmd --to html       # Render to HTML
quarto preview doc.qmd                # Live preview
quarto create project website site   # Create website project
quarto publish gh-pages               # Publish to GitHub Pages
quarto install tinytex                # Install LaTeX
quarto check                          # Verify installation

Output Format Options

--to gfm            # GitHub-flavored markdown (default)
--to pdf            # PDF via LaTeX or typst
--to html           # HTML
--to revealjs       # HTML slides
--to pptx           # PowerPoint
--to typst          # Typst (modern LaTeX alternative)
--to epub           # eBook

Common YAML Frontmatter

---
title: "Document Title"
author: "Josh Lane"
date: "2024-01-30"
format:
  pdf:
    toc: true               # Table of contents
    number-sections: true   # Numbered sections
    geometry: margin=1in    # Page margins
  html:
    toc: true
    code-fold: true         # Collapsible code
    theme: cosmo            # Visual theme
---

Resources

Official docs: https://quarto.org
Gallery: https://quarto.org/docs/gallery/
Guide: https://quarto.org/docs/guide/
Extensions: https://quarto.org/docs/extensions/
Publishing: https://quarto.org/docs/publishing/

Document & Report Writing Principles

Data Sources Stay in Code, Not Prose

When writing Quarto documents, strategy memos, or any analytical report:

Never name data sources in prose — table names (all_customers_revenue_final), field names (salesforce_account_id, recognized_revenue_c), database names (ep-core-data), query tools (BigQuery, Prophet), or internal implementation details belong in code cells only
Describe the data, not where it came from — write "actual revenue from those accounts" not "revenue from all_customers_revenue_final joined via salesforce_account_id"
Describe the method, not the tool — write "trend model with yearly seasonality" not "Prophet with changepoint_prior_scale"
The code is the documentation — readers who need provenance can read the code cells; prose is for interpretation and insight

Adoption

lanej/quarto

$ install --global

Security Scan Results

SKILL.md

Quarto Skill

Bootstrap with epq scaffold (PREFERRED for new projects)

External Python Figures Pattern (PREFERRED for complex documents)

Architecture

Figure Module Contract

QMD Stub Cell

Justfile (generated by epq scaffold)

Figure Audit and Visual Iteration Protocol

Key Rules

PDF Project Bootstrap

PDF Title Suppression

PDF Figure Rules

Split multi-story panels

Legend placement

Date axes — always explicit

Never auto-open artifacts

LaTeX Max-Runs Warning

BigQuery + Pandas Gotchas (in Quarto Documents)

Best Practices (TL;DR)

PDF Figure Sizing — Critical Patterns

The Working Pattern (copy verbatim)

rcParams That Must Not Be Changed

Root Causes of "Comically Small" Figures

Diagnosing Figure Size Issues

The Nuclear Option

Caption Numbering

Reference Lines (axvline/axhline) Opacity

Visual Expression Philosophy

Visual Hierarchy (Use in Order)

Anti-Patterns: What NOT to Do

Why Visual Expression Matters

Narrative Structure

Document Flow

Base Facts (Individual Sections)

Keeping Content Together (PDF)

Markdown Formatting Rules (CRITICAL)

Line Breaks and Paragraphs

Common Markdown Patterns

Quarto-Specific Markdown

Best Practices

Testing Markdown Formatting

LLM Self-Reasoning with Quarto

Document Structure

Visual Evidence

Example Invocation

When to Use Quarto

Data Provenance and Portability

The Portability Contract

Anti-Pattern: Opaque File References

Good Pattern: Embed Data Extraction

Heavy Dependencies: Document, Don't Rebuild

Caching for Iteration Speed

Freeze for Project-Level Caching

Cache-Read-or-Query Pattern (BigQuery / Expensive APIs)

Complete Example: Reproducible Analysis

Analysis

Neovim Integration

Basic Usage

Quick Start: Native .qmd Files

Overview

Sales Trend Analysis

Top Products

Statistical Analysis

Conclusion

Install auto-dark extension (one-time, if not already installed)

Render to markdown (DEFAULT - composable, archival)

Optional: Render to HTML with dark mode for web viewing

Optional: Render to other formats only when needed

Python Code Execution

Code Blocks

Code Block Options

Inline Python Expressions

Output Formatting

Table Formatting (CRITICAL)

Option 1: Great Tables (RECOMMENDED for rich formatting)

Bootstrap with `epq scaffold` (PREFERRED for new projects)

Justfile (generated by `epq scaffold`)

Option 2: pandas `.to_markdown()` (Simple, built-in)