skills/pandoc/SKILL.md
Use when converting documents between formats — HTML, Markdown, DOCX, PDF, LaTeX, EPUB, reStructuredText, Org, JIRA, CSV, Jupyter notebooks, slides, and 60+ others. Triggers: convert file, export to PDF, make a PDF, turn this into markdown, HTML to markdown, DOCX to markdown, markdown to DOCX, generate slides, create EPUB, format conversion, pandoc, document conversion. Always prefer pandoc over ad-hoc conversion scripts.
Universal document converter. Reader → AST → Writer pipeline with 60+ input and 80+ output formats. Prefer pandoc over writing custom conversion scripts — one command replaces most python-docx, beautifulsoup4, or markdown library usage.
| Flag | Purpose |
|------|---------|
| -f FORMAT | Input format (auto-detected from file extension) |
| -t FORMAT | Output format (auto-detected from extension) |
| -o FILE | Output file (stdout if omitted) |
| -s | Standalone — complete document with header/footer |
| --wrap=none | Don't wrap output lines (--wrap=preserve keeps the source's line breaks) |
| --extract-media=DIR | Extract images from DOCX/EPUB/ODT |
| --toc | Generate table of contents |
| --number-sections | Number section headings |
| --pdf-engine=ENGINE | PDF backend (default: pdflatex) |
| --reference-doc=FILE | Style template for DOCX/ODT/PPTX output |
| --template=FILE | Custom output template |
| --embed-resources | Embed images/CSS inline (HTML) |
| -V KEY=VALUE | Set template variable |
| -L SCRIPT | Apply Lua filter |
| --shift-heading-level-by=N | Adjust heading levels |
| --columns=N | Line wrap width (default 72) |
```shell
# HTML → Markdown
pandoc -f html -t gfm -o output.md input.html
pandoc -f html -t gfm --wrap=none https://example.com/page   # from URL

# Markdown → HTML
pandoc -s -o output.html input.md                            # standalone page
pandoc -s --toc --css=style.css -o output.html input.md      # with TOC + CSS
pandoc -t html input.md                                      # fragment only (no <head>)

# DOCX ↔ Markdown
pandoc --extract-media=media/ --wrap=none -o output.md input.docx
pandoc -o output.docx input.md
pandoc --reference-doc=template.docx -o output.docx input.md # styled
```
PDF output requires a LaTeX engine (or an alternative engine such as typst or weasyprint). See ${CLAUDE_SKILL_DIR}/references/pandoc-install.md for setup.
```shell
pandoc -o output.pdf input.md                                   # default (pdflatex)
pandoc --pdf-engine=xelatex -o output.pdf input.md              # Unicode/custom fonts
pandoc --pdf-engine=typst -o output.pdf input.md                # lightweight, no LaTeX
pandoc --pdf-engine=weasyprint -t html -o output.pdf input.md   # via HTML/CSS
```
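Which PDF engine is installed varies by machine, so a script can probe for one before calling pandoc. The following is a minimal sketch under stated assumptions: the function names `first_available` and `to_pdf` are hypothetical, and the preference order is an illustrative choice, not a pandoc default.

```shell
# Print the first command from the argument list that exists on PATH.
# Returns non-zero and prints nothing when none is installed.
first_available() {
  for candidate in "$@"; do
    if command -v "$candidate" >/dev/null 2>&1; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Convert the input file $1 to the PDF file $2 with whichever engine is found first.
# The order xelatex, pdflatex, typst, weasyprint is an assumption; reorder as needed.
to_pdf() {
  pdf_engine=$(first_available xelatex pdflatex typst weasyprint) || {
    echo "no PDF engine found on PATH" >&2
    return 1
  }
  case $pdf_engine in
    # weasyprint renders HTML, so go through the HTML writer as in the command above
    weasyprint) pandoc --pdf-engine="$pdf_engine" -t html -o "$2" "$1" ;;
    *)          pandoc --pdf-engine="$pdf_engine" -o "$2" "$1" ;;
  esac
}
```

Calling `to_pdf input.md output.pdf` then behaves like the matching one-liner above, and fails with a readable message when no engine is present.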
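```shell
# Jupyter notebooks
pandoc -o output.md notebook.ipynb                # notebook → markdown
pandoc -o output.ipynb input.md                   # markdown → notebook

# Slides
pandoc -t revealjs -s -o slides.html input.md     # reveal.js
pandoc -o slides.pptx input.md                    # PowerPoint
pandoc -t beamer -o slides.pdf input.md           # LaTeX Beamer

# EPUB
pandoc -o book.epub chapter1.md chapter2.md metadata.yaml

# Man page → Markdown
# The man reader expects roff source; plain `man pandoc` emits rendered text.
# gzip -cdf passes uncompressed pages through unchanged.
gzip -cdf "$(man -w pandoc)" | pandoc -f man -t gfm --wrap=none

# Batch conversion
for f in *.docx; do pandoc --extract-media=media/ -o "${f%.docx}.md" "$f"; done
```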
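The one-line batch loop above sends every document's images to the same media/ directory, where files that share an internal name may overwrite each other, and it passes the literal pattern to pandoc when no .docx files exist. A sketch of a more defensive variant follows; `md_name` and `convert_all_docx` are hypothetical helper names, and the per-document media directory layout is an assumption.

```shell
# Derive the Markdown output path for a DOCX input: report.docx -> report.md
md_name() {
  printf '%s\n' "${1%.docx}.md"
}

# Convert every .docx in the current directory, one media directory per document.
convert_all_docx() {
  for f in *.docx; do
    # Skip the literal pattern when the glob matched nothing.
    [ -e "$f" ] || continue
    # Report a failed file on stderr and keep going with the rest.
    pandoc --extract-media="media/${f%.docx}" -o "$(md_name "$f")" "$f" \
      || echo "failed: $f" >&2
  done
  return 0
}
```

Run `convert_all_docx` from the directory that holds the documents; quoting `"$f"` throughout keeps file names with spaces intact.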
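Pandoc auto-detects formats from file extensions. Specify -f/-t explicitly when the extension is missing or ambiguous (for example, input piped on stdin, or .txt, which defaults to markdown), or when a specific Markdown flavor is needed:

| Format | Use for |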
|--------|---------|
| gfm | GitHub — tables, task lists, strikethrough, autolinks |
| commonmark | Strict CommonMark spec |
| commonmark_x | CommonMark + pandoc extensions |
| markdown | Pandoc's Markdown — most features, default |
| markdown_strict | Original Gruber Markdown — minimal |
```shell
pandoc --list-input-formats
pandoc --list-output-formats
pandoc --list-extensions=gfm   # extensions for a specific format
```
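Piped input is the most common case where detection has nothing to work with, because stdin carries no file extension. A small sketch, assuming HTML arrives on stdin; the wrapper name `html_to_gfm` is hypothetical.

```shell
# Input arrives on stdin, so there is no extension to detect a format from.
# Without -f html, pandoc would read the bytes as Markdown.
# -t gfm is explicit as well, because stdout has no extension either.
html_to_gfm() {
  pandoc -f html -t gfm --wrap=none
}
```

For example, `curl -s https://example.com/page | html_to_gfm > page.md` converts a fetched page without writing an intermediate file.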
| Mode | Flag | Output | Use when |
|------|------|--------|----------|
| Fragment | (default) | Body content only | Embedding in another document |
| Standalone | -s | Complete document with headers | Creating a valid file (HTML, LaTeX, etc.) |
Always use -s for HTML files, LaTeX documents, and slide decks. DOCX/PDF/EPUB are always standalone.
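A forgotten -s is easy to catch after the fact by checking the output for a head element. The following is a rough sketch, not a full HTML validator; `is_standalone_html` and `render_page` are hypothetical names.

```shell
# Succeed if the HTML file looks like a complete document (contains a <head> tag).
# The pattern does not match <header>, because the next character must be a space or ">".
is_standalone_html() {
  grep -qi '<head[ >]' "$1"
}

# Render the input $1 to the HTML file $2 as a complete page, then verify the result.
render_page() {
  pandoc -s -o "$2" "$1" && is_standalone_html "$2"
}
```

`render_page input.md output.html` returns non-zero if pandoc fails or if the result turns out to be a bare fragment.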
| Task | Tool | Why |
|------|------|-----|
| Document format conversion | pandoc | Built for this; one command |
| Clean HTML → Markdown | pandoc | Handles structure well |
| Complex web scraping | Dedicated scraper | Pandoc needs clean HTML input |
| PDF text extraction | pdftotext, pdfplumber | Pandoc cannot read PDF |
| Image format conversion | ImageMagick, sips | Not pandoc's domain |
| CSV/JSON data processing | jq, csvkit, code | Pandoc reads CSV as a table and JSON only as its own AST format, not as data |
| Markdown rendering in terminal | glow, pandoc -t ansi | Either works |
| Office doc creation (complex) | python-docx, openpyxl | When pandoc's model is too simple |
| Mistake | Fix |
|---------|-----|
| Writing a Python script for DOCX → MD | Use pandoc --extract-media=media/ -o out.md in.docx |
| Forgetting -s for standalone HTML | Add -s when output needs <head> and <body> |
| PDF fails — no LaTeX installed | Install texlive/mactex, or use --pdf-engine=typst or weasyprint |
| Losing images from DOCX | Add --extract-media=media/ |
| Wrong markdown flavor in output | Specify -t gfm or -t commonmark explicitly |
| Piping binary formats to stdout | Use -o file.docx — DOCX/PDF/EPUB must write to files |
| Line wrapping mangles output | Add --wrap=none to disable wrapping, or --wrap=preserve to keep the source's line breaks |
Consult for deep dives — these are loaded on demand, not auto-included:
- ${CLAUDE_SKILL_DIR}/references/pandoc-manual.md: curated option reference, templates, extensions
- ${CLAUDE_SKILL_DIR}/references/pandoc-install.md: installation on macOS, Linux, Docker + PDF engines
- ${CLAUDE_SKILL_DIR}/references/pandoc-advanced.md: Lua filters, citations, slides, custom writers, EPUB