aops-tools/skills/convert-to-md/SKILL.md
Batch convert documents (DOCX, PDF, XLSX, TXT, PPTX, MSG, DOC) to markdown, preserving tracked changes and comments.
Taxonomy note: This skill provides domain expertise (HOW) for batch document conversion to markdown. See [[aops-core/skills/remember/references/TAXONOMY.md]] for the skill/workflow distinction.
Batch convert documents to markdown format, preserving tracked changes, comments, and other markup.
Usage: `/convert-to-md [directory]`
| Format | Method | Notes |
| ------ | ---------------------------- | ------------------------------------ |
| DOCX | pandoc --track-changes=all | Preserves comments & tracked changes |
| PDF | PyMuPDF | Text extraction |
| XLSX | pandas | Converts to markdown tables |
| TXT | rename | Direct rename to .md |
| PPTX | pandoc | Slide content to markdown |
| MSG | extract-msg | Email metadata + body |
| DOC | textutil | macOS native (fallback) |
| DOTX | pandoc | Word templates |
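The table above is effectively a lookup from file extension to conversion method. Below is a minimal sketch of that dispatch, assuming files are classified purely by suffix (case-insensitive). The names `METHODS`, `method_for` and `plan` are hypothetical. They only return labels and never run a conversion, so they can serve as a dry run before the destructive loops below.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Optional

# Extension -> conversion method, mirroring the format table (labels only).
METHODS = {
    ".docx": "pandoc --track-changes=all",
    ".pdf": "PyMuPDF",
    ".xlsx": "pandas",
    ".txt": "rename",
    ".pptx": "pandoc",
    ".msg": "extract-msg",
    ".doc": "textutil",
    ".dotx": "pandoc",
}


def method_for(path: Path) -> Optional[str]:
    """Return the conversion method for a file, or None if unsupported."""
    return METHODS.get(path.suffix.lower())


def plan(directory: Path) -> Dict[str, List[Path]]:
    """Group the files in a directory by the method that would convert them."""
    groups: Dict[str, List[Path]] = defaultdict(list)
    for path in sorted(directory.iterdir()):
        method = method_for(path)
        if method is not None and path.is_file():
            groups[method].append(path)
    return dict(groups)
```

For example, `plan(Path("."))` shows which tool would handle each file in the current directory, without touching anything.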
Install dependencies (if needed):

```bash
uv add pymupdf pandas openpyxl tabulate extract-msg
```
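Before running any conversion, it can help to check which dependencies are actually available. Here is a minimal sketch using only the standard library. It assumes the Python import names `fitz` (PyMuPDF) and `extract_msg` (extract-msg), and the command-line tools `pandoc` and `textutil` (macOS only, so it is expected to be missing elsewhere). `missing_dependencies` is a hypothetical helper name.

```python
import importlib.util
import shutil
from typing import List

# Import names for the Python packages installed above (pymupdf imports as fitz).
PYTHON_MODULES = ["fitz", "pandas", "openpyxl", "tabulate", "extract_msg"]
# External commands the conversions shell out to (textutil exists only on macOS).
SYSTEM_TOOLS = ["pandoc", "textutil"]


def missing_dependencies() -> List[str]:
    """Return the names of Python modules and system tools that cannot be found."""
    missing = [m for m in PYTHON_MODULES if importlib.util.find_spec(m) is None]
    missing += [t for t in SYSTEM_TOOLS if shutil.which(t) is None]
    return missing
```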
Convert DOCX (preserves comments/edits):

```bash
# Convert each .docx; the original is deleted only if pandoc exits successfully.
for f in *.docx; do
  pandoc --track-changes=all -f docx -t markdown -o "${f%.docx}.md" "$f" && rm "$f"
done
```
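The shell loop relies on `&&` so that a `.docx` is removed only when pandoc succeeds. Below is the same guard sketched in Python with `subprocess`, assuming `pandoc` is on `PATH`. The pandoc flags are exactly those used in the loop; `pandoc_args` and `convert_with_pandoc` are hypothetical helper names.

```python
import subprocess
from pathlib import Path
from typing import List


def pandoc_args(src: Path) -> List[str]:
    """Build the pandoc command used in the shell loop for one .docx file."""
    return [
        "pandoc",
        "--track-changes=all",  # keep insertions, deletions and comments
        "-f", "docx",
        "-t", "markdown",
        "-o", str(src.with_suffix(".md")),
        str(src),
    ]


def convert_with_pandoc(src: Path) -> bool:
    """Convert one file; delete the original only if pandoc exits with 0.

    Raises FileNotFoundError if pandoc is not installed.
    """
    result = subprocess.run(pandoc_args(src), capture_output=True, text=True)
    if result.returncode != 0:
        print(f"pandoc failed on {src}: {result.stderr.strip()}")
        return False
    src.unlink()
    return True
```

Usage: `for docx in sorted(Path(".").glob("*.docx")): convert_with_pandoc(docx)`. Returning a boolean lets a caller collect the files that failed instead of stopping the whole batch.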
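Convert PDF:

```python
from pathlib import Path

import fitz  # PyMuPDF

for pdf in Path(".").glob("*.pdf"):
    # Use a context manager so the document is closed before the file is deleted.
    with fitz.open(pdf) as doc:
        # Join per-page text, separating pages with a blank line.
        text = "\n\n".join(page.get_text() for page in doc)
    pdf.with_suffix(".md").write_text(text.strip(), encoding="utf-8")
    pdf.unlink()
```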
Convert XLSX to tables:

```python
from pathlib import Path

import pandas as pd

for xlsx in Path(".").glob("*.xlsx"):
    content = f"# {xlsx.stem}\n\n"
    # Open the workbook once and emit one section per sheet.
    with pd.ExcelFile(xlsx) as xls:
        for sheet in xls.sheet_names:
            df = xls.parse(sheet)
            # DataFrame.to_markdown requires the tabulate package.
            content += f"## {sheet}\n\n{df.to_markdown(index=False)}\n\n"
    xlsx.with_suffix(".md").write_text(content, encoding="utf-8")
    xlsx.unlink()
```
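`DataFrame.to_markdown` depends on the `tabulate` package. Where that dependency is unavailable, a minimal pipe-table renderer can be sketched with the standard library alone. `rows_to_markdown` is a hypothetical helper. It escapes `|` inside cells and flattens newlines, but it does no column alignment.

```python
from typing import List


def rows_to_markdown(header: List[str], rows: List[List[object]]) -> str:
    """Render a header and rows as a GitHub-style markdown pipe table."""

    def cell(value: object) -> str:
        # None becomes an empty cell; escape pipes so they don't split columns.
        text = "" if value is None else str(value)
        return text.replace("|", "\\|").replace("\n", " ")

    lines = [
        "| " + " | ".join(cell(h) for h in header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(cell(v) for v in row) + " |")
    return "\n".join(lines)
```

With pandas, it could be called as `rows_to_markdown(list(df.columns), df.fillna("").values.tolist())`. The `fillna("")` makes missing values render as empty cells rather than `nan`.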
Convert TXT:

```bash
for f in *.txt; do mv "$f" "${f%.txt}.md"; done
```
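By default, `mv` silently overwrites an existing `.md` of the same name. Here is a minimal Python sketch of the same rename that skips such collisions and reports them instead. `rename_txt_to_md` is a hypothetical name.

```python
from pathlib import Path
from typing import List


def rename_txt_to_md(directory: Path) -> List[Path]:
    """Rename every *.txt in directory to *.md; return the files skipped."""
    skipped = []
    for txt in sorted(directory.glob("*.txt")):
        target = txt.with_suffix(".md")
        if target.exists():
            skipped.append(txt)  # leave both the .txt and the existing .md alone
            continue
        txt.rename(target)
    return skipped
```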
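Convert MSG:

```python
import extract_msg

msg = extract_msg.Message("file.msg")
content = f"# {msg.subject}\n\n**From:** {msg.sender}\n**Date:** {msg.date}\n\n{msg.body}"
msg.close()  # release the .msg file handle
with open("file.md", "w", encoding="utf-8") as out:
    out.write(content)
```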
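With a plain f-string, any field that comes back as `None` is written out as the literal text `None`. For example, `msg.body` can be `None` when a message has no plain-text body. Below is a hedged sketch of a formatter that substitutes placeholders instead. `email_to_markdown` is a hypothetical helper that takes plain values, so it could be fed from `msg.subject`, `msg.sender`, `msg.date` and `msg.body`. The placeholder strings are arbitrary choices.

```python
from typing import Optional


def email_to_markdown(
    subject: Optional[str],
    sender: Optional[str],
    date: object,
    body: Optional[str],
) -> str:
    """Format email fields as markdown, tolerating missing values."""
    title = (subject or "").strip() or "(no subject)"
    lines = [
        f"# {title}",
        "",
        f"**From:** {sender or 'unknown'}",
        f"**Date:** {date if date is not None else 'unknown'}",
        "",
        (body or "").strip(),
    ]
    # Trim trailing blank lines and end with a single newline.
    return "\n".join(lines).rstrip() + "\n"
```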
Clean up: remove `*:Zone.Identifier` files (Windows metadata).
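When files downloaded on Windows are copied into a Linux filesystem (for example under WSL), they can arrive with companion files named like `report.docx:Zone.Identifier`. Here is a minimal sketch that deletes them recursively with `pathlib`, assuming the names end exactly as in the pattern above. `remove_zone_identifiers` is a hypothetical name.

```python
from pathlib import Path
from typing import List


def remove_zone_identifiers(directory: Path) -> List[Path]:
    """Delete *:Zone.Identifier files under directory; return what was removed."""
    removed = []
    for path in sorted(directory.rglob("*:Zone.Identifier")):
        if path.is_file():
            path.unlink()
            removed.append(path)
    return removed
```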
Dependencies:

- pandoc (system): DOCX, PPTX, DOTX conversion
- textutil (macOS): DOC fallback
- pymupdf (Python): PDF text extraction
- pandas, openpyxl, tabulate (Python): XLSX tables
- extract-msg (Python): Outlook MSG files