skills/rdc-convert/SKILL.md
Convert Office documents to/from Markdown with the build-corpus CLI: .docx/.pptx/.ppt → Markdown (Word OMML equations become KaTeX-readable TeX; tables, images, headings preserved), and Markdown → Word (.docx) where inline $...$ and display $$...$$ LaTeX become NATIVE Office Math (OMML) that Word renders as real equations. Use this skill whenever the user asks to convert a Word/PowerPoint document to Markdown, build a Markdown corpus from Office files, turn Markdown into a .docx (optionally with a .dotx template), or "open the report" to edit. Install build-corpus straight from GitHub and run it in the session.
npx skillsauth add LIFEAI/rdc-skills rdc-convert
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
build-corpus is a Python CLI that converts between Office documents and Markdown.
This is a self-contained skill: install the tool from GitHub into the current
session and run build-corpus. No local checkout or other rdc skill is required.
- .docx / .pptx / .ppt → Markdown: preserves Word OMML equations (as KaTeX-readable TeX), tables, images, headings, and lists.
- Markdown → Word (.docx): inline $...$ and display $$...$$ LaTeX are converted to native OMML that Word renders as real equations; optional .dotx template via --word-template.

The published PyPI/npm packages lag GitHub. Install from the repository to get the current behavior (native LaTeX→OMML, the fidelity report, the escaped-currency fix):
```bash
pip install "git+https://github.com/LIFEAI/build-corpus.git@feat/dual-package-ubuntu"
# once merged to main, drop the @branch:
# pip install "git+https://github.com/LIFEAI/build-corpus.git"
build-corpus --help
```
This installs the build-corpus command and its dependencies (latex2mathml,
mathml2omml, python-docx, Pillow, omml2latex). Notes:
- If pip refuses to install into a system-managed Python, pass --break-system-packages, use a venv, or run pipx install "git+https://github.com/LIFEAI/build-corpus.git@feat/dual-package-ubuntu".
- For S3 image uploads, add [s3] to the package spec.
- .ppt input also needs LibreOffice (soffice) on PATH (apt install libreoffice). .docx/.pptx need nothing extra.

build-corpus <input> [input ...] [options]
<input> — one or more .md, .docx, .pptx, or .ppt files or directories.
| Flag | Values / default | Effect |
|------|------------------|--------|
| --out <dir> | path | Output directory for the converted tree. |
| --out-same-dir | — | Write .md, .assets, and reports beside each source file. |
| --to | auto / markdown / word (default auto) | Output target. auto infers from a single-file input. |
| --images | assets / base64 / s3 (default assets) | Image handling. |
| --equations | tex / image (default tex) | docx→md: OMML → KaTeX TeX, or rendered images (debug). |
| --inline-images | — | Emit <name>.inline.md with images embedded as data URIs. |
| --word-template <file> | .docx/.dotx | Template for Markdown → Word exports. |
| --move-sources | — | After a successful convert, move sources into a sources/ folder. |
| --config <file> | JSON | Conversion/output/S3 defaults (CLI flags override). |
S3/R2 (only with --images s3): --s3-bucket, --s3-public-base-url,
--s3-prefix, --s3-endpoint-url (required for R2), --s3-region (auto for R2),
--s3-access-key-id, --s3-secret-access-key, --s3-cache-control, --s3-acl.
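When build-corpus is driven from a script rather than typed by hand, the flags above can be assembled programmatically. The following is a minimal sketch, not part of build-corpus itself: the helper names `build_command`, `needs_libreoffice`, and `preflight` are hypothetical, and the sketch uses only flags listed in the table. It also assumes that checking file extensions of explicitly listed inputs is enough to decide whether LibreOffice is needed; directories are not scanned.

```python
import shutil
from pathlib import Path

# Values taken from the flag table above.
TARGETS = {"auto", "markdown", "word"}
IMAGE_MODES = {"assets", "base64", "s3"}


def build_command(inputs, out=None, to="auto", images="assets", word_template=None):
    """Return an argv list for build-corpus (hypothetical helper, documented flags only)."""
    if to not in TARGETS:
        raise ValueError(f"--to must be one of {sorted(TARGETS)}")
    if images not in IMAGE_MODES:
        raise ValueError(f"--images must be one of {sorted(IMAGE_MODES)}")
    cmd = ["build-corpus", *(str(p) for p in inputs)]
    if out is not None:
        cmd += ["--out", str(out)]
    cmd += ["--to", to, "--images", images]
    if word_template is not None:
        cmd += ["--word-template", str(word_template)]
    return cmd


def needs_libreoffice(inputs):
    """True when any explicitly listed input is a legacy .ppt file."""
    return any(Path(p).suffix.lower() == ".ppt" for p in inputs)


def preflight(inputs):
    """List executables missing from PATH; does not look inside directories."""
    missing = []
    if shutil.which("build-corpus") is None:
        missing.append("build-corpus")
    if needs_libreoffice(inputs) and shutil.which("soffice") is None:
        missing.append("soffice")
    return missing
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once `preflight` reports nothing missing.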
- docx → Markdown (--equations): tex converts Word OMML equations to KaTeX-readable TeX (default); image renders them as images (debug only).
- Markdown → Word (--to word): inline $...$ and display $$...$$ LaTeX are converted to native OMML (\sum, \int, \frac, \Delta, \rightarrow, \leq, …). Escaped currency like \$252.3B is kept as literal text, never mistaken for math. Unparseable fragments fall back to Cambria-Math text and are flagged in the report. Put each $$ delimiter of a display block on its own line, with no blank lines inside the block.

Each md→word export writes export-report.json (and a batch report) so you can
confirm nothing was silently dropped or changed:
- fidelity_ok: top-level ship gate (true only when every row matches and no equation fell back).
- reconciliation: input vs output per type (tables, equations {in/out_omml/fell_back}, images {in/out/failed}, code_blocks, headings, links).
- issues: { type, line, source|target, reason } per problem.
- text_fixups: markdown escapes the engine resolved (e.g. currency_unescaped).

Image-failure reason values: missing-file, unsupported-on-platform (EMF/WMF: install LibreOffice), unsupported-format (.html/.jsx: route to a render pipeline), svg-needs-rasterization / mislabeled-svg (rasterize the SVG to PNG and repoint), skipped-remote.
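The report fields described above lend themselves to an automated ship gate. The sketch below is one possible reading, under stated assumptions: it assumes export-report.json is a JSON object whose `issues` value is a list of objects with the keys named above, and the function names `load_report` and `summarize_report` are hypothetical. The report's location is passed in by the caller because it depends on how the export was run.

```python
import json
from pathlib import Path


def load_report(path):
    """Read an export-report.json file into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def summarize_report(report):
    """Return (ship_ok, issue_lines) from a parsed report.

    Assumes the shape described above: a boolean fidelity_ok and an
    issues list of {type, line, source|target, reason} objects.
    """
    ship_ok = report.get("fidelity_ok") is True
    issue_lines = []
    for issue in report.get("issues", []):
        # Each issue carries either a source or a target reference.
        where = issue.get("source") or issue.get("target") or "?"
        issue_lines.append(
            f"{issue.get('type', '?')} line {issue.get('line', '?')}: "
            f"{where} ({issue.get('reason', '?')})"
        )
    return ship_ok, issue_lines
```

Treating anything other than a literal `true` as a failed gate is a deliberately strict choice: a missing or malformed `fidelity_ok` blocks shipping rather than passing silently.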
```bash
build-corpus input.docx --out out                      # docx → markdown
build-corpus deck.pptx --out out                       # pptx → markdown
build-corpus ./word-files --out ./markdown             # whole folder, recursive
build-corpus ./word-files --out-same-dir               # write beside each source
build-corpus input.md --to word --out out              # markdown → Word (LaTeX → OMML)
build-corpus input.md --to word --word-template custom.dotx --out out
build-corpus report.md --inline-images                 # → report.inline.md
build-corpus input.docx --images s3 --config build-corpus.config.json
```
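Before running a Markdown → Word export, it can help to see which dollar signs in a source file would count as math and which are escaped currency. The sketch below only illustrates that rule with a regular expression; it is not build-corpus's parser, the name `split_math` is hypothetical, and it assumes inline math stays on one line and contains no dollar signs.

```python
import re

# Display math ($$...$$) is tried first, then inline math ($...$).
# A dollar sign preceded by a backslash never opens or closes a span.
MATH = re.compile(
    r"(?<!\\)\$\$(?P<display>.+?)(?<!\\)\$\$"
    r"|(?<!\\)\$(?P<inline>[^$\n]+?)(?<!\\)\$",
    re.DOTALL,
)


def split_math(text):
    """Return (spans, literal): math spans found and the remaining text.

    spans is a list of (kind, tex) pairs with kind "display" or "inline".
    literal is the text with math removed and \\$ unescaped to $.
    """
    spans = []
    for match in MATH.finditer(text):
        if match.group("display") is not None:
            spans.append(("display", match.group("display").strip()))
        else:
            spans.append(("inline", match.group("inline").strip()))
    literal = MATH.sub("", text).replace("\\$", "$")
    return spans, literal
```

For the input `Revenue hit \$252.3B, where $\Delta x \leq 1$.` the sketch reports a single inline span, `\Delta x \leq 1`, and keeps `$252.3B` as literal text.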
- S3 options apply only with --images s3.
- Source files stay in place unless --move-sources is passed.
- Repository: github.com/LIFEAI/build-corpus. Published packages: PyPI build-corpus, npm regen-mde.