docx/SKILL.md
Word DOCX create, read, edit, review. Triggers: Word doc, .docx, reports, memos, letters, templates.
npx skillsauth add lidge-jun/cli-jaw-skills docxInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Use this skill for any .docx task: create, read, edit, review, template-fill, or QA verification.
Triggers: "Word doc", ".docx", reports, memos, letters, templates.
Primary tool: officecli (officecli on PATH) for ~80% of tasks (add/set/remove, validate, query, track-change accept/reject).
Fallback: Python OOXML scripts (scripts/*.py) for what officecli cannot do — tracked-change creation, OMML equations, bulk pattern matching, unpack/edit/repack. See §3.
DOCX only. Do NOT use this skill for PDFs, spreadsheets, HWPX, Google Docs, or any other format.
OfficeCLI discovery rule: before guessing paths, element names, or properties, ask the installed CLI. Use officecli --help for workflow entry points and officecli help docx ... --json for machine-readable schema.
Same-file execution rule: run OfficeCLI commands against the same .docx sequentially. Do not run officecli view, officecli validate, officecli query, or officecli get in parallel against one package. If a file lock occurs, stop and report the exact command and path before making a copy or retrying.
| Task | Tool | Command pattern | Notes |
|------|------|-----------------|-------|
| Format like existing doc | shell + officecli | cp source.docx target.docx && officecli open target.docx | Inherit styles/headers/footers. See §2. |
| Create blank DOCX | officecli | officecli create report.docx | Start from real Office file |
| Add paragraph | officecli | officecli add FILE /body --type paragraph --prop text="..." | Primary write path |
| Edit paragraph/run | officecli | officecli set FILE /body/p[N] --prop ... | Exact path targeting |
| Read text/outline/stats | officecli | officecli view FILE text | Modes: text, annotated, outline, stats, issues, html |
| Query document | officecli | officecli query FILE "p[style=Heading1]" | CSS-like selectors |
| Template-safe replacement | officecli | officecli set FILE / --prop find="{{X}}" --prop replace="Y" | Preserves template structure |
| Validation / issue scan | officecli | officecli validate FILE | Pair with view FILE issues |
| Accept/reject tracked changes | officecli | officecli set FILE / --prop accept-changes=all | Also reject-changes=all |
| CREATE tracked changes | Python (L3/L4) | scripts/docx_cli.py + ooxml/redline_diff.py | officecli cannot create — only accept/reject |
| OMML equations | Python (L4) | Unpack → inject <m:oMath> → repack | officecli cannot generate OMML |
| Complex anchored comments | Python (L3) | python3 scripts/comment.py IN OUT --text "..." --anchor "..." | For comments beyond officecli |
| PDF conversion / visual QA | soffice | soffice --headless --convert-to pdf FILE | Screenshot-based QA |
| Edit existing document | -- | Read editing.md | Detailed editing guides |
| Create from scratch | -- | Read creating.md | Detailed creation recipes |
When the user says "format like X.docx", "match existing style", "based on template", or provides a source file — start from the source file. Don't rebuild from scratch.
cp source.docx target.docx — inherits all styles, margins, numbering, headers, footersofficecli open target.docx — daemon starts; command returns immediately (do NOT run as run_in_background shell)/styles, /numbering, /header, /footer--prop style=Heading1) — they auto-applyPandoc-generated and Word-generated documents have specific style IDs (e.g. Heading1, BodyText, FirstParagraph) unique to that document. Adding a new style with the same name causes:
officecli validate errors about duplicate style IDstests/fixtures/*.docx — pre-built working examples shipped with this skillofficecli create blank — only when nothing else applies# CORRECT: inherit Pandoc styles
cp Assignment1.docx Assignment2.docx
officecli open Assignment2.docx
officecli remove Assignment2.docx "/body/p[1]" # remove old body (keep styles/headers/footers)
# ... remove more paragraphs ...
officecli add Assignment2.docx /body --type paragraph --prop text="New title" --prop style=Heading1
officecli close Assignment2.docx
# WRONG: recreate styles that already exist — validate fails
officecli add doc.docx /styles --type style --prop name=Heading1 --prop size=20pt ...
officecli covers most DOCX tasks. For the rest, use these reference docs + Python scripts.
references/)| File | Read when | Contains |
|------|-----------|----------|
| references/cjk-handling.md | Korean text / East Asian font / wrapping issues | rFonts East Asian fonts, lang tags, accessibility |
| references/tracked-changes.md | Track changes / comments / redline work | w:ins, w:del, comments XML, script usage examples |
| references/docx-js-api.md | DEPRECATED — npm docx library API, not applicable to officecli ecosystem | Ignore — kept for historical reference only |
scripts/) — Python OOXML Toolkit| Script | Run when | Command |
|--------|----------|---------|
| scripts/docx_cli.py | Unified Python CLI — unpack, save, validate, repair, search, TOC, chunk, comment, accept-changes, merge-runs | python3 scripts/docx_cli.py {open\|save\|validate\|repair\|text\|search\|toc\|chunk\|comment\|accept-changes\|merge-runs} |
| scripts/accept_changes.py | Accept all tracked changes (alternative to officecli) | python3 scripts/accept_changes.py IN.docx OUT.docx |
| scripts/comment.py | Add W3C-compliant OOXML comments anchored to text | python3 scripts/comment.py IN.docx OUT.docx --text "..." --anchor "..." |
| scripts/ooxml/merge_runs.py | Merge adjacent runs with identical formatting (post-edit cleanup) | python3 scripts/ooxml/merge_runs.py unpacked/ |
| scripts/ooxml/redline_diff.py | Validate tracked-change correctness vs original | python3 scripts/ooxml/redline_diff.py unpacked/ original.docx |
| scripts/ooxml/simplify_tracked.py | Simplify same-author adjacent tracked changes | python3 scripts/ooxml/simplify_tracked.py unpacked/ |
When officecli can't do the job, escalate in this order:
| Level | When | Tool |
|-------|------|------|
| L1 officecli high-level | Typical add/set/remove operations | officecli add/set/remove/query/view |
| L2 officecli raw-set | XML injection — PAGE field, fldChar, hyperlink anchor, custom attributes | officecli raw-set FILE PATH --xpath X --action A --xml ... |
| L3 Python script | Bulk tracked-change ops, comment add, merge runs, redline validation | python3 scripts/*.py |
| L4 Unpack → edit XML → repack | OMML equations, custom style injection, pattern-match editing, anything L1-L3 can't reach | scripts/docx_cli.py open FILE work/ → edit work/word/*.xml → scripts/docx_cli.py save work/ OUT.docx |
Escalation signals:
<m:oMath> XML — officecli cannot generate these)docx_cli.py search/replace)references/*.md BEFORE giving upAdditional detail lives in companion files. Load only the one you need.
| Subskill | Path | When to use |
|----------|------|-------------|
| officecli-academic-paper | ./officecli-academic-paper/SKILL.md | Academic papers, citations, bibliography, TOC for papers |
| creating.md | ./creating.md | Detailed creation recipes (new documents from scratch) |
| editing.md | ./editing.md | Detailed editing guides (modify existing documents) |
Is the document an academic paper (thesis, journal, conference)?
YES --> read officecli-academic-paper/SKILL.md
NO --> continue with this file
User provided a source file to match?
YES --> §2 Reference-Based Editing + ./editing.md
NO --> ./creating.md
# Verify heading hierarchy
officecli view report.docx outline
Use professional, muted tones only:
| Purpose | Allowed colors | Hex examples |
|---------|---------------|--------------|
| Headings, emphasis | Navy | #003366, #1B2A4A |
| Body accents, borders | Charcoal | #333333, #4A4A4A |
| Highlights, callouts | Forest green | #2E5E3F, #1A4731 |
NEVER use rainbow colors, bright primary colors, or more than 3 accent colors in a single document.
| Script | Primary font | Fallback | |--------|-------------|----------| | Korean | Malgun Gothic | Pretendard | | English / Latin | Calibri | Aptos |
The CJK fork auto-applies East Asian fonts, but verify with:
officecli raw report.docx /document | grep rFonts
Choose a readable body font (Calibri, Cambria, Georgia, Times New Roman). Keep body at 11-12pt. Headings should step up: H1=18pt minimum (20pt preferred for long documents), H2=14pt bold, H3=12pt bold.
Use paragraph spacing (spaceBefore/spaceAfter) instead of empty paragraphs. Line spacing of 1.15x-1.5x for body text.
Always set margins explicitly. US Letter default: pageWidth=12240, pageHeight=15840, margins=1440 (1 inch).
TOC generation depends entirely on heading styles. Before inserting TOC:
officecli view FILE outline to verify hierarchy.Alternate row shading for readability. Header row with contrasting background. Consistent cell padding. Use color sparingly -- accent color for headings or table headers, not rainbow formatting.
| Content Type | Recommended Element(s) | Why |
|---|---|---|
| Sequential items | Bulleted list (listStyle=bullet) | Scanning is faster than inline commas |
| Step-by-step process | Numbered list (listStyle=numbered) | Numbers communicate order |
| Comparative data | Table with header row | Columns enable side-by-side comparison |
| Trend data | Embedded chart (chartType=line/column) | Visual pattern recognition |
| Key definition | Hanging indent paragraph | Offset term from definition |
| Legal/contract clause | Numbered list with bookmarks | Cross-referencing via bookmarks |
| Mathematical content | Equation element (formula=LaTeX) | Proper OMML rendering |
| Citation/reference | Footnote or endnote | Keeps body text clean |
| Pull quote / callout | Paragraph with border + shading | Visual distinction from body |
| Multi-section layout | Section breaks with columns | Column control per section |
After ANY DOCX creation or edit, ALWAYS execute both steps:
# Step 1: Structural validation
officecli validate output.docx
# Step 2: Visual PDF verification
soffice --headless --convert-to pdf --outdir /tmp output.docx
# Open/inspect PDF to confirm: formatting, tables, images, headers/footers
validate reports errors, fix them before delivering the file.# Required
python3 -c "import docx, lxml" || echo "MISSING: pip install python-docx lxml"
# On-demand: LibreOffice (auto-install when PDF conversion needed)
which soffice >/dev/null 2>&1 || echo "INFO: LibreOffice not installed — will auto-install when PDF conversion is needed"
# On-demand: OfficeCLI (install from forked repo when L1/L2 needed)
which officecli >/dev/null 2>&1 || echo "INFO: OfficeCLI not installed — install on-demand: bash \"\$(npm root -g)/cli-jaw/scripts/install-officecli.sh\""
Always confirm syntax from help before guessing:
officecli --help
officecli help docx
officecli help docx add
officecli help docx set
officecli help docx query
officecli help docx add paragraph --json
officecli help docx set run --json
Drill into a specific area:
officecli help docx add paragraph
officecli help docx add picture
officecli help docx set run
officecli help docx set style
officecli help all --jsonl | grep '"format":"docx"'
| Binary | Path | Notes |
|--------|------|-------|
| officecli | officecli (PATH) | Global install includes CJK fork with CjkHelper.cs -- auto-applies fonts/lang tags |
Run commands one at a time. Do not write all commands into a shell script and execute it as a single block.
OfficeCLI is incremental: every add, set, and remove immediately modifies the file and returns output.
get or validate before building on top of it.officecli view doc.docx text # Full text extraction
officecli view doc.docx text --max-lines 200 # Truncated extraction
officecli view doc.docx text --start 1 --end 50 # Range extraction
officecli view doc.docx outline # Structure: stats, headings, headers/footers
officecli view doc.docx annotated # Style/font/size per run, equations as LaTeX
officecli view doc.docx stats # Paragraph count, style/font distribution
officecli get doc.docx / # Document root (metadata, page setup)
officecli get doc.docx /body --depth 1 # List body children
officecli get doc.docx "/body/p[1]" # Specific paragraph
officecli get doc.docx "/body/p[1]/r[1]" # Specific run
officecli get doc.docx "/body/tbl[1]" --depth 3 # Table structure
officecli get doc.docx /styles # Style definitions
officecli get doc.docx "/styles/Heading1" # Specific style
officecli get doc.docx "/header[1]" # Header/footer
officecli get doc.docx /numbering # Numbering definitions
officecli get doc.docx "/body/p[1]" --json # JSON output for scripting
officecli query doc.docx 'paragraph[style=Heading1]' # By style
officecli query doc.docx 'p:contains("quarterly")' # By text content
officecli query doc.docx 'p:empty' # Empty paragraphs
officecli query doc.docx 'image:no-alt' # Images without alt text
officecli query doc.docx 'p[align=center] > r[bold=true]' # Compound selectors
officecli query doc.docx 'paragraph[size>=24pt]' # By size
officecli query doc.docx 'field[fieldType!=page]' # Fields by type
Standard footer setup (always use this pattern for documents with a cover page):
Known CLI bug:
--prop field=pageis silently ignored inadd --type footercommands. The footer is created with static text only. You must useraw-setto inject the PAGE field after creating the footer.
# Step 1. Empty footer for cover page (auto-enables differentFirstPage)
officecli add doc.docx / --type footer --prop type=first --prop text=""
# Step 2. Default footer with static "Page " text
officecli add doc.docx / --type footer --prop text="Page " --prop type=default --prop align=center --prop size=9pt --prop font=Calibri
# Step 3. Inject PAGE field via raw-set (footer[2] = default when first-page footer also exists)
officecli raw-set doc.docx "/footer[2]" \
--xpath "//w:p" \
--action append \
--xml '<w:r xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"><w:rPr><w:rFonts w:ascii="Calibri" w:hAnsi="Calibri"/><w:sz w:val="18"/></w:rPr><w:fldChar w:fldCharType="begin"/></w:r><w:r xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"><w:rPr><w:rFonts w:ascii="Calibri" w:hAnsi="Calibri"/><w:sz w:val="18"/></w:rPr><w:instrText xml:space="preserve"> PAGE </w:instrText></w:r><w:r xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"><w:rPr><w:rFonts w:ascii="Calibri" w:hAnsi="Calibri"/><w:sz w:val="18"/></w:rPr><w:fldChar w:fldCharType="end"/></w:r>'
Footer index rule: When both a first-page footer and a default footer are added, the default footer is
/footer[2]. If there is no first-page footer, the default footer is/footer[1]. Always verify withofficecli get doc.docx "/footer[2]"(or"/footer[1]") to confirm the<w:fldChar>element is present.
LibreOffice rendering note: Page number fields may display as static "Page" in LibreOffice PDF preview -- this is a LibreOffice limitation. Open in Microsoft Word to see actual page numbers. Confirm the field with
officecli get doc.docx "/footer[2]"-- output must showfldCharchildren.
Always use open/close -- it is the smart default. Every command benefits: no repeated file I/O.
officecli open doc.docx # Load once into memory (returns IMMEDIATELY; daemon in bg)
officecli add doc.docx ... # All commands run in memory -- fast
officecli set doc.docx ...
officecli close doc.docx # Write once to disk
Do NOT run
officecli openas a background shell job (e.g. viarun_in_background). It returns immediately and the daemon lives in the background automatically. Running it as a monitored shell creates zombies and file locks. If stuck:pkill -9 -f "officecli.*resident"then retry.
Execute multiple operations in a single open/save cycle:
cat <<'EOF' | officecli batch doc.docx
[
{"command":"add","parent":"/body","type":"paragraph","props":{"text":"Introduction","style":"Heading1"}},
{"command":"add","parent":"/body","type":"paragraph","props":{"text":"This report covers Q4 results.","font":"Calibri","size":"11pt"}}
]
EOF
Batch supports: add, set, get, query, remove, move, swap, view, raw, raw-set, validate.
Batch fields: command, path, parent, type, from, to, index, after, before, props (dict), selector, mode, depth, part, xpath, action, xml.
parent = container to add into (for add). path = element to modify (for set, get, remove, move, swap).
Error decoding:
'X' is an invalid start of a value= shell syntax leaked into JSON (unquoted$, stray shell metachar). Use heredoccat <<'EOF' | officecli batch FILEwith single-quoted'EOF'delimiter — prevents shell expansion.
| Pitfall | Correct Approach |
|---------|-----------------|
| --name "foo" | Use --prop name="foo" -- all attributes go through --prop |
| Guessing property names | Run officecli help docx set paragraph --json to see exact names |
| \n in shell strings | Use \\n for newlines in --prop text="line1\\nline2" |
| Modifying an open file | Close the file in Word first |
| Hex colors with # | Use FF0000 not #FF0000 -- no hash prefix |
| Paths are 1-based | /body/p[1], /body/tbl[1] -- XPath convention |
| --index is 0-based | --index 0 = first position -- array convention |
| Unquoted [N] in zsh/bash | Shell glob-expands /body/p[1] -- always quote paths: "/body/p[1]" |
| Spacing in raw numbers | Use unit-qualified values: '12pt', '0.5cm', '1.5x' not raw twips |
| Empty paragraphs for spacing | Use spaceBefore/spaceAfter properties on paragraphs |
| $ in --prop text= (shell) | Use single quotes: --prop text='$50M' |
| $ and ' in batch JSON | Use heredoc: cat <<'EOF' \| officecli batch |
| Wrong border format | Use style;size;color;space format: single;4;FF0000;1 |
| listStyle on run | listStyle is a paragraph property, not a run property |
| Row-level bold/color/shd | Row set only supports height, header, and c1/c2/c3 text shortcuts. Use cell-level set for formatting |
| Section vs root property names | Section uses pagewidth/pageheight (lowercase). Document root uses pageWidth/pageHeight (camelCase) |
| --prop field=page in footer | SILENTLY IGNORED in add --type footer. Must use raw-set to inject <w:fldChar>. See §9.5 |
| Page number on cover | Adding --type footer --prop type=first auto-enables differentFirstPage. Do NOT use set / --prop differentFirstPage=true -- unsupported and silently fails |
| TOC skipped for multi-heading docs | Any document with 3+ headings requires a TOC. Add with --type toc --index 0 after cover page break |
| Code block indentation via spaces | Use ind.left paragraph property (e.g. --prop ind.left=720) -- consecutive spaces produce warnings |
| Recreating styles that exist in template | cp source.docx target.docx first. Don't add styles with existing IDs — validate fails. See §2 |
| officecli open as background shell | Run foreground — open returns immediately, daemon runs in bg automatically. Background shell spawn creates zombies + file locks |
| Batch JSON 'X' is an invalid start of a value | Shell syntax leaked into JSON. Use heredoc: cat <<'EOF' \| officecli batch FILE.docx |
| OMML equation creation | officecli cannot generate OMML. Options: (a) inline text, or (b) L4 — scripts/docx_cli.py open FILE work/ → inject <m:oMath> into work/word/document.xml → scripts/docx_cli.py save. No dedicated OMML guide; author XML by hand or port from an existing equation-containing fixture in tests/fixtures/ |
| Issue | Workaround |
|---|---|
| No visual preview | Unlike pptx (SVG/HTML), docx has no built-in rendering. Use view text/outline/annotated/issues for verification. Users must open in Word for visual check. |
| Track changes creation requires raw XML | OfficeCLI can accept/reject tracked changes but cannot create them via high-level commands. Use raw-set with XML or scripts/docx_cli.py (L3). |
| Tab stops may require raw XML | Tab stop creation is not exposed in high-level commands. Use raw-set to add tab stop definitions. |
| Chart series cannot be added after creation | set --prop data= can only update existing series, not add new ones. Delete and recreate the chart. |
| Complex numbering definitions | listStyle=bullet/numbered covers simple cases. For multi-level lists, use numId/numLevel properties. |
| Shell quoting in batch with echo | Use heredoc: cat <<'EOF' \| officecli batch doc.docx. |
| Batch intermittent failure | ~1-in-15 batch operations may fail with "Failed to send to resident". Retry or close/reopen file. Split large batches into 10-15 operation chunks. |
| Table-level padding produces invalid XML | Do not use set tbl[N] --prop padding=N. Use cell-level padding.top/padding.bottom. If already applied, remove with raw-set --xpath "//w:tbl[N]/w:tblPr/w:tblCellMar" --action remove. |
| Internal hyperlinks not supported | hyperlink only accepts absolute URIs. For #bookmark links, use raw-set with <w:hyperlink w:anchor="bookmarkName">. |
| Table --index positioning unreliable | --index N on add /body --type table may be ignored. Add content in desired order, or remove/re-add elements. |
| \mathcal in equations causes validation errors | Use \mathit or plain letters instead. |
| view text shows "1." for all numbered items | Display-only limitation. Rendered output in Word/LibreOffice shows correct auto-incrementing numbers. |
| chartType=pie/doughnut in LibreOffice PDF | Do NOT use these chart types when LibreOffice PDF delivery is required. Slices are invisible. Use chartType=column or bar instead. |
| OMML equation creation via officecli | officecli has no high-level OMML generator. Escalate to L4: scripts/docx_cli.py open FILE work/ → inject <m:oMath> into work/word/document.xml → scripts/docx_cli.py save. Copy OMML from a reference fixture (tests/fixtures/) rather than authoring from scratch. |
Assume there are problems. Your job is to find them.
officecli view doc.docx issues
officecli view doc.docx issues --type format
officecli view doc.docx issues --type content
officecli view doc.docx issues --type structure
officecli view doc.docx text
officecli view doc.docx outline
officecli query doc.docx 'p:empty'
officecli query doc.docx 'image:no-alt'
# Check for leftover placeholders
officecli query doc.docx 'p:contains("lorem")'
officecli query doc.docx 'p:contains("xxxx")'
officecli query doc.docx 'p:contains("placeholder")'
officecli get doc.docx "/footer[2]" --depth 3 (must show fldChar elements). Required: raw-set PAGE field injection -- --prop field=page in add command is silently ignored. If no first-page footer, use "/footer[1]".--prop type=first --prop text="") if document has a cover page--type toc --prop levels="1-3" --prop title="Table of Contents" --prop hyperlinks=true --prop pagenumbers=true --index 0)officecli validateview issues + view outline + view text + validateDo not declare success until you've completed at least one fix-and-verify cycle.
QA display notes:
view text shows "1." for ALL numbered list items regardless of actual rendered number. This is a display limitation -- not a defect.view issues flags "body paragraph missing first-line indent" on cover page paragraphs, centered headings, list items, callout boxes, etc. These warnings are expected. First-line indent is only required in APA/academic body text.raw-set, the XML structure must be exact. See the Headers & Footers section for the correct fldChar/instrText/fldChar sequence.spacing-after properties.listStyle=bullet or listStyle=number.--prop font=... when it suffices.references/*.md + check scripts/*.py (§3) BEFORE giving up or falling back to inline text. The Pre-officecli OOXML workflow is still available via scripts.| Tool | Purpose | Status |
|------|---------|--------|
| officecli (PATH) | Primary DOCX CLI -- global install includes CJK fork | Required |
| dotnet | Runtime/build for officecli | Required for fork builds |
| python3 | Fallback scripts (scripts/.py) | Required for L3/L4 |
| lxml | Python XML processing for scripts/ooxml/ | Required for L3/L4 (pip install lxml) |
| soffice | PDF conversion / .doc migration / macro workflows | Optional fallback |
| pdftoppm | Image-based QA after PDF render | Optional fallback |
dotnet publish -c Release -o build-local
development
Native Web UI structured renderer schemas for compose-block drafts, search-results cards, dataframe tables, chart-json charts, and diff output
tools
Unified search hub. Route any web/real-time/X lookup through a 4-tier escalation: built-in web search → cli-jaw browser CDP → progrok Grok OAuth → web-ai (Grok Expert / GPT Pro). Use for: search, 검색, web search, latest news, real-time info, X/Twitter, fact lookup, deep research.
development
UI/UX intent discovery, design vocabulary, product personalities, UX state patterns, typography line break judgment, favicon/product logo design, and logo trust section design. Use when user design direction is vague, when building onboarding/empty/error states, when setting up favicons or product logos, or when referencing a product aesthetic.
development
Canonical owner of module boundary rules, circular dependency detection/prevention, implicit coupling taxonomy, barrel/re-export discipline, and boundary-only defensive programming. Referenced by dev, dev-code-reviewer, dev-backend, dev-frontend stubs.