skills/docx/SKILL.md
Use when creating, editing, analyzing, or reviewing .docx files, implementing tracked changes (redlining), extracting text from Word documents, or converting documents to images. NEVER use for .xlsx spreadsheets (use xlsx skill), .pptx presentations (use pptx skill), or PDF-only operations.
npx skillsauth add sharkitect-solutions/sharkitect-claude-toolkit docx
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Reading or analyzing content: Use "Text extraction" or "Raw XML access" sections below
Creating a new document: Use "Creating a new Word document" workflow
Editing your own document with simple changes: Use "Editing an existing Word document" workflow
Editing someone else's document: Use "Redlining workflow" (recommended default)
Editing legal, academic, business, or government docs: Use "Redlining workflow" (required)
Convert the document to markdown using pandoc to read text contents:
pandoc --track-changes=all path-to-file.docx -o output.md
# Options: --track-changes=accept/reject/all
Required for: comments, complex formatting, document structure, embedded media, and metadata.
python ooxml/scripts/unpack.py <office_file> <output_directory>
word/document.xml - Main document contents
word/comments.xml - Comments referenced in document.xml
word/media/ - Embedded images and media files
Tracked changes use <w:ins> (insertions) and <w:del> (deletions) tags
Use docx-js to create Word documents using JavaScript/TypeScript.
Read docx-js.md (~500 lines) completely from start to finish. NEVER set any range limits when reading this file. Read the full file content for detailed syntax, critical formatting rules, and best practices before proceeding.
Use the Document library (Python library for OOXML manipulation). The library handles infrastructure setup and provides methods for document manipulation. For complex scenarios, access the underlying DOM directly through the library.
Read ooxml.md (~600 lines) completely from start to finish. NEVER set any range limits when reading this file. Read the full file content for the Document library API and XML patterns.
python ooxml/scripts/unpack.py <office_file> <output_directory>
python ooxml/scripts/pack.py <input_directory> <office_file>
This workflow plans comprehensive tracked changes using markdown before implementing them in OOXML. CRITICAL: Implement ALL changes systematically.
Batching Strategy: Group related changes into batches of 3-10 changes. Test each batch before moving to the next.
Principle: Minimal, Precise Edits
Only mark text that actually changes. Repeating unchanged text makes edits harder to review and appears unprofessional. Break replacements into: [unchanged text] + [deletion] + [insertion] + [unchanged text]. Preserve the original run's RSID for unchanged text by extracting the <w:r> element from the original and reusing it.
Example - Changing "30 days" to "60 days":
# BAD - Replaces entire sentence
'<w:del><w:r><w:delText>The term is 30 days.</w:delText></w:r></w:del><w:ins><w:r><w:t>The term is 60 days.</w:t></w:r></w:ins>'
# GOOD - Only marks what changed, preserves original <w:r> for unchanged text
'<w:r w:rsidR="00AB12CD"><w:t>The term is </w:t></w:r><w:del><w:r><w:delText>30</w:delText></w:r></w:del><w:ins><w:r><w:t>60</w:t></w:r></w:ins><w:r w:rsidR="00AB12CD"><w:t> days.</w:t></w:r>'
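The split into [unchanged text] + [deletion] + [insertion] + [unchanged text] can be sketched in Python as below. This is an illustration of the principle only, not the Document library API: `split_minimal_change` and `tracked_replacement` are hypothetical helper names, the split is done on word boundaries, and the `w:id`, `w:author`, and `w:date` attributes are left out just as in the example above. The sketch adds `xml:space="preserve"` so leading and trailing spaces in a run survive.

```python
import re
from xml.sax.saxutils import escape


def split_minimal_change(old, new):
    """Split old/new into (prefix, deleted, inserted, suffix) on word boundaries."""
    old_tokens = re.findall(r"\s+|\S+", old)
    new_tokens = re.findall(r"\s+|\S+", new)
    limit = min(len(old_tokens), len(new_tokens))

    # Count leading tokens shared by both versions.
    start = 0
    while start < limit and old_tokens[start] == new_tokens[start]:
        start += 1

    # Count trailing tokens shared by both versions, without overlapping the prefix.
    end = 0
    while end < limit - start and old_tokens[-1 - end] == new_tokens[-1 - end]:
        end += 1

    prefix = "".join(old_tokens[:start])
    deleted = "".join(old_tokens[start:len(old_tokens) - end])
    inserted = "".join(new_tokens[start:len(new_tokens) - end])
    suffix = "".join(old_tokens[len(old_tokens) - end:])
    return prefix, deleted, inserted, suffix


def _run(text, text_tag, rsid=""):
    """Build one <w:r> element; rsid is only set for unchanged text."""
    attrs = f' w:rsidR="{rsid}"' if rsid else ""
    return f'<w:r{attrs}><{text_tag} xml:space="preserve">{escape(text)}</{text_tag}></w:r>'


def tracked_replacement(old, new, rsid=""):
    """Return run XML that marks only the words that differ (hypothetical helper)."""
    prefix, deleted, inserted, suffix = split_minimal_change(old, new)
    parts = []
    if prefix:
        parts.append(_run(prefix, "w:t", rsid))
    if deleted:
        parts.append(f"<w:del>{_run(deleted, 'w:delText')}</w:del>")
    if inserted:
        parts.append(f"<w:ins>{_run(inserted, 'w:t')}</w:ins>")
    if suffix:
        parts.append(_run(suffix, "w:t", rsid))
    return "".join(parts)


print(split_minimal_change("The term is 30 days.", "The term is 60 days."))
# → ('The term is ', '30', '60', ' days.')
```

In a real script the unchanged runs should come from the original `<w:r>` element rather than being rebuilt, as the principle above states; the `rsid` parameter here only stands in for that reuse.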
Get markdown representation:
pandoc --track-changes=all path-to-file.docx -o current.md
Identify and group changes: Review the document and identify ALL changes needed, organizing into logical batches:
Location methods (for finding changes in XML):
Batch organization (group 3-10 related changes per batch):
Read documentation and unpack:
Read ooxml.md (~600 lines) completely from start to finish. NEVER set any range limits when reading this file. Pay special attention to "Document Library" and "Tracked Change Patterns" sections.
python ooxml/scripts/unpack.py <file.docx> <dir>
Implement changes in batches:
Suggested batch groupings:
For each batch:
a. Map text to XML: Grep for text in word/document.xml to verify how text is split across <w:r> elements.
b. Create and run script: Use get_node to find nodes, implement changes, then doc.save(). See "Document Library" section in ooxml.md for patterns.
Note: Always grep word/document.xml immediately before writing a script to get current line numbers. Line numbers change after each script run.
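Step (a) can also be checked programmatically. The sketch below is a rough complement to grep, not part of the Document library: `runs_containing` is a hypothetical helper that shows how a phrase is split across <w:r> elements within a paragraph. It uses the standard library parser to stay self-contained; for untrusted files, the defusedxml package listed under dependencies is the safer choice.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def runs_containing(document_xml, phrase):
    """For each paragraph whose visible text contains phrase, return its run texts."""
    root = ET.fromstring(document_xml)
    matches = []
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        run_texts = []
        for run in paragraph.iter(f"{{{W_NS}}}r"):
            # Only w:t is joined; w:delText (already-deleted text) is skipped.
            text = "".join(t.text or "" for t in run.findall("w:t", NS))
            if text:
                run_texts.append(text)
        if phrase in "".join(run_texts):
            matches.append(run_texts)
    return matches


# Illustrative input: the phrase "30 days" is split across two runs.
SAMPLE = (
    f'<w:document xmlns:w="{W_NS}"><w:body><w:p>'
    "<w:r><w:t>The term is </w:t></w:r>"
    "<w:r><w:t>30 da</w:t></w:r>"
    "<w:r><w:t>ys.</w:t></w:r>"
    "</w:p></w:body></w:document>"
)

print(runs_containing(SAMPLE, "30 days"))
# → [['The term is ', '30 da', 'ys.']]
```

A phrase that spans several runs, as here, will not match a plain grep for the full phrase, which is why the run boundaries need checking before writing the script.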
Pack the document:
python ooxml/scripts/pack.py unpacked reviewed-document.docx
Final verification:
pandoc --track-changes=all reviewed-document.docx -o verification.md
grep "original phrase" verification.md # Should NOT find it
grep "replacement phrase" verification.md # Should find it
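When many phrases need checking, the two grep commands can be folded into one pass. This is a minimal sketch, assuming the contents of verification.md have already been read into a string; `check_phrases` is a hypothetical helper, and the sample text is a simplified stand-in (real output from --track-changes=all also carries insertion and deletion markup).

```python
def check_phrases(text, must_be_absent, must_be_present):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for phrase in must_be_absent:
        if phrase in text:
            problems.append(f"still present: {phrase!r}")
    for phrase in must_be_present:
        if phrase not in text:
            problems.append(f"missing: {phrase!r}")
    return problems


# Simplified stand-in for the contents of verification.md.
sample = "The term is 60 days."

print(check_phrases(sample, must_be_absent=["30 days"], must_be_present=["60 days"]))
# → []
```

To run it against the real file, read it first, for example with `pathlib.Path("verification.md").read_text(encoding="utf-8")`.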
Two-step process: DOCX to PDF, then PDF to JPEG.
# Step 1: Convert DOCX to PDF
soffice --headless --convert-to pdf document.docx
# Step 2: Convert PDF pages to JPEG (creates page-1.jpg, page-2.jpg, etc.)
pdftoppm -jpeg -r 150 document.pdf page
# Specific page range example
pdftoppm -jpeg -r 150 -f 2 -l 5 document.pdf page
Key flags: -r 150 = 150 DPI resolution, -f N = first page, -l N = last page, -png for PNG output.
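The two steps can be driven from Python with the same flags as above. This is a sketch under stated assumptions: `conversion_commands` and `convert` are hypothetical helper names, and soffice is assumed to write the PDF into the current working directory, as it does for the command shown above.

```python
import shutil
import subprocess
from pathlib import Path


def conversion_commands(docx_path, prefix="page", dpi=150, first=None, last=None):
    """Build the soffice and pdftoppm argument lists without running them."""
    docx = Path(docx_path)
    # Without --outdir, the PDF is written to the current working directory.
    pdf = Path(docx.stem + ".pdf")
    to_pdf = ["soffice", "--headless", "--convert-to", "pdf", str(docx)]
    to_jpeg = ["pdftoppm", "-jpeg", "-r", str(dpi)]
    if first is not None:
        to_jpeg += ["-f", str(first)]
    if last is not None:
        to_jpeg += ["-l", str(last)]
    to_jpeg += [str(pdf), prefix]
    return [to_pdf, to_jpeg]


def convert(docx_path, **options):
    """Run both steps in order, stopping at the first failure."""
    for command in conversion_commands(docx_path, **options):
        if shutil.which(command[0]) is None:
            raise FileNotFoundError(f"{command[0]} is not installed or not on PATH")
        subprocess.run(command, check=True)


for command in conversion_commands("document.docx", first=2, last=5):
    print(" ".join(command))
```

Calling `convert("document.docx", first=2, last=5)` would then run both commands, provided LibreOffice and poppler-utils are installed.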
When generating code for DOCX operations:
sudo apt-get install pandoc (text extraction)
npm install -g docx (creating new documents)
sudo apt-get install libreoffice (PDF conversion)
sudo apt-get install poppler-utils (PDF-to-image conversion)
pip install defusedxml (secure XML parsing)

| Excuse | Why It's Wrong |
|--------|----------------|
| "I'll edit the XML directly without the Document library" | The Document library handles namespace management, RSID generation, and XML integrity automatically. Direct XML edits cause malformed documents that Word refuses to open. |
| "Tracked changes aren't needed for internal documents" | Any document you didn't create yourself should use redlining. The document owner needs visibility into what changed -- internal docs are no exception. |
| "I'll skip reading ooxml.md -- I already know OOXML" | The Document library has a specific API and patterns that differ from generic OOXML knowledge. Skipping it causes API misuse, broken scripts, and wasted time debugging. |
| "The document is simple enough to create without reading docx-js.md" | docx-js has non-obvious syntax for tables, styles, and formatting. Even simple documents fail without the reference. The MANDATORY read takes less time than debugging. |
| "I'll replace entire paragraphs instead of doing minimal edits" | Whole-paragraph replacements appear unprofessional in review mode and destroy document history. Reviewers cannot see what specifically changed without precise del/ins marking. |
| "Pandoc conversion is good enough -- no need to verify XML" | Pandoc shows a flattened view. The XML may contain structural errors, duplicate runs, or malformed tags that pandoc hides. Verification catches silent failures before delivery. |
| "I'll implement all changes in one script instead of batching" | Large scripts are nearly impossible to debug when one change breaks another. Batches of 3-10 isolate failures and allow incremental progress without starting over. |
| "Line numbers from my last grep are still valid" | Line numbers shift after every script execution. Reusing stale line numbers inserts changes at the wrong location or throws index errors. Always re-grep before each script. |
Signs this skill is being applied incorrectly:
... word/document.xml directly without using the Document library
... ooxml.md or docx-js.md before starting
... word/document.xml
... docx-js.md first
... ooxml.md first -- the Document library API is required; direct XML manipulation produces malformed files.
... docx-js.md first -- the library's syntax for styles, tables, and formatting is non-obvious and cannot be safely guessed.
... (ooxml.md, docx-js.md, unpack/pack scripts) -- these are shared infrastructure; changes break all docx operations across every project.