
shuyu-labs/office-to-md

Name: office-to-md
Author: shuyu-labs

skills/codex/office-to-md/SKILL.md

npx skillsauth add shuyu-labs/webcode office-to-md


Office Document to Markdown Converter

Convert various Office document formats to structured Markdown with text, table, and image extraction.

File Description

enhanced_parser.py - Core document parser
doc_converter.py - DOC to DOCX converter (requires LibreOffice)
requirements.txt - Python dependencies

Install Dependencies

pip install -r requirements.txt

Additional Dependencies for DOC Format

.doc format requires LibreOffice:

# Windows: Install LibreOffice from official website
# https://www.libreoffice.org/download/

# Linux
sudo apt install libreoffice

# Mac
brew install --cask libreoffice
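Before converting .doc files, it can help to confirm that a LibreOffice binary is reachable. The sketch below is not taken from doc_converter.py. It assumes the binary is on PATH as `soffice` or `libreoffice`, and it only builds the headless conversion command instead of running it. The function names are hypothetical.

```python
import shutil
from pathlib import Path
from typing import List, Optional

# Binary names LibreOffice commonly installs (an assumption; adjust per system).
CANDIDATE_BINARIES = ("soffice", "libreoffice")


def find_libreoffice() -> Optional[str]:
    """Return the path of the first LibreOffice binary found on PATH, else None."""
    for name in CANDIDATE_BINARIES:
        found = shutil.which(name)
        if found:
            return found
    return None


def build_convert_command(binary: str, doc_path: str, out_dir: str) -> List[str]:
    """Build a headless LibreOffice command that converts a .doc file to .docx."""
    return [
        binary,
        "--headless",
        "--convert-to", "docx",
        "--outdir", str(Path(out_dir)),
        str(Path(doc_path)),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once `find_libreoffice()` has returned a path.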

Quick Start

Python Code

from enhanced_parser import EnhancedDocumentParser

# Initialize parser
parser = EnhancedDocumentParser(
    image_base_url="http://localhost:5000",
    image_save_dir="./static/images",
    filter_headers_footers=True  # Filter headers and footers
)

# Parse document
result = parser.parse_document("document.docx")

if result["success"]:
    print(result["markdown"])
    print(f"Extracted {result['images_count']} images")
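To convert several files in one pass, the same call can be wrapped in a loop. This is a minimal sketch that relies only on the `parse_document` method and the `success` and `markdown` keys shown above. The helper name `convert_files` and the output naming scheme are assumptions.

```python
from pathlib import Path
from typing import Any, Iterable, List


def convert_files(parser: Any, paths: Iterable[str], out_dir: str) -> List[Path]:
    """Parse each path with the given parser and write one .md file per success."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for src in map(Path, paths):
        result = parser.parse_document(str(src))
        if not result.get("success"):
            continue  # skipped silently here; real code would likely log the failure
        # Keep the original extension in the name so a.docx and a.pdf do not collide.
        target = out / (src.name + ".md")
        target.write_text(result["markdown"], encoding="utf-8")
        written.append(target)
    return written
```

Passing the parser in as an argument keeps the helper independent of how `EnhancedDocumentParser` is configured.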

Start API Service

# Start service using app.py from project root
python app.py

# Visit http://localhost:5000/analyzer to upload files

Supported Formats

| Format | Extensions | Notes |
|--------|-----------|-------|
| Word | .docx, .doc | .doc requires LibreOffice |
| Excel | .xlsx, .xls | Supports multiple worksheets and date formats |
| PowerPoint | .pptx | Extracts slide text and images |
| PDF | .pdf | Auto-detects tables and images |

Features

Word Documents

Automatic heading level detection
Convert tables to Markdown tables
Extract inline images
Filter headers and footers
Preserve list formatting
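The table conversion listed above can be illustrated with a generic routine. This sketch is not the code from enhanced_parser.py. It assumes the table has already been read into a list of rows, treats the first row as the header, and uses a hypothetical function name.

```python
from typing import List, Sequence


def rows_to_markdown(rows: Sequence[Sequence[object]]) -> str:
    """Render rows as a Markdown table, treating the first row as the header."""
    if not rows:
        return ""
    width = max(len(r) for r in rows)

    def cell(value: object) -> str:
        # Escape pipes and flatten newlines so a cell cannot break the row.
        text = "" if value is None else str(value)
        return text.replace("|", "\\|").replace("\n", " ").strip()

    def line(row: Sequence[object]) -> str:
        # Pad short rows so every line has the same number of columns.
        padded = list(row) + [""] * (width - len(row))
        return "| " + " | ".join(cell(v) for v in padded) + " |"

    lines: List[str] = [line(rows[0]), "| " + " | ".join(["---"] * width) + " |"]
    lines.extend(line(r) for r in rows[1:])
    return "\n".join(lines)
```

Merged cells and nested tables have no direct Markdown equivalent, so a real converter has to pick a flattening rule for them.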

Excel Workbooks

Support for multiple worksheets
Automatic date format detection (prevents display as numbers)
Convert to Markdown tables
Extract embedded images
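Dates show up as numbers because Excel stores them as serial day counts. The sketch below shows the arithmetic for the default 1900 date system. How the skill itself decides that a cell is a date is not described here, so the `is_date` flag is a stand-in supplied by the caller, and both function names are hypothetical.

```python
from datetime import datetime, timedelta

# Day zero of Excel's default 1900 date system. Using 1899-12-30 absorbs
# Excel's fictitious 29 Feb 1900, so serials from 61 upward map correctly.
EXCEL_EPOCH = datetime(1899, 12, 30)


def excel_serial_to_datetime(serial: float) -> datetime:
    """Convert an Excel serial number (1900 date system) to a datetime."""
    return EXCEL_EPOCH + timedelta(days=serial)


def format_cell(value: float, is_date: bool) -> str:
    """Render a numeric cell, as an ISO date when the cell carries a date format."""
    if not is_date:
        return str(value)
    moment = excel_serial_to_datetime(value)
    if moment.time() == datetime.min.time():
        return moment.date().isoformat()  # whole days: date only
    return moment.isoformat(sep=" ", timespec="seconds")


print(format_cell(45000, True))   # 2023-03-15
print(format_cell(45000, False))  # 45000
```

Workbooks saved with the 1904 date system use a different epoch and would need a separate constant.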

PowerPoint Presentations

Extract content by slide
Extract images and text boxes
Preserve slide order

PDF Documents

Auto-detect tables (line detection + text position detection)
Extract page images
Intelligently identify headings and lists
Output content in original order

Advanced Options

DOC Conversion

# Test LibreOffice configuration
python doc_converter.py

PDF Table Strategy

parser = EnhancedDocumentParser(
    pdf_table_strategy="lines_strict"  # Default: strict line detection, fastest
    # "lines": Normal line detection
    # "text": Based on text position, more accurate but slower
)

Image Processing

parser = EnhancedDocumentParser(
    image_base_url="https://your-domain.com",  # Image access URL
    image_save_dir="./static/images"           # Image save directory
)
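Judging by the example values in this document, an image URL appears to be the base URL joined with the save directory and a generated filename. The sketch below reproduces that mapping under that assumption. The actual logic in enhanced_parser.py may differ, both helper names are hypothetical, and the code expects a relative save directory.

```python
import uuid
from pathlib import PurePosixPath


def new_image_name(extension: str = ".png") -> str:
    """Generate a collision-resistant filename such as '3f2a...c1.png'."""
    return uuid.uuid4().hex + extension


def build_image_url(image_base_url: str, image_save_dir: str, filename: str) -> str:
    """Join the base URL, the relative save directory, and the filename."""
    # PurePosixPath drops the leading "./", so "./static/images" becomes "static/images".
    rel_dir = PurePosixPath(image_save_dir.replace("\\", "/"))
    return f"{image_base_url.rstrip('/')}/{rel_dir.as_posix()}/{filename}"


print(build_image_url("http://localhost:5000", "./static/images", "uuid.png"))
# http://localhost:5000/static/images/uuid.png
```

For the URLs to resolve, the web server must serve `image_save_dir` at the matching path under `image_base_url`.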

Return Format

{
  "success": true,
  "markdown": "# Document Title\n\nContent...",
  "images_count": 2,
  "images": [
    {
      "filename": "uuid.png",
      "url": "http://localhost:5000/static/images/uuid.png",
      "size": 12345
    }
  ],
  "file_type": "docx",
  "file_info": {
    "name": "document.docx",
    "size": 45678,
    "paragraphs": 50,
    "tables": 3
  }
}
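Callers who want type checking can describe the result shape above explicitly. The `TypedDict` classes below are transcribed from the example and are not shipped with the skill. Marking the `file_info` fields as optional is an assumption, since `paragraphs` and `tables` may not apply to every format.

```python
from typing import List, TypedDict


class ImageInfo(TypedDict):
    filename: str
    url: str
    size: int


class FileInfo(TypedDict, total=False):
    name: str
    size: int
    paragraphs: int  # assumed to be Word-specific
    tables: int


class ParseResult(TypedDict):
    success: bool
    markdown: str
    images_count: int
    images: List[ImageInfo]
    file_type: str
    file_info: FileInfo


def summarize(result: ParseResult) -> str:
    """Build a one-line summary of a successful parse result."""
    info = result["file_info"]
    total_image_bytes = sum(img["size"] for img in result["images"])
    return (
        f"{info['name']} ({result['file_type']}): "
        f"{len(result['markdown'])} chars of Markdown, "
        f"{result['images_count']} images ({total_image_bytes} bytes)"
    )
```

The shape of a failed result is not shown in this document, so `summarize` should be called only after checking `result["success"]`.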

Common Issues

DOC Conversion Failed

Ensure LibreOffice is installed
Run python doc_converter.py to test configuration

Dates Display as Numbers

Excel parsing automatically handles date formats
Ensure you're using the latest version of enhanced_parser.py

PDF Table Recognition Inaccurate

Try a different pdf_table_strategy value
Use "lines_strict" for standard tables
Use "text" for complex tables
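Trying the strategies one after another can be automated. In this sketch, `make_parser` is a hypothetical factory such as `lambda s: EnhancedDocumentParser(pdf_table_strategy=s)`. Counting Markdown table rows is only a rough stand-in for "the table was recognized", not a check the skill itself performs.

```python
from typing import Any, Callable, Dict, Sequence, Tuple

# Strategy names from this document, ordered fastest first.
STRATEGIES = ("lines_strict", "lines", "text")


def count_table_rows(markdown: str) -> int:
    """Count lines that look like Markdown table rows (a rough heuristic)."""
    return sum(1 for line in markdown.splitlines() if line.strip().startswith("|"))


def parse_with_fallback(
    make_parser: Callable[[str], Any],
    path: str,
    strategies: Sequence[str] = STRATEGIES,
) -> Tuple[str, Dict[str, Any]]:
    """Try each strategy in order and return the first result containing table rows.

    If no strategy yields a table, the last successful result is returned.
    A RuntimeError is raised when every attempt fails.
    """
    last = None
    for strategy in strategies:
        result = make_parser(strategy).parse_document(path)
        if not result.get("success"):
            continue
        last = (strategy, result)
        if count_table_rows(result["markdown"]) > 0:
            return last
    if last is None:
        raise RuntimeError(f"all strategies failed for {path}")
    return last
```

The returned strategy name makes it easy to log which setting worked for a given file.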

File Limitations

Maximum file size: 160MB
Supported extensions: docx, doc, pdf, xlsx, xls, pptx
Automatic cleanup of temporary files
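The limits above can be checked before a file is handed to the parser. This sketch assumes that "160MB" means 160 MiB. The function name and error messages are hypothetical, and the service may enforce the limits differently.

```python
from pathlib import Path

MAX_BYTES = 160 * 1024 * 1024  # assumes "160MB" means 160 MiB
ALLOWED = {"docx", "doc", "pdf", "xlsx", "xls", "pptx"}


def check_upload(path: str, max_bytes: int = MAX_BYTES) -> None:
    """Raise ValueError if the file is missing, too large, or has an unsupported extension."""
    p = Path(path)
    ext = p.suffix.lower().lstrip(".")
    if ext not in ALLOWED:
        raise ValueError(f"unsupported extension: {ext or '(none)'}")
    if not p.is_file():
        raise ValueError(f"not a file: {path}")
    size = p.stat().st_size
    if size > max_bytes:
        raise ValueError(f"file is {size} bytes, limit is {max_bytes}")
```

Checking the extension first avoids a filesystem call for files that would be rejected anyway.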


Convert Office documents (Word, Excel, PowerPoint, PDF) to Markdown format. ONLY use this skill when the user explicitly requests to CONVERT, TRANSFORM or PARSE a specific office file into Markdown. Do NOT trigger for general questions, documentation reading, or discussions about files.

276 stars

development

Updated May 19, 2026


npx skillsauth add shuyu-labs/webcode office-to-md

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy (container and dependency vulnerability scanner): Clean, 95%
Semgrep (static code analysis for vulnerabilities): Clean, 95%
mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
Snyk (dep) (open source security scanning): Skipped, 50%
Socket.dev (supply chain security analysis): Skipped, 50%
VirusTotal (multi-engine malware detection): Skipped, 50%
CrowdStrike (advanced threat intelligence): Skipped, 50%
OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: May 16, 2026, 4:47 AM · 226.9s · 4 files scanned


Related Skills

shuyu-labs/pptx
documentation
Verified, Trusted, Community
Presentation creation, editing, and analysis. When Claude needs to work with presentations (.pptx files) for: (1) Creating new presentations, (2) Modifying or editing content, (3) Working with layouts, (4) Adding comments or speaker notes, or any other presentation tasks.
276 stars · SKILL.md · Updated May 19, 2026

shuyu-labs/planning-with-files
data-ai
Verified, Trusted, Community
Transforms workflow to use Manus-style persistent markdown files for planning, progress tracking, and knowledge storage. Use when starting complex tasks, multi-step projects, research tasks, or when the user mentions planning, organizing work, tracking progress, or wants structured output.
276 stars · SKILL.md · Updated May 19, 2026

shuyu-labs/pdf
tools
Verified, Trusted, Community
Comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging/splitting documents, and handling forms. When Claude needs to fill in a PDF form or programmatically process, generate, or analyze PDF documents at scale.
276 stars · SKILL.md · Updated May 19, 2026

shuyu-labs/ms-agent-framework-rag
development
Verified, Trusted, Community
Comprehensive guide for building Agentic RAG systems using Microsoft Agent Framework in C#. Use when creating RAG applications with semantic search, document indexing, and intelligent agent orchestration. Includes scaffolding scripts, reference implementations, and documentation for vector databases, embedding models, and multi-agent workflows.
276 stars · SKILL.md · Updated May 19, 2026

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.


Install manually


# Clone the repo
git clone https://github.com/shuyu-labs/webcode.git

# Copy into Claude Code skills folder (global)
cp -r webcode/skills/codex/office-to-md ~/.claude/skills/


Repository

shuyu-labs/webcode

276 stars

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT