Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

alibaba/pdf-extractor

Name: pdf-extractor
Author: alibaba

spring-ai-alibaba-agent-framework/src/test/resources/skills/pdf-extractor/SKILL.md

Extract text, tables, and form data from PDF documents for analysis and processing. Use when user asks to extract, parse, or analyze PDF files.

9,079 stars

documentation

Updated Apr 3, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add alibaba/spring-ai-alibaba pdf-extractor

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

- Trivy, container and dependency vulnerability scanner: Clean (95%)
- Semgrep, static code analysis for vulnerabilities: Clean (95%)
- mcp-scan (Snyk), Model Context Protocol security validation: Clean (95%)
- Snyk (dep), open source security scanning: Skipped (50%)
- Socket.dev, supply chain security analysis: Skipped (50%)
- VirusTotal, multi-engine malware detection: Skipped (50%)
- CrowdStrike, advanced threat intelligence: Skipped (50%)
- OSV-Scanner, Open Source Vulnerability database check: Skipped (50%)
- OWASP Dep-Check: Skipped (50%)

Last scanned: Apr 3, 2026, 3:10 PM (8.6s, 2 files scanned)

SKILL.md

name: pdf-extractor
description: Extract text, tables, and form data from PDF documents for analysis and processing. Use when user asks to extract, parse, or analyze PDF files.

PDF Extractor Skill

You are a PDF extraction specialist. When the user asks to extract data from a PDF document, follow these instructions.

Instructions

1. Validate Input
   - Confirm that a PDF file path is provided.
   - By default, look for the PDF file in the current working directory.
   - Use the shell or read_file tool to check that the file exists.
   - Verify that it is a valid PDF.
2. Extract Content
   - Execute the extraction script using the shell tool:
   ```
   python scripts/extract_pdf.py <pdf_file_path>
   ```
   - The script outputs the extracted data as JSON.
3. Process Results
   - Parse the JSON output from the script.
   - Structure the data in a readable format.
   - Handle any encoding issues (UTF-8, special characters).
4. Present Output
   - Summarize what was extracted.
   - Present the data in the requested format (JSON, Markdown, plain text).
   - Highlight any issues or limitations.
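
The first three steps above could be wired together roughly as follows. This is a minimal sketch, not the skill's actual implementation: the helper names `resolve_pdf_path` and `run_extraction` are hypothetical, and it assumes the script writes its JSON to stdout and signals failure with a non-zero exit code, neither of which the skill text confirms.

```python
import json
import subprocess
import sys
from pathlib import Path

# Path taken from the skill text; assumed to be relative to the skill directory.
DEFAULT_SCRIPT = Path("scripts/extract_pdf.py")


def resolve_pdf_path(user_path: str, cwd: Path) -> Path:
    """Resolve a relative path against the working directory (the default location)."""
    path = Path(user_path)
    return path if path.is_absolute() else cwd / path


def run_extraction(pdf_path: Path, script: Path = DEFAULT_SCRIPT) -> dict:
    """Run the extraction script on pdf_path and parse its stdout as JSON."""
    if not pdf_path.is_file():
        raise FileNotFoundError(f"PDF not found: {pdf_path}")
    proc = subprocess.run(
        [sys.executable, str(script), str(pdf_path)],
        capture_output=True,
        text=True,
        encoding="utf-8",  # decode explicitly instead of relying on the locale
        check=False,
    )
    if proc.returncode != 0:
        # Assumption: a non-zero exit code means failure, with details on stderr.
        raise RuntimeError(proc.stderr.strip() or "extraction script failed")
    return json.loads(proc.stdout)
```

A call such as `run_extraction(resolve_pdf_path("report.pdf", Path.cwd()))` would then return the parsed result as a dictionary, provided the script behaves as assumed.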

Script Location

The extraction script is located at: scripts/extract_pdf.py

Output Format

The script returns JSON:

{
  "success": true,
  "filename": "report.pdf",
  "text": "Full text content...",
  "page_count": 10,
  "tables": [
    {
      "page": 1,
      "data": [["Header1", "Header2"], ["Value1", "Value2"]]
    }
  ],
  "metadata": {
    "title": "Document Title",
    "author": "Author Name",
    "created": "2024-01-01"
  }
}
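
To illustrate how a result of this shape might be presented, the sketch below renders one `tables` entry as a Markdown table and builds a one-line summary. The function names are hypothetical, and treating the first row of `data` as the header row is an assumption suggested by the sample, not something the skill states.

```python
def table_to_markdown(rows):
    """Render a list of rows as a Markdown table; the first row is assumed to be the header."""
    if not rows:
        return ""
    header, *body = rows

    def fmt(cells):
        # Escape pipe characters so cell text cannot break the table layout.
        return "| " + " | ".join(str(c).replace("|", "\\|") for c in cells) + " |"

    lines = [fmt(header), "| " + " | ".join("---" for _ in header) + " |"]
    lines += [fmt(row) for row in body]
    return "\n".join(lines)


def summarize(result):
    """Build a one-line summary from the fields shown in the sample output."""
    title = result.get("metadata", {}).get("title", "(untitled)")
    tables = result.get("tables", [])
    return f'{result["filename"]}: {result["page_count"]} pages, {len(tables)} table(s), title "{title}"'
```

Fields are read with `.get` where the sample suggests they could be absent, since the skill does not say which keys are guaranteed.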

Error Handling

If extraction fails:

- File not found: Ask user to verify the file path
- Invalid PDF: Inform user the file may be corrupted
- Encrypted PDF: Request password or inform user of encryption
- Script error: Report the specific error message
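
Some of these cases can be detected before the script runs. The sketch below is a heuristic pre-flight check under stated assumptions: the `diagnose` helper and its category keys are hypothetical, the `%PDF-` header test only checks the first bytes of the file, and searching the raw bytes for `/Encrypt` can produce false positives. How the real script reports its own errors is not documented in the skill, so script errors are not covered here.

```python
from pathlib import Path

# User-facing messages, worded after the error-handling list in the skill.
MESSAGES = {
    "not_found": "File not found: please verify the file path.",
    "invalid": "Invalid PDF: the file may be corrupted.",
    "encrypted": "Encrypted PDF: please provide the password.",
    "ok": "The file looks like a readable PDF.",
}


def diagnose(pdf_path: Path) -> str:
    """Return a category key from MESSAGES for the given path (heuristic only)."""
    if not pdf_path.is_file():
        return "not_found"
    data = pdf_path.read_bytes()  # reads the whole file; acceptable for a sketch
    if not data.startswith(b"%PDF-"):
        return "invalid"
    if b"/Encrypt" in data:
        # Encrypted PDFs carry an /Encrypt entry in the trailer dictionary.
        return "encrypted"
    return "ok"
```

For example, `MESSAGES[diagnose(Path("report.pdf"))]` gives a message that can be shown to the user before any extraction is attempted.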

Examples

Example 1: Simple text extraction

User: "Extract text from report.pdf"
Action: Execute script, return full text content

Example 2: Table extraction

User: "Get the tables from financial-report.pdf"
Action: Execute script, extract and format table data

Example 3: Metadata extraction

User: "What's the metadata of document.pdf?"
Action: Execute script, return document properties

Related Skills

alibaba/sample-skill (testing)
Verified, Trusted, Community
Sample skill fixture for classpath registry enhancement tests.
9,079 stars, SKILL.md, updated Apr 3, 2026

alibaba/product-selection (tools)
Verified, Trusted, Community
Product selection assistant. Analyzes market trends and user needs to recommend suitable product categories. Use this skill when the user mentions "选品" (product selection), "商品推荐" (product recommendation), or "品类分析" (category analysis).
9,079 stars, SKILL.md, updated Apr 3, 2026

alibaba/grouped-tools-test (tools)
Verified, Trusted, Community
Test skill for groupedTools. When executing this skill, use the record_result tool to record the result value.
9,079 stars, SKILL.md, updated Apr 3, 2026

alibaba/copywriting (tools)
Verified, Trusted, Community
Product copywriting assistant. Generates engaging marketing copy from product information. Use this skill when the user mentions "写文案" (write copy), "商品描述" (product description), or "营销文案" (marketing copy).
9,079 stars, SKILL.md, updated Apr 3, 2026

Install manually

# Clone the repo
git clone https://github.com/alibaba/spring-ai-alibaba.git

# Copy into Claude Code skills folder (global)
cp -r spring-ai-alibaba/spring-ai-alibaba-agent-framework/src/test/resources/skills/pdf-extractor ~/.claude/skills/

See the official Claude Code Skills documentation for the skills folder path.

Repository

alibaba/spring-ai-alibaba

9,079 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT