Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

einverne/gemini-document-processing

Name: gemini-document-processing
Author: einverne

claude/skills/gemini-document-processing/SKILL.md

npx skillsauth add einverne/dotfiles gemini-document-processing

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

Gemini Document Processing

Process and analyze PDF documents using Google Gemini's native vision capabilities. Extract structured information, summarize content, answer questions, and understand complex documents with text, images, diagrams, charts, and tables.

Core Capabilities

PDF Vision Processing: Native understanding of PDFs up to 1,000 pages (258 tokens/page)
Multimodal Analysis: Process text, images, diagrams, charts, and tables
Structured Extraction: Output to JSON with schema validation
Document Q&A: Answer questions based on document content
Summarization: Generate summaries preserving context
Format Conversion: Transcribe to HTML while preserving layout

When to Use This Skill

Use this skill when you need to:

Extract structured data from PDF documents (invoices, resumes, forms)
Summarize long documents or reports
Answer questions about PDF content
Analyze documents with complex layouts, charts, or diagrams
Convert PDFs to structured formats (JSON, HTML)
Process multiple documents in batch
Build document processing pipelines

Quick Setup

1. API Key Configuration

The skill checks for GEMINI_API_KEY in this priority order:

Process environment variable
.env file in skill directory (.claude/skills/gemini-document-processing/.env)
.env file in project root

Get your API key: https://aistudio.google.com/apikey

Option A: Environment Variable (Recommended)

export GEMINI_API_KEY="your-api-key-here"

Option B: Skill Directory

cd .claude/skills/gemini-document-processing
echo "GEMINI_API_KEY=your-api-key-here" > .env

Option C: Project Root

echo "GEMINI_API_KEY=your-api-key-here" > .env

2. Install Dependencies

pip install google-genai python-dotenv

Common Use Cases

1. Extract Structured Data from PDF

# Use the provided script
python .claude/skills/gemini-document-processing/scripts/process-document.py \
  --file invoice.pdf \
  --prompt "Extract invoice details as JSON" \
  --format json

2. Summarize Long Document

# Process and summarize
python .claude/skills/gemini-document-processing/scripts/process-document.py \
  --file report.pdf \
  --prompt "Provide a concise executive summary"

3. Answer Questions About Document

# Q&A on document content
python .claude/skills/gemini-document-processing/scripts/process-document.py \
  --file contract.pdf \
  --prompt "What are the key terms and conditions?"

4. Process with Python SDK

from google import genai

client = genai.Client()

# Read PDF
with open('document.pdf', 'rb') as f:
    pdf_data = f.read()

# Process document
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        'Extract key information from this document',
        genai.types.Part.from_bytes(
            data=pdf_data,
            mime_type='application/pdf'
        )
    ]
)

print(response.text)

5. Structured Output with JSON Schema

from google import genai
from pydantic import BaseModel

class InvoiceData(BaseModel):
    invoice_number: str
    date: str
    total: float
    vendor: str

client = genai.Client()

response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        'Extract invoice details',
        genai.types.Part.from_bytes(
            data=open('invoice.pdf', 'rb').read(),
            mime_type='application/pdf'
        )
    ],
    config=genai.types.GenerateContentConfig(
        response_mime_type='application/json',
        response_schema=InvoiceData
    )
)

invoice_data = InvoiceData.model_validate_json(response.text)

Key Constraints

Format: Only PDFs get vision processing (TXT, HTML, Markdown are text-only)
Size: < 20MB use inline encoding, > 20MB use File API
Pages: Max 1,000 pages per document
Storage: File API stores for 48 hours only
Cost: 258 tokens per page (fixed, regardless of content density)

Performance Tips

Use Inline Encoding for PDFs < 20MB (simpler, single request)
Use File API for larger files or repeated queries (enables context caching)
Place Prompt After PDF for single-page documents
Use Context Caching when querying same PDF multiple times
Process in Parallel for multiple independent documents
Use gemini-2.5-flash for best price/performance ratio

Decision Guide

PDF < 20MB?
├─ Yes → Use inline base64 encoding
└─ No  → Use File API

Need structured JSON output?
├─ Yes → Define response_schema with Pydantic
└─ No  → Get text response

Multiple queries on same PDF?
├─ Yes → Use File API + Context Caching
└─ No  → Inline encoding is sufficient

Script Reference

The skill includes a ready-to-use processing script:

# Basic usage
python scripts/process-document.py --file document.pdf --prompt "Your prompt"

# With JSON output
python scripts/process-document.py --file document.pdf --prompt "Extract data" --format json

# With File API (for large files)
python scripts/process-document.py --file large-document.pdf --prompt "Summarize" --use-file-api

# Multiple prompts
python scripts/process-document.py --file document.pdf --prompt "Question 1" --prompt "Question 2"

References

For comprehensive documentation, see:

references/gemini-document-processing-report.md - Complete API reference
references/quick-reference.md - Quick lookup guide
references/code-examples.md - Additional code patterns

Troubleshooting

API Key Not Found:

# Check API key is set
./scripts/check-api-key.sh

File Too Large:

Use File API for files > 20MB
Add --use-file-api flag to the script

Vision Not Working:

Ensure file is PDF format
Other formats (TXT, HTML) don't support vision processing

Support

API Documentation: https://ai.google.dev/gemini-api/docs/document-processing
Get API Key: https://aistudio.google.com/apikey
Model Info: https://ai.google.dev/gemini-api/docs/models/gemini

einverne/gemini-document-processing

claude/skills/gemini-document-processing/SKILL.md

Guide for implementing Google Gemini API document processing - analyze PDFs with native vision to extract text, images, diagrams, charts, and tables. Use when processing documents, extracting structured data, summarizing PDFs, answering questions about document content, or converting documents to structured formats. (project)

117 stars

development

Updated May 28, 2026

$ install --global

skillsauth

npx skillsauth add einverne/dotfiles gemini-document-processing

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: May 28, 2026, 3:34 AM295.1s9 files scanned

SKILL.md

name:: gemini-document-processing
description:: Guide for implementing Google Gemini API document processing - analyze PDFs with native vision to extract text, images, diagrams, charts, and tables. Use when processing documents, extracting structured data, summarizing PDFs, answering questions about document content, or converting documents to structured formats. (project)

Gemini Document Processing

Core Capabilities

PDF Vision Processing: Native understanding of PDFs up to 1,000 pages (258 tokens/page)
Multimodal Analysis: Process text, images, diagrams, charts, and tables
Structured Extraction: Output to JSON with schema validation
Document Q&A: Answer questions based on document content
Summarization: Generate summaries preserving context
Format Conversion: Transcribe to HTML while preserving layout

When to Use This Skill

Use this skill when you need to:

Extract structured data from PDF documents (invoices, resumes, forms)
Summarize long documents or reports
Answer questions about PDF content
Analyze documents with complex layouts, charts, or diagrams
Convert PDFs to structured formats (JSON, HTML)
Process multiple documents in batch
Build document processing pipelines

Quick Setup

1. API Key Configuration

The skill checks for GEMINI_API_KEY in this priority order:

Process environment variable
.env file in skill directory (.claude/skills/gemini-document-processing/.env)
.env file in project root

Get your API key: https://aistudio.google.com/apikey

Option A: Environment Variable (Recommended)

export GEMINI_API_KEY="your-api-key-here"

Option B: Skill Directory

cd .claude/skills/gemini-document-processing
echo "GEMINI_API_KEY=your-api-key-here" > .env

Option C: Project Root

echo "GEMINI_API_KEY=your-api-key-here" > .env

2. Install Dependencies

pip install google-genai python-dotenv

Common Use Cases

1. Extract Structured Data from PDF

# Use the provided script
python .claude/skills/gemini-document-processing/scripts/process-document.py \
  --file invoice.pdf \
  --prompt "Extract invoice details as JSON" \
  --format json

2. Summarize Long Document

# Process and summarize
python .claude/skills/gemini-document-processing/scripts/process-document.py \
  --file report.pdf \
  --prompt "Provide a concise executive summary"

3. Answer Questions About Document

# Q&A on document content
python .claude/skills/gemini-document-processing/scripts/process-document.py \
  --file contract.pdf \
  --prompt "What are the key terms and conditions?"

4. Process with Python SDK

from google import genai

client = genai.Client()

# Read PDF
with open('document.pdf', 'rb') as f:
    pdf_data = f.read()

# Process document
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        'Extract key information from this document',
        genai.types.Part.from_bytes(
            data=pdf_data,
            mime_type='application/pdf'
        )
    ]
)

print(response.text)

5. Structured Output with JSON Schema

from google import genai
from pydantic import BaseModel

class InvoiceData(BaseModel):
    invoice_number: str
    date: str
    total: float
    vendor: str

client = genai.Client()

response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        'Extract invoice details',
        genai.types.Part.from_bytes(
            data=open('invoice.pdf', 'rb').read(),
            mime_type='application/pdf'
        )
    ],
    config=genai.types.GenerateContentConfig(
        response_mime_type='application/json',
        response_schema=InvoiceData
    )
)

invoice_data = InvoiceData.model_validate_json(response.text)

Key Constraints

Format: Only PDFs get vision processing (TXT, HTML, Markdown are text-only)
Size: < 20MB use inline encoding, > 20MB use File API
Pages: Max 1,000 pages per document
Storage: File API stores for 48 hours only
Cost: 258 tokens per page (fixed, regardless of content density)

Performance Tips

Use Inline Encoding for PDFs < 20MB (simpler, single request)
Use File API for larger files or repeated queries (enables context caching)
Place Prompt After PDF for single-page documents
Use Context Caching when querying same PDF multiple times
Process in Parallel for multiple independent documents
Use gemini-2.5-flash for best price/performance ratio

Decision Guide

PDF < 20MB?
├─ Yes → Use inline base64 encoding
└─ No  → Use File API

Need structured JSON output?
├─ Yes → Define response_schema with Pydantic
└─ No  → Get text response

Multiple queries on same PDF?
├─ Yes → Use File API + Context Caching
└─ No  → Inline encoding is sufficient

Script Reference

The skill includes a ready-to-use processing script:

# Basic usage
python scripts/process-document.py --file document.pdf --prompt "Your prompt"

# With JSON output
python scripts/process-document.py --file document.pdf --prompt "Extract data" --format json

# With File API (for large files)
python scripts/process-document.py --file large-document.pdf --prompt "Summarize" --use-file-api

# Multiple prompts
python scripts/process-document.py --file document.pdf --prompt "Question 1" --prompt "Question 2"

References

For comprehensive documentation, see:

references/gemini-document-processing-report.md - Complete API reference
references/quick-reference.md - Quick lookup guide
references/code-examples.md - Additional code patterns

Troubleshooting

API Key Not Found:

# Check API key is set
./scripts/check-api-key.sh

File Too Large:

Use File API for files > 20MB
Add --use-file-api flag to the script

Vision Not Working:

Ensure file is PDF format
Other formats (TXT, HTML) don't support vision processing

Support

API Documentation: https://ai.google.dev/gemini-api/docs/document-processing
Get API Key: https://aistudio.google.com/apikey
Model Info: https://ai.google.dev/gemini-api/docs/models/gemini

Related Skills

einverne/react-component-generator

development

VerifiedTrustedCommunity

生成符合项目规范的 React 组件。当用户要求创建组件、新建 React 组件或生成组件文件时使用

117SKILL.mdUpdated Apr 4, 2026

einverne/react-component-generator

einverne/git-commit-formatter

development

VerifiedTrustedCommunity

生成符合 Conventional Commits 规范的 Git 提交信息。当用户要求生成提交、创建 commit 或写提交信息时使用

117SKILL.mdUpdated Apr 4, 2026

einverne/git-commit-formatter

einverne/deploy-staging

devops

VerifiedTrustedCommunity

将当前分支部署到测试环境。当用户要求部署、发布到测试或在 staging 环境测试时使用

117SKILL.mdUpdated Apr 4, 2026

einverne/deploy-staging

einverne/code-reviewer

development

VerifiedTrustedCommunity

进行系统化的代码审查，检查代码质量、安全性和性能。当用户要求审查代码、review 或检查代码时使用

117SKILL.mdUpdated Apr 4, 2026

einverne/code-reviewer

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/einverne/dotfiles.git

# Copy into Claude Code skills folder (global)
cp -r dotfiles/claude/skills/gemini-document-processing ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

einverne/dotfiles

117 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT