claude/skills/gemini-document-processing/SKILL.md
Guide for implementing Google Gemini API document processing - analyze PDFs with native vision to extract text, images, diagrams, charts, and tables. Use when processing documents, extracting structured data, summarizing PDFs, answering questions about document content, or converting documents to structured formats. (project)
npx skillsauth add einverne/dotfiles gemini-document-processing
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Process and analyze PDF documents using Google Gemini's native vision capabilities. Extract structured information, summarize content, answer questions, and understand complex documents with text, images, diagrams, charts, and tables.
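Use this skill when you need to process documents, extract structured data, summarize PDFs, answer questions about document content, or convert documents to structured formats.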
The skill checks for GEMINI_API_KEY in this priority order:
- .env file in skill directory (.claude/skills/gemini-document-processing/.env)
- .env file in project root
Get your API key: https://aistudio.google.com/apikey
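The lookup order could be sketched in Python roughly as follows. This is a minimal illustration, not the skill's actual loader: the helper names `read_env_file` and `resolve_api_key` are hypothetical, the parser handles only simple `KEY=value` lines, and checking the process environment before either .env file is an assumption based on the environment variable being the recommended option.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

KEY_NAME = "GEMINI_API_KEY"


def read_env_file(path: Path, name: str = KEY_NAME) -> Optional[str]:
    """Return the value of `name` from a simple KEY=value file, or None."""
    if not path.is_file():
        return None
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == name:
            # Drop surrounding quotes; treat an empty value as missing.
            return value.strip().strip("'\"") or None
    return None


def resolve_api_key(
    skill_dir: Path,
    project_root: Path,
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the first key found, or None if no source provides one."""
    environ = os.environ if environ is None else environ
    # Assumption: the process environment takes priority over .env files.
    value = environ.get(KEY_NAME)
    if value:
        return value
    # Then the skill directory's .env, then the project root's .env.
    for directory in (skill_dir, project_root):
        value = read_env_file(directory / ".env")
        if value:
            return value
    return None
```

Calling `resolve_api_key(Path(".claude/skills/gemini-document-processing"), Path("."))` from the project root would follow that order and return `None` when no key is configured anywhere.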
Option A: Environment Variable (Recommended)
export GEMINI_API_KEY="your-api-key-here"
Option B: Skill Directory
cd .claude/skills/gemini-document-processing
echo "GEMINI_API_KEY=your-api-key-here" > .env
Option C: Project Root
echo "GEMINI_API_KEY=your-api-key-here" > .env
pip install google-genai python-dotenv
# Use the provided script
python .claude/skills/gemini-document-processing/scripts/process-document.py \
--file invoice.pdf \
--prompt "Extract invoice details as JSON" \
--format json
# Process and summarize
python .claude/skills/gemini-document-processing/scripts/process-document.py \
--file report.pdf \
--prompt "Provide a concise executive summary"
# Q&A on document content
python .claude/skills/gemini-document-processing/scripts/process-document.py \
--file contract.pdf \
--prompt "What are the key terms and conditions?"
from google import genai

client = genai.Client()

# Read PDF
with open('document.pdf', 'rb') as f:
    pdf_data = f.read()

# Process document
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        'Extract key information from this document',
        genai.types.Part.from_bytes(
            data=pdf_data,
            mime_type='application/pdf'
        )
    ]
)
print(response.text)
from pathlib import Path

from google import genai
from pydantic import BaseModel


# Schema the JSON response is expected to conform to
class InvoiceData(BaseModel):
    invoice_number: str
    date: str
    total: float
    vendor: str


client = genai.Client()
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        'Extract invoice details',
        genai.types.Part.from_bytes(
            # read_bytes() opens and closes the file, so no handle is leaked
            data=Path('invoice.pdf').read_bytes(),
            mime_type='application/pdf'
        )
    ],
    config=genai.types.GenerateContentConfig(
        response_mime_type='application/json',
        response_schema=InvoiceData
    )
)

# Parse and validate the returned JSON against the schema
invoice_data = InvoiceData.model_validate_json(response.text)
PDF < 20MB?
├─ Yes → Use inline base64 encoding
└─ No → Use File API
Need structured JSON output?
├─ Yes → Define response_schema with Pydantic
└─ No → Get text response
Multiple queries on same PDF?
├─ Yes → Use File API + Context Caching
└─ No → Inline encoding is sufficient
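The three questions above can be folded into one small helper. The sketch below is only an illustration of the decision logic: `ProcessingPlan` and `plan_processing` are hypothetical names, and reading "20MB" as 20 MiB is an assumption.

```python
from dataclasses import dataclass

# Assumption: the 20MB threshold is interpreted as 20 MiB.
INLINE_LIMIT_BYTES = 20 * 1024 * 1024


@dataclass(frozen=True)
class ProcessingPlan:
    upload: str                # "inline" or "file_api"
    use_response_schema: bool  # define a Pydantic response_schema
    use_context_cache: bool    # cache the document for repeated queries


def plan_processing(size_bytes: int, needs_json: bool, repeated_queries: bool) -> ProcessingPlan:
    """Map the decision tree onto a plan for a single PDF."""
    # Large files and repeated queries both point to the File API.
    use_file_api = size_bytes >= INLINE_LIMIT_BYTES or repeated_queries
    return ProcessingPlan(
        upload="file_api" if use_file_api else "inline",
        use_response_schema=needs_json,
        use_context_cache=repeated_queries,
    )
```

For example, a 5 MiB invoice queried once for JSON output yields an inline upload with a response schema and no caching.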
The skill includes a ready-to-use processing script:
# Basic usage
python scripts/process-document.py --file document.pdf --prompt "Your prompt"
# With JSON output
python scripts/process-document.py --file document.pdf --prompt "Extract data" --format json
# With File API (for large files)
python scripts/process-document.py --file large-document.pdf --prompt "Summarize" --use-file-api
# Multiple prompts
python scripts/process-document.py --file document.pdf --prompt "Question 1" --prompt "Question 2"
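For readers calling the SDK directly, combining the File API with several prompts might look like the sketch below. It is not the bundled script's implementation, which is not shown here; `ask_about_pdf` is a hypothetical helper, and `client` is assumed to be a `genai.Client()` instance from the google-genai package. The PDF is uploaded once with `client.files.upload` and the returned file object is reused for every prompt.

```python
def ask_about_pdf(client, pdf_path, prompts, model="gemini-2.5-flash"):
    """Upload the PDF once, then run each prompt against the uploaded file.

    `client` is assumed to behave like google.genai.Client.
    Returns a dict mapping each prompt to the response text.
    """
    # One upload, reused by every request below.
    uploaded = client.files.upload(file=pdf_path)
    answers = {}
    for prompt in prompts:
        response = client.models.generate_content(
            model=model,
            contents=[prompt, uploaded],
        )
        answers[prompt] = response.text
    return answers
```

With the real SDK this would be called as `ask_about_pdf(genai.Client(), "large-document.pdf", ["Question 1", "Question 2"])` after `from google import genai`. Passing the client in as a parameter keeps the helper easy to exercise with a stub during testing.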
For comprehensive documentation, see:
- references/gemini-document-processing-report.md - Complete API reference
- references/quick-reference.md - Quick lookup guide
- references/code-examples.md - Additional code patterns
API Key Not Found:
# Check API key is set
./scripts/check-api-key.sh
File Too Large:
Pass the --use-file-api flag to the script
Vision Not Working: