cli-tool/components/skills/document-processing/pdf-processing-pro/SKILL.md
Production-ready PDF processing with forms, tables, OCR, validation, and batch operations. Use when working with complex PDF workflows in production environments, processing large volumes of PDFs, or requiring robust error handling and validation.
npx skillsauth add davila7/claude-code-templates PDF Processing ProInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Production-ready PDF processing toolkit with pre-built scripts, comprehensive error handling, and support for complex workflows.
import pdfplumber
with pdfplumber.open("document.pdf") as pdf:
text = pdf.pages[0].extract_text()
print(text)
python scripts/analyze_form.py input.pdf --output fields.json
# Returns: JSON with all form fields, types, and positions
python scripts/fill_form.py input.pdf data.json output.pdf
# Validates all fields before filling, includes error reporting
python scripts/extract_tables.py report.pdf --output tables.csv
# Extracts all tables with automatic column detection
All scripts include:
--help flag for all scriptsFor complete form workflows including:
See FORMS.md
For complex table extraction:
See TABLES.md
For scanned PDFs and image-based documents:
See OCR.md
analyze_form.py - Extract form field information
python scripts/analyze_form.py input.pdf [--output fields.json] [--verbose]
fill_form.py - Fill PDF forms with data
python scripts/fill_form.py input.pdf data.json output.pdf [--validate]
validate_form.py - Validate form data before filling
python scripts/validate_form.py data.json schema.json
extract_tables.py - Extract tables to CSV/Excel
python scripts/extract_tables.py input.pdf [--output tables.csv] [--format csv|excel]
extract_text.py - Extract text with formatting preservation
python scripts/extract_text.py input.pdf [--output text.txt] [--preserve-formatting]
merge_pdfs.py - Merge multiple PDFs
python scripts/merge_pdfs.py file1.pdf file2.pdf file3.pdf --output merged.pdf
split_pdf.py - Split PDF into individual pages
python scripts/split_pdf.py input.pdf --output-dir pages/
validate_pdf.py - Validate PDF integrity
python scripts/validate_pdf.py input.pdf
# 1. Analyze form structure
python scripts/analyze_form.py template.pdf --output schema.json
# 2. Validate submission data
python scripts/validate_form.py submission.json schema.json
# 3. Fill form
python scripts/fill_form.py template.pdf submission.json completed.pdf
# 4. Validate output
python scripts/validate_pdf.py completed.pdf
# 1. Extract tables
python scripts/extract_tables.py monthly_report.pdf --output data.csv
# 2. Extract text for analysis
python scripts/extract_text.py monthly_report.pdf --output report.txt
import glob
from pathlib import Path
import subprocess
# Process all PDFs in directory
for pdf_file in glob.glob("invoices/*.pdf"):
output_file = Path("processed") / Path(pdf_file).name
result = subprocess.run([
"python", "scripts/extract_text.py",
pdf_file,
"--output", str(output_file)
], capture_output=True)
if result.returncode == 0:
print(f"✓ Processed: {pdf_file}")
else:
print(f"✗ Failed: {pdf_file} - {result.stderr}")
All scripts follow consistent error patterns:
# Exit codes
# 0 - Success
# 1 - File not found
# 2 - Invalid input
# 3 - Processing error
# 4 - Validation error
# Example usage in automation
result = subprocess.run(["python", "scripts/fill_form.py", ...])
if result.returncode == 0:
print("Success")
elif result.returncode == 4:
print("Validation failed - check input data")
else:
print(f"Error occurred: {result.returncode}")
All scripts require:
pip install pdfplumber pypdf pillow pytesseract pandas
Optional for OCR:
# Install tesseract-ocr system package
# macOS: brew install tesseract
# Ubuntu: apt-get install tesseract-ocr
# Windows: Download from GitHub releases
--parallel flag (where supported)"Module not found" errors:
pip install -r requirements.txt
Tesseract not found:
# Install tesseract system package (see Dependencies)
Memory errors with large PDFs:
# Process page by page instead of loading entire PDF
with pdfplumber.open("large.pdf") as pdf:
for page in pdf.pages:
text = page.extract_text()
# Process page immediately
Permission errors:
chmod +x scripts/*.py
All scripts support --help:
python scripts/analyze_form.py --help
python scripts/extract_tables.py --help
For detailed documentation on specific topics, see:
tools
No-code automation democratizes workflow building. Zapier and Make (formerly Integromat) let non-developers automate business processes without writing code. But no-code doesn't mean no-complexity - these platforms have their own patterns, pitfalls, and breaking points. This skill covers when to use which platform, how to build reliable automations, and when to graduate to code-based solutions. Key insight: Zapier optimizes for simplicity and integrations (7000+ apps), Make optimizes for power
tools
Use only when the user explicitly asks to stage, commit, push, and open a GitHub pull request in one flow using the GitHub CLI (`gh`).
tools
Workflow automation is the infrastructure that makes AI agents reliable. Without durable execution, a network hiccup during a 10-step payment flow means lost money and angry customers. With it, workflows resume exactly where they left off. This skill covers the platforms (n8n, Temporal, Inngest) and patterns (sequential, parallel, orchestrator-worker) that turn brittle scripts into production-grade automation. Key insight: The platforms make different tradeoffs. n8n optimizes for accessibility
development
Trigger.dev expert for background jobs, AI workflows, and reliable async execution with excellent developer experience and TypeScript-first design. Use when: trigger.dev, trigger dev, background task, ai background job, long running task.