.claude/skills/pdf-processing-pro/SKILL.md
Production-ready PDF processing with forms, tables, OCR, validation, and batch operations. Use when working with complex PDF workflows in production environments, processing large volumes of PDFs, or requiring robust error handling and validation.
PDF Processing Pro
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add choidabom/devconfig
Production-ready PDF processing toolkit with pre-built scripts, comprehensive error handling, and support for complex workflows.
import pdfplumber
with pdfplumber.open("document.pdf") as pdf:
    text = pdf.pages[0].extract_text()
    print(text)
python scripts/analyze_form.py input.pdf --output fields.json
# Returns: JSON with all form fields, types, and positions
python scripts/fill_form.py input.pdf data.json output.pdf
# Validates all fields before filling, includes error reporting
python scripts/extract_tables.py report.pdf --output tables.csv
# Extracts all tables with automatic column detection
All scripts include:
--help flag for all scripts
For complete form workflows:
See FORMS.md
For complex table extraction:
See TABLES.md
For scanned PDFs and image-based documents:
See OCR.md
analyze_form.py - Extract form field information
python scripts/analyze_form.py input.pdf [--output fields.json] [--verbose]
fill_form.py - Fill PDF forms with data
python scripts/fill_form.py input.pdf data.json output.pdf [--validate]
validate_form.py - Validate form data before filling
python scripts/validate_form.py data.json schema.json
extract_tables.py - Extract tables to CSV/Excel
python scripts/extract_tables.py input.pdf [--output tables.csv] [--format csv|excel]
extract_text.py - Extract text with formatting preservation
python scripts/extract_text.py input.pdf [--output text.txt] [--preserve-formatting]
merge_pdfs.py - Merge multiple PDFs
python scripts/merge_pdfs.py file1.pdf file2.pdf file3.pdf --output merged.pdf
split_pdf.py - Split PDF into individual pages
python scripts/split_pdf.py input.pdf --output-dir pages/
validate_pdf.py - Validate PDF integrity
python scripts/validate_pdf.py input.pdf
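The internals of validate_pdf.py are not documented here. As a rough illustration only, a cheap pre-check before heavier parsing can confirm that a file starts with the `%PDF-` header and has an `%%EOF` marker near its end. This is a sketch that assumes such a marker check is good enough as a first-pass filter. It is not a full integrity check, and `looks_like_pdf` is a hypothetical helper, not part of the scripts.

```python
from pathlib import Path


def looks_like_pdf(path: str, tail_bytes: int = 1024) -> bool:
    """Hypothetical quick pre-check: PDF header at the start, %%EOF near the end.

    Catches obviously truncated or non-PDF files. It does NOT parse the
    cross-reference table or objects, so True is not proof of validity.
    The header test is strict (it must be the very first bytes).
    """
    data = Path(path).read_bytes()
    if not data.startswith(b"%PDF-"):
        return False
    # Trailing whitespace or junk can follow %%EOF, so search the final chunk
    return b"%%EOF" in data[-tail_bytes:]
```

A file that fails this check can be rejected before any of the scripts above spend time on it. A file that passes should still go through validate_pdf.py.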
# 1. Analyze form structure
python scripts/analyze_form.py template.pdf --output schema.json
# 2. Validate submission data
python scripts/validate_form.py submission.json schema.json
# 3. Fill form
python scripts/fill_form.py template.pdf submission.json completed.pdf
# 4. Validate output
python scripts/validate_pdf.py completed.pdf
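The four form steps can also be chained from Python so the run stops at the first failing step. This is a minimal sketch. It assumes each script reports failure through a non-zero exit code, as the exit-code list in this document describes. `FORM_STEPS`, `run_steps` and its `runner` parameter are hypothetical names. The runner is injectable so the control flow can be tested without the scripts being present.

```python
import subprocess
import sys
from typing import Callable, List, Sequence, Tuple

# The four commands from the form workflow (file names as in the example)
FORM_STEPS: List[List[str]] = [
    ["scripts/analyze_form.py", "template.pdf", "--output", "schema.json"],
    ["scripts/validate_form.py", "submission.json", "schema.json"],
    ["scripts/fill_form.py", "template.pdf", "submission.json", "completed.pdf"],
    ["scripts/validate_pdf.py", "completed.pdf"],
]


def _run_script(args: List[str]) -> int:
    """Run one script with the current Python interpreter; return its exit code."""
    return subprocess.run([sys.executable, *args]).returncode


def run_steps(
    steps: Sequence[Sequence[str]],
    runner: Callable[[List[str]], int] = _run_script,
) -> Tuple[int, int]:
    """Run steps in order, stopping at the first non-zero exit code.

    Returns (index_of_failed_step, exit_code), or (-1, 0) if all steps succeed.
    """
    for i, step in enumerate(steps):
        code = runner(list(step))
        if code != 0:
            return i, code  # later steps depend on this one, so stop here
    return -1, 0
```

Usage: `failed, code = run_steps(FORM_STEPS)`. A `failed` value of 1 with `code` 4, for example, means the submission data did not pass validate_form.py, and nothing was written to completed.pdf.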
# 1. Extract tables
python scripts/extract_tables.py monthly_report.pdf --output data.csv
# 2. Extract text for analysis
python scripts/extract_text.py monthly_report.pdf --output report.txt
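For context on the table step: pdfplumber's `Page.extract_tables()` returns each table as a list of rows, and each row is a list of cell strings (`None` for empty cells). Below is a minimal sketch of writing such tables into one CSV file, assuming a blank row is an acceptable separator between tables. `write_tables_csv` is a hypothetical helper and does not reflect how extract_tables.py is implemented.

```python
import csv
from typing import Iterable, Optional, Sequence, TextIO

# One table = list of rows; one row = list of cells (None for empty cells)
Table = Sequence[Sequence[Optional[str]]]


def write_tables_csv(tables: Iterable[Table], out: TextIO) -> int:
    """Write tables to `out` as CSV, separated by one blank row.

    Returns the number of tables written. None cells become empty strings.
    """
    writer = csv.writer(out)
    count = 0
    for table in tables:
        if count:
            writer.writerow([])  # blank separator row between tables
        for row in table:
            writer.writerow(["" if cell is None else cell for cell in row])
        count += 1
    return count


# Hypothetical usage with pdfplumber (requires the PDF to exist):
# import pdfplumber
# with pdfplumber.open("monthly_report.pdf") as pdf, open("data.csv", "w", newline="") as f:
#     tables = [t for page in pdf.pages for t in page.extract_tables()]
#     write_tables_csv(tables, f)
```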
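import glob
import subprocess
import sys
from pathlib import Path

# Create the output directory up front, in case the script does not create it
out_dir = Path("processed")
out_dir.mkdir(parents=True, exist_ok=True)

# Process all PDFs in directory
for pdf_file in sorted(glob.glob("invoices/*.pdf")):
    # Save extracted text as processed/<name>.txt instead of reusing the .pdf name
    output_file = out_dir / Path(pdf_file).with_suffix(".txt").name
    result = subprocess.run(
        [sys.executable, "scripts/extract_text.py", pdf_file, "--output", str(output_file)],
        capture_output=True,
        text=True,  # decode stderr to str so the error message prints cleanly
    )
    if result.returncode == 0:
        print(f"✓ Processed: {pdf_file}")
    else:
        print(f"✗ Failed: {pdf_file} - {result.stderr.strip()}")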
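If per-file work dominates, the loop above can be run concurrently. The sketch below uses a thread pool from the standard library's `concurrent.futures`. Threads are adequate here because each task mostly waits on a child process. It assumes the files are independent of one another. `extract_one`, `process_all` and `max_workers=4` are illustrative choices, not part of the scripts' documented interface. Check `--help` to see whether a script offers its own `--parallel` flag.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, List


def extract_one(pdf_file: str, out_dir: Path = Path("processed")) -> int:
    """Run extract_text.py on one file; return its exit code.

    Assumes out_dir already exists (create it once before calling).
    """
    output_file = out_dir / Path(pdf_file).with_suffix(".txt").name
    return subprocess.run(
        [sys.executable, "scripts/extract_text.py", pdf_file, "--output", str(output_file)],
        capture_output=True,
    ).returncode


def process_all(
    pdf_files: List[str],
    task: Callable[[str], int] = extract_one,
    max_workers: int = 4,
) -> Dict[str, int]:
    """Run `task` on every file concurrently; map each file to its exit code."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each file with its result
        codes = list(pool.map(task, pdf_files))
    return dict(zip(pdf_files, codes))
```

Usage: `failed = [f for f, code in process_all(sorted(glob.glob("invoices/*.pdf"))).items() if code != 0]`.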
All scripts follow consistent error patterns:
# Exit codes
# 0 - Success
# 1 - File not found
# 2 - Invalid input
# 3 - Processing error
# 4 - Validation error
# Example usage in automation
import subprocess

# Replace ... with the script's arguments (input.pdf data.json output.pdf)
result = subprocess.run(["python", "scripts/fill_form.py", ...])
if result.returncode == 0:
    print("Success")
elif result.returncode == 4:
    print("Validation failed - check input data")
else:
    print(f"Error occurred: {result.returncode}")
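When several call sites need to interpret these exit codes, a lookup table keeps the messages in one place. This is a minimal sketch covering the documented codes 0 to 4. `EXIT_CODES` and `describe_exit` are hypothetical names. Undocumented codes are reported as unknown rather than guessed. Negative values are included because on POSIX `subprocess` reports a child killed by signal N as return code -N.

```python
from typing import Dict

# Documented exit codes shared by all scripts
EXIT_CODES: Dict[int, str] = {
    0: "Success",
    1: "File not found",
    2: "Invalid input",
    3: "Processing error",
    4: "Validation error",
}


def describe_exit(code: int) -> str:
    """Return a human-readable description of a script's exit code."""
    if code < 0:
        # subprocess uses -N when the child was terminated by signal N (POSIX)
        return f"Terminated by signal {-code}"
    return EXIT_CODES.get(code, f"Unknown exit code {code}")
```

Usage: `print(describe_exit(result.returncode))`.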
All scripts require:
pip install pdfplumber pypdf pillow pytesseract pandas
Optional for OCR:
# Install tesseract-ocr system package
# macOS: brew install tesseract
# Ubuntu: apt-get install tesseract-ocr
# Windows: Download from GitHub releases
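Before running a script, you can check for missing Python packages or a missing tesseract binary with the standard library. This sketch assumes the import names for the packages listed above; note that Pillow is imported as `PIL`. `missing_modules` and `has_tesseract` are hypothetical helpers, not part of the scripts.

```python
import importlib.util
import shutil
from typing import Dict, List

# pip package name -> module name used in `import`
REQUIRED_MODULES: Dict[str, str] = {
    "pdfplumber": "pdfplumber",
    "pypdf": "pypdf",
    "pillow": "PIL",
    "pytesseract": "pytesseract",
    "pandas": "pandas",
}


def missing_modules(modules: Dict[str, str] = REQUIRED_MODULES) -> List[str]:
    """Return pip package names whose import module cannot be found."""
    return [pkg for pkg, mod in modules.items() if importlib.util.find_spec(mod) is None]


def has_tesseract() -> bool:
    """True if a `tesseract` executable is on PATH (needed only for OCR)."""
    return shutil.which("tesseract") is not None


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages, run: pip install " + " ".join(missing))
    if not has_tesseract():
        print("tesseract not found on PATH (optional, OCR only)")
```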
--parallel flag (where supported)
"Module not found" errors:
pip install -r requirements.txt
Tesseract not found:
# Install tesseract system package (see Dependencies)
Memory errors with large PDFs:
# Process page by page instead of loading entire PDF
import pdfplumber

with pdfplumber.open("large.pdf") as pdf:
    for page in pdf.pages:
        text = page.extract_text()
        # Process page immediately
        page.close()  # flush this page's cached objects (recent pdfplumber versions)
Permission errors:
chmod +x scripts/*.py
All scripts support --help:
python scripts/analyze_form.py --help
python scripts/extract_tables.py --help
For detailed documentation on specific topics, see: