skills/pdf-processing/SKILL.md
Extract text and tables from PDF files, fill forms, merge documents. Use when working with PDF files or when the user mentions PDFs, forms, or document extraction.
npx skillsauth add crumbgrabber/llm_system_template_agents_skills_patterns_tools_prompts pdf-processingInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Use pdfplumber to extract text from PDFs:
import pdfplumber
with pdfplumber.open("document.pdf") as pdf:
text = pdf.pages[0].extract_text()
print(text)
Extract tables from PDFs with automatic detection:
import pdfplumber
with pdfplumber.open("report.pdf") as pdf:
page = pdf.pages[0]
tables = page.extract_tables()
for table in tables:
for row in table:
print(row)
Process multi-page documents efficiently:
import pdfplumber
with pdfplumber.open("document.pdf") as pdf:
full_text = ""
for page in pdf.pages:
full_text += page.extract_text() + "\n\n"
print(full_text)
For PDF form filling, see FORMS.md for the complete guide including field analysis and validation.
Combine multiple PDF files:
from pypdf import PdfMerger
merger = PdfMerger()
for pdf in ["file1.pdf", "file2.pdf", "file3.pdf"]:
merger.append(pdf)
merger.write("merged.pdf")
merger.close()
Extract specific pages or ranges:
from pypdf import PdfReader, PdfWriter
reader = PdfReader("input.pdf")
writer = PdfWriter()
# Extract pages 2-5
for page_num in range(1, 5):
writer.add_page(reader.pages[page_num])
with open("output.pdf", "wb") as output:
writer.write(output)
Extract and save text:
import pdfplumber
with pdfplumber.open("input.pdf") as pdf:
text = "\n\n".join(page.extract_text() for page in pdf.pages)
with open("output.txt", "w") as f:
f.write(text)
Extract tables to CSV:
import pdfplumber
import csv
with pdfplumber.open("tables.pdf") as pdf:
tables = pdf.pages[0].extract_tables()
with open("output.csv", "w", newline="") as f:
writer = csv.writer(f)
for table in tables:
writer.writerows(table)
Handle common PDF issues:
import pdfplumber
try:
with pdfplumber.open("document.pdf") as pdf:
if len(pdf.pages) == 0:
print("PDF has no pages")
else:
text = pdf.pages[0].extract_text()
if text is None or text.strip() == "":
print("Page contains no extractable text (might be scanned)")
else:
print(text)
except Exception as e:
print(f"Error processing PDF: {e}")
tools
Manage and trigger pre-built Zapier workflows and MCP tool orchestration. Use when user mentions workflows, Zaps, automations, daily digest, research, search, lead tracking, expenses, or asks to "run" any process. Also handles Perplexity-based research and Google Sheets data tracking.
development
Comprehensive spreadsheet creation, editing, and analysis with support for formulas, formatting, data analysis, and visualization. When Claude needs to work with spreadsheets (.xlsx, .xlsm, .csv, .tsv, etc) for: (1) Creating new spreadsheets with formulas and formatting, (2) Reading or analyzing data, (3) Modify existing spreadsheets while preserving formulas, (4) Data analysis and visualization in spreadsheets, or (5) Recalculating formulas
testing
Use when creating new skills, editing existing skills, or verifying skills work before deployment - applies TDD to process documentation by testing with subagents before writing, iterating until bulletproof against rationalization
development
Use when design is complete and you need detailed implementation tasks for engineers with zero codebase context - creates comprehensive implementation plans with exact file paths, complete code examples, and verification steps assuming engineer has minimal domain knowledge