plugins/documents/skills/pdf-processing-pro/SKILL.md
Production-ready PDF processing with forms, tables, OCR, validation, and batch operations. Use when working with complex PDF workflows in production environments, processing large volumes of PDFs, or requiring robust error handling and validation. Do NOT use for simple text extraction - use pdf-extract for quick reads.
npx skillsauth add henkisdabro/wookstar-claude-plugins pdf-processing-pro
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Production-ready PDF processing toolkit with pre-built scripts, comprehensive error handling, and support for complex workflows.
import pdfplumber
# Open the PDF and print the text of the first page
with pdfplumber.open("document.pdf") as pdf:
    text = pdf.pages[0].extract_text()
    print(text)
python scripts/analyze_form.py input.pdf --output fields.json
# Returns: JSON with all form fields, types, and positions
python scripts/fill_form.py input.pdf data.json output.pdf
# Validates all fields before filling, includes error reporting
python scripts/extract_tables.py report.pdf --output tables.csv
# Extracts all tables with automatic column detection
All scripts accept --help.
Form processing covers complete workflows: field analysis, dynamic filling, validation rules, multi-page forms, and checkbox/radio handling. See references/forms.md.
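The idea of validating form data before filling can be sketched as below. This is only an illustration, not the logic of fill_form.py or validate_form.py: the field layout (a mapping from field name to a dict with type, required, and options keys) and the function name validate_form_data are hypothetical, and the real fields.json or schema.json may look different.

```python
from typing import Any

# Hypothetical field description; the real fields.json layout may differ.
FIELDS = {
    "full_name": {"type": "text", "required": True},
    "subscribe": {"type": "checkbox", "required": False},
    "plan": {"type": "radio", "required": True, "options": ["basic", "pro"]},
}


def validate_form_data(fields: dict[str, dict[str, Any]], data: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the data passed every check."""
    errors = []
    # Flag keys in the data that do not correspond to any known form field.
    for name in data:
        if name not in fields:
            errors.append(f"unknown field: {name}")
    for name, spec in fields.items():
        if name not in data:
            if spec.get("required"):
                errors.append(f"missing required field: {name}")
            continue
        value = data[name]
        kind = spec.get("type")
        if kind == "text" and not isinstance(value, str):
            errors.append(f"{name}: expected text")
        elif kind == "checkbox" and not isinstance(value, bool):
            errors.append(f"{name}: expected true/false")
        elif kind == "radio" and value not in spec.get("options", []):
            errors.append(f"{name}: {value!r} is not one of {spec.get('options', [])}")
    return errors


if __name__ == "__main__":
    print(validate_form_data(FIELDS, {"full_name": "Ada", "plan": "gold", "extra": 1}))
```

Collecting every problem in a list, rather than raising on the first one, lets a caller report all issues in a single pass before any output PDF is written.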
Table extraction handles complex cases: multi-page tables, merged cells, nested tables, custom detection, and CSV/Excel export. See references/tables.md.
OCR processing handles scanned PDFs and image-based documents: Tesseract integration, language support, image preprocessing, and confidence scoring. See references/ocr.md.
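For a rough picture of what table extraction to CSV involves, here is a minimal sketch built on pdfplumber's extract_tables(), which returns each table as a list of rows and uses None for empty cells. It is not the implementation of extract_tables.py, and it does not attempt multi-page stitching or merged-cell handling. The function names and the blank row written between tables are assumptions made for the example.

```python
import csv


def tables_to_csv(tables, out):
    """Write tables (lists of rows of cells) to a CSV stream, one blank row between tables."""
    writer = csv.writer(out)
    for index, table in enumerate(tables):
        if index:
            writer.writerow([])  # blank separator row between consecutive tables
        for row in table:
            # pdfplumber uses None for empty cells; write them as empty strings.
            writer.writerow(["" if cell is None else cell for cell in row])


def extract_tables_to_csv(pdf_path, csv_path):
    """Collect every table pdfplumber detects in the PDF and write them to one CSV file."""
    import pdfplumber  # imported here so tables_to_csv works without it installed

    tables = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            tables.extend(page.extract_tables())
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        tables_to_csv(tables, fh)
```

Keeping the CSV writing separate from the PDF reading makes the output step easy to check with hand-written rows.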
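Confidence scoring for OCR output could look something like the sketch below. It assumes the word-level data returned by pytesseract.image_to_data with output_type=Output.DICT, where the text and conf lists are aligned and non-word boxes carry a confidence of -1. The function names and the choice of a plain mean are illustrative assumptions, not necessarily what references/ocr.md describes.

```python
def mean_confidence(ocr_data, min_conf=0.0):
    """Average Tesseract word confidence (0-100), skipping empty boxes and conf of -1."""
    scores = []
    for text, conf in zip(ocr_data["text"], ocr_data["conf"]):
        score = float(conf)  # conf may arrive as int, float, or str depending on version
        if text.strip() and score >= min_conf:
            scores.append(score)
    return sum(scores) / len(scores) if scores else 0.0


def ocr_page_confidence(image_path, lang="eng"):
    """Run Tesseract on one image and return (text, mean word confidence)."""
    import pytesseract  # needs the Tesseract binary installed on the system
    from PIL import Image

    with Image.open(image_path) as image:
        data = pytesseract.image_to_data(
            image, lang=lang, output_type=pytesseract.Output.DICT
        )
    words = [word for word in data["text"] if word.strip()]
    return " ".join(words), mean_confidence(data)
```

A low mean can be used as a signal to retry with image preprocessing or to route the page for manual review. Any threshold would need tuning on real documents.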
| Script | Purpose | Usage |
|--------|---------|-------|
| analyze_form.py | Extract form field info | python scripts/analyze_form.py input.pdf [--output fields.json] [--verbose] |
| fill_form.py | Fill PDF forms with data | python scripts/fill_form.py input.pdf data.json output.pdf [--validate] |
| validate_form.py | Validate form data before filling | python scripts/validate_form.py data.json schema.json |
| extract_tables.py | Extract tables to CSV/Excel | python scripts/extract_tables.py input.pdf [--output tables.csv] [--format csv\|excel] |
| extract_text.py | Extract text with formatting | python scripts/extract_text.py input.pdf [--output text.txt] [--preserve-formatting] |
| merge_pdfs.py | Merge multiple PDFs | python scripts/merge_pdfs.py file1.pdf file2.pdf --output merged.pdf |
| split_pdf.py | Split PDF into pages | python scripts/split_pdf.py input.pdf --output-dir pages/ |
| validate_pdf.py | Validate PDF integrity | python scripts/validate_pdf.py input.pdf |
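The scripts in the table above mostly take one input file per call. For batch runs, one possible pattern is a loop that records failures instead of stopping at the first bad file. The sketch below is an assumption-laden illustration: process_batch and looks_like_pdf are hypothetical names, and the header check is only a cheap pre-filter, not a substitute for validate_pdf.py.

```python
from pathlib import Path


def process_batch(paths, handler):
    """Apply handler to each path; return (results, failures) without stopping on errors."""
    results, failures = {}, {}
    for path in paths:
        try:
            results[str(path)] = handler(path)
        except Exception as exc:  # record the failure and move on to the next file
            failures[str(path)] = f"{type(exc).__name__}: {exc}"
    return results, failures


def looks_like_pdf(path):
    """Cheap pre-check: a well-formed PDF file begins with the bytes %PDF-."""
    with open(path, "rb") as fh:
        if fh.read(5) != b"%PDF-":
            raise ValueError("missing %PDF- header")
    return True


if __name__ == "__main__":
    # "pdfs" is a placeholder directory name for this example.
    results, failures = process_batch(sorted(Path("pdfs").glob("*.pdf")), looks_like_pdf)
    print(f"{len(results)} ok, {len(failures)} failed")
    for name, reason in failures.items():
        print(f"  {name}: {reason}")
```

Any per-file function can be passed as the handler, so the same loop can wrap text extraction, table extraction, or a subprocess call to one of the scripts.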
All scripts require:
pip install pdfplumber pypdf pillow pytesseract pandas
Optional for OCR:
# macOS: brew install tesseract
# Ubuntu: apt-get install tesseract-ocr
# Windows: Download from GitHub releases
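Before running the scripts, it can help to confirm that the Python packages and the Tesseract binary are actually available. The following is a small sketch using only the standard library. The function name check_environment is hypothetical, and the check assumes the Tesseract executable is named tesseract and is on PATH.

```python
import importlib.util
import shutil

# pip package name -> importable module name (Pillow is imported as PIL)
REQUIRED_MODULES = {
    "pdfplumber": "pdfplumber",
    "pypdf": "pypdf",
    "pillow": "PIL",
    "pytesseract": "pytesseract",
    "pandas": "pandas",
}


def check_environment():
    """Return (missing pip packages, whether a tesseract executable is on PATH)."""
    missing = [
        package
        for package, module in REQUIRED_MODULES.items()
        if importlib.util.find_spec(module) is None
    ]
    has_tesseract = shutil.which("tesseract") is not None
    return missing, has_tesseract


if __name__ == "__main__":
    missing, has_tesseract = check_environment()
    print("missing packages:", ", ".join(missing) or "none")
    print("tesseract on PATH:", has_tesseract)
```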
| File | Contents |
|------|----------|
| references/forms.md | Complete form processing guide |
| references/tables.md | Advanced table extraction |
| references/ocr.md | Scanned PDF processing |
| references/workflows.md | Common workflows, error handling, performance tips, best practices |
| references/troubleshooting.md | Troubleshooting common issues and getting help |