plugins/documents/skills/pdf-extract/SKILL.md
Fast, zero-AI text extraction from PDFs that have a text layer (digitally created PDFs from Word, Typst, WeasyPrint, wkhtmltopdf, LaTeX, etc.). Uses pymupdf (fitz) - instant and deterministic. Use when you need to quickly pull raw text from a known text-layer PDF, e.g. "extract text from this PDF", "read this PDF", "get the content of", "what does this PDF say", "quickly read this PDF". Do NOT use for scanned/image PDFs or when you need structured output (tables, headings, OCR, AI analysis) - use the pdf-processing-pro skill in this plugin for those cases.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):

npx skillsauth add henkisdabro/wookstar-claude-plugins pdf-extract
Extract text from PDF files using pymupdf via uv run --with pymupdf.
- uv installed - see https://docs.astral.sh/uv/getting-started/installation/
- uv run --with handles caching automatically

# Print the text of every non-empty page
uv run --with pymupdf python3 -c "
import fitz
doc = fitz.open('/path/to/file.pdf')
for page in doc:
    text = page.get_text().strip()
    if text:
        print(text)
        print()
"
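
If the command above prints nothing, the PDF probably has no text layer and is a scan or image PDF, which this skill is not meant for. The sketch below is one hedged way to check that up front. The function names and the `min_chars` threshold of 20 are illustrative assumptions, not part of this skill or of pymupdf; only `fitz.open` and `page.get_text()` come from the library.

```python
def has_text_layer(page_texts, min_chars=20):
    """Return True if the page texts hold at least min_chars
    non-whitespace characters in total.

    min_chars is a hypothetical threshold chosen for illustration.
    """
    total = sum(len("".join(text.split())) for text in page_texts)
    return total >= min_chars


def extract_page_texts(pdf_path):
    """Return the plain text of each page in the PDF."""
    import fitz  # imported lazily so has_text_layer works without pymupdf

    doc = fitz.open(pdf_path)
    try:
        return [page.get_text() for page in doc]
    finally:
        doc.close()
```

Calling `has_text_layer(extract_page_texts('/path/to/file.pdf'))` returns False for a PDF whose pages yield little or no text, which is a reasonable signal to switch to the pdf-processing-pro skill instead.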

# Save the extracted text to a file
uv run --with pymupdf python3 -c "
import fitz
doc = fitz.open('/path/to/file.pdf')
pages = []
for page in doc:
    text = page.get_text().strip()
    if text:
        pages.append(text)
# Write UTF-8 explicitly so non-ASCII text survives on any platform
with open('/path/to/output.txt', 'w', encoding='utf-8') as f:
    f.write('\n\n'.join(pages))
print(f'Extracted {len(pages)} non-empty pages')
"
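
# Extract a specific page range
uv run --with pymupdf python3 -c "
import fitz
doc = fitz.open('/path/to/file.pdf')
# Pages are 0-indexed, so indices 2-4 are pages 3-5;
# the range is capped at the document length to avoid an IndexError
for i in range(2, min(5, len(doc))):
    text = doc[i].get_text().strip()
    if text:
        print(text)
"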

# Batch-extract every PDF in a folder to a .txt file next to it
uv run --with pymupdf python3 -c "
import fitz
import glob
import os
for pdf_path in glob.glob('/path/to/folder/*.pdf'):
    doc = fitz.open(pdf_path)
    # Extract each page once, then drop the empty ones
    pages = [p.get_text().strip() for p in doc]
    text = '\n\n'.join(t for t in pages if t)
    out_path = os.path.splitext(pdf_path)[0] + '.txt'
    with open(out_path, 'w', encoding='utf-8') as f:
        f.write(text)
    print(f'{os.path.basename(pdf_path)}: {len(doc)} pages extracted')
"

# Print document metadata
uv run --with pymupdf python3 -c "
import fitz
doc = fitz.open('/path/to/file.pdf')
meta = doc.metadata
print(f'Title: {meta.get(\"title\", \"N/A\")}')
print(f'Author: {meta.get(\"author\", \"N/A\")}')
print(f'Pages: {len(doc)}')
print(f'Creator: {meta.get(\"creator\", \"N/A\")}')
"

- pymupdf is imported as fitz (legacy naming from the MuPDF library)
- Pages are 0-indexed: doc[0] is the first page
- get_text() returns plain text; use get_text("blocks") for positioned blocks
- get_text("html") returns HTML with formatting preserved
- pymupdf is cached after the first uv run --with pymupdf invocation - subsequent runs are instant

| Use pdf-extract | Use pdf-processing-pro |
|---|---|
| PDF created digitally (Word, Typst, LaTeX, wkhtmltopdf, WeasyPrint) | Scanned or image-based PDF (photo, fax, scan) |
| Need raw text quickly - less than a second | Need structured output: tables, headings, forms |
| Bulk/batch extraction without AI cost | OCR required (scanned documents) |
| Offline, no API key, no extra dependencies | Form filling, validation, batch workflows |
| Simple text content, no tables needed | Tables or structured layout are important |
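
As a rough illustration of the positioned blocks mentioned in the notes, the sketch below sorts the tuples returned by `page.get_text("blocks")` into a simple top-to-bottom, left-to-right order. Each block is a tuple `(x0, y0, x1, y1, text, block_no, block_type)`, where `block_type` 0 marks a text block. The helper names are hypothetical, and the sort is a naive heuristic that assumes a single-column layout; multi-column pages would need something smarter.

```python
def text_blocks_in_reading_order(blocks):
    """Keep non-empty text blocks and sort them top-to-bottom, left-to-right.

    Each block is (x0, y0, x1, y1, text, block_no, block_type),
    the shape returned by page.get_text("blocks"); block_type 0 is text.
    """
    text_only = [b for b in blocks if b[6] == 0 and b[4].strip()]
    # Round y0 so blocks on the same visual line are ordered by x0
    ordered = sorted(text_only, key=lambda b: (round(b[1]), b[0]))
    return [b[4].strip() for b in ordered]


def page_blocks(pdf_path, page_index=0):
    """Return the raw block tuples for one page (0-indexed)."""
    import fitz  # imported lazily so the sorting helper works without pymupdf

    doc = fitz.open(pdf_path)
    try:
        return doc[page_index].get_text("blocks")
    finally:
        doc.close()
```

For example, `text_blocks_in_reading_order(page_blocks('/path/to/file.pdf'))` gives the text of the first page block by block, with image blocks and whitespace-only blocks dropped.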