henkisdabro/pdf-extract

Name: pdf-extract
Author: henkisdabro

plugins/documents/skills/pdf-extract/SKILL.md

npx skillsauth add henkisdabro/wookstar-claude-plugins pdf-extract

PDF Text Extraction

Extract text from PDF files using pymupdf via uv run --with pymupdf.

Prerequisites

uv installed - see https://docs.astral.sh/uv/getting-started/installation/
No venv or pre-installation needed - uv run --with handles caching automatically

Extract text from a single PDF

uv run --with pymupdf python3 -c "
import fitz
doc = fitz.open('/path/to/file.pdf')
for page in doc:
    text = page.get_text().strip()
    if text:
        print(text)
        print()
"

Extract and save to file

uv run --with pymupdf python3 -c "
import fitz

doc = fitz.open('/path/to/file.pdf')
pages = []
for page in doc:
    text = page.get_text().strip()
    if text:
        pages.append(text)

# Explicit UTF-8 avoids platform-dependent default encodings
with open('/path/to/output.txt', 'w', encoding='utf-8') as f:
    f.write('\n\n'.join(pages))

print(f'Extracted {len(pages)} non-empty pages')
"

Extract specific pages

uv run --with pymupdf python3 -c "
import fitz

doc = fitz.open('/path/to/file.pdf')
# Pages are 0-indexed; min() keeps the range inside short documents
for i in range(2, min(5, len(doc))):  # Pages 3-5
    text = doc[i].get_text().strip()
    if text:
        print(text)
"

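People usually give page ranges as 1-based page numbers, while doc[i] expects 0-based indices. The conversion can be sketched as a small helper. The name page_indices is hypothetical and not part of pymupdf; page_count stands in for len(doc).

```python
def page_indices(first_page: int, last_page: int, page_count: int) -> range:
    """Convert an inclusive 1-based page range into 0-based indices.

    The range is clamped to the document, so asking for pages past the
    end yields fewer indices instead of raising IndexError later.
    """
    if first_page < 1 or last_page < first_page:
        raise ValueError("expected 1 <= first_page <= last_page")
    start = first_page - 1             # 1-based -> 0-based
    stop = min(last_page, page_count)  # exclusive stop, clamped to the document
    return range(start, stop)


# Pages 3-5 of a 10-page document map to indices 2, 3, 4
print(list(page_indices(3, 5, 10)))  # [2, 3, 4]
# The same request against a 4-page document is clamped
print(list(page_indices(3, 5, 4)))   # [2, 3]
```

In the snippet above, the loop would then read for i in page_indices(3, 5, len(doc)).
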
Batch extract from multiple PDFs

uv run --with pymupdf python3 -c "
import fitz
import glob
import os

for pdf_path in glob.glob('/path/to/folder/*.pdf'):
    # The context manager closes each document before the next one opens
    with fitz.open(pdf_path) as doc:
        # Call get_text() once per page and drop empty pages
        texts = [t for t in (p.get_text().strip() for p in doc) if t]
        page_count = len(doc)
    out_path = os.path.splitext(pdf_path)[0] + '.txt'
    with open(out_path, 'w', encoding='utf-8') as f:
        f.write('\n\n'.join(texts))
    print(f'{os.path.basename(pdf_path)}: {page_count} pages extracted')
"

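In a batch run, one unreadable PDF would stop the whole loop. A possible way around that is to collect failures per file and carry on. This is a minimal sketch: run_batch and fake_extract are hypothetical names, and the extractor is passed in as a function, so the example runs without pymupdf installed.

```python
from typing import Callable, Dict, Iterable, Tuple


def run_batch(
    pdf_paths: Iterable[str],
    extract: Callable[[str], str],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Apply extract() to every path, collecting failures instead of stopping."""
    texts: Dict[str, str] = {}
    errors: Dict[str, str] = {}
    for path in pdf_paths:
        try:
            texts[path] = extract(path)
        except Exception as exc:  # broad on purpose: any failure is recorded per file
            errors[path] = f"{type(exc).__name__}: {exc}"
    return texts, errors


# Stand-in extractor for illustration; a real one would open the PDF with fitz
def fake_extract(path: str) -> str:
    if "broken" in path:
        raise ValueError("cannot open file")
    return f"text of {path}"


texts, errors = run_batch(["a.pdf", "broken.pdf", "b.pdf"], fake_extract)
print(sorted(texts))  # ['a.pdf', 'b.pdf']
print(errors)         # {'broken.pdf': 'ValueError: cannot open file'}
```
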
Get PDF metadata

uv run --with pymupdf python3 -c "
import fitz
doc = fitz.open('/path/to/file.pdf')
meta = doc.metadata
# Fields can be present but empty, so fall back with 'or' instead of a dict default
print(f'Title: {meta.get(\"title\") or \"N/A\"}')
print(f'Author: {meta.get(\"author\") or \"N/A\"}')
print(f'Pages: {len(doc)}')
print(f'Creator: {meta.get(\"creator\") or \"N/A\"}')
"

Key notes

pymupdf is imported as fitz (a legacy name taken from MuPDF's original rendering engine); recent PyMuPDF releases also accept import pymupdf
Pages are 0-indexed: doc[0] is the first page
get_text() returns plain text; use get_text("blocks") for positioned blocks
get_text("html") returns HTML with formatting preserved
uv caches the package after the first uv run --with pymupdf invocation - subsequent runs reuse the cache and skip the download
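
As an illustration of the positioned blocks mentioned above: get_text("blocks") yields tuples of the form (x0, y0, x1, y1, text, block_no, block_type), where block_type 0 is text and 1 is an image. The sketch below orders text blocks top to bottom. The helper name text_blocks_top_down and the sample data are assumptions made for the example, so it runs without a PDF.

```python
from typing import List, Tuple

# Shape of one entry from page.get_text("blocks")
Block = Tuple[float, float, float, float, str, int, int]


def text_blocks_top_down(blocks: List[Block]) -> List[str]:
    """Return the text of text blocks ordered top-to-bottom, then left-to-right."""
    text_only = [b for b in blocks if b[6] == 0]  # keep text blocks, drop image blocks
    ordered = sorted(text_only, key=lambda b: (round(b[1]), b[0]))  # sort by y0, then x0
    return [b[4].strip() for b in ordered if b[4].strip()]


# Hand-written sample shaped like get_text("blocks") output
sample = [
    (72.0, 200.0, 300.0, 220.0, "Second paragraph\n", 1, 0),
    (72.0, 100.0, 300.0, 120.0, "Heading\n", 0, 0),
    (72.0, 300.0, 300.0, 400.0, "<image placeholder>", 2, 1),
]
print(text_blocks_top_down(sample))  # ['Heading', 'Second paragraph']
```

With a real document, the input would be doc[0].get_text("blocks").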

When to use this vs pdf-processing-pro

| Use pdf-extract | Use pdf-processing-pro |
|---|---|
| PDF created digitally (Word, Typst, LaTeX, wkhtmltopdf, WeasyPrint) | Scanned or image-based PDF (photo, fax, scan) |
| Need raw text quickly - less than a second | Need structured output: tables, headings, forms |
| Bulk/batch extraction without AI cost | OCR required (scanned documents) |
| Offline, no API key, no extra dependencies | Form filling, validation, batch workflows |
| Simple text content, no tables needed | Tables or structured layout are important |
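
When it is unclear which column of the table applies, one rough check is whether the extracted pages contain a meaningful amount of text. The sketch below is a heuristic under stated assumptions: looks_like_text_layer is a hypothetical helper, and both thresholds are arbitrary starting points rather than values taken from this skill.

```python
from typing import Iterable


def looks_like_text_layer(
    page_texts: Iterable[str],
    min_chars_per_page: int = 20,  # assumed threshold, tune for your documents
    min_fraction: float = 0.5,     # assumed share of pages that must carry text
) -> bool:
    """Guess whether a PDF has a usable text layer from its per-page text."""
    texts = list(page_texts)
    if not texts:
        return False
    pages_with_text = sum(1 for t in texts if len(t.strip()) >= min_chars_per_page)
    return pages_with_text / len(texts) >= min_fraction


# Digitally created PDF: every page yields real text
print(looks_like_text_layer([
    "Invoice 2024-001 for consulting services",
    "Terms and conditions apply to all orders",
]))  # True
# Scanned PDF: pages come back empty or whitespace-only
print(looks_like_text_layer(["", "  ", ""]))  # False
```

The per-page strings would come from page.get_text() for each page; a False result suggests the document belongs in the right-hand column.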

Fast, zero-AI text extraction from PDFs that have a text layer (digitally created PDFs from Word, Typst, WeasyPrint, wkhtmltopdf, LaTeX, etc). Uses pymupdf (fitz) - instant and deterministic. Use when you need to quickly pull raw text from a known text-layer PDF, e.g. "extract text from this PDF", "read this PDF", "get the content of", "what does this PDF say", "quickly read this PDF". Do NOT use for scanned/image PDFs or when you need structured output (tables, headings, OCR, AI analysis) - use the pdf-processing-pro skill in this plugin for those cases.

59 stars

tools

Updated May 16, 2026

npx skillsauth add henkisdabro/wookstar-claude-plugins pdf-extract

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy - Container and dependency vulnerability scanner - Clean - 95%
Semgrep - Static code analysis for vulnerabilities - Clean - 95%
mcp-scan (Snyk) - Model Context Protocol security validation - Clean - 95%
Snyk (dep) - Open source security scanning - Skipped - 50%
Socket.dev - Supply chain security analysis - Skipped - 50%
VirusTotal - Multi-engine malware detection - Skipped - 50%
CrowdStrike - Advanced threat intelligence - Skipped - 50%
OSV-Scanner - Open Source Vulnerability database check - Skipped - 50%
OWASP Dep-Check - Skipped - 50%

Last scanned: May 16, 2026, 2:46 AM - 315.2s - 1 file scanned

SKILL.md

name: pdf-extract
description: Fast, zero-AI text extraction from PDFs that have a text layer (digitally created PDFs from Word, Typst, WeasyPrint, wkhtmltopdf, LaTeX, etc). Uses pymupdf (fitz) - instant and deterministic. Use when you need to quickly pull raw text from a known text-layer PDF, e.g. "extract text from this PDF", "read this PDF", "get the content of", "what does this PDF say", "quickly read this PDF". Do NOT use for scanned/image PDFs or when you need structured output (tables, headings, OCR, AI analysis) - use the pdf-processing-pro skill in this plugin for those cases.
allowed-tools: Bash, Read, Write

Related Skills

henkisdabro/humanise

testing

Verified - Trusted - Community

Identifies and removes AI writing patterns to make text sound natural and human-written. Use when user says "humanise this", "make this sound less AI", "this reads like a robot wrote it", "de-AI this text", "remove AI patterns", "make this more natural", "clean up this AI-generated text". Detects and fixes 29 patterns based on Wikipedia's "Signs of AI writing" guide - inflated language, promotional tone, AI vocabulary, em dash overuse, filler phrases, sycophantic tone, placeholder text, formulaic structure, thematic breaks. Do NOT use for grammar-only proofreading, spell checking, or rewriting text that is already clearly human-written.

66 - SKILL.md - Updated Apr 20, 2026

henkisdabro/timezone-tools

tools

Verified - Trusted - Community

Get current time in any timezone and convert times between timezones. Use when working with time, dates, timezones, scheduling across regions, "what time is it in X", "convert 3pm Sydney to London", DST checks, or when the user mentions specific cities/regions for time queries. Supports IANA timezone names. Do NOT use for date arithmetic (adding days/months), recurring event scheduling, business-day calculations, or full calendar/booking logic - those need a dedicated date library or scheduling tool.

59 - SKILL.md - Updated Apr 24, 2026

henkisdabro/shopify-developer

tools

Verified - Trusted - Community

Complete Shopify development reference for Liquid templating, theme development (OS 2.0), GraphQL Admin API, Storefront API, custom app development, Shopify Functions, Hydrogen, performance optimisation, and debugging. Use when working with .liquid files, creating theme sections and blocks, writing GraphQL queries or mutations for Shopify, building Shopify apps with CLI and Polaris, implementing cart operations via Ajax API, optimising Core Web Vitals for Shopify stores, debugging Liquid or API errors, configuring settings_schema.json, accessing Shopify objects (product, collection, cart, customer), using Liquid filters, creating app extensions, working with webhooks, migrating from Scripts to Functions, or building headless storefronts with Hydrogen and React Router 7. Covers API version 2026-01. Do NOT use for WooCommerce, Magento, BigCommerce, or other non-Shopify e-commerce platforms.

59 - SKILL.md - Updated Apr 24, 2026

henkisdabro/react-best-practices

tools

Verified - Trusted - Community

Comprehensive React and Next.js performance optimisation guide with 40+ rules for eliminating waterfalls, optimising bundles, and improving rendering. Use when optimising React or Next.js apps, reviewing performance, refactoring components, hunting wasteful re-renders, reducing bundle size, debugging client/server data-fetching, or tightening rendering paths. Do NOT use for non-React frameworks (Vue, Svelte, Solid, Angular), React Native, or general JavaScript performance unrelated to React.

59 - SKILL.md - Updated Apr 20, 2026

Install manually

# Clone the repo
git clone https://github.com/henkisdabro/wookstar-claude-plugins.git

# Create the global Claude Code skills folder if it does not exist yet
mkdir -p ~/.claude/skills

# Copy into Claude Code skills folder (global)
cp -r wookstar-claude-plugins/plugins/documents/skills/pdf-extract ~/.claude/skills/

Repository

henkisdabro/wookstar-claude-plugins

59 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT