skills/pdf/SKILL.md
Comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging/splitting documents, and handling forms. Use when filling PDF forms or programmatically processing, generating, or analyzing PDF documents.
npx skillsauth add nano-step/skill-manager pdfInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Activate when the user asks to:
from pypdf import PdfReader, PdfWriter
# Read a PDF
reader = PdfReader("document.pdf")
print(f"Pages: {len(reader.pages)}")
# Extract text
text = ""
for page in reader.pages:
text += page.extract_text()
from pypdf import PdfWriter, PdfReader
writer = PdfWriter()
for pdf_file in ["doc1.pdf", "doc2.pdf", "doc3.pdf"]:
reader = PdfReader(pdf_file)
for page in reader.pages:
writer.add_page(page)
with open("merged.pdf", "wb") as output:
writer.write(output)
reader = PdfReader("input.pdf")
for i, page in enumerate(reader.pages):
writer = PdfWriter()
writer.add_page(page)
with open(f"page_{i+1}.pdf", "wb") as output:
writer.write(output)
reader = PdfReader("input.pdf")
writer = PdfWriter()
page = reader.pages[0]
page.rotate(90) # Rotate 90 degrees clockwise
writer.add_page(page)
with open("rotated.pdf", "wb") as output:
writer.write(output)
reader = PdfReader("document.pdf")
meta = reader.metadata
print(f"Author: {meta.author}")
print(f"Title: {meta.title}")
print(f"Subject: {meta.subject}")
print(f"Creator: {meta.creator}")
import pdfplumber
with pdfplumber.open("document.pdf") as pdf:
for page in pdf.pages:
text = page.extract_text()
print(text)
with pdfplumber.open("document.pdf") as pdf:
for i, page in enumerate(pdf.pages):
tables = page.extract_tables()
for j, table in enumerate(tables):
print(f"Table {j+1} on page {i+1}:")
for row in table:
print(row)
import pdfplumber
import pandas as pd
with pdfplumber.open("document.pdf") as pdf:
page = pdf.pages[0]
table = page.extract_table()
df = pd.DataFrame(table[1:], columns=table[0])
print(df)
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas
c = canvas.Canvas("hello.pdf", pagesize=letter)
width, height = letter
c.drawString(100, height - 100, "Hello World!")
c.line(100, height - 140, 400, height - 140)
c.save()
from reportlab.lib.pagesizes import letter
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer
from reportlab.lib.styles import getSampleStyleSheet
doc = SimpleDocTemplate("report.pdf", pagesize=letter)
styles = getSampleStyleSheet()
story = []
story.append(Paragraph("Report Title", styles['Title']))
story.append(Spacer(1, 12))
story.append(Paragraph("This is the body text.", styles['Normal']))
doc.build(story)
# Extract text
pdftotext input.pdf output.txt
# Preserve layout
pdftotext -layout input.pdf output.txt
# Specific pages
pdftotext -f 1 -l 5 input.pdf output.txt
# Merge PDFs
qpdf --empty --pages file1.pdf file2.pdf -- merged.pdf
# Split pages
qpdf input.pdf --pages . 1-5 -- pages1-5.pdf
# Rotate pages
qpdf input.pdf output.pdf --rotate=+90:1
# Remove password
qpdf --password=mypassword --decrypt encrypted.pdf decrypted.pdf
# Linearize (optimize for web)
qpdf --linearize input.pdf output.pdf
import pytesseract
from pdf2image import convert_from_path
images = convert_from_path('scanned.pdf')
text = ""
for i, image in enumerate(images):
text += f"Page {i+1}:\n"
text += pytesseract.image_to_string(image)
text += "\n\n"
from pypdf import PdfReader, PdfWriter
watermark = PdfReader("watermark.pdf").pages[0]
reader = PdfReader("document.pdf")
writer = PdfWriter()
for page in reader.pages:
page.merge_page(watermark)
writer.add_page(page)
with open("watermarked.pdf", "wb") as output:
writer.write(output)
from pypdf import PdfReader, PdfWriter
reader = PdfReader("input.pdf")
writer = PdfWriter()
for page in reader.pages:
writer.add_page(page)
writer.encrypt("userpassword", "ownerpassword")
with open("encrypted.pdf", "wb") as output:
writer.write(output)
from pypdf import PdfReader, PdfWriter
reader = PdfReader("form.pdf")
writer = PdfWriter()
writer.append(reader)
# Get form field names
fields = reader.get_fields()
for name, field in fields.items():
print(f"Field: {name}, Type: {field.get('/FT')}")
# Fill fields
writer.update_page_form_field_values(
writer.pages[0],
{"field_name": "value", "another_field": "another_value"}
)
with open("filled_form.pdf", "wb") as output:
writer.write(output)
from pdf2image import convert_from_path
# Convert all pages
images = convert_from_path('document.pdf', dpi=300)
for i, image in enumerate(images):
image.save(f'page_{i+1}.png', 'PNG')
# Convert specific pages
images = convert_from_path('document.pdf', first_page=1, last_page=3)
# Core libraries
pip install pypdf pdfplumber reportlab
# OCR support
pip install pytesseract pdf2image
# Also needs: apt-get install tesseract-ocr poppler-utils
# CLI tools
apt-get install poppler-utils qpdf
# All at once
pip install pypdf pdfplumber reportlab pytesseract pdf2image
| Task | Best Tool | Command/Code |
|------|-----------|--------------|
| Read/extract text | pdfplumber | page.extract_text() |
| Extract tables | pdfplumber | page.extract_tables() |
| Merge PDFs | pypdf | writer.add_page(page) |
| Split PDFs | pypdf | One page per PdfWriter |
| Rotate pages | pypdf | page.rotate(90) |
| Create PDFs | reportlab | Canvas or Platypus |
| Fill forms | pypdf | update_page_form_field_values() |
| Add watermark | pypdf | page.merge_page(watermark) |
| Password protect | pypdf | writer.encrypt() |
| OCR scanned PDFs | pytesseract + pdf2image | Convert to image first |
| CLI text extract | poppler-utils | pdftotext input.pdf |
| CLI merge/split | qpdf | qpdf --empty --pages ... |
| PDF to images | pdf2image | convert_from_path() |
| Extract metadata | pypdf | reader.metadata |
tools
Humanization layer for LLM conversation — makes the model sound and respond like a real, thoughtful, embodied human rather than an assistant or chatbot. Use whenever the reply will be read by a human and warmth, presence, or texture matter more than machine-readability. Triggers on any of: "human", "humans", "humanize", "humanization", "be human", "more human", "feel human", "people", "person", "real person", "real human", "friend", "friendly", "like a friend", "respond like a friend", "buddy", "talk", "talking", "talk to me", "talk like a person", "chat", "chatting", "conversation", "converse", "discuss", "discussion", "communication", "communicate", "listen", "just listen", "sit with me", "vent", "venting", "I just want to vent", "company", "presence", "stop being an AI", "stop sounding like a bot", "less corporate", "less robotic", "less formal", "warmer", "warm tone", "empathy", "empathetic", "comfort", "support me", "emotional support", "be honest with me", "be real with me", "real talk", "heart-to-heart", "deep conversation", "casual", "casual chat", "small talk", "chitchat", "say something", "tell me something", and on any emotional / relational / personal-decision / interpersonal context — grief, joy, anger, fear, shame, doubt, loneliness, dating, breakup, conflict, family, parents, sibling, friendship, marriage, divorce, in-laws, kids, parenting, work stress, burnout, career decision, quitting, firing, layoff, anxiety, depression, panic, sleep, dreams, identity, faith, doubt, meaning, mortality, celebration, milestone, achievement, gratitude, apology, forgiveness. Also loads when the user writes in non-English (any language) with emotional weight, when the user's message is shorter than 8 words and affect-laden, when the user types in lowercase fragments, when the user types in ALL CAPS with excitement, or when the user explicitly asks for a friend / mentor / older-sibling / wise-listener voice. Do NOT use for code generation, tool calls, structured data output, SQL, API contracts, or any task where machine-readability matters more than human warmth.
tools
Use this skill whenever the user mentions open-design, od_generate_design, OD daemon, BYOK design generation, generating HTML mockups from a PRD, creating or managing Open Design projects, saving design artifacts, linting generated HTML, or any of the 10 `od_*` MCP tools (od_list_projects, od_get_project, od_create_project, od_update_project, od_delete_project, od_save_artifact, od_save_project_file, od_lint_artifact, od_compose_brief, od_generate_design). Also trigger on phrases like "generate a design", "create a mockup", "make a landing page", "list my OD projects", "the design daemon", "the streaming design tool", and on any 401/404/422 error coming from an `od_*` tool call. Covers env-var setup (`OD_DAEMON_URL`, auth modes, BYOK), the full PRD → generate → save → lint workflow, error diagnosis, and the safety rails (lint before save, never commit BYOK keys). Triggers even if the user doesn't explicitly say "open-design-mcp" — keyword matches on `od_*` tool names or "design generation" workflows are enough.
tools
Use this skill whenever a user wants the **full Open Design experience** — discovery questions asked first, brand-spec extraction from URLs/files, TodoWrite planning with live updates, 5-dimensional self-critique, polished artifact at the end. Trigger phrases include "design with questions first", "OD-style workflow", "full interactive design brief", "make me a complete landing page" (when the user wants quality over speed), "design my pitch deck", "brand-aware multi-page site", "follow the Open Design playbook", or any request where the user is starting a new design project rather than tweaking an existing artifact. Also trigger on any request that mentions wanting brand consistency across multiple pages or that provides a brand URL/spec. Pair with the `open-design-mcp` tool-reference skill — both loaded together give an LLM the full picture (this skill = workflow choreography; that skill = tool catalog + errors). This skill explicitly does NOT trigger for one-off tweaks ("make the nav stickier", "swap slide 3 image") — use od_generate_design directly for those.
development
Sync a locally-developed OpenCode skill to the skill-manager npm package and (if private) the private-skills GitHub repo. Handles per-skill version bumps, public/private classification, build verification, and conventional-commit-style git push. Auto-publish to npm is handled downstream by nano-step/shared-workflows@v1 when the push to master lands. Use this skill whenever the user says 'sync skill', 'publish skill', 'push skill to manager', '/sync-skill-to-manager <name>', or asks to release/distribute a skill they just edited.