
teachingai/ocrmypdf

Name: ocrmypdf
Author: teachingai

skills/ocrmypdf-skills/ocrmypdf/SKILL.md

npx skillsauth add teachingai/agent-skills ocrmypdf


OCRmyPDF — Core OCR Guide

Overview

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched or copy-pasted. It uses Tesseract OCR, supports 100+ languages, produces PDF/A by default, and distributes work across all CPU cores.

For image processing (deskew, rotate, clean), see the ocrmypdf-image skill. For optimization and PDF/A options, see ocrmypdf-optimize. For batch/Docker/scripting, see ocrmypdf-batch. For Python API and plugins, see ocrmypdf-api.

Installation

One-liner installs (recommended)

| OS | Command |
|----|---------|
| Debian / Ubuntu | apt install ocrmypdf |
| Fedora | dnf install ocrmypdf tesseract-osd |
| macOS (Homebrew) | brew install ocrmypdf |
| macOS (MacPorts) | port install ocrmypdf |
| FreeBSD | pkg install py-ocrmypdf |
| Snap | snap install ocrmypdf |

pip install (latest version)

# After installing system dependencies (Tesseract, Ghostscript)
pip install ocrmypdf

Verify

ocrmypdf --version
ocrmypdf --help
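This guide marks the -m short forms and --output-type auto as v17+, so a script may want to check the installed version before using them. The sketch below assumes that ocrmypdf --version prints a bare version string such as 16.10.0. The function name and the OCRMYPDF override variable, which lets a test substitute a stand-in command, are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (status 0) if the installed OCRmyPDF is v17 or newer.
# OCRMYPDF may name a stand-in command; it defaults to the real ocrmypdf CLI.
ocrmypdf_is_v17_or_newer() {
  local version major
  # Assumes --version prints a bare version string such as "16.10.0".
  version=$("${OCRMYPDF:-ocrmypdf}" --version 2>/dev/null) || return 1
  major=${version%%.*}   # keep everything before the first dot
  # Treat anything that is not a plain integer as "unknown, not v17+".
  case $major in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$major" -ge 17 ]
}
```

Usage: `if ocrmypdf_is_v17_or_newer; then ocrmypdf -m force in.pdf out.pdf; else ocrmypdf --force-ocr in.pdf out.pdf; fi`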

Requirements

Python 3.11+
Tesseract 4.1.1+ (OCR engine)
Ghostscript 9.54+ or pypdfium2 (PDF rasterization)
Optional: jbig2enc (compression), pngquant (image optimization), unpaper (cleaning)
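Before running OCRmyPDF, you can quickly confirm that the command-line requirements above are on PATH. This sketch assumes the usual executable names: gs for Ghostscript and jbig2 for jbig2enc. pypdfium2 is a Python package, not a command, so this check does not cover it. The missing_tools name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each named command that is not on PATH and
# return non-zero if any of them is missing.
missing_tools() {
  local tool status=0
  for tool in "$@"; do
    # command -v succeeds only if the shell can find the command.
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      status=1
    fi
  done
  return "$status"
}
```

Usage: `missing_tools ocrmypdf tesseract gs || echo "install the missing tools first"`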

Quick Start

# Basic OCR — input scanned PDF, output searchable PDF/A
ocrmypdf input.pdf output.pdf

# OCR a single image file; --image-dpi sets the resolution to assume (needed when the image has no DPI metadata)
ocrmypdf --image-dpi 300 scan.png output.pdf

# OCR in place (only overwrites on success)
ocrmypdf myfile.pdf myfile.pdf

Language Support

OCRmyPDF uses Tesseract language packs. Install them for your OS:

# Debian / Ubuntu
apt-cache search tesseract-ocr          # List all language packs
apt install tesseract-ocr-chi-sim       # Chinese Simplified
apt install tesseract-ocr-fra           # French

# macOS (Homebrew)
brew install tesseract-lang             # All languages

# Fedora
dnf search tesseract-langpack
dnf install tesseract-langpack-ita      # Italian

Using languages

# Single language
ocrmypdf -l fra document.pdf output.pdf

# Multiple languages
ocrmypdf -l eng+fra bilingual.pdf output.pdf

# Chinese Simplified + English
ocrmypdf -l chi_sim+eng chinese-doc.pdf output.pdf

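Note: Language identifiers are Tesseract's language codes. Most are three-letter ISO 639 codes (eng, fra, deu), and some add a script variant (chi_sim). Run tesseract --list-langs to see which packs are installed.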
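Before passing -l, it can help to confirm that every requested language pack is installed. This sketch assumes the installed list comes from tesseract --list-langs, which prints a header line followed by one code per line. The missing_langs helper is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read installed language codes (one per line) from stdin
# and print each code in a spec such as "eng+deu" that is not installed.
# Returns non-zero if any requested code is missing.
missing_langs() {
  local spec=$1 installed lang status=0
  local -a wanted
  installed=$(cat)
  # Split the spec on '+' into individual language codes.
  IFS='+' read -r -a wanted <<< "$spec"
  for lang in "${wanted[@]}"; do
    # -x: match whole lines, -F: fixed string, -q: no output.
    if ! grep -qxF -- "$lang" <<< "$installed"; then
      echo "$lang"
      status=1
    fi
  done
  return "$status"
}
```

Usage: `tesseract --list-langs 2>&1 | missing_langs eng+chi_sim`. The 2>&1 is a precaution in case your Tesseract build writes the list to stderr. The header line never matches a language code, so it is harmless.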

OCR Modes

Default mode (stop if text is found)

# OCR all pages; if any page already contains text, OCRmyPDF stops with an error
ocrmypdf input.pdf output.pdf

Force OCR (`--force-ocr` or `-m force`)

# Rasterize and OCR all pages, even those with existing text
ocrmypdf --force-ocr input.pdf output.pdf
# v17+ short form:
ocrmypdf -m force input.pdf output.pdf

Redo OCR (`--redo-ocr` or `-m redo`)

# Replace existing OCR without rasterizing (preserves quality)
ocrmypdf --redo-ocr input.pdf output.pdf
# v17+ short form:
ocrmypdf -m redo input.pdf output.pdf

Skip text (`--skip-text` or `-m skip`)

# OCR only pages without text (e.g. image-only pages); pages that already contain text are passed through unchanged
ocrmypdf --skip-text input.pdf output.pdf
# v17+ short form:
ocrmypdf -m skip input.pdf output.pdf

No OCR (image processing only)

# Apply image processing / PDF/A conversion without OCR
ocrmypdf --ocr-engine none input.pdf output.pdf
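The long-form flags existed before the v17 -m short forms. A script that picks a mode at run time can therefore map a mode name onto them. The wrapper below is a hypothetical sketch: the ocr_mode name and its mode keywords are illustrative, and it covers only the four text-handling modes above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run OCRmyPDF in a named mode using the long-form flags.
# Usage: ocr_mode MODE INPUT OUTPUT [extra ocrmypdf options...]
ocr_mode() {
  if [ "$#" -lt 3 ]; then
    echo "usage: ocr_mode default|force|redo|skip INPUT OUTPUT [options...]" >&2
    return 2
  fi
  local mode=$1 input=$2 output=$3
  shift 3
  local -a flags=()
  case $mode in
    default) ;;                      # no mode flag: OCRmyPDF's default behavior
    force)   flags=(--force-ocr) ;;
    redo)    flags=(--redo-ocr) ;;
    skip)    flags=(--skip-text) ;;
    *) echo "unknown mode: $mode" >&2; return 2 ;;
  esac
  # Extra options are placed before the positional input and output paths.
  "${OCRMYPDF:-ocrmypdf}" "${flags[@]}" "$@" "$input" "$output"
}
```

Usage: `ocr_mode redo old-ocr.pdf updated.pdf -l deu`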

Page Selection

# OCR only specific pages
ocrmypdf --pages 1,3,5-10 input.pdf output.pdf

# OCR only the first page, minimal changes elsewhere
ocrmypdf --pages 1 --output-type pdf --optimize 0 input.pdf output.pdf

Output Types

# PDF/A (default) — for archival
ocrmypdf --output-type pdfa input.pdf output.pdf

# Standard PDF
ocrmypdf --output-type pdf input.pdf output.pdf

# Auto (v17+) — speculative PDF/A, falls back to standard PDF
ocrmypdf --output-type auto input.pdf output.pdf

# No output PDF — only produce sidecar text
ocrmypdf --output-type none --sidecar text.txt input.pdf -

Sidecar Text File

# Produce a companion text file with OCR text
ocrmypdf --sidecar output.txt input.pdf output.pdf
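When you OCR many files, a consistent naming scheme keeps each sidecar next to its PDF. The helper and naming convention below are hypothetical: scan.pdf becomes scan.ocr.pdf plus scan.txt. Only the --sidecar option itself comes from this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: OCR NAME.pdf into NAME.ocr.pdf and write the OCR text
# to NAME.txt in the same directory.
ocr_with_sidecar() {
  local input=$1
  local stem=${input%.pdf}   # drop a trailing lowercase ".pdf", if present
  "${OCRMYPDF:-ocrmypdf}" --sidecar "$stem.txt" "$input" "$stem.ocr.pdf"
}
```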

Metadata

# Set output PDF metadata
ocrmypdf --title "My Document" --author "Author Name" --subject "Subject" input.pdf output.pdf

Parallel Processing

# Use 4 CPU cores (default: all available)
ocrmypdf --jobs 4 input.pdf output.pdf

# Single-threaded
ocrmypdf --jobs 1 input.pdf output.pdf
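Each ocrmypdf process defaults to all available cores, so running several at once oversubscribes the CPU. This minimal sketch divides the cores among N concurrent runs. jobs_per_run is hypothetical. The sketch also assumes that getconf _NPROCESSORS_ONLN reports the online core count, as it does on Linux and macOS.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split CPU cores evenly among PARALLEL concurrent ocrmypdf
# runs, giving each run at least one job.
# Usage: jobs_per_run PARALLEL [CORES]   (CORES defaults to the online core count)
jobs_per_run() {
  local parallel=$1
  local cores=${2:-$(getconf _NPROCESSORS_ONLN)}
  if [ "$parallel" -lt 1 ]; then
    echo "PARALLEL must be at least 1" >&2
    return 2
  fi
  local n=$(( cores / parallel ))   # integer division rounds down
  if [ "$n" -lt 1 ]; then
    n=1
  fi
  echo "$n"
}
```

For two concurrent runs, pass --jobs "$(jobs_per_run 2)" to each ocrmypdf command.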

Common Recipes

Make a scanned PDF searchable

ocrmypdf scanned.pdf searchable.pdf

Convert image to searchable PDF

ocrmypdf --image-dpi 300 scan.jpg output.pdf
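OCRmyPDF takes a single input file, so pages scanned as separate images must be combined first. This sketch assumes the separate img2pdf tool is installed. It pipes img2pdf's output to OCRmyPDF, which reads it from stdin via the - input argument. The helper name and the IMG2PDF/OCRMYPDF override variables are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: merge several images into one PDF with img2pdf and pipe
# the result to OCRmyPDF, which reads it from stdin ("-").
# Usage: images_to_searchable_pdf OUTPUT IMAGE...
images_to_searchable_pdf() {
  if [ "$#" -lt 2 ]; then
    echo "usage: images_to_searchable_pdf OUTPUT IMAGE..." >&2
    return 2
  fi
  local output=$1
  shift
  "${IMG2PDF:-img2pdf}" "$@" | "${OCRMYPDF:-ocrmypdf}" - "$output"
}
```

The pipeline's exit status is that of the last command. A failure in img2pdf therefore shows up as an OCRmyPDF input error; enable set -o pipefail if you need to tell the two apart.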

OCR a multilingual document

ocrmypdf -l eng+deu+fra multilingual.pdf output.pdf

Re-OCR with newer Tesseract

ocrmypdf --redo-ocr old-ocr.pdf updated.pdf

Strip all text/OCR from a PDF

ocrmypdf --ocr-engine none --force-ocr input.pdf stripped.pdf

Quick Reference

| Task | Command |
|------|---------|
| Basic OCR | ocrmypdf input.pdf output.pdf |
| Specify language | ocrmypdf -l fra input.pdf output.pdf |
| Multiple languages | ocrmypdf -l eng+fra input.pdf output.pdf |
| Force re-OCR all pages | ocrmypdf --force-ocr input.pdf output.pdf |
| Replace existing OCR | ocrmypdf --redo-ocr input.pdf output.pdf |
| Skip pages with text | ocrmypdf --skip-text input.pdf output.pdf |
| Specific pages only | ocrmypdf --pages 1,3,5-10 input.pdf output.pdf |
| Output standard PDF | ocrmypdf --output-type pdf input.pdf output.pdf |
| Extract text sidecar | ocrmypdf --sidecar text.txt input.pdf output.pdf |
| Image to PDF | ocrmypdf --image-dpi 300 image.png output.pdf |
| In-place OCR | ocrmypdf myfile.pdf myfile.pdf |
| Set metadata | ocrmypdf --title "Title" input.pdf output.pdf |
| Parallel jobs | ocrmypdf --jobs 4 input.pdf output.pdf |

Troubleshooting

"Tesseract not found": Install Tesseract and ensure it's on PATH.
Poor OCR quality: Check language packs (-l), try --deskew (see ocrmypdf-image), or --oversample 300.
"Input file has text": Use --force-ocr, --redo-ocr, or --skip-text as appropriate.
Large output files: See ocrmypdf-optimize for --optimize levels and JBIG2.
Signed PDFs: By default, OCRmyPDF refuses to modify a digitally signed PDF. Pass --invalidate-digital-signatures to proceed anyway; the existing signatures will no longer be valid.
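In scripts, the "input file has text" case can be detected from OCRmyPDF's exit code instead of by parsing its messages. This minimal sketch assumes the condition is reported with exit code 6 (listed as already_done_ocr in OCRmyPDF's exit-code documentation). Falling back to --skip-text is only one of the options named above, and the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: OCR a file; if OCRmyPDF reports that the input already
# has text (exit code 6, assumed here), retry once with --skip-text.
# Any other failure is returned unchanged.
ocr_or_skip_text() {
  local input=$1 output=$2 rc=0
  "${OCRMYPDF:-ocrmypdf}" "$input" "$output" || rc=$?
  if [ "$rc" -eq 6 ]; then
    echo "input already has text; retrying with --skip-text" >&2
    rc=0
    "${OCRMYPDF:-ocrmypdf}" --skip-text "$input" "$output" || rc=$?
  fi
  return "$rc"
}
```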

References

OCRmyPDF Documentation
OCRmyPDF GitHub
Tesseract Language Packs
OCRmyPDF Cookbook


OCRmyPDF core skill — add searchable OCR text layer to scanned PDFs, convert images to searchable PDFs, support 100+ languages via Tesseract. Use when the user needs to OCR a PDF, make a scanned PDF searchable, or extract text from scanned documents.

Stars: 284
Category: documentation
Updated: Apr 14, 2026


Security Scan Results

3 of 9 scanners reported clean. The other scanners were skipped, did not run, or reported a non-clean status.

Trivy (container and dependency vulnerability scanner): clean, 95%
Semgrep (static code analysis for vulnerabilities): clean, 95%
mcp-scan (Snyk) (Model Context Protocol security validation): clean, 95%
Snyk (dep) (open source security scanning): skipped, 50%
Socket.dev (supply chain security analysis): skipped, 50%
VirusTotal (multi-engine malware detection): skipped, 50%
CrowdStrike (advanced threat intelligence): skipped, 50%
OSV-Scanner (Open Source Vulnerability database check): skipped, 50%
OWASP Dep-Check: skipped, 50%

Last scanned: Apr 25, 2026, 9:32 PM (3.0s, 1 file scanned)



Install manually


# Clone the repo
git clone https://github.com/teachingai/agent-skills.git

# Copy into Claude Code skills folder (global)
cp -r agent-skills/skills/ocrmypdf-skills/ocrmypdf ~/.claude/skills/


Repository: teachingai/agent-skills

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT
