OCRmyPDF batch processing skill — process multiple PDFs, Docker automation, shell scripting, and CI/CD integration. Use when the user needs to OCR many PDFs, set up automated OCR pipelines, or integrate OCR into workflows.
OCRmyPDF processes one input file per invocation, so batch processing means wrapping it in shell loops, GNU parallel, Docker, or CI/CD jobs to build automated OCR pipelines.
For core OCR functionality, see the ocrmypdf skill. For image processing, see ocrmypdf-image. For optimization, see ocrmypdf-optimize.
```bash
# Process all PDFs in directory
mkdir -p output
for f in *.pdf; do
  ocrmypdf "$f" "output/$f"
done
```
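Long batches are sometimes interrupted. A minimal sketch of a resumable version of the loop, assuming results go to a separate directory under the same file name: files whose output already exists are skipped, and each result is first written under a temporary name so a half-written file is never mistaken for a finished one. `ocr_resume` is a hypothetical helper, not part of OCRmyPDF.

```shell
# Hypothetical helper (not part of OCRmyPDF): OCR every PDF in SRC into DST,
# skipping any file whose output already exists and is non-empty.
ocr_resume() {
  local src="${1:-.}" dst="${2:-output}" f name out tmp
  mkdir -p "$dst"
  for f in "$src"/*.pdf; do
    [ -e "$f" ] || continue            # the glob matched nothing
    name=$(basename "$f")
    out="$dst/$name"
    if [ -s "$out" ]; then
      echo "Skipping (already done): $f"
      continue
    fi
    # Write to a temporary name first so an interrupted run never leaves a
    # partial file that a later run would treat as finished.
    tmp="$dst/.partial-$name"
    if ocrmypdf "$f" "$tmp"; then
      mv "$tmp" "$out"
    else
      rm -f "$tmp"
      echo "FAILED: $f" >&2
    fi
  done
}
```

For example, `ocr_resume input output` can simply be re-run after a crash or Ctrl-C; only the missing files are processed.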
```bash
# Use GNU parallel for faster processing ({/} is the input file's basename)
mkdir -p output
parallel ocrmypdf {} output/{/} ::: *.pdf
# Limit to 4 concurrent jobs
parallel -j 4 ocrmypdf {} output/{/} ::: *.pdf
```
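By default OCRmyPDF already spreads each file across all CPU cores (its `--jobs` option), so running several files at once with GNU parallel can oversubscribe the machine. A sketch that divides the cores between concurrent files, assuming GNU `nproc` and GNU parallel are installed; `per_file_jobs` and `ocr_parallel` are illustrative names, not OCRmyPDF commands.

```shell
# Hypothetical helper: cores available to each ocrmypdf process when
# CONCURRENT files run at once (never less than 1).
per_file_jobs() {
  local n=$(( $1 / $2 ))
  [ "$n" -ge 1 ] || n=1
  echo "$n"
}

# Hypothetical wrapper: run CONCURRENT files in parallel, each limited to its
# share of the cores via ocrmypdf --jobs. Remaining arguments are input PDFs.
ocr_parallel() {
  local concurrent="$1" jobs
  shift
  jobs=$(per_file_jobs "$(nproc)" "$concurrent")
  mkdir -p output
  parallel -j "$concurrent" ocrmypdf --jobs "$jobs" {} output/{/} ::: "$@"
}
```

`ocr_parallel 4 *.pdf` runs four files at a time, each with roughly a quarter of the cores.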
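```bash
# Process all PDFs in directory tree (skips output/; results are flattened by basename)
mkdir -p output
find . -path ./output -prune -o -name "*.pdf" -type f \
  -exec sh -c 'ocrmypdf "$1" "output/$(basename "$1")"' sh {} \;
```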
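Flattening by basename means two files with the same name in different folders overwrite each other. A sketch that mirrors the source tree under the output directory instead, assuming the output directory sits outside the source tree and that file names contain no newlines (the file list is read line by line); `ocr_tree` is a hypothetical helper.

```shell
# Hypothetical helper: OCR every PDF under SRC into DST, keeping the same
# relative directory layout.
ocr_tree() {
  local src="${1%/}" dst="$2" list f rel
  list=$(mktemp)
  # Collect the file list up front so nothing written during the run is
  # picked up by find.
  find "$src" -type f -name '*.pdf' > "$list"
  while IFS= read -r f; do
    rel=${f#"$src"/}                      # path relative to SRC
    mkdir -p "$dst/$(dirname "$rel")"
    # </dev/null keeps ocrmypdf from reading the file list on stdin
    ocrmypdf "$f" "$dst/$rel" </dev/null || echo "FAILED: $f" >&2
  done < "$list"
  rm -f "$list"
}
```

For example, `ocr_tree scans ocr-out` turns `scans/2023/a.pdf` into `ocr-out/2023/a.pdf`.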
```bash
# Pull image
docker pull jbarlow83/ocrmypdf
# Basic usage (the current directory is mounted at /data inside the container)
docker run --rm \
  -v "$(pwd)":/data \
  jbarlow83/ocrmypdf \
  /data/input.pdf /data/output.pdf
```
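```bash
# Process all PDFs: the image's entrypoint is ocrmypdf, so override it with a
# shell and loop inside the container (the glob must expand there, not on the host)
docker run --rm \
  -v "$(pwd)":/data \
  --entrypoint sh \
  jbarlow83/ocrmypdf \
  -c 'mkdir -p /data/output && for f in /data/input/*.pdf; do ocrmypdf "$f" "/data/output/$(basename "$f")"; done'
```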
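Containers run as root unless told otherwise, so files written to the mounted directory end up owned by root on the host. A minimal wrapper that runs the container as the calling user with Docker's `--user` flag, assuming the image works under an arbitrary non-root UID; `docker_ocr` is a hypothetical name, and its paths are relative to the current directory, which is mounted at `/data`.

```shell
# Hypothetical wrapper: OCR one file with the jbarlow83/ocrmypdf image.
# IN and OUT are relative to the current directory (mounted at /data);
# any extra arguments are passed through to ocrmypdf as options.
docker_ocr() {
  local in="$1" out="$2"
  shift 2
  docker run --rm \
    --user "$(id -u):$(id -g)" \
    -v "$PWD":/data \
    jbarlow83/ocrmypdf \
    "$@" "/data/$in" "/data/$out"
}
```

For example, `docker_ocr scan.pdf scan_ocr.pdf -l eng --deskew` passes `-l eng --deskew` through to ocrmypdf.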
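```yaml
version: '3'
services:
  ocrmypdf:
    image: jbarlow83/ocrmypdf
    volumes:
      - ./input:/data/input
      - ./output:/data/output
    # The image's default entrypoint is ocrmypdf, so run a shell instead.
    # $$ escapes Compose variable interpolation so the shell receives $f.
    entrypoint: ["sh", "-c"]
    command: ['for f in /data/input/*.pdf; do ocrmypdf "$$f" "/data/output/$$(basename "$$f")"; done']
```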
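```yaml
name: OCR PDFs
on: [push]
jobs:
  ocr:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run OCR
        run: |
          mkdir -p output
          # Single quotes keep $f for the container's shell rather than the
          # runner's; the image's entrypoint is ocrmypdf, so override it with sh
          docker run --rm \
            -v "${{ github.workspace }}":/data \
            --entrypoint sh \
            jbarlow83/ocrmypdf \
            -c 'for f in /data/*.pdf; do ocrmypdf "$f" "/data/output/$(basename "$f")"; done'
```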
```yaml
ocr:
  image:
    name: jbarlow83/ocrmypdf
    entrypoint: [""]  # the image's entrypoint is ocrmypdf; GitLab needs a shell
  script:
    - mkdir -p output
    - for f in *.pdf; do ocrmypdf "$f" "output/$f"; done
  artifacts:
    paths:
      - output/
```
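```bash
#!/bin/bash
INPUT_DIR="input"
OUTPUT_DIR="output"
# Named OCR_LANG rather than LANG, which is the system locale variable.
# chi_sim requires the Tesseract Simplified Chinese language pack.
OCR_LANG="eng+chi_sim"

shopt -s nullglob  # skip the loop if the input directory has no PDFs
mkdir -p "$OUTPUT_DIR"

for pdf in "$INPUT_DIR"/*.pdf; do
  filename=$(basename "$pdf")
  echo "Processing: $filename"
  # --clean runs unpaper (must be installed) on the image sent to OCR
  if ocrmypdf -l "$OCR_LANG" --deskew --clean "$pdf" "$OUTPUT_DIR/$filename"; then
    echo "Done: $filename"
  else
    echo "FAILED: $filename" >&2
  fi
done

echo "Batch OCR complete!"
```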
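The script depends on Tesseract having every language in `eng+chi_sim` installed. A pre-flight check along these lines, assuming `tesseract --list-langs` is on the PATH, fails fast instead of letting every file in the batch error out; `check_langs` is a hypothetical helper.

```shell
# Hypothetical pre-flight check: verify every language in an OCRmyPDF -l
# value such as "eng+chi_sim" is installed for Tesseract. Prints each missing
# language and returns non-zero if any are absent (2 if tesseract fails).
check_langs() {
  local installed lang missing=0
  # Capture stderr too, in case a Tesseract version prints the list there.
  installed=$(tesseract --list-langs 2>&1) || return 2
  for lang in $(printf '%s\n' "$1" | tr '+' ' '); do
    if ! printf '%s\n' "$installed" | grep -qxF "$lang"; then
      echo "Missing Tesseract language: $lang" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Calling `check_langs "eng+chi_sim" || exit 1` before the loop stops the script when a language pack is missing.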
```bash
# Continue on error, log failures
mkdir -p output
for f in *.pdf; do
  if ! ocrmypdf "$f" "output/$f"; then
    echo "FAILED: $f" >> failed.log
  fi
done
```
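The loop above records only that a file failed. OCRmyPDF also reports why through its exit status; this sketch acts on two of its documented codes, assuming 6 means the input already contains text and 8 means it is encrypted. `ocr_one` is a hypothetical helper, and retrying with `--skip-text` is one possible policy, not the only one (`--force-ocr` and `--redo-ocr` are alternatives).

```shell
# Hypothetical helper: OCR one file and log the outcome by exit code.
# Assumes OCRmyPDF's exit codes: 0 = success, 6 = input already has text,
# 8 = input is encrypted. Returns the final exit code.
ocr_one() {
  local in="$1" out="$2" rc=0
  ocrmypdf "$in" "$out" || rc=$?
  if [ "$rc" -eq 6 ]; then
    # Already has text: retry with --skip-text, which OCRs only the pages
    # that do not contain text yet.
    rc=0
    ocrmypdf --skip-text "$in" "$out" || rc=$?
  fi
  case "$rc" in
    0) echo "OK: $in" ;;
    8) echo "ENCRYPTED: $in" >> failed.log ;;
    *) echo "FAILED ($rc): $in" >> failed.log ;;
  esac
  return "$rc"
}
```

Use it in place of the bare call: `mkdir -p output; for f in *.pdf; do ocr_one "$f" "output/$f"; done`.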
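- Use `--jobs N` to control how many CPU cores each ocrmypdf process uses
- Use `--output-type pdf` (instead of the default `pdfa`) for faster processing when archival output is not needed
- Use `--deskew` and `--clean` to improve OCR accuracy on skewed or noisy scans; `--clean` only affects the image sent to OCR, so the output pages are unchanged unless `--clean-final` is used

| Task | Command |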
|------|---------|
| Sequential batch | `for f in *.pdf; do ocrmypdf "$f" out/"$f"; done` |
| Parallel batch | `parallel ocrmypdf {} out/{/} ::: *.pdf` |
| Docker basic | `docker run --rm -v "$(pwd)":/data jbarlow83/ocrmypdf /data/in.pdf /data/out.pdf` |
| Recursive | `find . -path ./out -prune -o -name "*.pdf" -type f -exec sh -c 'ocrmypdf "$1" "out/$(basename "$1")"' sh {} \;` |
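The tips above combine into a single invocation for fast, non-archival output. A minimal sketch, assuming `unpaper` is installed (required by `--clean`) and GNU `nproc` is available for the default worker count; `ocr_fast` is a hypothetical name.

```shell
# Hypothetical helper: fast, non-archival OCR of one file.
# Plain PDF output skips PDF/A conversion; JOBS defaults to all cores.
ocr_fast() {
  local in="$1" out="$2" jobs="${3:-$(nproc)}"
  ocrmypdf --jobs "$jobs" --output-type pdf --deskew --clean "$in" "$out"
}
```

In a batch: `mkdir -p out && for f in *.pdf; do ocr_fast "$f" "out/$f" 4; done`.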