teachingai/ocrmypdf-batch

Name: ocrmypdf-batch
Author: teachingai

skills/ocrmypdf-skills/ocrmypdf-batch/SKILL.md

npx skillsauth add teachingai/agent-skills ocrmypdf-batch

OCRmyPDF — Batch Processing Guide

Overview

OCRmyPDF supports batch processing through shell scripting, Docker, and CI/CD integration for automated OCR pipelines.

For core OCR functionality, see the ocrmypdf skill. For image processing, see ocrmypdf-image. For optimization, see ocrmypdf-optimize.

Shell Loop

Basic batch

# Process all PDFs in the current directory
mkdir -p output
for f in *.pdf; do
    ocrmypdf "$f" "output/$f"
done

Parallel processing

# Use GNU parallel for faster processing
parallel ocrmypdf {} output/{/} ::: *.pdf

# Limit to 4 concurrent jobs
parallel -j 4 ocrmypdf {} output/{/} ::: *.pdf

Recursive batch

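# Process all PDFs in the directory tree (results land flat in output/)
mkdir -p output
find . -name "*.pdf" ! -path "./output/*" \
    -exec sh -c 'ocrmypdf "$1" "output/$(basename "$1")"' _ {} \;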
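
The recursive command above targets a single output directory, so two PDFs with the same name in different subdirectories would collide. A minimal sketch of one way to mirror the source tree instead is shown here; `ocr_tree` is a hypothetical helper name, not part of OCRmyPDF, and it assumes the destination lies outside the source directory.

```shell
# Hypothetical helper: OCR every PDF under SRC into DST, keeping subdirectories.
# Assumes DST is not inside SRC, otherwise results would be picked up again.
ocr_tree() {
    src=${1%/}
    dst=${2%/}
    # Paths containing newlines are not handled by this simple read loop.
    find "$src" -type f -name '*.pdf' | while IFS= read -r pdf; do
        rel=${pdf#"$src"/}                  # path relative to SRC
        mkdir -p "$dst/$(dirname "$rel")"   # recreate the subdirectory
        ocrmypdf "$pdf" "$dst/$rel" </dev/null || echo "FAILED: $pdf" >&2
    done
}
```

It could be called as `ocr_tree scans output`, where `scans` and `output` are placeholder directory names.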

Docker

Official image

# Pull image
docker pull jbarlow83/ocrmypdf

# Basic usage (file paths are inside the container, under the /data mount)
docker run --rm \
    -v "$(pwd)":/data \
    jbarlow83/ocrmypdf \
    /data/input.pdf /data/output.pdf

Batch with Docker

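# Process all PDFs in input/ (ocrmypdf takes one input and one output per run)
mkdir -p output
for f in input/*.pdf; do
    docker run --rm \
        -v "$(pwd)":/data \
        jbarlow83/ocrmypdf \
        "/data/$f" "/data/output/$(basename "$f")"
done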

Docker Compose

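version: '3'
services:
  ocrmypdf:
    image: jbarlow83/ocrmypdf
    volumes:
      - ./input:/data/input
      - ./output:/data/output
    # The image's entrypoint is ocrmypdf, so override it to run a shell loop
    entrypoint: ["sh", "-c"]
    # $$ escapes $ so that Compose does not interpolate the shell variables
    command: ['for f in /data/input/*.pdf; do ocrmypdf "$$f" "/data/output/$$(basename "$$f")"; done']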

GitHub Actions

name: OCR PDFs
on: [push]
jobs:
  ocr:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
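      - name: Run OCR
        run: |
          mkdir -p output
          # Single quotes keep $f and $(basename ...) for the container's shell
          docker run --rm \
            -v "${{ github.workspace }}":/data \
            --entrypoint sh \
            jbarlow83/ocrmypdf \
            -c 'for f in /data/*.pdf; do ocrmypdf "$f" "/data/output/$(basename "$f")"; done'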

CI/CD Examples

GitLab CI

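ocr:
  image:
    name: jbarlow83/ocrmypdf
    # Clear the image's ocrmypdf entrypoint so GitLab can run the script in a shell
    entrypoint: [""]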
  script:
    - mkdir -p output
    - for f in *.pdf; do ocrmypdf "$f" "output/$f"; done
  artifacts:
    paths:
      - output/

Shell script template

#!/bin/bash
INPUT_DIR="input"
OUTPUT_DIR="output"
# Named OCR_LANG because LANG is the shell's locale variable
OCR_LANG="eng+chi_sim"

mkdir -p "$OUTPUT_DIR"

for pdf in "$INPUT_DIR"/*.pdf; do
    [ -e "$pdf" ] || continue   # the glob matched no PDFs
    filename=$(basename "$pdf")
    echo "Processing: $filename"
    if ocrmypdf -l "$OCR_LANG" --deskew "$pdf" "$OUTPUT_DIR/$filename"; then
        echo "Done: $filename"
    else
        echo "Failed: $filename" >&2
    fi
done

echo "Batch OCR complete!"

Error Handling

# Continue on error, log failures
for f in *.pdf; do
    if ! ocrmypdf "$f" "output/$f" 2>&1; then
        echo "FAILED: $f" >> failed.log
    fi
done
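
The loop above treats every non-zero exit the same way. As a hedged sketch, the exit status can be recorded so that files that already contain text are distinguished from real failures. The mapping of exit code 6 to "prior OCR found" is stated here from memory of OCRmyPDF's documented exit codes and should be checked against the installed version; `ocr_one` and the log file name are hypothetical.

```shell
# Hypothetical helper: OCR one file and append the outcome to a log.
ocr_one() {
    src=$1
    out=$2
    log=${3:-ocr-results.log}
    ocrmypdf "$src" "$out" </dev/null
    status=$?
    case $status in
        0) echo "OK $src" >> "$log" ;;
        # Assumption: 6 means the input already has a text layer.
        6) echo "SKIPPED $src" >> "$log" ;;
        *) echo "FAILED ($status) $src" >> "$log" ;;
    esac
    return "$status"
}
```

Inside a batch loop this could be used as `ocr_one "$f" "output/$f" || true`, so that one bad file does not stop the run.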

Performance Tips

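Use --jobs N to spread the pages of one file across N CPU cores.
Use --output-type pdf (not pdfa) for faster processing when archival PDF/A output is not needed.
Use --deskew and --clean to straighten and clean scanned pages before OCR, which can improve recognition; use --optimize to reduce file size.
Use Docker layer caching in CI/CD for faster rebuilds.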
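
As an illustration of the first two tips, the sketch below wraps a sequential batch that passes `--jobs` and `--output-type pdf` to every run. `ocr_dir_fast` and the `OCR_JOBS` variable are hypothetical names chosen for this example, and the default of 4 cores is an arbitrary assumption.

```shell
# Hypothetical helper: OCR every PDF in IN_DIR into OUT_DIR with the
# speed-oriented flags from the tips above.
ocr_dir_fast() {
    in_dir=${1%/}
    out_dir=${2%/}
    mkdir -p "$out_dir"
    for pdf in "$in_dir"/*.pdf; do
        [ -e "$pdf" ] || continue   # the glob matched nothing
        # OCR_JOBS is an assumed variable name; it defaults to 4 cores per file.
        ocrmypdf --jobs "${OCR_JOBS:-4}" --output-type pdf \
            "$pdf" "$out_dir/$(basename "$pdf")" \
            || echo "FAILED: $pdf" >&2
    done
}
```

When several files are processed at the same time (for example with GNU parallel), a lower per-file value such as `OCR_JOBS=1` may avoid oversubscribing the CPU.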

Quick Reference

| Task | Command |
|------|---------|
| Sequential batch | for f in *.pdf; do ocrmypdf "$f" out/"$f"; done |
| Parallel batch | parallel ocrmypdf {} out/{/} ::: *.pdf |
| Docker basic | docker run --rm -v "$(pwd)":/data jbarlow83/ocrmypdf /data/in.pdf /data/out.pdf |
| Recursive | find . -name "*.pdf" ! -path "./out/*" -exec sh -c 'ocrmypdf "$1" "out/$(basename "$1")"' _ {} \; |

Troubleshooting

Permission denied: Ensure output directory is writable.
Memory issues: Process in smaller batches or use --jobs 1.
Docker path issues: Use absolute paths with -v.
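
The permission and path issues above can often be addressed together. The following is a sketch under stated assumptions: `docker_ocr` is a hypothetical helper, the `/in` and `/out` mount points are arbitrary, and the `--user` option (not mentioned in this guide) is used so that output files belong to the calling user rather than root.

```shell
# Hypothetical helper: run the container on one file using absolute host paths.
docker_ocr() {
    src_dir=$(cd "$(dirname "$1")" && pwd) || return 1   # absolute input directory
    out_dir=$(cd "$2" && pwd) || return 1                # absolute output directory (must exist)
    name=$(basename "$1")
    docker run --rm \
        --user "$(id -u):$(id -g)" \
        -v "$src_dir":/in:ro \
        -v "$out_dir":/out \
        jbarlow83/ocrmypdf "/in/$name" "/out/$name"
}
```

For example, `docker_ocr scans/report.pdf output` would write `output/report.pdf`; both names are placeholders.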

OCRmyPDF batch processing skill — process multiple PDFs, Docker automation, shell scripting, and CI/CD integration. Use when the user needs to OCR many PDFs, set up automated OCR pipelines, or integrate OCR into workflows.

Stars: 284
Category: tools
Updated: Apr 14, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add teachingai/agent-skills ocrmypdf-batch

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status.

| Scanner | Description | Status |
|---------|-------------|--------|
| Trivy | Container and dependency vulnerability scanner | Clean (95%) |
| Semgrep | Static code analysis for vulnerabilities | Clean (95%) |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean (95%) |
| Snyk (dep) | Open source security scanning | Skipped (50%) |
| Socket.dev | Supply chain security analysis | Skipped (50%) |
| VirusTotal | Multi-engine malware detection | Skipped (50%) |
| CrowdStrike | Advanced threat intelligence | Skipped (50%) |
| OSV-Scanner | Open Source Vulnerability database check | Skipped (50%) |
| OWASP Dep-Check | | Skipped (50%) |

Last scanned: Apr 10, 2026, 8:50 AM (153.1s, 1 file scanned)

Install manually

# Clone the repo
git clone https://github.com/teachingai/agent-skills.git

# Copy into the Claude Code skills folder (global)
mkdir -p ~/.claude/skills
cp -r agent-skills/skills/ocrmypdf-skills/ocrmypdf-batch ~/.claude/skills/

Repository: teachingai/agent-skills (284 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT