brunoasm/document-ocr

Name: document-ocr
Author: brunoasm

document_ocr/SKILL.md

npx skillsauth add brunoasm/my_claude_skills document-ocr

Document OCR (docling + vision-language model)

Purpose

Turn scanned PDFs and document images into clean, structured Markdown with figures preserved. The pipeline combines:

- docling: detects layout (figures, tables, multi-column reading order) on each page, entirely on CPU.
- A vision-language model (VLM): transcribes each full page to Markdown, preserving reading order, diacritics, ligatures, special characters (♂ ♀ ½ æ, Greek/Latin), equations (LaTeX), and tables (HTML).

docling supplies reliable figure/table crops; the VLM supplies high-quality text. Born-digital PDFs are detected and extracted directly (no server needed).

Adapted from Bruno de Medeiros' OntoMorphoGrapher proof-of-concept, where the olmocr-docling backend processed an 18-PDF historical-entomology corpus (1833–2015, 5 languages) with no catastrophic failures.

When to use this skill

Use when:

- A user has scanned PDFs or photographed/image pages needing accurate OCR.
- Figures and multi-column reading order must be preserved.
- The text has diacritics, special characters, or non-English content.

Do not use when:

- The PDF is already born-digital and reads fine (this tool handles it via the fast path, but you may not need it at all).
- The goal is extracting structured fields into a database; that is the job of the extract-from-pdfs skill, which can run downstream of this one.

⚠️ Prerequisites — an OCR backend must already exist

This skill does not start or manage any server. Standing one up is out of scope. Before running, the user must have one of:

- a running OpenAI-compatible server (vLLM/SGLang/LM Studio/llama.cpp) with a vision model; olmOCR-2 works best (--backend olmocr-docling);
- a running Ollama server with a vision OCR model (--backend ollama);
- a cloud API key: Anthropic (--backend anthropic-docling) or OpenAI (--backend vlm-docling pointed at the OpenAI base URL).

Always remind the user of this requirement and confirm which backend they have. For example setup commands (illustrative, not maintained), see references/server_setup.md. The born-digital PDF fast path needs no backend.

Setup

conda env create -f environment.yml
conda activate document_ocr

Then make the backend reachable (pick one):

export OCR_HOST=http://YOUR_HOST:30001     # vLLM / Ollama / OpenAI-compatible
# or
export ANTHROPIC_API_KEY=sk-ant-...        # cloud Claude
# or
export OPENAI_API_KEY=sk-...               # cloud OpenAI

Choosing a backend

| --backend | What it uses | Needs |
|--------------------|-------------------------------------------|-------|
| olmocr-docling ★ | docling figures + olmOCR-2 (vLLM) | --host |
| vlm-docling | docling figures + any VLM (OpenAI-compat) | --host (+ --api-key for cloud) |
| anthropic-docling | docling figures + Claude (cloud) | --api-key / ANTHROPIC_API_KEY |
| ollama | full-page DeepSeek-OCR (Ollama) | --host |
| docling | docling layout + per-region OCR | --host |

★ Recommended. See references/backends.md for the full matrix, why full-page beats per-region OCR, model defaults, and tuning notes.
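The "Needs" column can be checked before anything is sent to a server. The following is a minimal sketch of such a check, not the tool's own preflight: the function name `missing_requirements` and its messages are hypothetical, and only the backend names, `--host`, `--api-key`, and `ANTHROPIC_API_KEY` come from the table.

```python
import os

# Requirements copied from the "Needs" column of the backend table.
NEEDS_HOST = {"olmocr-docling", "vlm-docling", "ollama", "docling"}
NEEDS_API_KEY = {"anthropic-docling"}


def missing_requirements(backend, host=None, api_key=None, env=None):
    """Hypothetical helper: return a list of problems (empty means ready)."""
    env = os.environ if env is None else env
    if backend not in NEEDS_HOST | NEEDS_API_KEY:
        return [f"unknown backend: {backend}"]
    problems = []
    if backend in NEEDS_HOST and not host:
        problems.append(f"{backend} needs --host")
    if backend in NEEDS_API_KEY and not (api_key or env.get("ANTHROPIC_API_KEY")):
        problems.append(f"{backend} needs --api-key or ANTHROPIC_API_KEY")
    return problems


if __name__ == "__main__":
    # Example: report what is missing for the recommended backend.
    for problem in missing_requirements("olmocr-docling", host=os.environ.get("OCR_HOST")):
        print("Not ready:", problem)
```

The sketch deliberately does not check the optional `--api-key` for cloud use of vlm-docling, since the table marks it as needed only in that case.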

Workflow

1. Confirm the backend (see Prerequisites). Ask the user which one they have and the host/model or API key. Do not assume a server is running.
2. Gather inputs: a PDF, an image (.png/.jpg/.jpeg/.tif/.tiff), or a directory of them. Ask where output should go.
3. Classify first (free, no server):
   ```
   python scripts/ocr_document.py --input docs/ --dry-run
   ```
   This labels each PDF DIGITAL or NEEDS_OCR so the user knows what will hit the server.
4. Run OCR:
   ```
   python scripts/ocr_document.py \
       --input docs/ --output-dir out/ \
       --backend olmocr-docling --host "$OCR_HOST" --dpi 200
   ```
   The tool runs a preflight check and fails with a clear message if the backend isn't configured. Per-page failures (timeouts, content filters) are skipped with a placeholder; the rest of the document still processes.
5. Review out/<stem>/<stem>.md and the figures/ crops. Iterate if needed: raise --dpi, switch --model, or adjust the caption regex / bad_words / language list (see references/backends.md).
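When the classify and OCR steps are driven from a script rather than typed by hand, the two command lines can be assembled as argument lists. This is a sketch under assumptions: `build_ocr_command` is a hypothetical helper, and it uses only the flags shown in the workflow (`--input`, `--output-dir`, `--backend`, `--host`, `--dpi`, `--dry-run`).

```python
import shlex


def build_ocr_command(input_path, output_dir=None, backend=None,
                      host=None, dpi=None, dry_run=False):
    """Hypothetical helper: build an argv list for scripts/ocr_document.py."""
    cmd = ["python", "scripts/ocr_document.py", "--input", str(input_path)]
    if dry_run:
        # Classification only: no output directory or backend is passed.
        cmd.append("--dry-run")
        return cmd
    if output_dir is None or backend is None:
        raise ValueError("a full run needs output_dir and backend")
    cmd += ["--output-dir", str(output_dir), "--backend", backend]
    if host:
        cmd += ["--host", host]
    if dpi is not None:
        cmd += ["--dpi", str(dpi)]
    return cmd


# Mirror the two commands from the workflow (the host value is a placeholder).
classify = build_ocr_command("docs/", dry_run=True)
full_run = build_ocr_command("docs/", "out/", "olmocr-docling",
                             host="http://YOUR_HOST:30001", dpi=200)
print(shlex.join(classify))
print(shlex.join(full_run))
```

Either list can be handed to `subprocess.run(cmd, check=True)`; building a list instead of a shell string avoids quoting problems with paths that contain spaces.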

Output

Each document gets its own folder containing the stitched Markdown (<stem>.md), a per-page JSON cache/, cropped figures/, and rendered pages/. For the full schema and frontmatter fields, see references/output_format.md.
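A quick way to confirm a run produced everything is to compute the expected paths for a source file. The sketch below assumes the four items sit directly inside `out/<stem>/`, which is inferred from the description here and the `out/<stem>/<stem>.md` path in the workflow; references/output_format.md remains the authority. The helper names and the example filename are hypothetical.

```python
from pathlib import Path


def expected_outputs(output_dir, source_file):
    """Hypothetical helper: map output names to their assumed locations."""
    stem = Path(source_file).stem
    doc_dir = Path(output_dir) / stem
    return {
        "markdown": doc_dir / f"{stem}.md",  # stitched Markdown
        "cache": doc_dir / "cache",          # per-page JSON
        "figures": doc_dir / "figures",      # cropped figures
        "pages": doc_dir / "pages",          # rendered pages
    }


def missing_outputs(output_dir, source_file):
    """Return the names of expected outputs that do not exist on disk."""
    paths = expected_outputs(output_dir, source_file)
    return [name for name, path in paths.items() if not path.exists()]


if __name__ == "__main__":
    # "scan_001.pdf" is an illustrative filename, not one from the source.
    for name, path in expected_outputs("out", "docs/scan_001.pdf").items():
        print(f"{name}: {path}")
```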

Communication guidelines

- State up front that a backend is required and is the user's responsibility.
- Before any cloud run, note that pages are sent off-machine and billed per page; do not send confidential documents to a cloud API without authorization.
- For large corpora, suggest --dry-run first and a small sample before the full run, so cost and quality are known before committing.
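Picking the small sample can itself be scripted. The following is one possible sketch, not part of the skill: `gather_inputs` and `pick_sample` are hypothetical names, the extension list combines PDF with the image types named in the workflow, and searching subdirectories recursively is an assumption about how a corpus is laid out.

```python
import random
from pathlib import Path

# PDF plus the image extensions listed in the workflow.
SUPPORTED = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def gather_inputs(root):
    """Return supported files under root (a single file or a directory)."""
    root = Path(root)
    if root.is_file():
        return [root] if root.suffix.lower() in SUPPORTED else []
    return sorted(p for p in root.rglob("*")
                  if p.is_file() and p.suffix.lower() in SUPPORTED)


def pick_sample(files, k=5, seed=0):
    """Choose up to k files reproducibly so a trial run can be repeated."""
    files = list(files)
    if len(files) <= k:
        return files
    return sorted(random.Random(seed).sample(files, k))


if __name__ == "__main__":
    for path in pick_sample(gather_inputs("docs/"), k=5):
        print(path)
```

A fixed seed keeps the sample stable between runs, which makes before/after comparisons meaningful when tuning --dpi or the model.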

Attribution

Pipeline and prompts adapted from Bruno de Medeiros' OntoMorphoGrapher proof-of-concept (docling + olmOCR-2 over vLLM).

Convert scanned PDFs and document images into clean Markdown using docling for layout (figures, tables, reading order) plus a vision-language OCR model. Use when a user needs high-quality OCR of scanned documents, historical literature, or photographed pages — preserving multi-column reading order, diacritics, special characters, and figures. Supports local vLLM/Ollama servers and cloud vision APIs (OpenAI, Anthropic). Assumes an OCR backend already exists.

Stars: 10
Category: tools
Updated: Jun 4, 2026

The skillsauth command above installs this skill globally. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Status | Scanner | Description | Reported % |
|---------|---------|-------------|------------|
| Clean | Trivy | Container and dependency vulnerability scanner | 95% |
| Clean | Semgrep | Static code analysis for vulnerabilities | 95% |
| Clean | mcp-scan (Snyk) | Model Context Protocol security validation | 95% |
| Skipped | Snyk (dep) | Open source security scanning | 50% |
| Skipped | Socket.dev | Supply chain security analysis | 50% |
| Skipped | VirusTotal | Multi-engine malware detection | 50% |
| Skipped | CrowdStrike | Advanced threat intelligence | 50% |
| Skipped | OSV-Scanner | Open Source Vulnerability database check | 50% |
| Skipped | OWASP Dep-Check | | 50% |

Last scanned: Jun 4, 2026, 2:56 AM (325.3s, 8 files scanned)

Related Skills

- brunoasm/lab-ordering (development, 10 stars, updated Jun 6, 2026): Place lab supply orders from member requests: route by request header to Amazon Business, the Pritzker Lab Google Form, or a direct vendor; stage the cart/form and stop for human review before any purchase. Use when the user pastes an order request or asks to order supplies, place an order, or fill the Pritzker form.
- brunoasm/thinking-deeply (tools, 10 stars, updated Apr 25, 2026): Engages structured analysis to explore multiple perspectives and context dependencies before responding. Use when users ask confirmation-seeking questions, make leading statements, request binary choices, or when feeling inclined to quickly agree or disagree without thorough consideration.
- brunoasm/busco-phylogeny (tools, 10 stars, updated Apr 25, 2026): Generate phylogenies from genome assemblies using BUSCO/compleasm-based single-copy orthologs with scheduler-aware workflow generation.
- brunoasm/extract-from-pdfs (testing, 10 stars, updated Apr 25, 2026): This skill should be used when extracting structured data from scientific PDFs for systematic reviews, meta-analyses, or database creation. Use when working with collections of research papers that need to be converted into analyzable datasets with validation metrics.

Install manually

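# Clone the repo
git clone https://github.com/brunoasm/my_claude_skills.git

# Create the global Claude Code skills folder if it does not exist yet
mkdir -p ~/.claude/skills

# Copy the skill into the Claude Code skills folder (global)
cp -r my_claude_skills/document_ocr ~/.claude/skills/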


Repository: brunoasm/my_claude_skills (10 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT