ranbot-ai/pdf-conversion-router

Name: pdf-conversion-router
Author: ranbot-ai

skills/pdf-conversion-router/SKILL.md

npx skillsauth add ranbot-ai/awesome-skills pdf-conversion-router

PDF Conversion Router

Route every PDF conversion through a short analysis step before choosing tools or CLI flags.

The goal is not "extract the most text". The goal is:

preserve structure
preserve attachment between labels and values
choose the most faithful output shape
avoid noisy defaults when a better route exists

When to Use

The user wants a PDF converted into another format.
The requested output is .md, .html, .txt, .json, .docx, or structured notes.
The PDF may be scanned, OCR-heavy, table-heavy, slide-based, medical, academic, or multi-column.

Core Rule

Never start with one fixed default pipeline.

Always:

classify the PDF
classify the target output
choose the strongest route for that combination
validate the result on representative sections
if needed, retry with better settings before delivering

Heuristics are starting points, not guarantees.

Do not promote one flag combination into a universal default just because it worked well on one PDF. Prefer document-specific evidence over habit.
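The Core Rule can be read as a small retry loop: try a candidate route, validate, and move on only if the result is not good enough. The sketch below is illustrative and assumes hypothetical `convert` and `is_acceptable` callables supplied by the caller; it is not part of the skill or of any tool's API.

```python
from typing import Callable, Iterable, Optional, Tuple


def convert_with_retries(
    routes: Iterable[dict],
    convert: Callable[[dict], str],
    is_acceptable: Callable[[str], bool],
) -> Optional[Tuple[dict, str]]:
    """Try candidate routes in order; return the first (route, output) that validates.

    `routes` is an ordered list of hypothetical route descriptions (strongest
    first). `convert` runs one route; `is_acceptable` checks the output on
    representative sections. Both are assumptions supplied by the caller.
    """
    for route in routes:
        output = convert(route)
        if is_acceptable(output):
            return route, output
    # No route produced a usable result; the caller decides on a fallback.
    return None
```

Returning the winning route together with the output keeps the evidence for why a setting was chosen tied to this specific document rather than promoted into a habit.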

Primary Engine Rule

Use opendataloader-pdf as the primary conversion engine for every PDF conversion task by default.

This skill should assume:

opendataloader-pdf is always the first conversion attempt
other tools are used to classify, validate, OCR, inspect, or support cleanup
other extractors are not the default replacement for the main conversion route

Use other tools only for one of these reasons:

quick classification of the PDF
OCR preprocessing before conversion
validation against layout-preserving text
manual repair when the generated output is still noisy
fallback only if opendataloader-pdf cannot produce a usable result
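One way to approach "validation against layout-preserving text" is a rough token-coverage check: how many of the words seen in a layout-preserving extraction also appear in the converted output. The function name and the idea of using word-level coverage are assumptions for illustration; the skill does not prescribe a metric.

```python
import re


def token_coverage(reference_text: str, converted_text: str) -> float:
    """Return the fraction of distinct reference tokens present in the converted text.

    Tokens are lowercase runs of word characters, so table pipes, HTML tags'
    punctuation, and whitespace differences do not count against the output.
    """
    reference_tokens = set(re.findall(r"\w+", reference_text.lower()))
    if not reference_tokens:
        # Nothing to compare against; treat as fully covered.
        return 1.0
    converted_tokens = set(re.findall(r"\w+", converted_text.lower()))
    return len(reference_tokens & converted_tokens) / len(reference_tokens)
```

A low score on a representative section suggests dropped content. A high score does not prove that labels stayed attached to their values, so a manual look at a hard section is still needed.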

Step 1: Classify the Source PDF

Identify the document class as quickly as possible:

Native digital PDF with selectable text
OCR PDF with noisy text
Image-only/scanned PDF
Slide deck / presentation export
Medical or lab report
Table-heavy business/finance document
Narrative report / letter / article
Mixed layout document with diagrams, tables, and prose

Useful fast checks:

pdfinfo input.pdf
pdftotext -layout input.pdf -

If text is missing or very poor, treat OCR as required.
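The two fast checks can be combined into a rough "is OCR required?" decision. This is a minimal sketch: it assumes the Poppler tools `pdfinfo` and `pdftotext` are on the PATH, and the 200-characters-per-page threshold is an arbitrary assumption, not a value from the skill.

```python
import re
import subprocess


def parse_page_count(pdfinfo_output: str) -> int:
    """Read the page count from the `Pages:` line of `pdfinfo` output."""
    match = re.search(r"^Pages:\s+(\d+)", pdfinfo_output, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no 'Pages:' line found in pdfinfo output")
    return int(match.group(1))


def needs_ocr(text: str, page_count: int, min_chars_per_page: int = 200) -> bool:
    """Heuristic: treat OCR as required when the text layer is missing or very thin."""
    if page_count <= 0:
        raise ValueError("page_count must be positive")
    alphanumeric = sum(1 for ch in text if ch.isalnum())
    return alphanumeric / page_count < min_chars_per_page


def classify_text_layer(pdf_path: str) -> bool:
    """Run both fast checks on a PDF and return True if OCR looks necessary."""
    info = subprocess.run(
        ["pdfinfo", pdf_path], capture_output=True, text=True, check=True
    ).stdout
    text = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    ).stdout
    return needs_ocr(text, parse_page_count(info))
```

Character density only catches a missing or thin text layer. A noisy OCR layer can pass this check, so reading a sample of the extracted text remains part of the classification.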

Document-Type Heuristics

Use these as default starting points:

medical / lab report: markdown-with-html + --table-method cluster + --image-output off
slide deck / PowerPoint export: markdown-with-html + --image-output off. Add --table-method cluster only if the default route under-structures important tabular content. If tables are visually obvious but missing or badly fused, treat this as a detection problem, not a Markdown formatting problem. If the selected route already reconstructs a real table but clips leading characters at column boundaries, treat that as a boundary-splitting defect, not a missing-table failure.
narrative / article / letter: start with markdown or text. Use markdown-with-html only if structure clearly matters.
table-heavy business / finance PDF: start with markdown-with-html. Add --table-method cluster when rows or columns flatten.
scanned / image-heavy PDF: OCR first, then convert with opendataloader-pdf.
mixed-layout PDF: prefer markdown-with-html. Validate one easy section and one hard section before accepting output.
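The heuristics above can be sketched as a lookup from document class to starting flags. The class keys (`"medical"`, `"slides"`, and so on) and the function name are hypothetical labels; the flag strings are the ones named in this skill, and passing the plain `markdown` shape to `-f` is an assumption. Scanned PDFs are left out because they need OCR before any conversion flags apply.

```python
from typing import Dict, List

# Starting points only, copied from the heuristics; not universal defaults.
STARTING_FLAGS: Dict[str, List[str]] = {
    "medical": ["-f", "markdown-with-html", "--table-method", "cluster",
                "--image-output", "off"],
    "slides": ["-f", "markdown-with-html", "--image-output", "off"],
    "narrative": ["-f", "markdown"],
    "table-heavy": ["-f", "markdown-with-html"],
    "mixed": ["-f", "markdown-with-html"],
}

# Classes for which the heuristics say to add cluster detection on evidence.
CLUSTER_ON_EVIDENCE = {"slides", "table-heavy"}


def starting_flags(doc_class: str, tables_flattened: bool = False) -> List[str]:
    """Return the starting flag list for a document class.

    `tables_flattened` records evidence from a first pass that rows or
    columns collapsed; only then is `--table-method cluster` appended.
    """
    if doc_class not in STARTING_FLAGS:
        raise ValueError(f"no heuristic for document class: {doc_class!r}")
    flags = list(STARTING_FLAGS[doc_class])
    if tables_flattened and doc_class in CLUSTER_ON_EVIDENCE:
        flags += ["--table-method", "cluster"]
    return flags
```

Keeping the evidence flag as an explicit argument mirrors the rule that cluster detection for slides and finance documents is added after a structure check, not by default.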

Step 2: Choose the Output Shape

Pick the output that best matches the document and the user's goal.

markdown-with-html: Use by default when the user wants Markdown and fidelity matters. Prefer this for tables, medical reports, slides, mixed-layout PDFs, and anything likely to break in pure Markdown.
markdown: Use only when clean plain Markdown matters more than layout fidelity.
html: Use when visual structure matters more than LLM readability.
text: Use for quick linear extraction, narrative documents, or when structure is unimportant.
json: Use when downstream machine processing matters more than human readability.
docx: Use when the user wants editable office output and layout reconstruction matters.
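As a rough illustration, the choice of output shape can be keyed on the requested file extension plus whether fidelity matters. The function name and the `fidelity_matters` parameter are hypothetical; the returned strings are this skill's shape labels and are not confirmed here as valid values for any particular CLI.

```python
def choose_output_shape(target_ext: str, fidelity_matters: bool = True) -> str:
    """Map a requested extension (.md, .html, .txt, .json, .docx) to an output shape."""
    ext = target_ext.lower().lstrip(".")
    if ext == "md":
        # Plain Markdown only when cleanliness outweighs layout fidelity.
        return "markdown-with-html" if fidelity_matters else "markdown"
    shapes = {"html": "html", "txt": "text", "json": "json", "docx": "docx"}
    if ext not in shapes:
        raise ValueError(f"unsupported target extension: {target_ext!r}")
    return shapes[ext]
```

The extension is only a first signal. A request for "structured notes", for example, has no extension and still needs a judgment about the user's goal.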

Step 3: Choose the Extraction Route

For OpenDataLoader CLI

Use OpenDataLoader as the default route.

Preferred defaults:

For Markdown output with fidelity priority: -f markdown-with-html
For medical PDFs: add --table-method cluster
For table-heavy PDFs: add --table-method cluster
For slide decks: start without --table-method cluster. Add it only after a structure check shows meaningful improvement. If a pseudo-table is already collapsed inside one detected row, changing only the Markdown flavor usually will not fix it. If the active engine build recovers the pseudo-table structure, prefer fixing residual boundary artifacts before escalating to hybrid/full mode.
For conversions where images are not requested: add --image-output off
For slide decks, medical reports, and structure-sensitive PDFs: prefer validating both the

Use when converting a PDF into another format such as Markdown, HTML, text, JSON, DOCX, or structured notes and the agent must choose the best extraction route, settings, and cleanup strategy for maxi

4 stars

development

Updated May 25, 2026

npx skillsauth add ranbot-ai/awesome-skills pdf-conversion-router

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy (Clean, 95%): Container and dependency vulnerability scanner
Semgrep (Clean, 95%): Static code analysis for vulnerabilities
mcp-scan (Snyk) (Clean, 95%): Model Context Protocol security validation
Snyk (dep) (Skipped, 50%): Open source security scanning
Socket.dev (Skipped, 50%): Supply chain security analysis
VirusTotal (Skipped, 50%): Multi-engine malware detection
CrowdStrike (Skipped, 50%): Advanced threat intelligence
OSV-Scanner (Skipped, 50%): Open Source Vulnerability database check
OWASP Dep-Check (Skipped, 50%)

Last scanned: May 25, 2026, 4:45 AM (132.6s, 1 file scanned)

SKILL.md

name: pdf-conversion-router
description: Use when converting a PDF into another format such as Markdown, HTML, text, JSON, DOCX, or structured notes and the agent must choose the best extraction route, settings, and cleanup strategy for maxi
category: Document Processing
source: antigravity
tags: [pdf, docx, markdown, ai, agent, llm, document, presentation, image, rag]
url: https://github.com/sickn33/antigravity-awesome-skills/tree/main/skills/pdf-conversion-router

Install manually

# Clone the repo
git clone https://github.com/ranbot-ai/awesome-skills.git

# Copy into Claude Code skills folder (global)
cp -r awesome-skills/skills/pdf-conversion-router ~/.claude/skills/

Repository: ranbot-ai/awesome-skills (4 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT