Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

lidge-jun/pdf-vision

Name: pdf-vision
Author: lidge-jun

pdf-vision/SKILL.md

npx skillsauth add lidge-jun/cli-jaw-skills pdf-vision

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

PDF Expert Skill

When to use

Large PDF that won't fit in context (100+ pages)
Scanned/image PDF that needs OCR
Semantic search across one or multiple PDFs
Building a personal document knowledge base (NotebookLM-style)

Quick Reference

| Task | Command | |------|---------| | Check text vs scan pages | python3 <skill>/scripts/pdf_check.py "file.pdf" | | Check with custom threshold | python3 <skill>/scripts/pdf_check.py "file.pdf" --threshold 50 --quality-ratio 0.2 | | Extract scan pages as PNG | pdftoppm -png -f 3 -l 3 "file.pdf" /tmp/page_3 | | OCR images (parallel) | swift <skill>/scripts/ocr.swift 3 /tmp/page_*.png | | Embed → JSON vectors | python3 <skill>/scripts/pdf_embed.py merged.json "doc_name" | | Re-embed (overwrite) | python3 <skill>/scripts/pdf_embed.py merged.json "doc_name" --force | | Semantic search (1 doc) | python3 <skill>/scripts/pdf_query.py "질문" --file doc_name | | Semantic search (all docs) | python3 <skill>/scripts/pdf_query.py "질문" | | Keyword search (free) | python3 <skill>/scripts/pdf_query.py "키워드" --file doc_name --grep | | Column-based OCR (optional) | python3 <skill>/scripts/ocr_columns.py image.png --cols 3 |

Replace <skill> with the absolute path to this skill folder.

Workflow

Step 1: Assess PDF (`pdf_check.py`)

python3 <skill>/scripts/pdf_check.py "/path/to/doc.pdf"

Output:

{
  "file": "/absolute/path/to/doc.pdf",
  "total_pages": 10,
  "text_pages": [{"page": 1, "text": "..."}, {"page": 3, "text": "..."}],
  "ocr_pages": [2, 4, 5]
}

text_pages: pages with sufficient extractable text
ocr_pages: page numbers that need OCR
Optional flags: --threshold N (min chars, default 100), --quality-ratio F (min meaningful line ratio, default 0.3)

Step 2: Extract OCR page images (`pdftoppm`)

For each page in ocr_pages, extract as PNG:

pdftoppm -png -f 2 -l 2 "/path/to/doc.pdf" /tmp/doc_p2
pdftoppm -png -f 4 -l 4 "/path/to/doc.pdf" /tmp/doc_p4
pdftoppm -png -f 5 -l 5 "/path/to/doc.pdf" /tmp/doc_p5

Output files: /tmp/doc_p2-1.png, /tmp/doc_p4-1.png, /tmp/doc_p5-1.png

Step 3: OCR with Apple Vision (`ocr.swift`)

swift <skill>/scripts/ocr.swift 3 /tmp/doc_p2-1.png /tmp/doc_p4-1.png /tmp/doc_p5-1.png

Output:

[
  {"file": "/tmp/doc_p2-1.png", "text": "OCR result...", "error": null},
  {"file": "/tmp/doc_p4-1.png", "text": "OCR result...", "error": null}
]

First argument is max concurrency (3 recommended for M1 16GB)
Results sorted by filename
If error is not null, that page's OCR failed — skip or retry

Step 4: Merge & Embed (`pdf_embed.py`)

Merge the outputs from Step 1 and Step 3 into a single JSON file:

[
  {"page": 1, "text": "text from pdf_check", "source": "/path/to/doc.pdf"},
  {"page": 2, "text": "text from ocr.swift", "source": "/path/to/doc.pdf"},
  {"page": 3, "text": "text from pdf_check", "source": "/path/to/doc.pdf", "tables": [["h1","h2"],["v1","v2"]]}
]

How to merge:

text_pages from Step 1 → use page and text directly
ocr.swift results from Step 3 → extract page number from filename (e.g., doc_p2-1.png → page 2)
Skip entries where error is not null
tables field is optional — if you want table-aware search, extract tables with pdfplumber's page.extract_tables() and include them. pdf_embed.py will create separate chunks for each table.
source should be the original PDF path

Save as /tmp/merged.json, then embed:

python3 <skill>/scripts/pdf_embed.py /tmp/merged.json "my_document"

Output: ~/.cli-jaw/vectors/my_document.vectors.json

If the file already exists, it skips (PDF is static → no incremental update needed). Use --force to re-embed.

Step 5: Query (`pdf_query.py`)

Semantic search (requires 1 API call for query embedding):

python3 <skill>/scripts/pdf_query.py "5G 요금제 가격이 얼마야?" --file my_document

Keyword search (no API call, free):

python3 <skill>/scripts/pdf_query.py "NPV" --file my_document --grep

Search all documents (omit --file):

python3 <skill>/scripts/pdf_query.py "계약 해지 조건"

Adjust result count with --n:

python3 <skill>/scripts/pdf_query.py "질문" --file my_document --n 10

Storage

~/.cli-jaw/vectors/
├── air_프로모션.vectors.json
├── 계약서_2026.vectors.json
└── 기출문제_수학.vectors.json

Each .vectors.json contains embedded chunks with vectors, text, page numbers, and metadata. Embedding is done once; all subsequent searches are free (local cosine similarity). Only query embedding costs 1 API call.

Environment Variables

| Variable | Required | Default | Description | |----------|----------|---------|-------------| | EMBEDDING_PROVIDER | No | gemini | gemini, openai, or vertex | | GEMINI_API_KEY | When provider=gemini | — | Google AI API key | | OPENAI_API_KEY | When provider=openai | — | OpenAI API key | | EMBEDDING_MODEL | No | Provider-dependent* | Embedding model name | | EMBEDDING_BATCH_SIZE | No | 50 | Texts per API call | | VERTEX_PROJECT | When provider=vertex | ADC auto-detect | GCP project ID | | VERTEX_LOCATION | When provider=vertex | us-central1 | GCP region | | EMBEDDING_DIM | When provider=vertex | 0 (full) | Output dimensionality |

*EMBEDDING_MODEL defaults:

gemini → gemini-embedding-001
openai → text-embedding-3-small
vertex → gemini-embedding-001

Dependencies

| Package | Required by | Install | |---------|-------------|---------| | pdfplumber | pdf_check.py | pip install pdfplumber | | google-auth | _embedding.py (Vertex only) | pip install google-auth | | Pillow | ocr_columns.py only | pip install Pillow | | Xcode CLT (Swift) | ocr.swift | Usually pre-installed on macOS | | poppler (pdftoppm) | Step 2 image extraction | brew install poppler |

Gemini and OpenAI providers use stdlib urllib only — no extra packages needed.

Optional: Column-based OCR (`ocr_columns.py`)

For images with column layouts (tables, multi-column documents):

python3 <skill>/scripts/ocr_columns.py image.png --cols 3 --header-px 100

Splits image into columns (and optional header), OCRs each part separately, returns combined result. Requires Pillow.

| Flag | Default | Description | |------|---------|-------------| | --cols N | 5 | Number of columns | | --header-px N | 0 | Header height in pixels to extract separately | | --output | json | json or text |

Constraints

M1 concurrency: Do not exceed 4 for ocr.swift concurrency on M1 16GB. Recommended: 3.
OCR languages: ocr.swift is hardcoded to Korean (ko-KR) and English (en-US). Other languages may produce inaccurate results.
Cleanup: Delete temporary PNG files and merged JSON from /tmp/ after embedding.

lidge-jun/pdf-vision

pdf-vision/SKILL.md

PDF RAG pipeline: text extraction + Apple Vision OCR + API embedding + cosine search. Triggers: "pdf 읽기", "pdf rag", "문서 검색", "pdf 분석", "pdf 청크", "pdf-expert", "pdf 임베딩"

4 stars

development

Updated Jun 5, 2026

$ install --global

skillsauth

npx skillsauth add lidge-jun/cli-jaw-skills pdf-vision

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Jun 5, 2026, 6:50 AM129.3s8 files scanned

SKILL.md

name:: pdf-vision
description:: >
PDF RAG pipeline:: text extraction + Apple Vision OCR + API embedding + cosine search.
Triggers:: pdf 읽기", "pdf rag", "문서 검색", "pdf 분석", "pdf 청크", "pdf-expert", "pdf 임베딩

PDF Expert Skill

When to use

Large PDF that won't fit in context (100+ pages)
Scanned/image PDF that needs OCR
Semantic search across one or multiple PDFs
Building a personal document knowledge base (NotebookLM-style)

Quick Reference

Replace <skill> with the absolute path to this skill folder.

Workflow

Step 1: Assess PDF (`pdf_check.py`)

python3 <skill>/scripts/pdf_check.py "/path/to/doc.pdf"

Output:

{
  "file": "/absolute/path/to/doc.pdf",
  "total_pages": 10,
  "text_pages": [{"page": 1, "text": "..."}, {"page": 3, "text": "..."}],
  "ocr_pages": [2, 4, 5]
}

text_pages: pages with sufficient extractable text
ocr_pages: page numbers that need OCR
Optional flags: --threshold N (min chars, default 100), --quality-ratio F (min meaningful line ratio, default 0.3)

Step 2: Extract OCR page images (`pdftoppm`)

For each page in ocr_pages, extract as PNG:

pdftoppm -png -f 2 -l 2 "/path/to/doc.pdf" /tmp/doc_p2
pdftoppm -png -f 4 -l 4 "/path/to/doc.pdf" /tmp/doc_p4
pdftoppm -png -f 5 -l 5 "/path/to/doc.pdf" /tmp/doc_p5

Output files: /tmp/doc_p2-1.png, /tmp/doc_p4-1.png, /tmp/doc_p5-1.png

Step 3: OCR with Apple Vision (`ocr.swift`)

swift <skill>/scripts/ocr.swift 3 /tmp/doc_p2-1.png /tmp/doc_p4-1.png /tmp/doc_p5-1.png

Output:

[
  {"file": "/tmp/doc_p2-1.png", "text": "OCR result...", "error": null},
  {"file": "/tmp/doc_p4-1.png", "text": "OCR result...", "error": null}
]

First argument is max concurrency (3 recommended for M1 16GB)
Results sorted by filename
If error is not null, that page's OCR failed — skip or retry

Step 4: Merge & Embed (`pdf_embed.py`)

Merge the outputs from Step 1 and Step 3 into a single JSON file:

[
  {"page": 1, "text": "text from pdf_check", "source": "/path/to/doc.pdf"},
  {"page": 2, "text": "text from ocr.swift", "source": "/path/to/doc.pdf"},
  {"page": 3, "text": "text from pdf_check", "source": "/path/to/doc.pdf", "tables": [["h1","h2"],["v1","v2"]]}
]

How to merge:

text_pages from Step 1 → use page and text directly
ocr.swift results from Step 3 → extract page number from filename (e.g., doc_p2-1.png → page 2)
Skip entries where error is not null
tables field is optional — if you want table-aware search, extract tables with pdfplumber's page.extract_tables() and include them. pdf_embed.py will create separate chunks for each table.
source should be the original PDF path

Save as /tmp/merged.json, then embed:

python3 <skill>/scripts/pdf_embed.py /tmp/merged.json "my_document"

Output: ~/.cli-jaw/vectors/my_document.vectors.json

If the file already exists, it skips (PDF is static → no incremental update needed). Use --force to re-embed.

Step 5: Query (`pdf_query.py`)

Semantic search (requires 1 API call for query embedding):

python3 <skill>/scripts/pdf_query.py "5G 요금제 가격이 얼마야?" --file my_document

Keyword search (no API call, free):

python3 <skill>/scripts/pdf_query.py "NPV" --file my_document --grep

Search all documents (omit --file):

python3 <skill>/scripts/pdf_query.py "계약 해지 조건"

Adjust result count with --n:

python3 <skill>/scripts/pdf_query.py "질문" --file my_document --n 10

Storage

~/.cli-jaw/vectors/
├── air_프로모션.vectors.json
├── 계약서_2026.vectors.json
└── 기출문제_수학.vectors.json

Environment Variables

*EMBEDDING_MODEL defaults:

gemini → gemini-embedding-001
openai → text-embedding-3-small
vertex → gemini-embedding-001

Dependencies

Gemini and OpenAI providers use stdlib urllib only — no extra packages needed.

Optional: Column-based OCR (`ocr_columns.py`)

For images with column layouts (tables, multi-column documents):

python3 <skill>/scripts/ocr_columns.py image.png --cols 3 --header-px 100

Splits image into columns (and optional header), OCRs each part separately, returns combined result. Requires Pillow.

Constraints

M1 concurrency: Do not exceed 4 for ocr.swift concurrency on M1 16GB. Recommended: 3.
OCR languages: ocr.swift is hardcoded to Korean (ko-KR) and English (en-US). Other languages may produce inaccurate results.
Cleanup: Delete temporary PNG files and merged JSON from /tmp/ after embedding.

Related Skills

lidge-jun/codex-imagegen

tools

VerifiedTrustedCommunity

Use only on the Codex CLI for native image generation or image editing without an API key. Save final PNG files under ~/.cli-jaw/uploads, report web-ready absolute-path markdown, and send to Telegram or Discord only when explicitly requested.

5SKILL.mdUpdated Jul 10, 2026

lidge-jun/codex-imagegen

lidge-jun/repo-map

tools

VerifiedTrustedCommunity

Ranked repository structure map via `cli-jaw map`. Use for codebase overview, structure map, symbol overview, unfamiliar codebase exploration, architecture orientation. Triggers: repo map, structure map, codebase overview, 와꾸, project structure, unfamiliar code.

5SKILL.mdUpdated Jul 7, 2026

lidge-jun/design

tools

VerifiedTrustedCommunity

cli-jaw Design workspace: create, preview, run, and export design pages from the right sidebar. Covers panel UX, direct-write workflow, artifact lifecycle, wireframe generation, design system, and Open Design adapter.

5SKILL.mdUpdated Jul 5, 2026

lidge-jun/dev-devops

development

VerifiedTrustedCommunity

MUST USE for infrastructure and delivery work — container builds, deploy pipelines, Kubernetes, Infrastructure as Code, SRE foundations, edge/serverless, ML infrastructure. Triggers: Dockerfile, K8s manifests, CI/CD pipeline, Terraform/IaC, release/deploy, devops/infra/deploy or release_cd task_tags.

5SKILL.mdUpdated Jun 19, 2026

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/lidge-jun/cli-jaw-skills.git

# Copy into Claude Code skills folder (global)
cp -r cli-jaw-skills/pdf-vision ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

lidge-jun/cli-jaw-skills

4 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT

Adoption

lidge-jun/pdf-vision

$ install --global

Security Scan Results

SKILL.md

PDF Expert Skill

When to use

Quick Reference

Workflow

Step 1: Assess PDF (pdf_check.py)

Step 2: Extract OCR page images (pdftoppm)

Step 3: OCR with Apple Vision (ocr.swift)

Step 4: Merge & Embed (pdf_embed.py)

Step 5: Query (pdf_query.py)

Storage

Environment Variables

Dependencies

Optional: Column-based OCR (ocr_columns.py)

Constraints

Related Skills

lidge-jun/codex-imagegen

lidge-jun/repo-map

lidge-jun/design

lidge-jun/dev-devops

lidge-jun/pdf-vision

$ install --global

Security Scan Results

SKILL.md

PDF Expert Skill

When to use

Quick Reference

Workflow

Step 1: Assess PDF (pdf_check.py)

Step 2: Extract OCR page images (pdftoppm)

Step 3: OCR with Apple Vision (ocr.swift)

Step 4: Merge & Embed (pdf_embed.py)

Step 5: Query (pdf_query.py)

Storage

Environment Variables

Dependencies

Optional: Column-based OCR (ocr_columns.py)

Constraints

Related Skills

lidge-jun/codex-imagegen

lidge-jun/repo-map

lidge-jun/design

lidge-jun/dev-devops

Step 1: Assess PDF (`pdf_check.py`)

Step 2: Extract OCR page images (`pdftoppm`)

Step 3: OCR with Apple Vision (`ocr.swift`)

Step 4: Merge & Embed (`pdf_embed.py`)

Step 5: Query (`pdf_query.py`)

Optional: Column-based OCR (`ocr_columns.py`)

Step 1: Assess PDF (`pdf_check.py`)

Step 2: Extract OCR page images (`pdftoppm`)

Step 3: OCR with Apple Vision (`ocr.swift`)

Step 4: Merge & Embed (`pdf_embed.py`)

Step 5: Query (`pdf_query.py`)

Optional: Column-based OCR (`ocr_columns.py`)