pdf-vision/SKILL.md
PDF RAG pipeline: text extraction + Apple Vision OCR + API embedding + cosine search. Triggers: "pdf 읽기", "pdf rag", "문서 검색", "pdf 분석", "pdf 청크", "pdf-expert", "pdf 임베딩"
npx skillsauth add lidge-jun/cli-jaw-skills pdf-visionInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
| Task | Command |
|------|---------|
| Check text vs scan pages | python3 <skill>/scripts/pdf_check.py "file.pdf" |
| Check with custom threshold | python3 <skill>/scripts/pdf_check.py "file.pdf" --threshold 50 --quality-ratio 0.2 |
| Extract scan pages as PNG | pdftoppm -png -f 3 -l 3 "file.pdf" /tmp/page_3 |
| OCR images (parallel) | swift <skill>/scripts/ocr.swift 3 /tmp/page_*.png |
| Embed → JSON vectors | python3 <skill>/scripts/pdf_embed.py merged.json "doc_name" |
| Re-embed (overwrite) | python3 <skill>/scripts/pdf_embed.py merged.json "doc_name" --force |
| Semantic search (1 doc) | python3 <skill>/scripts/pdf_query.py "질문" --file doc_name |
| Semantic search (all docs) | python3 <skill>/scripts/pdf_query.py "질문" |
| Keyword search (free) | python3 <skill>/scripts/pdf_query.py "키워드" --file doc_name --grep |
| Column-based OCR (optional) | python3 <skill>/scripts/ocr_columns.py image.png --cols 3 |
Replace <skill> with the absolute path to this skill folder.
pdf_check.py)python3 <skill>/scripts/pdf_check.py "/path/to/doc.pdf"
Output:
{
"file": "/absolute/path/to/doc.pdf",
"total_pages": 10,
"text_pages": [{"page": 1, "text": "..."}, {"page": 3, "text": "..."}],
"ocr_pages": [2, 4, 5]
}
text_pages: pages with sufficient extractable textocr_pages: page numbers that need OCR--threshold N (min chars, default 100), --quality-ratio F (min meaningful line ratio, default 0.3)pdftoppm)For each page in ocr_pages, extract as PNG:
pdftoppm -png -f 2 -l 2 "/path/to/doc.pdf" /tmp/doc_p2
pdftoppm -png -f 4 -l 4 "/path/to/doc.pdf" /tmp/doc_p4
pdftoppm -png -f 5 -l 5 "/path/to/doc.pdf" /tmp/doc_p5
Output files: /tmp/doc_p2-1.png, /tmp/doc_p4-1.png, /tmp/doc_p5-1.png
ocr.swift)swift <skill>/scripts/ocr.swift 3 /tmp/doc_p2-1.png /tmp/doc_p4-1.png /tmp/doc_p5-1.png
Output:
[
{"file": "/tmp/doc_p2-1.png", "text": "OCR result...", "error": null},
{"file": "/tmp/doc_p4-1.png", "text": "OCR result...", "error": null}
]
error is not null, that page's OCR failed — skip or retrypdf_embed.py)Merge the outputs from Step 1 and Step 3 into a single JSON file:
[
{"page": 1, "text": "text from pdf_check", "source": "/path/to/doc.pdf"},
{"page": 2, "text": "text from ocr.swift", "source": "/path/to/doc.pdf"},
{"page": 3, "text": "text from pdf_check", "source": "/path/to/doc.pdf", "tables": [["h1","h2"],["v1","v2"]]}
]
How to merge:
text_pages from Step 1 → use page and text directlyocr.swift results from Step 3 → extract page number from filename (e.g., doc_p2-1.png → page 2)error is not nulltables field is optional — if you want table-aware search, extract tables with pdfplumber's page.extract_tables() and include them. pdf_embed.py will create separate chunks for each table.source should be the original PDF pathSave as /tmp/merged.json, then embed:
python3 <skill>/scripts/pdf_embed.py /tmp/merged.json "my_document"
Output: ~/.cli-jaw/vectors/my_document.vectors.json
If the file already exists, it skips (PDF is static → no incremental update needed). Use --force to re-embed.
pdf_query.py)Semantic search (requires 1 API call for query embedding):
python3 <skill>/scripts/pdf_query.py "5G 요금제 가격이 얼마야?" --file my_document
Keyword search (no API call, free):
python3 <skill>/scripts/pdf_query.py "NPV" --file my_document --grep
Search all documents (omit --file):
python3 <skill>/scripts/pdf_query.py "계약 해지 조건"
Adjust result count with --n:
python3 <skill>/scripts/pdf_query.py "질문" --file my_document --n 10
~/.cli-jaw/vectors/
├── air_프로모션.vectors.json
├── 계약서_2026.vectors.json
└── 기출문제_수학.vectors.json
Each .vectors.json contains embedded chunks with vectors, text, page numbers, and metadata. Embedding is done once; all subsequent searches are free (local cosine similarity). Only query embedding costs 1 API call.
| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| EMBEDDING_PROVIDER | No | gemini | gemini, openai, or vertex |
| GEMINI_API_KEY | When provider=gemini | — | Google AI API key |
| OPENAI_API_KEY | When provider=openai | — | OpenAI API key |
| EMBEDDING_MODEL | No | Provider-dependent* | Embedding model name |
| EMBEDDING_BATCH_SIZE | No | 50 | Texts per API call |
| VERTEX_PROJECT | When provider=vertex | ADC auto-detect | GCP project ID |
| VERTEX_LOCATION | When provider=vertex | us-central1 | GCP region |
| EMBEDDING_DIM | When provider=vertex | 0 (full) | Output dimensionality |
*EMBEDDING_MODEL defaults:
gemini-embedding-001text-embedding-3-smallgemini-embedding-001| Package | Required by | Install |
|---------|-------------|---------|
| pdfplumber | pdf_check.py | pip install pdfplumber |
| google-auth | _embedding.py (Vertex only) | pip install google-auth |
| Pillow | ocr_columns.py only | pip install Pillow |
| Xcode CLT (Swift) | ocr.swift | Usually pre-installed on macOS |
| poppler (pdftoppm) | Step 2 image extraction | brew install poppler |
Gemini and OpenAI providers use stdlib urllib only — no extra packages needed.
ocr_columns.py)For images with column layouts (tables, multi-column documents):
python3 <skill>/scripts/ocr_columns.py image.png --cols 3 --header-px 100
Splits image into columns (and optional header), OCRs each part separately, returns combined result. Requires Pillow.
| Flag | Default | Description |
|------|---------|-------------|
| --cols N | 5 | Number of columns |
| --header-px N | 0 | Header height in pixels to extract separately |
| --output | json | json or text |
ocr.swift concurrency on M1 16GB. Recommended: 3.ocr.swift is hardcoded to Korean (ko-KR) and English (en-US). Other languages may produce inaccurate results./tmp/ after embedding.development
Goal execution guidelines with PABCD integration, verification tiers, documentation workflow, and AI-driven planning
tools
A CLI tool for making authenticated requests to the X (Twitter) API. Use this skill when you need to post tweets, reply, quote, search, read posts, manage followers, send DMs, upload media, or interact with any X API v2 endpoint.
development
Use this skill any time a spreadsheet file is the primary input or output (.xlsx, .xlsm, .csv, .tsv). This includes: creating, reading, editing, analyzing, or formatting spreadsheets; cleaning messy tabular data; converting between formats; and data visualization with charts. Also use for pandas-based data analysis when the deliverable is a spreadsheet. Do NOT trigger when the primary deliverable is a Word document, HTML report, standalone Python script, database pipeline, or Google Sheets API integration.
tools
Use this skill when the user wants to build a financial model, 3-statement model, DCF valuation, cap table, scenario analysis, or financial projections in Excel. Trigger on: 'financial model', '3-statement model', 'DCF', 'cap table', 'pro forma', 'projections', 'sensitivity analysis', 'waterfall', 'debt schedule', 'break-even', 'discounted cash flow', 'capitalization table', 'fundraising model', 'WACC calculation', 'scenario analysis model'. Input is a text prompt with assumptions. Output is a single .xlsx file with formula-driven, interconnected statement sheets.