user-scope-skills/pdf-to-llm/SKILL.md
Use when the user wants to read, summarize, analyze, compare, or ask questions about a PDF and the built-in PDF reader produces noisy or incomplete results. Also use when the user mentions PDF parsing, markdown extraction, OCR, scanned PDFs, or preparing documents for LLM input. Triggers: "PDF 변환", "PDF 읽어줘", "PDF 분석", "스캔 PDF", "OCR", "PDF to markdown", "pdf-to-llm", "PDF 파싱", "문서 변환", "convert PDF", "extract text from PDF", "PDF 텍스트 추출".
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
npx skillsauth add onejaejae/skills pdf-to-llm
Convert PDFs into clean Markdown and structured JSON using opendataloader-pdf, so you can work with the content instead of fighting layout noise.
The built-in PDF reader (Read tool) works for simple PDFs but struggles with complex layouts, tables, multi-column documents, and scanned pages. opendataloader-pdf handles these cases by producing structured Markdown (for reading) and JSON (for page-level citations and coordinates). The difference matters most for documents where layout carries meaning — research papers, financial reports, contracts, forms.
Before converting, check dependencies in this order:
1. Run java -version. Java 11+ is required. If it is missing, stop and tell the user.
2. Check that opendataloader-pdf is installed: pip show opendataloader-pdf

If not installed:
pip install -U opendataloader-pdf
Only install the heavier hybrid package when the PDF is scanned, image-based, or OCR-dependent:
pip install -U "opendataloader-pdf[hybrid]"
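The dependency checks above can be scripted so a missing prerequisite is caught before any conversion starts. The following is a minimal sketch, not part of the skill itself: the function names `java_major_version` and `preflight` are hypothetical, and the parsing assumes `java -version` prints a line of the form `... version "X.Y.Z" ...`, which is typical but not guaranteed for every JDK vendor.

```shell
#!/usr/bin/env bash
# Preflight sketch: verify Java 11+ and the opendataloader-pdf package.

# Extract the major version from `java -version` output passed as $1.
# Handles both the legacy scheme ("1.8.0_292" -> 8) and the modern
# scheme ("17.0.2" -> 17). Returns 1 if no version string is found.
java_major_version() {
  local version
  version=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$version" in
    "") return 1 ;;
    1.*) version=${version#1.} ;;   # legacy "1.x" numbering
  esac
  # Keep only the digits before the first ".", "_" or "-".
  printf '%s\n' "${version%%[._-]*}"
}

# Run the checks in the order given above; stop at the first failure.
preflight() {
  local major
  if ! command -v java >/dev/null 2>&1; then
    echo "Java not found: install Java 11+ before converting." >&2
    return 1
  fi
  # `java -version` writes to stderr, hence the 2>&1.
  major=$(java_major_version "$(java -version 2>&1)") || {
    echo "Could not parse the Java version." >&2
    return 1
  }
  if [ "$major" -lt 11 ]; then
    echo "Java $major found, but Java 11+ is required." >&2
    return 1
  fi
  if ! pip show opendataloader-pdf >/dev/null 2>&1; then
    echo "opendataloader-pdf is not installed: pip install -U opendataloader-pdf" >&2
    return 1
  fi
}
```

Calling `preflight && echo "ready"` before a conversion keeps the failure message close to its cause instead of surfacing later as a conversion error.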
Use fast mode first for normal digital PDFs. It handles most cases well.
opendataloader-pdf INPUT.pdf \
--output-dir OUTPUT_DIR \
--format markdown,json \
--use-struct-tree \
--quiet
--use-struct-tree is safe to try first — tagged PDFs benefit, untagged ones fall back to visual heuristics.
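The fast-mode command can be wrapped so that a run only counts as successful when both output files exist. This is a sketch under one loud assumption: that the converter names its outputs after the input file's basename (for example `report.pdf` producing `report.md` and `report.json`). The function names `outputs_written` and `convert_fast` are hypothetical; the flags are the ones shown above.

```shell
#!/usr/bin/env bash
# Fast-mode sketch: convert one PDF, then confirm both outputs exist.

# Succeed only if OUTPUT_DIR holds non-empty <name>.md and <name>.json.
# ASSUMPTION: outputs are named after the input file's basename.
outputs_written() {
  local input=$1 output_dir=$2 base
  base=$(basename "$input" .pdf)
  [ -s "$output_dir/$base.md" ] && [ -s "$output_dir/$base.json" ]
}

# Run the fast-mode conversion with the flags from the text above,
# then verify that the expected files were written.
convert_fast() {
  local input=$1 output_dir=$2
  opendataloader-pdf "$input" \
    --output-dir "$output_dir" \
    --format markdown,json \
    --use-struct-tree \
    --quiet || return 1
  outputs_written "$input" "$output_dir"
}
```

A non-zero exit from `convert_fast report.pdf out/` then means either the converter failed or an expected file is missing or empty, which is a reasonable point to inspect the output before considering hybrid mode.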
Escalate to hybrid mode only when fast mode fails or produces clearly degraded output.
Start the hybrid server:
opendataloader-pdf-hybrid --port 5002 --force-ocr --ocr-lang "ko,en"
Then convert:
opendataloader-pdf INPUT.pdf \
--output-dir OUTPUT_DIR \
--format markdown,json \
--hybrid docling-fast \
--quiet
For formulas or image descriptions, add --hybrid-mode full on the client side.
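Because the hybrid server has to be up before the client runs, the two steps above can be sequenced in one script. The sketch below makes several assumptions that are not confirmed by the text: the server listens on 127.0.0.1, the client finds it on port 5002 without an extra flag (as the commands above suggest), and a 120-second startup timeout is an arbitrary choice. The function names `wait_for_port` and `convert_hybrid` are hypothetical.

```shell
#!/usr/bin/env bash
# Hybrid-mode sketch: start the server, wait for its port, convert, stop.

# Poll until HOST:PORT accepts a TCP connection or TIMEOUT seconds elapse.
# Uses bash's /dev/tcp pseudo-device, so this requires bash.
wait_for_port() {
  local host=$1 port=$2 timeout=${3:-60} waited=0
  until (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; do
    [ "$waited" -ge "$timeout" ] && return 1
    sleep 1
    waited=$((waited + 1))
  done
}

convert_hybrid() {
  local input=$1 output_dir=$2 port=5002 server_pid status
  # Server flags are the ones shown in the text above.
  opendataloader-pdf-hybrid --port "$port" --force-ocr --ocr-lang "ko,en" &
  server_pid=$!
  # ASSUMPTION: the server binds to 127.0.0.1; 120 s is an arbitrary limit.
  if wait_for_port 127.0.0.1 "$port" 120; then
    opendataloader-pdf "$input" \
      --output-dir "$output_dir" \
      --format markdown,json \
      --hybrid docling-fast \
      --quiet
    status=$?
  else
    echo "Hybrid server did not open port $port in time." >&2
    status=1
  fi
  # Stop the background server whether or not the conversion succeeded.
  kill "$server_pid" 2>/dev/null
  return "$status"
}
```

Starting and stopping the server per conversion is the simplest arrangement. For a batch of scanned PDFs it is likely cheaper to start the server once and call only the client in a loop.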
Prefer Markdown for reading, JSON for structure.
- Read the .md file first. It is the primary artifact.
- Use the .json only when page numbers, element types, or bounding boxes matter (citations, coordinates).
- Confirm that the .md and .json files were written.

| Task | Approach |
|------|----------|
| Summarize a PDF | Convert to markdown → read → summarize by section (not by page) |
| Compare two PDFs | Convert both in one batch → compare headings, sections, tables from Markdown |
| Scanned PDF | Install hybrid deps → start server with OCR flags → convert with --hybrid docling-fast |
| Specific pages only | Convert full PDF first, then read only the relevant section from Markdown |

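Several rows above rely on reading one section of the converted Markdown rather than the whole file. A small helper can pull a section out by its heading. This is a sketch with a hypothetical function name, `md_section`, and it assumes ATX-style headings (`#`, `##`, ...) with an exact title match. It does not skip fenced code blocks, so a `# comment` line inside a fence would be mistaken for a heading.

```shell
#!/usr/bin/env bash
# Print one section of a Markdown file: the heading whose text equals
# TITLE, plus everything up to the next heading of the same or higher level.
md_section() {
  local file=$1 title=$2
  awk -v title="$title" '
    /^#+ / {
      level = index($0, " ") - 1        # number of leading # characters
      text = substr($0, level + 2)      # heading text after "# "
      if (printing && level <= target) exit
      if (!printing && text == title) { printing = 1; target = level }
    }
    printing { print }
  ' "$file"
}
```

For example, `md_section OUTPUT_DIR/report.md "Methods"` prints the Methods section together with its subsections, which keeps quoted material limited to the relevant part of the document.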
| Mistake | Fix |
|---------|-----|
| Dumping the entire converted file into chat | Quote only relevant sections — the full file stays on disk |
| Using JSON as the reading format | JSON is for structure/citations. Read from Markdown. |
| Installing hybrid deps for a normal digital PDF | Try fast mode first. Only escalate when output is clearly degraded. |
| Skipping the preflight check | Java missing = cryptic errors downstream. Always verify. |
| Running OCR without specifying language | Set --ocr-lang explicitly for better accuracy, especially with Korean. |